Explainable machine learning in outcome prediction of high-grade aneurysmal subarachnoid hemorrhage

Objective: Accurate prognostic prediction in patients with high-grade aneurysmal subarachnoid hemorrhage (aSAH) is essential for personalized treatment. In this study, we developed an interpretable prognostic machine learning model for high-grade aSAH patients using SHapley Additive exPlanations (SHAP). Methods: A prospective registry cohort of high-grade aSAH patients was collected at a single center. The endpoint was the 12-month follow-up outcome. The dataset was divided into training and validation sets in a 7:3 ratio. Four machine learning algorithms, logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were employed to develop a prognostic prediction model for high-grade aSAH. The optimal model was selected for SHAP analysis. Results: Among the 421 patients, 204 (48.5%) exhibited poor prognosis. The RF model achieved the highest AUC (0.867, 95% CI: 0.806-0.929), compared with LR (AUC = 0.850, 95% CI: 0.783-0.918), SVM (AUC = 0.862, 95% CI: 0.799-0.926), and XGBoost (AUC = 0.850, 95% CI: 0.783-0.917). SHAP analysis identified higher World Federation of Neurosurgical Societies (WFNS) grade, higher modified Fisher score (mFS), and advanced age as the primary features associated with an unfavorable 12-month outcome, while coiling embolization for aSAH drove the prediction towards a favorable prognosis. Additionally, the SHAP force plot visualized individual prognosis predictions. Conclusions: This study demonstrated the potential of machine learning techniques in prognostic prediction for high-grade aSAH patients. The features identified through SHAP analysis enhance model interpretability and provide guidance for clinical decision-making.


INTRODUCTION
Subarachnoid hemorrhage caused by ruptured intracranial aneurysms (aSAH) is a global health concern due to its significant impact on mortality and long-term disability rates [1]. Approximately 35% of patients succumb to severe cerebrovascular injury in the initial weeks [2]. Among the survivors, a considerable number experience disability. High-grade aSAH is associated with profound neurological consequences resulting from a combination of direct blood-induced damage, secondary vasospasm, and delayed cerebral ischemia [3]. Currently, the assessment of aSAH severity and the prediction of clinical outcomes rely on neurological examinations and neuroimaging studies; however, assessing patients with high-grade SAH can be challenging due to the frequent administration of sedatives and analgesics during their management [3]. A timely and accurate diagnosis of high-grade SAH is pivotal for instituting appropriate therapeutic interventions. Hence, there is an urgent demand for a functional prediction model to aid in the treatment and evaluation of patients with high-grade SAH.
In recent years, numerous prediction models have been developed to forecast clinical outcomes in high-grade aSAH patients [4]. However, the majority of these models rely on conventional algorithms with limited clinical features. Recent advancements in artificial intelligence have led to significant breakthroughs in medical machine learning (ML) [5][6][7], and these models have demonstrated promising discrimination capabilities. In a recent study, a support vector machine (SVM) model was constructed utilizing high-throughput metabolomics data to identify potential biomarkers and targets for the diagnosis and treatment of colorectal cancer, and the model achieved an AUC of 0.985 [8]. Nevertheless, because of the inherent "black box" nature of ML algorithms, which lack transparency and explanatory research, elucidating the prediction process within a model is a challenge [9]. SHapley Additive exPlanations (SHAP) is a game theory-based approach to explainable ML introduced by Lundberg and Lee [10]. It addresses the issue of inexplicability by representing the contribution of each feature to the outcome, which helps in understanding and interpreting complex models. Yagin et al. proposed an explainable artificial intelligence model to predict COVID-19 using meta-genomic next-generation sequencing (mNGS) data, and the model allowed physicians to enhance their comprehension of the decision-making process in COVID-19 genomic prediction [11]. Another study developed an XGBoost model combined with SHAP to predict 3-year all-cause mortality in patients with coronary heart disease and heart failure. That model offers clear explanations for individualized risk predictions, aiding doctors in understanding the impact of key features [12]. These studies show that explainable machine learning holds great promise in helping physicians intuitively grasp the influence of key features in models, which in turn gives clinicians a deeper understanding of decisions made for disease severity assessment.
Therefore, this study aims to develop and validate an explainable ML model to predict 12-month outcomes in patients with high-grade aSAH. In addition, the SHAP values of each feature were analyzed to elucidate the overall prediction process. This effort will contribute to the development of explainable and personalized predictive models for prognosis in high-grade aSAH and advance the application of machine learning in medicine.

Baseline characteristics
Among a total of 421 patients with high-grade aSAH in our final cohort, 204 patients suffered poor outcomes. The detailed baseline characteristics of the patients are presented in Table 1. The mean age was 62 (range: 54, 69), and there were 259 (61.5%) female patients in the cohort. Compared with the favorable outcome group, the poor outcome group was older (P < 0.001) and had a higher rate of coil treatment (P < 0.001), a higher rate of hypertension (P < 0.05), a higher World Federation of Neurosurgical Societies (WFNS) grade (P < 0.001), a higher modified Fisher score (mFS) (P < 0.001), and a larger aneurysm length (P < 0.05). Table 2 presents the baseline characteristics of the training set and validation set. The flowchart of our study is shown in Figure 1.

Model development and validation
We constructed LR, XGBoost, RF, and SVM models using the training dataset. The prediction performances of these four models are presented in Table 3. When evaluated on the validation dataset, these models achieved AUCs of 0.850 (95% CI: 0.783-0.918), 0.850 (95% CI: 0.783-0.917), 0.867 (95% CI: 0.806-0.929), and 0.862 (95% CI: 0.799-0.926), respectively (Figure 2). The confusion matrix and balanced accuracy of four models can be found in Supplementary value of 0.850 (95% CI: 0.783-0.917), respectively. Therefore, we selected the RF model for subsequent analysis. Table 4 provides detailed information regarding the multivariable LR analysis. Moreover, we applied decision curve analysis to the RF model. As shown in Figure 3, the decision curve analysis demonstrated that when the threshold probability ranges from 4% to 93%, the net benefit of applying the RF model is higher than that of the "Treat all" and "Treat none" strategies. This suggests that our model exhibits favorable clinical applicability.
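The net benefit that a decision curve plots can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this study: the function names and the toy data are hypothetical. At a threshold probability p_t, the net benefit of a model is TP/N - (FP/N) * p_t / (1 - p_t), the "Treat all" strategy classifies every patient as positive, and "Treat none" has a net benefit of zero.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = poor outcome, 0 = favorable outcome.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.35, 0.7]

for pt in (0.2, 0.5, 0.8):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f} model={model_nb:.3f} treat_all={all_nb:.3f} treat_none=0.000")
```

A model is considered useful over the range of thresholds where its curve lies above both reference strategies.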

SHAP model explanation
The SHAP values were calculated to represent the feature importance for the RF model, which exhibited superior discriminatory capability in the validation cohort. In Figure 4A, the clinical features are ranked

Individual SHAP force plot
The SHAP force plot analysis was employed to explain the individualized prediction outcomes of two specific samples. Figure 6 provides a visual representation of the predictions made by the RF model, with red and blue bars denoting risk factors and protective factors, respectively. The length of each bar corresponds to the importance of that feature. According to our model, the first patient (Figure 6A) was predicted to have a 75% probability of poor prognosis. Notably, WFNS grade 5, mFS grade 4, an aneurysm width of 3.8 mm, and hypertension were identified as the primary factors influencing this prediction. In contrast, another patient was projected by our model to have a 25% likelihood of a poor prognosis (Figure 6B).

DISCUSSION
In this study, we developed and validated four distinct machine learning models (LR, RF, SVM, and XGBoost). We observed that the RF model outperformed LR, SVM, and XGBoost (AUC = 0.867, 95% CI: 0.806-0.929).
To ensure both model performance and clinical interpretability, we employed the SHAP method to elucidate the decision-making process of the RF model. This effort will greatly aid physicians in gaining a
High-grade aSAH is associated with elevated mortality and unfavorable neurologic outcomes [13]. A recent investigation revealed that a significant proportion of survivors of high-grade aSAH showed a good quality of life after appropriate clinical decision making [14]. Therefore, there is a critical need for early prediction of long-term functional outcomes and the identification of risk factors. [16]. In a recent investigation on ML modeling for high-grade aSAH patient prognosis, Liu et al. reported an AUC of 0.88 achieved by their decision tree model [17]. This suggests that the predictive capability of ML models can surpass that of conventional predictive models. However, their model was constructed using limited algorithmic tools and lacked interpretability, functioning as a "black box" [18]. Meanwhile, explainable ML has proven successful in various medical domains, such as early prognosis prediction in sepsis [19] and enabling precision medicine in acute myeloid leukemia [20]. The SHAP method introduced by Lundberg and Lee offers a game-theoretic approach that effectively addresses the black box nature of ML models [10]. Nevertheless, to the best of our knowledge, its application in predicting long-term prognosis for high-grade aSAH remains unexplored. Therefore, this study represents the first attempt to employ the SHAP method within RF models for long-term prognostic prediction in patients with high-grade aSAH.
Our results indicate that the RF model outperforms the other models in predicting the long-term prognosis of high-grade aSAH patients. Random forest, an ensemble algorithm based on decision trees derived from random feature subsets, is widely recognized for its robust utility in feature classification and prediction tasks [21]. Moreover, RF exhibits significant advantages over other models in handling highly non-linearly correlated data, being robust to noise, simple to tune, and amenable to efficient parallel processing [22].
Another notable strength of our study lies in the application of SHAP values, which allowed us to open the black box of the machine learning model. The interpretable model revealed which clinical variables contribute most to predicting the long-term prognosis of high-grade aSAH.
WFNS grade, a widely recognized classification schema for assessing the severity of aSAH, categorizes patients into five grades based on clinical neurological manifestations [23,24]. Higher WFNS grades are often associated with more profound neurological deficits and poorer clinical outcomes [25]. Bogossian et al. found that patients with high-grade aSAH have significant rates of poor prognosis, particularly those classified as WFNS grade 5 upon admission [26]. Another study revealed that even with prompt intervention, the prevalence of severe disability at discharge among patients with WFNS grade 5 reached 27% [27]. Our finding in the context of high-grade aSAH is consistent with the well-established understanding that a higher WFNS grade correlates with a worse prognosis, which can be attributed to aggravated neurological impairment and an increased risk of subsequent complications [28,29].
Advanced age was positively correlated with higher WFNS grade, and the older the patient, the higher the probability of presenting in a deteriorated condition after aSAH [30]. Previous studies have identified advanced age as an independent predictor of poor prognosis in patients with high-grade aSAH [25,31,32]. Advancing age renders patients more susceptible to cerebral insults, diminishes physiological reserves, and impairs recovery mechanisms [33,34]. Elderly patients often carry a greater burden of comorbidities and have a reduced capacity for physiological recovery, which leaves them susceptible to unfavorable outcomes following high-grade aSAH [25,35].
The mFS system, a radiological tool for assessing hemorrhage extent on computed tomography scans, plays a key role in evaluating the severity of bleeding and the risk of vasospasm [36]. Similar to the WFNS grade, a higher mFS score indicates more extensive hemorrhage and is linked to unfavorable outcomes [15]. The severity of bleeding and its subsequent complications, such as vasospasm and delayed cerebral ischemia, contribute to the observed correlation between an elevated mFS score and poor prognosis in high-grade aSAH [15,17,37].
Endovascular coiling, a minimally invasive strategy for the treatment of intracranial aneurysms, represents a promising technique for high-grade aSAH patients [38,39]. Recent years have witnessed a significant improvement in the prognosis of patients with high-grade aSAH, with rates of functional independence ranging from 30% to 57% [40,41]. These improved outcomes have been attributed to the early and aggressive implementation of endovascular coiling [42,43]. However, another recent study indicated that, whereas surgery was a risk factor for short-term morbidity, endovascular treatment was associated with higher mortality at 1 year [44]. The observed findings are probably a consequence of selection bias inherent in the retrospective nature of the data. In general, effective occlusion of aneurysms with metal coils can attenuate the risk of rebleeding and subsequent complications.
In summary, our study established four ML models (LR, SVM, RF, XGBoost) and selected the RF model for a comprehensive SHAP analysis based on its superior predictive performance. The SHAP analysis revealed the significant contributions of clinical features in predicting long-term prognosis in high-grade aSAH. Elevated WFNS grade and mFS, along with advanced age, were associated with unfavorable outcomes, indicating aggravated neurological impairment and bleeding severity. Conversely, the strategic implementation of endovascular coiling emerges as a promising method to improve patient prognosis by preventing rebleeding and mitigating associated complications. Incorporating these insights into clinical decision-making holds great potential to guide therapeutic strategies and optimize neurocritical care. Moreover, the predictive model we developed paves the way for personalized treatment strategies. Patients identified by our model as having an elevated risk of unfavorable outcomes could benefit from intensive monitoring, early intervention, and personalized rehabilitation approaches. Overall, explainable ML models serve as a valuable tool to improve clinical decision-making regarding the prognosis of high-grade aSAH.
However, this study has several limitations. First, the single-center design may limit the generalizability of the findings to broader patient populations, and an independent validation cohort from other centers is necessary for model evaluation. Second, there may be unobserved confounders that influence the prognosis of high-grade aSAH patients. Third, the lack of external validation data from other medical centers further restricts the model's generalizability; thus, additional prospective randomized clinical trials are essential to validate our model. Finally, our modeling study exclusively enrolled adult patients, so the predictive validity of the RF model for pediatric high-grade aSAH remains unclear.

CONCLUSIONS
In this study, we employed four machine learning algorithms and identified the random forest (RF) model as the most effective predictor of long-term prognosis for high-grade aSAH patients. Using SHAP analysis, we highlighted the crucial role of key variables such as WFNS grade, modified Fisher score, age, and endovascular coiling in prognosis determination. Our approach not only provides accurate prognostic predictions but also enhances the transparency and interpretability of clinical decisions, which may contribute to improved patient outcomes. Ultimately, our study highlights the significance of RF and SHAP in enhancing prognostic accuracy and guiding personalized care for high-grade aSAH patients.

Study design and participant enrollment
This single-center, prospective registered (NCRIA- Exclusion criteria included the presence of vascular malformation or other cerebrovascular disease, postoperative status at admission, permanent brain injury at presentation, death within 3 days after operation, and missing data.

Prediction variables collection
The following variables were collected from the hospital electronic health record system: (

Definition of outcome
The neurological outcome of these patients was evaluated at 12 months after the initial aSAH using the modified Rankin Scale (mRS) [45,46]. A favorable neurological outcome was defined as an mRS of 0 to 2, while a poor outcome was defined as an mRS of 3 to 6. Patient follow-ups were conducted by a neurosurgeon through telephone consultations. The neurosurgeon was blinded to the patients' clinical information.
Moreover, to avoid overfitting in LR, the features were selected and filtered by the Least Absolute Shrinkage and Selection Operator (LASSO) model with the "λ-1se" criterion. Each algorithm was fine-tuned through hyperparameter optimization to optimize model performance. Ten-fold cross-validation was employed to mitigate potential bias in the data and to support the generalization of model performance.
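The outcome definition and the ten-fold cross-validation scheme described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline (the study used R); the helper names `dichotomize_mrs` and `kfold_indices`, the seed, and the sample count are hypothetical, and the folds here are not stratified by outcome.

```python
import random


def dichotomize_mrs(mrs):
    """Return 1 for a poor outcome (mRS 3-6) and 0 for a favorable one (mRS 0-2)."""
    if mrs not in range(0, 7):
        raise ValueError(f"mRS must be an integer from 0 to 6, got {mrs!r}")
    return 1 if mrs >= 3 else 0


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Hypothetical 12-month mRS scores for a handful of patients.
mrs_scores = [0, 2, 3, 6, 1, 4, 5, 2]
labels = [dichotomize_mrs(m) for m in mrs_scores]
print(labels)  # [0, 0, 1, 1, 0, 1, 1, 0]

for train_idx, val_idx in kfold_indices(n_samples=25, k=10):
    # Each sample appears in exactly one validation fold.
    assert not set(train_idx) & set(val_idx)
```

Within each iteration, a model would be fitted on `train_idx` and scored on `val_idx`, and the ten scores averaged to guide hyperparameter choice.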

Feature importance with Shapley Additive Explanation values
SHapley Additive exPlanations (SHAP) were employed to enhance the interpretability of the final model. A positive SHAP value indicates that the associated feature pushes the prediction towards a higher risk of poor outcome, whereas a negative SHAP value indicates that the feature pushes the prediction towards a lower risk. The magnitude of a SHAP value signifies the extent of that feature's contribution to the prediction.
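To make the sign and magnitude interpretation concrete, the sketch below computes exact Shapley values for a tiny risk function with three binary features. It is an illustration only: the toy model, its weights, and the feature names are invented, an "absent" feature is simply set to its reference level, and exact enumeration is used in place of the approximation performed by SHAP software on a real model.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values; value_fn maps a frozenset of present features to a model output."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical predicted probability of poor outcome (invented weights)."""
    risk = 0.30  # baseline risk with all features at their reference level
    if "wfns_grade_5" in present:
        risk += 0.30
    if "age_over_65" in present:
        risk += 0.10
    if "coiling" in present:
        risk -= 0.15
    if "wfns_grade_5" in present and "age_over_65" in present:
        risk += 0.10  # interaction, shared equally between the two features
    return risk


features = ["wfns_grade_5", "age_over_65", "coiling"]
phi = shapley_values(toy_risk, features)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

In this toy case the two risk factors receive positive values and coiling receives a negative one, and the values sum to the difference between the prediction for this patient and the baseline, which is the additivity property that a force plot displays.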
The SHAP summary plot was used to show the contribution of each feature to the model. In addition, the SHAP force plot was used to visualize the effects of pivotal features on the final model's predictions for individual patients. The "fastshap" package in R was used to compute the SHAP values, and the "ggbeeswarm" and "shapviz" packages were used to visualize the SHAP values for each feature.

Statistical analysis
Before conducting the formal analysis on the dataset, the Kolmogorov-Smirnov test was employed to ascertain the distribution type of the data. Continuous variables were compared using the independent t-test or the Mann-Whitney U-test and were reported as mean ± SD or median with interquartile range, as appropriate. Categorical variables were compared using the Chi-square test or Fisher's exact test and were reported as frequencies.
The performance of the models was evaluated using the following statistical parameters: true positives (TP), false positives (FP), false negatives (FN), true negatives (TN), sensitivity, specificity, and accuracy. In addition, the area under the receiver operating characteristic curve (AUC-ROC) with 95% CI and the balanced accuracy were used to evaluate model performance. We calculated these parameters on both the training and validation sets to show the generalization capability of the models. A two-tailed P < 0.05 was considered statistically significant. All statistical analyses were conducted using SPSS (Version 26.0, IBM Corp., Armonk, NY, USA) and R software (Version 4.3.0).
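The evaluation parameters listed above follow directly from the confusion matrix, as the following sketch shows. It is a minimal illustration with hypothetical function names and toy data, not the analysis code used in this study; the AUC is computed by its rank interpretation (the probability that a randomly chosen poor-outcome patient receives a higher score than a randomly chosen favorable-outcome patient), and the 95% CI is not computed here.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = poor outcome."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy, and balanced accuracy (assumes both classes are present)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def auc_rank(y_true, y_score):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = poor outcome, 0 = favorable outcome.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.35, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 cut-off is an assumption

metrics = classification_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
print("AUC:", auc_rank(y_true, y_score))
```

Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy when the two outcome groups differ in size.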

Figure 1. The flowchart of this study.

Figure 2. ROC curves for four machine learning models. (A) AUCs of four machine learning models in the training cohort; (B) AUCs of four machine learning models in the test cohort. ROC, receiver operating characteristic curve; AUC, area under the curve; LR, logistic regression; RF, random forest; SVM, support vector machine; XGB, extreme gradient boosting.

Figure 3. Decision curve analysis of the random forest model. The black line is the net benefit for a strategy of treating all patients; the yellow line is the net benefit of treating none. The y-axis indicates the overall net benefit, which is calculated by summing the benefits (true positive results) and subtracting the harms (false positive results).

Figure 4. Summary plots of SHapley Additive exPlanations (SHAP) values. (A) SHAP feature importance quantified through the average absolute Shapley values. This plot illustrates the significance of each feature in the development of the predictive model. (B) Representation of the influence exerted by each feature on the final model output, assessed via the distribution of SHAP values. Every individual patient is denoted by a data point within each row. The color indicates whether a continuous feature is at a high level (displayed in blue) or a low level (displayed in red) for that specific observation. For categorical features, blue signifies "yes", while red corresponds to "no". Locations 1 to 7 denote the anterior cerebral artery, middle cerebral artery, internal carotid artery, posterior cerebral artery, anterior communicating artery, posterior communicating artery, and others, respectively.

Figure 6. SHAP force plot for interpreting individual prediction outcomes. This plot offers a visual illustration of the RF model's predictions, wherein the red and blue bars signify risk factors and protective factors, respectively. The length of the bars corresponds to the extent of feature importance. (A) Poor outcome; (B) favorable outcome.

Table 2. Baseline characteristics of the training set and validation set.
Abbreviations: CHD, coronary heart disease; WFNS, World Federation of Neurosurgical Societies; mFS, modified Fisher Scale; ACA, anterior cerebral artery; MCA, middle cerebral artery; ICA, internal carotid artery; PCA, posterior cerebral artery; ACoA, anterior communicating artery; PCoA, posterior communicating artery.

based on their average absolute SHAP values to showcase their relative significance. Figure 4B provides a comprehensive visualization of how factors influence the RF model, with blue indicating high levels and red representing low levels for continuous features in each specific observation. For categorical features, blue denotes "yes" while red corresponds to "no".

Table 3. Model performance using training and validation cohorts.

comprehensive understanding of the underlying model's decision-making process and facilitate the utilization of prediction results. The feature importance analysis revealed WFNS grade, age, mFS, and treatment of coiling embolization as predominant predictors for poor prognosis in the RF model. Our findings indicate that higher WFNS grade, higher mFS grade, and advanced age are distinct predictors for poor prognosis in high-grade SAH patients, while the treatment modality of coiling embolization serves as a protective factor.