Identification of immune and stromal cell infiltration-related gene signature for prognosis prediction in acute lymphoblastic leukemia

Acute lymphoblastic leukemia (ALL) is a common and life-threatening hematologic malignancy, its occurrence and progression are closely related to immune/stromal cell infiltration in the bone marrow (BM) microenvironment. However, no studies have described an immune/stromal cell infiltration-related gene (ISCIRG)-based prognostic signature for ALL. A total of 444 patients involving 437 bulk and 7 single-cell RNA-seq datasets were included in this study. Eligible datasets were searched and reviewed from the database of TCGA, TARGET project and GEO. Then an integrated bioinformatics analysis was performed to select optimal prognosis-related genes from ISCIRGs, construct a nomogram model for predicting prognosis, and assess the predictive power. After LASSO and multivariate Cox regression analyses, a seven ISCIRGs-based signature was proved to be able to significantly stratify patients into high- and low-risk groups in terms of OS. The seven genes were confirmed that directly related to the composition and status of immune/stromal cells in BM microenvironment by analyzing bulk and single-cell RNA-seq datasets. The calibration plot showed that the predicted results of the nomogram were consistent with the actual observation results of training/validation cohort. This study offers a reference for future research regarding the role of ISCIRGs in ALL and the clinical care of patients.


INTRODUCTION
Acute lymphoblastic leukemia (ALL) is one of the most frequent hematologic malignancies, especially in children, accounting for approximately 25% of all pediatric cancers [1]. The immunophenotypes of ALL include T cell ALL (T-ALL), B cell ALL (B-ALL) and mixed-phenotype acute leukemia (MPAL), where MPAL is a rare immunophenotype with features of both ALL and acute myeloid leukemia (AML) [2]. Although the application of immunotherapy led by chimeric antigen receptor T (CAR T) cell therapy has greatly improved the clinical remission rate of ALL patients [3], a shorter overall survival (OS) rate caused by adverse prognosis is still a crucial challenge for clinicians and patients. For example, the OS rate of adult ALL patients is less than 45% [4].
Accumulating evidence indicates that the crosstalk between tumor and immune cells plays a crucial role in cancer development by regulating tumor malignancy, immune/stromal cell infiltration, and immune evasion in the tumor microenvironment [5][6][7][8]. Similar to solid tumors, the bone marrow (BM) microenvironment of ALL is a dynamic system of immune cells, endothelial progenitor cells, stromal cells, extracellular matrix, growth factors and cytokines. Therefore, evaluating the role of immune/stromal cell infiltration of the BM microenvironment in survival and progression and identifying novel accurate biomarkers for assessing the immune/stromal cell infiltration-related risk in patients with ALL is of utmost importance for improving the prognosis.
Previous studies confirmed that activated stromal cells could rescue ALL cells from oxidative stress by transferring mitochondria [9], and some immune cells could predict outcomes in ALL patients [10]. For example, the proportion of PD1+TIM3+ double-positive CD4+ T cells could predict poor survival in adult B-ALL patients [10,11], and increased frequencies of activated cytokine-producing natural killer (NK) cells could independently predict poor clinical outcome in ALL patient [12]. In tumor microenvironment, cytokines played a major role in the regulation of the cellular responses between tumor cells and immune cells, for example, TGFβ, IL-10 and IL-6 could dampen the anti-tumor response of NK cells by suppressing activity and promote subsequent tumor evasion and progression [13,14].

The Estimation of Stromal and Immune cells in Malignant Tumors using Expression data (ESTIMATE)
algorithm is an analysis approach based on single sample gene expression signatures to infer the fraction of stromal and immune cells and to generate immune/stromal scores for predicting the infiltration of stromal and immune cells in malignant tumors [15]. It has made outstanding progress in a variety of solid malignancies and some hematological malignancies, such as glioma [16], prostate cancer [17], gastric cancer [18], colon cancer [19] and AML [20]. At present, the prognosis prediction models regarding ALL are still based on specific genes or some clinicopathologic characteristics [21][22][23]. Therefore, this study aims to investigate the infiltration of immune/stromal cells in the BM microenvironment and construct an accurate immune/stromal cell infiltrationrelated genes (ISCIRGs)-based model for prognosis prediction of ALL patients.

Data source and clinicopathologic characteristics of patients
The datasets selection process is shown in Figure 1. By reviewing the information of ALL related datasets from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project database, 494 and 325 primary datasets were identified for training group and validation group, respectively. After eliminating 95 datasets that were not derived from BM samples and 5 datasets without follow-up time, survival information and detailed clinical information, 394 datasets were included in training group. Meanwhile,

Identification of immune and stromal cell infiltration in training cohort
The ESTIMATE algorithm was applied to calculate the immune/stromal score for all included samples, in training cohort, the former ranged from 524.82 to 3304.62, and the latter ranged from −2316.13 to −362.69 (Supplementary Table 1). The immune score had a significant association with immunophenotype (p < 0.0001, Figure 2A) but not with race (p = 0.3753, Figure 2C), and the stromal scores were significantly associated with both the immunophenotype (p < 0.0001, Figure 2B) and race (p = 0.0465, Figure 2D).
Subsequently, the included ALL patients were classified into high (n = 197) and low score (n = 197) groups to explore the potential relationships in OS vs. immune scores and OS vs. stromal scores. Kaplan-Meier (KM) survival analysis showed that the OS time of patients in both the high immune score group and the high stromal score group was significantly shorter than that of patients in the low immune score group (p = 0.015, Figure 2E) and the low stromal score group (p = 0.003) (p > 0.05, Figure 2F). The above findings suggest that the OS of ALL patients is significantly associated with both the immune score and stromal score.

Identification of common differentially expressed genes (DEGs) based on immune/stromal scores
A total of 440 and 692 DEGs between high/low immune and stromal scores were identified, respectively ( Figure 3A). The heatmap is shown in Figure 3B. Moreover, 233 commonly downregulated genes and 102 commonly upregulated genes were identified from the immune score/stromal score groups through integrated bioinformatics analysis, and the Venn plot is shown in Figure 3C. These common DEGs are the raw data for subsequent studies.
To investigate the biological functions of the common DEGs, we further performed Gene Ontology (GO) and Kyoto Encyclopedia of Gene and Genomes (KEGG) pathway enrichment analysis. GO enrichment analysis included three subontologies: biological process (BP), cellular component (CC) and molecular function (MF). For BP, the common DEGs mainly enriched in immune response-activating cell surface receptor signaling pathway; For CC, the common DEGs were chiefly enriched in MHC protein complex; For MF, DEGs were principally enriched in immune receptor activity Supplementary Figures 1-3). KEGG pathway enrichment analysis showed that the enrichment of the common DEGs was chiefly concentrated in cell adhesion molecules, systemic lupus erythematosus, and hematopoietic cell lineage (Supplementary Figure 4).

Identification of a ISCIRGs-based signature
To investigate the potential value of the common DEGs in predicting the OS of ALL patients, first, KM survival analysis was performed for all common DEGs. The results showed that 335 common DEGs were significantly associated with OS (p < 0.05, Supplementary Based on the BM sample of healthy persons and ALL patients from TCGA, we found that, compared to healthy BM, in ALL, the gene expression level of LILRA1, NRGN, MT-ND6, EMP2, and IGHM were significantly decreased ( Figure 4C). Furthermore, based on four datasets of ALL patients (i.e.,   Figure 6). Furthermore, GO enrichment analysis showed that the seven ISCIRGs were mainly enriched in mitochondrial electron transport and oxidative phosphorylation (BP), NADH dehydrogenase (ubi-quinone) activity (MF), and respirasome (CC). KEGG pathway enrichment analysis demonstrated that these genes were also significantly associated with oxidative phosphorylation (Supplementary Table 4). sequence. A vertical line was drawn at the value selected using ten-fold cross-validation, where an optimal lambda value resulted in ten features with nonzero coefficients. (B) Optimal parameter (lambda) selection in the LASSO model used ten-fold cross-validation via minimum criteria. The partial likelihood deviance (binomial deviance) curve was plotted versus the log(lambda) value. Dotted vertical lines were drawn at the optimal values by using the minimum criteria and the I standard error of the minimum criteria. (C) The mRNA expression profiles of the seven prognostic ISCIRGs between health and tumor tissues (bulk RNA-seq datasets). ** p < 0.01, *** p < 0.001 and **** p < 0.0001. (D) Genetic alterations of seven prognostic ISCIRGs in ALL calculated by cBioPortal database. * p < 0.05. AGING

Prognostic value of the ISCIRG-based signature
After calculating the risk scores of all patients, we used the median risk score as a cutoff value to classify patients of training cohort into high-and low-risk groups. KM survival analysis showed that the patients in high-risk group had significantly lower OS than those in low-risk group (log-rank text p < 0.0001, Figure 5A, 5B). Time-dependent ROC analysis confirmed favorable values in predicting OS in this validation set ( Figure 5C). We also performed univariate and multivariate Cox regression analysis. Univariate Cox regression analysis confirmed that risk score was a significant prognostic factor (hazard ratio (HR) 95% confidence interval (CI): 5.915 [3.208, 10.907], p < 0.001, Figure 6A). Multivariate Cox regression analysis showed that after adjusting for clinicopathological features and tumor purity, the ISCIRG-based signature was still an independent prognostic factor and predictor for ALL patients (HR 95% CI: 2.527 [1.052, 6.070], p = 0.038, Figure 6B). In addition, tumor purity was not a factor (p = 0.271) influencing the significant association between the ISCIRG-based signature and prognosis in multivariate cox model, demonstrating that this model was not just reporting the level of tumor burden in patients.
Subsequently, subgroup analysis was performed to further confirm the prognostic value of the gene signature in different clinicopathological factors. The results showed that the association between risk score and OS remained markedly significant after controlling for race and immunophenotype ( Figure 7A-7D).

Association between the ISCIRG-based signature and immune cell infiltration
Four normal and seven B-ALL single-cell RNA-seq datasets from GSE134759 were included in this analysis. The included samples were collected at the  Figure 8D). Moreover, the mRNA expression levels of LILRA1, VPREB3, EMP2, and IGHM were statistically increased, and the mRNA expression levels of NRGN and MT-ND6 were statistically decreased in ALL sample compared to normal sample (Supplementary Figure 8).
Based on the annotated single-cell gene expression matrix in the above results, we built a custom signature matrix file by using Cell-type Identification by Estimating Relative Subsets of RNA Transcripts x (CIBERSORTx) (Supplementary Table 5). Subsequently, we used the custom signature matrix file to impute the BM cell fractions for patients of training cohort ( Figure 9A). After removing B cells and T cells that may correspond to the immunophenotype of ALL, the results showed that the low-risk group exhibited higher levels of HSPCs. The high-risk group exhibited higher levels of myeloid cells, DCs, and NK cells ( Figure 9B). Figure 9C demonstrates correlations between the various immune cells. Furthermore, the survival analysis showed that patients with higher DCs and NK cells had a shorter OS time than those with lower DCs (p < 0.001) and NK cells (p = 0.01). While, patients with higher HSPCs had a longer OS time than those with lower HSPCs (p < 0.001, Figure 9D).

Construction and validation of clinically applicable prognostic nomogram
Based on the four valuable factors (immunophenotype, race, age and gender) and the ISCIRG-based signature in the bulk RNA-seq data set of training group, a prognostic nomogram was developed to provide a clinically applicable quantitative approach for individual OS prediction ( Figure 10A). The calibration test indicated the perfect prediction ability of this model (concordance index (C-index): 0.802 95% CI: 0.7473−0.8572, S: p = 0.887, ROC = 0.840, Figure  10B). The calibration plots suggested that the agreement between the predicted 1-, 3-, and 5-year OS rates and actual observations was excellent ( Figure 10C). These results suggest that the prognostic nomogram is reliable and can be applied for ALL patients.
To validate the feasibility and robustness of our nomogram, we performed a similar analysis process for dataset of validation group. KM survival analysis showed that the population with higher risk score also had significantly lower OS than those with lower risk score (log-rank text p = 0.03, Supplementary Figure 9A, 9C). Time-dependent ROC analysis confirmed favorable values in predicting OS in this validation set (Supplementary Figure 9B). And the calibration test showed the good prediction ability as well (C-index: 0.6861 95% CI: 0.5777−0.7946, S: p = 0.962, ROC = 0.778, Supplementary Figure 10A). The calibration plots also suggested that the agreement between the predicted 1-, 3-, and 5-year OS rates and actual observations was good (Supplementary Figure 10B). Besides, we further validated the prognostic significance of this model by analyzing two external validation datasets, including GSE13576 (n = 197) and GSE50999 (n = 43). The results showed that patients (GSE13576) with pediatric B-ALL in higher risk score group had higher early and late relapse rates (19.4%) compared to those in lower risk score group (13.3%, Supplementary Figure 11, Supplementary Table 6), and patients (GSE50999) with T-ALL in higher risk score group also had a higher relapse rate (28.6%) than those in lower risk score group (13.6%, Supplementary Figure  12, Supplementary Table 6).
Furthermore, to visualize and facilitate the clinical use of the prognostic nomogram, we developed an easy-tooperate web-based model that predicted the OS of ALL based on the established nomogram ( Figure 11A, 11B). The estimated probability of disease progression could be obtained by drawing a vertical line from the total point axis to the outcome axis.

DISCUSSION
ALL is a rapidly progressive hematologic malignancy whose maintenance and progression are highly dependent upon the interaction between immune/ stromal cells and the nonmalignant microenvironment [24]. Therefore, in the early stage of disease progression, adequate assessments of the infiltration of immune/stromal cells and accurate prediction tools are essential for treatment decision-making and prognostic evaluation in patients with ALL. This study investigated immune/stromal cell infiltration and assessed the involvement of ISCIRGs in prognosis in patients with ALL. First, the scores regarding immune/stromal cell infiltration were proved to be of prognostic value. Second, a ISCIRGs-based signature was constructed. Third, seven key ISCIRGs were confirmed that had prognostic value and were associated with the CI of the web-based progression-free survival rate. AGING infiltration of specific types of immune cells based on the analysis results combined single-cell RNA-seq and bulk RNA-seq datasets. Finally, a prognostic nomogram composed of the ISCIRGs-based signature, age, gender, race and immunophenotype was successfully developed and validated for predicting the prognosis of ALL patients.
Some previous studies have showed that there are differences in the prognosis of ALL patients with various immunophenotypes. For example, for T-ALL, 50% of adult patients have poor prognosis after chemotherapy [25], while, for B-ALL, some patients still have more than 60% relapse rate even after treatment with CAR T therapy [26]. Since three immunophenotypes (i.e., T-ALL, B-ALL and MPAL) were included in this study, we first calculated the immune/stromal scores for patient groups with three immunophenotypes, respectively, and found that the scores were significantly different between three patient groups. This suggested that the immune/stromal cells of patients with various immunophenotypes were in different states, and this difference might be a nonnegligible factor leading to poor prognosis of various immunophenotypes. Subsequently, we found that patients with different race also had different stromal scores. It is known that racial and ethnic disparities persist in the outcome and prognosis of ALL [27]. Therefore, the proportion and states of stromal cells might be two important factors leading to the different prognosis of various races. Then, the KM curve showed that the immune and stromal scores were significantly associated with OS, respectively. Thus, both the immune and stromal cell infiltration were meaningful in predicting OS, and it is necessary to identify prognostic genes based on immune scores, stromal scores or both.
After obtaining the common DEGs from the low vs. high immune score/stromal score groups, we performed a functional enrichment analysis for these genes, and found that these genes mainly enriched in immune system. This result not only verified the reliability of the ESTIMATE algorithm, but also indicated that ISCIRGs mainly regulated the immune system. Previous studies have found that some individual genes regulated immune system are crucial factors for ALL with poor prognosis, such as IKZF1 [28]. While, in this study, the prognostic signature was constructed based on seven ISCIRGs (i.e., LILRA1, NRGN, VPREB3, MT-ND6, EMP2, IGHM and FFAR1). The OS association of this prognostic signature (p = 3.957e−13) was significantly higher than ESTIMATE immune scores-based signature (p = 0.015) or stromal scores-based signature (p = 0.003), suggesting the great potential of prognostic prediction.
Among the seven ISCIRGs, VPREB3 and IGHM have been confirmed that are associated with the prognosis of ALL [29], and MT-ND6 are found mutation in ALL patients [30]. In terms of each gene, the higher expression of all seven ISCIRGs was positively associated with shorter OS time. The major function of the seven genes included: LILRA1 is to regulate immune responses by interacting with MHC class I ligands [31]; NRGN is a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium [32]; VPREB3 and IGHM is thought to be involved in B cell maturation [33], mutation or absence can cause either an arrest or a severe impairment at the pro-B cell stage [34,35]; MT-ND6 involved in mitochondrial electron transport, NADH to ubiquinone and mitochondrial respiratory chain complex I assembly [36]; EMP2 regulates cell membrane composition, and up-regulation of this gene has been linked to cancer progression in multiple different tissues [37,38]; and FFAR1 involved in the metabolic regulation of insulin secretion [39]. In terms of the synergies of these genes, the results of PPI network analysis and functional enrichment analysis for the seven DEGs showed that these genes were mainly enriched in mitochondrial electron transport and oxidative phosphorylation. Jaramillo et al. [40] found that Rho (0) malignant T cells with impaired mitochondrial electron transport chain function had lower sensitivity to the combination treatment than wild-type cells, and Chen et al. [41] confirmed that oxidative phosphorylation can enhance resistance to chemotherapy in B-ALL. And all patients included in the survival analysis were treated with conventional combined chemotherapy. Therefore, the high feasibility of the prognostic signature may also be related to the reduction of combined chemotherapy sensitivity caused by the differential expression of the seven genes. These genes may also be the key hints for overcoming resistance to immunotherapy.
Subsequently, we investigated the expression of the seven genes in various cell types by analyzing the single-cell RNA-datasets, and further performed the difference in the proportion of immune cells in highand low-risk groups by integrating single-cell and bulk RNA-seq datasets. In the single-cell RNA-seq datasets, we found that due to the accumulation of a large number of immature lymphocytes, the immune cells in the BM of ALL patients were significantly reduced compared to the healthy BM, and the types of other immune cells (except for immature lymphocytes) showed differences in each patients. And this difference was related to the ISCIRGs-based signature and the prognosis of patients of bulk RNA-seq datasets.
Previous studies have suggested that some immune cells could predict the prognosis of all patients, for example, Hohtari et al. [10] and Zhou et al. [11] found that the AGING proportion of PD1+TIM3+ double-positive CD4+ T cells could differentiate the poor survival group of B-ALL, and Duault et al. [12] revealed that increased frequencies of activated cytokine-producing NK cells could independently predict poor clinical outcome in ALL patient. Our results showed that the patients in high-risk group had significant higher proportions of DCs and NK cells compared to those in low-risk group, and higher proportions of DCs and NK cells had significant shorter OS time compared to lower proportions. The results regarding association between NK cells and poor prognosis were consistent with previous study [12], while the results of DCs were contrary to previous cognition. DCs were thought to be able to present tumor antigens, stimulating immune system to play an anti-tumor role in ALL [42]. However, our results showed that higher proportions of DCs was no benefit for good prognosis for the first time. Given that the seven ISCIRGs-based signature are directly related to the proportion and even the status of immune cells, future studies are required to elucidate the mechanisms associated with prognosis of these genes in specific immune cells (e.g., DCs).
Nomogram is a multivariable regression model that widely used in studies to predict clinical outcomes with intuitive visual presentations [43]. In this study, we constructed a prognostic nomogram based on the ISCIRGs-based signature and clinicopathological factors of training group to predict ALL patient outcome. The results of validation group and long-term follow-up examinations indicated the high reliability of this nomogram. To our knowledge, this nomogram is the first to predict ALL patient survival based on ISCIRGs. And compared with the prediction models published in previous studies [21][22][23], our prediction model has higher accuracy and wider application range. In addition, considering the imbalance of the prognostic model, we included two more independent studies related to the prognosis of ALL patients (external validation cohorts) to further validate the prognostic value of the ISCIRGs-based signature. The results showed that the higher risk groups of the both external validation cohorts had higher relapse rates compared to the lower risk groups. A large number of clinical data have confirmed that relapse is the major reason for poor prognosis of ALL patients, once relapse, the completed cure rate will be less than 40% [44,45]. And the major factor leading to relapse is the resistance of patients to combined chemotherapy. Therefore, the results of two external validation cohorts not only further verified the great prognostic prediction ability of the ISCIRGsbased signature, but also further suggested the potential correlation between seven prognostic genes and resistance of combined chemotherapy. Furthermore, we established the corresponding web-based calculator.
The scoring system and web-based calculator may help clinicians make survival predictions based on clinicopathological factors and cellular differentiation information of the BM microenvironment and further suggest better treatment options.
This study still has some limitations. First, the results of this study were obtained only through an integrated bioinformatics analysis. Although we verified the results through one validation cohort and two external validation cohorts, and included some immunohistochemical pictures of crucial ISCIRGs in both health and ALL BM, this is still inadequate. Before the results are used in clinical prediction, clinical experimental validation is needed to confirm them, and the underlying mechanisms associated with the prognostic significance of the identified ISCIRGs in ALL need to be further investigated. Second, although we have tried our best to search the RNA-seq datasets of ALL patients with prognostic information through various databases, literatures and search engines, the datasets containing survival information are still limited. Third, due to the lack of clinicopathologic information in the included datasets, other potential risk factors influenced prognosis (e.g., chemotherapy, molecular drugs, SNP and CNV) were not considered.
In summary, we highlight the immune/stromal cell infiltration differences in the BM environment and their essential roles in predicting the clinical outcome of ALL patients, constructing a prognostic model based on the ISCIRG signature and clinicopathologic factors. This model could serve as a reliable tool for predicting outcomes and determining treatment strategies in patients with ALL.

Raw data retrieval and calculation of immune/stromal cell infiltration
The gene expression file datasets of training group as well as corresponding metadata and clinical profiles were obtained from TCGA database after restricting "Disease Type" to "lymphoid leukemia", "Experimental Strategy" to "RNA-Seq", "Data Category" to "transcriptome profiling" and "Data Type" to "Gene Expression Quantification". The format of the downloaded data is "HTSeq-Counts". All patients included in this study suffered from ALL. We excluded the following datasets: 1) not BM sample; 2) without follow-up time; 3) unknown death or not; and 4) without detail clinical characteristics. The gene expression file dataset and clinical profile of validation group were selected from TARGET project database. We merged the data of ALL patients in Phases I, II and III, and then excluded the following datasets: 1) duplicate patients with training cohort; 2) not BM sample; 3) without follow-up time; 4) unknown death or not; and 5) without detail clinical characteristic. The calculation of immune score, stromal score and tumor purity by employing the ESTIMATE algorithm for all downloaded dataset was performed by a custom script of Python 3.9.5 (Python Software Foundation, Delaware, USA) and "estimate" R package of R software 4.0.5 (R Foundation for Statistical Computing, Vienna, Austria) [46]. The stromal and immune scores were calculated by perform single-sample gene setenrichment analysis (ssGSEA) [47,48], and the tumor purity was calculated by using the following formula: cos(0.6049872018 0.0001467884 ESTIMATE score) Tumor purity = +  ESTIMATE score represents the sum of stromal score and immune score [15].

Common DEGs identification
Based on the immune/stromal score, the included patients were classified into high-and low-score groups. The DEGs of training group were analyzed and identified through the "limma" R package, and the cutoff values were set as | fold change (FC)| > 2 and p < 0.05 [49]. The heatmap and volcano plots were generated by the "ggplot2" and "pheatmap" R packages, respectively [50]. The identification of the common DEGs from the immune score and stromal score groups was processed by Python script [46].

Construction and validation of the prognostic genes signature
First, KM survival analysis was used to illuminate correlations between immune/stromal scores and OS as well as each common DEG and OS, and the log-rank test was utilized to assess the statistical significance of the correlation [51]. Second, univariate Cox regression analysis was performed to evaluate the association between the expression levels of common DEGs and OS. Third, the identified prognosis-related genes (p < 0.05) were screened by LASSO and multivariate Cox regression analyses [52]. An ISCIRG-based signature was constructed by using the following formula: ExpGENEi represents the expression level of the identified ISCIRGs, and βi represents the regression coefficient calculated by multivariate Cox analysis [53].
Fourth, based on the risk score model, patients were stratified into either the low score (low-risk) group or the high score (high-risk) group. KM survival analysis was used to estimate the OS of these two groups. The predictive accuracy of this model was assessed by Harrell's C-index and time-dependent ROC curve analysis within 1-, 3-and 5-years [54]. Univariate and multivariate Cox regression analyses were performed to assess whether the predictive performance of this model could be independent of tumor purity or other clinicopathologic factors.

Construction of prognostic nomogram and validation of model
Based on the results of univariate and multivariate Cox regression analyses, the identified independent prognostic factors were used to develop a prognostic nomogram for predicting the 1-, 3-, and 5-year survival outcomes by the "rms" R package. Subsequently, the Cindex, U test, ROC and calibration plot were used to assess the discrimination performance of this prognostic nomogram in training and validation cohorts [54]. Furthermore, two external validation datasets (i.e., GSE13576 and GSE28703) from Gene Expression Omnibus (GEO) database were used to verify the significance of the prognostic model. The risk scores were first calculated. Then, we verified the clinical significance of our model by comparing the prognostic information of patients. Moreover, to facilitate clinical application, we constructed a visualization tool with a web-based calculator [55].

Single-cell RNA-seq data processing
Single-cell RNA-seq datasets of ALL (GSE134759) were downloaded from GEO database. First, we merged patient and healthy sample datasets by the "Seurat" R package [56]. Then, we removed the lowquality cells based on the following standards: 1) genes detected in fewer than three cells; 2) cells with fewer than fifty total detected genes; 3) cells with more than 10% mitochondrion-expressed genes; and 4) cells with fewer than 1500 or more than 4000 expressed genes. The 2000 most variable genes were generated and used to perform principal component analysis (PCA). The tdistributed stochastic neighbor embedding (t-SNE) algorithm was used to compute 20 initial PCs and perform cluster classification analysis. Subsequently, we used the Wilcoxon rank sum test with Bonferroni multiple-comparison correction to determine cluster marker genes, and p < 0.05 and | log2(FC) | > 0.25 were set as cutoffs. Finally, each cell cluster was annotated by the "singleR" R package and then verified with broad cell type markers (Supplementary Table 7) [57,58]. AGING

Immune cell infiltration analysis in high and low risk populations
We calculated the fractions of immune cell subsets in ALL samples (bulk RNA-seq) with the CIBERSORTx algorithm. First, based on the annotated single-cell gene expression matrix in the above results (single-cell RNAseq), a custom signature matrix file was constructed by using CIBERSORTx Create Signature Matrix module. Then, the proportion of each types of cell in bulk RNAseq datasets were calculated by using CIBERSORTx Impute Cell Fractions module [59]. Correlation relation matrix was performed by the "corrgram" and "corrplot" R packages. KM survival analysis was used to assess correlations between the proportion of each types of cell and OS [60].

Functional analysis
GO and KEGG pathway enrichment analysis as well as the interrelation analysis were calculated and plotted by using the ClueGO plug-in in Cytoscape software 3.6.1 (National Resource for Network Biology, California, USA), and p < 0.05 was set as the cutoff [61]. The cBio Cancer Genomics Portal (cBioPortal) database was used to evaluate the mutations and copy number variations in tumor patients [60]. STRING database was used to assess the gene-encoded proteins and PPI information [62]. The downloaded PPI information file was subsequently established as a PPI network using Cytoscape software. The modular analysis for the PPI network was performed by the MCODE plug-in in Cytoscape based on score and node number, and the most significant module was identified [63].
The complete method design of this study is shown in Supplementary Figure 13.

Data availability statement
Publicly available datasets were analyzed in this study. These data can be found as follow links:

ACKNOWLEDGMENTS
We would like to thank Liang-Jun Jia for his technical guidance on the analysis.  Tables   Please browse Full Text version to see the data of Supplementary Tables 1, 2, 5