A risk score for the prognosis prediction of the muscle-invasive bladder cancer patients who received gemcitabine plus cisplatin chemotherapy

To develop an individualized gene-based risk score to predict the prognosis of muscle-invasive bladder cancer (MIBC) patients who received GC regimens. We downloaded transcriptome profiling data and clinical information from the Cancer Genome Atlas (TCGA) database. We identified 1854 survival-associated genes and then constructed the risk score based on six special genes selected from the survival-associated genes. We divided patients into high-risk and low-risk groups according to the median risk score. High-risk patients have significantly poorer overall survival than low-risk patients (log-rank test chi-square = 38.08, p = 7e-10, C-index = 0.785, se = 0.032). The risk score was evaluated by Kaplan–Meier survival curve, time-dependent ROC curves, and C-index. Multivariate Cox regression and nomogram suggested that the risk score was an independent prognostic indicator. Gene set enrichment analysis indicated that the survival-associated genes were significantly enriched in immune-related terms. Among six special genes, CHPF2, TRAV26-2, and BTF3P12 were found to be immune-related genes. In conclusion, our risk score provided an indicator to predict the prognosis of MIBC patients who received GC regimens and potential immunotherapeutic targets for MIBC.


INTRODUCTION
Bladder cancer (BC) is the ninth most common cancer worldwide. According to GLOBOCAN data, approximately 550,000 new cases were diagnosed in 2018, accounting for roughly 3% of all new cancer diagnoses worldwide [1]. In the United States (US), BC is the sixth most common cancer and the eighth most common cause of cancer death, with an estimated 80,500 newly diagnosed cases and an estimated 17,600 death in 2019 [2]. Although the incidence of BC has fallen over the past decades, the mortality of BC has remained steadfast since 1987 in the US [3].
Approximately 25% of all newly diagnosed BC patients have muscle-invasive bladder cancer (MIBC) every year [4]. MIBC treatment generally involves a combination of chemotherapy, radiation, and surgery. Radical cystectomy (RC) and pelvic lymph node dissection are the standard surgery approach. Still, many clinical trials showed patients would benefit more from additional chemotherapy by increasing five years overall survival (OS) and ten years OS by roughly 10%, respectively, than RC alone [5][6][7][8][9][10], and therefore chemotherapy plays an essential role in MIBC prognosis improvement. Within two decades, cisplatinbased neoadjuvant chemotherapy has become the standard approach for MIBC treatment. Compared with loco-regional therapy alone, cisplatin-based chemotherapy had a significant 5-year OS benefit and reduced risk of death [11,12]. Gemcitabine and cisplatin (GC) regimens and methotrexate, vinblastine, adriamycin, and cisplatin (MVAC) regimens are the most common cisplatin-based regimens. Researchers and clinicians generally believed neither of the combinations was superior to the other in terms of progression and prognosis [13,14], but new evidence suggested that patients who received MVAC regimens may have better survival outcomes than those who received GC regimens [12]. Many clinicians prefer the GC regimens due to their low toxicity and ease of administration [15] in clinical practice. However, many studies reported that 50% of MIBC patients are ineligible for cisplatin-based chemotherapy due to either age-related or disease-related risk comorbidities [4]. There is not a validated approach to predict the prognosis of MIBC patients who received cisplatinbased regimens yet, not to mention a similar study for the GC regimens.
This study aimed to develop an individualized genebased risk score to predict the prognosis of MIBC patients who received GC regimens, verify its role in prognosis prediction and cisplatin-based regimens choice, and investigate its potential mechanisms.

Clinical samples and data acquisition
We downloaded a transcriptome profiling dataset of 409 patients, a clinical characteristics dataset of 405 patients, and a drug treatment dataset of 427 patients from the BLCA project of the Cancer Genome Atlas (TCGA) database. As these three datasets contained different data, we merged three datasets into a study dataset that contained complete information about MIBC patients who received GC regimens. Before merging, we log2transformed gene expression data in the transcriptome profiling dataset and converted nonstandard drug names to standardized drug names in the drug treatment dataset by querying the NCI Drug Dictionary (https://www.cancer.gov/publications/dictionaries/cance r-drug). As we had removed duplicate and unmatched patients during merging, the study dataset contained clinical characteristics information and gene expression data of 65 unduplicated MIBC patients.

Analysis of survival-associated genes of the MIBC patients who received GC regimens
After converting gene expression data from the quantitative expression level to the relative expression level-high expression and low expression-by the median gene expression, we performed univariate cox regression analysis on the study dataset. We selected genes significantly associated with OS as the survivalassociated gene for further analysis (adjusted P-value < 0.05). Overall survival time was defined by days from the first diagnosis to death or the last follow-up. Status 'death' meant the event happened and status' last follow-up' meant censored. We used Gene Ontology (GO) enrichment analysis to investigate the molecular functions of the survival-associated genes and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to explore the potential molecular mechanisms of the survival-associated genes (adjusted P-value < 0.05). The survival-associated genes were annotated by the R package biomaRt [16]. GO enrichment analysis and KEGG enrichment analysis were performed using the R package clusterProfiler [17].

Development of a risk score for the prognosis prediction of MIBC patients who received GC regimens
We selected a few special genes from the survivalassociated genes by Least Absolute Shrinkage and Selection Operator (LASSO) cox regression analysis. Then we applied a multivariate cox regression analysis on these special survival-associated genes with Akaike's information criterion (AIC). We calculated the individualized risk score of MIBC patients who received GC regimens by the formula generated from the multivariate cox regression analysis: risk score = n ii i x =  1  where xi denoted the relative gene expression level and βi denoted the coefficient of the multivariate cox regression analysis. LASSO cox regression was performed using the R package glmnet [18].

Evaluation of the predictive value of the risk score
We evaluated the predictive value of the risk score on the study dataset, where we considered its performance in the subsets of the clinical characteristics (age at diagnosis, gender, and ACJJ pathological stage). We divided the patients into high-risk and low-risk groups by the median risk score. Kaplan-Meier survival curve and log-rank test were employed to compare the OS of the high-risk and low-risk groups. Time-dependent receiver operating characteristic curve (ROC) was plotted to evaluate the predictive power of the risk score. The overall concordance statistic (C-index) was introduced to assess the predictive accuracy of the risk score. The relationship between the risk score and the clinical characteristics was investigated by comparing the average risk score level of different clinical characteristics subcategories and building a nomogram combing the risk score and the clinical characteristics.

RESULTS
The workflow of this study is summarized in Figure 1.

Clinical characteristics of MIBC patients who received GC regimens
A total of 65 MIBC patients were involved in this study. The clinical characteristics of the 65 MIBC patients are summarized in Table 1.

Identification of survival-associated genes of the MIBC patients who received GC regimens
A total of 1854 genes significantly associated with OS were identified as survival-associated genes, among which 1248 genes with a hazard ratio (HR) >1, indicating patients with a high expression level of these genes might have a poor survival time, and 606 genes with an HR <1, indicating patients with the high expression level of these genes might have a better survival time.

Molecular functional analysis of the survivalassociated genes
The GO enrichment analysis showed that survivalassociated genes were significantly enriched in 'positive regulation of natural killer cell mediated immunity' and 'positive regulation of cell killing' (adjust P-value < 0.05) The study dataset contained complete information about 65 MIBC patients who received GC regimens, which was obtained by merging the clinical characteristics dataset, the Drug treatment dataset, and the transcriptome profiling dataset. Survival-associated genes were identified by univariate cox regression analysis. The KEGG enrichment analysis and the GO enrichment analysis were utilized to explore the molecular functions of the survival-associated genes. LASSO cox regression analysis was used to select special survival-associated genes. The individual risk score was calculated by multivariate cox regression analysis. Kaplan-Meier survival curve and log-rank test were employed to compare the OS of the high-risk and low-risk patients. To assess the risk score's ability to predict prognosis, a time-dependent ROC plot was created. High-risk and low-risk MIBC patients' average risk scores were compared. Risk score, gender, age at diagnosis, and AJCC pathological stage were all combined to create the nomogram. GSEA was utilized to explore key signal pathways for DEGs between high-risk and low-risk patients. terms. The KEGG enrichment analysis showed that survival-associated genes were significantly enriched in 'Natural killer cell-mediated cytotoxicity' and 'Cell adhesion molecules' pathways (adjust P-value < 0.05). We further performed GO enrichment analysis and KEGG enrichment analysis on the survival-associated genes of the patients in different ACJJ stages, and then several enrichment analyses ended with significant results. For ACJJ stage III, the survival-associated genes were significantly enriched in "RNA binding involved in posttranscriptional gene silencing", "mRNA binding involved in posttranscriptional gene silencing", and "G proteincoupled receptor binding" in GO enrichment analysis (adjust P-value < 0.05). In KEGG enrichment analysis, the survival-associated genes were significantly involved in regulating signaling pathways related to "MicroRNA in cancer" and "Staphylococcus aureus infection" (adjust P-value < 0.05). For stage IV, the survival-associated genes were significantly enriched in terms such as "response to virus", "regulation of defense response to virus", and "leukocyte migration" in the GO enrichment analysis (adjust P-value < 0.05) (Figures 2, 3).

A risk score for the prognosis prediction of MIBC patients who received GC regimens
Twelve survival-associated genes were picked by LASSO cox regression analysis, where a leave-one-out cross-validation approach was utilized to tune the optimal parameters. These twelve survival-associated genes were subjected to multivariate cox regression analysis, and six genes were left in the model by AIC (Table 2). Based on the relative expression level of the selected six genes, we constructed a risk score which was calculated by the formula: risk score = (1.3056 × ENSG00000033100) + (−1.2720 × ENSG000 00211812) + (1.8782 × ENSG00000213003) + (−1.9134 × ENSG00000231150) + (−1.0926 × ENSG 00000236047) + (−2.1814 × ENSG00000239224), where the coefficients denoted the coefficient of the multivariate cox regression analysis. We used this formula to calculate the individualized risk score of each MIBC patient.
Evaluate the predictive power and accuracy of the risk score in MIBC patients who received GC regimens 65 MIBC patients were divided into high-risk and lowrisk groups by the median risk score. The Kaplan-Meier survival curve showed that patients in the high-risk group have significantly poorer OS than those in the low-risk group. (log-rank test chi-square = 38.08, p = 7e-10, C-index = 0.785, se = 0.032). ( Figure 4A). Time-dependent ROC curves were plotted to evaluate the predictive power of the risk score in the OS prediction of 1-5 years. The time-dependent ROC curves showed that the risk score performed best in the OS prediction of 4 years (AUC = 0.987581) ( Figure 5A). Further, we plotted the Kaplan-Meier survival curve in the subsets of clinical characteristics (age at diagnosis, gender, and ACJJ pathological stage) and got consistent results with the entire dataset ( Figure  4B-4H, Figure 5B-5H) ( Table 3). Besides, the risk score got the high C-indexes in the entire dataset and the subsets of clinical characteristics (Table 3).  The relationship between the risk score and the clinical characteristics The relationship between the risk score and the clinical characteristics (gender, age at diagnosis, and AJCC pathological stage) was investigated. The risk score was independent of gender, age at diagnosis, and AJCC pathological stage (Table 4) ( Figure 6). A multivariate cox analysis and a corresponding nomogram based on risk score, gender, age at diagnosis, and AJCC pathological stage showed that risk score was an essential indicator for OS prediction, while gender, age at diagnosis, and AJCC pathological stage were not significantly associated with OS ( Figure 7A, 7B).

Analysis of DEGs between high-risk and low-risk groups in MIBC patients who received GC regimens
1849 DEGs were recognized by comparing gene expression between high-risk and low-risk groups in MIBC patients who received GC regimens, among which 1463 DEGs were up-regulated, and 386 DEGs were down-regulated. These 1849 DEGs were then submitted to GSEA, and the result showed the DEGs were primarily enriched in titles related to immune processes, such as "GOBP_ADAPTIVE_IMMUNE_ RESPONSE", "GOCC_IMMUNOGLOBULIN_ COMPLEX", "GOCC_T_CELL_RECEPTOR_ COMPLEX" (Figure 8).   classical treatment to improve the OS. Cisplatin-based neoadjuvant chemotherapy (NAC), for its low toxicity and high OS, is recommended as the standard MIBC chemotherapy regimen by the American Urological Association [4] and the European Association [13]; however, some topics are still worth discussing. First, in actual practice, around 19% of all patients undergo NAC before RC [20], while the remaining 81% do not   for various reasons. As a result, clinicians adopt adjuvant chemotherapy (AC) after RC as an alternative. Wosnitzer MS et al. [21], Matsubara N et al. [22], and Bae WK et al. [23] all concluded no statistically significant difference between NAC and AC, while Bene G Del et al. [24] found that NAC was superior to AC in terms of disease-free survival (DFS). According to Berg et al., the survival effect of AC was only seen in individuals with pure urothelial carcinoma [25]. However, all these researches are retrospective.

MIBC patients usually have poor prognoses. In clinical practice, chemotherapy in conjunction with RC is the
Macleod LC et al. [26] found significant treatment selection bias in RC timing, limiting the capacity to distinguish between NAC and AC efficacy using observational data. Patients with a higher propensity to receive NAC were expected to have a longer survival time. NAC was associated with a modest but significant survival benefit in healthier, younger patients once selection bias was considered. These findings are still expected to be validated by a prospective randomized experiment. Second, NAC/AC is not eligible for all MIBC patients. In a retrospective analysis of MIBC patients who received either NAC with RC or RC alone, Bhindi Bet et al. [27] discovered that patients who received NAC had inferior disease control and survival when their malignancy remained after chemotherapy. Third, although GC and MVAC regimens are the most used chemotherapy regimens in NAC/AC, some clinicians prefer the GC regimens to the MVAC regimens for their low toxicity and ease of administration, so how identifying individuals who would benefit from the GC regimens is a problem. In summary, it is vital to assess MIBC patients' state to determine if they would benefit from GC regimens chemotherapy. However, relevant research was rare.
In recent years, high-throughput sequencing (HTS) technologies have generated massive amounts of gene expression data about MIBC. Previous studies have investigated the potential predictive value of the genes associated with MIBC in terms of progression and prognosis [28][29][30][31][32][33][34], but none has looked into the potential predictive value of the genes related to MIBC patients who received GC regimens in terms of progression and prognosis.
In this study, we developed a risk score to assess MIBC patients and to offer chemotherapy advice. The risk score was calculated by the formula: risk score = (1.3056 × ENSG00000033100) + (−1.2720 × ENSG00000211812) + (1.8782 × ENSG00000213003) + (−1.9134 × ENSG00000231150) + (−1.0926 × ENSG00000236047) + (−2.1814 × ENSG00000239224). After dividing the MIBC patients into high-risk and low-risk groups by the median risk score, we found high-risk patients had a poorer OS than low-risk patients in the entire dataset and subsets of clinical characteristics (age at diagnosis, gender, and ACJJ pathological stage). Time-dependent ROC and C-index further evaluated the risk score. The results showed that the risk score could predict the survival outcomes of MIBC patients who received GC regimens accurately and steadily in the entire dataset and subsets of clinical characteristics (age at diagnosis, gender, and ACJJ pathological stage). By analyzing the relationship between the risk score and the clinical characteristics, we found that the risk score is independent of the general clinical characteristics and a good indicator of chemotherapy management. Our risk score is useful for assessing MIBC patients' state when clinicians advise chemotherapy for GC regimens.
In this study, we recognized 1854 survival-associated genes. We found these survival-associated genes were more enriched in terms about immunology, such as "positive regulation of natural killer cell mediated immunity" and "positive regulation of cell killing" in the GO enrichment analysis, and pathway terms about immunology, "Natural killer cell-mediated cytotoxicity", in the KEGG enrichment analysis. While in the GSEA on DEGs between high-risk and low-risk levels, the 1849 DEGs were also more enriched in terms about immunology, such as "GOBP_ADAPTIVE_IMMUNE_RESPONSE", "GOCC_IMMUNOGLOBULIN_COMPLEX", "GOCC_T_CELL_RECEPTOR_COMPLEX". Many studies have reported that immune-related genes play an important role in bladder cancer prognosis [31,[35][36][37][38], which is consistent with our findings. Besides, among the genes in the formula of the risk score, CHPF2, TRAV26-2, and BTF3P12 are found in the C7: immunologic signature gene sets downloaded from the MSigDB website, that implies the risk score reflects the immunological mechanism in the GC regimens chemotherapy of MIBC and its predictive ability may be due to introducing immune-related genes. As research on the molecular mechanisms of CHPF2, TRAV26-2, and BTF3P12 is rare, further studies are looking forward. On the other hand, when we performed GO enrichment analysis and KEGG enrichment analysis on the survival-associated genes of the patients in different ACJJ stages, the results differed from that of the entire dataset. Because of this study's small sample size, we remain cautious about these results and look forward to further research.
We acknowledge some limitations of this study. First, we found that only the TCGA database offered the complete chemotherapeutic information we needed in this study, and the study dataset we used contained 65 patients. As a result, we developed the risk score on a small-size dataset, and we could not validate it against a valid external dataset. Next step, we will recruit additional patients to participate in our study to update or validate the risk score. Second, we found that the prognosis of MIBC patients who received GC regimens is associated with immune-related genes, where further biological experiments are warranted to validate their functions in bladder cancer. Third, our study is retrospective. Prospective randomized clinical trials are required.
In summary, we developed a risk score for the prognosis prediction of MIBC patients who received GC regimens based on the TCGA-BLCA gene profile data. The risk score was confirmed to be an independent prognostic indicator for MIBC patients who received GC regimens and potential immunotherapeutic targets for MIBC.

ACKNOWLEDGMENTS
The results published here are in whole based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.