Introduction
It is estimated that about 90% of human pre-mRNA undergo alternative splicing events [1, 2]. However, the abnormal regulation of RNA alternative splicing is a crucial process in carcinogenesis [3]. RNA alternative splicing regulators play essential roles in this process and can act as oncogenes, leading to cancer progression and metastasis by producing some RNA subtypes in key oncogenic pathways [4–6].
Serine/arginine-rich splicing factors (SRSFs) are important splicing regulators [7], usually consisting of 12 members (SRSF1-12), which play pivotal roles in mRNA’s turnover, output, and various post-transcriptional regulation of other splices [8]. Dysregulation of SRSFs expression is closely related to carcinogenesis. It can lead to the change of alternative splicing patterns of essential genes [3, 9, 10]; On the other hand, it can also significantly disrupt genomic stability, resulting in abnormal biological function [11, 12]. However, a global and comprehensive pattern to elaborate the molecular characteristics, mechanisms, and clinical links of SRSFs in multiple human cancer is still lacking. Here, we attempted to systematically uncover the molecular features and clinical implications of SRSFs from various levels, including genome (somatic mutation, copy number alteration), mRNA expression, transcriptional regulation (DNA methylation, N6-methyladenosine (m6A) modification, miRNA), immune links, survival prediction and drug sensitivity, in more than 10000 patients across 33 cancer types. Our comprehensive analysis highlights the pivotal roles of SRSFs in cancer occurrence and development. This study provides rich resources for understanding the biological characteristics of SRSFs, offering a new and unique perspective for developing cancer therapeutic strategies based on RNA alternative splicing. The flowchart and important findings of this study were showed in Figure 1.
Figure 1. Workflow chart of this study.
Materials and Methods
Genomic, transcriptomic data acquisition and analysis
Somatic mutation data and TCGA threshold SCNA scores from 33 human cancer types were obtained from the Genomic Data Commons (https://gdc.cancer.gov/) and processed as previously described [13–16]. Our study defined copy number variation (CNV) amplification and deletion in the same way, as described by Knijnenburg et al. [17]. The frequency of genomic alterations (including mutation and SCNA) of specific genes in pan-cancer were shown as heatmaps. We also obtained the relevant data for genomic instability scores, including the mutation burden, the aneuploidy score, the SCNA burden, loss of heterozygosity (LOH) score and homologous recombination deficiency (HRD) score from the study of Knijnenburg et al. [17], and defined these scores as previously described [15, 17]. Additionally, we obtained the batch effects-corrected TCGA mRNA data (FPKM-based gene expression) and clinical information (including the survival status and survival time) for patients across 33 cancer types from TCGA Genomic Data Commons.
Differential expression analysis and unsupervised consensus clustering
19 cancer types, with at least 3 matched tumours and normal samples, were selected out of 33 cancer types for differential expression analysis. Differentially expressed genes (DEGs) were identified by Wilcox’s rank sum test, and the p-values were adjusted by BH method. We defined the DEGs as those whose expression differences were associated with adjusted p-values < 0.05. Additionally, unsupervised consensus clustering was performed to identify distinct patient clusters based on SRSFs mRNA expression in all tumor samples across 33 cancer types. The “ConsensuClusterPlus” R package [18] was used to carry out the steps above. To guarantee classification stability, 1000-time repetitions were conducted.
DNA methylation, m6A modification and miRNA analysis
DNA methylation data files across 33 cancer types were retrieved from the Genomic Data Commons. We processed this data as previously described [15]. We calculated median beta value of these 12 SRSFs in each sample for better assessment of their overall methylation levels. Subsequently, Pearson correlation analysis between DNA methylation beta values and mRNA expression values for each SRSF was conducted for deeper investigation of the regulation of SRSFs expression by DNA methylation (| Cor |> 0 and P-value < 0.05 was set as the threshold). Also, 23 m6A regulators (including 13 readers, 8 writers and 2 erasers) were collected from previously published papers [19–21]. As the most common type of RNA modification, m6A plays an essential role in the occurrence and development of tumors [22]. We investigated the Pearson correlation coefficients between m6A regulators and SRSFs in 33 cancer types to further investigate the regulation of SRSFs expression by m6A methylation. Additionally, normalized miRNA expression data were downloaded from the Genomic Data Commons. We calculated the Pearson correlation coefficients between miRNAs and SRSFs in 33 cancer types to further investigate the regulation of SRSFs expression by miRNA. We also screened the preliminary results, as in the method of Luo et al. [15]. By setting the more stringent criteria, we further filtered the miRNAs acting with these SRSFs. The specific criteria were as follows: 1) |Pearson correlation coefficient| ≥ 0.25 and p < 0.05; 2) In at least one-third of all cancer types (at least ten cancer types), this miRNA acts on the same SRSF. This miRNA-mRNA interaction network was displayed using Cytoscape v3.9.0. The interrelationship between miRNAs and SRSFs in specific cancer types was also shown in a heatmap.
Biological pathway activity across cancer types
FPKM-based gene expression was converted into Z-score using the “zFPKM” R package [23] for assessing the activity of 50 hallmark pathways. Gene Set Variation Analysis (GSVA) was performed [24]. In addition, Pearson correlation coefficients between mRNA expression of SRSFs and pathway activity were calculated to identify the SRSFs related to pathway activation or inhibition. And SRSF-pathway pairs were identified by the criterion we set (|Pearson correlation coefficient| ≥ 0.20 and p < 0.05). Genes do not function independently. To infer the overall SRSFs activity, “SRSFscore” was calculated based on 12 SRSFs mRNA expression within each cancer type as previously described [15, 25, 26]. Identification of biological pathways associated with the SRSFscore was also performed in the same way as in Luo et al. [15]. Here, we reported only pathways that showed consistent significant correlations (q-value < 0.05) in at least 10 cancer types.
Immune correlation analysis
The Xcell algorithm [27] was used to quantify the infiltrating abundance of immune cell and tumor microenvironment scores for all tumor samples. We obtained a list of immunomodulators that are crucial in immunotherapy from previous publications [14]. To investigate the interconnection between cancer immunity and SRSFs, we examined the Pearson correlation coefficients between mRNA expression of immunomodulators or abundance of immune cells and the SRSFscore across 33 cancer types. Additionally, in order to explore the association between SRSFscore and the effect of immunotherapy as well as prognosis in patients receiving anti-PD-1 or anti-CTLA4 treatment, we further obtained the transcriptome data before immunotherapy and clinical information from two cancer immunotherapy cohorts [28, 29]. In our study, responders to immunotherapy included the patients with complete and partial remission (CR/PR), while non-responders included the patients with stable disease (SD) and progressive disease (PD). Wilcox’s rank sum test was applied to compare the SRSFscore between the two groups (responders and non-responders).
Survival analysis
TCGA pan-cancer clinical data were downloaded from Genomic Data Commons. We used the “maxstat” R package to determine the best cut point for the SRSFscore, and divided the patients into high and low SRSFscore groups. Probability of overall survival was estimated by the Kaplan-Meier method, with differences between two groups tested using the log-rank test. Using the “survival” R package, we also performed Cox regression analysis to test the association between SRSFs expression and survival. Moreover, to validate the potential of SRSFscore in pan-cancer survival prediction, from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), we also downloaded gene expression data and corresponding clinical information for 9 gene chips (GSE14520 [30, 31], GSE15459 [32–34], GSE23554 [35], GSE30219 [36], GSE31210 [37, 38], GSE40967 [39], GSE50081 [40], GSE57303 [41], GSE72094 [42]) including 5 cancer types (OV, LUAD, LIHC, STAD and COAD) (Table 1). According to the corresponding annotation files, we converted the probes to gene symbols. For genes with multiple probe set signals, their values were averaged to generate a single expression value. In each independent dataset, we calculated the SRSFscore for each sample in the same way and investigated its relationship with the prognosis.
Table 1. Information on the nine GEO datasets included in this study.
Datasets | Cancer types | No. of patients | Platforms |
GSE14520 | LIHC | N = 21 | GPL571 |
GSE15459 | STAD | N = 192 | GPL570 |
GSE23554 | OV | N = 28 | GPL96 |
GSE30219 | LUAD | N = 85 | GPL570 |
GSE31210 | LUAD | N = 226 | GPL570 |
GSE40967 | COAD | N = 556 | GPL570 |
GSE50081 | LUAD | N = 127 | GPL570 |
GSE57303 | STAD | N = 70 | GPL570 |
GSE72094 | LUAD | N = 398 | GPL15048 |
CellMiner analysis
We downloaded the RNA data (RNA: RNA-seq) of 22379 genes identified in NCI-60 cell lines and the related data of 20503 compounds analyzed (compound activity: DTP NCI-60) from CellMiner (https://discover.nci.nih.gov/cellminer/home.do) [43]. We investigated the Pearson correlation coefficients between Z scores of clinical trials and FDA-approved drugs and mRNA expression values for each SRSF. We reported drug-SRSF pairs showing significant correlation (|Pearson correlation coefficient| ≥ 0.3 and P-value < 0.01).
Statistical analysis
All of the analyses were performed in the R 3.6.3 software. A P-value < 0.05 was considered statistically different.
Results
SRSFs experienced prevalent genomic alterations and expression perturbations in multiple cancer types
As previously described [44], genomic alteration was defined as somatic mutations and SCNA (amplification and deep deletion). We first determined the overall genomic alteration level of SRSFs in human cancer. The frequencies were very low, only between 1% and 4% (Figure 2A). Among the 12 SRSFs, SRSF2 and SRSF6 displayed the highest alterations (4%), mostly from CNV amplification. While SRSF5, SRSF9 and SRSF10 were in the opposite situations (Figure 2A). SCNA counted for the majority of genomic alterations. Thus, we speculated that at the genomic level, it was SCNA, especially CNV amplification, but not mutation, which was the major cause of SRSFs dysregulation in cancer.
Figure 2. The genomic alterations of SRSFs in human cancer. (A) Genomic alterations [non-silent mutation and SCNA] landscape in the SRSFs in 33 cancer types. (B) Distribution of mutation frequencies across cancer types. (C, D) Distribution of SCNA (C: CNV amplification; D: CNV deep deletion) frequencies across cancer types. The darkness of color is proportional to the frequency.
From an in-depth exploration of the genomic alteration pattern across 33 cancer types, we observed a low overall average mutation frequency of SRSFs from 0 - 7.53%. Uterine Corpus Endometrial Carcinoma (UCEC), with a higher global mutation burden [45], presented higher mutation frequencies in all 12 SRSFs, while Testicular Germ Cell Tumors (TGCT) and Pheochromocytoma and Paraganglioma (PCPG) lacked any mutation (Figure 2B). SRSF10 was slightly mutated (0.75%) only in UCEC, but not in other cancer types. Analogously, in LAMA, only SRSF2 experienced somatic mutation (2.14%). We noted that individual SRSF exhibited a cancer-type-dependent CNV amplification or deep deletion pattern (Figure 2C, 2D). For example, SRSF1 and SRSF2 presented relatively higher CNV amplification in the vast majority of cancer types, with the lack of deep deletion. SRSF12 showed higher deep deletion in DLBC (10.64%) and PRAD (13.96%) (Figure 2D). Interestingly, among these SRSFs, SRSF6 displayed higher amplification in digestive tract tumors, such as ESCA (5.52%), STAD (8.77%), COAD (12.15%) and READ (19.57%) (Figure 2C).
A thought-provoking question is whether these DNA alterations influence the expression of SRSFs. Thus, we investigated the expression perturbations of 12 SRSFs across 19 selected cancer types (Figure 3A). As expected, the SRSFs with CNV amplification displayed significantly higher expression in tumor tissues compared to normal ones (e.g., SRSF1, SRSF2, SRSF3, and SRSF6). In addition, we also noticed that despite CNV amplification or deep deletion in some SRSFs, there was no corresponding increase or decrease in mRNA expression level. For example, SRSF12 displayed CNV deep deletion in LIHC, LUAD, and PRAD; however, showing higher expression in cancer tissues. It is not unique, for example, the CNV alteration of SRSF10 in CHOL and its mRNA expression was also not synergistic (Figures 2D, 3A). We admit that gene expression is profoundly affected by CNV amplification and deletion [46]. Our findings suggested that specific to individual cancer types or genes, this relationship may not be fully established. We therefore believe that the CNV alterations are only one of the mechanisms, but not only, leading to expression perturbations of SRSFs. Taken together, the results demonstrate a picture of highly heterogeneous genetic and expression alteration of SRSFs across cancer types, suggesting that SRSFs dysregulation plays a significant role in different cancer contexts.
Figure 3. Expression pattern of SRSFs and their association with genome instability in human cancer. (A) Heatmap demonstrating differential expression of SRSFs between tumor and normal tissue. (B) Unsupervised consensus clustering based on SRSFs mRNA expression identifies two distinct patient clusters. (C) Sample distribution across 33 cancer types in the two patient clusters. The values represent the proportion of the two clusters in the different cancer types. (D–H) Differences in five scores (D, mutation burden; E, SCNA burden; F, aneuploidy scores; G, LOH; H, HRD) representing genome instability between the two clusters. The differences between the two clusters were tested by Wilcox’s rank sum test. The asterisks represented the statistical P-value (*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001).
Cluster analysis based on the pan-cancer SRSFs expression profiles identified two patient subgroups with significant genomic heterogeneity
To gain an insight into the integrated landscape of SRSFs expression, an unsupervised consensus clustering of all tumor samples from TCGA was performed, with the global expression pattern of SRSFs as the basis. Two distinct patient clusters were eventually identified, including 4079 cases in cluster A and 6248 cases in cluster B, with cluster B displaying a higher expression of all 12 SRSFs (Figure 3B). Further investigating the distribution of cancer types in the two clusters, we found that ACC, three hepatobiliary and pancreatic tumors (CHOL, LIHC and PAAD), two upper digestive tract tumors (ESCA, and STAD), HNSC, PCPG, and three kidney cancers (KICH, KIRC, KIRP) largely located in cluster A, whereas two lower digestive tract tumors (COAD, and READ), BLCA, BRCA, two lung cancers (LUAD and LUSC), three female reproductive system tumors (OV, UCEC, and UCS), TGCT, THYM and so on predominately resided in cluster B (Figure 3C). The results indicated there is a dysregulated expression profile of SRSFs in a cancer-type dependent manner.
Previous studies [47–50] have reported that SRSFs not only regulated RNA splicing, but also played important roles in transcriptional regulation. The co-transcriptional complex of SRSFs contributed to genomic stability, while abnormal expression of SRSFs might impair DNA stability [51]. So, we further assessed the genomic instability in the two clusters. Mutation and SCNA levels were used to characterize the genomic instability as previously described [17]. We used some scores, including mutation burden, SCNA burden, aneuploidy, LOH and HRD, to display the global mutations and SCNA levels [15]. We observed that cluster B, with higher expression of SRSFs, presented higher levels of all these scores reflecting genome instability (Figure 3D–3H). Genomic instability leads to abnormal expression of genes [52], which in turn contributes to genomic instability. Our finding indicated that this phenomenon might occur more in the cancer types of cluster B.
DNA methylation, m6A modification and miRNAs regulate SRSFs expression in a cancer-type dependent manner
DNA methylation has long been recognized as an epigenetic determinant of gene expression [53–55], and DNA hypermethylation in promoter regions can lead to the downregulation of gene expression and gene silencing [56–58]. However, this relationship is not universal, at least in some special cases [59, 60]. An interesting question is worth exploring: how does DNA methylation affect the mRNA expression of SRSFs in human pan-cancer? To evaluate the relationships between SRSFs mRNA expression and DNA methylation levels, we investigated the correlation between 12 SRSFs expression and overall methylation levels in each cancer type. Our findings suggested that DNA methylation levels of SRSFs were negatively correlated with gene expression in most cancer types. However, when specific to certain cancer types or genes, this negative correlation did not hold, such as SRSF1 in LUSC and KICH; SRSF4 in TGCT, and SRSF5 in LAML and MESO (p < 0.05, Figure 4A).
Figure 4. DNA methylation, m6A modification and miRNA regulations of SRSFs. (A) Heatmap demonstrating the correlation between DNA methylation level within promoters and SRSFs mRNA expression in each cancer type. (B) m6A-SRSFs protein-protein interaction network showing m6A regulators interacted with SRSFs frequently. (C) miRNA–SRSF interaction network and heatmap demonstrating miRNA-SRSF pairs displayed different correlation directions in different cancer contexts. Red represents a positive correlation, and green represents a negative correlation.
m6A, as the most common internal modification of RNA, can affect various RNA biological processes [61, 62]. Previous studies [63–67] have reported that m6A regulators were extensively involved in RNA splicing process and interacted with splicing factors involved in the regulation of the splicing process, thereby affecting the expression level of many RNAs. To obtain a global picture of the association between m6A regulators and SRSFs in different cancer contexts, we performed a Pearson correlation analysis across 33 cancer types (Supplementary Table 1). Surprisingly, we noticed that in any cancer type, m6A regulators belonging to three different functional classifications (readers, writers, and erasers) presented highly correlated expression patterns with SRSFs, and this correlation was dominated by positive regulation. Moreover, we found that these 23 m6A regulators interacted with SRSFs frequently in protein-protein interaction networks (Figure 4B), which once again confirmed the close association of m6A modification with the expression of SRSFs. Our study further extended the perception of m6A-SRSF pairs. For example, a previous study [68] has confirmed that FTO controls mRNA splicing by modulating the RNA-binding capacity of SRSF2. However, the direction of this regulation (positive or negative) has not been established in a wide range of cancer types. Our findings showed that FTO was somewhat linked, either positively or negatively associated with SRSF2 in 19 cancer types. The same m6A-SRSF pair displayed completely opposite regulation directions in different cancer types, suggesting that m6A modification of SRSFs was cancer-type-dependent.
Massive reports [69–71] have demonstrated that miRNA was involved in gene expression regulation. Previous studies [72, 73] have piecemeally reported expression regulation patterns between miRNAs and SRSFs. To gain a more comprehensive understanding of the miRNA regulatory network of SRSFs, we identified all miRNAs with the potential to bind to SRSFs through strict screening criteria. From the overall distribution, we found that the same miRNA-SRSF pair had completely opposite regulation directions in different cancer types. For example, miR-101-3p and SRSF10 were negatively correlated in ACC, DLBC, LGG, LIHC and STAD, while they showed positive correlations in COAD, HNSC, KIRC, LUSC, OV, PAAD, TGCT, and THCA (Supplementary Table 2). Moreover, as we expected, the same miRNA could act with multiple different SRSFs, while different miRNAs could also act on the same SRSF. This phenomenon also corroborated the findings of Mahony et al. [74]. Through more stringent criteria (described in the Materials and Methods section) for further filtering miRNAs targeting SRSFs, we finally identified 11 miRNA-SRSF pairs (Figure 4C). These miRNA-SRSF pairs also displayed different correlation strengths and directions in different cancer context (Supplementary Table 3). For example, the miR-107-SRSF1 pair showed positive correlations in female reproductive system cancers (OV, UCEC and UCS) and upper digestive tract cancers (ESCA and STAD), while presenting negative correlation in KIRP and LGG (Figure 4C). Taken together, these results indicated that miRNA could regulate the expression of SRSFs, and whether this regulation was positive or negative depended on the cancer context.
SRSFs associated oncogenic pathways
Given that previous studies [75, 76] have sporadically mentioned the involvement of SRSFs in cancer occurrence and progression, however lacking specific molecular mechanism to elucidate how SRSFs affect cancer from a global view, we investigated the relationship of individual SRSFs as well as overall SRSFs with the activity of 50 hallmark pathways across 33 cancer types. We noted that each SRSF was associated with multiple pathways, some with pathway activation and some with pathway inhibition (Figure 5A and Supplementary Table 4). The expressions of SRSF2, SRSF7, SRSF1 and SRSF3 were correlated to more activated pathways, such as the E2F targets, G2M checkpoint, MYC targets V1 and Hedgehog signalling. In particular, we observed that the SRSF2 was associated with most pathway activation (in 10/50 pathways) or inhibition (in 17/50 pathways), while SRSF5 was the opposite (Figure 5B). Our above findings suggested that DNA methylation, m6A modification and miRNA regulation of SRSFs were cancer-type-dependent. Therefore, we further explored the association of SRSFs expression with oncogenic pathway activity in different cancer types. As expected, we observed that whether SRSFs inhibited or activated the carcinogenic pathway was closely related to different cancer contexts (Supplementary Table 5). For example, SRSF2 expression was associated with pathway inhibition of DNA repair in MESO and SKCM, but the opposite was observed in LUAD, LUSC, as well as STAD. In addition, we also noted that SRSFs expression was correlated with the activation or inhibition of multiple immune-related pathways in different cancer types, such as interferon alpha response, interferon gamma response, Il-6/JAK/STST3 signaling and so on.
Figure 5. SRSFs associated oncogenic pathways and immune links. (A) SRSFs and hallmark pathways interaction network. Red and blue marks represent positive and negative correlation respectively. Node sizes correspond to the link numbers. (B) The count of hallmark pathways is related to individual SRSFs. (C) Heatmap demonstrating normalized enrichment score of significant pathways. We reported pathways that showed consistent significant correlations (q-value < 0.05) in at least 10 cancer types. (D) Barplots presenting number of cancer types which are negative and positive correlations between SRSFs and significant pathways. (E) Heatmap demonstrating Pearson correlation coefficient between SRSFscore and immunomodulator mRNA expression in each cancer type. Unsupervised clustering identified three subgroups with different correlations, including positive-, intermediate-, and negative-groups. (F) Differences in SRSFscore between the responder (CR/PR) and non-responder (SD/PD) groups. Left, in metastatic urothelial carcinoma; Right, in advanced melanoma. The differences between the two groups were tested by Wilcox’s rank sum test. (G) Kaplan-Meier survival curves showing the differences in overall survival between high and low SRSFscore patients. Left, in metastatic urothelial carcinoma; Right, in advanced melanoma. Statistical significance was assessed by log-rank test.
Previous studies [77, 78] have pointed out that genes functioned in concert, not in isolation. SRSFs, as the same gene family, share the same or similar functional properties. They may function more in a genetically synergistic manner. This was indirectly confirmed by SRSFs expression correlation analysis (Supplementary Figure 1) and protein-protein interaction network (Figure 4B). Therefore, we took SRSFs expression as a whole and further evaluated the relationship between overall SRSFs expression and oncogenic pathway activity. Here, “SRSFscore” was used to represent the estimated level of overall SRSFs expression as previously described [15, 25, 26]. From the results of GSEA analysis, we observed that through strict screening criteria, SRSFs had positive or negative correlation pathways in different cancer types (Figure 5C, 5D). We noted that E2F targets and G2M checkpoints were associated with SRSFs in 32 cancer types, with positive associations appearing in 20 cancer types and negative associations in 12 cancer types. This suggested that SRSFs played critical roles in regulating cell cycle and proliferation. Moreover, in different cancer types, the correlation between SRSFs and the same pathway also showed completely opposite directions (Figure 5D). For example, DNA repair was negatively correlated with SRSFs in LAML, but positively correlated with SRSFs in other cancer types (ACC, BLCA, BRCA, COAD, ESCA, GBM, HNSC, KICH, LGG, LIHC, LUSC, OV, PAAD, SARC, STAD, TGCT and UCEC), which again indicated that whether SRSFs inhibited or activated the carcinogenic pathway depended on cancer contexts.
SRSFs are associated with cancer immunity
In recent years, the close relationship between cancer and immunity has gradually been deeply recognized [79–81]. Given the widespread genomic alterations and expression disturbances of SRSFs in various types of cancers and their close associations with multiple oncogenic pathways (especially immune-related pathways), we were interested in further investigating the potential links between SRSFs and cancer immunity. Xcell analysis revealed that SRSFs were significantly associated with the infiltration abundance of multiple immune or stromal cells in various cancer types (Supplementary Figure 2). We observed that CD8 + T cell infiltration abundance was negatively correlated with SRSFs expression in OV and UCEC, but positively correlated in BRCA, CHOL, COAD, KICH, KIRC, LIHC, LUSC, PAAD, PRAD, THYM, and UVM. A similar situation was also seen in most other cells (e.g., myeloid dendritic cell, endothelial cell, eosinophil, cancer associated fibroblast, etc.). This indicated a large heterogeneous association pattern between SRSFs and immunity in different cancer types. Additionally, we also examined the correlation between SRSFscore and some immunomodulators that are crucial in immunotherapy [14]. Our results suggested that SRSFs were negatively correlated with some common immune checkpoints (LAG3, CD274, CTLA4, TIGIT, PDCD1) in UCS, THYM, OV, CESC and BLCA, while positively correlated in UVM, STAD, PAAD, MESO, LUAD, LIHC, KIRC, KICH, HNSC, CHOL and BRCA (Figure 5E). Based on the correlation pattern between SRSFscore and immunomodulators, we clustered cancer types into three subgroups with different correlations, including positive-, intermediate-, and negative- groups. Interestingly, three female reproductive system tumors (OV, UCS and UCEC) were in the negative group, while two renal carcinomas (KICH and KIRC) and three cancer types from the hepatobiliary and pancreatic system (LIHC, CHOL, PAAD) were in the positive group. Strikingly, HMGB1 was found consistently positively correlated with the SRSFscore in all cancer types (Figure 5E).
Given that we identified the close link between SRSFs and cancer immunity, we therefore sought to further explore the potential impact of SRSFs on cancer immunotherapy. We obtained pre-immunotherapy gene expression data and clinical information from two cancer immunotherapy cohorts [28, 29] with relatively large sample sizes. According to tumor responses to immunotherapy, we classified the patients into responder (CR/PR) and non-responder (SD/PD) groups, and compared the differences in SRSFscore between the two groups. We found that high SRSFscore was associated with better immunotherapy efficacy in metastatic urothelial carcinoma cohort. But in advanced melanoma, we did not observe this (Figure 5F). We believed that the predictive value of SRSFscore as a hint of immunotherapy response might be influenced by different cancer contexts, as described above. Additionally, whether SRSFscore could indicate the prognostic risk of patients receiving immunotherapy was also worth exploring. Our results showed that high SRSFscore was associated with better prognosis in the two immunotherapy cohorts (Figure 5G), which further suggested the close relationship between SRSFs and cancer immunity.
SRSFs displays the potential to predict survival
Above investigations identified the close association of SRSFs to cancer, so it was necessary to further determine the relationship between SRSFs and the survival of cancer patients. According to SRSFs expression level determined by rank statistics selected maximally, patients were divided into two groups to achieve a predictive value of SRSFs in survival [15]. Differences in survival between two groups were tested using the log-rank test. We observed that SRSFscore was significantly linked to patient survival in 21 cancer types (Figure 6A, 6B). Moreover, high SRSFscore could also indicate better survival in both metastatic urothelial carcinoma and advanced melanoma (Figure 5G). In ACC, HNSC, KICH, KIRC, LGG, LIHC, PRAD and SARC, high SRSFscore was associated with poor prognosis (Figure 6A), while SRSFscore was considered as a favorable prognostic factor in BLCA, CESC, COAD, LUAD, LUSC, MESO, PAAD, READ, SKCM, THYM, GBM, OV and STAD (Figure 6B). We also performed survival analysis on nine independent data cohorts from Gene Expression Omnibus involving five cancer types. The results again confirmed that high SRSFscore was associated with better survival of OV, LUAD, STAD and COAD, suggesting that SRSFs might be a key protective factor for OV, LUAD, STAD and COAD. Conversely, high SRSFscore was associated with poor prognosis of LIHC, suggesting that SRSF might be a risk factor in LIHC (Supplementary Figure 3).
Figure 6. SRSFscore displayed the potential to predict survival in 21 cancer types. (A) Kaplan-Meier survival curves showing high SRSFscore is associated with worse survival in eight cancer types. (B) Kaplan-Meier survival curves showing high SRSFscore is related to better survival in thirteen cancer types. Differences in survival between two groups were tested using the log-rank test. (C) Bubble diagram showing the Cox regression correlation of SRSFs and survival. The diagram only displayed significant dots with P-value <0.05.
In order to further investigated the contribution of each SRSF to the survival prediction value of SRSFscore, Cox regression analysis was performed on all 12 SRSFs and combined SRSFscore (Figure 6C). We observed that in ACC, KICH, LIHC, PRAD, and SARC, most SRSFs and SRSFscore showed consistent tendency toward worse survival. In the same cancer type, these 12 SRSFs did not all display direction-consistent prognostic risk. For example, in KIRC, SRSF1, SRSF2, SRSF4, SRSF7, SRSF9, SRSF11 were prognostic risk factors, while SRSF3 and SRSF8 were prognostic friendly factors. Similar cases were also seen in BLAC, KIRP, LAML, LGG, LUAD, and UCEC. In particular, for ACC, 11 of 12 SRSFs and SRSFscore; For LIHC, 10 of 12 SRSFs and SRSFscore; For SARC, 9 of 12 SRSFs and SRSFscore were all predictive of worse survival. Collectively, these findings suggested that SRSFs or SRSFscore had the potential to predict survival in certain cancer contexts.
CellMiner analysis reveals the association of SRSFs to drug sensitivity
A growing number of studies [82–84] have suggested that genomic and expression alterations played essential roles in drug responsiveness. Our study observed widespread genomic alterations and expression perturbations of SRSFs across cancer types. To further investigate the role of SRSFs on chemotherapy or targeted therapy, we performed drug sensitivity analysis, and calculated the Pearson correlation coefficients between Z scores of clinical trials and FDA-approved drugs and mRNA expression values for each SRSF. Through stringent screening criteria (|Pearson correlation coefficient| ≥ 0.3 and P-value < 0.01), we found 27 compounds, of which the sensitivity of 5 drugs (Dasatinib, Everolimus, Pazopanib, AP-26113 and Okadaic acid) was negatively correlated with gene expression and 22 drugs were positively correlated (Figure 7). Among these drugs, we found that 4 DNA synthesis inhibitors (Oxaliplatin, Nelarabine, Cytarabine and Hydroxyurea) were positively correlated with SRSFs expression, among which Nelarabine was positively correlated with 11 individual SRSFs, suggesting that the high expression of SRSFs might inhibit the sensitivity of tumor cells to DNA synthesis inhibitors. In contrast, overexpression of SRSFs (SRSF5 and SRSF8) might enhance the sensitivity of tumor to Src/ABL, mTOR1, and protein phosphatase inhibitors. These results suggested that the abnormal expression of SRSFs might mediate the resistance of tumor tissues to chemotherapy or targeted drug therapy.
Figure 7. The correlation network showing the association of SRSFs to drug sensitivity. Red represents a positive correlation, and green represents a negative correlation.
Discussion
A growing number of studies [11, 12, 75, 76] have reported the important roles of SRSFs in tumorigenesis and progression. However, existing research on SRSFs mainly focus on single SRSFs or individual tumor types, which raises some interesting questions remained unsolved. For example, do SRSFs share common regulatory targets or molecular alteration characteristics in different cancer contexts? Is the regulatory mechanism similar for all these targets? Do diverse SRSFs share similar oncogenic pathways within the same cancer type? To clarify these doubts, it is particularly important to depict a global landscape of SRSFs in pan-cancer from multiple perspectives.
In this study, we performed a systematic characterization of SRSFs in over 10000 samples across 33 cancer types. Our results not only revealed diverse molecular characteristics and potential mechanisms of SRSFs in different cancer contexts, but also uncovered common SRSFs related oncogenic pathways, thus depicting a global landscape of SRSFs regulation in human pan-cancer.
In the genomics analysis, we observed widespread genomic alterations of SRSFs in multiple cancer types. SCNA, especially CNV amplification, accounted for a very large proportion of DNA alterations. Thus, we speculated that at the genomic level, it was SCNA, especially CNV amplification, but not mutation, which was the major cause of SRSFs dysregulation in cancer. Unlike the previous description by Luo et al. [15] that gene CNV was significantly associated with expression, we found that not all SRSFs expression changed with CNV amplification or deletion. For example, SRSF12 presented CNV deep deletion in LIHC, LUAD, and PRAD, however showing higher expression in cancer tissues. The CNV alteration of SRSF10 in CHOL and its mRNA expression was also not synergistic. Previous studies [85–87] have reported that SRSFs expression was dysregulated in a variety of cancers, which was consistent with our findings. Abnormal expression of SRSFs usually lead to changes in cancer-related pathways (e.g., epithelial mesenchymal transformation) [88] and alternative splicing [89]. Additionally, we found that SRSFs expression was closely related to the activation or inhibition of various carcinogenic pathways, such as E2F target, G2M checkpoint, DNA repair, mTORC1 signaling, etc. However, same SRSFs presented completely opposite regulatory directions with the same oncogenic pathway in different cancer types, suggesting that the effect of SRSFs on the specific pathway was dependent on cancer contexts. Similar findings were observed in Luo et al. [15]. Overall, SRSFs experienced widespread genomic alterations and expression dysregulation in multiple cancers, which in turn could affect key oncogenic pathways and contribute to tumorigenesis and progression [90, 91].
In pathway analysis and cancer immune correlation analysis, we observed that SRSFs were associated with the activation or inhibition of multiple immune pathways, and were also closely related to the expression of multiple immune checkpoints (LAG3, CD274, CTLA4, TIGIT, PDCD1). Interestingly, we found that SRSFs displayed negative associations with more immunoregulators in female reproductive system tumors (OV, UCS and UCEC), while positive in two renal tumors (KICH and KIRC) and tumors derived from the hepatobiliary-pancreatic system (LIHC, CHOL, PAAD). This study is the first to discover this interesting phenomenon, which will provide a new perspective for the in-depth study of SRSFs in cancer immunotherapy.
SRSFs experienced widespread genomic alterations and dysregulated expression in various types of cancers, and were closely related to multiple immune pathways and immunomodulators. This may provide important insights for the development of clinical translational medicine. A previous study [92] has reported that in leukemia, mutant SRSF2 cells were more sensitive to E71079, a splicing-altering compound that binds to the SF3B complex, than wild-type SRSF2 cells [93, 94]. So, we believe that SRSFs may be key biomarkers for predicting cancer treatment response in the near future. Our study found that in metastatic urothelial carcinoma, SRSFs could predict response to PD-1 blockade therapy, whereas in advanced melanoma, this predictive value was not accepted. We believed that SRSFs predicted response to immunotherapy depending on cancer contexts. In the following survival analysis, we found that both in metastatic urological carcinoma and in advanced melanoma, SRSFs could well predict patient prognosis, that is, higher SRSFscore was associated with better survival. The potential mechanisms underlying the different predictive values of SRSFs in specific cancer types remained poorly defined. In the future, individualized development of immunotherapy still requires a lot of work to understand its detailed mechanisms.
Chemical resistance is responsible for the failure of cancer treatment. Splicing disorders play a critical role in tumorigenesis [95], and the involvement of SRSFs in cancer resistance to chemotherapy remains elusive [96]. It has been reported that SRSF4 played important roles in the process of cisplatin affecting splicing, and that reduced SRSF4 expression inhibited multiple cisplatin-induced splicing alterations and cell death [97]. Fukuhara et al. [98] suggested that targeting kinases involved in alternative splicing by regulating the phosphorylation of SRSFs was a good option for splicing regulation. SRSFs could be phosphorylated by their kinases (SRPK1 and SRPK2), as well as by DNA topoisomerase. In addition, Pilch et al. [99] reported that NB-506, acting as a topoisomerase inhibitor, could regulate the phosphorylation and splicing patterns of SRSF1 and thereby regulate gene expression. Consistent with this, our study also found that the sensitivity of two DNA topoisomerase inhibitors (Amonafide and Pyrazoloacridine) were positively correlated with SRSFs expression. In addition, we observed that the sensitivity of other chemotherapeutic drugs (e.g., DNA synthesis inhibitors) and targeted drugs (e.g., EGFR-TKI) were also associated with the expression of SRSFs. Therefore, we believe that targeting SRSFs may be a cancer therapy approach with great potential for clinical application in the future. The effects of these drugs on SRSFs expression and tumor growth, as well as the specific mechanisms by which SRSFs are involved in tumor resistance, are still worthy of further investigation.
Importantly, we noted that the regulations associated with SRSFs were all cancer context-dependent, including genomic alterations, dysregulated expression, DNA methylation, m6A modification, miRNA, oncogenic pathways, etc. This property also determined the varying predictive potentials of SRSFs for drug response and survival in different cancer types. These findings substantiated once again the nature of intertumor diversity [100, 101], and highlighted the importance of individualized cancer treatment.
To sum up, we systematically characterized the molecular characteristics and clinical relevance of SRSFs from multiple levels in over 10000 patients across 33 cancer types. However, in this study, a systematic characterization of SRSFs was performed using pan-cancer bulk data, but wet experiments are lacking. We will try to clarify the oncogenic patterns and mechanisms of specific SRSFs molecules in future studies and explore their potential anti-cancer strategies. The current study highlights the pivotal roles of SRSFs in cancer progression, provides rich resources for understanding the biological characteristics of SRSFs, which will contribute to the development of therapeutic strategies based on RNA alternative splicing.
SRSF: Serine/arginine-rich splicing factor;
TGGA: The Cancer Genome Atlas;
ACC: Adrenocortical Carcinoma;
BLCA: Bladder Urothelial Carcinoma;
BRCA: Breast Cancer;
CESC: Cervical Aquamous Cell Carcinoma and Endocervical Adenocarcinoma;
CHOL: Cholangiocarcinoma;
COAD: Colon Adenocarcinoma;
DLBC: Lymphageal Neoplasm Diffuse Large B-Cell Lymphoma;
ESCA: Esophageal Carcinoma;
GBM: Glioblastoma Multiforme;
HNSC: Head And Neck Squamous Carcinoma;
READ: Rectum Adenocarcinoma;
KICH: Kidney Chromophobe;
KIRC: Kidney Renal Clear Cell Carcinoma;
KIRP: Kidney Renal Papillary Cell Carcinoma;
LAML: Acute Myeloid Leukemia;
LGG: Brain Lower Grade Glioma;
LIHC: Liver Hepatocellular Carcinoma;
LUAD: Lung Adenocarcinoma;
LUSC: Lung Squamous cell carcinoma;
MESO: Mesothelioma;
OV: Ovarian serous cystadenocarcinoma;
PADD: Pancereatic Adenocarcinoma;
PCPG: Pheochromocytoma and Paraganglima;
PRAD: Prostate Adenocarcinoma;
SARC: Sarcoma;
SKCM: Skin Cutaneous Melanoma;
STAD: Stomach Adenocarcinoma;
TGCT: Testicular Germ Cell Tumors;
THCA: Thyroid Carcinoma;
THYM: Thymoma;
UCEC: Uterine Corpus Endometrial Carcinoma;
UCS: Uterine Carcinosarcoma;
UVM: Uveal Melanoma;
GSEA: Gene-Set Enrichment Analysis;
CNV: copy number variation;
GSVA: Gene set variation analysis;
SCNA: Somatic copy number alterations;
LOH: Loss of heterozygosity;
HRD: Homologous recombination deficiency.
CKS and JJZ: designed the study and drafted the manuscript. CKS conducted all statistical analyses. JJZ and ZCF polished this manuscript. All authors read and approved the final manuscript.
This work benefited from open databases. We are grateful for the efforts made by the resources and staff to expand and improve the databases. We also thank Qing Geng, Ning Li, Kai Lai, Donghang Li, Heng Meng, Ruyuan He, Bo Hao, Shize Pan for our preliminary revision, review and constructive comments. They have made great and selfless contributions to this paper.
The authors declare that they had no conflicts of interest.
The study was approved by the Institutional Research Committee of Taihe Hospital, Hubei University of Medicine. Data of this study were downloaded from the open databases including TCGA and GEO, so informed consent for participation was not required for this study, in accordance with the national legislation and the institutional requirements.
This study was supported by the Foundation of Health commission of Hubei Province (No. WJ2023M164).