Research Paper Volume 13, Issue 2 pp 2959—2981
A novel five-lncRNA signature panel improves high-risk survival prediction in patients with cholangiocarcinoma
- 1 Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- 2 Key Laboratory of Diagnosis and Treatment of Severe Hepato-Pancreatic Diseases of Zhejiang Province, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- 3 Department of Epidemiology and Biostatistics, School of Public Health and Management, Wenzhou Medical University, Wenzhou, China
- 4 School of Public Health, Inner Mongolia Medical University, Hohhot, China
- 5 Department of Gastroenterology, The Affiliated Drum Tower Hospital of Nanjing University Medical School, Nanjing, China
- 6 Department of Hepatobiliary Surgery, The Affiliated Cixi Hospital of Wenzhou Medical University, Ningbo, China
- 7 Division of Clinical Medicine, First School of Clinical Medicine, Wenzhou Medical University, Wenzhou, China
- 8 Department of Hepatobiliary Surgery, Shenzhen People’s Hospital, Shenzhen, China
- 9 Department of Radiotherapy and Chemotherapy, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- 10 Department of Infectious Diseases, Shandong Provincial Hospital, Jinan, China
Received: June 19, 2020 Accepted: November 23, 2020 Published: January 20, 2021https://doi.org/10.18632/aging.202446
How to Cite
Copyright: © 2021 Xie et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Cholangiocarcinoma (CCA) is a fatal disease with dismal survival rates. Long non-coding RNA (lncRNA) expression profiling as potential prognostic biomarkers play critical roles in tumor initiation, development, and poor prognosis. Identifying specific lncRNA to predict the prognosis of CCA patients in the early stages is very important for improving a patient’s survival. In the current study, we aimed to establish a novel risk-stratification lncRNA signature panel in CCA. The initial lncRNA discovery was identified in The Cancer Genome Atlas database (TCGA cohort). The Cox regression analysis was used to establish the lncRNA prognostic model and the receiver operating characteristic (ROC) curve analysis was performed to assess the specificity and sensitivity of the model. This was followed by independent validation of the lncRNA signature in the CCA patients from the First Affiliated Hospital of Wenzhou Medical University (WMU cohort). Furthermore, by using the Gene Ontology function and Kyoto Encyclopedia Gene and Genome pathway enrichment analysis, we explored the potential function of prognosis lncRNA. Finally, five lncRNA (HULC; AL359715.5; AC006504.8; AC090114.2; AP00943.4) were screened to establish the predictive model that significantly associated with poor overall survival(HR:4.879;95%CI,1.587-14.996;p=0.006). This five-lncRNA signature model showed excellent accuracy in the TCGA cohort (AUC=0.938), and also robustly predicted survival in the validation WMU cohort(AUC=0.816). Functional enrichment analysis suggested prognostic lncRNA was primarily associated with CCA-related biological processes. Our data established a novel lncRNA signature model for CCA risk-stratification and robust identification of CCA patients with poor molecular genotypes. Moreover, it revealed new molecular mechanisms of CCA.
Cholangiocarcinoma (CCA) is the second most common primary liver cancer after hepatocellular carcinoma, with incidence and mortality rates rising across the world [1, 2]. Despite surgery and liver transplantation being options for patients, the high recurrence rate leads to CCA patients’ median survival time of less than one year . Moreover, whether adjuvant therapy after surgical resection is effective, because data about its overall efficacy and survival benefits are limited . Clinicopathological factors of CCA such as grade and stage are strongly associated with prognosis and are also key factors determining the therapeutic regimen. However, even with similar clinical characteristics, the prognosis of CCA patients is significantly different. Therefore, it is important to identify efficient tumor features to help clinicians stratify high-risk patients and tailor personalized treatment regimens for improving treatment outcomes.
With the development of gene sequencing technology, there has been a growing interest in using gene expression signature for risk-stratification of cancer patients. Besides, anti-cancer drugs based on genetic research are developing rapidly [5–7]. Therefore, conducting further studies on CCA-related genes and epigenetic markers such as long noncoding RNA (lncRNA) to guide personalized treatment in order to reduce recurrence and improve survival rate is warranted . In the past decades, lots of evidence has suggested that lncRNA is strongly related to tumor occurrence, metastasis, and prognosis [9–11]. For CCA, studies have confirmed that lncRNA plays a key role in CCA occurrence and progression . For instance, MALAT1 promote CCA cell proliferation and invasion , UCA1 affect migration and invasion potential of CCA cells by regulating EMT . Besides, lncRNA such as TUG1 , CCAT1 , and AFAP1-AS1  could serve as valuable predictive markers for CCA patients prognosis. However, the role and mechanism of lncRNA in the metastasis and recurrence of tumors even in CCA is not completely understood.
In this study, we collected lncRNA expression data and clinical information of CCA patients from two independent database sources to identify and develop a novel lncRNA-based signature panel as an independent predictor, for the prognosis of CCA patients, to guide personalized treatment and hence improve survival. This was achieved by using simple, inexpensive quantitative PCR assays that can be incorporated into the clinical approach. Furthermore, we investigate the possible molecular mechanisms related to this prognostic lncRNA with the occurrence and progress of CCA. We believe this lncRNA-base signature panel offers an effective platform for risk-stratification in CCA patients, which has great implications in the clinical management of patients suffering from this fatal malignancy.
Establishment of a five-lncRNA signature predictive model from the TCGA cohort
Based on the screening criteria, we obtained1192 differentially expressed lncRNA, including 744 up-regulated and 448 down-regulated. Among them, 33 lncRNA showed >4-fold decreased expression including HULC, and 51 lncRNA exhibited >4 fold increased expression (Figure 1A). Unsupervised hierarchical cluster analysis showed that the expression of differentially expressed lncRNA distinguished CCA samples from the normal samples (Figure 1B).
Figure 1. Differentially expressed lncRNA(DElncRNA) between Cholangiocarcinoma and normal tissues. (A) 1192 DElncRNA were detected based on an unsupervised hierarchical clustering heatmap. 744 DElncRNA expression increased, and 448 DElncRNA expression decreased. The indication from green to red represents the expression level from low to high. (B) Volcano plot depicts 1192 DElncRNA (green and red dots) between Cholangiocarcinoma and normal tissues. The X-axis represents log10 fold change; Y-axis represents -log10 of the p-value for each lncRNA. LncRNA with fold change >1 or <-1 and FDR < 0.01 were considered to be differentially expressed between tumor and normal tissues. (C–G) The differentially expressed level of five-lncRNA in 36 CCA tissue and 9 para-carcinoma normal tissue.
Univariate Cox regression analysis assessing the association between 1192 differentially expressed lncRNA and CCA patients’ overall survival in TCGA cohort, 10 lncRNA (LINC01336; AP000943.4; AC006504.8; AC090114.2; AC004921.1; AC134682.1; AL449106.1; AL359715.5; AC016876.1 and HULC) were selected for further multivariate Cox regression analysis. In this multivariate Cox regression analysis, five lncRNA (AC006504.8, AC090114.2, AP000943.4, AL359715.5, and HULC) were identified as independent predictors for CCA survival outcomes (Table 1). Among them, the expression of AC006504.8, AC090114.2, and AP000943.4 was up-regulated (Figure 1C–1E); expression of AL359715.5 and HULC was down-regulated (Figure 1F, 1G). Similarly, these 5 lncRNAs also showed similar behaviors in paired carcinoma and paracancerous tissues (Supplementary Figure 1). Based on these five lncRNA expression levels and their corresponding coefficient in multivariate Cox regression analysis, a five-lncRNA signature predictive model for CCA patients prognosis was established as follows: Risk Score= (0.6542 × expression level of HULC) + (-1.2388 × expression level of AL359715.5) + (1.3769 × expression level of AC006504.8) + (-3.6697 × expression level of AC090114.2) + (-1.5165 × expression level of AP000943.4).
Table 1. Five lncRNA significantly correlated with the overall survival of cholangiocarcinoma.
|Gene name||Ensemble ID||Chromosome||HR*||Coefficient*||P-value*|
|Statistics derived from univariate Cox proportional hazards regression analysis in 36 patients in TCGA cohort; HR, Hazard ratio.|
Performance evaluation of the five-lncRNA signature model for CCA prognosis in the discovery cohort
We calculated the risk score for each CCA patient in the TCGA cohort based on the five-lncRNA signature model. According to the median of log(Risk Score), -0.0995 was set as the cutoff level to stratify CCA patients into a high-risk or low-risk group(sensitivity is 82.01%, specificity is 86.02%). Finally, 18 patients whose log (Risk Score) >-0.0995 were assigned to the high-risk group, and 18 patients whose log(Risk Score) <-0.0995 were assigned to the low-risk group (Figure 2A–2C). The KM curves show that patients in the high-risk group had a worse prognosis than those in the low-risk group (median OS is 18.5 months vs. 60 months; log-rank p=0.002) (Figure 2D). The univariate Cox regression analysis between the lncRNA-based risk score and CCA patient survival score showed that high-risk patients have a worse prognosis (HR=6.760; 95%CI, 1.572-29.068; p=0.008). The multivariate Cox regression analysis revealed that high-risk score is an independent predictor for CCA patient survival after adjusting the clinical covariate including recurrence status and residual tumor status (HR=4.879; 95%CI, 1.587-14.996; p=0.006) (Table 2).
Figure 2. Prognostic evaluation of the five-lncRNA signature in CCA patients in TCGA. (A) The survival status and duration of CCA patients; (B) LncRNA risk score distribution; the blue dashed line indicates dividing the patient into a low-risk group and a high-risk group with a median value as a cut off value. (C) Heatmap of five lncRNA expressions in CCA patients. (D) KM curves based on OS outcomes for risk cutoffs with a p-value of less than 0.01 for the log-rank test. (E) Time-dependent ROC curve analysis was predicted by five lncRNA for 3-year survival, and each was performed separately.
Table 2. Univariate and multivariate Cox regression analysis of cholangiocarcinoma patients in the TCGA cohort.
|Variables||Univariate analysis||Multivariate analysis|
|HR (95% CI)||P-value||HR (95% CI)||P-value|
|Primary pathology residual tumor|
|Family cancer history|
|(Yes/No)||17.500 (3.312-92.475)||<0.001||7.145 (1.481-37.477)||0.014|
|History hepatoma risk factors|
|(With tumor/Tumor free)||24.375 (3.822-155.448)||<0.001||1.845 (0.306-11.142)||0.502|
|Primary histological type|
|Five-lncRNA risk score|
|(High/Low)||6.760 (1.572-29.068)||0.008||4.879 (1.587-14.996)||0.006|
|HR: Hazard Ratio; CI: Confidence Interval.|
The AUC value and F1 score were calculated to assess the performance of the five-lncRNA signature model. Among the down-regulated lncRNA, the AUC for AL359715.5 and HULC were 0.777 and 0.821, respectively. The AUC for up-regulated lncRNA AC090114.2, AC006504.8, and AP000943.4 were 0.727, 0.728 and 0.77, respectively. For the whole model, the AUC was 0.938 (Figure 2E), and the F1 score was 0.7222. These results indicate that the five-lncRNA signature model has a high predictive value for CCA prognosis.
Performance validation of the five-lncRNA signature model in the validation cohort
We validated the prediction ability of the five-lncRNA signature model in the WMU cohort to assess the robustness of the model for survival prediction in patients with CCA. Based on the five-lncRNA model and cutoff point derived from the TCGA cohort, patients in the WMU cohort were divided into high-risk group (n=54) and low-risk group(n=36). We compared the KM curve between these two groups and found that the OS of patients in the high-risk group was worse than those in the low-risk group (p<0.001, Figure 3B). The AUC value was 0.816 in the WMU cohort (Figure 3C).
Figure 3. Prognostic evaluation of the five-lncRNA signature in CCA patients in the WMU cohort. (A) Primer sequence of five-lncRNA markers; (B) KM curve analysis of OS validated the prognostic differences between high and low-risk groups in the WMU cohort; (C) ROC curve analysis of 3-year survival validated the reliability of five-lncRNA model.
Stratified survival analysis of five-lncRNA model in the TCGA cohort
Further stratified analysis showed that the five-lncRNA signature model could accurately distinguish the prognosis between the high-risk group and the low-risk group in young patients <60 years (n=14, p=0.032, Figure 4E) and older patients ≥60 years (n=22, p=0.037, Figure 4A). Similarly, stratifying the patients based on the stage of disease, revealed that the five-lncRNA signature model has good discriminatory ability for earlier-stage patients (n=28, p=0.007, Figure 4F) and advanced-stage patients (n=8, p=0.028, Figure 4B). For patients with or without recurrence, the five-lncRNA model could divide patients into the high-risk or low-risk groups in those with recurrence (n=19, p=0.005, Figure 4C) and without recurrence (n=17, p=0.02, Figure 4G). Moreover, the five-lncRNA signature model could separate the high-risk group and low-risk group for patients with tumors (n=19, p=0.008, Figure 4D) and those who were tumor-free (n=15, p=0.18, Figure 4H). Multivariate Cox regression analysis combined with stratification analysis showed that there was no significant difference in OS between the high-risk and low-risk groups with five-LncRNA markers in tumor-free patients, and it this suggests that patients in the early stages of tumor development may benefit significantly from these prognostic biomarkers.
Figure 4. KM curve of OS of patients stratified by age, stage, recurrence, and current tumor status by five-lncRNA signature. (A) KM curves of the elder patients’ group and (E) KM curves in the younger patients’ group; (B) KM curves in the advanced-stage patients' group and (F) KM curves in the earlier-stage patients' group. (C) KM curves in the recurrence patients’ group and (G) KM curves in the no recurrence patients’ group. (D) KM curves in the with tumor patients’ group and (H) KM curves in the tumor-free patients’ group.
Identifying the functions of the five-lncRNA signature model
Co-expression analysis showed significant co-expression of 1429 DPCGs, 1440 DPCGs, 300 DPCGs, 495 DPCGs, and 552 PCGs with HULC, AL359715.5, AP000943.4, AC006504.8, AC090114.2, respectively. Functional enrichment analysis indicated that 72 GO biological processes (BP) terms, 21 GO cellular components (CC) terms, and 35 GO molecular functions (MF) terms were enriched for HULC-related DPCGs. Biological processes were mainly involved in the oxidation-reduction process, xenobiotic metabolic process, metabolic process; cellular components were mainly involved in extracellular exosome, mitochondrial matrix, blood microparticle; molecular functions were mainly involved in oxidoreductase activity, electron carrier activity, monooxygenase activity (Supplementary Figure 2A, 2C). There was significant enrichment of 60 KEGG pathways in HULC-associated DPCGs, including leucine, isoleucine and valine degradation, complement and coagulation cascades, fatty acid degradation, carbon metabolism and chemical carcinogenesis (Supplementary Figure 2B, 2C).
47 GO BP terms, 11 GO CC terms, and 33 GO MF terms were enriched for AL359715.5-related DPCGs, whose biological processes were mainly associated with drug metabolic process, lipid metabolic process, lipoprotein metabolic process; cellular components were mainly associated with organelle membrane, mitochondrion, peroxisome; molecular functions were mainly associated with iron ion binding, heme binding, cholesterol transporter activity (Supplementary Figure 3A, 3C).
56 KEGG pathways were enriched for AL359715.5-related DPCGs, which were primarily linked to Drug metabolism - cytochrome P450, Metabolism of xenobiotics by cytochrome P450, Retinol metabolism, Peroxisome, and Cholesterol metabolism (Supplementary Figure 3B, 3C).
16 GO BP terms, 18 GO CC terms and 4 GO MF terms were enriched for AC006504.8-related DPCGs, whose biological processes were mainly involved in cell division, sister chromatid cohesion, mitotic nuclear division; cellular components were mainly involved in condensed chromosome kinetochore, midbody, nucleoplasm; molecular functions were mainly involved in protein binding, ATP binding, and cadherin binding involved in cell-cell adhesion (Supplementary Figure 4A, 4C). It was a significant enrichment of 7 KEGG pathways in AC006504.8-related DPCGs, including DNA replication, cell cycle, the p53 signaling pathway, Fanconi anemia pathway and progesterone-mediated oocyte maturation (Supplementary Figure 4B, 4C).
7 GO BP terms, 2 GO CC terms and 1 GO MF terms were raised in AC090114.2-associated DPCGs, whose biological processes were mainly related to sister chromatid cohesion, DNA replication initiation, G1/S transition of mitotic cell cycle; cellular components were mainly associated with cytosol and MCM complex; molecular functions were mainly associated with protein binding (Supplementary Figure 5A, 5C). 4 KEGG pathways were raised in AC090114.2-related DPCGs, which were mainly related to DNA replication, cell cycle, cellular senescence and oocyte meiosis (Supplementary Figure 5B, 5C).
1 GO CC terms and 1 GO MF terms could be found for DPCGs related to AP000943.4, whose cellular components were associated with cytoskeleton; molecular functions were associated with extracellular matrix organization and involved a KEGG pathway for Human papillomavirus infection (Data not shown).
Functional evaluation of common DPCGs for five-lncRNA signature model
The intersection of the DPCGs corresponding to the five-LncRNA signature model showed that 171 DPCGs were shared by this five-LncRNA signature (Figure 5A). The common KEGG pathway is cell cycle, DNA replication, oocyte meiosis, Fanconi anemia pathway, and progesterone-mediated oocyte maturation (Figure 5B and Supplementary Table 1).
Figure 5. Enrichment and analysis of five lncRNA in the presence of common DPCGs. (A) Venn diagram showing 171 common DPCGs of the five-lncRNA. (B) The KEGG pathways were significantly associated with the enrichment of 171 common protein-coding genes co-expressed with five-lncRNA. The ordinate is the number of DPCGs that is enriched to the target gene. (C) Mutation of FANCD1 and FADCD2 genes in cholangiocarcinoma (from the cbioportal database http://www.cbioportal.org/). (D) Expression of FANCD1 and FADCD2 genes in cholangiocarcinoma.
GSEA between the high-risk group and low-risk group
Through GSEA analysis, we clarified the significant difference in survival between the high-risk and low-risk groups. The results showed significant enrichment of markers including the "complement pathway" in the high-risk group. Pathways including IL-2 Receptor Beta Chain in T cell Activation, Keratinocyte Differentiation, T cell receptor pathway, and Neurotrophin signaling pathway were enriched in the low-risk group , many of these being closely related to the occurrence and development of cancer  (Figure 6 and Supplementary Table 2).
Figure 6. Gene Set Enrichment Analysis (GSEA) was performed between the high risk score group and the low-risk score group. (A–D) Pathways including IL-2 Receptor Beta Chain in T cell Activation, Keratinocyte Differentiation, T cell receptor pathway, and Neurotrophin signaling pathway were enriched in the low-risk group. (E) The results showed significant enrichment of markers including the "complement pathway" in the high-risk group.
Currently, the molecular genotype for a variety of tumors (breast cancer, gastric cancer, and colorectal cancer) has been applied in a clinical setting. Some molecular genotypes are not only used to predict the prognosis but also to select the best therapy target . The comprehensive study of the mechanism has led to the discovery of many kinds of targeted drugs used in the treatment of these diseases . However, for CCA, there are relatively few studies on prognostic molecular markers. Therefore, establishing a molecular prediction model in CCA for guiding personalized treatment and predicting prognosis is particularly urgent. In this study, we established a prediction model based on five lncRNA for the prognosis of CCA and validate its reliability in an independent clinical center biobank. The molecular mechanism of these five lncRNA was further explored by the signal pathway analysis.
There is growing evidence that lncRNA plays a key role in transcription and post-transcriptional regulation of gene expression [22–24] as well as in different cells and developmental processes [25–27]. Experimental evidence indicates that abnormal expression of lncRNA is relative to the onset of various diseases including gastric cancer, breast cancer, HCC, lung cancer, and CCA [28–30]. Recent reports indicate that oxidative stress up-regulates the dysfunction of lncRNA H19 and HULC, and then modulates CCA cell migration and invasion through ceRNA targeting IL-6 and CXCR4 . Similarly, the lncRNA CPS1-IT1 is up-regulated in intrahepatic CCA. Conversely, knockdown of CPS1 and/or CPS1-IT1 reduced the proliferation and increased apoptosis of ICC-9810 cells . By comparing the expression of AFAP1-AS1 in CCA tissues and paired adjacent tissues and analyzing the relationship between AFAP1-AS1 expression and the clinical features of CCA, it was discovered that AFAP1-AS1 is significantly associated with the malignant degree and poor prognosis of CCA. Studies have shown that knockdown AFAP1-AS1 inhibits tumor growth in vivo and inhibits cell proliferation and invasion in vitro . Other studies have found that certain lncRNA play a critical role in the metastasis and malignant progression of CCA. It has been reported that some lncRNA increased in the tissues of patients with advanced CCA and lymph node metastasis, and through inhibition and overexpression in lncRNA experiments, it was found that this overexpression of certain lncRNA may promote the growth and metastasis of CCA through some miRNA (miRNA-200c, miR-296-5p, et al.) . Another study has found that lncRNA-DANCR can bind to EZH2 and regulate histone methylation FBP1 promoter expression, which regulates the growth and migration of CCA cells .
Although the study of the lncRNA function has attracted more and more attention and a great number of lncRNA have been identified in the human genome, the function of most lncRNA has not been fully revealed. Functional annotation of the gene encoding the lncRNA-associated co-expressed protein is a viable method for finding the biological characteristics of lncRNA . By extension, annotation of LncRNA function through co-expressed genes was reported to be effective . In this study, GO and KEGG enrichment analysis was used to identify co-expressed mRNAs of the five lncRNA to speculate on the functions of the predictive lncRNA. Our data revealed that the HULC and AL359715.5 participated in a number of biological processes that were most relevant to the cholesterol and fatty acid metabolism which is reported to be responsible for the growth and accelerated development of CCA [34, 35]. Also, of interest is the identification of the complement and coagulation cascades that are involved in many physiological and pathological processes, including those in the inflammatory process which, once dysregulated become an important factor in tumorigenesis . In this study, we found that AC006504.8 was enriched in the p53 signaling pathway. The molecular epidemiological analysis revealed that p53 is mutated in almost all kinds of tumors, and approximately 5% of patients with colorectal cancer, lung cancer, melanoma, sarcoma, head and neck cancer, leukemia, esophageal cancer, ovarian cancer, testicular cancer, and cervical cancer have been found to have p53 mutations [37, 38]. Of significance to this study is the amount of research that has indicated p53 inactivation plays a key role in the occurrence and development of CCA . The mechanisms by which AC006504.8 is involved in CCA are probably related to cell cycle and DNA replication. The 171 DPCGs intersected by the five-lncRNA signature were enriched in the function of the Fanconi anemia (FA) pathway. Fanconi anemia is a recessive genetic disorder characterized by congenital malformation, bone marrow failure, and high susceptibility to cancers [36, 40]. It is a cancer susceptibility gene involved in the repairing of genomic damage and maintaining genomic stability . Recent evidence indicates that genetic instability is a key factor in the metastasis and recurrence of malignant tumors. Many studies have shown that mutations and abnormal expression of the FANCD1 and FANCD2, two major genes in the Fanconi anemia pathway, are significantly associated with poor prognosis of CCA . Our study also showed that FANCD1 and FANCD2 mutated to different degrees in CCA (Figure 5C), and their expression in CCA and matched para-carcinoma tissues was also significantly different (Figure 5D). These results would seem to suggest that the predictive five-lncRNA may mediate the development and progression of CCA via DPCG interactions in biological processes related to cancer. However, more experimental studies are needed to further explain the potential roles of these lncRNA in CCA. To our knowledge, four out of the five lncRNA biomarker functions have never been reported. Therefore, we postulate that further investigation of the function of the lncRNA will contribute to early diagnosis and provide a clinical basis for the development of new prognostic factors in CCA.
In summary, we systematically studied the lncRNA expression profiles of CCA patients and their corresponding clinical information and found five-lncRNA (HULC, AP000943.4, AC006504.8, AC090114.2, AL359715.5) signature showing the risk scoring model in this study was an excellent way to classify patients with different survival outcomes. Also, the five lncRNA molecular biomarkers performed very well in predicting 3-year survival in patients with CCA, which could be an independent predictor of survival prognosis, and which could provide novel insights into the molecular mechanisms of CCA tumorigenesis and development. However, this study has some limitations. Firstly, our five-lncRNA signature model was only tested and validated in the TCGA and WMU cohort. If possible, it should be validated in other independent cohorts. Secondly, our research only investigated the biological function of predictive lncRNA by computational methods. It should be supplemented with in vitro cell research and animal experiments in vivo. Combining these data will help to unravel the mechanism of lncRNA involved in CCA tumorigenesis.
Materials and Methods
Screening differentially expressed lncRNA in CCA patients in the discovery cohort
Pre-processed level 3 RNA sequencing count data and relative clinical information for CCA patients were obtained from the Genomic Data Commons Data Portal database (TCGA, https://portal.gdc.cancer.gov/projects/TCGA-CHOL) . Ultimately, 60,483 RNA-ENSG_ID expression profiles in 36 CCA patients were included for further analysis as the discovery cohort (TCGA cohort).
The Gencode.v27.long_noncoding_RNAs.gtf compressed file was downloaded from the GENCODE database (release 27) (http://www.gencodegenes.org/) and the transformed data (antisense, lincRNA, and sense_intronic) was determined as lncRNA . It was then filtered by removing the exon-expressing lncRNA from any known coding gene by GENCODE-based gene annotation. A total of 59,264 lncRNA-ENSG_IDs, and RNA-ENSG_ID and lncRNA-ENSG_ID were intersected, 13,126 lncRNA-Gene_names were obtained (Supplementary Figure 6). Samples in which the lncRNA with RPKM expression value is 0 more than 20% were removed. Finally, 3651 lncRNA were keptfor further analysis. Furthermore, we downloaded Homo_sapiens.GRCh38.91.chr.gtf zip file from Ensembl genome browser 91 (ftp://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens) to obtain 57,000 RNA-Gene_symbol expression profiles, then used "mygene" package in R to find the RNA with human entrezgene symbol (GeneID) in the 57,000 RNAs.
Differentially expressed genes (DEGs) between 36 CCA tissues (30 iCCA tissues, 6 pCCA/dCCA tissues) and 9 normal tissues with the |log2 fold change (log2 FC)| ≥1 and false discovery rate(FDR) <0.01 were considered as selection criteria for subsequent analysis. At the same time, differentially expressed analysis was performed on the RNAs identified with GeneID.
Establishment and performance evaluation of a lncRNA-based prediction model in the discovery cohort
Firstly, we assessed the association between differentially expressed lncRNA and the overall survival (OS) of CCA patients in the TCGA cohort by univariate Cox proportional hazards regression analysis. LncRNA that were statistically significant were included in further multivariate Cox regression analysis. Then, using the multivariate Cox regression analysis, we identified independent lncRNA predictors. Furthermore, the corresponding coefficients of each lncRNA in the Cox regression model were obtained to calculate the risk score . Finally, a lncRNA-based prediction model to evaluate the CCA patient survival outcomes was established as below:
In this formula, N represents the number of prognostic lncRNA, Coei is the coefficient of the lncRNAi in the multivariate Cox regression analysis, EVi represents the expression level of the lncRNAi.
We used the lncRNA-based model to calculate the risk score for each CCA patients in the TCGA cohort. Setting the median value of log (Risk Score) as a threshold, CCA patients were divided into high-risk and low-risk groups and the difference of OS between these two groups was compared. Then univariate and multivariate Cox regression analysis was performed to determine whether the lncRNA signature was an independent predictor variable of other clinicopathologic features for survival outcomes. Further stratification analysis was conducted on clinicopathologic characteristics, which were statistically significant in a multivariate Cox regression model to determine the lncRNA signature model’s predictive capacity within the same clinical features. We calculated the area under the time-dependent receiver operating characteristic (ROC) curve (AUC) within a 3-year survival period to evaluate the sensitivity and specificity of the lncRNA model to predict survival outcomes.
Verification of lncRNA signature model for survival prediction in the validation cohort
We performed validation of discovered lncRNA in fresh frozen tissues from 90 CCA patients who underwent surgery between November 2012 and December 2015 at the First Affiliated Hospital of Wenzhou Medical University (WMU cohort). The patient inclusion criteria were as follows: (1) pathological diagnosed with primary CCA; (2) had completed clinicopathological and follow-up monitoring; (3) had no anti-tumor treatment before this surgical resection. Exclusion criteria were as follows: (1) previous radiofrequency or other anti-tumor treatment before surgery; (2) patients who were lost to follow-up after surgery. The study was approved by the institutional review boards of First Affiliated Hospital of Wenzhou Medical University and written informed consent was obtained from all patients. Patients demographics and clinicopathological characteristics are shown in Supplementary Table 3.
The lncRNA expression of primary CCA tumor fresh frozen samples was assessed by real-time quantitative PCR. RNAeasy mini kit (Qiagen, CA, USA) was used to extract the total RNA. High-Capacity cDNA Reverse Transcription kit from Applied Biosystems (Grand Island, NY, USA) was used to synthesize cDNA from 2 μg of total RNA. Semi-quantitative detection of mRNA was performed using an ABI 7300 RT-PCR system. Relative quantification of mRNA levels was performed with 18S ribosomal RNA as an internal reference gene and data from the ΔΔCt method. The statistical mean and standard error were determined by the ΔCt value. All data were independently inputed three times. The primer sequences used in the present study are shown in Figure 4A.
Through the same lncRNA signature model and cutoff level derived from the discovery cohort, patients in the WMU cohort were divided into high-risk and low-risk groups. Then, we investigated the performance of the lncRNA signature model in the WMU cohort.
Co-expression and functional enrichment analysis
The Spearman correlation coefficient was calculated between lncRNA expression level and the differentially expressed protein-coding genes (DPCGs). The DPCGs with Spearman correlation coefficient greater than 0.50 were considered to be lncRNA-related DPCGs. Gene Ontology (GO), Cell Component (CC), Biological Process (BP), Molecular Function (MF), Cell Component (CC) and Kyoto Encyclopedia Gene and Genome (KEGG) pathway enrichment assays were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID, version 6.8, https://david.ncifcrf.gov/) . Gene Set Enrichment Analysis (GSEA 3.0) analyses were carried out to elucidate the survival difference between high-risk and low-risk groups. GSEA analyses were implemented with java software GSEA (http://software.broadinstitute.org/gsea/index.jsp) . GO (BP, MF, CC) terms, KEGG pathways, and GSEA analyses with adjusted P value or a false discovery rate (FDR) less than 0.05 were considered statistically significant.
The continuous variables were expressed as mean ± standard deviation or median (quartile range), and categorical variables were presented as frequencies (percentages). A Chi-square test was used to compare the differences between independent groups. Cox proportional hazard regression analysis was conducted to evaluate the association of lncRNA signature in predicting overall survival in CCA patients. Kaplan-Meier (KM) analysis was used to determine survival outcomes. The median values were used as a cut-off level to plot the KM curves, and the log-rank test was performed to evaluate the statistical significance. The results of the stepwise multivariate Cox regression analysis of the AIC (Akaike Information Criterion, assessing the goodness of fit of a statistical model) test yielded a predictive model with optimal interpretation and information effectiveness. A linear correlation model was performed to evaluate the relationships between the variables and the Pearson correlation coefficient or Spearman rank correlation coefficient was used to present the result. Unless otherwise indicated, all statistical tests were two-sided and a p-value<0.05 was considered as statistically significant. All data analysis was performed with R (version 3.3.3; http://www.r-project.org/). The differential expression of the lncRNA profile was estimated by the R “edgeR” package. Unsupervised hierarchical clustering analysis was accomplished by the R “pheatmap” package and represented as a volcano plot. KM survival analysis and Cox proportional hazard regression analysis was performed by the R “survival” package. The AUC value was calculated by the R “Survival ROC” package. The R “clusterProfiler”, “pathview” and “venn” package were used to find the common DPCGs of lncRNAs for KEGG pathways.
Data sharing and data accessibility
The data used to support the findings of this study are available from the corresponding author upon request.
X.X., Y.W., S.Z., and G.C. designed the research. J.L., Y.W., Z.Y., X.D., L.Y., P.G., C.H., and Q.Z. acquired and analyzed the data. X.X., Y.W., G.C., J.L., Z.C., Z.D., Q.Z., and Z.Y. contributed to the writing of the manuscript. Finally, all authors have reviewed and approved the final submitted manuscript.
Conflicts of Interest
The authors declare that there is no conflicts of interest.
Research Foundation of the National Health Commission of China-Major Medical and Health Technology Project for Zhejiang Province, Grant/Award Number: WKJ-ZJ-1706; Natural Science Foundation of Zhejiang Province, Grant/Award Number: LY17H160047; Public Projects of Zhejiang Province, Grant/Award Numbers: 2016C37127, 2018C37114; National Natural Science Foundation of China, Grant/Award Numbers: 81470868, 81772628, 81703310; Basic Projects of Wenzhou Science and Technology Bureau, Grant/Award Number: Y20190206.
- 1. Khan SA, Davidson BR, Goldin RD, Heaton N, Karani J, Pereira SP, Rosenberg WM, Tait P, Taylor-Robinson SD, Thillainayagam AV, Thomas HC, Wasan H, and British Society of Gastroenterology. Guidelines for the diagnosis and treatment of cholangiocarcinoma: an update. Gut. 2012; 61:1657–69. https://doi.org/10.1136/gutjnl-2011-301748 [PubMed]
- 2. Shaib Y, El-Serag HB. The epidemiology of cholangiocarcinoma. Semin Liver Dis. 2004; 24:115–25. https://doi.org/10.1055/s-2004-828889 [PubMed]
- 3. Valle J, Wasan H, Palmer DH, Cunningham D, Anthoney A, Maraveyas A, Madhusudan S, Iveson T, Hughes S, Pereira SP, Roughton M, Bridgewater J, and ABC-02 Trial Investigators. Cisplatin plus gemcitabine versus gemcitabine for biliary tract cancer. N Engl J Med. 2010; 362:1273–81. https://doi.org/10.1056/NEJMoa0908721 [PubMed]
- 4. Guro H, Kim JW, Choi Y, Cho JY, Yoon YS, Han HS. Multidisciplinary management of intrahepatic cholangiocarcinoma: current approaches. Surg Oncol. 2017; 26:146–52. https://doi.org/10.1016/j.suronc.2017.03.001 [PubMed]
- 5. Kou Y, Koag MC, Lee S. N7 methylation alters hydrogen-bonding patterns of guanine in duplex DNA. J Am Chem Soc. 2015; 137:14067–70. https://doi.org/10.1021/jacs.5b10172 [PubMed]
- 6. Kou Y, Koag MC, Cheun Y, Shin A, Lee S. Application of hypoiodite-mediated aminyl radical cyclization to synthesis of solasodine acetate. Steroids. 2012; 77:1069–74. https://doi.org/10.1016/j.steroids.2012.05.002 [PubMed]
- 7. Wen R, Umeano AC, Kou Y, Xu J, Farooqi AA. Nanoparticle systems for cancer vaccine. Nanomedicine (Lond). 2019; 14:627–48. https://doi.org/10.2217/nnm-2018-0147 [PubMed]
- 8. Shen FZ, Zhang BY, Feng YJ, Jia ZX, An B, Liu CC, Deng XY, Kulkarni AD, Lu Y. Current research in perineural invasion of cholangiocarcinoma. J Exp Clin Cancer Res. 2010; 29:24. https://doi.org/10.1186/1756-9966-29-24 [PubMed]
- 9. Spizzo R, Almeida MI, Colombatti A, Calin GA. Long non-coding RNAs and cancer: a new frontier of translational research? Oncogene. 2012; 31:4577–87. https://doi.org/10.1038/onc.2011.621 [PubMed]
- 10. Zeng JH, Liang L, He RQ, Tang RX, Cai XY, Chen JQ, Luo DZ, Chen G. Comprehensive investigation of a novel differentially expressed lncRNA expression profile signature to assess the survival of patients with colorectal adenocarcinoma. Oncotarget. 2017; 8:16811–28. https://doi.org/10.18632/oncotarget.15161 [PubMed]
- 11. Zhou M, Xu W, Yue X, Zhao H, Wang Z, Shi H, Cheng L, Sun J. Relapse-related long non-coding RNA signature to improve prognosis prediction of lung adenocarcinoma. Oncotarget. 2016; 7:29720–38. https://doi.org/10.18632/oncotarget.8825 [PubMed]
- 12. Zhang F, Wan M, Xu Y, Li Z, Kang P, Jiang X, Wang Y, Wang Z, Zhong X, Li C, Cui Y. Transcriptome analysis reveals dysregulated long non-coding RNAs and mRNAs associated with extrahepatic cholangiocarcinoma progression. Oncol Lett. 2017; 14:6079–84. https://doi.org/10.3892/ol.2017.6987 [PubMed]
- 13. Wang C, Mao ZP, Wang L, Wu GH, Zhang FH, Wang DY, Shi JL. Long non-coding RNA MALAT1 promotes cholangiocarcinoma cell proliferation and invasion by activating PI3K/Akt pathway. Neoplasma. 2017; 64:725–31. https://doi.org/10.4149/neo_2017_510 [PubMed]
- 14. Xu Y, Yao Y, Leng K, Li Z, Qin W, Zhong X, Kang P, Wan M, Jiang X, Cui Y. Long non-coding RNA UCA1 indicates an unfavorable prognosis and promotes tumorigenesis via regulating AKT/GSK-3β signaling pathway in cholangiocarcinoma. Oncotarget. 2017; 8:96203–14. https://doi.org/10.18632/oncotarget.21884 [PubMed]
- 15. Xu Y, Leng K, Li Z, Zhang F, Zhong X, Kang P, Jiang X, Cui Y. The prognostic potential and carcinogenesis of long non-coding RNA TUG1 in human cholangiocarcinoma. Oncotarget. 2017; 8:65823–35. https://doi.org/10.18632/oncotarget.19502 [PubMed]
- 16. Jiang XM, Li ZL, Li JL, Zheng WY, Li XH, Cui YF, Sun DJ. LncRNA CCAT1 as the unfavorable prognostic biomarker for cholangiocarcinoma. Eur Rev Med Pharmacol Sci. 2017; 21:1242–47. [PubMed]
- 17. Shi X, Zhang H, Wang M, Xu X, Zhao Y, He R, Zhang M, Zhou M, Li X, Peng F, Shi C, Shen M, Wang X, et al. LncRNA AFAP1-AS1 promotes growth and metastasis of cholangiocarcinoma cells. Oncotarget. 2017; 8:58394–404. https://doi.org/10.18632/oncotarget.16880 [PubMed]
- 18. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005; 102:15545–50. https://doi.org/10.1073/pnas.0506580102 [PubMed]
- 19. Janelle V, Langlois MP, Tarrab E, Lapierre P, Poliquin L, Lamarre A. Transient complement inhibition promotes a tumor-specific immune response through the implication of natural killer cells. Cancer Immunol Res. 2014; 2:200–06. https://doi.org/10.1158/2326-6066.CIR-13-0173 [PubMed]
- 20. Sia D, Villanueva A, Friedman SL, Llovet JM. Liver cancer cell of origin, molecular class, and effects on patient prognosis. Gastroenterology. 2017; 152:745–61. https://doi.org/10.1053/j.gastro.2016.11.048 [PubMed]
- 21. Liu B, Guo H, Zhang J, Xue J, Yang Y, Qin T, Xu J, Guo Q, Zhang D, Qian W, Li B, Hou S, Dai J, et al. In-depth characterization of a pro-antibody-drug conjugate by LC-MS. Mol Pharm. 2016; 13:2702–10. https://doi.org/10.1021/acs.molpharmaceut.6b00280 [PubMed]
- 22. Singh P, Patel T. Advances in the diagnosis, evaluation and management of cholangiocarcinoma. Curr Opin Gastroenterol. 2006; 22:294–99. https://doi.org/10.1097/01.mog.0000218967.60633.64 [PubMed]
- 23. Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nat Rev Genet. 2009; 10:155–59. https://doi.org/10.1038/nrg2521 [PubMed]
- 24. Hawkins PG, Morris KV. Transcriptional regulation of Oct4 by a long non-coding RNA antisense to Oct4-pseudogene 5. Transcription. 2010; 1:165–75. https://doi.org/10.4161/trns.1.3.13332 [PubMed]
- 25. Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT, Freier SM, Bennett CF, Sharma A, Bubulya PA, Blencowe BJ, Prasanth SG, Prasanth KV. The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell. 2010; 39:925–38. https://doi.org/10.1016/j.molcel.2010.08.011 [PubMed]
- 26. Evans JR, Feng FY, Chinnaiyan AM. The bright side of dark matter: lncRNAs in cancer. J Clin Invest. 2016; 126:2775–82. https://doi.org/10.1172/JCI84421 [PubMed]
- 27. Quinn JJ, Chang HY. Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet. 2016; 17:47–62. https://doi.org/10.1038/nrg.2015.10 [PubMed]
- 28. Wang F, Yang H, Deng Z, Su Y, Fang Q, Yin Z. HOX antisense lincRNA HOXA-AS2 promotes tumorigenesis of hepatocellular carcinoma. Cell Physiol Biochem. 2016; 40:287–96. https://doi.org/10.1159/000452545 [PubMed]
- 29. Wang WT, Ye H, Wei PP, Han BW, He B, Chen ZH, Chen YQ. LncRNAs H19 and HULC, activated by oxidative stress, promote cell migration and invasion in cholangiocarcinoma through a ceRNA manner. J Hematol Oncol. 2016; 9:117. https://doi.org/10.1186/s13045-016-0348-0 [PubMed]
- 30. Ma SL, Li AJ, Hu ZY, Shang FS, Wu MC. Co-expression of the carbamoyl-phosphate synthase 1 gene and its long non-coding RNA correlates with poor prognosis of patients with intrahepatic cholangiocarcinoma. Mol Med Rep. 2015; 12:7915–26. https://doi.org/10.3892/mmr.2015.4435 [PubMed]
- 31. Ma H, Hao Y, Dong X, Gong Q, Chen J, Zhang J, Tian W. Molecular mechanisms and function prediction of long noncoding RNA. ScientificWorldJournal. 2012; 2012:541786. https://doi.org/10.1100/2012/541786 [PubMed]
- 32. Lu X, Zhou C, Li R, Deng Y, Zhao L, Zhai W. Long noncoding RNA AFAP1-AS1 promoted tumor growth and invasion in cholangiocarcinoma. Cell Physiol Biochem. 2017; 42:222–30. https://doi.org/10.1159/000477319 [PubMed]
- 33. Han L, Zhang EB, Yin DD, Kong R, Xu TP, Chen WM, Xia R, Shu YQ, De W. Low expression of long noncoding RNA PANDAR predicts a poor prognosis of non-small cell lung cancer and affects cell apoptosis by regulating Bcl-2. Cell Death Dis. 2015; 6:e1665. https://doi.org/10.1038/cddis.2015.30 [PubMed]
- 34. Samudio I, Harmancey R, Fiegl M, Kantarjian H, Konopleva M, Korchin B, Kaluarachchi K, Bornmann W, Duvvuri S, Taegtmeyer H, Andreeff M. Pharmacologic inhibition of fatty acid oxidation sensitizes human leukemia cells to apoptosis induction. J Clin Invest. 2010; 120:142–56. https://doi.org/10.1172/JCI38942 [PubMed]
- 35. Iyengar NM, Hudis CA, Dannenberg AJ. Obesity and cancer: local and systemic mechanisms. Annu Rev Med. 2015; 66:297–309. https://doi.org/10.1146/annurev-med-050913-022228 [PubMed]
- 36. Vundinti BR, Korgaonkar S, Ghosh K. Incidence of malignancy and clonal chromosomal abnormalities in Fanconi anemia. Indian J Cancer. 2010; 47:397–99. https://doi.org/10.4103/0019-509X.73575 [PubMed]
- 37. Robles AI, Harris CC. Clinical outcomes and correlates of TP53 mutations and cancer. Cold Spring Harb Perspect Biol. 2010; 2:a001016. https://doi.org/10.1101/cshperspect.a001016 [PubMed]
- 38. Khan SA, Thomas HC, Toledano MB, Cox IJ, Taylor-Robinson SD. P53 mutations in human cholangiocarcinoma: a review. Liver Int. 2005; 25:704–16. https://doi.org/10.1111/j.1478-3231.2005.01106.x [PubMed]
- 39. Scheffner M, Werness BA, Huibregtse JM, Levine AJ, Howley PM. The E6 oncoprotein encoded by human papillomavirus types 16 and 18 promotes the degradation of p53. Cell. 1990; 63:1129–36. https://doi.org/10.1016/0092-8674(90)90409-8 [PubMed]
- 40. Mirchandani KD, D’Andrea AD. The Fanconi anemia/BRCA pathway: a coordinator of cross-link repair. Exp Cell Res. 2006; 312:2647–53. https://doi.org/10.1016/j.yexcr.2006.06.014 [PubMed]
- 41. Masserot C, Peffault de Latour R, Rocha V, Leblanc T, Rigolet A, Pascal F, Janin A, Soulier J, Gluckman E, Socié G. Head and neck squamous cell carcinoma in 13 patients with Fanconi anemia after hematopoietic stem cell transplantation. Cancer. 2008; 113:3315–22. https://doi.org/10.1002/cncr.23954 [PubMed]
- 42. Jacquemont C, Taniguchi T. The Fanconi anemia pathway and ubiquitin. BMC Biochem. 2007 (Suppl 1); 8:S10. https://doi.org/10.1186/1471-2091-8-S1-S10 [PubMed]
- 43. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011; 474:609–15. https://doi.org/10.1038/nature10166 [PubMed]
- 44. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012; 22:1760–74. https://doi.org/10.1101/gr.135350.111 [PubMed]
- 45. Zhou M, Zhao H, Wang Z, Cheng L, Yang L, Shi H, Yang H, Sun J. Identification and validation of potential prognostic lncRNA biomarkers for predicting survival in patients with multiple myeloma. J Exp Clin Cancer Res. 2015; 34:102. https://doi.org/10.1186/s13046-015-0219-5 [PubMed]
- 46. Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009; 4:44–57. https://doi.org/10.1038/nprot.2008.211 [PubMed]