Comprehensive analysis of the differential expression and prognostic value of COL1A2 in colon adenocarcinoma

Background: Colon adenocarcinoma (COAD) is a highly heterogeneous disease, which is the second most common cancer in females and third in males. Collagen type I alpha 2 (COL1A2) has been documented to be involved in the carcinogenesis of multiple tumors; however, the expression and prognostic significance of COL1A2 and its underlying mechanism in COAD remains unclarified. Materials and Methods: The general profile of COL1A2, its expression pattern, and prognostic value were systematically assessed through various bioinformatics tools. The protein level of COL1A2 was verified in COAD patients using immunohistochemistry analysis. In addition, enrichment analyses were performed to explore the possible regulatory pathways of COL1A2 in COAD. Results: The mRNA and protein levels of COL1A2 were significantly increased in COAD than that in normal tissues (P < 0.05). The COL1A2 expression tended to increase along with cancer stages and nodal metastasis status in COAD, while the promoter methylation levels of COL1A2 might negatively related to its mRNA expression. Survival analysis showed that COL1A2 was a reliable predictor for distinguishing the status of disease-specific survival (DSS), overall survival (OS), and progression-free survival (PFS), and might serve as a robust independent prognostic biomarker for DSS and OS in COAD patients (P < 0.05). The enrichment analysis showed focal adhesion as the most possible regulatory pathway by COL1A2. Conclusion: Collectively, COL1A2 functioned as an independent prognostic biomarker and might be a potential therapeutic target in COAD.

AGING survival (OS) rate remains unsatisfactory at only 50%-65% and it is still incurable for advanced or metastatic COAD patients [8]. Therefore, it is imperative to understand the underlying mechanism responsible for COAD and develop a potential biomarker to prolong the survival time of COAD patients.
Collagen is the primary component of the extracellular matrix (ECM) and the most abundant type is collagen type I [9]. Collagen type I as a structural protein is observed in connective tissues such as tendon, bone, and skin [10]. It is a triple helix composed of two chains of collagen type I alpha 1 (COL1A1) and one chain of collagen type I alpha 2 (COL1A2) [10]. The structural integrity and coordinated biosynthesis of these chains are critical for tissue morphogenesis, growth, homeostasis, and repair [11]. Changes in collagen type I synthesis occur in embryogenesis, wound healing, and some pathological conditions, including fibrosis of the kidney, lung, and liver, scleroderma as well as cancers [12]. Moreover, the expression level of the fibrosisrelated protein COL1A2 was obviously upregulated in the colonic tissue with intestinal fibrosis [13,14]. Previous studies have demonstrated the involvement of COL1A2 in the cancer process, which can be both stimulatory and inhibitory. COL1A2 accelerated the development and angiogenesis of melanoma and medulloblastoma [15,16]. On the other hand, COL1A2 suppressed the proliferation, migration, and invasion of bladder cancer cells [17]. Nevertheless, the prognostic role of COL1A2 and the actual significance across the various clinicopathological parameters such as age, gender, weight, histological subtypes, cancer stage, and nodal metastasis status in COAD have not been systemically studied yet.
Herein, multiple bioinformatics tools were adopted and the dataset of well-established cancer data from various demographic and clinicopathological patients was downloaded for comprehensive research of COAD. Firstly, we analyzed the expression and prognostic values of COL1A2 in COAD. Further, the underlying mechanisms of COAD based on COL1A2-related genes were explored by enrichment analysis.

The general profile of COL1A2
The normalized RNA sequencing data based on TCGA Pan-Cancer were downloaded from the UCSC Xena database (https://xenabrowser.net/datapages/) to analyze the difference in COL1A2 mRNA expression levels between various tumors and normal tissues. The data were used to extract the mRNA expression values of COL1A2 and the tumors with less than three samples were deleted. Differential COL1A2 mRNA expression analyses in each tumor type were performed using the Student's t-test. P < 0.05 was considered to be statistically significant. Then, the Human Protein Atlas (HPA) (https://www.proteinatlas.org/) database was utilized to examine the localization of COL1A2 protein in human tumor cells in the "SUBCELL" column. The HPA database aims to map all the human proteins in cells, tissues, and organs using an integration of various omics technologies. As the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer, the Catalogue of Somatic Mutations in Cancer (COSMIC) (https://cancer.sanger.ac.uk/cosmic) was applied for analyzing the different mutation types related to the COL1A2 gene.

Expression analysis of COL1A2 in COAD
The expression data and corresponding clinical information in TCGA-COAD were obtained from the UCSC Xena database. The differential gene expression levels of COL1A2 in COAD and normal tissues were compared using unpaired and paired t-tests. A P-value less than 0.05 was regarded as the threshold of significance. Through the HPA database, we evaluated the protein levels of COL1A2 in COAD and normal tissues by immunohistochemical analysis, which was validated in an additional population with COAD.
Subsequently, the expression and promoter methylation level of the COL1A2 gene in COAD based on clinicopathological parameters such as age, gender, weight, histological subtypes, cancer stage, and nodal metastasis status was assessed using the UALCAN web server (http://ualcan.path.uab.edu/analysis.html), which is an interactive database for analyzing cancer OMICS data.

Prognosis analysis of COL1A2 in COAD
The relationship between COL1A2 expression and disease-free interval (DFI), disease-specific survival (DSS), overall survival (OS), and progression-free survival (PFS) in COAD was examined in the Gene Set Cancer Analysis (GSCA) (http://bioinfo.life.hust.edu. cn/GSCA/#/) which is an integrated database for genomic and immunogenomic gene set cancer analysis. P < 0.05 indicates statistical significance. Besides, the value of COL1A2 in distinguishing the survival status with regard to DFI, DSS, OS, and PFS was determined by the receiver operating characteristic (ROC) curve based on the TCGA-COAD data. The computed area under the curve (AUC) value ranging from 0.5 to 1.0 indicated the discrimination ability from 50-100%. Moreover, Cox regression analysis was performed to evaluate the independent prognostic value of COL1A2 by SPSS software (version 23.0), and P < 0.05 was considered to be statistically significant. Further, the R package "rms" was adopted to construct the nomogram and plot the calibration curves to predict 1-, 3-, and 5-year OS and DSS for COAD patients. The concordance index (C-index) was used to evaluate the predictive accuracy of the nomogram. C-index from 0.50 to 0.70 (lower accuracy), 0.71 to 0.90 (medium accuracy), above 0.90 (high accuracy) [18].

Enrichment analysis
To explore the underlying mechanism of COL1A2 in COAD, the TCGA-COAD samples were divided into high-COL1A2 and low-COL1A2 expression groups based on the median value of the COL1A2 gene. The DEGs between two expression groups were screened using the "limma" package. |Log2 Fold change (FC)| Cutoff >1 and P < 0.05 were set as the criterion of significant differences.
Following this, we conducted Gene Ontology (GO) annotations and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway of the significant DEGs using the R package "cluster profiler". A P-value less than 0.05 was considered statistically significant. Gene Set Enrichment Analysis (GSEA) was conducted to further investigate the signaling pathways regulated by COL1A2-related genes in COAD. The gene set permutations were performed 1000 times. Clusters with a false discovery rate (FDR) q-value < 0.05 and P-value < 0.05 were identified as significant.
Subsequently, we defined the enrichment level of a pathway in COAD samples as the single-sample geneset enrichment analysis (ssGSEA) score [19]. The gene set represents the collection of all marker genes of a pathway. The association of the COL1A2 mRNA expression with the enrichment levels (ssGSEA scores) of the pathway was evaluated using Pearson's correlation test.

Data sharing statement
The datasets used and/or analyzed during the current study are available from the corresponding authors upon reasonable request.

Patient characteristics
The TCGA-COAD dataset contains 293 samples with complete survival and gene expression data. As shown in Table 1

The general profile of COL1A2
We firstly analyzed the expression levels of COL1A2 among various cancer types ( Figure 1A). Subcellular location and immunofluorescence image of COL1A2 expression in human tumor cells were retrieved from the HPA ( Figure 1B, 1C). The COL1A2 gene mutation in pan-cancer was evaluated through the COSMIC database. A total of 1246 samples were recorded for mutations, among which the missense substitution (52.77%) had the highest proportion, followed by synonymous substitution (15.81%), and other types (8.95%) ( Figure 1D). Figure 1E presented the breakdown of different substitution mutations, exhibiting the highest type of C > T (31.70%), and the lowest type of T > G (1.39%). Of note, no mutation of the COL1A2 gene occurred in COAD.

Expression analysis of COL1A2
Based on the TCGA-COAD data, the mRNA expression levels of COL1A2 in COAD and normal tissues were compared. Both unpaired and paired t-tests demonstrated that COL1A2 gene expression was remarkably upregulated in the COAD compared with that in normal tissues (all P < 0.001) (Figure 2A, 2B). The HPA database showed the different COL1A2 protein expressions in COAD and normal colon samples ( Figure 2C, 2D). The immunohistochemistry analysis validated the higher protein level of COL1A2 in COAD tissue ( Figure 2E, 2F). We have observed the elevated mRNA level of the COL1A2 in COAD in the earlier section, and then we evaluated the COL1A2 expression in COAD based on various clinicopathological characteristics like age, gender, weight, histological subtypes, cancer stage, and nodal metastasis status ( Figure 3A-3F). Patients with mucinous adenocarcinoma had remarkably higher COL1A2 mRNA expression than those with adenocarcinoma (P < 0.05) ( Figure 3D). It tended to increase the COL1A2 expression at advanced cancer stages (stage 1 < stage 2 < stage 3) and COL1A2 expression increased along with the nodal metastasis status (N0 < N1 < N2) ( Figure 3E, 3F). DNA methylation is closely related to the development of cancer in the human body [20]. From our data, the promoter methylation of COL1A2 was significantly related to age, cancer stage, and nodal metastasis status, but had no notable relation with gender, weight, and histological subtypes ( Figure 4A-4F). Intriguingly, the promoter methylation level of COL1A2 decreased with the development of cancer stages (stage 4 < stage 3 < stage 2 < stage 1) and nodal metastasis status (N2 < N1 < N0) ( Figure 4E, 4F). The above findings suggested that the promoter methylation of COL1A2 might have a negative relation with its mRNA expression in COAD.    To further probe into the independent prognostic value of COL1A2 in predicting DSS, OS, and PFS of COAD patients, Cox regression analysis was conducted using the TCGA-COAD data. The results showed that age, N1, N2, stage IV, and COL1A2 had a significant relationship with DSS in COAD patients (all P < 0.05), but other clinical factors were not associated with DSS in the univariate analysis. When these variables were integrated into multivariate Cox regression analysis, age and COL1A2 were independent prognostic factors (all P < 0.05) ( Table 2). In addition, N2, stage III, stage IV, and COL1A2 were closely connected with OS (all P < 0.05), while only COL1A2 (P < 0.05) was still significantly associated with OS in the multivariate analysis (Table 3). However, COL1A2 could not independently predict the PFS of COAD patients (P < 0.05) ( Table 4). These results indicated that high COL1A2 expression was an independent prognostic factor for predicting worse DSS and OS in COAD.
To better predict the prognosis of COAD patients, two nomogram models on basis of the multivariate Cox regression analysis results were constructed and calibration curves were drawn to evaluate their efficiency. Two statistically significant prognostic factors, age, and COL1A2 were included in the nomogram model based on DSS, which had a C-index of 0.765 ( Figure 6A). N status and COL1A2 were enrolled in the construction of the nomogram model based on OS, which had a C-index of 0.734 ( Figure  6B). The calibration curves for 1-, 3-, and 5-year DSS and OS of COAD patients were presented in Figure 6C, and Figure 6D, respectively. Importantly, the calibration curve showed good agreement between prediction and observation in the 5-year survival probability. These findings revealed that the nomogram models had high efficiency in anticipating DSS and OS of COAD patients.

Enrichment analysis
To elucidate the pathological function of COL1A2, the DEGs between the high-and low-COL1A2 expression groups were firstly identified as shown in the volcano plot ( Figure 7A) and heat map ( Figure 7B). According to the selection criterion, the significant DEGs were used for GO and KEGG enrichment analyses. GO annotations predicted the functional effect of the target genes in three aspects: biological process (BP), cellular component (CC), and molecular function (MF). We found that cellular response to an organic substance, extracellular matrix organization, extracellular matrix, vesicle, signaling receptor binding, and growth factor binding were significantly regulated in COAD ( Figure  7C). The KEGG enrichment analysis showed that the pathways related to the functions of COL1A2 interactive genes were oxidative phosphorylation, phagosome, focal adhesion, fatty acid metabolism ( Figure 7D).
Following this, we analyzed the enrichment levels of the focal adhesion pathway using ssGSEA. The COL1A2 gene expression level had a significant positive correlation with the enrichment levels (ssGSEA scores) of the focal adhesion pathway (P < 0.001) AGING ( Figure 9). Therefore, the COL1A2 might affect the development of COAD via positive regulation of the focal adhesion pathway.

DISCUSSION
COAD is a heterogeneous disease always occurring in the elderly [21]. Besides, COAD is highly invasive and is easily spread to other organs with the liver as the most common site of metastasis. Although significant achievements have been made in the treatment strategies, the 5-year survival rate is still low and the 5year OS for metastatic patients is approximately 13% [22]. Hence, searching for novel molecular biomarkers for carcinogenesis and pathogenesis of COAD was essential for the COAD diagnosis and treatment. COL1A2 encodes the alpha 2 chain of collagen type I which is the main ECM component of bone and skin [23]. It has been demonstrated that the abnormal COL1A2 mRNA expression led to a worse prognosis of    gastric cancer patients and enhances the proliferation of prostate cancer cell [24,25]. In addition, COL1A2 overexpression was a significant risk factor for intracranial aneurysm susceptibility [26]. Whereas, COL1A2 decreased the proliferative activity of liver epithelial cells [27]. These results suggested that COL1A2 was a double-edged sword in the biological process. This study evaluated the utility of COL1A2 as a potential biomarker in COAD and we explored the underlying mechanisms by which COL1A2 affected COAD through enrichment analysis.
Firstly, we had a general overview of COL1A2 from the following aspects: the differential mRNA expression of COL1A2 in various tumors, the location of COL1A2 protein in human tumor cells, and the different mutation types related to the COL1A2 gene. We found that COL1A2 was differentially expressed in most of the tumors, and the COL1A2 protein was located in the endoplasmic reticulum. The missense substitution of COL1A2 had the highest proportion of the mutation type. Then, we explored the COL1A2 expression in COAD and observed significant upregulation of COL1A2 transcriptional levels in COAD, and tended to increase along with stages and the nodal metastasis status. Epigenetic changes in genes are thought to be the leading cause of neoplastic transformation [28], and hence we analyzed the promoter methylation level of Next, we explored the relationship between COL1A2 mRNA expression and the clinical outcomes of COAD patients. Upon the survival analyses, COL1A2 overexpression contributed to a lower survival probability of DSS, OS, and PFS. Meanwhile, COL1A2 exhibited satisfactory performance in predicting the DSS, OS, and PFS status of patients with COAD. Further, Cox regression analyses showed that the elevated mRNA level of COL1A2 was an independent predictor for worse DSS and OS, revealing COL1A2 as a potential prognostic biomarker for COAD. Changes in COL1A2 in the ECM microenvironment are often accompanied by stromal invasion and the occurrence of epithelial neoplasms [29]. It is also involved in the induction of epithelial-mesenchymal transformation (EMT) in breast and lung cancer cells [30,31]. Importantly, COAD arises from the epithelial cells of the colon or rectum, whose functions such as cellular differentiation, migration, and invasion are regulated by the physical interactions with ECM [29,32]. Thus, it is reasonable to speculate that COL1A2 might promote tumorigenesis, invasion, and metastasis of COAD, which might be an explanation for the high expression of COL1A2 leading to a worse prognosis for COAD patients. Additionally, COL1A2 was secreted by stromal fibroblasts, while tumor-associated fibroblasts stimulate the occurrence of COAD in part by inducing inflammation in the tumor microenvironment [33][34][35]. The cancer cells and stromal cells in the COAD microenvironment generate high levels of proinflammatory eicosanoids, which are robust lipid mediators implicated in cancer cell angiogenesis, proliferation, and metastasis by the following mechanisms: direct activation of receptors on cancer cells; enforcing the epithelial cells to release angiogenic factors, pro-inflammatory mediators, and tumor growth factors [36]. This might be another explanation for the worse clinical outcomes of COAD patients with COL1A2 overexpression. To reveal the pathological function of COL1A2 in COAD, enrichment analyses were performed. GO enrichment of the COL1A-related DEGs showed the cellular response to an organic substance, extracellular matrix organization, extracellular matrix, vesicle, signaling receptor binding, and growth factor binding. The KEGG pathway showed the enrichment signaling pathways of oxidative phosphorylation, phagosome, focal adhesion, and fatty acid metabolism, among which focal adhesion shared the same pathway with GSEA results. Interestingly, COL1A2 expression was positively correlated with the enrichment levels of the focal adhesion pathway. Focal adhesion plays a vital role in cancer metastasis, invasion, and drug resistance [37]. Ning et al. reported that the focal adhesion signaling pathway was crucial in the EMT process in pancreatic cancer [38]. The activation of the focal adhesion pathway alters cancer cell glycolysis and induces cisplatin resistance in cervical and breast cancer [39]. Similarly, the authors inferred that COL1A2 might promote metastasis, and induce drug resistance, thereby enhancing the progression of COAD via regulation of the focal adhesion pathway. For further mechanistic explorations, Pekow et al. reported that COL1A2 was predicted to be the target of miR-4728-3p. In the HCT116 human colon cancer cells, miR-4728-3p regulated the COL1A2 gene expression by targeting the wild type 3′untranslated region (3′UTR) of the COL1A2 gene in the focal adhesion pathway. Besides, they revealed that miR-4728-3p decreased focal adhesion intensity and area [40]. Taken together, the possible mechanism of COL1A2 involvement in COAD was shown in Figure 10, which should be verified in the future. Additionally, other molecules could regulate COL1A2 mRNA expression. Acute ulcerative colitis (UC)-like colitis was induced in Chga-C57BL/ 6-deficient (Chga −/− ) and wild-type (Chga +/+ ) mice. COL1A2 mRNA expression was decreased in Chga −/− mice model [41]. It has also been demonstrated that patients with UC are at high risk to develop colonic neoplasia [42]. Therefore, the authors speculated that targeting CHGA might decrease the COL1A2 mRNA expression, inhibiting the progression of COAD.
In summary, our study comprehensively enlightens the role of COL1A2 in the initiation and development of COAD. COL1A2 might serve as a valuable prognostic biomarker and a potential therapeutic target for COAD. Moreover, COL1A2 might promote the progression of COAD by positive regulation of the focal adhesion pathway.

AUTHOR CONTRIBUTIONS
Conception and design: Jian-Jiang Jin and Ting Zheng; Collection and assembly of data: Xiao-Xia Xu and Lei Zheng; Data analysis and interpretation: Fang-Yuan Li, Xing-Xing Li and Li Zhou; Manuscript writing: all authors; Final approval of manuscript: all authors.