Profiling and integrated analysis of transcriptional addiction gene expression and prognostic value in hepatocellular carcinoma

Transcriptional dysregulation caused by genomic and epigenetic alterations in cancer is called “transcriptional addiction”. Transcriptional addiction is an important pathogenic factor of tumor malignancy. Hepatocellular carcinoma (HCC) genomes are highly heterogeneous, with many dysregulated genes. Our study analyzed the possibility that transcriptional addiction-related genes play a significant role in HCC. All data sources for conducting this study were public cancer databases and tissue microarrays. We identified 38 transcriptional addiction genes, and most were differentially expressed genes. Among patients of different groups, there were significant differences in overall survival rates. Both nomogram and risk score were independent predictors of HCC outcomes. Transcriptional addiction gene expression characteristics determine the sensitivity of patients to immunotherapy, cisplatin, and sorafenib. Besides, HDAC2 was identified as an oncogene, and its expression was correlated with patient survival time. Our study conclusively demonstrated that transcriptional addiction is crucial in HCC. We provided biomarkers for predicting the prognosis of HCC patients, which can more precisely guide the patient’s treatment.


INTRODUCTION
Primary liver cancer is the sixth most commonly diagnosed cancer worldwide [1]. China is a high-risk area for liver cancer, with the second-highest incidence rate among all malignant cancers [2]. Hepatocellular carcinoma (HCC) accounts for 75-85% of all liver cancers. There are several drug strategies for HCC, including chemotherapy, targeted therapy, and immunotherapy, but the drug response varies significantly among cancer patients [3,4]. Considering the tumor heterogeneity and multiple risk factors, biomarkers for HCC should be individualized and diversified.
The dysregulation of transcription is crucial for tumorigenesis and development [5]. Transcriptional dysregulation caused by genomic and epigenetic alterations is called "transcriptional addiction" [6]. Transcriptional addiction in cancer cells can be overly dependent on various components of the transcriptional process [7]. Therefore, small molecule inhibitors targeting transcriptional processes may be more effective against these cancer cells [5]. Targeting transcriptional cyclin-dependent kinases and small molecule proteins is a promising treatment strategy that could contribute to the rational design of cancer combination therapies.
The oncogenic effect of transcriptional addiction has been demonstrated in several cancers. The efficacy of CDK7 inhibitors depends on the selective dependency of tumors on CDK7 kinase activity. For instance, CDK7 inhibitors could selectively target tumor cells and suppress the growth of breast tumors [7]. The repression AGING of CDK7 may promote marked transcriptional dysregulation in ovarian cancer, which is not recognized as an actionable vulnerability [8]. Identifying transcriptional addiction genes may aid in developing HCC biomarkers and therapeutic strategies.
The present study systematically analyzed the expression levels and prognostic value of transcriptional addiction genes in HCC. Clustered subgroups and risk signatures provided predictable biomarkers of patients' survival. Among these genes, HDAC2-an oncogenewas associated with patient survival. These findings substantiate the great potential of transcriptional addiction in HCC.

Identification of differentially expressed genes
We identified a series of genes associated with cancer transcriptional addiction (n=38) through a focused literature review   (Table 1). We refer to these genes as transcriptional addiction genes.
We identified DEGs in HCC from three platforms. In the TCGA-LIHC set, 33 genes (86.8%) were differentially expressed in tumor tissues and paraneoplastic tissues (p < 0.05) ( Figure 1A). We identified 33 and 26 DEGs in the ICGC-JP set and GEO-GSE14520 set, respectively ( Figure 1B, 1C). The analysis results were consistent across the three platforms, with most genes overlapping ( Figure 1D).

Transcription factor co-expression network and gene function analysis
There was a strong correlation (|R 2 | > 0.5 and p < 0.05) between the expression of 14 DEGs and tumor transcription factors (Figure 2A). The top three DEGs were KMT2A, MED1, and HDAC2. The expression of these 14 DEGs in cancer tissues was higher than that in paracancerous tissues. There were 314 transcription factors screened using the correlation analysis. We plotted the co-expression network using the Cytoscape v3.8.1 software ( Figure 2B). Functional enrichment analysis was performed based on 33 DEGs from the TCGA-LIHC set. The GO dataset enriched the regulation of stem cell differentiation, transcription initiation from RNA polymerase II promoter, and DNA−templated transcription ( Figure 2C). As for the KEGG dataset, viral carcinogenesis, transcriptional misregulation in cancer, and microRNAs in cancer were enriched ( Figure 2D). Transcription-related functions were enriched in both datasets.

Clustered subgroups and nomograms for predicting patient survival in the TCGA-LIHC set
For the best clustering effect, we chose k = 2 for the subsequent analyses ( Figure 3A, 3B). The different clustering subgroups revealed a two-way distribution through PCA analyses ( Figure 3C). According to the survival analysis, the patients in subgroup 1 exhibited higher one-, three-, and five-year overall survival rates than those in subgroup 2 (log-rank test p < 0.01) ( Figure 3D).
The independent variables used to construct the nomogram include age, gender, grade, stage, and cluster ( Figure 4A). The calibration curve and C-index confirm the high predictive accuracy and specificity of the nomogram ( Figure 4B, 4C). In univariate/multivariate Cox regression analyses, the nomogram was an independent predictor of HCC outcome (HR > 1 and p < 0.001) ( Figure 4D, 4E).
In all sets, there was a greater median overall survival time among the patients in the low-risk group compared to those in the high-risk group (p < 0.01) ( Figure 5A-5C). In the TCGA-LIHC and ICGC-JP cohorts, the area under the curve (AUC) for the risk score was higher than that for the other clinical factors ( Figure 5D, 5E). In the GEO cohort, the AUC for the risk score demonstrated slightly lower performance ( Figure 5F). The additional GEO verification set (GSE20140) is provided in the Supplementary Figure 1. This difference could be related to the patient's geographical area, tumor stage, and other factors. The results of the univariate/multivariate Cox regression analyses were consistent across multiple cohorts, and they all supported that risk score was an independent risk predictor of patient survival (HR > 1 and p < 0.05) ( Figure 6).

Correlation of risk signature with therapeutic drugs and immune status
There were differences in TIDE and exclusion scores between risk groups (p < 0.001) ( Figure 7A, 7B). Thus, high-risk patients were more likely to benefit from immunotherapy. Similarly, chemotherapy and targeted therapy are more effective in the high-risk group, considering the lower IC50 values in this group ( Figure  7C, 7D). Both DNA and RNA stemness scores were positively correlated with the risk score (p < 0.001) ( Figure 7E, 7F). This result could provide a reference for targeted cancer stem cell therapy. Finally, we identified several immune cell contents and immune function scores associated with risk signature, including T cells CD8, macrophages M0, cytolytic activity, MHC class I, type I IFN response, and type II IFN response ( Figure 7G, 7H).

HDAC2 expression and clinical value in HCC
By bioinformatics mining, HDAC2 expression was higher in the cancer tissues than that in the adjacent tissues in the TCGA-LIHC and ICGC-JP sets (p < 0.001) ( Figure 8A, 8B). In the HPA database, immunohistochemistry staining also identified higher levels of HDAC2 protein in HCC tissues compared with those in normal liver tissues ( Figure 8C). Tissue microarrays likewise demonstrated high levels of HDAC2 protein in tumor tissues compared with those of paracancerous tissues and distal tissues ( Figure 8D).
In the TCGA-LIHC and ICGC-JP sets, the median HDAC2 expression value was used for grouping patients. The patients with high HDAC2 expression had a shorter median overall survival time than low expression patients (p < 0.001) ( Figure 9A, 9B). The HDCA2 protein level was scored for each sample of the tissue microarrays [40,41] (Figure 9C). Consistent results were observed across all survival analyses. The patients with higher ICH scores had shorter survival times ( Figure 9D, 9E). HDAC2-an oncogene-may affect the survival of HCC patients. Furthermore, HDAC2 was up-regulated in HCC recurrent and metastatic tissues ( Figure 10A, 10B). HDAC2 scores also differed in samples with different tumor sizes and stages ( Figure 10C, 10D). HDAC2 score could be an independent prognostic factor for HCC patients ( Figure 10E, 10F).

DISCUSSION
The mortality rate for liver cancer is the fourth highest among all cancers [1]. Because of the lack of early-stage symptoms, most patients do not benefit from surgery at the initial diagnosis [4,42]. The conventional treatment for HCC is ineffective, and many exploratory clinical trials have failed to meet their endpoints [43]. The development of HCC is a complex process involving multiple gene regulatory networks at the molecular level. The conventional early warning markers primarily focus on tumor volume and clinical symptoms, with insufficient sensitivity and specificity. In the precision medicine era, studies on genetic characteristics may provide us with new directions. A growing number of   [44], autophagy [45], and ferroptosis-related gene signature [46]. The present study identified novel transcriptional addiction gene-related signatures and nomograms. The transcriptional addiction gene signature demonstrated better sensitivity and specificity compared with conventional clinical factors when validated across multiple platforms.    The HCC genome is highly heterogeneous, with transcriptional dysregulation of many genes. However, transcriptional addiction has received little attention in the field of liver cancer. Francesca et al. [7] found that the oncogenic properties of BRD4 were dependent on the YAP/TAZ transcriptional response. BET inhibitors could limit the liver growth caused by YAP transcription. There are also several other transcriptional addiction genes. For example, CDK7 is a famous transcriptional addictionrelated gene that affects the progression, metastasis, and prognosis of many cancers [6,16]. SOX2 transcription is associated with cell growth in lung and esophageal cancers [26,31]. Transcriptional addiction may be a vital physiological and pathological mechanism in various processes, including tumorigenesis, development, and metastasis.
Our study examined the expression characteristics of transcriptional addiction genes in HCC. We identified several DEGs. The functional role of most genes has not been a concern in HCC. For instance, high expression of DOT1L may be associated with poor prognosis in HCC patients by mining public cancer databases. As a transcriptional addiction gene, DOT1L-mediated transcriptional regulatory mechanisms were involved in developing and maintaining MLL leukemia [13]. Yu et al. [47] reported that TRPS1 could facilitate the invasion and proliferation of HCC cells. Overexpression of TRPS1 was associated with tumor size and stage.
However, the specific mechanism and prognostic value of TRPS1 in HCC are still unclear. We compiled a list of the transcriptional addiction-related genes to serve as a reference for subsequent studies.
It is common for patients with the same tumor stage and receiving the same treatment to have different outcomes.
Targeting key transcriptional addiction genes may facilitate the development of biomarkers and therapeutic strategies. Wei et al. [48] reported that ZFP64 was up-regulated in tumor tissues of immunotherapy-insensitive HCC patients. Inhibiting PKCα/ZFP64/CSF1 axis could overcome anti-PD1 resistance in HCC. ETV1 may facilitate the invasion and metastasis of HCC by up-regulating metastasisrelated genes; PTK2 and c-MET inhibitors could significantly inhibit this process [49]. Our study found that different groups of patients responded differently to cisplatin, sorafenib, and immunotherapy, which could guide drug selection for patients.
In combination with public cancer databases and tissue microarrays, this study investigates the role of HDAC2 in HCC. Aberrant expression and mutations of HDACs in cancer have been widely reported. HDAC10 regulates the stemness characteristics of lung cancer cells [50]. HDAC10 also serves as a new therapeutic target in ovarian and gastric cancers [51,52]. Some selective HDAC6 inhibitors, such as ricolinostat and ACY-1215, have been applied in clinical trials [53]. As for HDAC2, the related studies mainly focus on colorectal cancer (CRC). Tang et al. [54] found that transcriptional regulation associated with the p300/YY1/miR-500a-5p/HDAC2 signaling axis was a critical step that affects the proliferation of CRC cells. Cui et al. [55] reported that HDAC2 promoted CRC cell proliferation by regulating GLI2 expression. HDAC2 may also facilitate multiple biological behaviors in hepatocellular carcinoma. Jin et al. [56] found that HDAC2 inhibition could enhance HCC radiosensitivity. Nam et al. [57] reported that HDAC2 was involved in HCC progression through feedback control of mTOR and AKT. HDAC2 may also affect the M2 macrophage migration and immune escape in HCC [58]. Our study demonstrated that high expression of HDAC2 could contribute to poor patient outcomes. HDAC2-an oncogene-contributes to metastatic and recurrent HCC.

CONCLUSIONS
Cancer develops and metastasizes via transcriptional addiction, which is common and critical. As demonstrated in our study, transcriptional addiction genes were implicated in HCC. Transcriptional addiction-related nomograms and signatures could provide significant predictions of survival time for patients. Gene expression characteristics determined the sensitivity of patients to therapeutic agents. HDAC2 was highly expressed in tumor tissues, and its level was associated with patients' survival. Multiple databases confirmed our findings. Our study identified new biomarkers that can be used to predict survival and select suitable therapeutic drugs for HCC patients.

Data sources
This study's data were obtained from three publicly available databases and three tissue microarrays. We obtained HCC RNA-seq and clinical information from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/), International Cancer Genome Consortium (ICGC) (https://icgc.org/), and Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) databases. The TCGA-LIHC set, ICGC-JP set, and GEO-GSE14520 set included 365, 210, and 221 HCC cases, respectively. Specific clinical information is provided in the Supplementary Table 1. The protein levels of transcriptional addiction genes in normal and tumor tissues were analyzed using the Human Protein Atlas (HPA) database. RNA-seq profiles were normalized using the "limma" and "sva" packages in R v4.2.1 software (https://www.r-project.org/). Tissue microarray 1 (Outdo Biotech, Shanghai) contains tumor tissue, paracancerous tissue, and distal tissue from three HCC patients. The other tissue microarrays (Servicebio, Wuhan) contain tumor tissues from 152 HCC patients and their corresponding survival data. Tissue microarrays were used to verify the correlation between HDAC2 expression and patient survival.

Differentially expressed genes in different sets
The differentially expressed genes (DEGs) were identified by comparing transcriptional addiction gene expression in tumor and paracancerous tissues. The cutoff was set as p < 0.05. We visualized the overlapping genes from different datasets using Venn plots (https://bioinformatics.psb.ugent.be/webtools/Venn/).

Construction of cancer transcription factor coexpression network and gene function analysis in the TCGA-LIHC set
Cistrome Cancer (http://cistrome.org/) provides a list of cancer transcription factors. We analyzed the correlation between 318 cancer transcription factors and DEGs in the TCGA-LIHC set. The threshold value was p < 0.05 and |R 2 | > 0.5. The co-expression network of these genes was visualized using Cytoscape v3.8.1. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) suggest a range of functional and pathway sets in which DEGs may influence the biological behavior of HCC. This process was performed using the "enrichplot" R package.

Construction and validation of clustered subgroups and nomogram in the TCGA-LIHC set
Gene clustering was performed using the "ConsensusClusterPlus" package according to the expression of DEGs in the TCGA-LIHC set. The distribution among those cases was mapped using principal component analysis (PCA). Survival analysis verifies whether there were differences in survival rates between subgroups of patients. Further, clinical data and clustered subgroups were integrated to construct a nomogram using the "nomogram" package. The calibration curve and concordance index (C-index) were used to verify the specificity and predictive accuracy of the nomogram. The nomogram combined with clinicopathological characteristics was included for univariate and multivariate regression analyses.

Construction and validation of transcriptional addiction gene signature
With the TCGA-LIHC set as the training set, we developed a risk signature using multivariate Cox proportional regression. The ICGC-JP and GEO sets were defined as test sets. All cases were divided into high-risk and low-risk groups based on the median risk score of the training set. Survival analysis, receiver operating characteristic (ROC) curve, and univariate/ multivariate regression were used to analyze the prognostic value of risk score in HCC patients.
The tumor immune dysfunction and exclusion (TIDE) score (http://tide.dfci.harvard.edu/) reflects the sample's sensitivity to immunotherapy. We analyzed the sensitivity of different risk samples to immunotherapy, cisplatin, and sorafenib treatment. We evaluated the correlation between risk score and stem cell content by calculating DNA and RNA stemness scores. Finally, we evaluated the immune cell contents and immune function scores of different samples through the CIBERSORT algorithm (https://cibersortx.stanford.edu/).

Tissue microarrays and immunohistochemistry detection
We extracted the HDAC2 expression profiles of all cases from the TCGA and ICGC datasets and analyzed the distribution of HDAC2 in tumor tissues and paracancerous tissues. Tissue microarrays and the HPA database were used to verify HDAC2 protein levels in various types of tissues. We performed immunohistochemistry (IHC) of HDAC2 on the tissue microarrays. We incubated the tissue microarrays with the primary antibody against HDAC2 (Servicebio, GB11371, Wuhan), followed by the secondary antibody (Servicebio, GB23303, Wuhan) and 3,3'diaminobenzidine (DAB) IHC kit (DAKO, K5007, Denmark). The intensity of HDAC2 staining was scored as 0, 1, 2, and 3 for no, low, medium, and high staining, respectively. We pooled these data and analyzed the correlation between HDAC2 protein levels and survival time of HCC patients in TCGA, ICGC, and tissue microarray sets. Univariate and multivariate regression analyses were also performed.

Statistical analysis
Expression or score differences between groups were analyzed using the Wilcoxon rank-sum test. Spearman's rank correlation coefficient was used to determine the correlation between risk and stemness scores. For survival analysis, the Kaplan-Meier and log-rank tests were used. The independent prognostic value of the nomogram, risk signature, and HDAC2 score was performed with univariate/multivariate regression analyses. All statistical analysis procedures were performed using R v4.2.1 software.

Consent for publication
All authors have read and approved the content and agree to submit for consideration for publication in the journal.

AUTHOR CONTRIBUTIONS
Conceived and designed the study: GL, TSC. Collected the literature: XWD, HW. Wrote the manuscript: GL, XWD, JX, YFZ. Revised the manuscript: XWD, GL, JX, TSC, HW. All authors read and approved the final manuscript.