Seven oxidative stress-related genes predict the prognosis of hepatocellular carcinoma

Predicting the prognosis of hepatocellular carcinoma (HCC) is a major medical challenge and of guiding significance for treatment. This study explored the actual relevance of RNA expression in predicting HCC prognosis. Cox's multiple regression was used to establish a risk score staging classification and to predict the HCC patients’ prognosis on the basis of data in the Cancer Genome Atlas (TCGA). We screened seven gene biomarkers related to the prognosis of HCC from the perspective of oxidative stress, including Alpha-Enolase 1(ENO1), N-myc downstream-regulated gene 1 (NDRG1), nucleophosmin (NPM1), metallothionein-3, H2A histone family member X, Thioredoxin reductase 1 (TXNRD1) and interleukin 33 (IL-33). Among them we measured the expression of ENO1, NGDP1, NPM1, TXNRD1 and IL-33 to investigate the reliability of the multi-index prediction. The first four markers’ expressions increased successively in the paracellular tissues, the hepatocellular carcinoma samples (from patients with better prognosis) and the hepatocellular carcinoma samples (from patients with poor prognosis), while IL-33 showed the opposite trend. The seven genes increased the sensitivity and specificity of the predictive model, resulting in a significant increase in overall confidence. Compared with the patients with higher-risk scores, the survival rates with lower-risk scores are significantly increased. Risk score is more accurate in predicting the prognosis HCC patients than other clinical factors. In conclusion, we use the Cox regression model to identify seven oxidative stress-related genes, investigate the reliability of the multi-index prediction, and develop a risk staging model for predicting the prognosis of HCC patients and guiding precise treatment strategy.


INTRODUCTION
Liver cancer seriously endangers people's health.It is conservatively estimated that 1 million people will be affected by liver cancer each year [1].Hepatocellular carcinoma (HCC) is the predominant type of liver cancer and is associated with persistent inflammation and fibrosis [2].Post-treatment recurrence is noted up to 60% of patients following partial liver resection, and about 15% of liver transplant patients [3].In previous studies, Macrovascular invasion, Lymphovascular invasion and poor differentiation have been used to speculate the clinical outcomes of patients [4].However, liver cancer is an extraordinarily heterogeneous malignant disease, we need to search for novel biomarkers to provide a more precise prognosis for doctors.
HCC is a heterogeneous malignancy with dismal prognosis.TERT, TP53 and CTNNB1 mutations are the most common mutations that affect the prognosis of HCC.However, these mutations are not fully present in most patients, and their clinical use is not yet widely accepted.To date, specific biomarkers are still needed to improve the prognosis of HCC.Compared with a single biomarker, polygenic markers can improve the specificity of the prognosis of tumor patients.
In the past few decades, considerable progress has been made in understanding the biomarkers and molecular characteristics of HCC [5].The most studied prognostic parameters were tumor number, size, α-Feto protein (AFP) level, cell differentiation, MVI and ES grades, presence of satellite nodule, and pTNM stage [6].However, existing prognostic staging systems still have many limitations in guiding treatment and prognosis.Recently, there has been increasing evidence that oxidative stress plays a crucial role in the development of liver cancer [7].The occurrence of liver cancer is closely related to oxidative stress and inflammation [8].Excessive and long-term inflammation and oxidative stress can cause irreversible damage and may lead to cirrhosis and carcinogenic transformation, while liver cancer is the inevitable result of the development of liver cirrhosis.Therefore, understanding the molecular mechanism of oxidative stress in hepatocellular carcinoma and its effect on prognosis will help to provide new therapeutic strategies for the treatment of HCC.Shen et al. investigated that Facilitates Chromatin Transcription complex was remarkably upregulated in HCC, which mediated oxidative stress to promote HCC progression [9].Similarly, as a biomarker for oxidative stress, high expression of 8-Hydroxy-2-deoxyguanosine is associated with poor survival in HCC patients [10].Recent findings have reported that oxidative stress promotes thyroid cancer development by upregulating protein Tyrosine Phosphatases expression [11].Furthermore, the degradation of mitogen-activated protein kinase phosphatase-3 mediated by oxidative stress contributes to tumorigenicity of human ovarian cancer cells [12].
The above studies support that those biomarkers are useful tools for oxidative stress, and they do have potential in determining the stage of tumor progression.However, compared with single biomarker, polygenic markers can improve the specificity of tumor patient prognosis [13].Paik et al. found that a 21-gene recurrence score predicts response to chemotherapy in breast cancer [14], indicating the superiority of polygenic markers in predicting prognosis.
Here, we used Cox multiple regression models to evaluate gene expression in HCC cases from The Cancer Genome Atlas (TCGA; http://www.tcga.org/).Compared with the patients with high-risk scores, the survival rates of the patients with low-risk scores are significantly higher.The findings were further validated with training and complete test datasets.Furthermore, we found that risk scores can be independent of other clinical variables, has a greater advantage in judging the prognosis of HCC.Combining the risk score with other clinical factors to form a nomogram to predict the 5year and 10-year survival rates of HCC patients is more accurate and convenient.

Differential oxidative stress-related genes
Analysis of differential gene expression between liver cancer and normal tissues suggested that a total of 254 genes were dysregulated, of which 145 were upregulated and 109 were down-regulated (Supplementary Table 1).We visualised the differences and created a volcano map and heat map, in volcano map the downregulated genes were expressed in blue, the upregulated genes in yellow and the genes that did not differ between groups were represented in grey (Figure 1A, 1B).LASSO regression results demonstrate that Lambda obtained minimum value (Figure 1C), model enrolled 11 important genes (Figure 1D).

Seven gene signatures of oxidative stress-related genes can predict patient prognosis
Multifactorial Cox regression of 11 prognostic genes resulted in seven independent risk genes affecting the prognosis of patients with hepatocellular carcinoma, namely ENO1, NDRG1, NPM1, H2AX, IL33, MT3 and TXNRD1 (Table 1).Based on the expression of the genes and the model coefficients, we constructed a signature model for each of the 7 genes and calculated the risk score for each patient based on the model (Figure 2A).Survival analysis revealed that patients in the high-risk group had a worse prognosis, compared to patients in the low-risk group, who had a significantly longer survival time (Figure 2B).In addition, the area under the ROC curve was equal to 0.841, suggesting that the model was highly discriminatory in terms of patient stratification (Figure 2C).This conclusion is also supported by the data from this validation set (Figure 3A-3C).

Seven oxidative stress-related genes detect earlystage liver cancer
The oxidative stress-related gene signature of the seven genes was derived from a differential analysis of cancer and paracancer, so we hypothesized that it could not only predict the prognosis of patients but also have the potential to detect early-stage liver cancer, and therefore we further analyzed the model genes in the signature using the BP network and visualized the network (Figure 4A and Supplementary Table 3).The results suggested that these seven genes could be used as a panel for early detection of liver cancer with high early detection efficacy (ROC=0.955)(Figure 4B).ENO1, NDGR1, NPM1 and TXNRD1 are highly expressed in HCC tissues from patients with poor prognosis PPI network identifies ENO1 as hub gene of signature (Figure 5A).We examined the expression of ENO1 gene in hepatocellular carcinoma tissues at both transcriptomic levels and showed that ENO1 gene was significantly more highly expressed in both paired and (B) about differential oxidative stress-related genes was created, the down-regulated genes were expressed in blue, the up-regulated genes in yellow and the genes that did not differ between groups were represented in grey.Lambda obtained minimum value (C), model enrolled unpaired hepatocellular carcinoma samples compared to paracellular tissues (Figure 5B, 5C).To further verify the effectiveness of ENO1, we performed the validation studies.As shown in Figure 5D, immunohistochemistry experiment was used to measure the expression of ENO1 in paracellular tissues and hepatocellular carcinoma sample.Compared with the paracellular tissues (Normal * group), the expression of ENO1 was increased in the hepatocellular carcinoma samples (Tumor * group).Furthermore, we found that ENO1 expression in the hepatocellular carcinoma samples from patients with poor prognosis (Tumor ** group) was higher than the samples from patients with better prognosis (Tumor * group).Additionally, we measured the expression of another four targets (NDRG1, NPM1, TXNRD1 and IL-33) to investigate the reliability of the multi-index prediction.As shown in Figure 5D, compared with the paracellular tissues, the expression of NDRG1, NPM1, TXNRD1 were both increased in the hepatocellular carcinoma samples.We also found that the expression of NDRG1, NPM1, TXNRD1 in the hepatocellular carcinoma samples from patients with poor prognosis were higher than the samples from patients with better prognosis.In contrast, the expression of IL-33 was decreased in the hepatocellular carcinoma samples.We also found that IL-33 expression in the hepatocellular carcinoma samples from patients with poor prognosis were lower than the samples from patients with better prognosis.Meanwhile, the samples were observed using haematoxylin and eosin (H&E) staining.As shown in Figure 5D

High expression of hub gene ENO1 suggests disease progression and correlates with tumor immune infiltration
The results of the clinical correlation analysis of ENO1 gene expression suggest that ENO1 is associated with tumor size and stage.The gene was less expressed in stage T1 compared to stage T2 and T3 (Figure 6A).The expression of ENO1 increased significantly with higher tumor stage in stage (Figure 6B), and in the pathological Grade stage, the expression of ENO1 was also increased in other Grade grades compared to Grade 1 (Figure 6C).However, unfortunately, our analysis revealed that ENO1 was not associated with lymph node status or distant metastatic status (Figure 6D, 6E).Furthermore, the immune infiltration results suggested that ENO1 correlated with Th2 cells, aDC, NK CD56bright cells, macrophages, pDC, CD8 T cells and Th17 cells in the tumor microenvironment, suggesting that ENO1 may promote liver cancer progression by regulating immune infiltrating cells in the tumor microenvironment (Figure 7).

DISCUSSION
In this study, we screened seven gene biomarkers related to the prognosis of HCC from the perspective of oxidative stress, measured five targets expression (ENO1, NDRG1, NPM1, TXNRD1 and IL-33) of the seven gene biomarkers to investigate the reliability of the multi-index prediction in clinic, increasing the sensitivity and specificity of the predictive model and resulting in a significant increase in overall confidence.
Numerous previous studies have reported biomarkers associated with the prognosis of HCC.Liu et al. showed   AGING that the expression of PGM5 in HCC was significantly lower than that in adjacent tissues [15].Wang et al. found that the expression of PPM1G gene in HCC tissues was lower than that in adjacent tissues [16].However, the above studies were based on the evaluation of single-gene biomarkers, the clinical application was limited and there were also certain clinical biases.Therefore, it is particularly important to develop new polygenic models to predict the survival rate of HCC patients.Studies have reported that oxidative stress-related indicators are highly correlated with the prognosis of HCC patients.Huang et al. indicated that found that the level of glutathione in HCC tissues of patients was significantly higher than that in adjacent normal tissues [17].Xiong et al. shown that PCK gene expression was downregulated and was associated with poor prognosis in HCC patients [18].Given the importance of oxidative stress, we tried to screen multi-gene biomarkers to predict the prognosis of liver cancer from the perspective of oxidative stress.

AGING
In this study, we performed Cox multiple regression analysis on the RNA sequencing data of HCC downloaded from the TCGA database, and identified the genes related to oxidative stress in patients.Among them, 7 genes met the selection criteria.We further analyzed related genes.Survival analysis showed that patients with high-risk scores had significantly shorter OS time compared with patients with low-risk scores.At the same time, risk score was a better predictor of patient survival than other specific medical parameters including age, tumor stage and histological type, suggesting these 7 genes deserve further study.

AGING
poor survival rates to the correct treatment regimen.Among the seven genes identified, expression of ENO1, H2AX, MT3, NDRG1, NPM1 and TXNRD1 was riskassociated, suggesting that the expression levels of these genes are inversely correlated with HCC survival time.
In contrast, the expression level of IL-33 was positively correlated with survival.ENO1, NDRG1 and NPM1 are reportedly associated with cancer as further discussed.We further measured the expression of ENO1, NDRG1, NPM1, TXNRD1 and IL-33 to investigate the reliability of the multi-index prediction.The data indicated that the oxidative stress-related polygenic biomarkers we screened are reliable and clinically significant in predicting prognosis.
As a hub gene of signature identified by PPI network, ENO1 is overexpressed in 70% of human cancers and is associated with poor cancer prognosis, converts 2-phosphoglycerate to phosphoenolpyruvate, plays an important role in the glycolytic pathway and the Warburg effect in cancer cells [19,20].High expression of ENO1 increased the risk score and the likelihood of poor prognosis, act as a biomarker in patients with hepatocellular carcinoma, and may be a favorable candidate for targeted treatment [21].NDRG1, one member of NDR-and an α∕β hydrolasefold region family [22].Both NDRG1 and ENO1 are closely related to glycolysis and carcinogenesis.
Studies have shown that NDRG1 play important roles in tumor invasion and metastasis [23].Compared to healthy controls, NDRG1 expression is upregulated in HCC patients and is associated with poorer prognosis and histological grade [24,25].NPM1, abundant nucleolar proteins that often shuttle between the nucleolus and the nucleoplasm or cytoplasm, is involved in chromatin remodeling, genome stabilization, cell cycle progression, and apoptosis in cancer [26].The study showed that liver cancer patients with lower levels of NPM1 have a better prognosis [27].MT3 protects the body from DNA damage, and H2AX is a biomarker of DNA damage [28,29].As one of the major REDOX regulators, TXNRD1 is associated with tumor aggressiveness and poor prognosis [30].Patients with high expression of IL-33 in their tumors had higher survival rates.These indicate the importance of these 7 indicators in tumor prognosis.
Although we have used multigene biomarker analysis and the characteristics of the 7 genes screened can effectively predict the prognosis of HCC patients, this study still has certain limitations.Future studies are needed to improve the accuracy of the risk scoring model with additional patient cohorts.In addition, a patient's treatment regimen is critical to cancer prognosis, incorporating patient treatment regimen data into these analyses will add value to subsequent results.

CONCLUSIONS
The seven gene signatures established in this study are effective and stable in HCC samples from TCGA.These 7 gene signatures are autonomous factors for the prognosis of HCC patients.These results may help develop more effective prognostic tools and ultimately improve patient outcomes.

Data acquisition and pre-processing
Data from sequenced liver cancer samples were obtained from the public database TCGA, a dataset containing a total of 50 paraneoplastic and 371 cancerous tissues.Clinical information matching the patient samples was likewise downloaded.Patients with less than 30 days of follow-up were also removed based on survival information.Oxidative stress-related genes were obtained from the gene card database and these genes have been confirmed by previous experiments.

Differential gene analysis
We extracted oxidation-related gene expression profiles from transcriptomic data based on information from screened clinical samples and performed differential analysis using the limma package, differential genes were defined as those genes whose gene expression in cancer and paracancer met |LogFC| > 1 while p < 0.05.FC: Fold change.

Univariate Cox regression and LASSO regression
Univariate Cox regression was used to filter prognostic genes associated with patients' overall survival (OS), and genes that met the criteria were further entered into the LASSO regression model for selection of significant genes among these genes, which were done by the survival package and the caret package respectively.A p-value less than 0.05 was defined as statistically significant in the univariate Cox regression.The best model variable was obtained by taking the minimum value of Lambda in the LASSO regression.The lambda.min is the optimal value of λ found during the cross-validation process, used to minimize prediction error, rather than just being the smallest possible value of λ.

Multi-Cox regression and prognostic model construction
The results of the above analysis were further incorporated into a multifactorial Cox regression model to identify independent prognostic genes, a procedure performed by the survival package.Finally, we constructed a multi-gene prognostic model based on the coefficients of this regression model and the expression profile of the genes.Based on the scores of the model, we stratified patients and compared survival differences between groups, using ROC to assess the predictive efficacy of the model.To test the robustness of the model, we also randomly extracted 30% of the data for validation.

Prognostic model with tumor detection
Back propagation (BP) network was used to assess the ability of the model genes to detect early-stage liver cancer.We first transformed the raw count into TPM data and calculated the genescore of the model genes based on the results of the difference analysis and the expression of the model genes in the genescore matrix, where genescore in the genescore matrix, genescore greater than the median expression value is defined as 1, and those less than the median expression value is defined as 0. Finally, we input the genescore matrix into the BP network and visualise it, and the area under the ROC curve is used to assess the predictive power.

PPI identify hub gene of model and validate expression
We performed PPI protein interaction network analysis on the genes in the model through an online database, STRING database, and defined the hub genes with multiple gene links as core genes.We further validated the expression of the core genes at the transcriptional and post-transcriptional levels.

Hub gene with clinical factors and immune infiltration
To further assess the clinical value of the core gene, we correlated the core gene with common clinical variables (T, N, M, stage, grade and tumor status).We removed samples missing the above variables from the original clinical samples and looked at the differences in expression of hub gene between the variable groups.ssGSEA was used to assess the relationship between hub gene expression and immune infiltration cells in the tumor microenvironment.The essential clinical information of the patients is presented in Supplementary Table 4.

Histological analysis
Samples were fixed in 4% paraformaldehyde (PFA) at 4° C overnight.Samples were processed using conventional techniques.5 μm sections were cut and stained with haematoxylin and eosin (H&E) to observe the morphology of the tissue.

Evaluation of IHC-positively stained tissues
The H-score method was applied for calculating the staining score of each sample, which was to multiply the immunoreaction intensity (negative: 0, weak: 1, moderate: 2, strong: 3) by the staining extent score (0%-100%).According to the H-score, the stained samples were divided into four groups: negative (-; 0), weak (+; 0~1), moderate (++; 1~1.5), and strong (+++; 1.5~3).Samples with a negative or weak H-score were determined to be the low protein expression group, whereas those with a moderate or strong H-score were classified as the high protein expression group.

Figure 1 .
Figure 1.Differential oxidative stress-related genes between liver cancer and normal tissues.A volcano map (A) and heatmap

Figure 2 .
Figure 2. Seven oxidative stress-related genes predict patient prognosis model in training dataset.(A) A signature model for each of the 7 genes was constructed, and the risk score for each patient based on the model was calculated.X axis is number of patients.(B) Survival analysis about patients in the high-risk group and the low-risk group.(C) Analysis of the area under the ROC curve.The signature model based on Cox regression coefficients and gene expression, the riskscore = gene A expression* coefficients A+ gene B expression* coefficients B.

Figure 3 .Figure 4 .
Figure 3. Seven oxidative stress-related genes predict patient prognosis model in test dataset.(A) A signature model for each of the 7 genes was constructed, and the risk score for each patient based on the model was calculated.X axis is number of patients.(B) Survival analysis about patients in the high-risk group and the low-risk group.(C) Analysis of the area under the ROC curve.The signature model based on Cox regression coefficients and gene expression, the riskscore = gene A expression* coefficients A+ gene B expression* coefficients B.

To more accurately predict patient 5 -Figure 6 .
Figure 6.High expression of hub gene ENO1 suggests disease progression.(A) The clinical correlation analysis of ENO1 gene expression and tumor size and stage.The expression of ENO1 in different tumor stages (B), and in the pathological Grade stage (C).ENO1 was not associated with lymph node status (D) or distant metastatic status (E).