Integrated multi-omics analysis and machine learning identify hub genes and potential mechanisms of resistance to immunotherapy in gastric cancer

Background: Patients with gastric cancer respond poorly to immunotherapy. There are still unknowns about the biomarkers associated with immunotherapy sensitivity and their underlying molecular mechanisms. Methods: Gene expression data for gastric cancer were gathered from TCGA and GEO databases. DEGs associated with immunotherapy response came from ICBatlas. KEGG and GO analyses investigated pathways. Hub genes identification employed multiple machine algorithms. Associations between hub genes and signaling pathways, disease genes, immune cell infiltration, drug sensitivity, and prognostic predictions were explored via multi-omics analysis. Hub gene expression was validated through HPA and CCLE. Multiple algorithms pinpointed Cancer-Associated Fibroblasts genes (CAFs), with ten machine-learning methods generating CAFs scores for prognosis. Model gene expression was validated at the single-cell level using the TISCH database. Results: We identified 201 upregulated and 935 downregulated DEGs. Three hub genes, namely CDH6, EGFLAM, and RASGRF2, were unveiled. These genes are implicated in diverse disease-related signaling pathways. Additionally, they exhibited significant correlations with disease-associated gene expression, immune cell infiltration, and drug sensitivity. Exploration of the HPA and CCLE databases exposed substantial expression variations across patients and cell lines for these genes. Subsequently, we identified CAFs-associated genes and established a robust prognostic model. The analysis in the TISCH database showed that the genes in this model were highly expressed in CAFs. Conclusions: The results unveil an association between CDH6, EGFLAM, and RASGRF2 and the immunotherapeutic response in gastric cancer. These genes hold potential as predictive biomarkers for gastric cancer immunotherapy resistance and prognostic assessment.


INTRODUCTION
Gastric cancer is a prevalent global malignancy, carrying a significant burden of morbidity and mortality [1].Worldwide, it ranks as the fourth leading cause of cancer-related death and the fifth most prevalent cancer [2].In 2020 alone, there were over a million new cases of gastric cancer, with an estimated 700,000 deaths [2].Notably, East Asia, particularly China, bears a substantial burden, with China accounting for around 44% of new global cases and half of the world's fatalities [2].The prevalent Helicobacter pylori infection can be blamed for the high prevalence of stomach cancer in China [3,4].Gastric cancer elusive etiology and delayed diagnosis contribute to its dismal survival rates [5].Consequently, the imperative lies in the prevention, early detection, treatment efficacy, and prediction of sensitivity for gastric cancer.
Immune checkpoints are pivotal in orchestrating the host's immune response, yet tumors exploit these checkpoints to evade immune eradication and simultaneously subdue immune reactions [6].With the discovery of immune checkpoint pathways and ways to block them, targeted immunotherapy has become a major mutation in the treatment of solid tumors, bringing new hope to patients [7].Since the first immunotherapeutic drug was applied to melanoma in 2011, immunotherapeutic drugs have evolved rapidly and become one of the most commonly used drugs in treating most tumors.However, the variability in response rates to such agents remains substantial.Notably, tumors boasting heightened somatic mutations, like melanoma and non-small cell lung cancer, tend to evince more favorable immunotherapeutic responses [8,9].Furthermore, patients with high PD1 and PD-L1 expression responded better to immunotherapy.Research has shown that PD-L1 is highly expressed in various solid tumors, including esophageal, colorectal, pancreatic, gastric, lung, and breast cancers [10].Meanwhile, gastric cancer patients have a highly complex immune microenvironment as well as a high rate of somatic mutation [11,12].These studies suggest that gastric cancer patients may derive greater benefits from immunotherapy.However, the use of immunotherapy in gastric cancer has fallen short of expectations.It is only approved for use in advanced gastric cancer in some countries [13].Therefore, identifying which gastric cancer patients are sensitive to immunotherapy or which are unsuitable for immunotherapy is crucial, and there is an urgent need for several biomarkers to aid in this identification.
In this investigation, we discerned genes intricately linked to immunotherapeutic response in gastric cancer, employing comprehensive genome-wide gene expression profiles from the TCGA and ICBatlas datasets.By employing diverse machine-learning techniques, we systematically evaluated potential biomarkers that hold promise for gauging the efficacy of immunotherapy in gastric cancer.Additionally, we pinpointed genes associated with CAFs and subsequently crafted a predictive model and a corresponding nomogram to forecast prognosis.Notably, the integrity of our model was reaffirmed through meticulous validation against pertinent data from the GEO repository.

Acquisition of datasets related to gastric cancer
We sourced stomach adenocarcinoma (STAD) expression profiles, mutational data, and clinical records from the TCGA database (https://portal.gdc.cancer.gov/).The transcriptomic information for STAD encompassed a cohort of 448 samples, comprising 412 tumor tissues and 36 normal tissues.GSE15459 expression profiles and complete clinical information were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/).Extract differentially expressed genes (DEGs) associated with gastric cancer immunotherapy response from the ERP107734 data in the ICBatlas database (http://bioinfo.life.hust.edu.cn/ICBatlas/).

Acquisition of DEGs and functional analysis
DEGs were obtained from the ICBatlas database, where genes with FDR (False Discovery Rate) < 0.05 and |log2FC (Log2 Fold Change)| > 1 were used for subsequent analysis.KEGG and GO analyses were conducted to determine the pathways and functions of DEGs enrichment associated with immunotherapy response, facilitated by the "clusterProfiler" software package.The Search Tool for the Retrieval of Interacting Genes (STRING) database obtained a protein-protein interaction (PPI) network graph with a minimum interaction score of 0.7 (http://string.embl.de/)[14].

Identifying hub genes through machine learning
Firstly, the genes most relevant to the disease among the DEGs were identified in the TCGA transcript data by the "WGCNA" package [15].Subsequently, using P<0.05 as the criterion, perform univariate Cox regression to screen for prognostic genes among the genes obtained from the Weighted gene co-expression network analysis (WGCNA) analysis.We used the "randomForestSRC" package and the "randomSurvivalForest" package to perform a random survival forest algorithm to rank the importance of prognostic genes.We selected genes with significance >0.4 for subsequent analyses.We used the "XGBoost", and "Boruta" packages for the Boruta and Extreme Gradient Boosting (XGBoost) algorithms [16] for prognostic genes were ranked in terms of their importance.By using these three machine learning methods, the hub genes are ultimately determined.

Gene set variation analysis (GSVA) of hub genes
GSVA [17] was performed using the R package "GSVA" to explore the relevance of core genes in the TCGA_STAD cohort to the Hallmark pathway.Relevant gene sets were downloaded from the Molecular Signature Database (MSigDB) (http://software.broadinstitute.org/gsea/msigdb/index.jsp).

Gene set enrichment analysis (GSEA) of hub genes
In this study, differentially expressed genes between the high and low-expression groups were identified based on the expression profiles of STAD patients using the GSEA tool (http://www.broadinstitute.org/gsea)[18].When screening gene sets, the maximum gene set size was set to 500 and the minimum gene set size to 15.After 1000 permutations, a significantly enriched gene set was obtained based on a significance level of P<0.05.

Core gene mutation analysis
Mutational differences in somatic cells between high and low expression of core genes were investigated using TCGA somatic mutation data by the "maftools" package [19].Mutations in core genes were studied through the cBioPortal database (https://www.cbioportal.org).

Correlation analysis between core genes and diseaserelated genes
We searched the GeneCards database for STADassociated disease genes using the keyword "STAD" and used the 20 genes with the highest relevance scores for subsequent analyses (https://www.genecards.org/).We performed Pearson correlation analysis between the expression levels of core genes and disease-related genes, where P<0.05 was considered statistically significant.

Immune infiltration analysis
The transcript data from the TCGA_STAD cohort were analyzed using the cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT) algorithm [20] to calculate the relative content of 22 types of immune infiltrating cells and to analyze the difference between the immune cell content of normal and tumor tissues.We performed Pearson correlation analysis between the expression of core genes and the content of immune cells.The expression of core genes and immune-related genes were compared using Pearson correlation analysis after immune-related genes were also retrieved from the Tumor and Immune System Interaction Database (TISIDB) (http://cis.hku.hk/TISIDB/).P<0.05 was considered statistically significant.

Drug sensitivity analyses
Downloaded the relevant data from the Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/), and used the R package "oncoPredict" to predict drug sensitivity for each tumor sample in order to evaluate the predictive ability of core genes on drug treatment response.Default values were selected for all parameters, including "combat" to eliminate batch effects and average duplicate gene expression.The IC50 differences between different expressions of the core genes were then compared using the Wilcoxon test.Finally, we plot the results using the "ggplot" R package.

Validation of core gene expression in databases
Expression data of gastric cancer cell lines were downloaded from the CCLE (http://www.broadinstitute.org/ccle) database to validate the gene expression levels of core genes in gastric cancer cell lines.Immunohistochemistry-related data were downloaded from the HPA (http://www.proteinatlas.org)database to validate the protein expression levels of the core genes in normal and gastric cancer tissues.

Identification of CAFs-related genes
Immunotherapy evaluation-related metrics were calculated using the TIDE (Tumor Immune Dysfunction and Exclusion) database, and the expression of core genes was analyzed for Pearson correlation with these metrics.At the same time, we assessed the tumor mutation burden (TMB) of each patient in the TCGA_ STAD cohort.We performed Pearson correlation analysis between TMB and the expression levels of core genes.
CAFs content was calculated using three methods: the Estimated Proportion of Immune and Cancer Cells (EPIC) algorithm [21], the xCell algorithm [22], the Microenvironmental Cell Population-counter (MCPcounter) algorithm [23].First, in the TCGA transcripts, we counted genes associated with the core genes, and genes with correlation coefficients >0.3 and p<0.05 were included in subsequent analyses.We then performed WGCNA analysis using the "WGCNA" package to obtain the CAFs genes associated with the core genes.Finally, we selected gene significance (GS) > 0.5 as potential CAFs-related genes.

Enrichment analysis
Utilizing the "clusterProfiler" software package, we executed KEGG and GO analyses to unravel the pathways and functions enriched with CAFs-related genes.We also conducted Disease Ontology (DO) enrichment analysis using the R package "DOSE" [24] to identify which diseases are associated with CAFsrelated genes.

Construction and validation of CAFs features
The obtained CAFs-related genes were subjected to univariate Cox regression analysis, and genes with P<0.05 were considered prognostic-related genes.We used the prognostic-related genes obtained from the univariate Cox regression analysis and took the intersection with the genes from the GSE15459 dataset for further analysis.To build a CAFs signature with high accuracy and generalizability, we integrate ten machine learning algorithms, including Cox boost, Stepwise Cox, Lasso, Ridge, Elastic Net (Enet), Survival Support Vector Machines (survival-svm), Generalized Boosted Regression Models (GBMs), Supervised Principal Components (SuperPC), Partial Minimum Cox (plsRcox) and RSF.In constructing this model, we set "ntree" to 1000 and "nodesize" to 5.During the model selection process, we calculate each model's consistency index (C-index) and filter based on a combination of the number of genes in the model and the C-index.Models with higher C-index and fewer genes were considered the optimal models.The Kaplan-Meier method was employed to generate survival curves.At the same time, the log-rank test was utilized to assess the disparity in survival rates between the high-and low-risk cohorts.The models' accuracy was evaluated using Receiver Operating Characteristic (ROC) curves.The model underwent univariate and multivariate Cox regression analyses to evaluate its potential as an independent prognostic marker for STAD patients.All results were validated in the GSE15459 dataset.A nomogram with a calibration plot was constructed using the "rms" package to predict the consistency between actual and predicted survival.A multivariable ROC curve was constructed to compare with other factors to validate the model and nomogram optimality.Evaluation of the clinical usefulness of model and nomogram by decision curves was done.

Single-cell sequencing analysis
The genes in the CAFs risk scores and models were subjected to Pearson correlation analysis with CAFs marker genes in the previous literature to see whether the genes identified by machine learning and the constructed models were associated with CAFs marker genes reported by previous authors.We analyzed single-cell RNA-sequencing (scRNA-seq) data from STAD tissues (GSE167297) based on the TISCH database [25], which was used to identify whether genes in the model were highly expressed in CAFs.

Statistical analysis
R software (version 4.2.1) was used for all statistical analyses.Wilcoxon tests were used for group comparisons.Prognostic-related genes were screened using univariate Cox regression analysis.The logrank test and Kaplan-Meier analysis were employed to compare overall survival.All correlation tests were conducted using the Pearson correlation analysis method in the study.P<0.05 was considered statistically significant.

Screening for immunotherapy efficacy-related DEGs and functional enrichment of DEGs
We filtered out 1136 DEGs between gastric cancer samples with response and without response to immune therapy from the ICBatlas database using the criteria of FDR<0.05 and |log2FC|>1 (Supplementary Table 1).The number of up-regulated genes was 201, and the number of down-regulated genes was 935 in responders compared to non-responders.Supplementary Figure 1A represents the volcano plot, while Supplementary Figure 1B shows the heatmap of DEGs.To further investigate the functions and pathways of DEGs, we performed KEGG and GO enrichment analyses.The KEGG pathway enrichment results showed that DEGs were mainly enriched in cancer, PI3K-Akt, gastric cancer, and MAPK pathways (Supplementary Figure 1C).GO enrichment analysis showed that DEGs were mainly involved in functions such as extracellular matrix composition, basement membrane, and receptorligand activity (Supplementary Figure 1D).PPI network analysis revealed that most DEGs were tightly and intricately linked (Supplementary Figure 1E).

Machine learning of DEGs and identification of three hub genes, CDH6, EGFLAM, and RASGRF2
To further understand which genes in DEGs affect immunotherapy, we selected the TCGA_STAD cohort for WGCNA analysis and screened the genes most relevant to STAD.The green module in the WGCNA analysis was most relevant to STAD and was selected for our subsequent analysis, which included 62 genes (Figure 1A-1C).Subsequently, using a significance level of P<0.05, we screened out 38 genes that are significantly associated with prognosis through univariate Cox analysis (Figure 1D).These 38 genes were screened by Random Forest Analysis, Boruta, and XGBoost to identify the final genes (Figures 1E,  1F, 2A-2C).Finally, the intersection of these three machine-learning screened genes was taken, and three genes met our criteria, namely CDH6, EGFLAM, and RASGRF2 (Figure 2D).The Kaplan-Meier survival  analysis demonstrated a significant correlation between elevated overall survival and reduced expression of EGFLAM, RASGRF2, and CDH6 (P<0.001,P=0.008, and P<0.001) in contrast to high expression levels (Figure 2E-2G).Furthermore, these three genes exhibited heightened expression levels among non-responsive immunotherapy patients (Supplementary Table 1).These findings imply that gastric cancer patients with elevated expression of these three genes might experience limited or potentially no advantages from immunotherapy.

Exploration of specific signaling pathways associated with the hub genes EGFLAM, RASGRF2 and CDH6
Subsequently, we performed an in-depth analysis of specific signaling pathways involving these three core genes and investigated the potential impact of the candidate genes on pathways related to disease progression.The GSVA analysis revealed that the overexpression of CDH6 is mainly enriched in pathways such as oxidative phosphorylation, P53 signaling pathway, DNA repair, G2M checkpoint, and UV-response-up.Low-expressed CDH6 is mainly enriched in apoptosis, angiogenesis, UV-responsedown, TGFβ signaling pathway, IL2-STAT5 signaling pathway, and other pathways.The overexpression of EGFLAM is predominantly enriched in signaling pathways, including the P53 signaling pathway, glycolysis, oxidative phosphorylation, downstream effects of the KRAS signaling pathway, and interferonalpha response.The low expression of EGFLAM was mainly enriched in the IL6_JAK_STAT3 signaling pathway, apoptosis, PI3K_AKT_MTOR signaling pathway, fatty acid metabolism, TGFβ signaling pathway, and other signaling pathways.The upregulated RASGRF2 is primarily enriched in the MTORC1 signaling pathway, glycolysis, cholesterol homeostasis, MYC target genes, and E2F target genes.Low expression of RASGRF2 was mainly enriched in the PI3K_AKT_MTOR signaling pathway, adipogenesis, apoptosis, TNF-α signaling via NFKB, and protein secretion (Figure 3A-3C).In addition, we performed a GSEA enrichment analysis of these three genes.The results indicate that the elevated expression of CDH6 is enriched in cancer, MAPK signaling pathway, and other pathways.The downregulated expression of CDH6 is enriched in pathways such as glyoxylate and dicarboxylate metabolism, proteasome, and ribosome.The overexpression of EGFLAM is enriched in pathways including focal adhesion, gap junction, cancer, VEGF signaling pathway, and others.Low expression of EGFLAM is enriched in ribosomes.The elevated expression of RASGRF2 is enriched in pathways such as the MAPK signaling pathway, cancer, Wnt signaling pathway, and others.RASGRF2 low expression was enriched in oxidative phosphorylation, proteasome, and ribosome (Figure 3D-3F).

Analysis of mutations in the three core genes and their relationships with disease-related genes
Searched and selected the top 20 genes most relevant to STAD from the GeneCards database.Comparison between the STAD group and the normal group revealed significant differences in the expression of genes such as CDH1, BRCA2, BRCA1, TP53, APC, and ATM (Supplementary Figure 2A).Through Pearson correlation analysis, we found significant correlations between the three central genes, CDH6, EGFLAM, and RASGRF2, and the expression of several diseaserelated genes in STAD.As presented in Supplementary Figure 2B, CDH6 was positively correlated with the expression of genes such as APC, ATM, and PIK3CA; the high expression of EGFLAM was correlated with the high expression of genes such as APC, ATM, and PTEN; and RASGRF2 was significantly positively correlated with the expression of genes such as APC, ATM, and MLH1.With the help of the STAD cohort from the cBioportal database, we investigated the mutation status of these three core genes.It was found that all three core genes were mutated to varying degrees, with RASGRF2 at 4%, EGFLAM at 11%, and CDH6 at 8% (Supplementary Figure 3A).Furthermore, we analyzed the mutation rates between the high and low expression groups of these three core genes.Their results showed that the low expression group of the RASGRF2 gene had a higher mutation rate than the high expression group; the mutation rate of the EGFLAM gene was similar between the high and low expression groups; and the low expression group of the CDH6 gene had a higher mutation rate than the high expression group (Supplementary Figure 3B-3D).

Study of the clinical predictive potential of three core genes based on multi-omics research
The tumor microenvironment is irreplaceable in tumor development, patient survival, and treatment sensitivity.The tumor microenvironment mainly includes tumor cells, immune cells, and mesenchymal stromal cells.By investigating the relationship between the expression of hub genes and tumor immune infiltration, we can gain deeper insights into how these hub genes influence the development of STAD, uncovering potential molecular mechanisms.Supplementary Figure 4A demonstrates the Pearson correlation between immune cells in STAD patients.STAD patients had higher levels of B cells naive, T cells CD4 memory activated, T cells regulatory (Tregs), Macrophages M0, and Macrophages M1 than normal patients (Supplementary Figure 4B).CDH6 was positively correlated with B cells naive, Mast cells resting, T cells CD4 memory resting, etc., and negatively correlated with T cells CD8, T cells CD4 memory activated, and T cells follicular helper; EGFLAM was positively correlated with Macrophages M0, Mast cells activated, Neutrophils, etc., and negatively correlated with NK cells activated, T cells CD8, T cells regulatory (Tregs), etc.; RASGRF2 was positively correlated with B cells naive, Mast cells resting, Macrophages M2 were positively correlated, and negatively correlated with NK cells activated, T cells regulatory (Tregs), Plasma cells, etc. (Supplementary Figure 4C).In addition, we obtained chemokines-related, immunostimulatoryrelated, immunoinhibitor-related, MHC-related, and receptor-related genes from the TISIDB database and performed a Pearson correlation analysis.The results demonstrate that CDH6, EGFLAM, and RASGRF2 exhibit varying correlations with these immune-related genes (Supplementary Figure 5).
In order to understand the sensitivity of these three pivotal genes to chemotherapeutic drugs, we conducted drug sensitivity studies based on the GDSC database using the "oncoPredict" package.The findings reveal that the expression of CDH6, EGFLAM, and RASGRF2 is correlated with the sensitivity to drugs like Buparlisib, Cediranib, Dasatinib, Dinaciclib, Erlotinib, Niraparib (Supplementary Figure 6).

Creating nomogram and crafting calibration curves for STAD patient outcome prediction
By incorporating the expression levels of CDH6, EGFLAM, and RASGRF2, along with age, gender, stage, and grade, we effectively developed a prognostic nomogram for anticipating the overall survival of STAD patients.Based on logistic regression analyses, we observed that the three key genes, CDH6, EGFLAM, and RASGRF2, contributed differently at different STAD scoring stages.The higher the overall scores for the three key genes and clinical features, the worse the 1-, 3-, and 5-year overall survival of the patients (Figure 4A).Next, we constructed a calibration curve.The calibration curve showed significant agreement between the predicted and observed OS (Figure 4B).

Validation of protein expression and cellular expression levels of CDH6, EGFLAM and RASGRF2 genes
We pooled the IHC maps of CDH6, EGFLAM, and RASGRF2 from the HPA database to validate the protein expression levels of these three pivotal genes in clinical specimens.In normal tissues, the expression of CDH6 shows various levels, ranging from low to moderate.However, in gastric cancer patients, the expression of CDH6 also demonstrates diverse degrees, ranging from undetectable low to moderate levels (Supplementary Figure 7A).EGFLAM showed high levels of expression in normal tissues and diverse levels in gastric cancer patients, ranging from undetectable low to moderate to high levels (Supplementary Figure 7C).RASGRF2 showed high levels of expression in normal tissues and diverse levels in gastric cancer patients, ranging from moderate to high (Supplementary Figure 7E).In clinical samples, the variation in protein expression levels of hub genes may account for differences in immune therapy sensitivity among gastric cancer patients.Furthermore, we also analyzed the gene expression levels of hub genes in gastric cancer cell lines using the CCLE database.Noticeable variations were observed in the expression levels of identical hub genes within gastric cancer cell lines.(Supplementary Figure 7B, 7D, 7F).

Acquisition and functional analysis of CAFs-related genes
A negative correlation between hub genes and TMB was found by calculating TMB and performing Pearson correlation analysis, consistent with poor immunotherapy sensitivity in patients with high hub gene expression.In addition, the TIDE database and Pearson correlation analysis revealed that hub genes were positively correlated with CAFs, Dysfunction, and Exclusion, especially CAFs, with which all three hub genes had significant positive correlations (Figure 5A).Therefore, CAFs may be associated with a low benefit of immunotherapy with high expression of the hub genes.Subsequently, based on the TCGA_STAD cohort, the EPIC, xCell, and MCP-counter algorithms were employed to estimate the content of CAFs for each patient, and all samples were divided into high CAFs and low CAFs groups using the optimal cutoff value of CAFs content.In the TCGA_STAD cohort, those with low CAFs content had better overall survival (P=0.006,P=0.006, and P=0.004) (Figure 5B).In order to obtain the CAFs genes related to hub genes, we performed correlation analysis based on the TCGA_STAD cohort to obtain a total of 4020 genes related to hub genes.For these 4020 genes, we performed WGCNA analysis, in which the black module had the strongest positive correlation with the Fibroblasts_MCPcounter score (Cor = 0.8, P = 4e-89), Fibroblasts_XCELL score (Cor = 0.83, P = 3e-102) and CAFs_EPIC score (Cor = 0.62, P = 8e-43) had the strongest positive correlations (Figure 5C-5F).Finally, we obtained 262 genes in the black module as potential CAFs-associated genes associated with hub genes, using GS > 0.5 as the threshold (Supplementary Table 2).
In order to explore the functions and pathways of these 262 genes and which diseases they are associated with, we performed KEGG, GO, and DO analyses.KEGG enrichment analysis showed that CAFs-related genes were mainly enriched in the PI3K-Akt signaling pathway, cancer, cAMP signaling pathway, ECMreceptor interaction, Wnt signaling pathway, and other pathways (Supplementary Figure 8A).Based on the GO enrichment analysis outcomes, CAFs-associated genes predominantly participated in the regulation of cellular response to growth factor stimulation, extracellular matrix composition, basement membrane, and fibroblast activation (Supplementary Figure 8B).DO results suggest these genes are associated with bone cancer, fibrosarcoma, and connective tissue cancer (Supplementary Figure 8C).

Model construction driven by CAFs properties
In the TCGA_STAD cohort, a univariate Cox regression analysis of these 262 genes yielded a total of 218 genes that were associated with prognosis.After taking the intersection with the GSE15459 dataset, 195 genes were obtained for model construction (Supplementary Table 3).Subsequently, these 195 genes were incorporated into combinations of 10 machine algorithms to generate a series of models, and the concordance index (C-index) was calculated for each model.Based on the C-index and model gene count criteria, we ultimately selected the algorithmic model composed of RSF and GBM, which exhibited a higher validation set C-index and fewer genes.This model encompasses 27 genes (Figure 6A).
Patients within each cohort were divided into high and low-risk categories, classified according to the median risk scores derived from the model.In both TCGA_STAD and GSE15459 cohorts, the low-risk group exhibited significantly improved overall survival (P < 0.001 and P = 0.02, respectively) (Figure 6B, 6D).In both the TCGA_STAD and GSE15459 cohorts, the risk score was included in univariate and multivariate Cox regression analyses alongside clinical characteristics.The results indicated that this risk score is an independent prognostic indicator for STAD patients (Figure 7A, 7B).Furthermore, for 1-year, 3-year, and 5-year overall survival rates, ROC curve results indicate that our model possesses excellent predictive performance (Figure 6C,  6E).To develop a tool that could predict the overall survival of STAD patients in the TCGA_STAD cohort, we created a nomogram incorporating clinical features (Figure 7C).Subsequently, we plotted a calibration curve, which showed a high degree of agreement between actual and expected survival (Figure 7D).The results of the multivariate ROC curves for 1-, 3-, and 5year OS showed clear superiority of our model and nomogram over other clinical features (Figure 7E, 7G,  7I).DCA analyses at 1, 3, and 5 years showed that using a model or nomogram was more favorable than using other clinical characteristics to predict patient prognosis (Figure 7F, 7H, 7J).

A Pearson correlation analysis was conducted between
CAFs risk scores, model gene expression levels, and previously reported CAFs marker genes from existing literature.The examination unveiled a notable positive correlation linking CAFs risk scores, model genes, and AGING the previously documented CAFs marker genes (Figure 8A).To investigate whether model genes are expressed in CAFs, we performed a single-cell analysis based on the TISCH database.We identified scRNA-seq to 9 cell types: B, CD8T, DC, endothelial, epithelial, fibroblasts, Mast, Mono/Macro, and plasma (Figure 8C).Model genes are 25 in this scRNA-seq.The differential analysis showed that most model genes were highly expressed in fibroblasts, especially GNG11, FBLN1, MFGE8, C11orf96, and VIM expression was significantly higher (Figure 8D-8G).Furthermore, the single-cell GSEA-KEGG analysis corroborated the outcomes of the KEGG analysis for CAFs-related genes, illustrating a notable enrichment of fibroblasts within the extracellular matrix-receptor interaction pathway (Figure 8B).These results suggest that the genes in the CAFs-associated models screened and constructed based on machine learning may be CAFsspecific markers.

DISCUSSION AND CONCLUSIONS
Gastric cancer is a global tumor with a high clinical burden [26].It possesses high heterogeneity, and its survival rate is low [27,28].In recent years, immunotherapy has developed rapidly and brought breakthroughs in treating gastric cancer.However, the efficacy of immunotherapy in gastric cancer is not apparent.Therefore, an in-depth understanding of the mechanisms of immunotherapy sensitivity is essential to enhance the efficacy of gastric cancer immunotherapy.We comprehensively explored the biomarkers and underlying mechanisms of immunotherapy resistance in gastric cancer patients through bioinformatics analyses.
This study obtained DEGs between immunotherapy responders and non-responders among gastric cancer patients from the ICBatlas database.DEGs include 201 up-regulated and 935 down-regulated genes.The KEGG and GO enrichment analyses revealed significant enrichment of DEGs in multiple tumor-related biological processes, while the PPI results demonstrated intricate interactions among the DEGs.A series of analyses, including WGCNA, univariate Cox, random forest, XGBoost, and Boruta, were performed on the DEGs from the TCGA_STAD cohort, revealing that three genes, namely CDH6, EGFLAM, and RASGRF2, met the criteria.Finally, these three genes were significantly associated with survival by Kaplan-Meier survival analysis and were identified as hub genes for this study.The results of GSVA and GSEA analyses of the three hub genes, CDH6, EGFLAM, and RASGRF2, indicated that their differential expression levels may affect multiple signaling pathways associated with disease progression, including oxidative phosphorylation, DNA repair, apoptosis, angiogenesis, TGFβ signaling, and IL2-STAT5 signaling pathways that may be affected by CDH6.Furthermore, these three hub genes show significant correlations with STAD-related genes, including APC, ATM, MLH1, PIK3CA, PTEN, and others.Gene mutation analyses revealed significant mutation profiles in hub genes, as well as different somatic mutation rates between hub gene expression differences, factors that may potentially influence the efficacy of immunotherapy.Multi-omics studies have revealed significant correlations between CDH6, EGFLAM, and RASGRF2 with tumor immunity, multiple immune-related genes, and chemotherapeutic drug sensitivity.By integrating the expression data of CDH6, EGFLAM, and RASGRF2, as well as clinical features, a nomogram for predicting the prognosis of STAD patients was established.The calibration curves exhibit the remarkable predictive efficacy of the nomogram in prognosticating the outcomes of STAD patients.In addition, we confirmed the expression of these three genes in clinical samples and cell lines with the help of the HPA and CCLE databases.The findings suggest significant variability in the expression levels of these genes in gastric cancer patients and cell lines, which may reflect differences in intrinsic immunotherapeutic susceptibility between patients.
Ultimately, we investigated the connections between the hub gene and additional metrics for assessing immunotherapy, and the study's findings indicated notable correlations between the hub gene and these other assessment metrics.At the same time, we found a significant relationship between the hub gene and CAFs.CAFs are considered to play a crucial role in the interaction between the tumor microenvironment and cancer cells, driving tumor progression [29].This implies that hub genes may potentially induce immunotherapy resistance by impacting CAFs.Furthermore, we found that high CAFs scores were associated with poor overall survival in STAD patients, consistent with previous findings.In contrast to traditional DEGs analysis for selecting CAFs markers [30], we employed various bioinformatics algorithms to assess the abundance of CAFs in each STAD sample, ensuring the robustness of the constructed WGCNA network.The KEGG, GO, and DO analyses performed on CAFs-related genes revealed their close association with the occurrence and progression of tumors.Similarly, to ensure the reliability of predictive features, we employed a combination of 10 different machine-learning algorithms for construction and validation.Additionally, we compared the model and its constituent genes with previously identified CAFs markers, revealing their close association.Also, at the single-cell level, we verified that the genes included in the model have high expression in CAFs.The above findings imply a strong association between our selected genes and CAFs, and the model we constructed can accurately predict patient prognosis.CDH6, or CAD6 or KCAD, is a cadherin (CDH) family member.Recent research has indicated that CDH molecules play a significant role in tumor initiation, growth, and progression, potentially serving as diagnostic markers, prognostic indicators, and potential therapeutic targets for cancer patients [31,32].As one of the family members, the CDH6 protein has five extracellular structural domains and one cytoplasmic structural domain, a particular structure that makes it unique from other family members in terms of its interaction with connexin molecules [33].Elevated CDH6 expression is closely linked to unfavorable prognostic outcomes across multiple malignant tumors.In papillary thyroid carcinoma, aberrant up-regulation of CDH6 may promote epithelial-mesenchymal transformation and cancer cell metastasis by regulating autophagic processes [34,35].CDH6 is highly expressed in renal cancer and is associated with lymph node invasion and metastasis [36].Furthermore, CDH6 shows a high expression level in osteosarcoma, which is closely associated with patient prognosis [37].In addition, heightened CDH6 expression in gastric cancer is connected to tumor advancement and unfavorable prognostic implications [38].Finally, CDH6 is also overexpressed in tumors such as snuff carcinoma, ovarian carcinoma, and oral squamous carcinoma and may be involved in the prognosis of patients [39][40][41].To the best of our knowledge, studies on CDH6 have been limited to its effects on tumors, and no studies have examined the role of CDH6 in immunotherapy resistance in gastric cancer.RASGRF2, functioning as a guanylate exchange factor for RasGTPase, participates in diverse cellular processes and contributes to tumor progression, migration, and invasion.Recent genome-wide association studies have unveiled a link between RASGRF2 and the susceptibility to malignant mesothelioma (MM) [42].High expression of RASGRF2 inhibits tumor migration invasion in colorectal cancer [43].However, high RASGRF2 expression in lung adenocarcinoma is associated with tumor invasion and poor prognosis [44].However, indepth studies and research on the function of RASGRF2 in gastric cancer and its importance in immunotherapy resistance are still lacking.The role of EGFLAM, EGF-like, type III fibronectin, and laminin G structural domains in tumors must be studied more.In recent studies, it has been found that EGFLAM exhibits significant hypomethylation in ovarian cancer [45], while in glioblastoma, EGFLAM is associated with tumor cell migration, invasion, and adverse prognosis [46].
However, our study has some limitations and shortcomings.First, given the high heterogeneity of gastric cancer, while the number of samples we retrieved from the TCGA and GEO databases was limited, this would lead to possible bias in the results.Second, these results have not been confirmed by in vivo and in vitro experiments.Despite these shortcomings, the preliminary study provides valuable and constructive basic information.In the subsequent study, we will delve into the role of CDH6, EGFLAM, and RASGRF2 as hub genes in gastric cancer immunotherapy resistance through a series of experiments.We will also investigate the effects of CDH6, EGFLAM, and RASGRF2 on gastric cancer proliferation, migration, and invasion in both in vivo and in vitro contexts.In summary, we identified genes associated with immunotherapy resistance in gastric cancer based on a machine learning approach, which provides a new biological marker for determining the sensitivity of gastric cancer patients to immunotherapy.In addition, CAFs scores constructed using CAFs-related genes based on these resistance genes can effectively predict the prognosis of patients and facilitate clinical decisionmaking.

Figure 1 .
Figure 1.Selection and analysis of differentially expressed genes in the TCGA_STAD cohort.(A) Scale independence and mean connectivity plot generated using WGCNA.(B) Gene dendrogram and nodule color from WGCNA analysis.(C) Merged module correlation coefficients from WGCNA analysis.(D) Results of the univariate Cox regression analysis (P<0.05).(E) Random forest analysis of prognostic genes.(F) Ranking plot of selected features with random forest importance scores> 0.4.

Figure 4 .
Figure 4. Construction of a nomogram for prognostic prediction of STAD patients.(A) Nomogram illustrating the 1-year, 3-year, and 5-year overall survival prediction for patients with STAD.(B) Calibration curves of a nomogram predicting 1-year, 3-year, and 5-year overall survival.

Figure 5 .
Figure 5. Multiple methods for identifying CAFs-associated genes related to hub genes.(A) Pearson correlation analysis of hub genes with immunotherapy-related indicators.(B) The Kaplan-Meier survival analysis of CAFs scores was calculated using three methods (p=0.006,p=0.006, and p=0.004, respectively).(C) Scale independence and mean connectivity analysis.(D) Cluster dendrogram among modules.(E) Scatterplot of MM and GS from the black module.(F) Module-trait relationships.*p < 0.05, **p < 0.01, ***p < 0.001.

Figure 6 .Figure 7 .
Figure 6.Establishment and validation of CAFs risk model in STAD through a combination of 10 machine learning approaches.(A) A c-index ranking map of 10 machine learning combinatorial models.(B) Kaplan-Meier survival analysis of risk models for CAFs in the TCGA_STAD cohort.(C) Time-dependent ROC curves for the TCGA_STAD cohort CAFs risk model.(D) Kaplan-Meier survival analysis of risk models for CAFs in the GSE15459 cohort.(E) Time-dependent ROC curves for the GSE15459 cohort CAFs risk model.

Figure 8 .
Figure 8. Validation of genes in the CAFs risk model at the single-cell level.(A) Pearson correlation analysis of risk models with reported CAFs genes.(B) GSEA of genes that are upregulated in different cell types.(C) Predominant cellular category in single-cell sequencing.(D-G) Validation of risk model genes for CAFs.