Integrated single-cell and bulk RNA sequencing revealed the molecular characteristics and prognostic roles of neutrophils in pancreatic cancer

Pancreatic cancer, one of the most prevalent tumors of the digestive system, has a dismal prognosis. Cancer of the pancreas is distinguished by an inflammatory tumor microenvironment rich in fibroblasts and different immune cells. Neutrophils are important immune cells that infiltrate the microenvironment of pancreatic cancer tumors. The purpose of this work was to examine the complex mechanism by which neutrophils influence the carcinogenesis and development of pancreatic cancer and to construct a survival prediction model based on neutrophil marker genes. We incorporated the GSE111672 dataset, comprising RNA expression data from 27,000 cells obtained from 3 patients with PC, and conducted single-cell data analysis. Thorough investigation of pancreatic cancer single-cell RNA sequencing data found 350 neutrophil marker genes. Using The Cancer Genome Atlas (TCGA), GSE28735, GSE62452, GSE57495, and GSE85916 datasets to gather pancreatic cancer tissue transcriptome data, and consistent clustering was used to identify two categories for analyzing the influence of neutrophils on pancreatic cancer. Using the Random Forest algorithm and Cox regression analysis, a survival prediction model for pancreatic cancer was developed, the model showed independent performance for survival prognosis, clinic pathological features, immune infiltration, and drug sensitivity. Multivariate Cox analysis findings revealed that the risk scores derived from predictive models is independent prognostic markers for pancreatic patients. In conclusion, based on neutrophil marker genes, this research created a molecular typing and prognostic grading system for pancreatic cancer, this system was very accurate in predicting the prognosis, tumor immune microenvironment status, and pharmacological treatment responsiveness of pancreatic cancer patients.


INTRODUCTION
By the year 2025, pancreatic cancer (PC) is estimated to become the third most significant contributor to cancer-related deaths in Europe, responsible for 4.7% of all cancer-related fatalities in humans [1].Pancreatic ductal adenocarcinoma (PDAC) stands as the prevailing form of pancreatic cancer, carrying a heightened risk for AGING individuals afflicted with diabetes or chronic pancreatitis [2,3].PDAC is a formidable malignancy with a significant mortality rate, with a mere 18% of patients surviving at the 1-year mark across all stages of the disease.It is difficult to detect the early subclinical symptoms of PDAC, such as lack of appetite, stomach discomfort, and back pain.Hence, the majority of patients with PDAC are identified in the intermediate and advanced stages with distant metastases [4][5][6].In addition, PDAC is one of the cancer forms with the highest immunotherapy resistance [7].The majority of individuals with PDAC do not react well to immunotherapy alone, immune checkpoint suppression treatment is unsuccessful for these individuals [8].Although surgery is currently the only curative therapy for PC, even patients who undergo surgery have a high risk of cancer recurrence and may require additional treatment [9].
To improve the prognosis of PC patients, it is necessary to find new biomarkers for clinical diagnosis and prediction of therapeutic success [10].PC recurrence and metastasis are significantly influenced by the immunological microenvironment.Recent investigations on the tumor microenvironment (TME) have underlined the significance of immune cell infiltration in PC development, metastasis, and immune evasion [11][12][13][14].Mlecnik B. et al. earlier constructed an immunoscore-based cancer classification system for colon cancer and demonstrated that immune cell invasion has a greater predictive value than traditional tumor invasion (TNM stage) [15].The implementation of immunological scoring systems will thus bring unique insights into the clinical diagnosis of PC and the prediction of therapeutic success.
Neutrophils, which are produced from hematopoietic stem cells in bone marrow, are plentiful in the bloodstream and a vital component of the innate immune system.In addition, neutrophils react efficiently to infection and inflammatory damage.Neutrophils represent a significant fraction of immune cells that invade various forms of cancer, including colorectal cancer, breast cancer, melanoma, and renal cell carcinoma [16][17][18].Several tumor-secreting agents may reportedly attract neutrophils into the TME.CXCL1, CXCL2, CXCL5, and CXCL8 interact to the neutrophil surface receptors CXCR1 and CXCR2 to induce chemotaxis [19][20][21].In addition, CD4+ T cells in the TME and the cytokine IL17 released by T cells contribute to the recruitment of neutrophils [22].Neutrophils recruited into the TME interact with other TME components, which may restrict or promote tumor development.Consequently, neutrophils may be separated into two polarization states: N1 neutrophils, which limit tumor growth, and N2 neutrophils, which promote tumor development.The IFN-signaling pathway stimulates the production of IL12, ICAM1, and tumor necrosis factor (TNF), all of which are deadly to tumor cells and suppress tumor development.Transforming growth factor (TGF) promotes carcinogenesis by secreting arginase, MMP9, VEGF, and other chemokines to stimulate tumor angiogenesis [23][24][25][26][27].
PC's aggressive traits are linked to its high fibroblast content and inflammatory TME including numerous immune cells, including neutrophils [28].Experiments with animal models revealed that neutrophils interact with pancreatic stellate cells, tumor-associated macrophages, and extracellular matrix in the TME to significantly accelerate tumor growth [29][30][31].Yet, neither the underlying processes nor the therapeutic uses of neutrophils have been explained.In order to offer a scientific foundation for clinical decisionmaking and risk management in patients, the purpose of this work was to examine the molecular mechanism of neutrophils in PC development and to identify neutrophil-related genes (NRGs) to design survival predictive models.

Data acquisition and preprocessing
Single-cell sequencing data of PC were acquired from the Gene Expression Omnibus (GEO) database (https://portal.gdc.cancer.gov/)(registration number: GSE111672) [32].The transcriptome data of PC tissues (data from 178 pancreatic tumor tissues and 4 paracancerous tissues) as well as associated clinical data (data from 185 PC samples) were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/)[33].The GSE28735 [34,35] (data from 45 PC tissues), GSE62452 [36] (data from 69 PC tissues), GSE57495 [37] (data from 63 PC tissues), and GSE85916 datasets (data from 80 PC tissues) were obtained from the GEO database.The "SVA" R package was used to remove batch effects between different datasets [38,39].Data in which the survival time of patients was less than 30 days were excluded and not included in the survival analysis.

Single-cell data analysis and identification of NRGs
For quality control, analysis, and exploration of singlecell RNA expression data, the R software packages "Seurat" and "SingleR" were employed [40,41].To obtain high-quality data on single-cell RNA expression, the following filtering criteria were established: (1) exclusion of genes smaller than those detected in 3 cells, (2) exclusion of cells with fewer than 50 detected genes, and (3) exclusion of cells in which mitochondrial AGING gene expression accounts for more than 5% of total gene expression.Single-cell RNA expression data were normalized using the "Normalize Data" program.The "Find Variable Feature" tool was used to determine the top 1500 genes whose expression changed significantly.The "Run PCA" tool was used to conduct principal component analysis (PCA) on the top 1500 genes, and the P-value of each principal component was obtained.The top 15 main components were chosen and subjected to t-distributed stochastic neighbor embedding (t-SNE) cell clustering analysis.To identify marker genes, differentially expressed genes across distinct cell clusters were examined using the following criteria: adjusted P-value < 0.05 and |log2 (fold-change)| > 1. Annotation of cell clusters was conducted using reference data from Human Primary Cell Atlas for reference-based annotation [42].

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis
The R packages "org.Hs.eg.db", "clusterProfiler", "enrichplot", and "ggplot2" were used to conduct GO and KEGG function and pathway enrichment studies of differentially expressed NRGs with adjusted P-value < 0.05 as the filter criterion [43].Biological processes, cellular components, and molecular functions were the GO words.

Consensus cluster analysis of NRGs
Using the "ConsensusClusterPlus" R software, data from the TCGA, GSE28735, GSE62452, GSE57495, and GSE85916 datasets were pooled to cluster all patients based on the expression of NRGs [44].Based on the consensus matrix and cumulative distribution function (CDF) curves, the optimal number of clusters was determined [45].Using the Kaplan-Meier technique and the log-rank test, the survival time between various groups was compared.Typing consistency was shown using a PCA with a strong capacity for dimension reduction [46].

Differential expression analysis and typing analysis
Using the criterion |log2 (FC)| > 1 and adjusted P-value < 0.05, the "limma" R program was used to find differentially expressed genes across NRGcluster [47].Differentially expressed genes were analyzed for GO and KEGG enrichment using the "org.Hs.eg.db", "clusterProfiler", "enrichplot", and "ggplot2" R packages [43].To assess further the influence of differently expressed genes on PC prognosis, the "ConsensusClusterPlus" R program was used to cluster all patients according to differentially expressed genes [44].Using the Kaplan-Meier technique and the log-rank test, the survival time between various kinds was compared [48].

Construction and validation of the prognostic risk score model
To further investigate the predictive usefulness of NRGs in PC and quantify the NRG score in each patient, univariate Cox regression analysis was used to screen NRGs linked with patient outcomes.Random survival forests-variable hunting (RSFVH) was used to screen and filter genes [49,50].A prognostic risk score model for PC was developed using a multivariable Cox regression analysis.The P-value of the findings of the survival analysis was utilized to test for the optimal gene combination or final prognostic risk score model.Data from TCGA datasets used as training sets, whilst GEO datasets GSE28735, GSE62454, GSE57495 and GSE85916 served as validation sets.Based on gene expression profiles and coefficients of Cox analysis, the risk score for each sample may be computed.Each sample was classified as either high-risk or low-risk based on a comparison to the training set's median risk score.

Correlation and independent prognostic analysis of clinical pathological features
Clinical data from the TCGA datasets and risk scores were combined and classified according to their clinical pathological characteristics.The Wilcoxon signed-rank test was used to assess risk score differences across groups.In addition, univariate and multivariate Cox regression model studies were conducted to determine if risk scores were independent of the clinic pathological characteristics of PC as predictive risk factors.

Gene set variation analysis (GSVA) and gene set expression analysis (GSEA)
GSVA is a non-parametric, unsupervised technique that calculates the enrichment percentage of a certain gene set in each sample without requiring previous categorization of data [51].In expression datasets, GSVA is often used to quantify changes in metabolic pathway and bioprocess activity.GSEA is a computer tool that determines if a certain gene set differs substantially between two biological states.In expression datasets, GSEA is often used to evaluate changes in pathway and bioprocess activity [52].GSVA was done using the "GSVA" R package to compare the biological processes and metabolic pathways across patients with various subtypes and different risk categories [51,53].The "limma", "org.Hs.eg.db", "clusterProfiler", and "enrichplot" R packages were used to run GSEA [43,47,52].

Immunoassays
To investigate the variations in immune cell infiltration across patients belonging to distinct subtypes or risk groups, the "GSVA" and "GSEABase" R packages were used to determine the immune cell invasion score for each sample using ssGSEA [53,54].Based on the results of ssGSEA analysis, the level of immune cell invasion was compared between patients with PC belonging to different subtypes or risk groups.Additionally, the proportions of 22 immune cell subtypes, including seven T cell subtypes, naïve and memory B cells, plasma cells, natural killer (NK) cells, and bone marrow subsets, were calculated for each sample using the CIBERSORT algorithm [55].P-value < 0.05 suggested that CIBERSORT accurately estimated the fraction of 22 immune cell isotypes.These samples were utilized to analyze the link between risk ratings and immune cell infiltration in further detail.

Drug sensitivity analysis
"OncoPredict" is an R package for predicting drug response in vivo or in patients with cancer from cell line screening data [56].To examine the responsiveness of PC patients to pharmacological treatment, the "OncoPredict" R program was used to evaluate variations in drug sensitivity across patients with distinct subtypes or risk groups [56].

Gene expression analysis
The Gene Expression Profiling Interactive Analysis (GEPIA) database is an online resource for analyzing genetic differences, relationships, and survival using TCGA and Genotype-Tissue Expression (GTEx) data [57].At the RNA level, the differential expression of the model genes LDHA, IL1R2, and TM4SF1 between tumor and no tumor tissues was assessed.The Human Protein Atlas (HPA) database (https://www.proteinatlas.org/), a public database designed to create maps of protein expression patterns in healthy cells, tissues, and cancer [58][59][60], was used to analyze the differential expression of LDHA, IL1R2, and TM4SF1 between tumor and non-tumor tissues at the protein levels.

Data analysis
Using R software, data analysis and visualization were conducted (Ver 4.1.2).For regularly distributed parameters, the t-test was used to compare the means of two groups.We compared the means of two groups for non-normally distributed parameters using Wilcoxon rank sum tests.Using either the Spearman correlation analysis or the Pearson test, correlation was assessed.
Using the Kaplan-Meier technique, the survival curve of several groups was produced.Using the log-rank test, the survival rates of several groups were compared.At P-value < 0.05, differences were deemed significant.

Availability of data and materials
The datasets analyzed in this work may be found in the Supplementary Materials or contact with the first author.

Identification of neutrophil marker gene expression profiles
The GSE111672 dataset was obtained, which contains the RNA expression data of 27,000 cells from three individuals with PC.The range of genes discovered in each sample, the sequencing depth, and the proportion of mitochondrial content are shown in Figure 1A-1C, respectively.Single cells with a total number of identified genes more than 50, mitochondrial gene expression of less than 5%, and genes found in less than three cells were chosen (Figure 1A-1C).As seen in Figure 1D, 1E, sequencing depth was favorably connected with gene number and mitochondrial content, with correlation values of 0.92 and 0.2, respectively.After this, ten of the original 1500 highly variable genes between cells were labeled (Figure 1F).PCA was used to illustrate the top 15 principal components with P-value < 0.05 for the leading 1500 genes (Figure 1G, 1H).On the basis of the PCA findings, t-SNE analysis was done to further categorize all cells into 15 groups (Figure 1I).Each cell cluster was annotated using Human Primary Cell Atlas reference data.In clusters 2 and 5, neutrophils were found (Figure 1J).350 NRGs were identified in total (Supplementary Table 1).GO functional enrichment and KEGG pathway enrichment studies were then used to study the biological function of these marker genes.Neutrophil marker genes were mostly engaged in biological processes such as leukocyte mobility, leukocyte chemotaxis, ribosomes, and plaque formation, as shown by GO functional enrichment analysis (Figure 2A).KEGG pathway enrichment analysis showed that these genes were largely associated with COVID-19, ribosomes, leukocyte trans endothelial migration, IL17 signaling pathway, TNF signaling route, and apoptosis (Figure 2B).

Classification and characteristics based on the expression characteristics of neutrophil marker genes
Using PCA, the TCGA, GSE28735, GSE62452, GSE57495 and GSE85916 datasets were combined to AGING correct the data.After adjustment, the batch effect was effectively eliminated (Figure 3A, 3B).According to the expression of NRGs, the samples from GSE28735, GSE57495, GSE62452, GSE8516 and TCGA data sets were identified as two immune subtypes, NRGcluster A and NRGcluster B, which respectively represent high and low expression of NRGs.The k value should be adjusted to 2 for best performance, as evidenced by the area under the CDF curve and its relative change (Figure 3C-3E).In Figure 3C, when the slope of the curve in the CDF diagram is the most gentle, the clustering analysis results are the most reliable, and the corresponding k value is the best k value; Figure 3D shows the relative change of area under the CDF curve compared to k and k-1, and since there is no k = 1, the first point represents the total area under the CDF curve for k = 2; Figure 3E shows the matrix heat map when k = 2, and the white part rarely has blue infection, indicated that there is less interference between subgroups.To sum up, k = 2 is the optimal k value, and when k is 3 or other values, the interference between subgroups cannot be minimized.The Kaplan-Meir survival analysis indicated that patients with the NRGcluster A subtype had a poorer survival rate than those with the NRGcluster B subtype (p < 0.001) (Figure 3F).The PCA findings indicated a batch effect between NRGcluster A and B, suggesting that both clusters had strong internal consistency (Figure 3G).The survival and mortality status of the two patient groups were shown as heat maps in Figure 3H, along with the expression of neutrophil marker genes such as SRGN, BCL2A1, ALOX5AP, FPR1, BASP1, and CSFR.Individuals with the NRGcluster A subtype demonstrated elevated expression levels of neutrophil marker genes, suggesting that these patients may have aggressive neutrophil infiltration (Figure 3H).Subsequently, the unique biological processes and metabolic pathways of patients with distinct subtypes were investigated.The heat map generated by GSVA indicated substantial variations between the two groups in the activation and inhibition of 30 metabolic pathways.P53 signaling pathway, cell cycle, base excision repair, amino sugar and nucleotide sugar metabolism, glycolysis and gluconeogenesis, sphingolipid metabolism, endocytosis, glycerolphospholipid metabolism, and linoleic acid metabolism were highly abundant in NRGcluster A. In contrast, NRGcluster B was markedly enriched in glycine, serine, and threonine metabolism, neuroactive ligandreceptor interaction, and calcium signaling pathway (Figure 4A).GSEA demonstrated that the neuroactive ligand-receptor interaction, the calcium signaling system, and the adipocytokine signaling pathway were significantly enriched in NRGcluster B patients.The cell cycle and O-glycan production were dramatically enhanced in NRGcluster A patients (Figure 4B).
We also assessed the TME status of the two subtypes using ssGSEA.Immune cell infiltration differed dramatically across subtypes.Neutrophils, central memory CD8+ T cells, Th1 cells, Th17 cells, and eosinophils were the most abundant infiltrating cells in NRGcluster A, while activated CD8+ T cells were the least abundant (Figure 4C).
The Genetics of Drug Sensitivity in Cancer database was used to evaluate the relationship between the two categories and the sensitivity to numerous first-line chemotherapy treatments.The NRGcluster B subtype was more sensitive to 5-fluorouracil, cisplatin, KRAS (G12C) inhibitor-12, gemcitabine, irinotecan, oxaliplatin, and paclitaxel, but less sensitive to selumetinib and trametinib (Figure 5A-5I).

Genotyping based on differentially expressed genes
Subsequently, the genetic alterations and underlying processes of NRG expression patterns were investigated.Using the "limma" R package, the differentially expressed genes of NRGcluster A and B were selected based on the following criteria: |log2 FC| > 1 and adjusted P-value < 0.05.A total of 285 genes with differential expression were found (Supplementary Table 2).The functions of these differentially expressed genes were then identified.The differentially expressed genes were mostly connected with epidermal (B) KEGG enrichment analysis.AGING development, the apical region of the cell, endopeptidase activity, extracellular matrix structural component, etc., as determined by GO functional enrichment analysis (Figure 6A).The majority of differentially expressed genes were involved in pancreatic secretion, protein digesting, and absorptionrelated processes, as shown by KEGG pathway enrichment analysis (Figure 6B).To examine further the influence of differently expressed genes on PC prognosis and tumor immune microenvironment, all patients were grouped using the "Consensus Cluster Plus" R program based on differentially expressed genes.Two immunological isotypes (geneClusters A and B) were distinguished (Figure 6C-6E).Patients with geneCluster A had a considerably shorter survival rate than those with geneCluster B (Figure 6F).The ssGSEA found distinct immune cell infiltration between the two categories.Neutrophils, activated CD4+ T cells, and central memory CD8+ T cells were the most abundant infiltrating cells in geneCluster A, whereas monocytes and activated CD8+ T cells were less abundant (Figure 6G).

Establishment of the PC prognostic risk model based on NRGs
To further study the predictive usefulness of NRGs in PC, a univariate Cox regression analysis was conducted to identify NRGs linked with patient prognosis.Each potential gene is represented by a red dot in the volcano map (Figure 7A).Using the RSFVH method, the 10 most significant genes were analyzed further (Figure 7B).Using an algorithm, survival analysis p-values were analyzed.The optimal gene combination (LDHA, IL1R2, and TM4SF1) was determined (Figure 7C), and a predictive model was created using multivariate Cox regression.The correlation coefficients of the three model genes were shown in Figure 7D, and risk scores of each sample were computed with the help of Cox coefficients and gene expression profiles.The GEO database datasets GSE28735, GSE62452, GSE57495, and GSE85916 were utilized as the validation set.To evaluate the accuracy of the prognostic model, the training and validation sets were split into high-risk and low-risk groups based on the median risk score.Survival study of the TCGA and GEO datasets found that the survival time of high-risk patients was shorter than that of low-risk patients (Figure 7E, 7F).For all three model genes (LDHA, IL1R2, and TM4SF1), the high-expression group had a lower life period than the low-expression group (Figure 7G-7I).The alluvial plot indicated the link between NRGcluster typing, gene Cluster type, risk categorization, and sample survival status (survival or death) and supported the reliability and consistency of the analytical findings (Figure 7J).The majority of genes were elevated in the high-risk group, indicating that high-risk individuals may have aggressive neutrophil infiltration (Figure 7K).

Correlation between risk model and clinical pathological features
After merging clinical data and risk scores from TCGA datasets, the risk scores were compared across clinical groups using the Wilcoxon signed-rank test to investigate the association between the model and the clinical features of the patients.The risk ratings did not vary substantially across ages, genders, or M stages (Figure 8A, 8B and 8E).Due to the limited number of  8G).The difference between N1 and N0 patients was close to statistical significance (p = 0.083), and stage N1 patients had a higher risk score (Figure 8D).Age, pathological grade, and risk score were substantially linked with the prognosis of patients with PC, as evidenced by the forest plot of univariate Cox analysis findings (Figure 8H).Furthermore, the forest plot of multivariate Cox analysis findings revealed that age and risk score are independent prognostic markers for PC patients (Figure 8I).

Molecular pathway and immune infiltration analysis based on the prognostic risk model
To assess the metabolic pathway activity in high-risk and low-risk samples, GSVA and GSEA were conducted.The high-risk group had considerably greater glycan biosynthesis, P53 signaling pathway, and cell cycle activity than the low-risk group.In contrast, the activities of neuroactive ligand-receptor interaction and calcium were increased in the group at low risk (Figure 9A).Cell cycle and olfactory transduction were typically active in the high-risk group, while maturityonset diabetes of the young and neuroactive ligandreceptor interaction were passive (Figure 9B).To further study the link between risk score and TME in PC, ssGSEA was used to analyze immune cell infiltration across groups with high and low risk.The high-risk group had higher infiltration of activated CD4+ T cells, NK cells, and neutrophils, whereas the low-risk group demonstrated increased infiltration of activated CD8+ T cells and activated B cells (Figure 9C).In addition, the CIBERSORT method was used to calculate the infiltration scores of 22 immune cell subtypes in each sample (Supplementary Figure 1) (7 T cell subtypes, naïve and memory B cells, plasma cells, NK cells, bone marrow subsets, etc.).Neutrophils, M0 macrophages, active dendritic cells, and resting dendritic cells associated considerably and positively with the risk score, while CD8+ T cells, naïve B cells, and monocytes linked significantly and negatively with the risk score (Figure 9D-9J).

Drug sensitivity analysis
The R program "oncoPredict" was used to examine the differential medication sensitivity between high-risk and low-risk groups.Low-risk patients were sensitive to 5-fluorouracil, cisplatin, gemcitabine, irinotecan, oxaliplatin, and paclitaxel, whereas high-risk patients were susceptible to selumetinib and trametinib (Figure 10A-10H).These results help influence clinical medication for PC patients.

Gene expression analysis
To evaluate the differential expression levels of model genes LDHA, IL1R2, and TM4SF1 across tumor and non-tumor tissues at the RNA level, scatter plots were generated using the GEPIA database.The expression of model genes was higher in tumor tissues compared to non-tumor tissues (Figure 11A-11C).The HPA database was queried for pictures of immunohistochemical investigation of LDHA, IL1R2, and TM4SF1.Figure 11D, 11E demonstrates that LDHA and IL1R2 were highly elevated in PC tissues, but TM4SF1 was moderately expressed in both PC and non-tumor tissues (Figure 11F).

DISCUSSION
The current outlook for PC, an aggressive malignant tumor, is dismal.PDAC, the most common form of pancreatic cancer, is a highly aggressive and moderately resistant pancreatic exocrine tumor.Previous research has demonstrated that PDAC tumors exhibit intratumor heterogeneity, which contributes to the extensive metastasis observed at the time of diagnosis and poor prognosis and poses a significant challenge to medical professionals attempting to develop effective treatment strategies [61,62].In addition to imaging techniques (such as computed tomography and magnetic resonance imaging), clinics routinely utilize tumor markers (such as CEA and CA199) to diagnose PC.Nonetheless, the sensitivity and specificity of these indicators for diagnosing PC remain inadequate.In addition, these markers are not considerably elevated during the disease's early stages [63].This research focused on TME, which directly contributes to the aggressiveness of PC, in order to discover precise biomarkers and effective targeted treatments and to comprehend the prospective features of PC.Neutrophils, which serve as the first line of defense against pathogen invasion, are a crucial component of the TME.In addition, neutrophils are essential for antitumor immunity.Following carcinogenesis, neutrophils recruited to the TME may facilitate interactions between tumor cells and other stromal cells, therefore influencing the direct or indirect development of the tumor [19,22,64].Neutrophils may also promote the development of primary pancreatic tumors by producing neutrophil extracellular traps, therefore promoting cancer-associated hypercoagulability and accelerating the formation of metastatic lesions [65][66][67].
Prior research has showed that single-cell transcriptome analysis may determine cell type abundance using a large number of transcriptome-specific genes.In cancer research, advances in the creation and availability of single-transcriptome analysis have improved cancer detection and therapy [68][69][70].This work merged several single-cell datasets, identified main cell subtypes (including T cells, neutrophils, monocytes, tissue stem cells, epithelial cells, and endothelial cells), and obtained neutrophil-specific marker genes.The neutrophils were then extensively tested to determine tumor characteristics.On the basis of univariate Cox regression analysis, prognosis-related neutrophil marker genes were evaluated.A prognostic risk model consisting of three important genes (LDHA, IL1R2, and TM4SF1) was developed.The model's independent prediction performance and connection with clinical pathological characteristics, degree of immune infiltration, and medication sensitivity were validated.Our results revealed that neutrophils play a crucial role in PC therapy responsiveness and prognosis.
Clustering facilitated the classification of PC patients from the TCGA, GSE28735, GSE62452, GSE57495, and GSE85916 datasets into NRGcluster A and NRGcluster B. The patient prognosis, characteristics of TME immune cell infiltration, biological behavior, and medication sensitivity differed dramatically between the two neutrophil clusters.The outcomes of GSVA and ssGSEA suggested that NRGcluster A was enriched in the cell cycle and cancer metabolic pathway and that neutrophil infiltration was discernable.Prior research and Kaplan-Meir curves revealed that the prognosis for patients with NRGcluster A type was bad.Subsequently, 285 genes with differential expression between NRGclusters were discovered.Similar to the clustering findings of neutrophil phenotypes, two genomic subgroups were established on the basis of distinctive genes highly linked with cell behavior and immunological activation.Results suggested that neutrophil activation is critical for the growth and prognosis of PC tumors.
Given the variety of NRGs, there is an urgent need to characterize NRG patterns in PC patients.This research thus devised a scoring system for measuring NRGs in PC patients.The survival prediction model included LDHA, IL1R2, and TM4SF1.Using the GSE28735, GSE62452, GSE57495, and GSE85916 datasets, the correctness of the model was validated.LDHA, which is increased in hematological malignancies and solid cancers, encodes the M-subunit of lactate dehydrogenase (LDH), which regulates aerobic glycolysis to fulfill the energy and biosynthetic demands of malignant tumors [71,72].According to reports, LDH overexpression is an independent poor prognostic marker in individuals with PC.Furthermore, in hypoxic circumstances, the upregulated LDH stimulates the production of LDHA in PDAC cells [73,74].LDH synthesis caused by LDHA boosts the aggressiveness and metastasis of PC and enhances resistance to radiation and chemotherapy [75,76].In renal cell carcinoma, LDHA promotes metastasis by stimulating epithelial-mesenchymal transformation, which can be inhibited by LDHA inhibitors [77].IL1R2 is located on chromosome 2q12 and encodes IL1R2, which comprises 398 amino acids.In contrast to IL1R1, IL1R2 mainly acts as an endogenous inhibitor to prevent IL1 from binding to IL1R1 and inhibits IL1 signaling [78].In renal cell carcinoma, IL1R2 promotes tumor progression through the JAK2/STAT3 pathway.The overexpression of IL1R2 reverses G1 phase inhibition and cell cycle arrest, promoting cancer cell proliferation, migration, and invasion [79].Meanwhile, in colorectal cancer, IL1R2 is involved in activating IL6, VEGFA, and MEK/ERK, promoting cell proliferation and metastasis, promoting tumor angiogenesis, and enhancing tumor drug resistance [80].IL1R2 is also reported to be involved in the progression of osteosarcoma, non-small cell lung cancer, liver cancer, and lymphoma [81][82][83][84].TM4SF1, which is located on human chromosome 3, encodes a specific quadruple transmembrane low molecular weight glycoprotein.Previous studies have demonstrated that TM4SF1 promotes cell proliferation and survival through JAK2/STAT3 signaling and PI3K/AKT/mTOR-related signaling pathways.Additionally, TM4SF1 can promote cell migration, invasion, and stemness maintenance through the Wnt/β-catenin/c-Myc/SOX2 axis in various cancer, such as colorectal cancer [85][86][87].One study reported that TM4SF1 was upregulated in glioma tumor tissues and cell lines and that TM4SF1 expression was inversely correlated with patient survival [88].Additionally, the correlation between three key genes and the tumor immune microenvironment comprising neutrophils has also been explored in previous studies.For example, LDHA downregulation mediated by the PI3K/Akt-HIF-1α pathway results in glycolytic inhibition, which leads to neutrophil immunosuppression during sepsis [89].In PDAC, the expression of IL1R2, encoding the bait receptor IL1R2, is associated with an increased abundance of Th2 cells and eosinophils [90].Similar to CD63, TM4SF1 encodes transmembrane proteins that are also involved in signaling in immune cells, such as neutrophils [91].
According to these data, LDHA, IL1R2, and TM4SF1 are risk genes for PC, which is consistent with the findings of prior research.The high-risk group had a higher infiltration of activated CD4+ T cells, NK cells, and neutrophils, in addition to poor clinical pathological features and prognosis.Lastly, the GEPIA and HPA databases were consulted to validate the RNA and protein concentrations, respectively, that the model predicted.
In conclusion, based on the results of bioinformatics studies, we have revealed the molecular characteristics and prognostic role of neutrophils in pancreatic cancer, which will be helpful for the diagnosis and treatment of pancreatic cancer patients.However, the online database can only be used for the verification of early protein levels, the experimental results may not support the findings of computer analysis, and the role of neutrophils in the etiology of pancreatic cancer has not been determined, more experimental studies are needed to confirm the conclusions of this study.In addition, pancreatic cancer may present distant metastases AGING in vivo, and the samples evaluated in this study did not include lung, bone, brain, and other types of metastases, which will be the subject of our future research.Therefore, more large-sample and multicenter studies will be necessary to verify the validity of our diagnostic and prognostic models.

CONCLUSIONS
This research revealed an extensive variety of regulatory mechanisms used by NRGs in the TME in PC.Differences in NRG expression patterns contribute greatly to the variety and complexity of the TME in individuals with PC.The developed neutrophil-related molecular typing and prognostic models will facilitate the thorough assessment of individual patients with PC and enhance our comprehension of the cell infiltration features of the TME.This novel biomarker will lead the development of personalized and successful targeted therapies.

Figure 1 .
Figure 1.Single-cell analysis and screening of neutrophils-related genes (NRGs).(A) A visualization showing the number of genes in each cell as a violin.(B) A violin plot showing the total of each cell's gene expression levels.(C) Violin plot showing each cell's proportion of mitochondrial genes.(D) Scatterplot of the total gene expression levels and the number of genes present in each cell.(E) Scatterplot comparing the total gene expression levels in each cell to the proportion of mitochondrial genes.(F) Genes that vary significantly between cells.Red dots denote 1500 genes whose expression levels have changed significantly.The top ten most variable genes are labeled in the graph.(G) The cell distribution in the PC1 and PC2 dimensions.(H) Principal component analysis identified the top 15 principal components with a P-value < 0.05.(I) The study of t-Distributed stochastic neighbor embedding (t-SNE) identified 15 cell clusters.(J) Using maker genes, cell subtypes were further annotated and labelled.

Figure 2 .
Figure 2. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses.(A) GO enrichment analysis.

Figure 3 .
Figure 3. Establishment of neutrophils-related genes (NRGs) cluster subtypes.(A) Principal component analysis (PCA) plot of the overall expression of all samples before batching.(B) PCA plot of the overall expression of all samples after batching.(C-E) Process and results of consistency cluster analysis, which classified the data into NRGclusters A and B. (F) Survival curves of patients with NRGclusters A and B. The prognosis of patients with NRGcluster B was significantly better than that of patients with NRGcluster A. (G) PCA plot of NRGclusters A and B. (H) The expression of neutrophil-associated genes in NRGclusters A and B, most neutrophil-associated genes are upregulated in NRGcluster A.

Figure 6 .
Figure 6.Formation of subtypes of gene clusters.(A) Analysis of Gene Ontology (GO) enrichment.(B) Enrichment analysis of the Kyoto Encyclopedia of Genes and Genomes (KEGG).(C-E) The methodology and outcomes of consistency cluster analysis, which categorized the data into geneClusters A and B. (F) The survival curves for patients with the A and B geneClusters.Patients with geneCluster B had a much better prognosis than those with geneCluster A. (G) Single-sample gene set enrichment analysis (ssGSEA) findings.

Figure 7 .
Figure 7. Construction of the prognostic risk score model.(A) Volcano plot of univariate Cox regression analysis results.Red dots represent neutrophil-related genes associated with pancreatic cancer prognosis.(B) Random survival forest analysis identified the top 10 important genes.(C) After survival analysis of 2 10 −1 = 1023 combinations, the top 20 models were sorted according to P-values.The best model comprised three genes.(D) Coefficients of genes in the model.(E) The Cancer Genome Atlas (TCGA) dataset (Kaplan-Meir curve of the training set).(F) GSE28735, GSE62452, GSE57495, and GSE85916 datasets from the Gene Expression Omnibus (GEO) platform (Kaplan-Meir curves of the validation set).Survival curves of patients according to the expression of LDHA (G), IL1R2 (H), and TM4SF1 (I).(J) Sankey plots of different subtypes, risk groups, and survival states.(K) The expression of neutrophil-associated genes in high-risk and low-risk groups, most neutrophil-associated genes are upregulated in the high-risk groups.

Figure 8 .
Figure 8. Correlation and independent prognostic analysis of clinical pathological features.Differential risk scores between age groups (A), gender groups (B), different T stages (C), different N stages (D), different M stages (E), different pathological grades (F), and different clinical stages (G).Forest plots of univariate Cox analysis (H) and multivariate Cox analysis results (I).

Figure 9 .
Figure 9. Analysis of metabolic pathways and immune infiltration using the prognostic models.(A) Gene set variation analysis (GSVA) results.(B) Gene set enrichment (GSEA) results.(C) Single-sample GSEA (ssGSEA) results.Risk scores were significantly and positively correlated with neutrophil infiltration levels (D), significantly and negatively correlated with CD8+ T cell infiltration levels (E), significantly and negatively correlated with naïve B cell infiltration levels (F), significantly and positively correlated with M0 macrophage infiltration levels (G), significantly and negatively correlated with monocyte infiltration levels (H), significantly and positively correlated with activated dendritic cell infiltration levels (I), and significantly and positively correlated with resting dendritic cell infiltration levels (J).

Figure 11 .
Figure 11.Validation of levels of gene expression.In pancreatic tumor tissues, the expression levels of LDHA (A), IL1R2 (B), and TM4SF1 (C) were considerably greater than in healthy tissues.Immunohistochemical examination indicated that the expression levels of LDHA (D) and IL1R2 (E) were considerably elevated in pancreatic tumor tissues compared to healthy tissues, TM4SF1 (F) was moderately expressed in both pancreatic tumor tissues compared to healthy tissues.