Research Paper Volume 13, Issue 3, pp 4024–4044

Identification of candidate genes encoding tumor-specific neoantigens in early- and late-stage colon adenocarcinoma

Chong Wang1, Wenhua Xue2, Haohao Zhang3, Yang Fu4

  • 1 Department of Hematology, The First Affiliated Hospital of Zhengzhou University, Henan, China
  • 2 Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Henan, China
  • 3 Department of Endocrinology, The First Affiliated Hospital of Zhengzhou University, Henan, China
  • 4 Department of Gastrointestinal Surgery, The First Affiliated Hospital of Zhengzhou University, Henan, China

Received: June 24, 2020       Accepted: October 31, 2020       Published: January 10, 2021

Copyright: © 2021 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Colon adenocarcinoma (COAD) is one of the most common malignant gastrointestinal tumors and is characterized by a high mortality rate. Here, we integrated whole-exome and RNA sequencing data from The Cancer Genome Atlas and investigated the mutational spectra of COAD-overexpressed genes to define clinically relevant diagnostic/prognostic signatures and to uncover functional relationships with both tumor-infiltrating immune cells and regulatory miRNAs. We identified 24 recurrently mutated genes (frequency > 5%) encoding putative COAD-specific neoantigens. Five of them (NEB, DNAH2, ABCA12, CENPF and CELSR1) had not been previously reported as COAD biomarkers. Through machine learning-based feature selection, four early-stage-related (COL11A1, TG, SOX9 and DNAH2) and four late-stage-related (COL11A1, SOX9, TG and BRCA2) candidate neoantigen-encoding genes were selected as diagnostic signatures. These signatures showed 100% and 97% accuracy, respectively, in predicting early- and late-stage patients, and an 8-gene signature had excellent prognostic performance in predicting disease-free survival (DFS) in COAD patients. We also found significant correlations between the 24 candidate neoantigen genes and the abundance and/or activation status of 22 tumor-infiltrating immune cell types and 56 regulatory miRNAs. Our novel neoantigen-based signatures may improve diagnostic and prognostic accuracy and help design targeted immunotherapies for COAD treatment.
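
As an illustration of the recurrence criterion (frequency > 5%), the following minimal sketch computes per-gene mutation frequency across samples and keeps genes above the threshold. It is not the pipeline used in this study: the function name, the (sample, gene) input layout and the toy data are assumptions made for this example only.

```python
from collections import defaultdict


def recurrently_mutated_genes(mutations, n_samples, min_freq=0.05):
    """Return {gene: frequency} for genes mutated in more than min_freq of samples.

    mutations: iterable of (sample_id, gene) pairs, one pair per called mutation
               (hypothetical layout; real mutation calls carry many more fields).
    n_samples: total number of sequenced samples in the cohort.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in mutations:
        # A sample counts once per gene, however many mutations it carries there.
        samples_by_gene[gene].add(sample_id)

    frequencies = {
        gene: len(samples) / n_samples
        for gene, samples in samples_by_gene.items()
    }
    # Strict inequality, matching "frequency > 5%".
    return {gene: f for gene, f in frequencies.items() if f > min_freq}


# Toy data (invented for illustration; not taken from the cohort).
toy_mutations = [
    ("S1", "COL11A1"),
    ("S2", "COL11A1"),
    ("S1", "SOX9"),
    ("S1", "SOX9"),  # second mutation in the same sample and gene
    ("S3", "TG"),
]
result = recurrently_mutated_genes(toy_mutations, n_samples=20)
print(result)
```

In this toy cohort of 20 samples, only COL11A1 (2 of 20 samples) passes; SOX9 and TG each occur in a single sample, which is exactly 5% and therefore fails the strict threshold.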


COAD: colon adenocarcinoma; TSA: tumor-specific antigens; DEGs: differentially expressed genes; PCA: principal component analysis; GO: Gene Ontology; WGS: whole-genome sequencing; WES: whole-exome sequencing; STAD: stomach adenocarcinoma; ESCA: esophageal carcinoma; LUAD: lung adenocarcinoma; HNSC: head and neck squamous cell carcinoma; OV: ovarian serous cystadenocarcinoma; BRCA: breast invasive carcinoma; PAAD: pancreatic adenocarcinoma; KICH: kidney chromophobe; PRAD: prostate adenocarcinoma; THCA: thyroid carcinoma.