Abstract

Abnormal DNA methylation is a major early contributor to the development of colon adenocarcinoma (COAD). We conducted a cohort-based systematic investigation of genome-wide DNA methylation using 299 COAD and 38 normal tissue samples from The Cancer Genome Atlas (TCGA). Through conditional screening and machine learning in a training cohort, we identified one hypomethylated and nine hypermethylated differentially methylated CpG sites as potential diagnostic biomarkers and used them to construct a COAD-specific diagnostic model. Unlike previous models, our model precisely distinguished COAD from nine other cancer types (e.g., breast cancer and liver cancer; error rate ≤ 0.05) and from normal tissues in the training cohort (AUC = 1). The diagnostic model was verified in a TCGA validation cohort (AUC = 1) and in five independent cohorts from the Gene Expression Omnibus (AUC ≥ 0.951). Using Cox regression analyses, we established a prognostic model based on six CpG sites in the training cohort and verified it in the validation cohort. The prognostic model predicted patient survival (p ≤ 0.00011, AUC ≥ 0.792) independently of important clinicopathological characteristics of COAD (e.g., gender and age). Thus, our DNA methylation analysis provides precise biomarkers and models for the early diagnosis and prognostic evaluation of COAD.