Research Paper Volume 12, Issue 23 pp 23917–23930

Molecular subtypes based on DNA promoter methylation predict prognosis in lung adenocarcinoma patients

Shanping Shi1, Mingjun Xu1, Yang Xi1

  • 1 Diabetes Center, Zhejiang Provincial Key Laboratory of Pathophysiology, Institute of Biochemistry and Molecular Biology, School of Medicine, Ningbo University, Ningbo 315211, China

Received: July 6, 2020       Accepted: August 25, 2020       Published: November 24, 2020

Copyright: © 2020 Shi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Background: The heterogeneity of lung adenocarcinoma (LADC) complicates the early diagnosis and treatment of the disease. Gene silencing through DNA methylation is an important mechanism of tumorigenesis, and combining methylation data with clinical features can improve the classification of LADC heterogeneity.

Results: We investigated the prognostic significance of DNA methylation-based subgroups in 335 lung adenocarcinoma specimens. Differences in DNA methylation levels were associated with TNM stage, age, gender, and prognosis. Using consensus clustering on 774 CpG sites that significantly affected survival, we identified seven subtypes. Finally, we constructed a prognostic model that performed well and validated it in our test set.
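To illustrate how consensus clustering scores the stability of a given number of subtypes, the following is a minimal sketch, not the authors' pipeline. It assumes k-means as the base clusterer, 50 resamples of 80% of the samples, and the proportion of ambiguous clustering (PAC, the rise of the consensus CDF between 0.1 and 0.9) as the stability score; the helper names `kmeans`, `consensus_matrix`, and `pac` are hypothetical, and the abstract does not state which base algorithm or resampling settings were used.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means on equal-length vectors; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each sample to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it in place if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus_matrix(samples, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which two samples share a cluster, among the
    resamples in which both were drawn (assumed settings: 50 draws of 80%)."""
    rng = random.Random(seed)
    n = len(samples)
    m = max(k, int(frac * n))
    together = [[0] * n for _ in range(n)]  # co-clustered counts
    drawn = [[0] * n for _ in range(n)]     # co-sampled counts
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)
        labels = kmeans([samples[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                drawn[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / drawn[i][j] if drawn[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def pac(cm, lo=0.1, hi=0.9):
    """Proportion of ambiguous clustering: CDF(hi) - CDF(lo) of off-diagonal consensus values."""
    vals = [cm[i][j] for i in range(len(cm)) for j in range(len(cm)) if i != j]
    return sum(lo < v < hi for v in vals) / len(vals)
```

In practice one would compute `pac(consensus_matrix(data, k))` over a range of k, where each row of `data` holds one patient's beta values at the selected CpG sites, and favor values of k with low ambiguity. How the authors arrived at seven subtypes is not described in this abstract.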

Conclusions: This study shows that classification based on DNA methylation may help reveal heterogeneity within previously characterized LADC molecular subtypes, which could assist in the development of effective, personalized therapy.

Methods: Lung adenocarcinoma methylation data were downloaded from the University of California Santa Cruz (UCSC) Cancer Browser, and clinical patient information and RNA-seq data were obtained from The Cancer Genome Atlas (TCGA). CpG sites significantly correlated with prognosis were identified and then used to cluster the cases into subtypes.
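The abstract does not name the test used to decide which CpG sites are "significantly correlated with prognosis"; univariate Cox regression is a common choice. As a simpler stand-in, the sketch below splits patients at the median beta value of each CpG site and applies a two-group log-rank test. The helpers `logrank_p` and `screen_cpgs`, the median split, and the threshold `alpha=0.05` without multiple-testing correction are all assumptions for illustration, not the authors' procedure.

```python
import math
import statistics

def logrank_p(times, events, group):
    """Two-group log-rank test; returns the chi-square (1 df) p-value.
    events: 1 = death observed, 0 = censored; group: 0/1 label per patient."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)           # all at risk, group-1 at risk
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, group) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n                 # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return math.erfc(math.sqrt(chi2 / 2))            # upper tail of chi-square with 1 df

def screen_cpgs(beta, times, events, alpha=0.05):
    """beta maps a CpG id to per-patient beta values (same order as times).
    Keeps CpGs whose median split gives log-rank p < alpha (assumed criterion)."""
    kept = {}
    for cpg, values in beta.items():
        med = statistics.median(values)
        group = [1 if v > med else 0 for v in values]
        if 0 < sum(group) < len(group):              # skip CpGs that cannot be split
            p = logrank_p(times, events, group)
            if p < alpha:
                kept[cpg] = p
    return kept
```

For example, with hypothetical sites `cg_prognostic` (high methylation in the patients who die early) and `cg_null` (methylation unrelated to survival time), only `cg_prognostic` would be retained. With hundreds of thousands of probes, a real screen would also need a multiple-testing correction.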


Abbreviations: CpG: cytosine-phosphate-guanine; NSCLC: non-small cell lung cancer; KNN: k-nearest neighbors; SNPs: single nucleotide polymorphisms; CDF: cumulative distribution function; SD: standard deviation; CV: coefficient of variation; MN: mean; KEGG: Kyoto Encyclopedia of Genes and Genomes; HR: hazard ratio; AUC: area under the curve; HOXA9: Homeobox A9; BVI: blood vessel invasion; STXBP6: syntaxin binding protein 6; CEP55: centrosomal protein 55; PITX1: paired like homeodomain 1; TGFBI: transforming growth factor beta induced; CCDC181: coiled-coil domain containing 181; PLAU: plasminogen activator, urokinase; S1PR1: sphingosine-1-phosphate receptor 1; KLHDC9: kelch domain containing 9.