Research Paper
Identification of lncRNA biomarkers for lung cancer through integrative cross-platform data analyses
- 1 Department of Quantitative Health Sciences, University of Hawaii John A. Burns School of Medicine, The University of Hawaii at Manoa, Honolulu, HI 96813, USA
- 2 Department of Molecular Biosciences and Bioengineering, The University of Hawaii at Manoa College of Tropical Agriculture and Human Resources, Agricultural Sciences 218, Honolulu, HI 96822, USA
Received: January 9, 2020; Accepted: June 1, 2020; Published: July 16, 2020; https://doi.org/10.18632/aging.103496
Copyright © 2020 Zhao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This study was designed to identify lncRNA biomarker candidates for lung cancer by analyzing data from RNA-Seq and microarray platforms separately.
Lung cancer datasets were obtained from the Gene Expression Omnibus (GEO, n = 287) and The Cancer Genome Atlas (TCGA, n = 216) repositories; only lncRNAs common to the platforms were used. Differentially expressed (DE) lncRNAs in tumors relative to normal tissue were selected from the Affymetrix and TCGA datasets. A model trained on the top 20 DE lncRNAs from the Affymetrix dataset was validated in the TCGA and Agilent datasets. A second, analogous model was trained on the TCGA dataset.
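The feature-selection step described above (restrict to lncRNAs measured on both platforms, then keep the top 20 DE lncRNAs) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state which DE test was used, so Welch's t-statistic is swapped in as a stand-in ranking score, and the function names (`common_lncrnas`, `welch_t`, `top_de_lncrnas`) and the dictionary-based data layout are hypothetical.

```python
import math
from statistics import mean, variance


def common_lncrnas(platform_a, platform_b):
    """Return lncRNA IDs measured on both platforms, sorted for reproducibility."""
    return sorted(set(platform_a) & set(platform_b))


def welch_t(tumor, normal):
    """Welch's t-statistic for tumor vs. normal expression of one lncRNA.

    Stand-in DE score (assumption); each group needs at least 2 samples.
    """
    se = math.sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    return (mean(tumor) - mean(normal)) / se if se > 0 else 0.0


def top_de_lncrnas(expr, labels, features, k=20):
    """Rank features by |t| (tumor vs. normal) and keep the top k.

    expr:     hypothetical layout, dict lncRNA ID -> list of values, one per sample
    labels:   list of 1 (tumor) / 0 (normal), aligned with the sample order
    features: lncRNA IDs to consider (e.g. the cross-platform common set)
    """
    scores = {}
    for f in features:
        values = expr[f]
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[f] = abs(welch_t(tumor, normal))
    # Largest absolute statistic first; Python's sort is stable for ties.
    return sorted(features, key=lambda f: scores[f], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: three tumor samples followed by three normal samples.
    expr = {"L1": [5, 6, 5, 1, 1, 2], "L2": [2, 2, 3, 2, 3, 2], "L3": [9, 9, 8, 1, 2, 1]}
    labels = [1, 1, 1, 0, 0, 0]
    shared = common_lncrnas(expr, {"L1": None, "L2": None, "L3": None, "L9": None})
    print(top_de_lncrnas(expr, labels, shared, k=2))
```

With real data, `k=20` would match the list size used in the study; a moderated test (as is common for microarray data) could replace `welch_t` without changing the rest of the sketch.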
A model trained on the top 20 DE lncRNAs from the Affymetrix dataset and validated in the TCGA and Agilent datasets achieved high prediction accuracy in both training (AUC of 98.5% for Affymetrix) and validation (AUC of 99.2% for TCGA and 92.8% for Agilent). A similar model trained on the top 20 DE lncRNAs from TCGA and validated in the Affymetrix and Agilent datasets also achieved high prediction accuracy in training (AUC of 97.7% for TCGA) and validation (AUC of 96.5% for Affymetrix and 80.9% for Agilent). Eight lncRNAs were shared between these two top-20 lists.
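The two evaluation quantities reported above, the AUC of a model's tumor-vs-normal scores and the overlap between two top-20 lists, can be illustrated with the sketch below. The abstract does not name the classifier, so this code only assumes that each model outputs one numeric score per sample; the function names `auc` and `overlap` are hypothetical. The AUC is computed in its Mann-Whitney form: the fraction of (tumor, normal) pairs in which the tumor sample scores higher, with ties counted as one half.

```python
def auc(scores, labels):
    """AUC as the probability that a random tumor sample (label 1) scores higher
    than a random normal sample (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one tumor and one normal sample")
    wins = 0.0
    # Compare every tumor/normal pair: O(n_tumor * n_normal), fine for illustration.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def overlap(list_a, list_b):
    """lncRNAs present in both top-k lists, sorted alphabetically."""
    return sorted(set(list_a) & set(list_b))


if __name__ == "__main__":
    print(auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0]))  # 3 of 4 pairs ranked correctly
    print(overlap(["a", "b", "c"], ["c", "a", "d"]))
```

The pairwise form makes the interpretation of an AUC such as 92.8% explicit; for large cohorts a rank-based computation gives the same value in O(n log n).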
LUAD/ADC: adenocarcinoma; LUSC/SCC: squamous cell carcinoma; SCLC: small cell lung cancer; NSCLC: non-small cell lung cancer; lncRNA: long non-coding RNA; GEO: Gene Expression Omnibus; TCGA: The Cancer Genome Atlas; PCA: principal component analysis; TANRIC: The Atlas of ncRNA in Cancer; DAVID: Database for Annotation, Visualization, and Integrated Discovery; TARGET: Tumor Alterations Relevant for Genomics-driven Therapy.