Research Paper

Identification of lncRNA biomarkers for lung cancer through integrative cross-platform data analyses

Tianying Zhao1,2, Vedbar Singh Khadka1, Youping Deng1

  • 1 Department of Quantitative Health Sciences, University of Hawaii John A. Burns School of Medicine, The University of Hawaii at Manoa, Honolulu, HI 96813, USA
  • 2 Department of Molecular Biosciences and Bioengineering, The University of Hawaii at Manoa College of Tropical Agriculture and Human Resources, Agricultural Sciences 218, Honolulu, HI 96822, USA

Received: January 9, 2020       Accepted: June 1, 2020       Published: July 16, 2020

Copyright © 2020 Zhao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


This study aimed to identify candidate lncRNA biomarkers for lung cancer by analyzing data from RNA-Seq and microarray platforms separately and validating the results across platforms.

Lung cancer datasets were obtained from the Gene Expression Omnibus (GEO, n = 287) and The Cancer Genome Atlas (TCGA, n = 216) repositories; only lncRNAs shared across the datasets were analyzed. Differentially expressed (DE) lncRNAs in tumor relative to normal tissue were selected from the Affymetrix and TCGA datasets. A model trained on the top 20 DE lncRNAs from the Affymetrix dataset was validated in the TCGA and Agilent datasets. A second, analogous model was trained on the TCGA dataset and validated in the Affymetrix and Agilent datasets.
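The selection step described above (keep only lncRNAs shared across datasets, rank tumor-versus-normal differences, keep the top 20) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the DE method, so a Welch t-statistic on log-scale expression stands in for it, and the data layout and function names (`common_lncrnas`, `welch_t`, `top_de`) are hypothetical.

```python
"""Hypothetical sketch of cross-platform lncRNA filtering and top-k DE ranking.

Assumptions (not from the paper): expression values are already log-scaled,
and a Welch t-statistic is used as the DE score. Each dataset is a dict
mapping lncRNA ID -> {"tumor": [values], "normal": [values]}.
"""
import math
from statistics import mean, variance


def common_lncrnas(*datasets):
    """Return the sorted lncRNA IDs present in every dataset."""
    shared = set(datasets[0])
    for ds in datasets[1:]:
        shared &= set(ds)
    return sorted(shared)


def welch_t(tumor, normal):
    """Welch two-sample t-statistic (tumor minus normal); needs >= 2 values per group."""
    diff = mean(tumor) - mean(normal)
    se = math.sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    if se == 0:
        # Degenerate case: no within-group variance in either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_de(dataset, ids, k=20):
    """Rank the given lncRNA IDs by |t| within one dataset and keep the top k."""
    scored = [(abs(welch_t(dataset[g]["tumor"], dataset[g]["normal"])), g) for g in ids]
    # Largest |t| first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [g for _, g in scored[:k]]
```

With toy data in which one lncRNA is strongly up in tumors and another strongly down, `top_de(data, common_lncrnas(data, other), k=2)` returns those two IDs ordered by |t|. A moderated statistic (for example, limma's) is a common alternative when group sizes are small.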

A model trained on the top 20 DE lncRNAs from the Affymetrix dataset achieved high prediction accuracy in both training (AUC 98.5% in Affymetrix) and validation (AUC 99.2% in TCGA and 92.8% in Agilent). A similar model trained on the top 20 DE lncRNAs from TCGA also performed well in training (AUC 97.7% in TCGA) and validation (AUC 96.5% in Affymetrix and 80.9% in Agilent). Eight lncRNAs were shared between the two top-20 lists.
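The abstract does not state which classifier produced the prediction scores or how AUC was computed. As an illustration only, the sketch below computes ROC AUC in its rank (Mann-Whitney) form, the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample with ties counted as 1/2, and finds the lncRNAs shared by two ranked lists. The function names `roc_auc` and `overlap` are hypothetical.

```python
"""Hypothetical sketch: rank-based ROC AUC and overlap of two top-k lists.

The scores would come from whatever model was trained on one platform and
applied to another; that model is not specified here.
"""


def roc_auc(scores, labels):
    """AUC from classifier scores; labels are 1 for tumor and 0 for normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # tumor sample correctly ranked above normal
            elif p == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(pos) * len(neg))


def overlap(list_a, list_b):
    """IDs present in both ranked lists, in the order they appear in list_a."""
    b = set(list_b)
    return [g for g in list_a if g in b]
```

For example, `roc_auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0])` gives 0.75, because three of the four tumor/normal pairs are ordered correctly. Applied to two top-20 lists, `len(overlap(top20_a, top20_b))` counts the shared lncRNAs. The pairwise loop is quadratic, which is adequate for cohorts of a few hundred samples.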


Abbreviations: LUAD/ADC: adenocarcinoma; LUSC/SCC: squamous cell carcinoma; SCLC: small cell lung cancer; NSCLC: non-small cell lung cancer; lncRNA: long non-coding RNA; GEO: Gene Expression Omnibus; TCGA: The Cancer Genome Atlas; PCA: principal component analysis; TANRIC: The Atlas of ncRNA in Cancer; DAVID: Database for Annotation, Visualization, and Integrated Discovery; TARGET: Tumor Alterations Relevant for Genomics-driven Therapy.