Research Paper, Volume 13, Issue 10, pp 14499-14521

Identification of a 15-pseudogene based prognostic signature for predicting survival and antitumor immune response in breast cancer

Liqiang Tan1,2,*, Xiaofang He3,4,*, Guoping Shen3

  • 1 Department of Medical Bioinformatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
  • 2 Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou 510080, China
  • 3 Department of Radiation Oncology, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China
  • 4 Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
* Equal contribution

Received: March 3, 2020       Accepted: July 7, 2020       Published: December 16, 2020

Copyright: © 2021 Tan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Pseudogenes are noncoding RNAs that have been shown to play critical roles in oncogenesis and tumor progression. However, their functional roles in breast cancer have not been comprehensively clarified. Here, we systematically analyzed RNA sequencing data for 13,931 pseudogenes in 775 breast cancer patients from The Cancer Genome Atlas dataset and identified 15 prognostic pseudogenes by univariate Cox proportional hazard regression. A risk score model was constructed from these prognostic pseudogenes via LASSO analysis and used to dichotomize patients into low- and high-risk subgroups. Patients in the high-risk group had significantly shorter overall survival than those in the low-risk group. The prognostic value of the 15 pseudogenes and of the risk score model was further validated in the European Genome-Phenome Archive dataset. Furthermore, we performed consensus clustering on the 15 prognostic pseudogenes and found that their expression pattern was significantly associated with tumor malignancy and with the host antitumor immune response, in terms of infiltrating immune cell composition, expression of antigen-presenting genes, cytolytic activity, and T-cell exhaustion markers. This study indicates that these 15 prognostic pseudogenes are significantly correlated with tumor malignancy and host antitumor immune response in breast cancer, and that they might serve as potential targets for immunotherapy.
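As a rough illustration of the risk score step described above, the sketch below computes a linear score per patient (the sum of coefficient times expression over the selected pseudogenes) and splits patients into low- and high-risk groups. The pseudogene names, coefficients, and expression values are made-up toy data, and the median cutoff is an assumption: the abstract does not state which threshold was used.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Return a linear risk score per patient.

    score = sum over selected pseudogenes of (coefficient * expression).
    `coefficients` stands in for the nonzero weights a LASSO-penalized
    Cox model would produce; here they are hypothetical values.
    """
    return {
        patient: sum(coefficients[g] * profile[g] for g in coefficients)
        for patient, profile in expression.items()
    }


def dichotomize(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk.

    The default cutoff is the median score (an assumption, not taken
    from the paper). Scores strictly above the cutoff are 'high'.
    """
    if cutoff is None:
        cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Toy data: two hypothetical pseudogenes and four hypothetical patients.
coefficients = {"PG_A": 0.5, "PG_B": -0.25}
expression = {
    "patient_1": {"PG_A": 2.0, "PG_B": 4.0},
    "patient_2": {"PG_A": 4.0, "PG_B": 0.0},
    "patient_3": {"PG_A": 0.0, "PG_B": 4.0},
    "patient_4": {"PG_A": 8.0, "PG_B": 4.0},
}

scores = risk_scores(expression, coefficients)
groups = dichotomize(scores)

for patient in expression:
    print(patient, scores[patient], groups[patient])
```

In a real analysis the coefficients would come from a fitted penalized Cox model, and the two groups would then be compared with a survival test such as the log-rank test.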


PD-1: programmed cell death 1; PD-L1: programmed cell death 1 ligand 1; PD-L2: programmed cell death 1 ligand 2; CTLA-4: cytotoxic T-lymphocyte-associated protein 4; ceRNA: competitive endogenous RNA; TCGA: The Cancer Genome Atlas; EGA: European Genome-Phenome Archive; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; GSEA: Gene Set Enrichment Analysis; LASSO: least absolute shrinkage and selection operator; ROC: receiver operating characteristic curve; AUC: area under the curve; NCF1C: neutrophil cytosolic factor 1 pseudogene; HLA: human leukocyte antigen; RPL13AP20: L13P family of ribosomal proteins pseudogene 20; PGM5P2: phosphoglucomutase 5 pseudogene 2; HERC2P4: HECT and RLD domain containing E3 ubiquitin protein ligase 2 pseudogene 4; HSP90AB2P: heat shock protein 90 alpha family class B member 2 pseudogene; DHX40P1: DEAH-box helicase 40 pseudogene 1; RRN3: RRN3 homolog, RNA polymerase I transcription factor; SDHAP1: succinate dehydrogenase complex flavoprotein subunit A pseudogene 1; RPL23AP53: L23P family of ribosomal proteins pseudogene 53; P1: patient subgroup 1; P2: patient subgroup 2; GZMA: granzyme A; PRF1: perforin 1; LAG3: lymphocyte-activation gene 3; TIM3: T-cell immunoglobulin and mucin-domain containing-3; CCR4: C–C chemokine receptor type 4; TIGIT: T cell immunoreceptor with Ig and ITIM domains; ICOS: inducible T-cell costimulator.