Research Paper Volume 12, Issue 10 pp 9840–9854

Development of a machine learning-based multimode diagnosis system for lung cancer

Shuyin Duan1, Huimin Cao1, Hong Liu2, Lijun Miao2, Jing Wang2, Xiaolei Zhou3, Wei Wang1, Pingzhao Hu4, Lingbo Qu1,5, Yongjun Wu1,6

  • 1 College of Public Health, Zhengzhou University, Zhengzhou 450001, China
  • 2 The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450001, China
  • 3 Henan Provincial Chest Hospital, Zhengzhou 450001, China
  • 4 Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3N4, Canada
  • 5 Henan Joint International Research Laboratory of Green Construction of Functional Molecules and Their Bioanalytical Applications, Zhengzhou 450001, China
  • 6 The Key Laboratory of Nanomedicine and Health Inspection of Zhengzhou, Zhengzhou 450001, China

Received: February 10, 2020       Accepted: April 20, 2020       Published: May 23, 2020

Copyright © 2020 Duan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


As an emerging technology, artificial intelligence has been applied to identify various physical disorders. Here, we developed a three-layer diagnosis system for lung cancer involving three machine learning approaches: decision tree C5.0, artificial neural network (ANN) and support vector machine (SVM). The area under the curve (AUC) was used to evaluate their diagnostic performance. In the first layer, the AUCs of C5.0, ANN and SVM were 0.676, 0.736 and 0.640, respectively, so ANN outperformed both C5.0 and SVM. In the second layer, ANN was similar to SVM but superior to C5.0, as supported by AUCs of 0.804, 0.889 and 0.825. Much higher AUCs of 0.908, 0.910 and 0.849 were obtained in the third layer, where C5.0 showed the highest sensitivity, 94.12%. Based on these data, we propose a three-layer diagnosis system for lung cancer: first, ANN serves as a broad-spectrum screening subsystem that uses 14 items of epidemiological data and clinical symptoms to screen high-risk groups; then, combined with 5 additional tumor biomarkers, ANN serves as an auxiliary diagnosis subsystem to identify suspected lung cancer patients; finally, C5.0 is used to confirm lung cancer based on 22 CT nodule-based radiomic features.
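
As a reading aid for the two metrics quoted above, the following minimal Python sketch shows how an AUC and a sensitivity can be computed from true labels and classifier scores. It is not the authors' code: the function names, the 0.5 threshold and the toy labels and scores are all illustrative assumptions and have nothing to do with the study data.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """Fraction of true positive cases whose score reaches the threshold."""
    pos_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not pos_scores:
        raise ValueError("need at least one positive case")
    detected = sum(1 for s in pos_scores if s >= threshold)
    return detected / len(pos_scores)


# Toy data (hypothetical, NOT from the paper): 1 = lung cancer, 0 = control.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_auc = auc(toy_labels, toy_scores)
toy_sensitivity = sensitivity(toy_labels, toy_scores, threshold=0.5)
```

The pairwise definition is quadratic in the number of cases, which is fine for illustration. For real data a rank-based implementation from a statistics library would normally be preferred.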


CT: computed tomography; DT: decision tree; ANN: artificial neural network; SVM: support vector machine; AUC: area under the receiver operating characteristic curve; LDCT: low-dose computed tomography; ProGRP: progastrin-releasing peptide; VEGF: vascular endothelial growth factor; CEA: carcinoembryonic antigen; CYFRA21-1: cytokeratin 19 fragment; NSE: neuron-specific enolase; PPV: positive predictive value; NPV: negative predictive value; CI: confidence interval.