Research Paper Volume 13, Issue 2 pp 1972–1988

Plasma cytokines for predicting diabetic retinopathy among type 2 diabetic patients via machine learning algorithms

Bin Cao1,2, Ning Zhang1,2, Yuanyuan Zhang1,2, Ying Fu1,2, Dong Zhao1,2

  • 1 Center for Endocrine Metabolism and Immune Diseases, Beijing Luhe Hospital, Capital Medical University, Beijing 101149, China
  • 2 Beijing Key Laboratory of Diabetes Research and Care, Beijing 101149, China

Received: May 28, 2020       Accepted: October 9, 2020       Published: December 11, 2020

Copyright: © 2020 Cao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Aims: This study aimed to investigate changes of plasma cytokines and to develop machine learning classifiers for predicting non-proliferative diabetic retinopathy among type 2 diabetes mellitus patients.

Results: There were 12 plasma cytokines significantly higher in the non-proliferative diabetic retinopathy group in the pilot cohort. The validation cohort showed that angiopoietin 1, platelet-derived growth factor-BB, tissue inhibitors of metalloproteinase 2 and vascular endothelial growth factor receptor 2 were significantly higher in the NPDR group. Machine learning algorithms using the random forest yielded the best performance, with sensitivity of 92.3%, specificity of 75%, PPV of 82.8%, NPV of 88.2% and area under the curve of 0.84.

Conclusions: Plasma angiopoietin 1, platelet-derived growth factor-BB, and vascular endothelial growth factor receptor 2 were associated with presence of non-proliferative diabetic retinopathy and may be good biomarkers that play important roles in pathophysiology of diabetic retinopathy.

Materials and Methods: In the pilot cohort, 60 plasma cytokines were simultaneously measured. In the validation cohort, angiopoietin 1, CXC-chemokine ligand 16, platelet-derived growth factor-BB, tissue inhibitors of metalloproteinase 1, tissue inhibitors of metalloproteinase 2, and vascular endothelial growth factor receptor 2 were validated using ELISA kits. Machine learning algorithms were developed to build a prediction model for non-proliferative diabetic retinopathy.


Diabetic retinopathy (DR), one of the most prominent microvascular complications of diabetes mellitus (DM), is the leading cause of vision impairment and new-onset blindness in the working-age population and diabetes mellitus patients [1, 2]. The increase in the global prevalence of diabetic eye diseases, comprising DR and diabetic macular edema (DME), is intimately connected to the soaring prevalence of DM [3–5]. It was reported that across China, the prevalence of DR and sight-threatening DR were 27.9% and 12.6% in diabetic patients, respectively [6].

On the algorithmic side, deep learning techniques have been used for automated detection of DR and DME based on features in retinal fundus photographs, and have achieved robust performance [7–10]. Although image-based features of DR are well known, knowledge about its protein phenotype is limited. It is accepted that angiogenesis and inflammation crosstalk are intrinsic components of DR [11, 12]. Increasing evidence shows that, in retinal cells and tissues, various cytokines, including vascular endothelial growth factor (VEGF), matrix metalloproteinases (MMPs), and tissue inhibitors of metalloproteinases (TIMPs), play essential roles in the progression of DR via angiogenic, inflammatory, and fibrotic reactions [13–17]. Thus, cytokines play important roles in the pathophysiology of DR. However, the associations between plasma cytokines and non-proliferative DR (NPDR) are unclear.

This is the first study to investigate the associations between plasma cytokines and non-proliferative DR (NPDR) and to build a prediction model for NPDR. In this study, we hypothesized that the pathological processes leading to NPDR cause characteristic changes in the concentrations of plasma proteins. We then investigated the characteristic changes in plasma cytokines, which generate a detectable disease-specific protein phenotype, and finally developed machine learning classifiers for NPDR at the protein level.


Study subjects

For plasma protein profiling, 14 patients with NPDR and 14 patients with T2DM were selected as the pilot cohort. The mean ages of patients with NPDR or T2DM were 62.71 vs. 58.50 years, respectively, and the mean durations of diabetes were 13.57 vs. 8.08 years, respectively. The proportion of hypertension was significantly higher in the NPDR group (78.6% vs. 28.6%, p = 0.023). For validation, 115 patients with NPDR and 115 patients with T2DM were selected as the validation cohort. The mean ages of patients with NPDR or T2DM were 60.40 vs. 58.63 years, respectively, and the mean durations of diabetes were 8.69 vs. 6.92 years, respectively. Likewise, the proportion of hypertension was significantly higher in the NPDR group (60.9% vs. 47.0%, p = 0.047) (Table 1).

Table 1. Clinical characteristics of the study population.

Continuous variables are given as mean ± SD; categorical variables as number (%).

| Clinical characteristics | Pilot cohort: DM (n=14) | Pilot cohort: DR (n=14) | p | Validation cohort: DM (n=115) | Validation cohort: DR (n=115) | p |
|---|---|---|---|---|---|---|
| Age (years) | 58.50 ± 8.31 | 62.71 ± 7.63 | 0.174 | 58.63 ± 14.24 | 60.40 ± 12.04 | 0.316 |
| BMI (kg/m2) | 24.83 ± 2.38 | 27.42 ± 4.60 | 0.081 | 25.74 ± 3.90 | 26.03 ± 3.81 | 0.594 |
| Duration of diabetes (years) | 8.08 ± 8.73 | 13.57 ± 10.24 | 0.153 | 6.92 ± 8.53 | 8.69 ± 8.19 | 0.116 |
| Fasting plasma glucose (mmol/L) | 8.08 ± 8.73 | 13.57 ± 10.24 | 0.118 | 8.92 ± 3.24 | 8.82 ± 4.03 | 0.847 |
| HbA1c (%) | 9.36 ± 2.28 | 9.59 ± 1.55 | 0.766 | 9.85 ± 2.13 | 9.31 ± 2.14 | 0.060 |
| Fasting C-peptide (mIU/L) | 1.49 ± 0.59 | 1.68 ± 1.04 | 0.569 | 1.53 ± 1.00 | 1.76 ± 1.05 | 0.111 |
| 2-h postprandial C-peptide (mIU/L) | 5.19 ± 3.86 | 3.90 ± 2.21 | 0.320 | 3.74 ± 2.70 | 3.96 ± 2.32 | 0.529 |
| Triglyceride (mmol/L) | 2.05 ± 1.54 | 1.93 ± 1.27 | 0.836 | 1.80 ± 1.39 | 1.78 ± 1.08 | 0.925 |
| Total cholesterol (mmol/L) | 4.85 ± 2.29 | 4.94 ± 1.18 | 0.917 | 4.46 ± 1.29 | 4.45 ± 1.08 | 0.947 |
| Low-density lipoprotein (mmol/L) | 3.08 ± 1.65 | 3.11 ± 0.78 | 0.955 | 2.85 ± 1.00 | 2.86 ± 0.85 | 0.949 |
| Gender, male (%) | 8 (57.1%) | 4 (28.6%) | 0.252 | 62 (53.9%) | 44 (38.3%) | 0.025 |
| Hypertension, number (%) | 4 (28.6%) | 11 (78.6%) | 0.023 | 54 (47.0%) | 70 (60.9%) | 0.047 |
| *Diabetic nephropathy, number (%) | 2 (14.3%) | 4 (28.6%) | 0.645 | 34 (31.8%) | 46 (41.1%) | 0.198 |
| Diabetic peripheral neuropathy, number (%) | 0 (0%) | 0 (0%) | 1 | 2 (11.7%) | 1 (0.9%) | 1 |
| Diabetic foot, number (%) | 0 (0%) | 0 (0%) | 1 | 0 (0%) | 0 (0%) | 1 |

*, 11 missing data in validation cohort.
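As a worked illustration of the categorical comparisons in Table 1, the sketch below applies a chi-square test to the hypertension counts. It assumes Yates' continuity correction was used for the 2×2 tables (the text names only "the chi-square test", so the correction is an assumption); the function name `chi2_yates_2x2` is hypothetical.

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Chi-square test with Yates' continuity correction for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, two-sided p-value) for 1 df."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' correction subtracts n/2 from |ad - bc| (floored at zero).
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypertension counts from Table 1, as (with, without) for T2DM then NPDR.
pilot = chi2_yates_2x2(4, 10, 11, 3)
validation = chi2_yates_2x2(54, 61, 70, 45)
print(f"pilot: chi2 = {pilot[0]:.3f}, p = {pilot[1]:.3f}")
print(f"validation: chi2 = {validation[0]:.3f}, p = {validation[1]:.3f}")
```

Under this assumption the p-values round to 0.023 and 0.047, in line with the hypertension row of Table 1.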

Identification of predominant plasma cytokines in NPDR patients

We profiled plasma cytokines using human glass-based arrays and obtained semi-quantitative results for 60 plasma cytokines. The relative changes of the 60 cytokines in NPDR patients compared with T2DM patients are shown in Figure 1A. Twenty-seven cytokines differed significantly between the two groups, and 12 of these plasma cytokines had a fold change larger than four (Figure 1B). As shown in the volcano plot, the top 10 increased cytokines were PDGF-BB, leptin, ANG-1, TIMP-1, RANTES, TIMP-2, ENA-78, angiostatin, CXCL16, and VEGFR2, and the top 10 decreased cytokines were IL-10, ANGPTL4, bFGF, VEGFR3, HB-EGF, IL-12p40, IGF-1, IL-17, I-309, and LIF (Figure 1C). PCA based on the top 10 increased cytokines showed a clear separation between the two groups (Supplementary Figure 1). These findings suggested that plasma cytokines may help to distinguish NPDR from T2DM patients.


Figure 1. Relative cytokine changes in the pilot cohort. A heat map of relative changes of all 60 plasma cytokines (A); a heat map of 27 cytokines with a fold change larger than 4 or less than 0.25 (B); a volcano plot of the top 10 increasing and decreasing cytokines (C).

Validation of the six increased plasma cytokines in a large-scale cohort

We further measured the plasma concentrations of PDGF-BB, TIMP-1, TIMP-2, ANG-1, CXCL16, and VEGFR2 with ELISA kits in a larger cohort comprising 115 NPDR and 115 T2DM patients. The concentrations of ANG-1, PDGF-BB, TIMP-2, and VEGFR2 (351.85 ng/mL, 34.95 pg/mL, 114.60 ng/mL, and 14.06 ng/mL, respectively) were significantly higher in NPDR patients than in T2DM patients (286.81 ng/mL, 28.07 pg/mL, 105.01 ng/mL, and 11.91 ng/mL, respectively). However, there was no significant difference in CXCL16 or TIMP-1 (3,828.94 vs. 3,849.86 pg/mL and 6.78 vs. 6.68 ng/mL, respectively) (Figure 2).


Figure 2. A comparison of plasma concentrations of PDGF-BB, TIMP-1, TIMP-2, ANG-1, CXCL16, and VEGFR2 in the validation cohort. ANG-1, PDGF-BB, TIMP-2, and VEGFR2 were significantly higher in non-proliferative diabetic retinopathy patients than in diabetes mellitus patients. However, there was no significant difference in CXCL16 or TIMP-1.

Correlation between cytokines and clinical characteristics

Pearson’s correlation analysis was performed to investigate the potential relationships among cytokines and clinical characteristics. For NPDR, PDGF-BB was weakly correlated with diabetic duration (r = 0.34), and VEGFR2 was weakly correlated with total cholesterol (r = 0.33) and low-density lipoprotein (r = 0.30) (Supplementary Figure 2A). For T2DM patients, there was no obvious relationship between plasma cytokines and clinical characteristics (Supplementary Figure 2B).
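To make the size of these correlations concrete, here is a minimal sketch of the Pearson coefficient in plain Python. The input values are hypothetical and are not study data; the function name `pearson_r` is also an illustration, not the authors' code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, for illustration only (not study data).
duration_years = [2, 5, 8, 10, 15, 20]
pdgf_bb_pg_ml = [30.1, 27.5, 36.0, 31.2, 33.8, 38.4]
r = pearson_r(duration_years, pdgf_bb_pg_ml)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

Squaring r gives the shared variance: a coefficient of 0.34 corresponds to roughly 12% of variance explained, which is why such correlations are described as weak.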

The proportion of hypertension was significantly higher in the NPDR group in both the pilot and validation cohorts. To rule out interference of hypertension with the six plasma cytokines, we compared their concentrations in patients with and without hypertension. Supplementary Figure 3 shows that there was no significant difference in the mean levels of ANG-1, CXCL16, PDGF-BB, TIMP-1, TIMP-2, and VEGFR2 between patients with and without hypertension in the NPDR and T2DM groups (322.83 vs. 315.24 ng/mL, 3,796.44 vs. 3,889.65 pg/mL, 32.17 vs. 30.74 pg/mL, 6.70 vs. 6.77 ng/mL, 107.13 vs. 112.94 ng/mL, and 12.99 vs. 12.99 ng/mL, respectively). Thus, the higher concentrations of these cytokines in NPDR patients appear to have minimal association with hypertension in this study.

Feature selection for the machine learning algorithms

We then used PCA to compute the relative contribution of each cytokine to the separation between NPDR and T2DM patients. The first and second principal components of the PCA plot (Dim1 and Dim2) accounted for 36.0% and 17.1% of the variation in the dataset, respectively. The projections of the samples in the PCA were distinguishable, with relatively small overlapping areas. CXCL16 and TIMP-1 contributed more to the second principal component than to the first, while ANG-1, PDGF-BB, TIMP-2, and VEGFR2 contributed more to the first principal component (Supplementary Figure 4A). Specifically, the contributions of the cytokines to the first principal component were, in descending order, ANG-1 (25.9%), PDGF-BB (21.0%), TIMP-2 (20.2%), VEGFR2 (16.5%), TIMP-1 (9.9%), and CXCL16 (6.5%) (Supplementary Figure 4B).
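The percentage contributions can be read as normalized squared loadings. The formula below is a sketch under that assumption (a common convention for PCA variable contributions; the text does not state how the values were computed), with v_{jk} denoting the loading of cytokine j on component k and p = 6 cytokines.

```latex
c_{jk} \;=\; 100 \times \frac{v_{jk}^{2}}{\sum_{i=1}^{p} v_{ik}^{2}},
\qquad
\sum_{j=1}^{p} c_{jk} = 100 .
% Check against the reported Dim1 contributions (p = 6):
% 25.9 + 21.0 + 20.2 + 16.5 + 9.9 + 6.5 = 100.0
% A uniform contribution would be 100 / 6 \approx 16.7 per cytokine.
```

Against the uniform reference of about 16.7%, ANG-1, PDGF-BB, and TIMP-2 contribute more than average to Dim1, VEGFR2 is close to average, and TIMP-1 and CXCL16 contribute less.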

Random forest analysis was performed to evaluate the importance of each cytokine to the separation between NPDR and T2DM patients. The importance levels of ANG-1 (22.8%), VEGFR2 (22.2%), and PDGF-BB (20.5%) were higher than those of TIMP-1 (13.7%), TIMP-2 (10.5%), and CXCL16 (10.3%) (Supplementary Figure 4C).

Lasso regression was also conducted for feature selection. The coefficients of ANG-1, VEGFR2, and PDGF-BB in the lasso regression were 0.014, 0.003, and 0.001, respectively, while the coefficients of TIMP-1, TIMP-2, and CXCL16 were 0 (Supplementary Figure 4D).
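For readers less familiar with the lasso, the objective below sketches why some coefficients are exactly zero. The loss term is an assumption: the text does not say whether a linear or logistic loss was used, so it is written generically as \ell, and \lambda denotes the penalty strength.

```latex
\hat{\beta} \;=\; \arg\min_{\beta_0,\,\beta}
\left\{ \ell(\beta_0, \beta) \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad \lambda \ge 0 .
% The absolute-value penalty is not differentiable at zero, so for a large
% enough \lambda the minimizer sets weakly informative coefficients to
% exactly 0 instead of merely shrinking them.
```

If the predictors were not standardized before fitting, the non-zero magnitudes depend on the measurement units and should not be read as a ranking of effect sizes.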

Finally, combining the results of PCA, random forest, and lasso regression, ANG-1, VEGFR2, and PDGF-BB were selected for building the machine learning prediction model.

Development and validation of machine learning classifiers

To select a high-performance classifier for prediction, we developed ANN, LR, SVM, XGB, and RF classifiers based on ANG-1, VEGFR2, and PDGF-BB. In 10-fold cross-validation on the training set, the mean AUCs of ANN, LR, SVM, XGB, and RF were 0.82, 0.83, 0.82, 0.82, and 0.85, respectively (Figure 3). All classifiers therefore performed similarly, with good discrimination in the training set.


Figure 3. The average area under the curve of a 10-fold cross-validation of ANN (A), LR (B), SVM (C), XGB (D), and RF (E) in the training set.
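The mean cross-validated AUC can be illustrated with a small sketch. It uses the rank interpretation of the AUC (the probability that a randomly chosen NPDR patient receives a higher score than a randomly chosen T2DM patient) and averages over folds. The fold outputs are hypothetical, and the function names are illustrative rather than taken from the study code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_cv_auc(folds):
    """Average the per-fold AUCs; `folds` is a list of (labels, scores)."""
    aucs = [roc_auc(y, s) for y, s in folds]
    return sum(aucs) / len(aucs)

# Hypothetical validation-fold outputs (1 = NPDR, 0 = T2DM), not study data.
folds = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.4]),
]
print(mean_cv_auc(folds))
```

The pairwise loop is quadratic in the fold size, which is acceptable for folds of about 18 samples; library implementations use a sorted-rank formulation instead.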

For validation, the test set was used to evaluate the performance of the machine learning classifiers. Table 2 and Supplementary Table 1 compare the machine learning algorithms on the test set. LR, ANN, and SVM exhibited moderate predictive performance, with accuracy ranging from 72% to 76%, sensitivity from 73.1% to 84.6%, specificity from 65% to 70%, AUC from 0.72 to 0.75, and F1 score from 0.75 to 0.80. XGB exhibited good predictive performance, with an AUC of 0.82 and an F1 score of 0.85. The RF classifier performed best on all of the following indicators: the accuracy was 85%, the sensitivity was 92.3%, the specificity was 75%, the PPV was 82.8%, the NPV was 88.2%, the AUC was 0.84, and the F1 score was 0.87. LR performed worst. McNemar's test was conducted to statistically compare the performance of RF (the best model) and LR (the worst model), and the p value was 0.289. Taken together, these results indicate that RF was the best classifier, although the difference from the other models was not statistically significant.
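The paired comparison of two classifiers on the same test samples can be sketched as follows. This is the exact (binomial) form of the test; the text does not say whether the exact or the chi-square form was used, and the discordant counts below are hypothetical because the paper does not report them.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test.

    b: samples classifier A got right and classifier B got wrong
    c: samples classifier A got wrong and classifier B got right
    Under the null hypothesis the discordant pairs split 50/50, so the
    smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts; the paper does not report them.
p_value = mcnemar_exact(b=8, c=2)
print(f"p = {p_value:.4f}")
```

Only samples on which the two models disagree enter the test, so with a test set of 46 samples (20% of 230) the number of informative pairs is small and the test has limited power to detect differences between models.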

Table 2. Performance of the 5 machine learning classifiers on the test set.

| Set | Model | Accuracy | Sensitivity | Specificity | AUC | F1 score |
|---|---|---|---|---|---|---|
| Test set | LR | 72% | 73.1% | 70.0% | 0.72 | 0.75 |
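The reported indicators all follow from the four cells of a confusion matrix. The sketch below shows the arithmetic for the RF classifier. The counts are an assumption: they are reconstructed from the reported percentages and a 46-sample test set (20% of 230) and are not stated in the paper.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Assumed counts (26 NPDR, 20 T2DM), reconstructed from the reported RF
# percentages; they are not given in the paper.
rf = classification_metrics(tp=24, fn=2, tn=15, fp=5)
for name, value in rf.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the sensitivity is 24/26, the specificity 15/20, the PPV 24/29, and the NPV 15/17, which round to the RF values reported in the text.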


Because of the central role of angiogenesis and inflammation in the development and progression of NPDR, we hypothesized that angiogenesis- and inflammation-related cytokines in the plasma might differ in NPDR patients and could be novel predictive biomarkers. To the best of our knowledge, this is the first large-scale study to determine specific plasma cytokines for the diagnosis of NPDR in comparison with T2DM patients. In the pilot cohort, with a small number of samples, cytokine antibody arrays were used to measure 60 plasma cytokines. The results showed that 27 cytokines differed significantly between the groups, among which 12 cytokines were increased in the NPDR group with a fold change > 4. In the larger-scale validation cohort, ELISA kits were used to validate six of the 12 plasma cytokines. Four of the six plasma cytokines, ANG-1, PDGF-BB, TIMP-2, and VEGFR2, were confirmed to be significantly higher in NPDR patients. These results suggested that plasma cytokines may be specifically involved in the development of NPDR.

The main goal of this study was to identify potential plasma biomarkers of patients with NPDR. Feature selection indicated that NPDR was highly associated with ANG-1, PDGF-BB, and VEGFR2, so these three cytokines were included in the machine learning algorithms. The LR, ANN, SVM, RF, and XGB classifiers confirmed that these three cytokines were discriminatory for NPDR in the independent test set, with sensitivity ranging from 73.1% to 92.3%, specificity from 65.0% to 75%, and AUC from 0.72 to 0.84. Among the five machine learning algorithms, the RF classifier, with a sensitivity of 92.3% and an AUC of 0.84 in the test set, showed the best discrimination of NPDR from T2DM patients.

Angiogenesis- and inflammation-related cytokines play vital roles in injuries of human retinal endothelial cells in culture. ANG-1, a member of the angiopoietin family, is a growth factor that plays a key role in vessel homeostasis, angiogenesis, and vascular permeability via interacting with the Tie2 transmembrane receptor tyrosine kinase [18–20]. Damage to the blood-retinal barrier, which is induced by diabetes, is inhibited by ANG-1 in a dose-dependent manner [21]. PDGF-BB has been reported to be involved in astrogliosis and the formation of proliferative membranes in retinopathy by activating PDGFRα and PDGFRβ [22]. The upregulated combination of VEGF-A and VEGFR2 is a response to the ischemia induced by retinal vascular damage, and stimulates extraretinal vascular outgrowth to the retinal surface without amelioration of ischemia in the retina [23–25]. TIMP-2 is an endogenous inhibitor of matrix metalloproteinase-2 and may act as a protector to reduce the loss of capillary cells resulting in the development of diabetic retinopathy [26, 27]. Although angiogenesis- and inflammation-related cytokines are involved in the development and progression of DR, their changes in DR are unclear.

With advances in algorithms, deep learning techniques have been used for automated detection of DR. Based on features in retinal fundus photographs, deep learning algorithms show discriminative abilities comparable with those of ophthalmologists [7, 8, 28–31]. Image features of DR are well known; however, knowledge about its plasma protein-specific features is limited. In the present study, the RF classifier, which was based on the plasma concentrations of ANG-1, PDGF-BB, and VEGFR2, also showed good predictive ability. Although the performance of the plasma protein-based RF classifier was not as good as that of the image-based deep learning classifiers, our results indicated that plasma ANG-1, PDGF-BB, and VEGFR2 may be protein-specific features of NPDR, and the roles of these three plasma cytokines in the pathophysiology of NPDR are worthy of further study.

Two previous studies reported that serum levels of ANG-1 were significantly higher in the NPDR group, when compared to the T2DM group [32, 33]. Paine et al. also reported that the plasma levels of soluble VEGFR2 consistently increased with the severity of DR [34]. Consistent with these findings, in the present study, we showed that in NPDR, ANG-1, TIMP-2, VEGFR2, and PDGF-BB were significantly increased. The protective cytokines, ANG-1 and TIMP-2, were increased in the NPDR group. A possible explanation for this might be that the increases of ANG-1 and TIMP-2 may represent an adaptive compensatory mechanism to promote cellular repair, to suppress the development of retinal or choroidal neovascularization, and to strengthen the integrity of the vascular structure [35].

The plasma cytokine changes in NPDR patients have been controversial, and the correlations between plasma cytokines and clinical features were also unclear. Pearson’s correlation indicated that for NPDR patients, PDGF-BB was weakly correlated with the duration of diabetes (r = 0.34), and VEGFR2 was weakly correlated with total cholesterol (r = 0.33) and low-density lipoprotein (r = 0.30). According to previous studies, diabetic duration, total cholesterol, and low-density lipoprotein were risk factors for diabetic retinopathy [6, 36, 37]. Whether PDGF-BB and VEGFR2 act independently or in concert with blood lipids during NPDR is still unclear, so further studies are needed.

The strengths of this study were as follows. It was the first study to include a large number of patients with comparable baselines. It contained a pilot study for screening of possible cytokines associated with NPDR and a large-scale cohort for ELISA verification. Machine learning algorithms based on these plasma cytokines exhibited good performance for distinguishing DR from T2DM patients. However, we acknowledge several limitations in our study. First, the examination for DR was based on two-field fundus photographs, which are theoretically less sensitive than seven-field retinal photographs. However, the presence of mild DR would be underestimated only if the retinal pathologies were present in the peripheral area of the retina. Second, patients with coronary heart disease (CHD) were excluded from this study. The diagnosis of CHD, however, was based on the history of disease provided by the patients. A significant percentage of diabetic patients with CHD have no symptoms, so a few CHD patients may inevitably have been included in this study.

In summary, we report that plasma cytokines including ANG-1, PDGF-BB, TIMP-2, and VEGFR2 were increased in NPDR, and that plasma cytokine patterns were comprehensive predictors of DR based on machine learning algorithms. Our results suggested that plasma cytokines could be strong risk markers of NPDR.

Materials and Methods


Inpatients with NPDR or type 2 DM (T2DM) were enrolled in this study at the Center for Endocrine Metabolism and Immune Diseases of Beijing Luhe Hospital, Capital Medical University (Beijing, China) between November 2018 and September 2019. Two groups were established: (1) an age- and body mass index (BMI)-matched pilot cohort containing 14 NPDR patients and 14 T2DM patients, used to comprehensively screen changes in angiogenesis- and inflammation-related plasma cytokines with human glass-based cytokine microarrays; and (2) a validation cohort containing 115 NPDR patients and 115 T2DM patients, used to further measure the plasma concentrations of angiopoietin 1 (ANG-1), CXC-chemokine ligand 16 (CXCL16), platelet-derived growth factor-BB (PDGF-BB), tissue inhibitors of metalloproteinase 1 (TIMP-1), tissue inhibitors of metalloproteinase 2 (TIMP-2), and vascular endothelial growth factor receptor 2 (VEGFR2) using ELISA kits.

The 2010 diagnostic criteria for T2DM from the American Diabetes Association were used: (1) glycosylated hemoglobin (HbA1c) ≥ 6.5%; (2) fasting plasma glucose ≥ 7.1 mmol/L; (3) 2-h blood glucose during an oral glucose tolerance test ≥ 11.1 mmol/L; or (4) random blood glucose ≥ 11.1 mmol/L in a patient with typical symptoms of hyperglycemia or hyperglycemic crisis. Two-field retinal photographs were taken of each eye of all patients by a trained photographer, using a nonmydriatic fundus camera (Topcon, Tokyo, Japan). The diagnosis and grading of DR were conducted by two trained specialists following the criteria of the Early Treatment Diabetic Retinopathy Study Research Group (ETDRS) [38] as follows: (1) no retinopathy; (2) mild NPDR; (3) moderate NPDR; (4) severe NPDR; and (5) proliferative retinopathy (PDR). Patients met the following inclusion criteria: (1) conformity to the above diabetes diagnostic criteria; (2) conformity to NPDR diagnostic criteria; and (3) > 18 years of age. The exclusion criteria were: (1) type 1 DM or other types of DM; (2) any retinopathy other than NPDR; (3) acute complications of diabetes; and (4) a history of cardiovascular diseases or stroke.

Clinical examination and data collection

Blood biochemical parameters, including fasting glucose, HbA1c, 2-h postprandial C-peptide, triglycerides, total cholesterol, and low-density lipoprotein, were collected at the time of the screening. All information from the patients, including height, weight, diabetes-related complications, other histories of diseases, retinal examination, and optical coherence tomography, was recorded. Plasma samples were collected in ethylenediaminetetraacetic acid tubes and immediately centrifuged at 1,400 × g for 10 min at 4 °C; the supernatant was then aliquoted and stored at -80 °C, avoiding freeze-thaw cycles. All samples were collected with signed informed consent from all patients, and all related procedures were performed with the approval of the internal review and ethics boards of the indicated hospitals.

Cytokine antibody assay

Plasma soluble cytokines were measured in duplicate using the Ray Biotech G-Series Human Angiogenesis Array 2 and 3 (Ray Biotech, Norcross, GA, USA) following the recommended protocols. Briefly, all samples were biotinylated. The antibodies were immobilized in specific spot locations on glass slides. The incubation of array membranes with biological samples resulted in the binding of cytokines to the corresponding antibodies. Signals were visualized using streptavidin-horseradish peroxidase conjugates and colorimetric assays. Final spot intensities were measured as the original intensities after subtracting the background. The two kits provided high sensitivity and specificity to simultaneously detect a total of 60 cytokines from the plasma. As determined by densitometry, the inter-array coefficient of variation of spot signal intensities was less than 20%.

Differential protein level analysis

To identify proteins whose plasma concentrations differed significantly, the raw data were normalized and the fold change of NPDR vs. T2DM for each cytokine was calculated using the “edgeR” package [39]. The fold change values of cytokines were used to indicate their relative concentration levels. Any fold change ≥ 2 or ≤ 0.5 with FDR < 0.05 was considered significant. Based on the differential plasma proteins, principal component analysis (PCA) was conducted to evaluate the variation between the two groups using the “ggbiplot” package.
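The significance rule can be sketched in a few lines. This is not a reimplementation of edgeR's normalization or testing; it only illustrates the thresholding step, assuming the FDR is a Benjamini-Hochberg adjusted p-value. The input numbers are hypothetical, and the function names are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def flag_significant(fold_changes, p_values, fc_hi=2.0, fc_lo=0.5, fdr=0.05):
    """Apply the stated rule: fold change >= 2 or <= 0.5, with FDR < 0.05."""
    q_values = benjamini_hochberg(p_values)
    return [
        (fc >= fc_hi or fc <= fc_lo) and q < fdr
        for fc, q in zip(fold_changes, q_values)
    ]

# Hypothetical fold changes (NPDR / T2DM) and raw p-values, not study data.
fold_changes = [4.6, 1.3, 0.2, 2.5]
p_values = [0.001, 0.010, 0.020, 0.300]
print(benjamini_hochberg(p_values))
print(flag_significant(fold_changes, p_values))
```

Both conditions must hold: the second cytokine in the example passes the FDR threshold but not the fold-change threshold, and the fourth passes the fold-change threshold but not the FDR threshold, so neither is flagged.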

ELISA validation

Plasma concentrations of PDGF-BB, TIMP-1, TIMP-2, ANG-1, CXCL16, and VEGFR2 were determined in the validation cohort with ELISA kits following the manufacturer’s instructions (Human ELISA kit, MLbio, Shanghai, China). The intra-assay coefficient of variation was 10%, and the inter-assay coefficient of variation was 12%. No significant cross-reactivity or interference was observed.

Machine learning algorithms to distinguish NPDR from T2DM patients

The whole data set was randomly divided into a training set (80%) and a test set (20%). To prevent overfitting, the training set was randomly split into 10 equal-sized subgroups using the 10-fold cross-validation method. In 10-fold cross-validation, nine subgroups were retained as training data and the remaining subgroup was used as the validation data for testing the model. The cross-validation process was repeated 10 times, with each of the 10 subgroups used exactly once as the validation data. The 10 results from the folds were then averaged to produce a single estimate. Finally, the test set was used to evaluate the model (Supplementary Figure 5). Five machine learning algorithms were trained: artificial neural network (ANN), logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB). Parameters for each machine learning method are shown in Supplementary Table 2. The performance of each classifier was evaluated by its accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, Matthews correlation coefficient (MCC), and area under the curve (AUC) of the receiver operating characteristic (ROC).
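The splitting scheme described above can be sketched with index lists only. This is a minimal illustration, not the study code: the random seed and helper names are hypothetical, and the sketch does not stratify by group because the text does not say whether stratification was used.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle sample indices and return (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary illustrative choice
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists; each index is validated once."""
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

# 230 samples = 115 NPDR + 115 T2DM patients in the validation cohort.
train_idx, test_idx = train_test_split_indices(230)
splits = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), [len(val) for _, val in splits])
```

With 184 training samples the folds cannot be exactly equal: four folds hold 19 samples and six hold 18. The test indices never enter any fold, which is what keeps the final evaluation independent of model selection.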

Statistical analysis

Differences in clinical characteristics and cytokines between groups were calculated using the Wilcoxon test for continuous variables and the chi-square test for categorical variables. Pearson’s correlation was performed to assess the relationship of the plasma cytokines and clinicopathological characteristics. Machine learning algorithms and diagnostic performance were evaluated using scikit-learn (V0.21.3) in Python V3.7.4. Other data analysis and visualization were performed by R software, version 3.6.2 (The R Project for Statistical Computing, Vienna, Austria). A two-sided P-value < 0.05 was considered statistically significant.

Author Contributions

Bin Cao: conceptualization, formal analysis, writing (original draft). Ning Zhang: investigation, data curation. Yuanyuan Zhang and Ying Fu: data curation, formal analysis. Dong Zhao: conceptualization, funding acquisition, writing (review and editing).


The authors thank all participants who took part in the study.

Conflicts of Interest

The authors have no conflicts of interest.


This study was supported by grants from the Capital Health Development Research Project (2020-4-7082) and the Natural Science Foundation of Beijing (7194282).


  • 1. Cheung N, Mitchell P, Wong TY. Diabetic retinopathy. Lancet. 2010; 376:124–36.
  • 2. Yau JW, Rogers SL, Kawasaki R, Lamoureux EL, Kowalski JW, Bek T, Chen SJ, Dekker JM, Fletcher A, Grauslund J, Haffner S, Hamman RF, Ikram MK, et al, and Meta-Analysis for Eye Disease (META-EYE) Study Group. Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care. 2012; 35:556–64.
  • 3. Leasher JL, Bourne RR, Flaxman SR, Jonas JB, Keeffe J, Naidoo K, Pesudovs K, Price H, White RA, Wong TY, Resnikoff S, Taylor HR, and Vision Loss Expert Group of the Global Burden of Disease Study. Global Estimates on the Number of People Blind or Visually Impaired by Diabetic Retinopathy: A Meta-analysis From 1990 to 2010. Diabetes Care. 2016; 39:1643–9.
  • 4. Werfalli M, Engel ME, Musekiwa A, Kengne AP, Levitt NS. The prevalence of type 2 diabetes among older people in Africa: a systematic review. Lancet Diabetes Endocrinol. 2016; 4:72–84.
  • 5. Sattar N, Gill JM. Type 2 diabetes in migrant south Asians: mechanisms, mitigation, and management. Lancet Diabetes Endocrinol. 2015; 3:1004–16.
  • 6. Zhang G, Chen H, Chen W, Zhang M. Prevalence and risk factors for diabetic retinopathy in China: a multi-hospital-based cross-sectional study. Br J Ophthalmol. 2017; 101:1591–95.
  • 7. Ting DS, Cheung CY, Lim G, Tan GS, Quang ND, Gan A, Hamzah H, Garcia-Franco R, San Yeo IY, Lee SY, Wong EY, Sabanayagam C, Baskaran M, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017; 318:2211–23.
  • 8. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016; 316:2402–10.
  • 9. Ting DS, Peng L, Varadarajan AV, Keane PA, Burlina PM, Chiang MF, Schmetterer L, Pasquale LR, Bressler NM, Webster DR, Abramoff M, Wong TY. Deep learning in ophthalmology: the technical and clinical considerations. Prog Retin Eye Res. 2019; 72:100759.
  • 10. Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology. 2017; 124:962–69.
  • 11. Capitão M, Soares R. Angiogenesis and inflammation crosstalk in diabetic retinopathy. J Cell Biochem. 2016; 117:2443–53.
  • 12. Zhou L, Zhang T, Lu B, Yu Z, Mei X, Abulizi P, Ji L. Lonicerae japonicae flos attenuates diabetic retinopathy by inhibiting retinal angiogenesis. J Ethnopharmacol. 2016; 189:117–25.
  • 13. Mohammad G, Vandooren J, Siddiquei MM, Martens E, Abu El-Asrar AM, Opdenakker G. Functional links between gelatinase B/matrix metalloproteinase-9 and prominin-1/CD133 in diabetic retinal vasculopathy and neuropathy. Prog Retin Eye Res. 2014; 43:76–91.
  • 14. Zhong Q, Kowluru RA. Regulation of matrix metalloproteinase-9 by epigenetic modifications and the development of diabetic retinopathy. Diabetes. 2013; 62:2559–68.
  • 15. Tsujinaka H, Fu J, Shen J, Yu Y, Hafiz Z, Kays J, McKenzie D, Cardona D, Culp D, Peterson W, Gilger BC, Crean CS, Zhang JZ, et al. Sustained treatment of retinal vascular diseases with self-aggregating sunitinib microparticles. Nat Commun. 2020; 11:694.
  • 16. Yokomizo H, Maeda Y, Park K, Clermont AC, Hernandez SL, Fickweiler W, Li Q, Wang CH, Paniagua SM, Simao F, Ishikado A, Sun B, Wu IH, et al. Retinol binding protein 3 is increased in the retina of patients with diabetes resistant to diabetic retinopathy. Sci Transl Med. 2019; 11:eaau6627.
  • 17. Opdenakker G, Abu El-Asrar A. Metalloproteinases mediate diabetes-induced retinal neuropathy and vasculopathy. Cell Mol Life Sci. 2019; 76:3157–66.
  • 18. Féraud O, Mallet C, Vilgrain I. Expressional regulation of the angiopoietin-1 and -2 and the endothelial-specific receptor tyrosine kinase Tie2 in adrenal atrophy: a study of adrenocorticotropin-induced repair. Endocrinology. 2003; 144:4607–15.
  • 19. Eklund L, Saharinen P. Angiopoietin signaling in the vasculature. Exp Cell Res. 2013; 319:1271–80.
  • 20. Hussain RM, Neiweem AE, Kansara V, Harris A, Ciulla TA. Tie-2/Angiopoietin pathway modulation as a therapeutic strategy for retinal disease. Expert Opin Investig Drugs. 2019; 28:861–69.
  • 21. Felcht M, Luck R, Schering A, Seidel P, Srivastava K, Hu J, Bartol A, Kienast Y, Vettel C, Loos EK, Kutschera S, Bartels S, Appak S, et al. Angiopoietin-2 differentially regulates angiogenesis through TIE2 and integrin signaling. J Clin Invest. 2012; 122:1991–2005.
  • 22. Kitahara H, Kajikawa S, Ishii Y, Yamamoto S, Hamashima T, Azuma E, Sato H, Matsushima T, Shibuya M, Shimada Y, Sasahara M. The novel pathogenesis of retinopathy mediated by multiple RTK signals is uncovered in newly developed mouse model. EBioMedicine. 2018; 31:190–201.
  • 23. Gariano RF, Gardner TW. Retinal angiogenesis in development and disease. Nature. 2005; 438:960–66.
  • 24. Shibuya M. Vascular endothelial growth factor and its receptor system: physiological functions in angiogenesis and pathological roles in various diseases. J Biochem. 2013; 153:13–19.
  • 25. Stitt AW, Curtis TM, Chen M, Medina RJ, McKay GJ, Jenkins A, Gardiner TA, Lyons TJ, Hammes HP, Simó R, Lois N. The progress in understanding and treatment of diabetic retinopathy. Prog Retin Eye Res. 2016; 51:156–86.
  • 26. Mohammad G, Kowluru RA. Matrix metalloproteinase-2 in the development of diabetic retinopathy and mitochondrial dysfunction. Lab Invest. 2010; 90:1365–72.
  • 27. Kowluru RA, Kanwar M. Oxidative stress and the development of diabetic retinopathy: contributory role of matrix metalloproteinase-2. Free Radic Biol Med. 2009; 46:1677–85. [PubMed]
  • 28. Gulshan V, Rajan RP, Widner K, Wu D, Wubbels P, Rhodes T, Whitehouse K, Coram M, Corrado G, Ramasamy K, Raman R, Peng L, Webster DR. Performance of a deep-learning algorithm vs manual grading for detecting diabetic retinopathy in India. JAMA Ophthalmol. 2019; 137:987–93. [PubMed]
  • 29. Son J, Shin JY, Kim HD, Jung KH, Park KH, Park SJ. Development and validation of deep learning models for screening multiple abnormal findings in retinal fundus images. Ophthalmology. 2020; 127:85–94. [PubMed]
  • 30. Verbraak FD, Abramoff MD, Bausch GC, Klaver C, Nijpels G, Schlingemann RO, van der Heijden AA. Diagnostic accuracy of a device for the automated detection of diabetic retinopathy in a primary care setting. Diabetes Care. 2019; 42:651–56. [PubMed]
  • 31. Keel S, Wu J, Lee PY, Scheetz J, He M. Visualizing deep learning models for the detection of referable diabetic retinopathy and glaucoma. JAMA Ophthalmol. 2019; 137:288–92. [PubMed]
  • 32. You QY, Zhuge FY, Zhu QQ, Si XW. Effects of laser photocoagulation on serum angiopoietin-1, angiopoietin-2, angiopoietin-1/angiopoietin-2 ratio, and soluble angiopoietin receptor Tie-2 levels in type 2 diabetic patients with proliferative diabetic retinopathy. Int J Ophthalmol. 2014; 7:648–53. [PubMed]
  • 33. Khalaf N, Helmy H, Labib H, Fahmy I, El Hamid MA, Moemen L. Role of angiopoietins and Tie-2 in diabetic retinopathy. Electron Physician. 2017; 9:5031–35. [PubMed]
  • 34. Paine SK, Mondal LK, Borah PK, Bhattacharya CK, Mahanta J. Pro- and antiangiogenic VEGF and its receptor status for the severity of diabetic retinopathy. Mol Vis. 2017; 23:356–63. [PubMed]
  • 35. Nambu H, Nambu R, Oshima Y, Hackett SF, Okoye G, Wiegand S, Yancopoulos G, Zack DJ, Campochiaro PA. Angiopoietin 1 inhibits ocular neovascularization and breakdown of the blood-retinal barrier. Gene Ther. 2004; 11:865–73. [PubMed]
  • 36. Yin L, Zhang D, Ren Q, Su X, Sun Z. Prevalence and risk factors of diabetic retinopathy in diabetic patients: a community based cross-sectional study. Medicine (Baltimore). 2020; 99:e19236. [PubMed]
  • 37. Liu L, Geng J, Wu J, Yuan Z, Lian J, Desheng H, Chen L. Prevalence of ocular fundus pathology with type 2 diabetes in a Chinese urban community as assessed by telescreening. BMJ Open. 2013; 3:e004146. [PubMed]
  • 38. Early Treatment Diabetic Retinopathy Study Research Group. Grading diabetic retinopathy from stereoscopic color fundus photographs—an extension of the modified Airlie House classification. ETDRS report number 10. Ophthalmology. 1991 (5 Suppl); 98:786–806. [PubMed]
  • 39. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26:139–40. [PubMed]