Identification and external validation of the hub genes associated with cardiorenal syndrome through time-series and network analyses

Cardiorenal syndrome (CRS), defined as acute or chronic damage to the heart or kidney triggering impairment of another organ, has a poor prognosis. However, the molecular mechanisms underlying CRS remain largely unknown. The RNA-sequencing data of the left ventricle tissue isolated from the sham-operated and CRS model rats at different time points were downloaded from the Gene Expression Omnibus (GEO) database. Genomic differences, protein–protein interaction networks, and short time-series analyses, revealed fibronectin 1 (FN1) and periostin (POSTN) as hub genes associated with CRS progression. The transcriptome sequencing data of humans obtained from the GEO revealed that FN1 and POSTN were both significantly associated with many different heart and kidney diseases. Peripheral blood samples from 20 control and 20 CRS patients were collected from the local hospital, and the gene expression levels of FN1 and POSTN were detected by real-time quantitative polymerase chain reaction. FN1 (area under the curve [AUC] = 0.807) and POSTN (AUC = 0.767) could distinguish CRS in the local cohort with high efficacy and were positively correlated with renal and heart damage markers, such as left ventricular ejection fraction. To improve the diagnostic ability, diagnosis models comprising FN1 and POSTN were constructed by logistic regression (F-Score = 0.718), classification tree (F-Score = 0.812), and random forest (F-Score = 1.000). Overall, the transcriptome data of CRS rat models were systematically analyzed, revealing that FN1 and POSTN were hub genes, which were validated in different public datasets and the local cohort.


INTRODUCTION
Chronic heart and kidney disease often occur together and promote each other, leading to progressive deterioration of heart and renal function. The phenomenon in which acute or chronic impairment of the heart or kidney causes dysfunction of another organ was first defined as cardiorenal syndrome (CRS) by Ronco et al. in 2008 [1]. Several evidence-based medicine studies have shown that renal insufficiency is a vital prognostic predictor and risk factor for heart diseases, while cardiovascular deaths account for the largest proportion of renal diseases [2][3][4].
Microalbuminuria, a hallmark of renal dysfunction, is widely accepted as a predictive factor of cardiovascular events [5][6][7]. Additionally, patients with chronic heart failure are prone to microalbuminuria and exhibit unfavorable clinical outcomes [8,9]. In general, a bidirectional relationship between heart and renal diseases has been confirmed.
Several mechanisms have been proposed to explain CRS pathophysiology. From a hemodynamic standpoint, the reduction of cardiac output caused by heart failure directly leads to a decrease in renal blood flow, causing renal ischemia. The decrease in renal  [58] blood flow could also activate the renin-angiotensinaldosterone system (RAAS), thereby impairing the heart and renal functions [10]. From a physiological perspective, inflammation and oxidative stress play essential roles in CRS progression. The overactivation of RAAS promotes the release of inflammatory factors, such as interleukin 6, tumor necrosis factor α, and transforming growth factor (TGF), causing kidney fibrosis and ventricular remodeling [11,12]. The upregulation of RAAS could further advance the production of reactive oxygen species (ROS), and excessive ROS leads to necrosis of renal and cardiac cells [13]. In addition to these classical hypotheses, other factors such as activation of the sympathetic nervous system, accumulation of uremic toxins, and endoplasmic reticulum stress are known to impact CRS [14,15]. However, our current understanding of the initiation and development of CRS remains insufficient.
The rapid development of gene sequencing technology and big-data analysis has allowed further clarification of the latent molecular mechanisms underlying CRS. Chen et al. performed transcriptome sequencing of the right ventricle and kidney isolated from CRS mouse models and established lncRNA-miRNA-mRNA competing endogenous RNA networks, thus clarifying the comprehensive regulatory relationships of CRS [16]. Chuppa et al. utilized RNA-sequencing (RNA-seq) to analyze the left ventricle tissue of CRS rat models. They found that miR-21-5p could improve cardiac function by regulating peroxisome proliferator-activated receptor alpha [17]. These efforts broaden our horizons and highlight potential therapeutic targets for CRS. Nevertheless, the number of genome-wide studies on CRS remains limited.
Herein, the transcriptome data of the rat ventricle tissue at weeks 2, 4, 5, and 7 after subtotal nephrectomy were retrieved from the Gene Expression Omnibus (GEO) database. Genomic divergence, protein-protein interaction (PPI) network, and time-series analyses were conducted to identify the hub genes involved in CRS progression. Transcriptome sequencing of different types of heart and renal diseases downloaded from the GEO database was used for preliminary verification. The peripheral blood of the control and CRS subjects was also collected from the Shunde Hospital of Southern Medical University, and realtime quantitative polymerase chain reaction (RT-qPCR) was performed to detect gene expression levels. Finally, multiple machine learning algorithms were used to improve the diagnostic ability based on these hub genes.

Data collection
The GSE98520 dataset, including the RNA-seq data with fragments per kilobase million format (FPKM) of the left ventricle tissue isolated from the sham-operated and treated rats at weeks 2, 4, 5, and 7 after the 5/6 nephrectomy, was directly obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) as the training dataset. The GEO datasets GSE2240, GSE161472, GSE36961, GSE66494, GSE125779, and GSE36961, containing different heart disease or kidney disease samples, were also downloaded. Detailed information on the public datasets utilized in this study is presented in Table 1. According to the platform annotation files collected from the GEO database, all probe IDs were transformed into gene symbols using R software (version 3.6.3). If multiple probes corresponded to a gene, an average value was adopted. Probes corresponding to multiple genes were excluded from analyses.

Genomic difference analysis and PPI network construction
Genomic differences were analyzed to detect the differentially expressed genes between the shamoperated and subtotal nephrectomy rats at weeks 2, 4, 5, and 7 using the limma package. The filtering threshold to identify associated genes was set at P < 0.05. Genes  [18]. The PPI network was based on the STRING database (version 11.5, https://cn.stringdb.org/), and the confidence level was set to 0.4 [19]. The CytoHubba plug-in (version 0.1) in the Cytoscape software (version 3.8.0) was used to measure the importance of the genes in the network, and genes with a degree ≥10 were considered as hub genes [20].

Time-series analysis
Time-series analysis was performed using the Short Time-series Expression Miner (STEM, version 1.3.13). The data were normalized to the expression values of the sham-operated samples. The STEM clustering method was utilized to conduct the clustering with the following parameters: the maximum number of model profiles = 50 and maximum unit change in model profiles between time points = 2 [21]. The advance options were all set to the default values.

Functional enrichment analysis
Gene Ontology (GO) functional annotation was conducted using the clusterProfiler package after transforming the gene symbols into Entrez IDs according to the org.Rn.eg.db or org.Hs.eg.db package. Terms with P < 0.05 and Q < 0.05 were considered statistically significant.

Clinical samples
The study protocol was reviewed and approved by the Ethics Committee of the Shunde Hospital of Southern Medical University (Ethics Approval Number: 20210207). All participants signed an informed consent form. Peripheral blood samples (2 mL) of 20 control and 20 CRS subjects were collected within 24 h of admission and stored in EDTA anticoagulant tubes at 4°C. CRS diagnosis was based on the latest clinical guidelines [22,23]. The control subjects were defined as those without CRS, severe cardio or renal dysfunction, malignant tumors, severe infection, or other factors which could possibly influence the gene expression level. The clinicopathological features of patients, including age, sex, smoking, diabetes history, left ventricular ejection fraction (LVEF), N-terminal pro-B-type natriuretic peptide (NTproBNP), serum creatinine (Scr), blood urea nitrogen (BUN), and uric acid (UA), were also recorded.

RT-qPCR
Total RNA was extracted using the Trizol-chloroform method (Trizol reagent, Sigma-Aldrich, China) after the red blood cells of the whole blood sample were lysed with erythrocyte lysis buffer (Sigma-Aldrich, Saint Louis, MO, USA) for 15 min at room temperature. The purity and concentration of the RNA were measured using a Nanodrop2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). Complementary DNA was synthesized and amplified using the PrimeScript RT Reagent Kit (Takara, Dalian, China) and SYBR Premix ExTaq kit (Takara, China). The PCR experiments were conducted on an Applied Biosystems 7600 thermocycler (ABI, USA). Gene expression levels were normalized to GAPDH, and the 2 −ΔΔCt method was used to calculate the definite RNA expression values. The primer sequences used in this study are listed in Table 2.

Construction of diagnostic models based on machine learning algorithm
To improve the diagnostic ability of CRS, we constructed diagnostic models based on the screened core genes. Logistic regression was performed to establish the diagnostic model, and a nomogram was drawn to visualize the model using the rms package [24]. The classification tree was constructed using the rpart package [25], and the random forest model was developed using the randomForest package with the following parameters: 500 as the ntree and 3 as the mtry [26]. The confusion matrices, accuracy, precision, recall, and F-Score were used to measure the predictive ability of each machine learning diagnosis model.

Identification of the functionally-related genes
The top 20 functionally related genes of the hub genes were identified using the GeneMANIA database (http://genemania.org/search/homo-sapiens/). The number of the maximum resultant genes was set to 20, and that of maximum resultant attributes was set to 10. The query-dependent weighting method was chosen automatically by the website [27].

Statistical analysis
Statistical analysis were performed using R (version 3.6.3) and GraphPad Prism (version 8.4.3). Data are presented as mean ± standard deviation (SD). Student's t-test was used to compare gene expression differences in the PCR experiments. The Welch-corrected t-test was adopted to compare the differences in gene expression levels between the control and disease groups obtained from the GEO database and clinicopathological parameters of the control and CRS subjects from the Shunde Hospital of Southern Medical University. The association between the hub gene expression level and clinicopathological variables was measured using the Spearman correlation test. The receiver operating curve (ROC) and corresponding area under curve (AUC) were obtained from the pROC package. Unless otherwise specified, P < 0.05 was considered to be statistically significant; * P < 0.05, ** P < 0.01, *** P < 0.001.

Genomic difference analysis and PPI network construction
The workflow of the study protocol is shown in Figure 1. The R code used in this study is presented in Supplementary Material. The differentially expressed genes between the sham-operated and CRS model rats were analyzed. A total of 286 ( Figure  AGING 5-, and 7-week old rats, respectively, and 35 genes were found to overlap ( Figure 2E). Subsequently, 35 genes were included in the PPI network ( Figure 2F). The CytoHubba app revealed that the degrees of Col1a1, FN1, POSTN, and Col3a1 were greater than or equal to 10, and thus, the four genes were selected for subsequent analysis ( Figure 2G, Supplementary Table 1).

Time-series analysis and functional annotation
The time-series analysis of the transcriptome sequencing data of the CRS model rats at 4 time nodes identified 11 different gene clusters (P < 0.05) ( Figure  3A). A total of 1346 genes were included in these clusters, of which fibronectin 1 (FN1) and periostin (POSTN) were determined by PPI network analysis ( Figure 3B). Interestingly, FN1 and POSTN were both members of cluster 41 (P < 0.001, Figure 3C). GO enrichment analysis indicated that the genes in cluster 41 were mainly involved in cell cycle-and immunerelated pathways, such as negative regulation of metaphase/anaphase transition of the cell cycle, spindle checkpoint, and mast cell granules ( Figure 3D). The synthesis of pro-inflammatory mediators always AGING increases in CRS, causing cell death and fibrosis [28]. In addition, cell cycle arrest plays essential roles in pathological processes as the renal and cardiac cells usually undergo cell cycle arrest to prevent possible DNA damage from cell division when the cells undergo cellular stress [29]. Generally, these findings correspond to those of previous studies.

FN1 and POSTN are associated with many different heart and renal diseases
To further verify the role of FN1 and POSTN in CRS, transcriptome data of different heart and kidney diseases were downloaded. We found that FN1 was significantly upregulated in the atrial fibrillation (P < 0.05, Figure 4A), heart failure (P < 0.01, Figure 4B), hypertrophic cardiomyopathy (P < 0.001, Figure 4C), chronic kidney disease (P < 0.05, Figure 4D), focal segmental glomerulosclerosis (P < 0.01, Figure 4E), and uremia (P < 0.001, Figure 4F) samples compared with control samples. Similar trends were also observed in POSTN, as shown in Figure 4A-4F (P < 0.05). The indirect evidence partly demonstrated that FN1 and POSTN are strongly associated with many different heart and kidney diseases, thereby influencing the pathogenesis of CRS.   Table 3. Except for smoking history (P < 0.05), Scr (P < 0.05), BUN

Construction of the diagnostic models based on FN1 and POSTN
Diagnostic models containing FN1 and POSTN were established using multiple machine learning algorithms. The logistic regression model was constructed as follows: logistic score = −2.10 + 1.58 * EXP(FN1) + AGING 0.88 * EXP(POSTN), where EXP represented the mRNA expression level of FN1 or POSTN. A nomogram including FN1 and POSTN was constructed to help clinicians better understand the logistic model ( Figure  6A). The confusing matrix of indicated that the logistic regression model could predict CRS with high efficacy ( Figure 6B). The classification tree is shown in Figure  6C. Compared with the logistic regression model, the classification tree showed a higher predictive ability ( Figure 6D). Unexpectedly, it was found that only FN1  was included in the classification tree model, implying that FN1 had better predictive ability than POSTN. In the random forest model, the importance of FN1 and POSTN was quantified as a mean decrease in accuracy and Gini, which were previously described [34]. Compared with POSTN, FN1 exhibited higher mean decreases in accuracy and Gini ( Figure 6E), which was in agreement with the classification tree analysis. The random forest model could distinguish the CRS sample with the highest effectiveness ( Figure 6F, Supplementary Figure 1A and 1B), which is in alignment with the results of several previous studies that have provided proof of its strong performance [35][36][37]. The accuracy, precision, recall, and F-score of each model are listed in Table 4. In addition, the predictive ability of CRS for clinicopathological features and established diagnostic models were also compared (Supplementary Figure 1A and 1B). Overall, compared with the traditional biomarkers or single biomarkers like FN1 and POSTN, the combination of FN1 and POSTN through machine learning algorithms, especially random forest, greatly improved the efficiency of CRS diagnosis.

Functionally-related genes of FN1 and POSTN
The top 20 most strongly associated genes of FN1 and their interaction relationships are illustrated in Figure  7A. The size of the gene nodes represents the importance of the genes in the network, and the thickness of the lines is positively correlated with the interaction strength. GO analyses revealed that FN1 and its associated genes mostly participate in extracellular matrix-, immune-, and platelet-related biological processes ( Figure 7B). The top 20 POSTN-related genes are shown in Figure 7C, and their functions were mainly associated with the extracellular matrix, amino acids, and TGF ( Figure 7D). These findings indicate the possible mechanisms by which FN1 and POSTN are involved in the bidirectional interaction between the heart and kidney.

DISCUSSION
CRS is a complex heart-kidney disease characterized by high morbidity and mortality, resulting in a tremendous social burden worldwide [38]. About 20-40% of patients with acute heart failure suffer from kidney dysfunction, and almost 40-60% of patients with chronic heart failure experience chronic kidney disease [39,40]. Exploration of the molecular mechanisms in CRS is critical and difficult. The advancement and popularization of gene sequencing technology has provided an opportunity to further elucidate the pathological processes of this disease. Recently, many novel biomarkers associated with CRS have been reported, including cystatin 3, galectin 3, NGAL, and KIM1 [31,41]. These biomarkers not only provide the clinical guidelines for CRS diagnosis and prognosis, but also suggest the underlying cut-in points for mechanistic studies. However, given the complexity of CRS, the current findings are far from sufficient.
The present study systematically analyzed the transcriptomic data of the rat ventricle tissue at weeks 2, 4, 5, and 7 after subtotal nephrectomy through genomic difference detection, PPI network analysis, and timeseries analysis; FN1 and POSTN were ultimately identified as hub genes associated with CRS. The validation in different public datasets and local clinical samples indicated that FN1 and POSTN were both significant diagnostic biomarkers for CRS, which verified the findings from the animal experiments. Here, we report that levels of FN1 and POSTN were obviously increased not only in the diseased heart and kidney, but also in the plasma of CRS patients, implying that FN1 and POSTN are underlying cardiorenal connectors. FN1, encoding fibronectin, a protein located in the plasma, cell surface, and extracellular matrix, is mainly involved in cell adhesion and migration [42]. POSTN is also located in the extracellular space and regulates tissue development and regeneration.  [46]. Cardiofibrosis has a significant influence on the prognosis of heart diseases, which can be regulated by POSTN [47,48].
AGING POSTN was also found to be associated with renal diseases, including diabetic kidney disease [49], immunoglobulin A nephropathy [50], and polycystic kidney disease [51]. Hence, regardless of the absence of a regulatory relationship of POSTN and FN1 with CRS, FN1 and POSTN may exert important biological functions in the pathogenesis of CRS. Time-series analysis indicated that FN1 and POSTN were both members of cluster 41, and were thus mainly associated with immune-related pathways, which corresponded to the enrichment results of their functionally related genes. Immune-mediated dysregulation is well known for its vital role in CRS development, directly evidenced by the strong upregulation of plasma proinflammatory cytokines in CRS patients [52]. In summary, FN1 and POSTN are latent cardiorenal connectors that may function by regulating the immune response in CRS.
Another highlight of this study is the establishment of CRS diagnostic models comprising FN1 and POSTN.
To ameliorate the predictive ability, the diagnostic models were constructed using three different machine learning algorithms, namely logistic regression, classification tree, and random forest. The ROCs revealed that the random forest model could distinguish CRS samples from control samples with high efficacy. Compared with traditional biomarkers, the performance AGING of the random forest model is more satisfactory. In general, novel genetic diagnostic tools based on FN1 and POSTN are presented, providing new choices for clinicians.
The limitations of this study should not be neglected. First, we only performed the direct validation of the diagnostic value of FN1 and POSTN in the local hospital, and external verification in other centers would be beneficial to further clarify the clinical usefulness of these biomarkers. Second, we revealed that FN1 and POSTN serve as novel biomarkers of CRS in animal experiments and different cohorts, but how they affect the progression of CRS remains unclear. Further experiments are needed to explore the processes by which FN1 and POSTN affect the development of CRS.
In summary, the transcriptome sequencing data of the CRS rat models at different time models were systematically analyzed, and FN1 and POSTN were thus identified as novel biomarkers in CRS. These were externally validated in public datasets and local clinical samples, providing novel insights into the molecular mechanisms of CRS.

AUTHOR CONTRIBUTIONS
YZH designed the whole study and provided financial support. JJL developed the algorithm, drew the plots, conducted the experiments, and wrote the original manuscript. XHH and WWL did help to collecting the clinical samples, editing and reviewing.

Supplementary Figure
Supplementary

Supplementary Tables
Supplementary Table 1