A unified processing and analysis pipeline for single-cell based rejuvenation studies
To overcome the abiding issue of heterogeneous processing and analysis approaches between different studies, we propose a unified multiscale analysis pipeline that allows for the characterization and comparison of the effects of rejuvenation interventions. Starting from quantified expression profiles of single-cell RNA-seq experiments from treated and untreated donor samples, our pipeline first filters low quality cells based on dynamic thresholds for the percentage of mitochondrial and ribosomal reads as well as the relationship of read counts to detected genes. Next, the expression profiles of all cells in a dataset after regressing out the effect of cell cycle induced differences and normalization using scTransform [13]. Finally, the optimal clustering of cells is automatically identified by maximizing the Calinski-Harabasz Index [14].
After processing the data, our pipeline analyzes each dataset of treated and untreated samples at different levels of biological organization. Initially, it characterizes differentially expressed genes and the cellular processes they belong to. Subsequently, transcriptional regulatory networks (TRN) among differentially expressed genes by following a previously published method [15]. In brief, assuming an “inhibition dominant” regulatory logic in which one upregulated inhibitor is sufficient to cause the downregulation of a gene (no matter the number of activating relationships), a prior knowledge network (pkn) of TF - gene interactions is pruned to remove interactions in which the gene activity is behaving differently to what is seen in the differential expression profile. As a next step, our pipeline integrates signaling and transcriptional regulation to reconstruct sustained signaling cascades and identify their key molecules using SigHotSpotter [16]. Eventually, we employ InterCom to interrogate intercellular communication by reconstructing cell-cell interactions mediated by ligands and their cognate receptors [17]. In brief, InterCom infers ligand-receptor interactions by modeling intracellular signaling and downstream TF expression to ensure compatibility.
We collected 9 previously published single-cell RNA-seq datasets of heterochronic parabiosis (3 datasets), calorie restriction (1), exercise (1), metformin (2), rapamycin (1) and in vivo partial reprogramming (2) [7–10, 18–22] (Figure 1A, Supplementary Table 1, Supplementary Figure 1). As expected, we observed substantial technical variability in these datasets evidenced by large differences in sequencing depth, which further underscores the need for a homogeneous data processing pipeline (Figure 1B). Altogether, the employed datasets span a total of 74 cell types across 18 organs. Notably, tissues from the central nervous system, adipose tissue, liver and bone marrow could be found in multiple intervention datasets. To begin with, we set out to characterize the differential expression patterns of each cell type in each organ in response to the individual rejuvenation strategies and identified a considerable heterogeneity (Figure 1C). While systemic interventions such as calorie restriction and heterochronic parabiosis consistently exert large effects on the transcriptome of multiple organs, metformin has little to no effect on the organs it has been examined in. Interestingly, although exercising is directly affecting the muscles by diverting blood to them, the largest transcriptional effects were observed in the liver, artery and spinal cord.
Figure 1. (A) Overview of the SINGULAR project and its initial motivation. Publicly available datasets for several rejuvenation interventions were analyzed in this study. With the exception of Parabiosis, analyzed from three datasets, Reprogramming, which was analyzed from two datasets, and Metformin, also used in the Rapamycin experiment condition, every condition had data from one study. SINGULAR combines a unified processing pipeline for all the datasets with three main tools to explore transcriptional regulatory networks, signaling pathways, and cell-to-cell ligand receptor interactions. (B) Example comparison of the technical heterogeneity motivating this study. UMI counts across organ systems and studies, as well as the organs sample in each of the different studies, both vary greatly. (C) Comparison of counts of unique upregulated and downregulated genes from different studies grouped at the organ level, derived using the Delegate method. Even after applying SINGULAR’s unified preprocessing pipeline, substantial heterogeneity by organ and study in the transcriptional response to rejuvenation remains.
Identification of transcriptional master regulators that mediate rejuvenation effects
In order to gain insights into the regulatory relationships explaining the observed differential expression profile, we reconstructed TRNs among differentially expressed genes for each cell type in different tissues. As a result, we obtained 317 TRNs of cell types that were affected by a rejuvenation intervention. On average, TRNs are composed of 72 genes (range: 2–867) although most networks contain less than 35 genes (Figure 2A). Interestingly, the size of the TRNs is only weakly related to the number of differentially expressed genes (Pearson correlation, r = 0.21, p < 0.001), which suggests that the transcriptional response to rejuvenation interventions is dependent on other regulatory mechanisms. In this regard, we hypothesized that signaling dependent TFs, whose activity is not only mediated by their expression level but also extracellular signals, may regulate the genes that cannot be explained in the TRNs. Indeed, using a previously curated collection of signal-dependent TFs [23], we found between 77.8% and 100% (median 87.97%) of genes that are differentially expressed, have a known potential regulator but are not part of the TRNs can be regulated by signaling dependent TFs. In addition, we assessed how hierarchical each network is using the Krackhardt Hierarchy Score and found the TRNs to be highly hierarchical (average: 0.994, range: 0.932–1) (Figure 2B). This indicates the presence of a few ‘master regulators’, i.e., TFs that explain a large fraction of gene expression changes (see Methods for details).
Figure 2. Properties and clustering of master regulators in the rejuvenation response. (A) Ridge plot of network size, calculated as sum of unique TFs and targets for each regulatory gene network, grouped by study (bin size = 30, average number of genes 72, median number of genes 31, range 2–867). Provided enough distinct regulatory networks are observed, their number of elements can vary between organs and cell types of the same dataset. (B) Krackhardt hierarchy scores of all TRNs. In this case, we universally see values very close to 1 (mean 0.994, rage 0.932–1), indicating a very hierarchical regulatory response for all rejuvenation interventions. This motivated the search for master regulators in the transcriptional networks. (C) Distribution of instances of a specific TF being observed in each of the TRNs. The majority of TFs are seen in only a few regulatory networks, but a minority appear in a significant fraction. (D) Heatmap of the TF score (see online methods) for the 30 TFs with the greatest average ranking across all TRNs. Clustering was performed with the manhattan distance and the McQuitty method. Coordinated TF responses can be observed, as well as activity patterns strongly associated more with the rejuvenation condition than the cell type, potentially uncovering more holistic rejuvenation interventions by targeting master regulators behind different interventions. (E) Heatmap subset transcription factors known to be part of the AP-1 complex. Several clusters that contain a single master regulator can be observed in the differential rejuvenation response. Given these cofactors are expected to be coexpressed, this suggests a rejuvenation response in immune cell types under Calorie Restriction and Parabiosis that relies on the action of distinct AP-1 dimers.
Based on network statistics, we sought to identify these master regulators and simulated the downstream effects of activating a single TF in the network to assess the number of genes whose differential expression could be determined by this gene alone. Thus, a TF with a score of ‘1’ determines all genes in a network while a TF with a score of ‘0’ determines no other gene. Following this approach, we detected 493 TFs with a non-zero score across all cell types, organs and interventions. However, the majority of these TFs only act as master regulators in less than 5 conditions (Figure 2C). Moreover, the master regulator score of many TFs is low across the majority cell types and interventions. For example, the rejuvenation response in adipocytes after exercising is orchestrated by the co-expression of Clock and Arntl, which induce different downstream factors depending on the organ of origin. On the other hand, Nfkb and Esr1 regulates varying fractions of differentially expressed genes depending on the intervention (Supplementary Figure 2A). Indeed, it is not uncommon that in different conditions both shared and distinct mechanisms are found, suggesting therapeutic approaches to be equally promising through similar mechanisms. In basal cells of the Skin, for instance, Srf, Cebpb, Atf4, Jun and Myc shared the majority of their downstream regulatory target genes whereas other TFs mostly acted in a non-overlapping manner (Supplementary Figure 2B). Similar patterns could also be found in different cell types of the same intervention, with Ddit3, Spib and Cebpb mediating the effect in the granulocyte lineage while the remainder of the transcription factor response is determined by the maturity of the cell (Supplementary Figure 2C). Not surprisingly, we also observed distinct mediators of the intervention response. For instance, Ybx1, Klf4 and Ets1 were found to be master regulators of exercise and calorie restriction in hepatocytes, whereas only Foxo3 attained a high master regulator score in case of heterochronic parabiosis (Supplementary Figure 2D).
Next, we aimed at interrogating the most common intervention mediators and selected the 30 TFs that have the highest average master regulator score across all cell types (Figure 2D). Surprisingly, when contrasted against previously existing analysis that documented substantial declines or increases in TF activity with ageing [12], the overlap with those TFs is limited; with only 4 of our 30 mater regulators appearing in such an analysis (Nfkb1, Irf1, Arntl and Id3). Moreover, the sign of the change in TF activity varied depending on cell type, rather than being consistently positive or negative. This would suggest a marked distinction between the regulatory agents associated with age and those able to orchestrate the rejuvenation response.
Interestingly, our master regulator TFs have been previously associated with diverse cellular functions, including differentiation, proliferation, immune response and cell migration. Intriguingly, when grouping these TFs by their master regulator score in every cell type, we observed the presence of several clusters. As a general observation, we conclude that our master regulators rather group by intervention instead of cell type. In light of the diverse set of enriched cellular processes, this suggests the induction of distinct signaling pathways that differentially activate broad TFs. However, there exists one cluster that almost exclusively contains immune cells after treatment of heterochronic parabiosis or calorie restriction (Figure 2D; right part). Although all of these TFs contribute to the mediation of the intervention effects, Jun, Junb, Jund, Atf4 and Fos, all of which belong to the AP-1 transcription factor complex, display a consistent involvement (Figure 2E). In addition, we observed that many smaller clusters are formed that predominantly contain a single master regulator even though the AP-1 complex TFs mostly co-occur. The presence of the AP-1 complex across multiple interventions prompted us to dissect the involvement of all known subunits [24] separately. Consistent with the known dimerization patterns of the AP-1 complex [25], Fos and Jun clearly emerged as the most common master regulators across interventions and cell types. In contrast, other Jun-, Fos- and Atf-family TFs act more selectively as master regulatory, which leads us to hypothesize that the cell type and intervention specific effects are exerted by distinct AP-1 dimers. Interestingly, although previous studies have documented the influence of this complex in promoting age-related inflammation (“inflammaging”) [26], our analysis strongly suggests their action as anti-aging mediators depending on their dimerization.
To support the involvement of the identified master regulator TFs, we cross-referenced the 30 TFs having the highest master regulator score with aging-associated genes contained in GenAge [27]. As a result, we found 53% (16/30) master regulators to be linked to aging with varying degrees of evidence. In particular, Arntl, Cebpb, Foxo1 and Jun possess strong evidence and have been directly linked to aging in mammalian and non-mammalian model organisms. In addition, the gene products of Myc and Nfe2l2 have been directly linked to aging in a cellular model system and Foxo3 has been shown to be involved in human longevity. Interestingly, several genes, i.e., Ar, Egr1, Jun and Sp1, have been shown to regulate genes previously linked to aging. Less evidence is provided for Hif1a and Nfkb1, which are known to be involved in pathways or mechanisms linked to aging. Finally, Ddit3, Fos and Stat3 are known effectors of aging-related genes.
Despite the TFs that have been found in GenAge, we collected publicly available transcriptomic perturbation data of the top 30 master regulators and applied MultiTIMER [28], a multi-tissue transcriptional aging clock, to quantify potentially rejuvenating effects. Due to the nature of MultiTIMER as a predictor of transcriptional age in bulk data, we chose to validate the master regulators found in SINGULAR with experiments available in the Gene Expression Omnibus (GEO). In particular, we found knockdown/knockout experiments for Klf4, Irf1, Atf4, Myc, Hif1a and Esr1 in cell types where they have been identified as master regulators (Supplementary Table 3). Since master regulators are up-regulated upon rejuvenating interventions, we would consequently expect the age of normal cells to increase after their knockdown or knockout. Indeed, we observed increases in the predicted cellular age after perturbing Klf4 (9.1 years), Irf1 (3.9 years), Hif1a (2.6 years). In addition, we also collected transcriptional profiles after overexpression of Klf4 and Myc. Interestingly, the predicted cellular age after Myc overexpression is considerably younger (−9.7 years) whereas it slightly increased in case of Klf4 (1.6 years). These results suggest that master regulators act synergistically with other TFs to exert a rejuvenating effect in a cell type dependent manner. This idea is further supported by current partial reprogramming strategies that upregulate Klf4 and Myc in combination with Pou5f1 and Sox2 to achieve a significantly higher reduction in cellular age compared to what we observed for Klf4 or Myc alone [29].
Crosstalk between transcriptional master regulators and intracellular signaling response
Most of the TFs with the highest master regulator potential across interventions are well known to be activated or inhibited by multiple signaling pathways. Thus, we set out to identify the active signaling molecules that are likely to mediate the activation of master regulators in each cell type, tissue and intervention, as described before. Moreover, we compared the accordingly detected signaling molecules in treated and untreated samples to select those that are differentially active between both conditions. As a result, we identified 452 molecules in 33 cell types across conditions and organs. Of these molecules, 74 directly activated the downstream TFs of the corresponding TRNs after treatments. The full set of results can be viewed in the SINGULAR database.
To interrogate the function of the integrated signaling cascades and their induced TRNs, we performed Gene Set Enrichment Analysis (GSEA) of their constituent genes [30]. Strikingly, 23 integrated networks are negatively enriched in aging gene sets derived from Tabula Muris Senis, as found in The Molecular Signatures Database (MSigDB) [31]. This finding supports the rejuvenating effects of different interventions on cell types on the basis of an independent dataset (Figure 3A–3C). After heterochronic parabiosis however, neutrophils predominantly displayed a pro-aging signature (Figure 3D). This is consistent with a previous study reporting an increase of a gene signature of neutrophil activation in hematopoietic stem and progenitor cells after parabiosis treatment [8]. Of note, especially in the case of parabiosis, we observed a reduction of aging signatures derived from various tissues. Nevertheless, in other cases, the enriched signatures precisely matched the tested cell type, such as in case of lung B cells where Grb2 mediates the activation of the transcriptional master regulator Fos (Figure 3B, Supplementary Figure 3A).
Figure 3. (A) Normalized enrichment scores for different cell types in different organs in the Parabiosis2 dataset. We observe substantial heterogeneity, including situations where a cell is negatively enriched both for its actual cell type and for the aging signature of other, very different lineages. This suggests that Parabiosis may be the most comprehensive rejuvenation intervention at this level of analysis. It must be noted that Neutrophils were the only cell type with a mixed rejuvenating and aging signature, but this is consistent with known responses to heterochronic parabiosis experiments. (B) Normalized enrichment score for the shared component between the TRN and the signaling network for Muscle fiber and Lung B cells under the Exercise condition. In this example, the geneSet cell markers for an aged transcriptome perfectly match the celltype the multi-modal network was derived from. (C) Normalized enrichment score for the shared component between the TRN and the signaling network in Reprogramming dataset for Muscle stem cells. In this example, the negative aging signature is found for three different cell types, none matching the one the data was derived from. (D) Bubble plot illustrating the number of cell types per organ and the fraction of cell types per organ where we were able to detect a sustained signalling network associated with the rejuvenation condition, per organ and study. Crosses indicate absence for any cell type. Full equivalence between the geneSet legend labels and the Tabula Muris Senis enrichments can be found in Supplementary Table 2.
Despite the reversal of pro-aging signatures in multiple cell types, we observed other enriched functions that are consistent with our current understanding of the interventions. For instance, E2f target genes, cell cycle control and the G2m checkpoint were negatively enriched after partial reprogramming of the muscle, which displays an expected decrease in cellular proliferation that is consistent with previous studies indicating an up-regulation of the cell cycle inhibitor Cdkn1a in the critical treatment window [32] (Supplementary Figure 3B). Moreover, in the same experiment, cardiac muscle organogenesis was positively enriched suggesting an ongoing re-commitment to a fully differentiated state after de-differentiation (Figure 3C, Supplementary Figure 3B). Interestingly, while Neutrophils in the bone marrow show signs of overall rejuvenation in response to heterochronic parabiosis, their inflammatory potential increases (Supplementary Figure 3C). This implies that parabiosis could have harmful effect on aging alongside its rejuvenating benefits. Finally, the response of T cells in the peripheral blood to Parabiosis2 was mediated by the kinase Pak2 and the master regulators Nfkb1 and Stat3. As a result, we observed a decline in the Pi3k/Akt/mTOR signaling pathway, which is a well-known aging determinant. In fact, hypomorphic Pi3k mice show an increased longevity (Supplementary Figure 3D).
Integration of gene network inference, signaling and intercellular communication analysis
Based on our finding that the transcriptional response to several interventions could be linked to sustained signaling cascades that get activated, we finally aimed at interrogating whether these effects are induced by ligand-receptor mediated cell-cell interactions. Thus, we employed InterCom [17] to reconstruct the cell-cell communication networks of each treated and untreated organ. Similar to our assessment of the intracellular signaling cascades, we focus in the remainder on the most significant interactions that are unique to the treated condition and that involve a receptor as a key signaling molecule. The complete information can be accessed in the SINGULAR database.
Deriving mechanistic insights and testable hypotheses from the response to different rejuvenation strategies could significantly accelerate the development of new anti-aging treatments. Therefore, we focus on two illustrative examples to demonstrate the potentiality of our unified analysis approach. First, we found our analysis to recapitulate known cell communication effects in macrophages after heterochronic parabiosis (Figure 4A). More specifically, our analysis revealed that Gnai2 activates the AP-1 complex genes Jun and Fos upon dissociation of Ccr2 upon recognition of its cognate ligand Ccl2 [33]. Activation of the AP-1 complex in turn leads to the up-regulation of known chemotaxis related genes, such as Cd14, Cxcl2 and Vegfa [34–36]. Indeed, positive enrichment of chemotaxis-related gene sets in a GSEA of the signaling cascades and downstream TFs underscored the chemotactic expression program induced by Ccr2 (Figure 4B). As a second example we chose to illustrate a novel, non-canonical signaling cascade that has not been reported before. In response to exercising, Purkinje cells in the cerebellum form an autocrine loop and interact with oligodendrocyte precursor cells via the Fgf10-Fgfr2 axis (Figure 4C). While our analysis recapitulates the downstream activation of Runx2, which in turn up-regulates and guarantees the expression of Fgfr2, we found that Pax6 is activated by Tcf12 in response to Fgfr2 activation. Although the function of Pax6 has not been reported in Purkinje cells, it is a known neuroprotective transcription factor [37].
Figure 4. Mechanistic insights from a combined approach of all tools used by SINGULAR. (A) Recapitulation of well documented cell communication pathway for Macrophage recruitment under the Parabiosis condition. Ccr2 recognizes Ccl2, which initiates a signaling cascade to activate the AP-1 complex, which leads to the activation of chemotaxis genes. (B) Further validation of this well-known pathway from gene set Enrichment analysis of the members of the connected component of TRN and signaling cascade crosstalk for Gnai2 as a signalling intermediate and Fos, Jun and Cepbp as TRN TFs. Values have been averaged from several related functions, full results in Supplementary Table 4. (C) Novel signaling cascade. In Purkinje cells of the cerebellum, Fgf10 binds to Fgfr2, initiating a cascade in which Pkrca leads to the activation of Mapk1, which recruits Runx2 for further expression of the Fgfr2 receptor, as well as a separate Creppb-mediated signaling cascade that ends with Tcf12 activating Pax6, a transcription factor known for its neuroprotective properties. (D) Intersection between druggable activating TFs in DrugBank and the master regulators uncovered in this study. (E) Number of druggable key signaling molecules for every integrated TRN and signalling cascade.
Identification of potential drugs targeting key TFs and signaling molecules
In order to demonstrate the utility of SINGULAR, we asked whether we can determine drugs that can target the identified TF master regulators and key signaling molecules. For that, we collected all available drug-target relationships in DrugBank and searched for drugs that could activate our master regulators or mimic the effect of rejuvenation interventions on key signaling molecules. For this purpose, we classified TFs as master regulators if they determine at least 30% of the network TFs when activated according to our simulation studies (see online Methods). Unsurprisingly, of the 239 transcriptional master regulators across all cell types, organs and interventions, only 17 could be activated by drugs (Figure 4D). Moreover, these TFs predominantly belong to the class of nuclear receptors, including Nr3c1, Vdr, Nr1i2, Rxra and Ar. However, notable exceptions are the AP-1 complex proteins Jun and Fos as well Trp53. These further underscores the suitability to interfere with the AP-1 complex to mimic the effect of complex interventions. In order to determine whether any of these drugs possess known rejuvenating effects, we cross-referenced them with DrugAge, a database of aging related drugs [38]. As a result, we found several compounds with demonstrated effects on lifespan in model organisms. For instance, Curcumin, a Vdr agonist, extends the maximum lifespan of D. melanogaster on average by 19.5% at high concentrations and Vitamin D3 extends the average lifespan of C. elegans by 26.8% in a dose-dependent manner. Moreover, Bezafibrate, a partial agonist of Nr1i2, has been shown to increase the average lifespan of C. elegans by 13%. In contrast to TFs, the differentially active key signaling molecules between treated and untreated conditions are generally better druggable (Figure 4E). In particular, the microglia specific key signaling molecules after parabiosis App and Mapk14 are targeted by 24 and 56 drugs, respectively. However, none of the identified molecules target both genes.