Based on the considerations in the introduction, we first justify the gene lists we used to construct the healthspan pathway maps for humans and C. elegans. Second, we describe the healthspan pathway maps in detail, specifically in light of gene expression data that we overlaid onto the pathway maps. We then consider the human - C. elegans overlap, followed by some general discussion of our approach, including its strengths and limitations.
From gene lists to maps of healthspan pathways
We used Cytoscape with selected plugins to obtain and annotate a connected network of the human healthspan-associated genes from Supplementary Tables 1–3 and the C. elegans genes from Supplementary Tables 4, 5. Specifically, we used GeneMANIA to establish a gene/protein interaction network and to add connecting genes; subsequently, we clustered all genes based on their connectivity and added GeneOntology-based annotations using AutoAnnotate. The resulting healthspan pathway maps are presented in the following. Moreover, health-related gene expression data are overlaid onto all healthspan pathway maps and will be discussed as well; these data describe the effects of caloric restriction (CR) in humans [37] and of rapamycin in C. elegans [38], as examples of health-promoting interventions, or they describe the effects of aging and disease in specific tissues.
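The clustering step described above can be illustrated with a minimal Python sketch. It is not the GeneMANIA or AutoAnnotate implementation: the placeholder gene names, the edge weights, the weight threshold, and the use of connected components as the clustering rule are all assumptions made for illustration only.

```python
from collections import defaultdict

# Toy weighted interaction edges between placeholder genes.
# Names and weights are invented for illustration, not taken from GeneMANIA.
EDGES = [
    ("A1", "A2", 0.9),
    ("A2", "A3", 0.8),
    ("A3", "A4", 0.7),
    ("B1", "B2", 0.9),
    ("A4", "B1", 0.1),  # weak link, dropped by the threshold below
]


def cluster_by_connectivity(edges, min_weight=0.5):
    """Group genes into connected components of the thresholded network."""
    neighbours = defaultdict(set)
    for a, b, weight in edges:
        # Register both genes so that isolated genes still form a cluster.
        neighbours[a]
        neighbours[b]
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for gene in sorted(neighbours):
        if gene in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    # Largest clusters first, mirroring the ordering of the pathway lists.
    return sorted(clusters, key=lambda c: (-len(c), c))


clusters = cluster_by_connectivity(EDGES)
```

In this toy network the weak edge between A4 and B1 falls below the assumed threshold, so the genes separate into one cluster of four and one cluster of two.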
For humans, we derived a gene list (Supplementary Table 6) summarizing all genes associated with healthspan (see Supplementary Tables 1–3 to trace back these genes to their origin). This list yielded the network of Figure 1, where the two largest pathways/clusters (15 and 13 genes) are specifically labeled by NOTCH and transcription initiation, and by proliferation, and the smaller pathways/clusters (4, 3, 3 and 3 genes) are labeled by cholesterol and lipid processes, by thymus activation, by myotube (striate muscle) regulation, and by Wnt signaling. In Figure 1 bottom, the list of pathways/clusters is given, and the details of the largest pathway are zoomed in.
Figure 1. A healthspan pathway map for humans, based on Supplementary Tables 1–3, including the list of pathways/clusters with their labels as assigned by AutoAnnotate and their size (number of genes). The largest pathway is zoomed in to reveal details. The size of a gene node is proportional to its GeneMANIA score, which indicates the relevance of the gene with respect to the original list of genes to which another 20 genes are added by GeneMANIA, based on the network data. Genes upregulated by CR are shown in yellow, downregulated genes are shown in blue, and grey denotes genes for which no expression values are available in the caloric restriction dataset [37]. The color of an edge refers to the source of the edge in the underlying network, that is co-expression (pink), common pathway (green), physical interactions (red), shared protein domains (brown), co-localization (blue), predicted (orange), and genetic interaction (green). The thickness of an edge is proportional to its GeneMANIA “normalized max weight”, based on the network data. Genes from the GeneMANIA input list feature a thick circle, while genes added by GeneMANIA do not.
In the largest pathway/cluster, in light of the CR-triggered gene expression changes, the most prominent findings are an induced downregulation of NOTCH4 (and to a lesser extent of NOTCH2 and NOTCH3), as well as of LRP1, and an upregulation of TOMM40 and CREBBP (also known as CBP). The family of NOTCH proteins has various functions, including a pro-inflammatory one [39, 40]. NOTCH4 is upregulated in kidney failure [41], and promotes vascularization/angiogenesis, which includes its upregulation in malignancy [40, 42]. A downregulation of NOTCH4 by CR can thus be taken as a beneficial effect. This is less obvious for LRP1, the low-density lipoprotein receptor-related protein 1, which is responsible for membrane integrity and membrane cholesterol homeostasis, thus being involved in proper myelination [43] and vascular integrity [44]. A downregulation of LRP1 during CR could therefore be seen as deleterious. However, LRP1 expression mainly depends on cholesterol levels [45], and these are lower during fasting. Hence, lower LRP1 expression actually reflects a lower LDL level, which per se has been found to be protective. The upregulations observed for TOMM40 and CREBBP during CR can also be seen as protective. TOMM40 is part of a mitochondrial membrane protein translocase, supporting mitochondrial function [46], and low expression and/or particular risk alleles of this protein are associated with Huntington’s and Alzheimer’s Disease [47, 48]. Of note, TOMM40 upregulation during CR goes together with APOE4 downregulation. Although both genes are closely located on chromosome 19, prompting the speculation that this linkage could imply concordant expression changes, this is obviously not the case here. CREBBP is a transcriptional co-activator with histone-acetyltransferase activity [49], acting primarily on histones 3 and 4, and thus it acts in concert with a range of transcription factors. Its downregulation is deleterious, resulting in, e.g., MHCII expression loss on lymphocytes [50], rendering the lymphocytes dysfunctional for antigen presentation, and in inflammatory signaling [51]. An upregulation of CREBBP by CR is thus likely beneficial. We further investigated the miRNAs that are statistically enriched in the largest healthspan pathway using the TFmir webserver [52], revealing regulation of NOTCH genes implicated in the epithelial-mesenchymal transition, cancer, heart failure and obesity (see Supplementary Results). The genes in the next-largest pathways/clusters, related to cell proliferation and lipids, are also described there in detail, together with further evidence provided by mapping aging- and disease-related gene expression data onto them, as published or collected by Aramillo Irizar et al. [53].
For C. elegans, the gene list representing all healthspan-associated genes is shown in Supplementary Table 7 (see Supplementary Tables 4–5 to trace back these genes to their origin). This list yielded the network of Figure 2, where the largest clusters (9 and 6 genes, respectively) are labeled by immune response process and by terms related to the mitochondrion. Three clusters (of 4 genes each) specifically feature dauer/dormancy, hormone response, and regulation. In Figure 2 bottom, the list of pathways/clusters is given, and the details of the largest pathway are zoomed in. Regarding the first pathway, rapamycin reduces ets-7 transcription, which was shown to be necessary for the healthspan-promoting effects of salicylamine [54]. Furthermore, rapamycin upregulates the transcription factor daf-16 (a homolog to Foxo) and downregulates the daf-16 inhibitors akt-1 and akt-2, putatively leading to an improved stress- and immune-response and prolonged lifespan via the Insulin/IGF-1 pathway [55]. Along the same lines, the akt-1 and akt-2 activator pdk-1 is also downregulated by rapamycin, further promoting daf-16 activity [56]. In contrast, the daf-16 inhibitor sgk-1 is upregulated; however, its inhibitory role is subject of discussion [57]. Finally, the transcription factors hsf-1 and skn-1 (the latter a homolog to Nrf), both important in stress response processes [58, 59], are slightly downregulated in rapamycin-treated C. elegans. Thus, the stress defense system of C. elegans seems to play a central role in healthspan prolongation. Indeed, stress resistance is frequently discussed as a key to a long and healthy life. Vitagenes, which are genes involved in preserving cellular homeostasis during stress conditions, were shown to be crucial for the beneficial effects of dietary phytochemicals [60]. Furthermore, mild stress, which stimulates repair pathways and the stress defense of an organism including vitagenes, is able to promote healthy ageing in numerous ways [61]. This phenomenon, called hormesis, was held responsible for beneficial effects observed for many compound interventions [62–64]. More specific concepts, like mitohormesis, which explains how reactive oxygen species can increase life- and healthspan [65], or the xenohormesis hypothesis, which links evolutionary processes to the health-promoting abilities of plant-derived food [66], allow deeper insights into the entanglement of stress and health. In the Supplementary Results, the next-largest pathways/clusters, related to the mitochondrion, to dauer/dormancy, to regulation, and to hormone response, are described in detail.
Figure 2. A healthspan pathway map for C. elegans, based on Supplementary Tables 4, 5. See also Figure 1. Gene expression data reflect the effect of rapamycin [38].
For C. elegans, we also derived a gene list from WormBase, taking the genes that are most differentially regulated by healthspan-extending interventions and, at the same time, are annotated with a sufficient number of GO terms (see Methods; Supplementary Table 8). We obtained the network of Figure 3. Curiously, the top healthspan pathways of 11, 9 and 8 genes are related to the endoplasmic reticulum (ER), lipid and membrane, to the peroxisome, microbody and ER, and to the lysosome. The endoplasmic reticulum, the peroxisome and the lysosome are part of the endomembrane system, together with the mitochondria, contributing to healthspan and longevity in mammals and beyond [67]. Peroxisomal and lysosomal functions connect this pathway to dietary effects on lifespan [68, 69], and to liver disease [70]. The second tier of healthspan pathways (6 or 5 genes) is related to morphogenesis, biosynthesis and transcription.
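The two selection criteria named above (strong differential regulation and a sufficient number of GO terms) can be sketched as a simple filter in Python. The record fields, the thresholds, the placeholder genes and the order of the two steps are assumptions for illustration; the actual criteria are those given in the Methods.

```python
from typing import NamedTuple


class GeneRecord(NamedTuple):
    name: str
    abs_log2_fold_change: float  # strength of regulation under an intervention
    n_go_terms: int              # number of GO annotations


def select_health_genes(records, top_n, min_go_terms):
    """Keep sufficiently annotated genes, then take the most regulated ones."""
    annotated = [r for r in records if r.n_go_terms >= min_go_terms]
    ranked = sorted(annotated, key=lambda r: r.abs_log2_fold_change, reverse=True)
    return [r.name for r in ranked[:top_n]]


# Placeholder records; names and numbers are invented for illustration.
RECORDS = [
    GeneRecord("gene-a", 3.2, 12),
    GeneRecord("gene-b", 2.9, 1),   # strongly regulated but poorly annotated
    GeneRecord("gene-c", 2.1, 8),
    GeneRecord("gene-d", 0.4, 20),  # well annotated but weakly regulated
]

selected = select_health_genes(RECORDS, top_n=2, min_go_terms=5)
```

Filtering on annotation before ranking means that a strongly regulated but poorly annotated gene cannot displace an annotated one; the reverse order would be an equally plausible reading of the text.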
Figure 3. A healthspan pathway map for C. elegans, based on genes affected the most by healthspan-extending interventions, using WormBase gene expression data. See also Figures 1, 2.
For the WormBase data, the list of pathways/clusters, and the details of the largest pathway, are given in Figure 3, bottom. The ER/lipid-related pathway includes genes involved in fatty acid elongation/production (elo-1 to elo-9; let-767; art-1). Overlaying the rapamycin gene expression data, the well-characterized elo-1 and let-767 genes show some downregulation. However, the importance of elongase genes for health maintenance in general was repeatedly documented. Vásquez and colleagues [71] demonstrated the impairment of touch response in elo-1 mutants. They argue that elo-1 has a crucial role in the synthesis of C20 polyunsaturated fatty acids, which are required for mechanosensation. Moreover, elo-1 mutants showed increased resistance to Pseudomonas aeruginosa infections due to the accumulation of gamma-linolenic acid and stearidonic acid [72], and knockdown of elo-1 or elo-2 extends survival during oxidative stress [73]. Finally, art-1 is a steroid reductase that is downregulated by rapamycin in our case, but also in long-lived eat-2 mutants [74]. In the Supplementary Results, the next-largest pathways/clusters, related to the ER, the peroxisome, the lysosome, morphogenesis, biosynthesis and transcription, are described in detail.
Overlap between human and C. elegans health genes and healthspan pathways
Based on reciprocal best orthologs, we found no direct overlap between the human health genes based on genetic associations and the C. elegans healthspan genes based in part on genetic interventions, but mostly on expert analysis of intervention effects (Figure 2), or on gene expression changes related to healthspan-extending interventions (Figure 3). We found some hints at an overlap on the level of the healthspan pathway annotations, considering that “proliferation” is listed for human, and “biosynthesis”, “immune response”, and “mitochondrion” for C. elegans, while “transcription” as well as “lipid” are found for both. Due to the post-mitotic nature of the adult C. elegans, proliferation processes have only minor impact on healthspan in C. elegans. In contrast, given that deregulated cell proliferation is the basis for cancer [75] and that cancer is one of the four main reasons for morbidity and mortality in humans according to the WHO (https://www.who.int/gho/ncd/mortality_morbidity/en/; status as of August 2019), it is not surprising that proliferation is a fundamental part of the human healthspan map. Furthermore, since C. elegans is usually fed on bacteria, which cause pathogenic stress in older nematodes [76, 77], the immune system is of particular importance for the health of nematodes. Finally, differences of the healthspan pathway maps regarding annotations such as “mitochondrion” could also be due to differences in how the underlying data were generated, in addition to species-specific differences.
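The annotation-level comparison in the preceding paragraph amounts to simple set operations, as the following Python sketch shows. It uses only the annotation terms named in that paragraph, not the full AutoAnnotate labels of the pathway maps, and the variable names are chosen for illustration.

```python
# Annotation terms as named in the text; not the complete cluster labels.
HUMAN_TERMS = {"proliferation", "transcription", "lipid"}
WORM_TERMS = {
    "biosynthesis",
    "immune response",
    "mitochondrion",
    "transcription",
    "lipid",
}

shared = sorted(HUMAN_TERMS & WORM_TERMS)      # terms found for both species
human_only = sorted(HUMAN_TERMS - WORM_TERMS)  # terms listed for human only
worm_only = sorted(WORM_TERMS - HUMAN_TERMS)   # terms listed for C. elegans only
```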
Regarding lipids, for humans, specific reference is made to APOE/APOC (implicated in cholesterol metabolism); for C. elegans, specific reference is made to the elo set of genes (implicated in fatty acid elongation). The dysregulation of cholesterol and its different manifestations such as high- and low-density lipoprotein cholesterol (HDL-C and LDL-C) is one of the main causes of atherosclerotic cardiovascular diseases (CVD), a top ageing-related deadly disease [78, 79]. In contrast to mammals, C. elegans does not possess a heart or blood vessels and it cannot synthesize cholesterol by itself. Furthermore, a transgenic cholesterol-heterotrophic line lives 31% longer [80]. Another interesting difference is that cholesterol’s main task in nematodes is probably not its role as a crucial membrane component, but rather its role as a signaling molecule [81, 82]. Further discrepancies regarding the function and regulation of lipids in humans and C. elegans are summarized in Mullaney and Ashrafi [83]. Nevertheless, and quite surprisingly, numerous key components, functions and regulatory pathways regarding lipid metabolism are indeed comparable in C. elegans: similarities in the regulation of membrane fluidity [84], of fat depletion after consumption of oats [85], legumes [86], and fibrates [87] as well as after exercise [88], and in the genetic background of obesity [89–91] and fat storage [92] are only a few examples. The adult worm is post-mitotic [93], but many human diseases and cell senescence processes are also associated with tissues that no longer divide, e.g., in the brain [94, 95].
In search for other modes of overlap, we additionally constructed and compared two interaction networks, based on mapping genes to their respective orthologs in the other species. Each of the two interaction networks is based on the union set of the health genes of human (based in turn on genetics, Supplementary Tables 1–3, Figure 1) and of C. elegans (based in turn on the gene expression analysis of healthspan-extending interventions using WormBase, Figure 3). Specifically, as outlined in Figure 4, we added the C. elegans orthologs of the human health genes to the list of C. elegans health genes and vice versa, yielding two separate input gene lists for GeneMANIA to enable the construction of the two interaction networks, one per species. We used strict ortholog mapping rules (only reciprocal best hits were accepted). By design, the two gene lists feature a high degree of overlap (with differences due to missing orthologs), and their subsequent comparison, consisting of the partial network alignments that are based on ortholog mapping on the one hand and the species-specific network data on the other hand can only reveal hypotheses for common healthspan pathways, as long as explicit experimental evidence for a relation to health is only found for one species. Moreover, interaction points between a healthspan pathway with evidence in one species and a healthspan pathway with evidence in the other species may be revealed, if a partial alignment of the interaction networks consists of interacting genes for which the relationship to health was demonstrated only in one species for each pair of orthologs.
Figure 4. Workflow of the main analysis steps. First, 52 human health genes (Supplementary Tables 1–3) were processed with GeneMANIA and AutoAnnotate to determine the human healthspan pathway map (left, see also Figure 1). Analogously, 58 worm health genes (based on gene expression analysis using WormBase) were studied, yielding the C. elegans healthspan pathway map (right, see also Figure 3). Then, to determine overlap across species, the gene lists were extended by the orthologs (calculated by WORMHOLE, see Supplemental Methods) from the respective other species. We then employed GeneMANIA as before, to generate two interaction networks (one per list), and overlaps between these two networks of health genes were determined by GASOLINE (middle, see also Figure 5).
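The list-extension step of this workflow can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the WORMHOLE procedure: the best-hit tables are invented, the entries worm-x, worm-z and HUMAN_Y are placeholders, and the helper names are hypothetical. Only the principle is taken from the text, namely that a pair counts as orthologous when each gene is the best hit of the other, and that each species' input list is the union of its own health genes and the orthologs of the other species' health genes.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return {a: b} for pairs where a's best hit is b and b's best hit is a."""
    return {
        a: b
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


def extend_with_orthologs(own_genes, other_genes, other_to_own):
    """Union of own health genes and orthologs of the other species' genes."""
    mapped = {other_to_own[g] for g in other_genes if g in other_to_own}
    return set(own_genes) | mapped


# Invented best-hit tables for illustration (worm -> human and human -> worm).
WORM_TO_HUMAN_BEST = {"pak-2": "PAK4", "worm-x": "HUMAN_Y"}
HUMAN_TO_WORM_BEST = {"PAK4": "pak-2", "HUMAN_Y": "worm-z"}  # worm-x is not reciprocal

worm_to_human = reciprocal_best_hits(WORM_TO_HUMAN_BEST, HUMAN_TO_WORM_BEST)

# Human input list: human health genes plus human orthologs of worm health genes.
human_input = extend_with_orthologs(
    own_genes={"CDKN2B", "GSK3B"},
    other_genes={"pak-2", "worm-x"},
    other_to_own=worm_to_human,
)
```

Genes without a reciprocal partner, such as the placeholder worm-x, contribute nothing to the other species' list, which is one source of the differences between the two input lists.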
Of the two interaction networks to be aligned, the first network is based on C. elegans health genes, the C. elegans orthologs of human health genes, and C. elegans gene interaction information provided by GeneMANIA. The second network is based on human health genes, the human orthologs of C. elegans health genes, and human gene interaction information provided by GeneMANIA. Despite using similar lists of genes (with differences due to missing orthologs and due to the genes added by GeneMANIA), we can expect that the two GeneMANIA networks are quite different because the interaction data sources employed by GeneMANIA are strongly species-specific. Moreover, we observe that in both cases, the 20 closely interacting genes added by GeneMANIA for one species included no orthologs of the other species. Nevertheless, to identify joint healthspan pathways and interaction points between healthspan pathways, we used GASOLINE [96] to align the two networks wherever feasible, obtaining two partial (subnetwork) alignments as output, as shown in Figure 5.
Figure 5. The two alignments demonstrating overlap of (putative) healthspan pathways in human and C. elegans, based on a GASOLINE alignment of the network of genes implicated in health-related gene expression changes in WormBase (top), and in human health based on genetic studies (bottom), and of corresponding orthologs. Dashed edges indicate orthologs, green edges indicate interactions based on GeneMANIA known for the respective species; the node shape is square if the gene originates from the original lists of health genes and it is circular if the gene is an ortholog, and node colors are based on gene expression changes triggered by rapamycin (in case of C. elegans) or by caloric restriction (in case of human), as in Figures 1–3.
In the first alignment (Figure 5, left), we see an alternating pattern of demonstrated health-relatedness, since pak-2, sad-1 and pig-1 are considered health-related by gene expression analysis using WormBase, while CDKN2B and GSK3B are known to be human health genes (Supplementary Tables 1–3; GSK3B was implicated by a GWAS of the Healthy Aging Index, while CDKN2B was in fact one of the few genes implicated by two independent health studies). The C. elegans genes belong to three small clusters in the healthspan pathway map of Figure 3 (pak-2: lysosomal, sad-1: neural, pig-1: biosynthesis), while the human genes belong to one large (GSK3B: proliferation) and one small (CDKN2B: cyclin-dependent kinase) cluster in the human healthspan pathway map of Figure 1. Interactions in C. elegans are all based on shared domains (kinase signaling, except for the predicted interaction of gsk-3 and C25G6.3, which is based on the Interologous Interaction Database), while interactions in human are based on shared domains, genetic interaction (i.e., large-scale radiation hybrid) and pathway data. Essentially, the healthspan pathway overlap suggested by our analysis involves proliferation-related serine/threonine kinase signaling (pak-2/sad-1/pig-1 and PAK4/BRSK2/MELK), Wnt signaling (GSK3) and cyclin-dependent kinase signaling (CDKN2B). Both alignments are described in further detail in the Supplementary Results, and a functional analysis of the genes is given in Supplementary Tables 11, 12.
Given lists of genes, there is a plethora of possibilities to organize the genes into groups of related ones. Motivated by the idea of a “healthspan pathway”, we hypothesized that the genes should be known to interact based on functional gene/protein interaction data (provided by GeneMANIA). Here, as in most other studies, pathways are not assumed to be linear [97]. The (higher-level) interaction among the clusters/healthspan pathways (i.e., the pathway map) is given by the individual gene/protein interactions that are shown between the clusters in Figures 1–3. However, we did not investigate these further.
The small amount of healthspan gene/pathway overlap that we found may be seen from a pessimistic or an optimistic perspective, depending in part on expectations. From the pessimistic perspective, the molecular processes may be completely different, and the C. elegans orthologs of the human health genes are involved in different processes as compared to the human health genes, and vice versa. From the optimistic perspective, it may just be that the number and scope of the investigations that yielded the health genes we studied are still insufficient, annotations are still incomplete, and considering only reciprocal best orthologs may be too restrictive. (We tried a less restrictive mapping of orthologs by relaxing the condition that orthologs must be reciprocal, but the overlap was still negligible; results not shown.) Nevertheless, future genetic studies are expected to yield more health genes in both species, and their characterizations are expected to improve. Moreover, when we analyze in detail the effects of intervention studies in C. elegans, we do find clear hints to some mechanisms that underlie healthspan also in human [98]. For example, changes in the Ins/IGF-1 pathway genes daf-2 and daf-16 are found to be associated with many of the features described in Supplementary Table 5, suggesting a fundamental role for immune defense mechanisms (and proliferation) in health maintenance, as described by Ermolaeva et al. [99].
Since C. elegans only exhibits an innate immune system and is missing the adaptive immune response, one could argue that the biological relevance of “immune response” in the C. elegans healthspan pathway map is negligible. However, the strict separation of the immune response into an innate and an adaptive system was questioned by Kvell et al. [100] and more recently by Penkov et al. [101], not least because of the discovery of the trained innate immune response [102]. Furthermore, the suitability of C. elegans as a model for the mammalian immune system and for pathogen response was summarized in several reviews [99, 103, 104]. Indeed, based on the expression of antifungal or antibacterial polypeptides in response to pathogenic stress, this nematode is used to find new antimicrobial drugs [105, 106]. Finally, it was demonstrated that immunosenescence, which is one of the most important healthspan parameters, affects the innate immune system in both organisms, nematodes [107–109] and humans [110].
Of course, the precise definition of phenotype is crucial. If the samples are not really about (lack of) health, in human or in C. elegans, then any subsequent molecular or bioinformatics analyses will compare apples and oranges and may thus fail. Therefore, it is important to use a good phenotyping of health in human as well as in C. elegans, and on this basis, to collect data as genome-wide as possible. For most of the age-related diseases that we use to define health in humans, there is no C. elegans counterpart. For example, as C. elegans has no heart, it cannot have any heart disease. In addition, the aging process that may underlie most of these age-related diseases is poorly characterized and hard to quantify in humans. Nonetheless, locomotion degrades with age in both species, due to changes at the muscle as well as the neural level. Two related features of physical function, that is, grip strength [111] and the ability to sit and rise from the floor [112], are good predictors of all-cause mortality in humans. Likewise, both in humans and in C. elegans, the ability to withstand various forms of stress decreases with age [113, 114]. Thus, at the level of organs or functional systems, both C. elegans and humans show age-related declines in performance that may well be due to underlying processes that are similar at the cellular and molecular level. Moreover, the investigation of healthspan in C. elegans already identified additional ageing-related genes, e.g. for EGF signaling, which is known for its connection to ageing in mammals [115, 116]. Interestingly, in C. elegans, the EGF-regulator HPA-2 was identified by analyzing locomotion but not lifespan [117], highlighting the usefulness of phenotyping assays distinct from lifespan. This is underlined by the observation that locomotion is impaired during ageing in mammals and C. elegans in a similar way [118].
Overall, we suggest that within the limitations of currently available data, the health genes we assembled, the healthspan pathways we constructed based on these, and the overlap we then found between species, are a first glimpse of the species-specific and cross-species molecular basis of health.