Mapping of the gene network that regulates glycan clock of ageing

Glycans are an essential structural component of immunoglobulin G (IgG) that modulate its structure and function. However, regulatory mechanisms behind this complex posttranslational modification are not well known. Previous genome-wide association studies (GWAS) identified 29 genomic regions involved in regulation of IgG glycosylation, but only a few were functionally validated. One of the key functional features of IgG glycosylation is the addition of galactose (galactosylation), a trait which was shown to be associated with ageing. We performed GWAS of IgG galactosylation (N=13,705) and identified 16 significantly associated loci, indicating that IgG galactosylation is regulated by a complex network of genes that extends beyond the galactosyltransferase enzyme that adds galactose to IgG glycans. Gene prioritization identified 37 candidate genes. Using a recently developed CRISPR/dCas9 system we manipulated gene expression of candidate genes in the in vitro IgG expression system. Upregulation of three genes, EEF1A1, MANBA and TNFRSF13B, changed the IgG glycome composition, which confirmed that these three genes are involved in IgG galactosylation in this in vitro expression system.


INTRODUCTION
Glycosylation is a posttranslational modification characterized by the attachment of oligosaccharide chains (glycans) to proteins or lipids [1].Glycans linked to immunoglobulin G (IgG) are essential regulators of effector functions of both native and therapeutic monoclonal antibodies.Alternative glycosylation affects the conformation of the Fc region changing the binding affinity of IgG to downstream immune effectors.Additionally, when deglycosylated, IgG antibodies completely lose their activity in the immune system [2,3].Glycosylated IgG can have pro-and antiinflammatory activity depending on the presence of specific glycan traits.This is exemplified by the removal of core fucose which modulates the affinity of IgG to FcγRIIIa receptor boosting the inflammatory potential by several folds [4,5].The presence of galactose (one (G1) or two (G2) galactose moieties) on IgG glycans also has an important role in the downstream immune responses mediated by IgG antibodies.Most of the new functional studies show that galactosylated IgG has increased complement and FcγR activity [6][7][8][9][10].van Osch et al. [8] even proposed a mechanism where IgG galactosylation promotes the hexamerization of IgG antibodies thus enhancing the activation of the classical complement pathway.However, dozens of population studies performed on thousands of individuals demonstrated that agalactosylated IgG (zero (G0) galactose moieties) associates with inflammation, both in disease but also in advanced age (for a review see [11,12] and references herein).The discovery that IgG galactosylation decreases with aging enabled the development of the glycan clock of ageing [13].It was proposed that agalactosylated IgG which increases with age acts as an effector of pro-inflammatory pathological changes and therefore can be exploited not only as a biomarker, but also functional effector of ageing [14,15].
At the population level, galactosylation is the most variable trait in the human IgG glycome [16,17], with agalactosylated glycans comprising 6%-50% of the total IgG glycome in healthy subjects [18].Galactosylation is also the most variable glycan trait between different strains of mice [19].Elevated levels of agalactosylated IgG glycans were first observed in rheumatoid arthritis [20], and followed by similar discoveries in various autoimmune conditions, such as systemic lupus erythematosus (SLE), inflammatory bowel disease, active spondyloarthropathy, autoimmune vasculitis and adult periodontal disease [21][22][23][24][25]. Changes in levels of galactosylated IgG also occur in infectious diseases such as COVID-19 [26].In addition, decreased IgG galactosylation levels were observed in cancer patients (for a review see [11] and references herein), potentially reflecting the defensive inflammatory response to cancer [27] and acute-phase response processes involved in cancer progression [28].However, the question remains whether the agalactosylated IgG is just a biomarker, or whether it can functionally contribute to disease activity.AGING Several studies have demonstrated that the regulation of IgG glycosylation is largely influenced by genetics.Up to 75% of the variance in some IgG glycan traits can be explained by the genetic component [29].The genetic determinants of populational variation in IgG glycosylation were explored by genome-wide association studies (GWAS) which identified at least 29 candidate genomic loci [30][31][32][33][34].However, the involvement of the specific genes and their potential interactions to generate differential IgG galactosylation are still poorly understood.Here, we conducted a GWAS of IgG galactosylation phenotypes in a study that almost doubles the sample size (N=13,705) compared to previous GWAS of IgG N-glycome [33] and focused on the genes with in silico evidence for involvement in the IgG galactosylation process.To assess their functional role in IgG galactosylation we applied our recently developed HEK293FS IgG transient expression system based on CRISPR-dCas9 molecular tools to functionally test whether the change in the expression of associated genes affects the IgG galactosylation levels [35].

RESULTS
To enable meta-analysis of GWAS of galactosylation phenotypes measured by different analytical platforms and thereby increase the total number of samples, we first developed a protocol for data harmonisation for glycan data generated using different analytical platforms (UPLC and LC-MS).The protocol was established based on the estimated correlation of the glycan traits derived in one cohort that had glycans measured using both UPLC and LC-MS platforms.The aim was to combine IgG subclass-specific glycan information obtained from LC-MS (i.e.galactosylation measured separately on IgG1-4 subclasses) in an appropriate manner to obtain information corresponding to the total IgG galactosylation measured by UPLC (analysis of glycans from both Fab and Fc fragments from all IgG subclasses).As described in Supplementary Methods, we compared different normalization and subclass information summary approaches.In a subset of individuals that were analysed using both methods, the Pearson correlation coefficients ranged from 0.95 and 0.97 for G0, 0.73 to 0.80 for G1, and 0.90 and 0.91 for G2 trait (Appendix Table 1).Different pre-processing approaches performed similarly, so we selected median quotient normalization due to previous recommendation [36], followed by the calculation of the derived traits without IgG subclass weighting in LC-MS data.

Discovery and replication GWAS
We performed a discovery GWAS of IgG galactosylation in seven cohorts of European descent (N=13,705).The association between HRC-imputed SNPs and three phenotypes (G0, G1 and G2) was studied under the assumption of an additive linear model.The inverse-variance meta-analysis of GWAS summary statistics resulted in 16 genome-wide significant (p ≤ 2.5 × 10 -8 ) genomic loci associated with at least one galactosylation trait (Table 1).Quantile-quantile plots of associations for three traits are shown in Supplementary Figure 1.
For twelve loci, the same trait-SNP association reached the significance threshold p-value ≤ 0.0031 (P ≤ 0.05/16 loci) in replication meta-analysis (N=7,775).The association between the top SNP in the HLA region (chr6:31107733-31164511) was not replicated due to the unavailability of the SNP and SNPs in LD in all four replication cohorts.However, the association of the HLA region and IgG glycan patterns was observed in the previous GWAS of IgG glycome [30,31,33,34].For top SNPs in three regions (chr6:74168723-74285118, chr4:103390496-103567348 and chr21:36564553-36665202) the association was not significant after correction for multiple testing in the replication meta-analysis, but the direction of the effect was consistent with the discovery analysis (Table 1).
Due to lack of compatibility in glycan trait definition in the current study and previous GWAS of IgG glycome [30][31][32][33][34], we checked for the overlap of the associated genomic regions across any previously examined IgG glycan traits.Thirteen loci overlap with previously identified and replicated genomic regions (Appendix Table 2).The remaining three associations in chr4:103390496-103567348, chr17:56404349-56418136 and chr6:74168723-74285118 are considered as novel.Among 16 loci, three loci were G2-specific (rs3822960, rs1808192 and rs2526377), two were G1-specific (rs10896045 and rs34562254) and two were G0-specific (rs13250010 and rs199516) (Figure 1A), which is intriguing since conversion of G0 to G1, and then G2 is a consequence of the activity of the single enzyme (product of B4GALT1 gene).Before functional evaluation of GWAS hits, careful inspection of the genomic regions was undertaken to prioritize and choose the genes with strong evidence for a potential role in a biological pathway.We applied a set of in silico methods to prioritise genes for functional analysis.First, we mapped associated variants to genes using mapping strategies based on position, overlap with eQTL associations and chromatin interaction as implemented in FUMA [37], and identified 107 candidate genes in total (Appendix Table 3).The number of genes prioritised by two or more approaches is shown in Figure 1B.Locus-chromosome:start-end position in GRCh37 (hg19) build; Top SNP-rsID identifier for the SNP with the strongest association with the glycan trait; Trait-trait with the lowest p-value for the top SNP in the genomic locus; EA-effect allele for which the effect is reported; OA-non-effect allele; EAF-effect allele frequency; Beta-effect estimate for the effect allele of the top SNP; SE-standard error of the effect estimate; P-p-value for the top SNP-trait association; Beta repl-effect estimate for the effect allele of the top SNP in replication analysis; SE repl-standard error of the effect estimate in the replication analysis; P repl-p-value for the top SNP in the replication analysis; Closest gene-gene found closest to the top SNP in the locus; Prioritized genes-symbol for genes prioritized in the locus.
The gene analysis in MAGMA is based on a multiple linear principal components regression model in which the statistic is computed by combining the p-values of the individual SNPs in a gene while taking into account their LD structure.In gene analysis, SNP data is aggregated to the whole gene level to test the joint association of all SNPs in the gene with the phenotype, thereby making it possible to detect effects consisting of multiple weaker SNP-phenotype associations that would otherwise be missed.The gene-based association test for G0, G1 and G2 identified 38 genes significant at FDR < 0.05 (Appendix Table 4).
Given that missense variants have a direct impact on protein function and therefore a more straightforward effect on the phenotype, we also assessed the functional consequences of associated variants using Variant Effect Predictor [38].For two genes, SLC38A10 and MAPT, variants with potentially pathogenic amino acid (aa) change were detected by both SIFT and Polyphen2 algorithms.For additional four genes, MANBA, SPPL2C, CRHR1 and TNFRSF13B, the variants were predicted to result in a benign aa change.
The details on position and aa changes are listed in Supplementary Table 1.AGING Finally, to assess the potential pleiotropic effects of the variants on IgG galactosylation and gene expression we performed colocalization analysis with the whole blood eQTLgen dataset [39].By comparing the regional association patterns with the approximate Bayesian Factor approach in coloc [40] we identified five genes (KIF3C, NFKB1, MIR-142, EEF1A1 and MTO1) whose expression patterns colocalize with IgG galactosylation (PP4 > 75%) (Supplementary Table 2 and Supplementary Figure 2).Previous prioritization efforts in GWAS of IgG glycosylation, as well as implications for gene's potential function in glycosylation process were taken into account   1C).Supplementary Table 1 contains details on prioritization evidence for each gene.

IgG galactosylation and diseases
Due to existing evidence of altered IgG glycosylation patterns in multiple conditions [11], we assessed the pleiotropy of the IgG galactosylation-associated loci with a range of autoimmune, inflammatory and infectious diseases.The threshold of 75% and higher was applied for the posterior probability for colocalization of traits (PP4).A shared association pattern between SLE and monogalactosylation (PP4 = 82%) was detected in a locus on chromosome 11 (chr11:65555524), while agalactosylation-associated locus chr17:43856639-44863413 colocalized with breast cancer (PP4 = 95%), COVID-19 hospitalization (PP4 = 93%), SLE (PP4 = 89%) and schizophrenia (PP4 = 84%) (Supplementary Table 3).Because of the extensive LD pattern, there are more than twelve candidate genes in the chromosome 17 locus.It is important to note that high PP4 indicates at least one shared causal variant and low PP4 does not mean that two phenotypes do not share one, especially if the probability of both phenotypes being associated with the region but having different causal variants (PP3) is high.

Polygenic score
A polygenic score (PGS) aggregates the effects of many genetic variants into a single number which predicts genetic predisposition for the phenotype.In the standard approach, a PGS is a linear combination of linear regression effect estimates and allele counts at singlenucleotide polymorphisms (SNPs).For each of the G0, G1 and G2 traits, we created two PGS models based on genome-wide association meta-analysis (GWAMA) results.Specifically, we derived the first model based on the SBayesR [41] method, and the second model using a clumping and thresholding (C+T) method.These models were tested within the CEDAR dataset, which has aggregated genotype data and galactosylation information on 162 participants of European ancestry and which was not used in the GWAMA.The estimated SNP-based heritability (h2SNP, SBayesR package [41]) was 36.5% for the G0, 27.2% for G1, and 36.8% for the G2 trait.Our best PGS models explained 7.4% (p-value: 4.79 x 10 -4 ), 2.2% (p-value: 0.059) and 7.4% (p-value: 4.77 x 10 -4 ) of G0, G1 and G2 variance in CEDAR samples.PRS-Phenotype scatter plots and plots showing the mean of trait per group can be found in Supplementary Figures 5-7.The complete results of the implementation of SBayesR and C+T models can be found in Appendix Table 5.

Functional validation of GWAS hits
The genes for functional follow-up were selected based on novelty in GWA studies for IgG galactosylation and by reviewing the evidence for prioritization, expression in B cells (Human Protein Atlas), as well as known roles of the encoded proteins.Newly discovered GWAS hits, EEF1A1 and MANBA-NFKB1, where NFKB1 and MANBA are found in the same locus, were selected and functional follow-up was performed to decipher which one is indeed involved in galactosylation.We also included HIVEP2 because it was previously assigned in the gene network as a transcription factor potentially involved in the regulation of B4GALT1 gene expression [33], and TNFRSF13B for its highly specific expression in B cells and function in humoral immunity.SLC38A10 was included as one of the genes prioritized in chr17:79158040-79268562 locus to test its potential function as a transporter protein in the IgG glycosylation pathway.We also included KIF3C as it was the prioritized gene in chr2:26109539-26149988 locus.
To functionally validate the potential role of the selected GWAS hits in IgG galactosylation, we used the newly developed transient expression system HEK293FS for IgG production with stably integrated dSaCas9-VPR or dSpCas9-KRAB expression cassettes for target gene upregulation or downregulation [35].Therefore, the candidate genes were manipulated in dCas9-VPR or dCas9-KRAB containing monoclonal cell lines transiently transfected with a plasmid carrying genes for IgG heavy and light chains and specific gRNA.Subsequently, secreted IgG was analysed for glycan phenotype.
Utilizing dCas9-VPR cell line and three gRNAs for each locus, we successfully upregulated all selected loci except SLC38A10 and HIVEP2 (Supplementary Figure 3A and Supplementary Table 4).Successful downregulation in the dCas9-KRAB cell line was accomplished using three gRNAs for all selected loci except NFKB1 and HIVEP2 (Supplementary Figure 3B).A significant change of IgG glycome composition was observed after dCas9-VPR targeting of HIVEP2, MANBA, TNFRSF13B and EEF1A1 (Figure 2 and Appendix Table 6).Interestingly, manipulation of the HIVEP2 promotor did not upregulate HIVEP2 transcript level sufficiently to reach statistical significance at transcript level, but it resulted in a significant increase of digalactosylated structures and a concomitant decrease of agalactosylated structures on IgG (Figure 2A).Upregulation of the MANBA locus resulted in a significant decrease of monogalactosylated IgG glycans, but no change in levels of agalactosylated and digalactosylated glycans was observed (Figure 2B).Targeting TNFRSF13B in dCas9-VPR cell line resulted AGING in almost 400-fold increase of its expression and a moderate increase in agalactosylated IgG glycans (Figure 2C).Upregulation of the EEF1A1 locus resulted in a significant decrease of monogalactosylated glycans and in a concomitant increase of agalactosylated IgG glycans (Figure 2D).

DISCUSSION
Using GWAS approach we identified 16 genomic regions associated with IgG galactosylation, which is the molecular basis of the glycan clock of ageing.Thirteen out of 16 loci were associated with traits related to IgG glycosylation in our previous GWAS studies, while three were novel [30][31][32][33][34]. Contrary to previous studies, in which we were looking for genes that were associated with any aspect of IgG glycosylation, in this study we focused on galactosylation, a functionally well-defined element of IgG glycosylation that is the basis of increased inflammation in people with accelerated glycan ageing.This was done by converting percentages of individual glycans in the glycome into sums of glycan structures containing either one, two or no galactose residues.This enabled us to combine GWA studies for which the phenotype analysis was performed using different analytical methods.We also estimated the SNP-based heritability to be 36.5%,27.2% and 36.8%, for G0, G1 and G2, respectively, an estimate that very closely replicates recent heritability estimate for biological age estimate based on these glycan structures [42].Upon prioritization efforts which resulted in 37 candidate genes, we evaluated functional relevance of these genes using HEK293FS transient expression system for IgG production with stably integrated dCas9-KRAB or dCas9-VPR fusion.Results of this study expanded the list of genes for which the functional relevance for in vitro expression and glycosylation of IgG was confirmed with MANBA, TNFRSF13B and EEF1A1 (Figure 2).Thus, out of 13 replicated GWAS hits, for six of them we have functionally confirmed that they act at IgG glycosylation in a way that is conserved between plasma cells and our artificial in vitro expression system in HEK293FS cells (Figure 3).It is likely that some additional GWAS hits may be plasma cell specific and will be confirmed once an efficient system for gene manipulation and IgG production in B cells is developed.However, it is important to note that IgG glycosylation can be affected at multiple other levels (e.g.B-cell selection, maturation, expansion, etc.), thus the absence of functional confirmation of other loci in this assay does not mean that they are not implicated in IgG glycosylation in some other, more complex, manner.
The total heritability of G0, G1, G2 is not known, however, the heritability of individual IgG glycan peaks ranges between 27% and 76% [29].More than half of the studied glycan traits have additive genetic component estimate of 50% which would imply that half of the variance in their values could be explained by common variants, an observation similar to other complex traits [29].The constructed PGS models explained 7.4% of the variance in G0 and G2 traits in an out-of-sample testing, thus capturing about 20% of the estimated SNP heritability.However, for G1 the PGS model explained a minor proportion of SNP heritability.The explanation why for G1, that had comparable SNP heritability, the proportion of variance explained was much less may lie in a flatter distribution and hence individually smaller genetic effects onto G1; and observation which may be described as "higher polygenicity".Indeed, while for G0 and G2 lead associations explained about 1.5% of variance, for G1 the lead variant near TNFRSF13B explained only about 0.5% of its variation.With smaller individual effects, one needs larger discovery GWAS to construct powerful PGS.
Given that the majority of identified candidate SNPs are located in non-coding regions, we applied several in silico approaches to prioritize potential causal genes in associated genomic regions, including exploring functional consequences of associated variants, pleiotropy with gene expression, gene-based association analysis, as well as evidence from previous GWA studies and prior biological knowledge of gene function.Among 37 candidate genes from 16 associated loci, for the two genes, RUNX3 and B4GALT1, the effect on IgG glycosylation has recently been demonstrated using direct gene expression modulation in HEK293FS IgG transient expression system [35,43].
Due to existing evidence of altered IgG glycosylation patterns in multiple conditions [11], we performed colocalization analysis with a range of diseases and observed a high probability for pleiotropic effects in the AP5B1/OVOL1 locus on chromosome 11 for SLE and monogalactosylation.Interestingly, for the same locus, we observed considerable probability for colocalization (PP4=0.56)with asthma and allergy.Although this is below our pre-defined posterior probability for colocalization (PP4) threshold of PP4 > 0.75, the finding is in line with previous results linking this locus with variation of specific digalactosylated structures (FA2FG2S1), expression of the OVOL1 gene and the risk of asthma [34].OVOL1 gene encodes a putative zinc finger containing transcription factor with a role in the development and differentiation of epithelial and germ cells [44].The locus on chromosome 17 was pleiotropic for agalactosylation and SLE, breast cancer, COVID-19 hospitalization and schizophrenia.The colocalized region on chromosome 17 (chr17:43856639-44863413) contains at least twelve genes and is known for numerous copy number variants.It also includes a megabase-long inversion polymorphisms (H1 and inverted H2 forms) which contain partial duplication of KANSL1 gene with inverted H2 isoform having a higher frequency than H1 isoform in the European population [45].The complexity of the locus makes it challenging to prioritise a specific gene without further investigation.
The functional follow-up was based on the previously developed HEK293FS transient expression system for IgG production which was shown to be an excellent tool for studying the role of the genes identified in glycosylation GWAS [35].Despite the fact that we could not demonstrate upregulation of HIVEP2 mRNA using dCas9-VPR, a significant increase in digalactosylated structures with a concomitant decrease in agalactosylated structures was measured from IgG secreted from the same cells.HIVEP2 (HIVEP zinc protein 2) encodes a transcription factor that interacts with numerous viral and cellular promoters, and has an established role in immunity [46,47].Even though we prioritised HIVEP2, the locus interacts with variants in six genes via chromatin interactions (Supplementary Figure 4), including FUCA2 (alpha-L-fucosidase) which is located 0.6 megabases away.It is plausible that dCas9-VPR targeting of the HIVEP2 promoter disrupted these chromatin interactions which in turn altered the expression of other genes mapping to this region.In the case of the locus on chromosome 4, the functional study was utilized as one of the steps to decipher which of the two genes might influence galactosylation process as both NFKB1 and MANBA were prioritised with in silico approaches.The MANBA/NFKB1 region was identified in previous studies as a disease susceptibility locus for primary biliary cholangitis (PBC) [48] and a recent study found a decreased level of IgG galactosylation in PBC patients [49].The NFKB1 gene encodes a subunit of the nuclear factor of kappa light polypeptide gene enhancer in B-cells (NF-κB) transcription factor family, known for its critical role in cell survival and inflammation [50].The upregulation of NFKB1 using dCas9-VPR resulted in a significant change in transcript level, but no change in IgG glycoprofile was observed.On the other hand, the upregulation of the MANBA gene expression resulted in a decrease of IgG monogalactosylated species.The MANBA gene encodes beta mannosidase which is an exoglycosidase enzyme cleaving beta-mannose residues from the non-reducing end of N-linked glycans [51].As such, it has a known role in glycosylation process but currently we cannot speculate about the exact mechanism by which it affects IgG galactosylation.Interestingly, mutations in MANBA are known to affect kidney function [52], blood pressure [53] and cardiovascular disease [54], while the monogalactosylated IgG glycan (G1) (that MANBA affects in our in vitro expression system) was previously identified as the best predictor of future cardio vascular events in women [55,56].While at the moment we do not know putative mechanisms that could explain the link between MANBA, G0 and cardiovascular or renal diseases, these intriguing associations warrant future studies.
The EEF1A1 (Eukaryotic Translation Elongation Factor 1 Alpha 1) gene encodes for an isomer of an alpha subunit of the eukaryotic translation elongation factor-1 complex that has a role in binding of aminoacyl tRNA to the A site on ribosomes during protein synthesis [57].Together with EEF1A2, it is the second most abundant protein (1%-3%) in the cell with other roles outside of protein synthesis, including protein degradation, apoptosis modulation, oncogenesis and viral pathogenesis [58,59].Previous study has shown that EEF1A1 also serves as a signal transducer during inflammation, where it can enhance interleukin 6 expression through STAT3 and PKCδ [60].Interleukin 6 is implicated in different autoimmune and inflammatory diseases [61] and its high levels lead to decreased IgG galactosylation [62].By targeting dCas9-VPR to EEF1A1, we increased its expression level, which resulted in a decrease of galactosylated IgG glycans.Therefore, our results are concordant with the previous findings, suggesting that EEF1A1 upregulation could lead to pro-inflammatory modulation of IgG by decreasing its galactosylation.
A locus harbouring TNF receptor superfamily member 13B (TNFRSF13B) gene was previously identified in the multivariate IgG glycosylation GWAS [34], while here it was associated with IgG monogalactosylation.TNFRSF13B encodes a transmembrane activator calcium modulator and cyclophilin ligand interactor (TACI) protein, a lymphocyte-specific member of the tumour necrosis factor (TNF) receptor superfamily, which has a role in signalling pathway leading to B cell differentiation and antibody production [63,64].Common genetic variation in TNFRSF13B locus was previously associated with different levels of immunoglobulins in serum [65], while rare deleterious variants were implicated in common variable immunodeficiencies and IgA deficiencies [66,67].Therefore, we speculate that TNFRSF13B might impact glycosylation profile of the sample by controlling secretion of certain IgG glycoforms.Despite extensive increase in TNFRSF13B expression upon activation with dCas9-VPR, we observed only a moderate increase in IgG agalactosylation and no changes in mono-or digalactosylation.Our experimental setting might be limited by the B-cell specific function of TNFRSF13B and thus, we might not be able to observe changes comparable to changes in the gene expression level.
The role of four other prioritized genes, NFKB1, KIF3C, HIVEP2 and SLC38A10, could not be validated in our HEK293FS transient expression system.Even though this cell system represents a good model for studies of IgG glycosylation regulation, it has some limitations.HEK293FS cells do not inherently secrete immunoglobulins and their epigenetic context is not equivalent to plasma and B cells.In addition, alternative galactosylation might depend on gene expression in other cell types and their interactions, hence we might not observe the natural effects after manipulation of gene expression in this model system, due to the lack of the right biological context.
In conclusion, in the present study we mapped 16 genetic loci regulating glycan structures that are not only the basis of the glycan clock of ageing, but also functional effectors of inflammation at multiple levels.Furthermore, we were able to confirm the functional role of three genes (MANBA, TNFRSF13B and EEF1A1) in the IgG galactosylation pathway by use of the HEK293FS transient expression system with stably integrated dCas9 fusion designed for this purpose.Figure 3 summarizes all GWAS hits with previously unknown link to IgG glycosylation, whose functional relevance has been shown using this cell system.Further research is needed to fully elucidate functional mechanism behind their role in ageing and to reveal the complete network of gene interactions regulating the complex process of IgG glycosylation.However, this study is an important step in this direction since it clearly indicates that regulation of IgG galactosylation extends far beyond the expression of the galactosyltransferase gene that adds galactose.

MATERIALS AND METHODS
Information about sample numbers, sex and age for each cohort is provided in Appendix Table 7.Further information on genotyping, genotype imputation and quality control can be found in Appendix Table 8.

Glycan measurements
An overview of cohorts and technologies used for IgG glycome measurement (LC-MS or UPLC), as well as the published studies where IgG glycome analysis for the corresponding cohort was described in more detail, are shown in Appendix Table 7.
Briefly, for UPLC-based analysis, IgG was isolated from blood plasma samples using protein G plates, followed by neutralization and denaturation of the protein.N-glycans were enzymatically released from IgG and fluorescently labelled with 2-aminobenzamide dye, followed by separation and quantification by hydrophilic interaction UPLC which resulted in 24 peaks (GP1-GP24), most of which represent a single glycan structure.In cases where multiple glycan structures are found under one peak, the most abundant structure is considered.The list of all glycans and their proportion under each peak can be found in Pučić et al. [18].
For LC-MS-based glycan analysis, IgG was isolated from blood plasma samples by affinity chromatography binding to protein G plates and treated with trypsin to allow cleavage of IgG at specific amino acid sites.The cleavage resulted in different glycopeptides across IgG subclasses, thereby enabling subclass-specific glycan measurements.IgG subclass separation and detection were performed using a nano-LC system coupled with quadrupole-TOF-MS.IgG2 and IgG3 subclasses have the same tryptic glycopeptide moieties, thus the separation of the subclass-specific glycopeptides was not possible.LC-MS quantification resulted in 50 values which correspond to 20 glycans measured on IgG1, 20 glycans on IgG2/3 and 10 glycans on IgG4.The list of glycan structures measured by UPLC and LC-MS along with their description is listed in Appendix Tables 9, 10.

Glycan data pre-processing
Prior to genetic analysis, the best approach for harmonization of glycan measurements by UPLC and LC-MS was chosen (Supplementary Methods) to allow for meta-analysis of GWAS summary statistics from all cohorts.To reduce the impact of experimental variation on the downstream analysis, glycan measurements were normalized using median quotient normalization and batch corrected using empirical Bayes method [68] implemented in ComBat function of the "sva" package [69] in statistical software R [70].Median quotient normalization was applied due to negligible differences between different normalization approaches in the comparison and further supported by previous recommendation [36].Next, three derived traits were calculated as the percentage of structures in the total IgG glycome containing zero, one or two galactose units as G0, G1 and G2, respectively (Appendix Table 11).In this way, the subclass-specific glycan values from LC-MS measurements were summarized to allow for comparison with total glycan data obtained with UPLC.The glycan data was preprocessed centrally in Genos for all cohorts except Leiden Longevity Study (LLS), for which the glycan data was pre-processed by the phenotype provider from the Leiden University Medical Center.

Genome-wide association study
Discovery genome-wide association study was performed in seven cohorts of European descent (CROATIA-Vis, CROATIA-Korcula, CROATIA-Split, TwinsUK, EPIC, VIKING, ORCADES) with a total sample size of 13,705.Quantile normalization was applied to the derived traits in each cohort.For CROATIA, VIKING and ORCADES cohorts, first linear mixed model was fit to adjust for age, sex and cohort-specific covariates while also accounting for the genomic kinship.This step was performed using polygenic function in GenABEL R package [71], followed by testing the association between SNPs and phenotype using the RegScan software v0.5 [72].For TwinsUK cohort, the derived traits were adjusted for age and sex and the association analysis was carried out using GEMMA [73] including kinship matrix to correct for family structure.SNPTEST software was used for association testing of age-and sex-adjusted glycan values with genetic variants in EPIC cohort.GWAS was performed on Haplotype Reference Consortium (HRC) [74] imputed genotypes, assuming an additive linear model of association.Quality control of studyspecific summary statistics was performed prior to meta-analysis using EasyQC R package as described in Winkler et al. [75].
After quality control, the GWAS summary statistics for seven cohorts were pooled using the inverse-variance weighted method for meta-analysis as implemented in METAL software [76].METAL software also allowed for the estimation of genomic control (GC) to correct test statistics to account for population stratification in the studies included in the meta-analysis (lambda GC median =1.00; range 0.98-1.01).Given that we performed GWAS for three correlated traits, we used 5 x 10 -8 /2= 2.5 × 10 -8 as a genome-wide significance threshold, where 2 denotes the number of PCs that explain 99% of the variation in the three galactosylation traits.

Replication analysis
Following the discovery GWAS, we performed replication meta-analysis to validate results from the discovery phase.Four independent cohorts of European descent were used: KORA F4, LLS, EGCUT and GCKD, with a total sample size of 7,775.Glycan measurements for KORA F4 and LLS cohort were quantified with LC-MS, while EGCUT and GCKD glycan measurements were obtained with UPLC.The strongest glycan trait-SNP pair from each locus in the discovery analysis was meta-analysed using the fixedeffect inverse variance method.The significance threshold was set to p-value < 0.05/16 = 0.0031, where 16 is the number of associated genomic regions.We also checked the consistency of effect direction between the discovery and replication analyses in cases where the top SNP did not formally replicate.

Genomic loci definition
Genomic loci associated with galactosylation phenotypes were defined using FUMA v. 1.3.6b[37].SNP2GENE function in FUMA first identifies independent (LD r 2 < 0.6) significant SNPs with MAF > 0.01 to determine borders of genomic locus.LD estimates were inferred from reference panel derived from 10,000 subjects of European descent in UK Biobank [77].The maximum distance of 250kb was used for merging LD blocks into a single genomic locus.SNPs in LD (r 2 > 0.6) with independent SNPs AGING within 250 kb distance were selected as associated SNPs and considered in further analysis.

Gene prioritization
Since we aimed to demonstrate in vitro the functional activity of the proteins encoded by genes potentially causal for IgG galactosylation, we first employed multiple in silico strategies to prioritise candidate genes.Briefly, we used the following criteria: gene mapping (prioritising based on genomic position, overlap with eQTLs and chromatin interactions), functional consequence-based prioritisation (genes whose missense variants were associated with galactosylation), colocalizetion of SNP associations with gene expression in whole blood and with galactosylation, and gene-based analysis.

Gene mapping
Genes in the significantly associated genomic regions were mapped using three approaches implemented in FUMA: positional mapping, eQTL mapping and chromatin interaction mapping.Genes were positionally mapped based on ANNOVAR [78] annotations and the maximum distance of <10 kb between associated variants and genes.The eQTL mapping was based on overlap of galactosylation-associated variants and eQTLs in B and T cells, including Database of Immune Cell Expression (DICE) [79], Fairfax et al. [80] and CEDAR [81] datasets.Only significant eQTL signals at FDR < 0.05 were considered.Chromatin interaction mapping was performed using the Hi-C data derived from B cell line (GM12878) [82] and the suggested FDR value < 1 × 10 -6 for significant chromatin interaction was used.

Functional consequences
To assess potential functional consequences of galactosylation associated variants we used SIFT and Polyphen-2 algorithms, as implemented in the Variant Effect Predictor (VEP) v97 by Ensembl [38].

Colocalization with gene expression in whole blood
The colocalization of galactosylation GWAS signals and gene expression was estimated using Approximate Bayes Factor (ABF) method [83], as implemented in the coloc package [40] in R. The method outputs five posterior probabilities, one for each of the five hypotheses: H0) no association with either of the two traits, H1) association with trait 1, but not with trait 2, H2) association with trait 2, but not with trait 1, H3) association with both trait 1 and trait 2, but two independent causal variants and H4) association with both trait 1 and trait 2 and one shared causal variant.Whole blood eQTL data was obtained from the publicly available eQTLgen dataset which was derived from 31,684 individuals across 37 cohorts [39].Tissue-or cell-specific eQTL data (e.g., B cell or T cell) were not used due to low number of SNPs in the associated regions which overlap in galactosylation GWAS summary statistics and tissue-specific eQTL data to allow for reliable colocalization test.The posterior probabilities for colocalization with cis-eQTLs were computed for each gene found in galactosylationassociated genomic loci using SNP p-values and MAF.The default values for prior probabilities were used (p1 = 1 × 10 -4 , p2 = 1 × 10 -4 and p12 = 1 × 10 -5 ).The threshold of 75% for PP4 (probability of the same shared variant for two traits) indicated positive colocalization and strong support for prioritization of the gene in the given genomic locus.

Gene-based association test
Genome-wide gene-based association test for all three traits was performed by MAGMA v1.08 [84] to evaluate the joint effects of variants in 18,655 proteincoding genes while accounting for LD between those variants.In such test, SNP data is aggregated to the whole gene level to test the joint association of SNPs in the gene with the phenotype to allow detection of effects comprising of multiple weaker SNP-phenotype associations that would potentially be missed.MAGMA uses p-values derived from variant-based analyses as input and implements the SNP-wise mean model which derives a mean χ 2 statistic for SNPs in the gene and a pvalue by using a known approximation of sampling distribution.Only genes significant at FDR < 0.05 were considered.

Colocalization with diseases and traits
The ABF colocalization method was used to explore the pleiotropy between IgG galactosylation and complex diseases for which there is previous evidence of aberrant IgG glycosylation [11] and traits with shared associated variants in the GWAS Catalog, as obtained from GWAS Catalog query (accessed in June 2021).The full list of traits and links for summary statistics download is available in Supplementary Table 5.The default values for prior probabilities were used (p1 = 1 × 10 -4 , p2 = 1 × 10 -4 and p12 = 1 × 10 -5 ).The posterior probabilities by ABF were computed using beta and standard error values if available in the dataset, otherwise, SNP p-values and MAF were used.The threshold of 75% for PP4 (probability of the shared underlying causal variant for two traits) was used for colocalization and evidence of high confidence for pleiotropy between IgG galactosylation and disease or trait.AGING

Polygenic score
SbayesR method reweights the effect of each variant according to the marginal estimate of its effect size, statistical strength of association, the degree of correlation between the variant and other variants nearby, and tuning parameters.This method requires a compatible LD matrix file computed using individuallevel data from a reference population.For these analyses, we used publicly available shrunk sparse GCTB LD matrix including 2.8 million pruned common variants from the full UK Biobank (UKB) European ancestry (n ≈ 450,000) data set and computed from a random set of 50,000 individuals of European ancestry from the UKB data set [41,77].SbayesR was run for each chromosome separately, and with the default parameters except for the number of iterations (N=3000) and p-value (0.9) (Appendix Table 12).
Clumping and thresholding model was built using a pvalue and linkage disequilibrium-driven clumping procedure in PLINK version 1.90b (--clump) [85].In brief, the algorithm forms clumps around SNPs with association p-values less than a provided threshold (p-value=5 x 10 -8 ).Each clump contains all SNPs within 250 kilobases of the index SNP that are also in linkage disequilibrium with the index SNP as determined by a provided pairwise correlation (r 2 =0.2) threshold in the linkage disequilibrium reference.The algorithm iteratively cycles through all index SNPs, beginning with the smallest p-value, only allowing each SNP to appear in one clump.The final output should contain the most significantly disease-associated SNP for each linkage disequilibrium-based clump across the genome.The prediction accuracy was defined as the proportion of the variance of a phenotype that is explained by PGS values (R 2 ).To calculate PGS based on the PGS model we used PLINK2 software, where PGS values were calculated as a weighted sum of allele counts.Out-of-sample prediction accuracy was evaluated using samples from the CEDAR cohort that was not used for discovery or replication.

Plasmid constructs for targeted upregulation/ downregulation of prioritized GWAS hits
Seven genes strongly associated with IgG glycosylation were selected for the follow-up functional validation.A newly developed transient expression system with stably integrated CRISPR/dCas9 fusions dSaCas9-VPR (for gene expression upregulation) and dSpCas9-KRAB (for gene expression downregulation) [35] was used for in vitro validation.Three guide RNAs (gRNAs) were designed using the CRISPick online tool (Broad Institute) for adequate dCas9 orthologue (dSaCas9 or dSpCas9) and each selected locus (KIF3C, NFKB1, MANBA, SLC38A10, TNFRSF13B, EEF1A1 and HIVEP2).Specific gRNAs were assembled with genes coding for IgG light and heavy chains, CBh promoter and bGH terminator using Golden Gate enzymes in three steps as described in Mijakovac et al. [35].Non-targeting gRNAs were assembled in the same manner.Plasmids pORF-hp21, pORF-hp27 (Invivogen, San Diego, CA, USA) and p3SVLT were used for enhanced IgG production [86].Specific gRNA sequences targeting each gene are shown in Appendix Table 13.

Transfection of monoclonal dCas9-VPR/dCas9-KRAB HEK293FS cell lines
The main approach was to use suspension cell lines with integrated expression control machinery based on dCas9 fusions with IgG-producing plasmid bearing gRNA expression cassettes, as described in Mijakovac et al. [35].Briefly, suspension-adapted monoclonal cell lines with stably integrated dSaCas9-VPR (dCas9-VPR) or dSpCas9-KRAB (dCas9-KRAB) were grown until they reached the appropriate density for cell transfection with gRNA-IgG bearing plasmids as well as plasmids used for enhanced protein production in mass ratio gRNA and IgG chain bearing plasmid, p3SVLT, pORF-hp21 (Invivogen, San Diego, CA, USA) and pORF-hp27 (Invivogen, San Diego, CA, USA) 0.69/0.01/0.05/0.25.Transfections were performed with 293fectin transfection reagent diluted in Opti-MEM I Reduced Serum Medium (Thermo Fisher Scientific, Waltham, MA, USA).Five days following transfection, cells were centrifugated and used for gene expression analysis, while supernatant was used for IgG isolation and glycan analysis.

Reverse transcription and quantitative real-time PCR (qPCR)
Total RNA was isolated from cell pellets with Rneasy Mini Kit (Qiagen, Hilden, Germany) and 50 ng of RNA was converted to cDNA using the PrimeScript Rtase (TaKaRa, Kusatsu, Japan) and random hexamer primers (Invitrogen, Waltham, MA, USA).All samples were treated with Turbo Dnase (Invitrogen, Waltham, MA, USA) to remove any remaining DNA.Gene transcripts were detected with SYBR Green Gene Expression Assay using the 7500 Fast Real-Time PCR System.Primer sequences are listed in Appendix Table 14.The mean value of nine replicates was normalized to HPRT1 expression which was used as an endogenous control.Fold changes in transcript expression compared to non-targeting control were analysed using the ΔΔCt method [87].

Glycan measurements
IgG glycan measurements were done as described previously [35] with a few modifications described below.Briefly, this method consisted of five main steps: 1) IgG isolation from HEK293FS transient expression system cell culture supernatants using Protein G Agarose fast flow beads (Merck, Darmstadt, Germany); 2) Enzymatic release of N-glycans from isolated IgG using PNGase F (Promega, Madison, WI, USA); 3) fluorescent labelling of released IgG glycans with procainamide hydrochloride (Thermo Fisher Scientific, Waltham, MA, USA); 4) clean-up of labelled IgG glycans using hydrophilic interaction liquid chromatography solid-phase extraction (HILIC-SPE) on a 0.2 μm Supor filter plate (Pall Corporation, Port Washington, NY, USA) and 5) separation of fluorescently labelled glycans by hydrophilic interaction chromatography on a Waters Acquity UPLC instrument (Waters, Milford, MA, USA).The IgG isolation step was modified from that previously described in the way that all washing steps after incubation of samples with Protein G beads as well as elution of IgG were done on an Orochem filter plate (Orochem Technologies Inc., Naperville, IL, USA).The obtained UPLC chromatograms were all separated in the same manner into 24 peaks and the amount of glycans in each peak was expressed as a percentage of total integrated area.

Statistical analysis
Statistical analysis of RT-QPCR and IgG glycan data was performed using R [70] and GraphPad (GraphPad Software, San Diego, CA, USA).RT-QPCR was done in two technical replicates and biological replicates were pooled from independent experiments.Group differences were determined using a two-tailed t-test on ΔΔCt values.IgG glycans were measured as a percentage of area under chromatogram peaks (Supplementary Table 6).Derived glycan traits (G0, G1 and G2) were calculated as a sum of glycan peaks containing zero, one or two galactose units.Initial round of experiments involved both up-and down-regulation of the genes and was used as pilot study to determine whether the expression of genes changed and if there were significant changes in glycan profile by applying Student's T-test.Five genes (HIVEP2, MANBA, EEF1A1, TNFRSF13B and NFKB1) were selected and the experiments for gene expression upregulation were repeated.To combine results of testing differences in control and treated samples by Student's T-test across multiple experimental runs, meta-analysis approach was applied using metafor R package [88].

Data and materials availability
The full summary statistics for discovery meta-analysis for G0, G1 and G2 traits is deposited at Zenodo (https://www.doi.org/10.5281/zenodo.7227999).The summary statistics for gene expression levels in different tissues and cell types are available in the eQTL catalogue project, DICE project and eQTLGen consortium.The summary statistics for other complex traits are available in publicly available resources, as listed in Supplementary Table 5.Additional data related to this paper may be requested from the authors.

AUTHOR CONTRIBUTIONS
AFH-data analysis and interpretation, visualisation, writing-original draft preparation, writing-review and editing.KM-functional study experiments, data analysis and interpretation, visualisation, writingoriginal draft preparation, writing-review and editing.AM-functional study experiments, data analysis and interpretation, visualisation, writing-original draft preparation, writing-review and editing.AN-PGS calculation, visualisation, writing-original draft preparation, writing-review and editing.SZS-PGS calculation, data analysis.AL-data analysis, THperforming GWAS in EGCUT cohort; EA-performing GWAS in LLS cohort.SS-performing GWAS in KORA F4 cohort.RRCC-performing GWAS in EPIC cohort.MM-performing GWAS in EPIC cohort.YLperforming GWAS in GCKD cohort.TK-IgG glycan quantification.NR-IgG glycan quantification.TŠ-IgG glycan quantification, writing-review and editing.MPB-IgG glycan quantification, writing-review and editing.ITA-IgG glycan quantification, writing-review and editing.IG-IgG glycan quantification.JŠ-IgG glycan quantification.TP-IgG glycan quantification.BR-IgG glycan quantification.PT-IgG glycan quantification.KF-genomic and demographic data provider for EGCUT cohort.MB-genomic and demographic data provider for LLS cohort.MWgenomic and demographic data provider for LLS cohort, writing-review and editing.CG-genomic and demographic data provider for KORA F4 cohort.MBSgenomic and demographic data provider for EPIC cohort, writing-review and editing.CW-genomic and demographic data provider for EPIC cohort, writingreview and editing.OP-genomic and demographic data provider for CROATIA cohorts.CH-genomic and demographic data provider for ORCADES cohort.JFW-genomic and demographic data provider for VIKING cohort, writing-review and editing.TDSgenomic and demographic data provider for TwinsUK cohort.AK-genomic and demographic data provider for GCKD cohort, writing-review and editing.FV-IgG glycan data quality control, supervision, YSA-PGS AGING calculation supervision, writing-original draft preparation, writing-review and editing.AV-data analysis, visualization, supervision of functional study.JK-IgG glycan quantification in functional study, IgG glycan quantification in cohorts, supervision, writingoriginal draft preparation, writing-review and editing.LK-conceptualisation, supervision, data interpretation, writing-original draft preparation, writing-review and editing, VZ-supervision for the functional study, data interpretation, writing-original draft preparation, writing-review and editing., GL-conceptualisation, supervision, data interpretation, glycan data provider, writing-original draft preparation, writing-review and editing.

CONFLICTS OF INTEREST
YSA is a cofounder and a co-owner of PolyOmica and PolyKnomics, private organizations providing research services in the field of quantitative, computational and statistical genomics.GL is the founder and CEO of Genos Ltd, a private research organization that specializes in high throughput glycomics analysis and has several patents in this field.AFH, TŠ, MPB, ITA, IG, JŠ, TP, BR, PT, FV and JK are employees of Genos Ltd.The remaining authors declare no Conflicts of interest.

ETHICAL STATEMENT AND CONSENT
The participating cohorts were approved by local research ethics committees for academic and commercial use of the study.The informed consent was obtained from the participants of the study.

Studied cohorts
Recruitment of participants, sample collection, genotyping and phenotyping in the cohorts used in the galactosylation GWAS was performed by staff members at the University of Zagreb, Croatia, University of Split Medical School, Croatia, UK, United Kingdom, King's College London, UK, German Institute of Human Nutrition, Germany, University of Tartu, Estonia, Leiden University Medical Centre (LUMC), Netherlands, Helmholtz Zentrum München -German Research Center for Environmental Health, Germany, and the nine study centers of the German Chronic Kidney Disease (GCKD) study.

TwinsUK
TwinsUK is a national registry of 12,000 volunteer twins in the UK.The cohort consists of 83% female subjects with a nearly equal number of monozygotic (51%) and dizygotic (49%) twin pairs.With an aim to study the genetics of healthy ageing and complex diseases, a sample of 7000 twins was assessed for a range of clinical, biochemical, behavioural and socioeconomic characteristics.Moreover, several omics' datasets for the TwinsUK dataset are available including genome-wide SNP data.The study participants provided informed consent and ethical approval was obtained for academic and commercial use of the study [1].

The European Prospective Investigation into Cancer and Nutrition
The European Prospective Investigation into Cancer and Nutrition (EPIC) -Potsdam is a prospective cohort study that includes 27548 participants recruited from the Potsdam population in Germany from 1994 to 1998 [2].Participants' age at recruitment ranges between 35 and 65, and the number of male and female subjects is 10904 and 16644, respectively.Initial data collection consisted of anthropometric measurements and blood sample collection used for omics' data derivation [3].Ethical approval was obtained from the ethics committee in Germany and all participants provided informed consent [2].

CROATIA Vis, CROATIA-Split and CROATIA-Korcula
CROATIA-Vis, CROATIA-Korcula and CROATIA-Split cohorts were collected as part of the "10001 Dalmatians" study, a study of Croatian island isolates with participants from six Adriatic islands (Korčula, Vis, Lastovo, Susak, Rab, Mljet) and the city of Split.The aim of the study is to investigate genetic and environmental determinants in health and disease by using the advantage of genetically isolated populations.In the recruitment process, a total of 1008 participants aged 19-93 were recruited for the CROATIA-Vis cohort in villages of Vis and Komiza during 2003 and 2004, 1012 subjects aged 18-85 were recruited in 2009-2010 in the city of Split for CROATIA-Split cohort and data on CROATIA-Korcula subjects (aged 18-98) was collected from the island of Korcula, specifically from the town of Korcula and three villages including Lumbarda, Zrnovo and Racisce.Participants were assessed for a number of anthropometric and physiological measurements, and they donated overnight fasting blood samples which were later used for DNA analysis, biochemical measurements and molecular marker assessment [4].The study participants provided signed informed consent.Ethical approval was obtained for each cohort from ethics committees in Croatia and Scotland.

The Orkney Complex Disease Study
The Orkney Complex Disease Study (ORCADES) is a family-based cohort collected with the goal of identifying genetic risk factors in complex diseases in the population of isolated Orkney Island in northern Scotland.The recruitment started in 2005 and lasted for six years during which 2080 participants were recruited.The subjects were included if they had at least two of their grandparents who were Orcadian.The initial data collection included cardiovascular measurements and fasting blood sample collection, followed by subsequent visits for cognitive function assessment, eye measurements and DEXA scans.The study was approved by ethics committees in Scotland and all participants gave informed consent [5].

Leiden Longevity Study
Leiden Longevity Study (LLS) is a family-based cohort from the Dutch population which was intended for studies of human longevity.Nonagenarian siblings (individuals having a sibling older than 89 years for men and 91 years for women) and their offspring and offspring's spouses (serve as controls) were included in the study if they were European.Initial recruitment AGING started in 2002 and ended in 2006 during which blood samples were collected for assessment of plasma parameters and genetic material extraction.A total of 3359 subjects were included: 944 long-lived proband siblings, 1671 offspring and 744 controls (offspring's spouses).This study was approved by the ethics committee of LUMC, Netherlands.Signed informed consent was given by all participants [6].In the current study, a subset of 1190 participants including only offspring and their spouses is used.

The Cooperative Health Research in the Augsburg Region F4
The Cooperative Health Research in The Augsburg Region (KORA) F4 is a population-based study that was conducted between 2006 and 2008 as a follow-up of the KORA S4 study [7].Participants were randomly selected from the population registry in the Augsburg region and two other neighbouring counties in Germany.The data collection included standard medical and physical examinations.A total of 3080 participants (51% females) aged 32-86 years were recruited [8].Ethical approval for the study was obtained from the Ethics committee of Bavarian Chamber of Physicians, Germany.All participants provided signed informed consent before entering the study.

The Viking Health Study -Shetland
The Viking Health Study -Shetland (VIKING) is an epidemiologic study initiated to explore genetic risk factors for complex diseases.The cohort consists of individuals from an isolated population of Shetland in northern Scotland and the main criteria for recruitment was to have at least two grandparents from Shetland.Between 2013 and 2015, 2105 participants were recruited.A large number of distant relatives makes the VIKING cohort fit for the identification of rare genetic variants influencing the disease risk [9].During initial data collection, data on health-related phenotypes and environmental parameters were collected and participants donated a fasting blood sample.

The Estonian Genome Center of the University of Tartu
The Estonian Genome Center of the University of Tartu (EGCUT) Biobank is a volunteer-based cohort of 52 thousand adult subjects aged ≥ 18 from the Estonian population.The recruitment was conducted throughout Estonia via general practitioners' offices and medical personnel during the 2002-2012 period.Besides completing a questionnaire on topics such as lifestyle, diet and clinical diagnostics, participants also donated blood samples.The cohort was utilized in the exploration of more than 200 traits including anthropometric traits, common and rare diseases, blood biochemistry, as well as lifestyle and personality traits.The data on study participants is being continuously updated via follow-up health checks using electronic health registries and reexaminations [10].

German Chronic Kidney Disease study
German Chronic Kidney Disease (GCKD) study is an ongoing prospective study of kidney disease patients who are under nephrologist care in Germany [11].The current sample size of 5217 makes it one of the largest chronic kidney disease cohort in the world.The subject enrolment was undertaken between 2010 and 2012 via nephrologist practice and outpatient care units of nine study centers throughout different regions in Germany.The mean age of study subjects is 60 years, with 40% of the participants being female.The data collection includes collecting information on sociodemographic factors, medical and family history, as well as obtaining blood samples in a standardized manner, which we immediately processed and stored frozen in a central biobank until measurement of the glycans.

Genotyping, genotype QC and genotype imputation
Genotyping was performed using commercially available SNP genotyping arrays, followed by genotype calling in Illumina and Genome Browser software.Quality control (QC) of genotype data was performed to exclude SNPs and samples with low quality including removal of 1) SNPs with low call rate, 2) SNPs violating the assumptions of Hardy-Weinberg Equilibrium (HWE) and 3) SNPs with low minor allele frequency (MAF) < 1%.Details on SNP exclusion criteria for each cohort and imputation to Haplotype Reference Consortium (HRC) [12] panel are shown in Appendix Table 8.The genotypes were mapped to Genome Reference Consortium GRCh37 (hg19) build.

IgG N-glycan quantification by ultra-performance liquid chromatography
Ultra-performance liquid chromatography (UPLC) is used for quantification of glycan structures attached to Fc (constant region) and Fab (variable region) portions of IgG without the possibility to distinguish them.Detailed protocol for UPLC quantification of IgG Nglycans is published elsewhere [13].
Briefly, IgG was isolated from blood plasma samples using Protein G plates (BIA Separations, Ajdovščina, Slovenia).After filtration, plates were extensively washed to remove unwanted proteins and IgG was AGING released from protein G monoliths using 0.1 M formic acid.Eluates were collected in a 96-well plate and neutralized with neutralization buffer (1 M ammonium bicarbonate) to pH 7.0 to maintain the stability of IgG.IgG samples were dried and denatured using SDS detergent and incubated at 65° C for 10 minutes.Nglycans from IgG were released using recombinant Nglycosidase F followed by fluorescent labelling with 2-aminobenzamide (2-AB) dye.Hydrophilic interaction liquid chromatography (HILIC) based solid-phase extraction (SPE) was used to remove excess protein, reagents and fluorescent label.Fluorescently labelled Nglycans were separated by hydrophilic interaction UPLC on Waters Aquity UPLC H-class instrument (Waters, Milford, MA) with Waters bridged ethylene hybrid (BEH) glycan chromatography column.A linear gradient of 75 to 62% ACN in a 20-min analytical run was used to separate different glycan structures.The retention times for individual glycans were converted to glucose units based on hydrolysed and 2-AB labelled glucose oligomers which were used as external standards for calibration of the system.Data processing was done in two ways depending on the cohort, 1) automatic integration as described in Agakova et al. [14] or 2) using Empower 3 software with an automated processing method with traditional integration algorithm, followed by manual correction of each chromatogram to maintain the same integration intervals in all samples.The resulting chromatograms were separated into 24 peaks where the amount of glycans was expressed as % of the total integrated area in the corresponding peak (GP1-GP24).Total separation of each glycan structure is not possible using the described method, thus resulting in multiple glycan structures being detected under ten peaks.Glycan structures in each peak are listed in Appendix Table 9.

Glycan quantification by liquid chromatography coupled with mass spectrometry
The full name of the method is reverse-phase nanoliquid-chromatography-sheath-flow-electrospray-mass spectrometry (LC-ESI-MS) but in this study, we refer to it as LC-MS.The detailed protocol for analysis of IgG N-glycans using LC-MS is described in Selman et al. [15].Briefly, IgG was isolated by affinity chromatography binding to protein G 96-well plates (BIA Separation, Ajdovščina, Slovenia) and treated with trypsin overnight at 37° C which allowed cleavage of IgG at specific amino acid sites.The cleavage by trypsin resulted in different glycopeptides due to the difference of amino acid sequence in different IgG subclasses, thereby enabling subclass-specific glycan measurements.IgG subclass separation was performed using the Ultimate 3000 HPLC system (Dionex Corporation, Sunnyvale, CA).The SPE trap column was conditioned with mobile phase A and samples were loaded and separated on Ascentis Express C18 nano-LC column (Supelco, Bellefonte, USA) conditioned with mobile phase A and 95% ACN.For detection of separated subclass-specific glycopeptides, the HPLC system was coupled to a Dionex Ultimate UV detector and interfaced to a quadrupole-TOF-MS mass spectrometer (Bruker Daltonics, Bremen, Germany) with a standard ESI source (Bruker Daltonics, Bremen, Germany) and a sheath-flow ESI sprayer (Agilent Technologies, Santa Clara, USA).The mass spectra were recorded in a range between 300 and 2000 m/z with two averages at the frequency of 1Hz.The analysis time for one sample was 16 minutes.The calibration of LCMS datasets was done internally using a list of known glycopeptides and datasets were exported to the open mzXML format by Bruker DataAnalysis 4.0 software, followed by alignment to a master dataset of a typical sample.Inhouse software "Xtractor2D" was used to extract predefined features such as peak maximum or peak area in specific retention time and mass windows.Relative intensities of subclass-specific glycopeptides were obtained by integrating and summing three isotopic peaks.The obtained intensities were then normalized to the total IgG subclass-specific glycopeptide intensities.IgG2 and IgG3 subclasses have the same tryptic glycopeptide moieties, thus not enabling the separation of the subclass-specific glycopeptides.Here, obtained measurements are simply referred to as IgG2/3.LC-MS quantification results in 50 values which refer to 20 glycans measured on IgG1, 20 glycans on IgG2/3 and 10 glycans on IgG4.All glycans measured on IgG4 are fucosylated structures since the nonfucosylated glycans are hard to distinguish from the glycans found on IgG1 [16].The list of glycans measured by LC-MS and their description is listed in Appendix Table 10.

IgG glycan data harmonization
Previously, there were no GWA meta-analyses of IgG N-glycan patterns using GWAS of both UPLC-and LC-MS-derived IgG N-glycan traits, therefore making it necessary to first assess the correlation of the data and methods which should be applied in pre-processing step to make UPLC and LC-MS glycan traits comparable.For this purpose, we used the CROATIA-Vis cohort (n=661) as both UPLC and LC-MS IgG N-glycan measurements are available in the same samples.
We aimed to combine IgG subclass information obtained from LC-MS in an appropriate manner to obtain information corresponding to total IgG glycome values measured by UPLC.Pre-processing of IgG glycome data consists of normalization of the data and batch correction to remove the effects of experimental variation.Normalization procedure is necessary to remove unwanted variation between the samples and allows quantitative comparison of the samples [17].We tested the following three normalization types: total area normalization, largest peak normalization and median quotient normalization, all of which can be applied using "glycanr" [18] package in R software [19].We tested different normalizations both across the total glycome and per IgG subclass in LC-MS data.
Due to varying laboratory conditions during IgG Nglycan measurement, it was necessary to perform batch correction to remove non-biological, experimental variation.The batch correction was performed using ComBat function in R package "sva" [20].We first logtransform the data as ComBat function implements empirical Bayes method for batch correction [21] which assumes a normal distribution of the data, followed by batch correction with ComBat() where the batch is denoted by the plate on which the samples were analysed, and lastly, exponential transformation of the values to the original scale.
We calculated derived traits from the initial traits for purpose of data harmonization and to enable a more straightforward interpretation of the GWAS results so that the discovered genomic loci can be directly linked to the addition of one two galactose residues to the IgG N-glycan chain: agalactosylation (G0), monogalactosylation (G1) and digalactosylation (G2).Formulas for calculation are listed in Appendix Table 11.
Since LC-MS data quantifies glycans attached to different IgG subclasses, in order to calculate different galactosylation traits we also incorporated the approximation of the IgG glycan subclass response factor (RF) to represent the IgG subclass concentration relative to other subclasses.RF is defined as the ratio between the concentration of the analyte and instrument response to the analyte.Using unpublished in-house experimental data, the subclass-specific response factors were approximated as follows: IgG1 with RF=1, IgG2/IgG3 with RF=2 and IgG4 with RF=1.IgG subclasses are present in different quantities in human serum; hence we also incorporated relative concentrations of each IgG subclass in the calculation of derived traits.The following relative measurements were indicated in the literature: 66% for IgG1, 30% for IgG2/IgG3 and 4% for IgG4 [22].The subclass-specific glycan measurements were weighted by the corresponding concentration and response factor before trait calculation.

Pre-processing of glycan data
Glycan data was pre-processed centrally in Genos for all cohorts except the LLS cohort for which the glycan data was pre-processed by a colleague from Leiden University Medical Centre.It is important to note that glycan data for the CROATIA-Korcula cohort was obtained in three instances (2010, 2013 and 2017) and each dataset was pre-processed separately and treated as an individual cohort in downstream analysis.Additionally, the TwinsUK cohort was analysed in four separate batches.Due to differences in sample collection, batches 1 and 2 were treated as one dataset and batches 3 and 4 were treated as the second dataset.
Data points in the 99.9 th percentile were removed and considered as technical outliers.Next, based on the results of the previous data harmonization assessment, median quotient normalization was applied on both UPLC and LC-MS glycan data across 24 and 50 glycan measurements, respectively and the batch correction was applied.Prior to the genetic association test, galactosylation traits in all cohorts were transformed using rank-based inverse normal transformation (mean=0, standard deviation=1).AGING

Appendix Tables
Please browse Full Text version to see the data of Supplementary Table 2. Cohort-name of the study; Platform-platform used for quantification of IgG N-glycans; Reference-published study in which the IgG N-glycome analysis for the cohort was described; N total (glycans)-number of subjects in the cohort with measured glycans; N female-number of female subjects; N male-number of male subjects; median age-median age of participants in the cohort; N in GWAS-number of pariticipants in GWAS.

Figure 1 .
Figure 1.Genome-wide significant associations with galactosylation phenotypes.(A) Top SNP in identified genomic regions for each associated trait, (B) Venn diagram showing the number of genes mapped by positional mapping, chromatin interaction mapping, eQTL mapping and genome-wide gene-based association analysis (MAGMA), (C) Manhattan plot of genome-wide significant associations in IgG galactosylation GWAS with prioritized genes in each locus.Plot shows -log10(p-values) of association on y-axis and SNPs ordered by chromosomal location on x-axis.Red line indicates the genome-wide significance threshold (2.5 ×10 -8 ).Orange gene names indicate novel loci associated with IgG glycosylation.

Figure 2 .
Figure 2. Changes in IgG glycan composition upon targeting of selected GWAS loci associated with IgG galactosylation (HIVEP2, MANBA, TNFRSF13B and EEF1A1) in dCas9-VPR monoclonal cell lines.Samples containing non-targeting gRNAs served as controls.Changes in transcript levels are given as fold change values and changes in IgG phenotype are given as a relative change compared to control samples.(A) Manipulation of the HIVEP2 gene did not result in a statistically significant change in HIVEP2 transcript level, however, did induce a significant increase of digalactosylated structures (G2) with a concomitant decrease of agalactosylated IgG glycan structures (G0).(B) Targeting of MANBA by dCas9-VPR elevated transcription level of this gene which resulted in decrease of monogalactosylated IgG glycan structures (G1) (C) Targeting TNFRSF13B by dCas9-VPR resulted in ~ 400-fold increase of transcript levels and significant change of IgG agalactosylated IgG glycans (G0).(D) Successful upregulation of the EEF1A1 locus was followed by an increase of agalactosylated IgG glycans (G0) with a concomitant decrease of monogalactosylated IgG glycans (G1).Nominal p-value: *<0.05; **<0.01;***<0.001;ns, not significant.

Figure 3 .
Figure 3.A graphical review of up-to-date functional validation of GWAS hits associated with IgG glycosylation using HEK293FS transient expression system with stably integrated dCas9-VPR or dCas9-KRAB fusions.When specific gRNA targeted dCas9-VPR to two estrogen associated genes, RUNX3 and SPINK4 (Mijakovac et al. 2021), as well as to the TNFRSF13B, MANBA and EEF1A1 loci, transcription was upregulated that resulted in decreased levels of galactosylated (red arrow within box), increased levels of agalactosylated IgG glycan structures (green arrow within box) or both.Downregulated transcription of SPPL3 resulted in an increase in galactosylated structures with a concomitant decrease in agalactosylated IgG glycan structures.Protein structures do not depict true protein structures in humans, generic protein shapes are chosen for easier visualization.Figure was created with BioRender.com.Accessed on 16 November 2021.

Table 1 . Pearson's correlation coefficients for UPLC and LC-MS derived glycan traits after applying different normalization types and weighting of the subclass-specific values in LC-MS measurements.
Trait-galactosylation trait which was tested; Gene-symbol of the gene; Chr-chromosome on which gene is found; Startstarting position for the gene sequence in GRCh37(hg19) build; End-end position for the gene sequence; N SNPs-number of SNPs in the gene; Z-Z-score of the association; P-unadjusted P value of the association; P adjusted-adjusted p value using Benjamini and Hochberg method.

Table 5 . Results of implementation of SBayesR and C+T models for G0, G1 and G2.
PRS Model -method that was used to build model; Parameters -sets of parameters for a linear regression model with the trait as the outcome; R2 -part of trait variance that a model explains; R Spearman -the Spearman correlation coefficient between observed and predicted values of the trait; PRS p-value -the two-tailed p-value for the t-stats of PRS param.AGING

Table 6 . Meta-analysis results of testing differences in G0, G1 and G2 traits in treated (upregulation) and control cell line.
Trait values were transformed to obtain normal distribution using Box-Cox method and analyzed using Student's t test.Estimate-effect size estimates; SE-standard error of estimate; p-value-p value of the meta-analysis; CI-confidence interval.