Research Paper Volume 13, Issue 5 pp 6442-6458

Male-specific age estimation based on Y-chromosomal DNA methylation

Athina Vidaki1, Diego Montiel González1, Benjamin Planterose Jiménez1, Manfred Kayser1

  • 1 Department of Genetic Identification, Erasmus University Medical Center Rotterdam, Rotterdam 3000, CA, The Netherlands

Received: March 25, 2020       Accepted: February 25, 2021       Published: March 11, 2021

Copyright: © 2021 Vidaki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Although DNA methylation variation of autosomal CpGs provides robust age predictive biomarkers, no male-specific age predictor exists based on Y-CpGs yet. Since sex chromosomes play an important role in aging, a Y-chromosome-based age predictor would allow studying male-specific aging effects and would also be useful in forensics. Here, we used blood-based DNA methylation microarray data of 1,057 males from six cohorts aged 15-87 and identified 75 Y-CpGs with an interquartile range of ≥0.1. Of these, 22 and six were significantly hyper- and hypomethylated with age (p(cor)<0.05, Bonferroni), respectively. Amongst several machine learning algorithms, a model based on support vector machines with radial kernel performed best in male-specific age prediction. We achieved a mean absolute deviation (MAD) between true and predicted age of 7.54 years (cor=0.81, validation) when using all 75 Y-CpGs, and a MAD of 8.46 years (cor=0.73, validation) based on the most predictive 19 Y-CpGs. The accuracies of both age predictors did not worsen with increased age, in contrast to autosomal CpG-based age predictors that are known to predict age with reduced accuracy in the elderly. Overall, we introduce the first-of-its-kind male-specific epigenetic age predictor for future applications in aging research and forensics.
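The quantities named in the abstract (the interquartile-range filter of ≥0.1, Bonferroni correction, and the mean absolute deviation between true and predicted age) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy beta values, and the CpG identifiers below are hypothetical, the quartile convention is an assumption, and the support vector machine fitting itself is omitted.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of methylation beta values.

    The 'inclusive' quartile method is an assumption; the paper's exact
    quartile convention is not stated in the abstract.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def select_variable_cpgs(beta_by_cpg, min_iqr=0.1):
    """Keep CpGs whose beta values vary across samples with IQR >= min_iqr."""
    return [cpg for cpg, betas in beta_by_cpg.items() if iqr(betas) >= min_iqr]


def bonferroni(p_values):
    """Bonferroni-corrected p-values: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def mean_absolute_deviation(true_ages, predicted_ages):
    """Mean absolute difference between true and predicted age, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("true_ages and predicted_ages must have equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Toy data: hypothetical CpG identifiers and made-up beta values.
toy_betas = {
    "cg_hypothetical_1": [0.10, 0.20, 0.30, 0.40, 0.50],  # variable CpG
    "cg_hypothetical_2": [0.50, 0.51, 0.52, 0.53, 0.54],  # near-constant CpG
}
selected = select_variable_cpgs(toy_betas)
corrected = bonferroni([0.01, 0.02, 0.5])
mad = mean_absolute_deviation([20, 40, 60], [25, 38, 66])
```

In this toy input only the first CpG passes the variability filter, which mirrors the idea of discarding Y-CpGs whose methylation barely differs between individuals before any age modelling.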


BIC: Bayesian information criterion; BMIQ: beta mixture quantile; CpG: cytosine-phosphate-guanine site; CV: cross-validation; DNA: deoxyribonucleic acid; DNAm: DNA methylation age (Horvath clock); EWAS: epigenome-wide association study; FDP: forensic DNA phenotyping; GEO: Gene Expression Omnibus database; HIV: human immunodeficiency virus; IGV: integrative genomics viewer; IQR: interquartile range; MAD: mean absolute deviation; MLR: multiple linear regression; MSE: mean square error; OLS: ordinary least squares; oob: out-of-bag; QC: quality control; RELIC: regression on logarithm of internal control probes; RFR: random forest regression; RMSE: root mean square error; RSS: residual sum of squares; SNP: single nucleotide polymorphism; SVM: support vector machine; Y-CpG: Y-chromosome-located CpG.