Research Paper Volume 12, Issue 22, pp 22457–22494

Generalized correlation coefficient for genome-wide association analysis of cognitive ability in twins

Afsaneh Mohammadnejad1, Marianne Nygaard1, Shuxia Li1, Dongfeng Zhang2, Chunsheng Xu3, Weilong Li1, Jesper Lund1, Lene Christiansen1,4, Jan Baumbach5,6, Kaare Christensen1,7, Jacob v. B. Hjelmborg1, Qihua Tan1,7

  • 1 Epidemiology, Biostatistics and Biodemography, Department of Public Health, University of Southern Denmark, Odense, Denmark
  • 2 Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao, China
  • 3 Qingdao Center for Disease Control and Prevention, Qingdao, China
  • 4 Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen Ø, Denmark
  • 5 Computational Biomedicine, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
  • 6 Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
  • 7 Unit of Human Genetics, Department of Clinical Research, University of Southern Denmark, Odense, Denmark

Received: July 20, 2020       Accepted: September 29, 2020       Published: November 24, 2020

Copyright: © 2020 Mohammadnejad et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Despite the strong genetic background of cognitive function, only a limited number of single nucleotide polymorphisms (SNPs) have been found in genome-wide association studies (GWASs). We hypothesize that this is partly due to mis-specified modeling of the phenotype distribution and of the relationship between SNP dosage and the level of the phenotype. To overcome these issues, we introduced an assumption-free method based on the generalized correlation coefficient (GCC) and applied it in a GWAS of cognitive function in Danish and Chinese twins, comparing its performance with that of traditional linear models. The GCC-based GWAS identified two significant SNPs in the Danish samples (rs71419535, p = 1.47e-08; rs905838, p = 1.69e-08) and two significant SNPs in the Chinese samples (rs2292999, p = 9.27e-10; rs17019635, p = 2.50e-09). In contrast, linear models failed to detect any genome-wide significant SNPs. The number of top significant genes overlapping between the two samples was higher in the GCC-based GWAS than with linear models. The GCC model thus identified significant genetic variants missed by conventional linear models, with more replicated genes and biological pathways related to cognitive function. Moreover, the GCC-based GWAS was robust in handling correlated samples such as twin pairs. GCC is a useful statistical method for GWAS that complements traditional linear models by capturing genetic effects beyond the additive assumption.


SNPs: single nucleotide polymorphisms; GWASs: genome-wide association studies; GCC: generalized correlation coefficient; MIC: maximal information coefficient; MINE: maximal information-based nonparametric exploration; LD: linkage disequilibrium; AD: Alzheimer’s disease; FDR: false discovery rate; DZ: dizygotic; MZ: monozygotic; MADT: Middle-Aged Danish Twins; MoCA: Montreal Cognitive Assessment; MAF: minor allele frequency; HWE: Hardy-Weinberg equilibrium; LME: linear mixed-effects model; GSEA: gene-set enrichment analysis.