Differential gene expression orchestrated by transcription factors in osteoporosis: bioinformatics analysis of associated polymorphism elaborating functional relationships

Background: Identification of candidate SNPs from transcription factors (TFs) is a novel concept, while systematic large-scale studies on these SNPs are scarce. Purpose: This study aimed to identify the SNPs of six TF binding sites (TFBSs) and examine the association between candidate SNPs and osteoporosis. Methods: We used the Taiwan BioBank database; University of California, Santa Cruz, reference genome; and a chromatin immunoprecipitation sequencing database to detect 14 SNPs at the potential binding sites of six TFs. Moreover, we performed a case–control study and genotyped 109 patients with osteoporosis (T-score ≤ −2.5 evaluated by dual-energy X-ray absorptiometry) and 262 healthy individuals (T-score ≥ −1) at Tri-Service General Hospital from 2015 to 2019. Furthermore, we used the expression quantitative trait loci (eQTL) from the Genotype-Tissue Expression database to identify downstream gene expression as a criterion for the function of candidate SNPs. Results: Bioinformatic analysis identified 14 SNPs of TFBSs influencing osteoporosis. Of these SNPs, the rs130347 CC + TC genotype had 0.57 times higher risk than the TT genotype (OR = 0.57, p = 0.031). Validation of eQTL analysis revealed that rs130347 T allele influences mRNA expression of downstream A4GALT in whole blood (p = 0.0041) and skeletal tissues (p = 0.011). Conclusions: We successfully identified the unique osteoporosis locus rs130347 in the Taiwanese and functionally validated this finding. In the future, this strategy can be expanded to other diseases to identify susceptible loci and achieve personalized precision medicine.


INTRODUCTION
revealed that the overall prevalence of osteoporosis in people aged ≥50 years increased from 17.4% to 25% in the period of 2001-2011 [3]. The prevalence of osteoporosis in males increased from 6.9% to 13.3%and that of osteoporosis in females increased from 28.1% to 36.2% [3].
Genome-wide association study (GWAS) has become a widely used approach in genome research. This method is based on the use of linkage disequilibrium in chromosomes to map disease-causing alleles [12]. However, for several gene loci, despite repeated analyses by GWAS, their association with complex diseases has remained unclear [13]. An example has shown GWAS in osteoporosis-related study could only explain approximately 20% of the variation in BMD due to missing heritability and stringent statistical test value (p < 5 × 10 −8 ) [14,15]. The pathophysiological causes of osteoporosis are complex, with orchestration of various transcription factor (TF) and biological pathways, forming a complex regulatory network [16,17]. A previous study used the GSE35958 database to analyze differentially expressed genes (DEGs) in patients with osteoporosis and a control group and identified the following osteoporosis-related TFs: E2F TF 4 (E2F4), early growth response 1 (EGR1), JUN proto-oncogene (JUN), trans-acting TF 1 (Sp1), TF 7-like 2 (TCF7L2), tumor protein p53 (TP53), and catenin (cadherinassociated protein) beta 1 (CTNNB1) [18,19]. To our knowledge, polymorphisms in TF binding sites (TFBSs) have been explored their association with disease [20]. As a result, we aligned Taiwan BioBank polymorphisms database to binding sites of 7 chosen TFs by conserved motifs to seek potential disease-related single-nucleotide polymorphisms (SNPs) in Taiwanese. We performed a case-control study to investigate the association between the candidate SNPs and osteoporosis. Finally, we used the Genotype-Tissue Expression (GTEx) database to validate the functionality of SNPs.

Study participants
This case-control study was conducted between March 2015 and October 2019. In this study, 371 postmenopausal women (109 patients with osteoporosis and 262 healthy individuals) were enrolled from Tri-Service General Hospital (TSGH). None of the participants had any history of medication for treating osteoporosis. Data on the demographic and clinical characteristics of all participants were obtained from questionnaires and medical records.

Bone mineral density measurements
BMD (g/cm 2 ), an indicator of osteoporosis, calculated by dividing the bone mineral content (g) by the bone area (cm 2 ) [21], of all participants was measured using dual-energy X-ray absorptiometry (DXA) (GE Medical Systems Lunar, Madison, WI, USA) [22] at the lumbar spine 1-4, and the diagnosis of osteoporosis was based on the World Health Organization (WHO) standards. By osteoporosis is meant BMD measurements at or below the −2.5 standard deviation (SD) from the optimal peak bone density (T-score) of a healthy young adult of the same sex; by contrast, BMD measurement at or above −1 SD from T-score of a healthy young adult of the same sex was considered to reflect bone mass normal [23].

Screening procedures for genetic variation in Taiwanese
We referred to a previous study conducted by Xie et al. [19], which used microarray data to analyze DEGs and obtained seven osteoporosis-related TFs: E2F4, EGR1, JUN, Sp1, TCF7L2, TP53, and CTNNB1. Subsequently, we used the next-generation sequencing (NGS) data of 1,517 people; the data were available from the Taiwan BioBank database and included 74,861,556 genetic variants. We excluded the structural variants (insertions/deletions) because it was not available to use a multifunctional mass spectrometer (mass array) for genotyping. Then, we excluded the SNPs with a call rate of <90%. Finally, we used the chosen SNPs for further alignment.

Identifying the genetic variants that may influence TF binding
First, we used a human reference genome sequence downloaded from the University of California, Santa Cruz (UCSC; GRCh37/hg19). We analyzed genetic variants that may influence TF binding by using bioinformatic sequence alignment techniques and identified the variants located in the TFBS.

Genomic DNA extraction and SNP genotyping
Genomic DNA was isolated from the peripheral blood samples using the standard procedures for proteinase K (Invitrogen, Carlsbad, CA, USA) digestion and the phenol/chloroform method [31]. 14 SNPs, mentioned above, in the TFBSs were genotyped by iPLEX Gold SNP genotyping [32], a genotyping example, rs55785541, shown in Supplementary Figure 1. We used inter-replication validation to assess the quality of the genotyping experiment, which was performed with 19 replicate samples (approximately 5%), and the concordance rate was 100%.

Ethics
This study was reviewed and approved by the institutional ethics committee of TSGH (B202105044). After a detailed explanation of the study objectives, written informed consent was obtained from all participants. All clinical and biological samples were collected and DNA was genotyped after obtaining patient consent.

Statistical analysis
Continuous variables were reported as the mean ± SD and were tested using t-tests and ANOVA. Genotype and allelic frequencies were compared between patients with osteoporosis and healthy individuals using chi-squared or Fisher's exact test. Logistic regression analysis was performed to estimate odds ratios (ORs) and 95% confidence intervals (CIs) [13] with adjustment for age and gender. The analysis was performed using allele type, genotype, dominant, and recessive models. Statistical analyses were performed using SPSS 22.0 (SPSS Inc., Chicago, IL, USA) and R 3.5.1 (R Project for Statistical Computing, Vienna, Austria). A p-value of <0.05 was considered statistically significant.

Candidate TFs and polymorphic TFBSs
According to Xie et al. [19], upstream TFs, E2F4, EGR1, JUN, Sp1, TCF7L2, TP53 and CTNNB1, were identified from DEGs associated with osteoporosis. We then selected NGS data on 1,517 samples from the Taiwan BioBank and excluded 13,614,966 SNPs with structural mutations (insertions/deletions) and 8,854,320 SNPs with a sequencing quality control call rate of <90%. As a result, 52,392,270 SNPs were included in this study.
The UCSC human genomic sequence hg19 and homologous or motif sequences were used for sequence alignment with the SNPs screened from the Taiwan BioBank. The TFs E2F4, EGR1, JUN, Sp1, TCF7L2, TP53, and CTNNB1 had 12,124, 689, 81,754, 97,654, 169,619, 169,319, and 77,606 SNPs, respectively, on the binding sites. After excluding the sites with an MAF of <5%, 1,490, 73, 9,622, 11,567, 17,539, 17,179, and 8,871 binding sites remained, respectively, for E2F4, EGR1, JUN, Sp1, TCF7L2, TP53, and CTNNB1. This was followed by repeated validation using ChIP-Seq data and the exclusion of noncoding regions. Finally, for E2F4, EGR1, JUN, Sp1, TCF7L2, and TP53, 1, 2, 8, 3, 1, and 0 SNPs, respectively, were successfully included in this study. For CTNNB1, as ChIP-Seq data were not available for validation, no SNP was included in this study. In summary, 14 sites in the abovementioned results were included in this study. The detailed candidate results are shown in Table 1.

Basic demographic analysis
Basic demographics are shown in Table 2. There were 262 healthy individuals in the control group and 109 patients in the osteoporosis group. The body mass index (BMI) of the osteoporosis group was lower than that of the control group (p < 0.001). The waist circumference of the osteoporosis group was lower than that of the control group (p < 0.001). Higher proportions of participants in the osteoporosis group took calcium tablets (p = 0.002) and suffered from knee osteoarthritis (p = 0.01). AGING Upstream predictors of seven TFs, E2F4, EGR1, JUN, Sp1, TCF7L2, TP53, and CTNNB1, in osteoporosis [19]. Identification of genetic variants that may influence TFBS through bioinformatic sequence alignment. First, we used the data of a total of 74,861,556 variants (1,517 samples) obtained from the Taiwan BioBank database to screen for Taiwanese population-specific genetic variation. Then, through genetic alignment of GRCh37/hg19 obtained from the National Center for Biotechnology Information database, we found SNPs that may influence the binding affinity. SNPs with an MAF of <5% were excluded from the samples. Chromatin immunoprecipitation sequencing (ChIP-Seq) data obtained from the JASPAR database were used to confirm whether these genetic variants had a combination of the sites. No ChIP-Seq data were available for CTNNB1 validation, and this gene was thus excluded. Finally, we excluded results of the noncoding regions. The variation of 14 SNPs may influence transcription factor binding activity. DEG, differentially expressed gene; NGS, next-generation sequencing; SNP, singlenucleotide polymorphism; Ins/del, insertion/deletion; TFBS, TF binding site; MAF, minor allele frequency.

Association between binding site gene polymorphisms and osteoporosis susceptibility
In total, 14 SNPs were included in this study. All loci conformed to Hardy-Weinberg equilibrium (p > 0.05), with the exception of rs3813600 (p = 0.002). Genotyping results were obtained for 14 SNPs. Our results based on the genotype model showed that rs130347 SNP had a significant association with osteoporosis (p = 0.022; Table 3).
In Table 4, we present the logistic regression analysis data comparing the genotype and allele frequencies of patients with osteoporosis and healthy individuals. For

mRNA expression of SNP polymorphisms with downstream genes
This case-control study showed that rs28481460 and rs130347 are associated with the risk of osteoporosis. In this study, the GTEx database was used for expression quantitative trait loci analysis of the mRNA expression of SNP loci and downstream genes. The GTEx query steps used for gene expression analysis are shown in Supplementary File 2. We observed that the presence of the rs130347 C minor allele in whole blood decreases downstream A4GALT expression (p = 0.0041). In skeletal muscle tissue samples, the presence of the rs130347 C minor allele influences downstream A4GALT expression (p = 0.011; Figure 2). Furthermore, we noted that the presence of the rs28481460 C minor allele in whole blood does not influence downstream ABHD2 expression. In the skeletal muscle tissue samples, the presence of the rs130347 C minor allele does not influence downstream ABHD2 expression (p = 0.68; Figure 3).

DISCUSSION
In this study, we investigated the association of the binding sites of seven TFs (E2F4, EGR1, JUN, Sp1, TCF7L2, TP53, and CTNNB1) with osteoporosis. Our study results revealed that a binding site SNP of the TF, JUN, rs130347, was significantly associated with osteoporosis. In addition, we explored the mRNA expression of rs130347 in the GTEx database and noted that both the whole blood and the skeletal muscle samples showed that the presence of the C allele in rs130347 decreases A4GALT expression.
Osteoporosis is known to be caused by an imbalance between osteoblasts and osteoclasts [33]. Currently, it is known that the biological pathway that influences osteoblasts is the RANK-OPG-RANKL pathway [34]. The binding of RANK and RANKL stimulates NF-κB activation and increases MAPK, JNK, ERK, and p38 activities, and these signaling pathways influence osteoclast formation [35]. JUN and FOS are members of the activator protein 1 (AP-1) family. JNK protease influences osteoblast differentiation by enhancing its binding with Ap-1 family proteins [36]. To validate the effects of JUN on bone growth, a previous study transplanted long bones that induce JUN synthesis into mice with an impaired immune system [37]. The results revealed a significant increase in the bones of the transplanted mice. This showed that JUN is vital to the development of the skeletal system. rs130347 is located upstream of A4GALT, has a length of 29 kb, and is located at 22q13.2. Its primary function is to catalyze the conversion of galactose to lactosylceramide to synthesize globotriaosylceramide (Gb3) [38]. A study demonstrated that Gb3 can bind to verotoxin produced by Escherichia coli to induce apoptosis [39]. Gb3 was also shown to be able to prevent human immunodeficiency virus infection [40]. Moreover, other studies showed that a genetic defect in α-galactosidase in patients with Fabry disease leads to the aberrant accumulation of Gb3 in endothelial cells, causing kidney, heart, and cerebrovascular lesions, and decreases BMD. This results in an increased risk of osteoporosis in patients with Fabry disease [41,42]. However, the association between A4GALT and osteoporosis remains unknown and the biological mechanisms involved are yet to be elucidated. We found that rs130347 is a polymorphic locus located in the binding site of the TF, JUN, and causes a decrease in A4GALT expression.  In this study, we employed a candidate screening method different from previous studies and used the NGS data in the Taiwan BioBank to screen for TFBS polymorphisms that had the highest association with osteoporosis. SNP loci that were not previously investigated for their association with osteoporosis were identified in the case-control study. Compared with the results of GWAS, most SNPs found were not causally related and no association could be found with the disease [43,44]. The bioinformatic analysis used in this study successfully identified one SNP that is correlated with osteoporosis. In future, multiple omics technologies, including genomics, transcriptomics, epigenomics, proteomics, and metabolomics, can be combined to identify the molecular factors contributing to disease pathogenesis and thus address genetic susceptibility to disease development.
Certain potential limitations of this study might have influenced the results. The differential gene data for candidate TFs were obtained from the mesenchymal stem cells of the Caucasian population, which may differ from the RNA sequencing results of the Asian population; this may have influenced the candidate TF results. Furthermore, this study used ChIP-Seq data from the JASPAR database for repeated validation of tissuederived nonosteocytes, which may have influenced the results. Sample size limitations resulted in an inability to examine the effects of genes with an MAF of <5% and structural mutations (insertions/deletions) on osteoporosis. Therefore, we recommend increasing the sample size in the future to examine SNPs that were not covered in this study in order to obtain more osteoporosis-related SNP results.

CONCLUSIONS
In summary, our data demonstrated that rs130347 plays an important role in postmenopausal women with susceptibility to osteoporosis, modulating the epigenetic regulation of a critical osteoporosis-related gene, A4GALT. rs130347 impairs the binding of JUN, which may lead to decreased A4GALT expression. However, the effect of the JUN binding site SNP rs130347 and A4GALT on the development and function of osteoporosis remains incompletely understood, and further exploration of the regulatory mechanism, such as by RNA-Seq of the genomes of Taiwanese patients, is warranted.