Polymorphisms in the lncRNA MIR2052HG and susceptibility to breast cancer in a Chinese population

Background: Published pharmacogenomic studies have explored the relationship between the lncRNA MIR2052HG and drug resistance and recurrence in breast cancer (BC). However, the association between MIR2052HG single nucleotide polymorphisms (SNPs) and BC development remains unclear. Methods: SNPs in the lncRNA MIR2052HG were screened using bioinformatics tools and databases, and the selected SNPs were genotyped in 504 Chinese Han patients with BC and 505 healthy controls frequency-matched for age (±2 years). Logistic regression analysis was used to assess the association between MIR2052HG SNPs and BC risk. Interactions between MIR2052HG SNPs and reproductive factors were further evaluated using the multifactor dimensionality reduction (MDR) method. qRT–PCR was performed to measure MIR2052HG expression in individuals with different rs34841297 genotypes. miR-4456 was predicted with online tools as the target miRNA of MIR2052HG at rs34841297 and confirmed with dual-luciferase reporter assays. CCK-8 and Transwell assays were used to explore the effects of miR-4456 on the proliferation, invasion and migration of BC cells. Results: Nine SNPs were screened. After adjustment for age, menarche age, menopausal status, number of pregnancies, history of abortion, breastfeeding history and family history of BC, logistic regression analysis showed that the rs34841297 A/- polymorphism was positively associated with BC incidence. Compared with the AA genotype, the A-+-- genotype of rs34841297 was associated with an increased risk of BC in subgroups aged <50 years, with menarche age <14 years, with premenopausal status, with a history of abortion, with no history of breastfeeding and with no family history of tumors in first-degree relatives.
MDR results revealed that individuals homozygous for the rs34841297 A-allele deletion (--) who were premenopausal and had no history of breastfeeding had a higher risk of BC. qRT–PCR showed that MIR2052HG expression was higher in individuals with the homozygous deletion genotype (--) (1.68±1.37) than in those with the heterozygous deletion genotype (A-) (0.95±0.94) or the wild-type AA genotype (0.26±0.12). MIR2052HG bound miR-4456 when rs34841297 carried the AA genotype. Moreover, preliminary functional studies indicated that overexpression of miR-4456 increased the proliferation, invasion and migration of BC cells. Conclusion: Our study suggests that MIR2052HG polymorphisms may be related to BC susceptibility, and that the MIR2052HG rs34841297 A/- variant may affect the proliferation, invasion and migration of BC cells by modulating the interaction between MIR2052HG and miR-4456.


INTRODUCTION
According to the latest global cancer burden data released by the International Agency for Research on Cancer (IARC) of the World Health Organization in 2020 [1], the incidence of breast cancer (BC) in women has surpassed that of lung cancer; BC accounts for 11.7% of all new cancer cases and has become the most common cancer worldwide. Approximately 685,000 deaths from BC were reported, making it the fifth leading cause of cancer-related death worldwide [1], and 90% of these deaths were caused by distant metastasis of primary tumor cells [2,3]. Among women, BC ranks first in incidence and mortality in 159 countries, and the relative increase in cancer risk is largest in countries/regions with low or medium levels of development (95% and 64% increases from 2020, respectively). Although BC mortality decreased between 1991 and 2017 [4], the rate of decline among women slowed over the last ten years of that period (2008-2017). These survey data suggest that existing preventive and treatment measures remain insufficient, and the high morbidity and mortality of BC have made it a public health problem that seriously threatens women's health. Therefore, effective and cost-effective early detection, early diagnosis and individualized treatment of BC remain urgent problems worldwide.
Previous studies have identified many reproductive and environmental factors related to BC [5][6][7]. Genetic factors also contribute substantially to BC risk. BRCA1 and BRCA2 are recognized BC susceptibility genes and are widely tested as predictors of BC risk [8][9][10]. Recently, with the development of high-throughput sequencing technology, noncoding RNAs have been extensively studied. lncRNAs, which are emerging BC biomarkers, participate in biological processes such as cell proliferation, the cell cycle, apoptosis, differentiation and the maintenance of pluripotency [11], and contribute to the occurrence and development of cancers including BC, liver cancer and lung cancer by promoting tumor proliferation, invasion and metastasis [12][13][14][15].
The lncRNA MIR2052HG, also known as FLJ39080 and LOC441355, is located on chromosome 8. James N et al. [16] found that the risk of BC recurrence in individuals with homozygous variant or heterozygous MIR2052HG genotypes was lower than that in wild-type homozygous individuals. In addition, MIR2052HG overexpression increases BC cell proliferation and promotes colony formation. Pharmacogenomic studies [17] have shown that, as a functional polymorphic gene, MIR2052HG might affect the risk of BC recurrence in women treated with aromatase inhibitors. Single nucleotide polymorphisms (SNPs) are variations at single nucleotide positions in genomic DNA sequences. According to genome-wide association studies (GWAS) [18], SNPs in lncRNAs are related to susceptibility to many diseases, and SNPs at key regulatory positions of lncRNAs may substantially disrupt their function. Wang L et al. [19] also found that the MIR2052HG SNP rs3802201 is closely related to BC recurrence. However, these studies were mainly pharmacogenomic investigations of the relationship between MIR2052HG and BC drug resistance and recurrence, and whether genetic variants of MIR2052HG are associated with BC susceptibility remains unclear. Therefore, in a Han population from Henan, this project screened SNPs in the lncRNA MIR2052HG that affect the occurrence of BC and studied the possible molecular mechanism in order to discover new risk markers for BC. The results might facilitate the early identification and diagnosis of BC in high-risk populations and thus support its early prevention.

Basic characteristics of study subjects
The basic characteristics of the 504 patients and 505 healthy controls in this case-control study are presented in

Association of MIR2052HG SNPs with BC susceptibility
The associations between the nine MIR2052HG SNPs and BC susceptibility under different genotype models are presented in Table 2.
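Genotype-model comparisons of this kind start by collapsing the three genotypes of a SNP into an exposed group and a reference group. As a minimal sketch (not the authors' analysis, which used covariate-adjusted logistic regression), the Python below shows how rs34841297 genotype counts could be collapsed under heterozygous, homozygous, dominant and recessive models, and how a crude, unadjusted odds ratio with a Woolf (log-scale) 95% CI follows from the resulting 2×2 table. The function names, genotype keys and counts are hypothetical and are not data from this study.

```python
import math


def collapse(counts, model):
    """Split genotype counts into (exposed, reference) for one genetic model.

    counts: dict keyed by the hypothetical labels "AA" (wild type),
    "A-" (heterozygous deletion) and "--" (homozygous deletion).
    """
    wt, het, hom = counts["AA"], counts["A-"], counts["--"]
    models = {
        "heterozygous": (het, wt),        # A- vs AA
        "homozygous": (hom, wt),          # -- vs AA
        "dominant": (het + hom, wt),      # A- + -- vs AA
        "recessive": (hom, wt + het),     # -- vs AA + A-
    }
    if model not in models:
        raise ValueError(f"unknown genetic model: {model}")
    return models[model]


def crude_odds_ratio(cases, controls, model, z=1.96):
    """Unadjusted OR with a Woolf (log-scale) confidence interval."""
    a, b = collapse(cases, model)     # exposed / reference cases
    c, d = collapse(controls, model)  # exposed / reference controls
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts, for illustration only.
cases = {"AA": 100, "A-": 200, "--": 200}
controls = {"AA": 150, "A-": 200, "--": 150}
for m in ("heterozygous", "homozygous", "dominant", "recessive"):
    print(m, [round(x, 2) for x in crude_odds_ratio(cases, controls, m)])
```

A crude OR ignores the covariates listed in the Methods, so it will generally differ from the adjusted ORs reported in Table 2.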

Stratified analysis of the association between MIR2052HG SNPs and breast cancer susceptibility
A stratified analysis was conducted to further explore the relationship between the nine MIR2052HG SNPs and BC susceptibility. As shown in Table 3, compared with the rs34841297 AA genotype, the A-+-- genotype of rs34841297 in patients age<50 years

False positive report probability (FPRP)
FPRP analysis [20] was used to evaluate the reliability of the significant associations between MIR2052HG SNPs and BC susceptibility. As presented in Supplementary Table 6, with an FPRP threshold of 0.5 and a prior probability of 0.25, the FPRP values of all positive results for SNPs rs2553716, rs269183, rs269198 and rs12546233 were below the threshold. These associations are therefore noteworthy and warrant further investigation and validation.
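FPRP is commonly computed with the Bayesian formula of Wacholder et al. (2004), FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the significance level, π the prior probability of a true association and 1 − β the statistical power. The Python below is a minimal sketch assuming that the observed P value is used as α, that SE(ln OR) is recovered from the reported 95% CI, and that power is computed against a hypothetical OR of 1.5; the function name and example numbers are illustrative, not values from Supplementary Table 6.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1


def fprp(p_value, ci_low, ci_high, prior, or_alt=1.5, ci_level=0.95):
    """False-positive report probability (Wacholder et al., 2004).

    p_value : observed two-sided P value, used as the significance level alpha
    ci_low  : lower confidence limit of the OR (used to recover SE(ln OR))
    ci_high : upper confidence limit of the OR
    prior   : prior probability that the association is real
    or_alt  : hypothetical OR against which statistical power is computed
    """
    z_ci = STD_NORMAL.inv_cdf(0.5 + ci_level / 2)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_ci)
    mu = abs(math.log(or_alt)) / se            # mean of Z under the alternative
    z_alpha = STD_NORMAL.inv_cdf(1 - p_value / 2)
    power = STD_NORMAL.cdf(mu - z_alpha) + STD_NORMAL.cdf(-mu - z_alpha)
    false_pos = p_value * (1 - prior)
    return false_pos / (false_pos + power * prior)


# Hypothetical result: 95% CI 1.20-1.90, P = 0.001, prior probability 0.25.
value = fprp(0.001, 1.20, 1.90, prior=0.25)
print(f"FPRP = {value:.3f}; noteworthy at threshold 0.5: {value < 0.5}")
```

Lowering the prior or raising the P value both push FPRP upward, which is why the result is reported together with the chosen prior and threshold.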

Haplotype analysis
Haplotype analysis was used to test the combined effect of MIR2052HG SNPs (Supplementary Table 7).

Multifactor dimensionality reduction
MDR software (multifactor dimensionality reduction 3.0.2) was used to analyze interactions between genes and reproductive factors. As shown in Table 4, an interaction was observed among homozygous deletion of the rs34841297 A allele (--), premenopausal status and no history of breastfeeding, and this interaction model was associated with a higher risk of BC (OR: 1.771, 95% CI: 1.367-2.941, P<0.001).
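At its core, MDR pools each combination of factor levels into a single "high-risk" or "low-risk" attribute by comparing the cell's case:control ratio with a threshold (by default the overall case:control ratio) and then scores the resulting rule, for example by balanced accuracy. The Python below is a minimal sketch of that labelling and scoring step only; it omits the exhaustive model search and cross-validation that the MDR 3.0.2 software performs, treats ties as high-risk by assumption, and its function names and counts are hypothetical.

```python
import math


def mdr_label_cells(cells, threshold=None):
    """Core MDR step: label each multifactor cell as high- or low-risk.

    cells: dict mapping a factor combination (tuple) to (n_cases, n_controls).
    threshold: case:control ratio cut-off; defaults to the overall ratio.
    """
    total_cases = sum(n_case for n_case, _ in cells.values())
    total_controls = sum(n_ctrl for _, n_ctrl in cells.values())
    t = threshold if threshold is not None else total_cases / total_controls
    labels = {}
    for combo, (n_case, n_ctrl) in cells.items():
        if n_case == 0 and n_ctrl == 0:
            continue  # empty cells stay unclassified
        ratio = n_case / n_ctrl if n_ctrl else math.inf
        labels[combo] = "high" if ratio >= t else "low"  # ties -> high (assumed)
    return labels


def balanced_accuracy(cells, labels):
    """Mean of sensitivity and specificity of the high/low-risk rule."""
    tp = sum(cells[k][0] for k, v in labels.items() if v == "high")
    fn = sum(cells[k][0] for k, v in labels.items() if v == "low")
    tn = sum(cells[k][1] for k, v in labels.items() if v == "low")
    fp = sum(cells[k][1] for k, v in labels.items() if v == "high")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Hypothetical counts for a three-factor model (genotype, menopause, breastfeeding).
cells = {
    ("--", "pre", "no"): (30, 10),
    ("AA", "post", "yes"): (10, 30),
    ("A-", "pre", "yes"): (20, 20),
}
labels = mdr_label_cells(cells)
print(labels, round(balanced_accuracy(cells, labels), 3))
```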

Real-time PCR results
The MIR2052HG expression levels in individuals with the rs34841297 --, A- and AA genotypes are shown in Figure 1A. The relative expression in individuals with the homozygous deletion genotype (1.68±1.37) was significantly higher than that in individuals with the heterozygous deletion genotype (0.94±0.95) (P=0.011) and the AA genotype (0.26±0.12) (P<0.001). In addition, the relative expression of MIR2052HG in individuals with a homozygous deletion of rs34841297 was significantly higher than that of individuals with the AA genotype (P=0.001).

Dual-luciferase reporter assays
A dual-luciferase reporter assay was performed in 293T cells to determine whether rs34841297 affects the interaction between MIR2052HG and miR-4456. As shown in Figure 1B, the relative luciferase activity of the rs34841297 W-NC group was significantly higher than that of the rs34841297 W-miR-4456 group (P<0.001), suggesting an interaction between the wild-type rs34841297 plasmid vector and miR-4456. Meanwhile, the relative luciferase activity of the rs34841297 W-miR-4456 group was lower than that of the rs34841297 MUT-miR-4456 group (P<0.001), indicating that the interaction between the MIR2052HG plasmid vector and miR-4456 was abolished by deletion of the rs34841297 A allele.

The effect of miR-4456 combined with MIR2052HG on cell proliferation, invasion and migration
The results presented in Figure 1D showed that the number of invading MDA-MB-231 cells in the miR-4456 low-expression group was lower than that in the NC group (P<0.001), and the number of invading MDA-MB-231 cells in the miR-4456 overexpression group was higher than that in the NC group (P=0.002). Finally, Transwell experiments were performed to explore the migration ability of BC cells with different miR-4456 expression levels. As presented in Figure 1E,

DISCUSSION
In the present study, six SNPs, rs3802201 (C>G), rs2553716 (A>C), rs4259395 (A>G), rs2588297 (G>T), rs10957736 (C>T) and rs12546233 (A>C), were associated with a reduced risk of BC. Among them, the association of the rs3802201 (C>G) variant with reduced BC risk is consistent with the study by Wang L et al. [19]. In contrast, the rs34841297 polymorphism was positively associated with BC incidence, suggesting that deletion of the rs34841297 A allele might increase BC risk. Furthermore, we analyzed the associations between the nine MIR2052HG SNPs and BC receptor status (ER, PR and Her-2). Estrogen receptor (ER) regulates the proliferation of normal breast epithelial cells and breast cancer cells [21,22]. Progesterone receptor (PR) is a member of the nuclear receptor superfamily of transcription factors and has been reported to promote functional recovery and reduce the volume of BC lesions [23]. Human epidermal growth factor receptor 2 (Her-2) is a prognostic marker for BC and is regarded as key to evaluating the efficacy of targeted drugs [16]. Our results indicated that rs3802201, rs2553716 and rs4259395 may exert a protective effect on BC by affecting PR status. The CT+CC genotype of rs269183 was related to Her-2 status and may affect BC prognosis. The MDR model showed that the combination of homozygous deletion of the rs34841297 A allele, premenopausal status and no history of breastfeeding was associated with a higher risk of BC.
Downregulation of MIR2052HG has been reported to reduce the growth of ERα-positive BC cells [24]. Genetic variations in MIR2052HG were associated with the breast cancer-free interval in the MA.27 trial (ClinicalTrials.gov number NCT00066573), and the variant SNPs were associated with increased MIR2052HG expression owing to increased ERα binding to estrogen response elements (EREs) [17,25]. Moreover, SNPs have been found to mediate cancer development by affecting the secondary structure and expression of lncRNAs and the biological effects of lncRNA-miRNA interactions [26][27][28]. Here, we used qRT-PCR to measure MIR2052HG expression in individuals with different rs34841297 genotypes, and a dual-luciferase reporter assay to determine whether the rs34841297 A/- deletion affects the binding of MIR2052HG to miR-4456. MIR2052HG interacted with miR-4456 mimics when rs34841297 carried the wild-type A allele, and this interaction disappeared when the A allele was deleted, consistent with the predictions of the LncRNASNP2 and DINAN websites. This result suggests that the rs34841297 polymorphism might affect the interaction between MIR2052HG and miR-4456.
MiRNAs are small noncoding RNAs approximately 22 nucleotides in length that perform posttranscriptional regulatory functions by binding to specific sites on target transcripts [29]. miR-4456 is a miRNA located on chromosome 5 that putatively influences oxytocin signaling [30]. To our knowledge, this study is the first to explore the association between genetic variants in the lncRNA MIR2052HG and BC susceptibility. This study has several strengths. First, all patients were newly diagnosed and the controls were selected by frequency matching, which might reduce selection bias. Second, SNPscan high-throughput technology was used for SNP genotyping, which is more accurate and reliable than traditional polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) genotyping. Third, 10% of randomly selected samples were genotyped for all SNPs by sequencing for verification. Additionally, all cell function experiments were repeated more than three times, which improved the reliability of the results. Nevertheless, this study has some limitations. All subjects were from the Chinese Han population, so further studies in other populations are needed to verify our results. In addition, the mechanism by which MIR2052HG SNPs modulate BC needs to be further explored in vivo.

CONCLUSIONS
In conclusion, this study suggests an association between MIR2052HG polymorphisms and the occurrence of BC. The MIR2052HG rs34841297 A/- variant may affect the binding of miR-4456 to MIR2052HG and thereby alter the proliferation, invasion and migration of BC cells by regulating miR-4456 expression, which provides baseline information for screening populations at high risk of BC and formulating individualized preventive measures.

Table 2). Combined with the Δ Energy, correlation with BC, and site prediction intersection results, miR-4456 was selected for further functional research. All SNPs were genotyped with the SNPscan™ multiple SNP typing kit.

Quantitative real-time PCR (qRT-PCR)
Plasma RNA was extracted with TRIzol reagent from 72 randomly selected healthy controls; after removal of genomic DNA, the RNA was reverse transcribed into cDNA. MIR2052HG expression in individuals with different rs34841297 genotypes was measured by SYBR Green qRT-PCR, with GAPDH as the endogenous control. All samples were analyzed in triplicate, and relative expression was calculated using the 2^-ΔCt method. The primer sequences used in this study are listed in Supplementary Table 3.
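In the 2^-ΔCt method, ΔCt is the Ct of the target (MIR2052HG) minus the Ct of the endogenous control (GAPDH) for the same sample, and relative expression is 2^(-ΔCt). The Python below is a minimal sketch assuming that technical triplicates are averaged before ΔCt is taken and that genotype groups are then summarized as mean ± SD; the function names and Ct values are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(target_ct, gapdh_ct):
    """2^-ΔCt for one sample, with ΔCt = mean Ct(target) - mean Ct(GAPDH)."""
    delta_ct = mean(target_ct) - mean(gapdh_ct)
    return 2 ** (-delta_ct)


def summarize_by_genotype(samples):
    """Mean and SD of 2^-ΔCt per genotype.

    samples: list of (genotype, target triplicate Cts, GAPDH triplicate Cts).
    """
    groups = {}
    for genotype, target_ct, gapdh_ct in samples:
        groups.setdefault(genotype, []).append(relative_expression(target_ct, gapdh_ct))
    return {g: (mean(v), stdev(v) if len(v) > 1 else float("nan"))
            for g, v in groups.items()}


# Hypothetical Ct triplicates for four individuals.
samples = [
    ("AA", [30, 30, 30], [25, 25, 25]),
    ("AA", [29, 29, 29], [25, 25, 25]),
    ("--", [27, 27, 27], [25, 25, 25]),
    ("--", [26, 26, 26], [25, 25, 25]),
]
print(summarize_by_genotype(samples))
```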

Dual-luciferase reporter assay
According to LncRNASNP2, the wild-type allele of MIR2052HG rs34841297 was predicted to carry a binding site for miR-4456, and a dual-luciferase assay was used to verify the association between rs34841297 and miR-4456. After wild-type and mutant MIR2052HG rs34841297 pmirGLO plasmids were constructed, HEK 293T cells were cotransfected with each plasmid and miR-4456 mimics or negative control (NC) mimics using the riboFECT™ CP kit. Firefly and Renilla luciferase activities in each group were measured 72 hours after transfection, and relative luciferase activity was calculated as the firefly/Renilla ratio.
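Relative luciferase activity is the firefly signal divided by the Renilla signal from the same well, which corrects for differences in transfection efficiency. The Python below is a minimal sketch that additionally rescales each ratio so that the mean of the NC-mimic wells equals 1; that normalization step, the function name and the readings are assumptions for illustration, not details stated in the Methods.

```python
from statistics import mean


def relative_luciferase(firefly, renilla, nc_firefly, nc_renilla):
    """Firefly/Renilla ratio per well, scaled so the NC-mimic mean equals 1."""
    nc_mean = mean(f / r for f, r in zip(nc_firefly, nc_renilla))
    return [(f / r) / nc_mean for f, r in zip(firefly, renilla)]


# Hypothetical raw readings for a wild-type construct.
nc_ff, nc_rl = [200, 220, 180], [100, 110, 90]    # W + NC mimics
mir_ff, mir_rl = [100, 90], [100, 90]             # W + miR-4456 mimics
print(relative_luciferase(mir_ff, mir_rl, nc_ff, nc_rl))
```

A drop below 1 for the wild-type construct with miR-4456 mimics, with no comparable drop for the mutant construct, is the pattern interpreted as allele-specific binding.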

Cell function experiments
The relative expression of MIR2052HG and miR-4456 in MDA-MB-231 and MCF10A cells was first determined by qRT-PCR and calculated using the 2^-ΔΔCt method. The MIR2052HG primer sequences are listed in Supplementary Table 3, and the sequences of the miR-4456 and U6 internal reference primers are shown in Supplementary Table 8.
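The 2^-ΔΔCt method extends 2^-ΔCt by subtracting the ΔCt of a calibrator sample, so the result is a fold change relative to that calibrator. The Python below is a minimal sketch assuming U6 as the reference gene for miR-4456 and MCF10A as the calibrator for MDA-MB-231; the choice of calibrator, the function name and the Ct values are illustrative assumptions.

```python
from statistics import mean


def fold_change_ddct(sample_target, sample_ref, cal_target, cal_ref):
    """2^-ΔΔCt fold change of a sample relative to a calibrator.

    ΔCt  = mean Ct(target) - mean Ct(reference gene)
    ΔΔCt = ΔCt(sample) - ΔCt(calibrator)
    """
    d_sample = mean(sample_target) - mean(sample_ref)
    d_cal = mean(cal_target) - mean(cal_ref)
    return 2 ** -(d_sample - d_cal)


# Hypothetical Ct triplicates: miR-4456 normalised to U6, MDA-MB-231 vs MCF10A.
fc = fold_change_ddct([28.0, 28.1, 27.9], [20.0, 20.0, 20.0],
                      [30.0, 30.1, 29.9], [20.0, 20.0, 20.0])
print(f"fold change = {fc:.2f}")
```

Here the sample's ΔCt is 2 cycles lower than the calibrator's, which corresponds to roughly a 4-fold higher level under the usual assumption of about 100% amplification efficiency.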

Statistical analysis
Unconditional logistic regression analysis was used to identify MIR2052HG SNPs associated with BC, with adjustment for age, menarche age, menopausal status, number of pregnancies, number of abortions, breastfeeding history and family history. SHEsis online software was used for haplotype analysis of MIR2052HG.
Multifactor dimensionality reduction (MDR) was used to analyze gene-environment interactions. Independent t tests were used to compare the relative expression of the lncRNA MIR2052HG among rs34841297 genotypes, and false-positive report probability (FPRP) analysis [20] with a preset cut-off value was performed to assess the reliability and accuracy of the positive results. The OD values from the CCK-8 assay in the miR-4456 low-expression and high-expression groups were each compared with those of the NC group using t tests. Independent t tests were also used to compare the counts of stained cells between the two groups in the Transwell experiments, and thus the differences in invasion and migration capabilities. Data and cell count analyses were performed using SPSS 21.0 (t test, χ2 test and unconditional logistic regression), ImageJ (scratch healing area and Transwell migration and invasion cell counts) and GraphPad Prism 7.0 (plotting of results). A two-sided P<0.05 was considered statistically significant.

Informed consent
Informed consent was obtained from all individual participants included in the study.

Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Ethical responsibilities of authors
The manuscript has not been submitted to more than one journal for simultaneous consideration. The manuscript has not been published previously (partly or in full).

Supplementary Tables
Supplementary Table 1