Research Paper Volume 8, Issue 11 pp 2635-2654

Distinct patterns of simple sequence repeats and GC distribution in intragenic and intergenic regions of primate genomes

Wen-Hua Qi1,2, Chao-chao Yan1, Wu-Jiao Li1, Xue-Mei Jiang3, Guang-Zhou Li4, Xiu-Yue Zhang1, Ting-Zhang Hu2, Jing Li1, Bi-Song Yue1

  • 1 Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu 610064, China
  • 2 College of Life Science and Engineering, Chongqing Three Gorges University, Chongqing 404100, China
  • 3 College of Environmental and Chemistry Engineering, Chongqing Three Gorges University, Chongqing 404100, China
  • 4 College of Sport and Health, Chongqing Three Gorges University, Chongqing 404100, China
* Equal contribution

Received: June 8, 2016       Accepted: August 22, 2016       Published: September 16, 2016      

https://doi.org/10.18632/aging.101025

Abstract

As the first systematic examination of simple sequence repeats (SSRs) and guanine-cytosine (GC) distribution in intragenic and intergenic regions of ten primates, our study showed that SSRs and GC content were nonrandomly distributed in both intragenic and intergenic regions, suggesting that they have potential roles in transcriptional or translational regulation. Our results suggest that the majority of SSRs are located in non-coding regions, such as the introns, transposable elements (TEs), and intergenic regions. In these primates, trinucleotide perfect (P) SSRs were the most abundant repeat type in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs were the most abundant in the introns, 3'UTRs, TEs, and intergenic regions. GC-content varied greatly among the intragenic and intergenic regions, in the order 5'UTRs > CDSs > 3'UTRs > TEs > introns > intergenic regions, and high GC-content was frequently found in exon-rich regions. Our results also showed that, within the same intragenic or intergenic region, the distribution of GC-content was highly similar across the different primates. Tri- and hexanucleotide P-SSRs had the highest GC-content in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs had the lowest GC-content in all six genomic regions of these primates. The most frequent motifs of each repeat length varied markedly among the genomic regions.

Abbreviations

AT-content: adenine-thymine content; GC-content: guanine-cytosine content; MSDB: Microsatellite search and building database; SSRs: Simple sequence repeats; P-SSRs: Perfect SSRs; IP-SSRs: Interrupted perfect SSRs; C-SSRs: Compound SSRs; IC-SSRs: Interrupted compound SSRs; CX-SSRs: Complex SSRs; ICX-SSRs: Interrupted complex SSRs.
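
To make two of the terms above concrete, the following is a minimal Python sketch of how GC-content and perfect SSRs (uninterrupted tandem repeats of a 1-6 bp motif) could be computed for a single sequence. It is not the MSDB implementation used in the study. The function names and the minimum repeat counts in MIN_REPEATS are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp).
# These are illustrative assumptions, not the thresholds used in the study.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def gc_content(seq):
    """Return the fraction of G and C among the A/C/G/T bases in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Yield (start, motif, repeat_count) for uninterrupted tandem repeats."""
    seq = seq.upper()
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n - 1 exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. 'AA' repeats, which are already counted as 'A' repeats.
            if is_primitive(motif):
                yield match.start(), motif, len(match.group(0)) // k


if __name__ == "__main__":
    example = "TT" + "A" * 12 + "TT" + "CAG" * 5 + "TT"
    for start, motif, count in find_perfect_ssrs(example):
        print(start, motif, count, round(gc_content(motif * count), 3))
```

In this sketch the motif is reported in the phase where the regular expression first matches, so the same repeat may be labelled CAG or GCA depending on the flanking bases. A fuller pipeline would normalize motifs to a canonical form and handle the interrupted, compound, and complex SSR classes listed above.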