Molecular characterization of matrix metalloproteinase gene family across primates

Deregulation of matrix metalloproteinases (MMPs) contributes considerably to cancers, psychiatric disorders, macular degeneration and bone diseases. The use of humans in the development of MMPs as prognostic biomarkers and therapeutic targets is complicated by many factors, while primate models can be useful alternatives for this purpose. Here, we performed genome-enabled identification of putative MMPs across primate species, and comprehensively investigated the genes. Phylogenetic topology of the MMP family showed each type formulates a distinct clade, and was further clustered to classes, largely agreeing with classification based on biochemical properties and domain organization. Across primates, the excess of candidate sites of positive selection was detected for MMP-19, in addition to 1-3 sites in MMP-8, MMP-10 and MMP-26. MMP-26 showed Ka/Ks value above 1 between human and chimpanzee copies. We observed two copies of MMP-19 in the old-world monkey genomes, suggesting gene duplication at the early stage of or prior to the emergence of the lineage. Furin-activatable MMPs demonstrate the most variable properties regarding Domain organization and gene structure. During human aging, MMP-11 showed gradually decreased expression in testis, so as MMP-2, MMP-14, MMP15 and MMP-28 in ovary, while MMP-7 and MMP-21 showed elevated expression, implying their distinct roles in different reproductive organs. Co-expression clusters were formed among human MMPs both within and across classes, and expression correlation was observed in MMP genes across primates. Our results illuminate the utilization of MMPs for the discovery of prognostic biomarkers and therapeutic targets for aging-related diseases and carry new messages on MMP classification.


INTRODUCTION
Matrix metalloproteinases (MMPs), also called matrixins, comprise a family of zinc-dependent endoproteinase. Collectively they are capable of degrading all extracellular matrix components (ECM) as well as non-matrix proteins [1]. The proteolytic activities of MMPs influence many fundamental physiological events involving tissue remodeling, embryogenesis, morphogenesis, angiogenesis, wound healing, and apoptosis [2][3][4]. Under pathological conditions, deregulation of MMP activity causes a variety of pathological outcomes including matrix weakening, tissue destruction, and fibrosis [2,5].
Over recent years, aging-related diseases involved with abnormal regulation of MMPs have attracted the attentions of physicians, medical scientists and pharmaceutical developers. In Brunch's membrane, amounts of MMPs such as MMP-2 and MMP-9 were observed to increase and the degree of covalent modification was enhanced for these MMPs with age AGING [6,7]. The aging changes in the MMP system were shown to be aggressively advanced in age-related macular degeneration, resulting in a large accumulation of abnormal ECM material [8]. Overexpression of MMPs can cause Alzheimer's disease, a progressive neurologic disorder commonly affecting people over the age of 65 [9]. A recent study showed that the alteration of amyloid precursor proteolysis, a hallmark of Alzheimer's disease, was mediated by MMP-14 in human neuronal cells [10]. Parkinson's disease, which ordinarily begins in late life, has been reported to be associated with aberrant expression of MMP-3 and MMP-9 [11,12]. MMPs like MMP-2, MMP-9 and MMP-13 have been shown to participate in bone formation and development, and abnormal regulation of these genes contributes to the development of bone diseases [13]. Cancers, occurring mostly in middle age and above, have been linked to MMPs with evidence as they are an important factor promoting tumor progression [14]. This is especially true for tumors like pancreatic adenocarcinoma, prostate adenocarcinoma, bladder urothelial carcinoma and melanoma, which are commonly at risk above the age of 50 [15][16][17][18]. Collectively they reveal that MMPs widely participate in physiological and pathological processes, and thus are promising targets for diagnosis and drug development of aging-related diseases.
Twenty-four MMP genes are present in the human genome; among them, MMP-23A is a duplicated copy of MMP-23B and likely a pseudogene [19]. Based on biochemical properties, MMPs can be classified into collagenases (including MMP-1, MMP-8 and MMP-13, whose key feature is the ability to cleave interstitial collagens), stromelysins (MMP-3, MMP-10, and MMP-11), gelatinases (MMP-2 and MMP-9, which can digest gelatins), matrilysins (MMP-7 and MMP-26, which can digest ECM components and activate proMMPs), membrane-type MMPs ( Table 1) [20]. Among the unclassified MMPs, MMP-19 is used as a T-cell-derived autoantigen from patients with rheumatoid arthritis, but its exact function is not fully characterized [21]. MMP-20 can digest amelogenin [22]. The limitation of this classification is obvious, as multiple substrates have been revealed for several MMPs. For example, MMP-1, a collagenase, can also cleave tenascin and aggrecan. MMP-14, biochemically classified as a membrane-type MMP, serves as a collagenase, too [23]. Another weakness is that several (7) MMPs are still not classified. These have limited the accuracy of this classification method. An alternative classification was proposed based on domain organization, which categorizes MMPs into four groups, namely archetypal MMPs, matrilysins, gelatinases and furin-activatable MMPs (Table 1) [24]. This classification approach still holds weaknesses as domains defined in this context are not so common as conserved domains commonly referred to like Pfam domains. Therefore, careful investigation is required when allocating a new MMP gene into this classification [25].
In this article, we investigate primate MMPs with evolutionary analyses together with gene structure, genome organization and comparative gene expression analyses. We compare our results with the two stereotypical classification approaches, in the hope to reach consensus for classification approaches. We also identify MMPs of potential value for future studies.

Comprehensive identification of putative MMPs from 11 primate genomes
Primates arose 63-74 million years ago from small terrestrial mammals [26]. To make sure major lineages of primates are covered, 11 species were chosen for identification of putative MMPs in this manuscript ( Figure 1). In addition to the human genome, those of bonobo (Pan paniscus), chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo abelii) and gibbon (Nomascus leucogenys) were included to represent Hominoids. Olive baboon (Papio anubis), gelada (Theropithecus gelada) and golden snub-nosed monkey (Rhinopithecus roxellana) were sampled to represent old-world monkeys. Marmoset (Callithrix jacchus) and mouse lemur (Microcebus murinus) represented new-world monkey and prosimian, respectively. All the genome assemblies are chromosome-level, except that of marmoset which has only scaffold-level assembly available.
Through a combination of Basic Local Alignment Search Tool-Protein (BLASTp) and domain-search-based HMMER, a total of 261 putative MMPs were identified ( Figure 1). Looking at gene counting in genomes, 19

Classification of primate MMPs based on phylogenetic inference
By developing a phylogenetic tree, the peptides were classified into 23 distinct clades, with high bootstrap support at basal nodes ( Figure 2). One human MMP was categorized into each clade, confirming the reliability of the tree topology. Each of the 23 MMP genes was identified in each of the most genomes, suggesting conserved evolution of the MMP family in primates (Supplementary Table 1). In this manuscript, very limited number of genes were missing in certain clades. For example, MMP-3 was failed to identify in mouse lemur, and MMP-7 was not retrieved from the gorilla genome. This result indicates a limited number of MMPs are missing from certain primate genomes, but the possibility of genome assembly errors or annotation artifacts could not be ruled out.
Interestingly, two copies of MMP-19 were present in each of the old-world monkey genome. To resolve relationships of these genes, phylogenetic trees of the MMP-19 peptides alone were reconstructed, which resolved the two copies into two distinct clades with high bootstrap support ( Figure 3). In each clade, 1 copy of MMP-19 was represented for each of the three species, suggesting duplication of MMP-19 at the early stage of or before the emergence of old-world monkeys.
Based on biochemical properties, MMPs can be classified into four groups, namely archetypal MMPs, matrilysins, gelatinases, and furin-activatable MMPs (Table 1) [24]. Our phylogenetic topology shows that MMPs are clustered in four major groups (shown in brown circles in Figure 2). MMP-1s, MMP-8s and MMP-13s are proximate to each other, which biochemically serve as collagenases; MMP-3 and MMP-10, two clades as stromelysins, form an  and MMP-26s serve as matrilysins, and MMP-2s and MMP-9s serve as gelatinases, and they form a group, respectively. In summary, the phylogenetic topology of primate MMPs largely agrees with biochemistrybased classification, which demonstrates the power of phylogenetic analysis as a guide in biochemical characterization of a certain MMP gene.

Evolution of primate MMPs
To investigate the possibility of positive selection among primate MMPs, a maximum-likelihood-based estimation of codon substitution model was carried out, on the basis of codon-enabled sequence alignment.
Hypothesis of positive selection was tested against purifying selection with a likelihood ratio test (LRT).
Overall, primate genes of MMP-8, MMP-9, MMP-10, MMP-19 and MMP-26 showed positive selection, with the M8 model fitting the data significantly better than the M7 model (Table 2). Furthermore, predicted ω for each site was calculated as the weighted mean of ω categories, and candidate sites of positive selection were detected in MMP-8, MMP-10, MMP-19 and MMP-26.
The excess of candidate sites of positive selection is striking for MMP-19, with 12 positive selected sites detected ( Table 2).
To further investigate molecular evolution of human MMPs, evolutionary rates between human and chimpanzee orthologs were calculated. Results showed that Ka (nonsynonymous substitution), Ks (synonymous substitution) and Ka/Ks showed variation among orthologs of MMPs. Only MMP-26 showed Ka/Ks ratio above 1, implying selection of human MMP-26 gene (Table 3). MMP-17 showed the highest synonymous substitution rate (Ks = 0.0398), and MMP-3, MMP-11 and MMP-15 also showed Ks values above 0.025, suggesting relatively relaxed pressure which results in sequence divergence for these genes. Notably, orthologs of MMP-24 showed 0 value of Ka, which indicates that function of human MMP-24 is strictly restricted.

Conservation and divergence of domain organization and gene structure in primate MMPs
Like those of many other families, MMP peptides contain a conserved domain structure, which has been revealed in human and many other organisms [19,24,27]. By querying the databases of Pfam and SignalP, we identified a cascade of domains in each MMP peptide ( Figure 4; function information is described in Supplementary Table 2). All the MMPs contain a domain with Pfam ID PF00413, which is a catalytic metalloproteinase domain (Supplementary Table 2 Table 3). For example, for MMP-24, only the human and    Figure 5).

Genomic organization of primate MMPs
As gene sequence, domain organization and gene structure of the MMP genes show both conservation and divergence, we investigated the conservation of genomic organization for the primate genes. We first mapped the human MMP genes to chromosomes and discovered that the 23 human genes are located at 10 chromosomes. Interestingly, 9 MMP genes are located together as a single cluster in the 11th chromosome, of which most are archetypal MMPs (MMP-1, -3, -8, -10, -12, -13, -20, and -27), and one encodes matrilysin (MMP-7) ( Figure 6).
We then investigated co-synteny of the primate genomes and the MMPs in the co-synteny clusters. We compared the human genome with the genomes of three other primates, chimpanzee, a Hominoid, olive baboon, an old-world monkey, and mouse lemur, a Prosimian species. The genome of marmoset, the only new-world monkey representative, was not assembled to the chromosome level, and thus was excluded from the co-synteny analysis [28]. There was high co-synteny between the human and chimpanzee genomes, and between chimpanzee and olive baboon genomes. There was moderate co-synteny between the olive baboon and mouse lemur genomes. Twenty-one of 23 human genes are located in 12 co-synteny clusters. A cluster in the human 16th chromosome, where MMP-2 and -15 are located, is not shared between chimpanzee and olive baboon. Only 16 MMPs are within the 8 co-synteny clusters between olive baboon and mouse lemur. The co-synteny cluster in the human 11th chromosome, where the 9-gene cluster is located, is conserved across all the four primate species (Figure 7). Together, these results demonstrate that both conservation and divergence of MMPs are attributable to the conserved genome organization over primates and changes of co-synteny as the genetic relationship gets further.

Expression of MMP genes in human and across primates
As MMPs are clustered into different groups, little is known about the relatedness among the MMPs on the gene expression level. We extracted MMP gene expression data of 11 human tissues from Human BodyMap and performed co-expression analysis among the genes. Gene expression data show that, although genes like MMP-2 and MMP-14 show ubiquitous high expression, many genes have tissuespecific expression patterns, and MMPs are expressed in all tissues except skeletal muscles, in which only limited genes like MMP-2 and MMP-14 are expressed ( Figure 8A).
MMPs play important roles in human aging [29]. As effects of aging are particularly evident on both male and female reproductive system [30,31], we further investigated longitudinal expression of the MMP genes in male and female reproductive organs, testis and ovary, respectively, by using GTEx data [32]. In testis, the male sexual organ, MMP-2, MMP-11, MMP-14, MMP-15, MMP-17 and MMP-28 are highly or moderately expressed across all ages from 20-79 ( Figure 8B). MMP-11 showed gradually decreased expression during human aging, and all the other moderately or highly expressed genes showed lower expression at the ages of 20-29 than the older ages in testis ( Figure 8C). In ovary, the female sexual organ, MMP-2, MMP-11, MMP-14, MMP-17 and MMP-28 showed universally moderate or high expression across all ages, the same as those in testis; in addition, MMP-19 and MMP-23B were highly expressed, too, which is different to those in testis ( Figure 8C). Two lowly-expressed genes, MMP-7 and MMP-21, showed elevated expression during aging, and MMP-2, MMP-14, MMP-15 and MMP-28 showed the reversed trend in ovary ( Figure 8D).
Co-expression analysis shows that all the three genes encoding collagenases (MMP-1, MMP-8 and MMP-13) are co-expressed ( Figure 9A). In addition, they formed a co-expression cluster with MMP-10 and MMP-19. Interestingly, the expression of some genes is correlated even they belong to different groups based on stereotypical classifications. For example, a gene cluster formed for five furin-activatable MMP genes, MMP-11, MMP-17, MMP-21, MMP-23A and MMP-23B, although they belong to different groups according to biochemistry-based classification. In addition, groups were formed for genes in different Expression of MMP-2 and MMP-8 showed the most conserved, as they showed co-expression between human and all the four non-human primate species ( Figure 9B).

DISCUSSION
Non-human primates are essential models for modern medical and biochemical studies, because of their genetic, physiological and psychological similarity to humans. Even mouse lemur, which is placed at the basal node of primate taxonomy and roughly half the genetic distance between mice and humans, has been shown as a suitable model organism for drug development and genetics studies [33]. Investigation of gene evolution among primates can shed light on biochemical role, physiological and pathological  functions of a certain gene, gene family, signaling network or metabolic pathway, and can offer insights in the discovery of prognostic biomarkers and therapeutic targets, which have been demonstrated in numerous articles (for example, [34][35][36][37][38][39]). In this article, we provide an example of how this can be utilized in the analysis of the MMP family enabled by the high-quality genomes across primates. MMPs comprise a family of zinc-dependent endoproteinase, responsible for degrading extracellular matrix components [1]. Deregulation of the MMPs has been implicated in many aging-related diseases, such as Alzheimer's disease, Parkinson's disease, age-related macular degeneration, bone diseases and cancers [6,9,[11][12][13][15][16][17][18]. Although biochemical properties of MMPs has been intensively studied, accumulated evidences have shown that our understanding of the MMPs is far from absolute, and some discoveries even have conflicted conclusions with previous findings [23,40]. In this article, we comprehensively investigate the MMP family across the primates from the perspectives of phylogeny, sequence divergence, domain organization, gene structure, genome organization and gene expression. To our knowledge, this is the first comprehensive investigation of MMPs across primates.
We identified signal peptides at the N-termini and Pfam domains at the C-termini in most MMP peptides. Very limited peptides have conserved domains failing to identify, most of which are located at N-or C-termini (Supplementary Table 2). One possible reason is that they represent bona fide loss of the conserved domains during the evolution of certain primate species. But there is a possibility of incomplete genome assembly and genome annotation artifact which can result in incomplete sequence prediction of these MMPs. This is very common in eukaryotic genome assembly and annotation despite rapid advancement of genome sequencing technology [41,42].
Our results showed that furin-activatable MMPs are the most variable among different classes of MMPs, including sequence divergence inferred by phylogeny, domain organization and gene structure. There are four types of MMPs within this class, and biochemical properties vary considerably among them [1,19,43]. Topology of MMP phylogeny shows that sequence divergence among furin-activatable MMPs is considerably large compared to others. Among the furin-activatable MMPs, MMP-19 showed positive selection in primates with the excess of candidate sites of positive selection. Biochemical and physiological roles of MMP-19 have not been characterized, although MMP-19 is used as a T-cell-derived autoantigen from patients with rheumatoid arthritis [21]. Our results indicate that MMP-19 may have important roles, highlighting its value for further studies. Furthermore, MMP-11, MMP-15 and MMP-17 showed relatively high Ks values than the others.
Our results showed that MMP gene sequences are conserved across primate organisms, and gene organization and genome localization are less conserved as the taxonomic relationships are getting more genetically distant. Additionally, we revealed that MMPs of different types can form a single coexpression cluster. For example, MMP-11 and MMP-21 are secreted MMPs, MMP-17 is a GPI-anchored MMP-, and MMP-23 is a type-II transmembrane MMP- [19] (Figure 9A). Expression patterns of these genes are correlated to each other, suggesting their functional relationships. Another notable coexpression cluster is formed with MMP-1, MMP-8, MMP-10, MMP-13 and MMP-19, all of which are archetypal MMPs. Particularly, all the collagenases (MMP-1, MMP-8 and MMP-13) are co-expressed, which have been shown to play roles of collagenases in wounds and ulcers [44,45]. These results suggest functional conservation within archetypal MMPs.
Both gelatinases (MMP-2 and MMP-9) are secreted, zinc-dependent endopeptidases and are cell-surface transducers [46]. They are associated with several cancers and thus can be served as cancer biomarkers [46]. Our results showed that the genes are comprised by the longest peptide sequences. However, they showed different characteristics in expression. MMP-2 had high expression across all organs we investigated, while MMP-9 showed tissue-specific expression ( Figure 8A). MMP-2 showed conserved expression across all five primate species ( Figure 9B), while MMP-9 only shoed expression between human and chimpanzee. These results indicate functional divergence between these two gelatinase genes, which deserve further investigations. Life-history diversity are evident among primates. Human life histories have been described as "slow", as humans have delayed maturity and lower adult mortality compared to chimpanzee [47,48]. As MMPs AGING can mediate aging-associated remodeling of extracellular matrix (ECM) [29], we can expect divergence of evolutionary rates among MMPs. In this article, we report positive selection for MMP-26, and evident sequence divergence for MMP-3, MMP-11, MMP-15 and MMP-17. MMP-26 has been indicated to associate with a wide range of pathological process, such as breast cancer, glioma, wounds and ischemic stroke, but lack of evidence highlights necessity of further investigation of this gene [49][50][51][52]. We also showed decline of expression for MMP-11 in testis and the same trend for MMP-15 in ovary during human aging. However, expression of MMP-15 and MMP-17 was elevated during aging in testis, suggesting their positive regulatory roles in aging-associated ECM remodeling. These results indicate that certain MMPs such as MMP-11, MMP-15 and MMP-17 may play important roles in life-history variation among primates. Interestingly, several genes including MMP-2, MMP-14 and MMP-15 showed reversed trend expression across ages in testis and ovary; MMP-20 showed longitudinal trend of expression in testis, and MMP-28 only showed trend in ovary. These results suggest sexual difference of functions among these genes.
In addition to cancers, evidence has been reported on associations with psychiatric disorders for some MMPs. For example, MMP-14 can be mediate alteration of amyloid precursor proteolysis, a hallmark of Alzheimer's disease, in human neuronal cells [10]. Parkinson's disease has been associated with aberrant expression of MMP-3 and MMP-9 [11,12]. Our results showed that MMP genes such as MMP-15, MMP-16, MMP-17, MMP-24 and MMP-28 were expressed highly or moderately in brain, implying the values in investigation of more genes in association with psychiatric diseases.
In conclusion, we carried out genome-enabled identification and characterization of putative MMPs across 11 primate organisms from the perspectives of phylogeny, domain organization, gene structure, genome organization and gene expression, and we identified MMPs which need attentions in future studies (As summarized in Figure 10). Our results shed new light on the classification of MMP family, and our findings can illuminate the discovery of prognostic biomarkers and therapeutic targets for aging-related diseases. [29,47,48].

Identification of putative MMPs
Sequences of the human MMPs were retrieved from UCSC genome browser [56]. MMPs from non-human primate species were identified by a combination of BLASTp and HMMER searches. By using BLASTp, searches were performed using an E-value cutoff of 1e-10 with 24 human MMP peptide sequences as queries [57]. For HMMER, hmmsearch (HMMER v3.3 package) was used to search for Pfam domain PF00413.23 as a profile with default settings [25,58]. The searching results were combined and redundant sequences were removed. The identified genes were validated by searching the peptide sequences for the Hidden Markov Model profile of Pfam domain PF00413.23, the signature conserved domain of MMPs.

Alignment and phylogenetic inference
Multiple sequence alignments of peptides were performed using MUSCLE v3.8.1551 [59]. Maximum likelihood phylogenetic trees were reconstructed with PhyML v3.3.3 package, with the confidence level of each branch estimated by using 1000 bootstrap replications and WAG model of amino acid evolution [60]. The phylogenetic trees were visualized with Figtree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). The primate MMP genes identified in this article were classified and designated based on the phylogenetic topology and Fanjul-Fenandex et al. [24].

Evolution analysis of MMPs
Sequence divergence of each MMP clade was analyzed using codon substitution model implemented in CodeML under PAML 4.0 [61]. Two models, M7 and M8, of the distribution across amino acid sites were used, by which M7 assumes that ω is ß distributed along the sequence with a range limited by 0 and 1, while M8 assumes positive selection at some sites and the likelihood ratio test (LRT) of M8 against M7 provides a statistical test for positive selection. Sites of positive selection with P-value above 0.95 (ω>1) were retrieved. Nonsynonymous substitutions (Ka), synonymous substitutions (Ks) and Ka/Ks ratio between human and gorilla MMP orthologs were also calculated, which was implemented in codeML under PAML 4.0 [61].

Identification of domains and signal peptides
Conserved domains of the peptide sequences were identified by using hmmsearch under HMMER v3.3 package against Pfam-A database with default trusted cutoffs [25,58]. Signal peptides for the peptides were identified by querying the SignalP-5.0 server [62]. The location of domain and signal peptides were visualized by querying Gene Structure Display Server 2.0 [63].

Gene structure characterization
Peptide molecular weights were calculated by using Bachem's Peptide Calculator (https://www.bachem.com/ knowledge-center/peptide-calculator/). Gene structure information was retrieved from GFF3 files. Gene length was calculated from the start to the end point of each gene, and CDS length was calculated by deducting intron lengths from gene length. While MMP-23s are classified in furin-activatable MMPs, they demonstrate very different characteristics than other MMPs, and thus were investigated separately.

Genomic localization of MMP genes and co-synteny analysis
Localization of genes was retrieved from GFF3 files. The site of the start codon was considered as the gene localization of each gene. The length of each gene was calculated based on the genome sequences. The visualization of the genomics localization was performed by using MapChart 2.32 [64]. Co-synteny analysis was performed by using the MCScanX package [65].

Gene expression analysis
Data of Illumina's Human BodyMap 2.0 were downloaded from Expression Atlas at EBI (https://www.ebi.ac.uk/gxa/), and FPKM values of human MMPs were extracted. Longitudinal expression data for human testis and ovary were retrieved from GTEx Portal [32]. FPKM values of four non-human primates were extracted from non-human primate reference transcriptome resource (NHPRTR) [66]. AGING Spearman's correlation coefficient was calculated for each human gene pair and each MMP gene between primate species, and heatmaps were drawn using R package corrplot [67].

AUTHOR CONTRIBUTIONS
The authors contributed in the following ways: Yinglian Pan: data collection, data analysis, Yadan Fan, Yanda Lu: study design, study supervision; Siyuan Peng: data collection; Haixue Lin: study design; Qingchun Deng: drafting, technical support and final approval of the manuscript. All the authors read and approved the final manuscript.

ACKNOWLEDGMENTS
The data used for the human longitudinal expression analyses of testis and ovary described in this manuscript were obtained from the GTEx Portal on 03/30/2022.