A Catalog of Coding Sequence Variations in Salivary Proteins’ Genes Occurring during Recent Human Evolution

Di Pietro, Lorena; Boroumand, Mozhgan; Lattanzi, Wanda; Manconi, Barbara; Salvati, Martina; Cabras, Tiziana; Olianas, Alessandra; Flore, Laura; Serrao, Simone; Calò, Carla M.; Francalacci, Paolo; Parolini, Ornella; Castagnola, Massimo

doi:10.3390/ijms241915010


by Lorena Di Pietro ^1,2,†, Mozhgan Boroumand ^3,†,‡, Wanda Lattanzi ^1,2,*, Barbara Manconi ⁴, Martina Salvati ¹, Tiziana Cabras ⁴, Alessandra Olianas ⁴, Laura Flore ⁴, Simone Serrao ⁵, Carla M. Calò ⁴, Paolo Francalacci ⁴, Ornella Parolini ^1,2 and Massimo Castagnola ³

¹ Dipartimento Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, 00168 Rome, Italy

² Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy

³ Laboratorio di Proteomica, Centro Europeo di Ricerca sul Cervello, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy

⁴ Dipartimento di Scienze della Vita e Dell’ambiente, Università di Cagliari, 09042 Monserrato, Italy

⁵ Department of Medicine and Surgery, Proteomics and Metabolomics Unit, University of Milano-Bicocca, 20854 Vedano al Lambro, Italy

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

^‡ Current address: National Institute on Aging, NIH, Baltimore, MD 21224, USA.

Int. J. Mol. Sci. 2023, 24(19), 15010; https://doi.org/10.3390/ijms241915010

Submission received: 17 July 2023 / Revised: 4 October 2023 / Accepted: 6 October 2023 / Published: 9 October 2023

(This article belongs to the Special Issue Recent Advances in Salivary Gland and Their Function 2.0)


Abstract:

Saliva houses over 2000 proteins and peptides with poorly clarified functions, including proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases. Their genes are poorly conserved across related species, reflecting an evolutionary adaptation. We searched for the nucleotide substitutions fixed in these salivary proteins’ gene loci in modern humans compared with ancient hominins. We mapped 3472 sequence variants/nucleotide substitutions in coding, noncoding, and 5′-3′ untranslated regions. Although most of the detected variations were within noncoding regions, the frequency of coding variations was far higher than the general rate found throughout the genome. Among the various missense substitutions, specific substitutions detected in PRB1 and PRB2 genes were responsible for the introduction/abrogation of consensus sequences recognized by convertase enzymes that cleave the protein precursors. Overall, these changes that occurred during recent human evolution might have generated novel functional features and/or different expression ratios among the various components of the salivary proteome. This may have influenced the homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. However, fixed nucleotide changes in modern humans represented only 7.3% of all the substitutions reported in this study, and no signs of evolutionary pressure or adaptive introgression from archaic hominins were found on the tested genes.

Keywords: salivary proteins; nucleotide substitutions; evolution

1. Introduction

Saliva is a multifaceted bodily fluid that contains enzymes (amylases, lysozymes, and lipases), proteins, peptides and glycoproteins, lipids (hormones such as testosterone and progesterone), and proteases, along with a high concentration of inorganic ions [1]. To date, more than 2000 proteins and peptides have been identified in saliva [2]. They are mainly involved in the homeostasis of the oral cavity, the digestion process, and the innate immune response [3]. Ninety percent of the salivary proteins and peptides derive from the secretion of the three major salivary glands (parotid, submandibular, and sublingual glands), while the remaining 10% are secreted by minor salivary glands or derive from exfoliated cells and leucocytes present in the gingival–crevicular fluid [4] from plasma exudate, plus some contributions from the oral microbial flora. During their transit in the secretory pathway, salivary proteins undergo a series of post-translational modifications (PTMs), including phosphorylation, N-terminal acetylation, glycosylation, sulfation, and proteolytic cleavages. Further changes in proteins and peptides also occur after secretion in the oral cavity, through the action of exogenous (microflora) and endogenous enzymes [1].

The main contribution to the composition of the human salivary proteome derives from a few protein families. In particular, proline-rich proteins (PRPs), statherin (STATH), P-B peptide, histatins (HTN), cystatins (CST), and amylases (AMY) altogether represent more than 95% (w/w) of all proteins found in saliva to date [5]. PRPs represent the major fraction of the salivary proteome in Homo sapiens (nearly 70% of the total protein content; >50% in weight) and include basic (bPRPs), acidic (aPRPs), and basic glycosylated (gPRPs) PRPs. They share a high abundance of proline, glycine, and glutamine residues, which represent 70–80% of the entire amino acid sequence [6,7]. bPRPs include eleven parent peptides/proteins and more than six parent glycosylated proteins (gPRPs), plus several proteoforms derived from gene polymorphisms and PTMs [8,9,10] (Figure 1). PRPs are encoded by genes belonging to the PRP multigene family, located within the PRB locus mapping on 12p13.2. The locus includes six tandemly linked genes: PRB2–PRB1–PRB4–PRH2–PRB3–PRH1, in the 5′-to-3′ direction, and is highly polymorphic as it contains internally repetitive DNA sequences, leading to frequent recombinational events [11,12]. At least four alleles (S, small; M, medium; L, large; and VL, very large) are present in the Western population of Homo sapiens at PRB1 and PRB3 loci and three (S, M, L) at PRB2 and PRB4 loci [8] (Figure 1). Except for the protein encoded by the PRB3 locus that gives rise to gPRPs, all the bPRP pro-proteins are cleaved completely by pro-protein convertases, generating smaller peptides/proteins, before granule maturation [9] (Figure 1). aPRPs are expressed in two loci, PRH1 and PRH2, mapping on chromosome 12p13. 
Single amino acid substitution and repeat insertion generate three PRH1 alleles, encoding the parotid isoelectric-focusing slow isoform (PIF-s) and the parotid acidic protein (Pa), both 150 residues long, and the double band isoform slow (Db-s), 171 amino acid residues long [10] (Figure 2A). A single nucleotide substitution generates two PRH2 alleles, encoding the PRP-1 and PRP2 isoforms [11] (Figure 2A). A pro-protein convertase partially cleaves PRP-1, PRP2 and PIF-s into three N-terminal fragments of 106 residues, called PRP3, PRP4, PIF-f (PRP3 type), and a common C-terminal fragment of 44 amino acids, called P-C peptide. Db-s is cleaved at position 127, generating two peptides: Db-f (f stands for fast) and the P-C peptide (same as above) [12] (Figure 2A). The Pa isoform, which does not carry the convertase sequence, generates a dimeric form through a disulfide bond [13] (Figure 2A). STATH is encoded by the STATH gene located on chromosome 4q13-19 [13,14]. Several STATH proteoforms are detectable in saliva due to phosphorylation, cyclization by transglutaminase 2, and proteolysis by amino-/carboxy-peptidases and convertase action [13,15,16]. P-B is a proline-rich small peptide encoded by the SMR3B gene, mapping on chromosome 4q13.3 [17], near the STATH gene, possibly sharing epigenetic control and/or the DNA replication timeframe [13,15,16]. HTN are small cationic histidine-rich peptides encoded by the HTN1 and HTN3 genes on chromosome 4q13. Despite their high sequence homology, HTN1 and HTN3 have different maturation pathways and biological activities [17,18,19].

CST are inhibitors of cysteine proteases involved in the innate immune response [20]. CSTA and CSTB are encoded by CSTA and CSTB genes, respectively, whereas CST-SN, CST-SA, CST-C, CST-S, and CST-D are encoded by CST1-CST5 genes (Figure 2B). Several PTMs occur in CST proteins, including N-acetylation, proteolytic cleavages, phosphorylation, and M-, W-, and C-oxidation, causing different final protein structures detectable in human saliva [21]. Also, two isoforms generated by single amino acid substitutions of cystatin D and cystatin SN are present in saliva [21] (Figure 2B).

The amylase alpha 1A (AMY1A) gene, on chromosome 1p21.1, is responsible for the expression of AMY, which accounts for about 20% of the weight of salivary proteins and is the most abundant protein of the whole saliva of Homo sapiens.

Several comparative studies have shown that the human salivary proteome differs from that of other species owing to genetic divergences that are possibly due to environmental factors, including diet and pathogens [22,23,24,25]. A recent study reported the results obtained from the comparison of the salivary proteomes of Homo sapiens sapiens (modern humans) with our closest extant evolutionary relatives, chimpanzees and gorillas [26]. The authors demonstrated that the salivary protein composition is unique to each species despite their close sequence homology, which likely reflects an evolutionary adaptation [26]. Despite this initial observation, the evolution of human loci encoding salivary proteins has not been studied to date. Nowadays, the increasing amount of genomic data obtained through sequencing of preserved skeletal remains of extinct hominins, such as Homo neanderthalensis (Neanderthals) and Homo Denisova (Denisovans), can reveal the extent of diversity that has emerged at the genomic level during more recent human evolution.

In this study, we aimed to identify the sequence changes that have been fixed during recent human evolution in the gene loci encoding the most abundant salivary proteins (namely, PRPs, statherin, P-B peptide, histatins, cystatins, and amylases) to gather possible functional indications regarding their evolutionary path and their contribution to oral homeostasis and salivary functions. Eating habits may indeed be mutually linked with salivary proteins’ biology since these proteins are implicated in the modulation of the microbiome of the oral cavity and the entire gastrointestinal tract [26]. To achieve this, we interrogated the publicly available sequence databases of Neanderthals and Denisovans and compared them with modern human genome sequence data. This allowed us to identify several nucleotide substitutions in the loci coding for the most relevant human salivary protein families.

2. Results

By comparing the genomic sequences of salivary gene loci in modern humans with those of Altai Neanderthals, Chagyrskaya Neanderthals, Vindija Neanderthals, and Denisovans, we identified an overall number of 3472 sequence variants/nucleotide substitutions across the 17 tested salivary genes in coding, noncoding, 5′-3′ untranslated (UTRs), and regulatory regions. The nucleotide substitutions observed in the 17 tested salivary genes are summarized in Figure 3. Of the 3472 changed nucleotides, only 428 were in coding regions, and 121 were annotated as synonymous (Figure 3). The remaining 307 nucleotide variations were nonsynonymous (Figure 3); variations of this kind are known to be subjected to a higher evolutionary pressure and are frequently exposed to natural selection [27,28]. We have, therefore, attempted a functional interpretation of nonsynonymous variations, which is inherently speculative and deserves future functional studies. The potential impact of nonsynonymous variants on salivary proteins’ function of Neanderthals and Denisovans was predicted by a SIFT (sorting intolerant from tolerant) analysis (see Table 1, Table 2 and Table 3), which predicts amino acid substitutions that may exert a deleterious effect. The reference single nucleotide polymorphism (SNP) number (rs) and the corresponding frequencies of the 107 missense changes in coding regions are also reported in Table 1, Table 2 and Table 3. Of note, even though the nucleotide changes located in noncoding regions should not affect the primary structure of the encoded protein, they could affect regulatory elements that may modify the splicing and/or the binding of epigenetic modulators and/or chromatin folding/looping. The variants fixed at 100% in modern humans compared to ancient hominins are highlighted in light orange in Table 1, Table 2 and Table 3 and Tables S1–S17.
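The overall figures quoted above can be cross-checked against the per-gene counts given in Sections 2.1 to 2.6. The following is only a minimal sketch of that arithmetic, not part of the study's pipeline: the names `COUNTS` and `column_totals` are illustrative, the synonymous count for PRB2 is derived as 80 − 64 rather than quoted, the CST2 nonsynonymous count is taken as 18 (24 coding minus 6 synonymous), and STATH is assumed to have no annotated SNP because none is mentioned in the text.

```python
# Per-gene counts transcribed from Sections 2.1-2.6 of this article.
# Columns: (total substitutions, coding, synonymous, nonsynonymous,
#           nonsynonymous substitutions annotated as SNPs in modern humans)
COUNTS = {
    "PRB1":  (130, 55, 10, 45, 20),
    "PRB2":  (136, 80, 16, 64, 26),  # synonymous derived as 80 - 64
    "PRB3":  (163, 53, 14, 39, 12),
    "PRB4":  (129, 27, 4, 23, 7),
    "PRH2":  (163, 30, 7, 23, 4),
    "HTN1":  (188, 4, 1, 3, 1),
    "HTN3":  (175, 3, 1, 2, 1),
    "AMY1A": (212, 40, 11, 29, 1),
    "STATH": (159, 6, 2, 4, 0),      # assumption: no annotated SNP is mentioned
    "SMR3B": (187, 5, 2, 3, 1),
    "CST1":  (227, 29, 11, 18, 9),
    "CST2":  (167, 24, 6, 18, 10),   # nonsynonymous taken as 24 - 6 = 18
    "CST3":  (452, 17, 9, 8, 1),
    "CST4":  (263, 24, 11, 13, 7),
    "CST5":  (193, 16, 8, 8, 5),
    "CSTA":  (394, 6, 2, 4, 1),
    "CSTB":  (134, 9, 6, 3, 1),
}


def column_totals(counts):
    """Sum each column of the per-gene table across all genes."""
    return tuple(sum(column) for column in zip(*counts.values()))


total, coding, synonymous, nonsynonymous, annotated = column_totals(COUNTS)
coding_fraction = coding / total  # share of all substitutions that are coding

print(total, coding, synonymous, nonsynonymous, annotated)
print(f"coding fraction: {coding_fraction:.1%}")
```

Under these assumptions the column sums reproduce the 3472, 428, 121, 307, and 107 figures quoted in the text, and coding substitutions account for roughly 12% of all substitutions.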

In the following subsections, the results are detailed one locus at a time. Note that, given the extreme structural heterogeneity of the tested genes, with multiple alleles and different lengths, the nucleotide variations are indicated according to their genomic coordinates (see Section 4 for details).

2.1. Nucleotide Variations in the Gene Loci Encoding Basic Proline-Rich Proteins

2.1.1. PRB1 Gene

The genomic alignment allowed us to identify 130 nucleotide changes in the PRB1 gene in ancient hominins compared with modern humans (Table 1 and Table S1). Fifty-five of these were detected within coding exons and included ten synonymous and forty-five nonsynonymous nucleotide substitutions. Among the nonsynonymous nucleotide substitutions, 20 corresponded to SNPs annotated in modern humans (Table 1). SIFT prediction indicated that 46% of these missense variants have a significant effect on protein function based on sequence homology and the physical properties of the involved amino acids (Table 1). The T-C transition at position 11,506,774, which occurred in modern humans and corresponds to a Q in place of R₇₂ in the archaic II-2 isoform (Table 1 and Figure 4a), may have an impact on post-translational protein processing. Indeed, the modern human R₇₂ residue is part of the R₇₂SPR₇₅ consensus sequence recognized by the pro-protein convertase responsible for the cleavage between II-2 and P-E peptides. Therefore, we may hypothesize that in archaic species, the PRB-1-encoded protein was a fused peptide spanning 136 amino acids, which integrates the modern II-2 and P-E (Table 1 and Figure 4a). The sequences of the peptides and the resulting putative archaic protein primary structures (named PRB-1 salivary archaic fusion 1 peptide, PRB-1 SAF-1) are reported in Figure 4a. The remaining seventy-five nucleotide changes identified in the PRB1 locus fell within noncoding regions, namely fifty-four in introns, six in upstream regions, one in the 5′UTR, one in the 3′UTR, and thirteen in downstream regions (Table S1).

2.1.2. PRB2 Gene

One hundred and thirty-six nucleotide substitutions were detected in the PRB2 locus in ancient hominins compared with modern humans (Table 1 and Table S2). Thirty-seven of these were identified in introns, ten in upstream regions, one in the 3′UTR, and eight in downstream regions. The remaining eighty variations were found in coding regions, namely two in exon 1 (corresponding to the signal peptide), one in exon 2, and the remaining in exon 3 (Table 1 and Table S2). Of note, the modern human sequence reported in the UniProtKB database corresponded to the L allele coding for the common isoforms IB-8a Con1^- and P-H S₁, the first one with a P residue instead of an S at position 100, the second one with an S residue instead of an A at position 1 [8]. Of the 80 sequence variants found in coding exons, 64 were nonsynonymous, causing amino acid substitutions. SIFT prediction indicated that 19% of these missense variants have a significant effect on protein function based on sequence homology and the physical properties of the involved amino acids (Table 1). Twenty-six out of the sixty-four nonsynonymous substitutions were annotated as common variants (SNPs) in modern humans (Table 1). In particular, two changes occurring at 11,546,686 bp and 11,546,677 bp caused the substitution of the R₉₃ and R₉₆ with Q within the ancient IB-1 isoform. The two archaic residues were found in all four species (Table 1). This implied that the archaic hominins’ R₉₃SPR₉₆ consensus sequence, recognized by the pro-protein convertase, apparently lacked two key arginine residues, thus disabling the post-translational cleavage. Therefore, the ancient saliva composition should feature a protein deriving from the fusion of IB-1 and P-J peptides, spanning 157 amino acids (named the PRB-2 salivary archaic fusion 2 peptide, PRB-2 SAF-2 peptide, in Figure 4b).
Conversely, the presence of a C nucleotide at 11,546,314 bp in Neanderthals and Denisovans, instead of T in modern humans, led to the introduction of an R instead of the Q₅₉ (Q₂₁₇ in pro-protein) of the IB-8a Con1^- isoform. This archaic primary structure would then include an additional pro-protein convertase consensus sequence, R₅₉SAR₆₂, causing the cleavage of the IB-8a Con1^- protein into two smaller peptides. According to the usual removal of the C-terminal arginine residue observed for almost all the bPRPs, both peptides should be 61 amino acid residues long (Figure 4c). We named these putative archaic hominins’ PRB-2 variants the PRB-2 salivary archaic cleavage 1 peptide (PRB-2 SAC-1 peptide) and the PRB-2 salivary archaic cleavage 2 peptide (PRB-2 SAC-2 peptide); they are shown in Figure 4c. Of note, the sequence of the PRB-2 SAC-1 peptide exactly corresponds to the sequence of the modern human P-J peptide with an alanine (A₆₁) instead of a serine in the last amino acid residue. The sequence of the PRB-2 SAC-2 peptide exactly corresponds to the modern human P-F peptide with a serine (S₆₁) instead of an alanine in the last amino acid residue (Figure 4d and [9]). The variation at 11,546,395 bp indicated that in archaic hominins, the P₃₁ (P₁₈₉ of pro-protein) residue was replaced by a Q in the IB-8a Con1^-; this change probably has a deleterious effect on protein function, as predicted by SIFT analysis.
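The loss and gain of convertase consensus sequences described for PRB1 and PRB2 can be illustrated with a short Python sketch. It rests on assumptions that are not stated in the text: the two quoted consensus sequences (RSPR and RSAR) are treated as instances of a generic R-X-X-R pattern, the helper names are illustrative, and the peptides are hypothetical toy strings rather than real PRB1 or PRB2 sequences.

```python
import re

# Assumption: RSPR and RSAR are treated as instances of a generic R-X-X-R
# pattern. The lookahead lets overlapping matches be reported as well.
CONVERTASE_MOTIF = re.compile(r"(?=(R..R))")


def find_motifs(peptide):
    """Return the 1-based start positions of every R-X-X-R match."""
    return [match.start() + 1 for match in CONVERTASE_MOTIF.finditer(peptide)]


def substitute(peptide, position, new_residue):
    """Return a copy of the peptide with the 1-based position replaced."""
    index = position - 1
    return peptide[:index] + new_residue + peptide[index + 1:]


# Hypothetical toy peptides, NOT real salivary protein sequences.
toy_with_site = "PPGKPQGPPRSPRPPGKPQ"     # carries RSPR starting at residue 10
toy_without_site = "PPGKPQGPPQSARPPGKPQ"  # QSAR: no consensus sequence

# An R-to-Q change at the first arginine abolishes the site (fusion scenario).
print(find_motifs(toy_with_site))                       # [10]
print(find_motifs(substitute(toy_with_site, 10, "Q")))  # []

# A Q-to-R change creates a new site (additional cleavage scenario).
print(find_motifs(substitute(toy_without_site, 10, "R")))  # [10]
```

A pattern match only flags a candidate site; whether a convertase actually cleaves there depends on context that this sketch does not model.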

The protein name, the modifications with respect to modern humans, and the corresponding frequencies found in Neanderthals, Chagyrskayas, Vindijas and/or Denisovans are reported for each archaic protein. The positions of each substitution are also reported in the primary sequences (residues in bold characters). q: pyroglutamic acid; S: phosphorylated serine.

2.1.3. PRB3 Gene

We have identified 163 nucleotide variations in the PRB3 locus in ancient hominins compared with modern humans (Table 1 and Table S3). Of these, 53 were detected in coding regions and 110 in noncoding regions (71 within introns, 14 in upstream regions, 2 in the 3′UTR, and 23 in downstream regions; Table S3). The archaic sequences were compared with the allele Gl-2 (or PRP-3M) of modern humans. Fourteen variations identified in coding exons were synonymous, whereas thirty-nine changes were missense variants. Twelve out of the thirty-nine nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 1). PRP3 protein contains eight N-glycosylated Asn residues falling into the NXS/pS sequon; among the substitutions found in the PRB3 gene, only the one at position 11,420,728 falls within the consensus sequence (S₁₃₆F), and SIFT predicted it to be deleterious for protein function (Table 1). Overall, 37.5% of the substitutions were found to be deleterious on the protein function (Table 1). The noncoding variant found at position 11,420,458 could probably affect the splicing process of PRB3 transcripts in ancient hominins since it fell within the GU consensus site (splice donor site) at the 5′ end of intron 3 (Table S3).

2.1.4. PRB4 Gene

For the PRB4 locus, we detected 129 nucleotide substitutions in ancient hominins compared with modern humans (Table 1 and Table S4). Of these, 27 were found in coding exons, including 4 synonymous and 23 nonsynonymous (Table 1), and 102 in noncoding regions (Table S4). The archaic sequence was compared with the small allele of the modern human locus coding for P-D peptides and glycosylated protein A (PGA). The 23 missense variants were all found within coding regions for the glycosylated protein A, while none of the identified variations would affect the P-D variant (see Table 1 for details). These variations had no consequence on the consensus sequence of pro-protein convertase or on the sequence of the glycosylation sites. It is interesting to observe that all the reported archaic sequences code for the P-D P₃₂A variant. Overall, seven out of the twenty-three nonsynonymous substitutions in the PRB4 locus corresponded to annotated common variants in modern humans, and only 13% were found to be deleterious on the protein function (Table 1).

2.2. Nucleotide Variations in the Gene Locus Encoding the a-PRP

One hundred and sixty-three nucleotide substitutions have been annotated in the PRH2 gene locus in ancient hominines compared with modern humans (Table 2 and Table S5), of which thirty fell within coding exons, including seven synonymous and twenty-three nonsynonymous. Four of these latter corresponded to annotated common variants in modern humans (Table 2). Sixty-six nucleotide substitutions were identified in introns, seven in upstream regions, three in the 5′UTR, forty-nine in the 3′UTR, and eight in downstream regions (Table S5). The archaic DNA sequences reported in the sequence database used in this study (see Section 4 for details) corresponded to the PRP-1 protein of the PRH2 alleles, thus having a N₅₀ residue. The nucleotide variations reported in Table 1 generated two synonymous substitutions at D₆ and P₁₃₅.

2.3. Nucleotide Variations in the HTN Gene Loci

A total of 188 and 175 nucleotide substitutions were identified in the HTN1 and HTN3 genes, respectively (Table 2, Tables S6 and S7). The nucleotide substitutions reported in HTN1 are distributed as follows: 4 fell within coding exons, including 1 synonymous and 3 nonsynonymous, and 184 fell in noncoding regions, including 146 within introns, 6 in upstream regions, 3 in the 5′UTR, 9 in the 3′UTR, and 20 in downstream regions (Table 2 and Table S6). Regarding HTN3, 3 nucleotide changes were reported in coding exons (1 synonymous and 2 nonsynonymous), whereas 172 fell in noncoding regions (145 within introns, 9 in upstream regions, 3 in the 5′UTR, 5 in the 3′UTR, and 10 in downstream regions) (Table 2 and Table S7). One missense variant for HTN1 and one for HTN3 found in ancient hominins were also reported as SNPs in modern humans (Table 2).

2.4. Nucleotide Variations in the AMY1A Gene Locus

Two hundred and twelve nucleotide substitutions have been annotated in the AMY1A gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S8). Forty changes fell within coding exons, of which eleven were synonymous and twenty-nine were nonsynonymous. Only one of the nonsynonymous substitutions corresponded to an annotated common variant in modern humans (Table 2). One hundred forty-four nucleotide substitutions were identified in introns, four in upstream regions, nine in the 5′UTR, and fifteen in downstream regions (Table S8).

2.5. Nucleotide Variations in the STATH and P-B Gene Loci

One hundred fifty-nine nucleotide substitutions have been annotated in the STATH gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S9). Six changes fell within coding exons, of which two were synonymous and four were nonsynonymous (Table 2). One hundred fifty-three nucleotide substitutions were detected in introns and regulatory regions (Table S9).

One hundred eighty-seven nucleotide substitutions were detected in the SMR3B locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S10). Of these, 5 were found in coding exons (2 synonymous and 3 nonsynonymous), 155 were in introns, 3 in upstream regions, 3 in 5′UTRs, 10 in 3′UTR, and 11 in downstream regions (Table 2 and Table S10). One missense variant was reported as an SNP in modern humans (Table S10).

2.6. Nucleotide Variations in the CST Gene Loci

2.6.1. CST1 Gene

We have annotated 227 nucleotide substitutions in the CST1 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S11). Of these, 128 were found in introns, 19 in upstream regions, 7 in the 5′UTR, 12 in the 3′UTR, 32 in downstream regions (Table S11), and 29 in coding regions, including 11 synonymous and 18 missense variations (Table 3). The nucleotide variation at 23,731,494 bp caused the substitution of the Y₃(sp) with an H, affecting the third amino acid residue of the signal peptide. This should not impact the function of the protein, although it may have affected the speed of protein translation and/or the correct processing and trafficking. Four substitutions out of eighteen could have a negative impact on protein function, as predicted by SIFT. Overall, nine nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table S11).

2.6.2. CST2 Gene

We detected 167 nucleotide changes in the CST2 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S12). Of these, 103 were in introns, 15 in upstream regions, 8 in the 3′UTR, 17 in downstream noncoding regions (Table S12), and 24 in coding regions (Table 2). The latter included six synonymous and eighteen nonsynonymous variations, eight of which were predicted to have a deleterious effect on protein function (SIFT score < 0.05). Ten out of the eighteen nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 2). Interestingly, the nucleotide change at 23,804,691 bp fell into the canonical DNA-binding motif for the NR3C1 (nuclear receptor subfamily 3 group C member 1) transcription factor, as reported in the UCSC Genome Browser. This variation could most likely affect the affinity of this factor for the regulatory region and thus the expression of the CST2 gene.
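The SIFT cut-off quoted above (score < 0.05) and the resulting share of predicted deleterious variants can be expressed in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, no individual SIFT scores from the tables are reproduced, and the counts used are those stated in the text for CST1 (four of eighteen) and CST2 (eight of eighteen).

```python
SIFT_CUTOFF = 0.05  # scores below this value are called deleterious in the text


def is_deleterious(sift_score, cutoff=SIFT_CUTOFF):
    """Apply the SIFT threshold to a single score (0 = intolerant, 1 = tolerant)."""
    return sift_score < cutoff


def deleterious_percent(n_deleterious, n_nonsynonymous):
    """Percentage of nonsynonymous variants predicted to be deleterious."""
    return 100.0 * n_deleterious / n_nonsynonymous


# Counts stated in the text: (predicted deleterious, nonsynonymous).
reported = {"CST1": (4, 18), "CST2": (8, 18)}

for gene, (n_del, n_nonsyn) in reported.items():
    print(f"{gene}: {deleterious_percent(n_del, n_nonsyn):.1f}% predicted deleterious")
```

With these counts, about 22% of the CST1 and about 44% of the CST2 nonsynonymous variants fall below the cut-off.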

2.6.3. CST3 Gene

In the CST3 locus, we have identified 452 nucleotide variations in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S13). Of these, 329 were in introns, 18 in upstream regions, 9 in 5′UTR, 50 in 3′UTR, 29 in downstream noncoding regions (Table S13), and 17 in coding regions, including 9 synonymous and 8 nonsynonymous variations (Table 2). One nucleotide substitution corresponded to an annotated common variant in modern humans (Table 2).

2.6.4. CST4 Gene

Two hundred and sixty-three nucleotide substitutions were detected in the CST4 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S14). These included 130 changes in introns, 42 in upstream regions, 4 in the 5′UTR, 20 in the 3′UTR, 43 in downstream noncoding regions (Table S14), and 24 in coding exons (11 synonymous and 13 missense variations; Table 3). Seven variations in this locus corresponded to annotated common variants in modern humans (Table 3). The change at 23,666,565 bp caused the substitution of the M₁₁₁ with an R in the corresponding Neanderthal peptide structure. Even if it causes the substitution of an uncharged amino acid with a charged one, the SIFT analysis did not predict a deleterious effect of this variant on the function of the archaic protein compared to modern humans.

2.6.5. CST5 Gene

One hundred ninety-three nucleotide substitutions were annotated in the CST5 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S15). Sixteen changes were mapped in the coding region, including eight synonymous and eight nonsynonymous (Table 3). Of the 177 nucleotide substitutions located in noncoding regions, 118 were in introns, 24 in upstream regions, 18 in 3′UTR, and 17 in downstream regions (Table S15). The exonic nucleotide variation generated the codon for an R in both archaic hominins instead of C₂₆. This represented a common variant also found in modern humans (rs1799841). The cystatin D variant with the R₂₆ is frequently detected in the soluble fraction of human saliva, probably because it is more soluble than the C₂₆-containing isoform [19]. Moreover, the opposite substitution (R₂₆C) was detectable with high frequency at the same amino acid residue in the cystatin SA gene of Neanderthals. Five out of the eight nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table 3).

2.6.6. CSTA and CSTB Genes

Finally, 394 and 134 nucleotide substitutions were identified in CSTA and CSTB loci, respectively, in Neanderthals and Denisovans compared with modern humans (Table 3, Tables S16 and S17). The nucleotide substitutions reported in CSTA were distributed as follows: 6 fell in coding exons, including 2 synonymous and 4 nonsynonymous, and 388 fell in noncoding regions, including 346 in introns, 10 in upstream regions, 5 in the 5′UTR, 10 in the 3′UTR, and 17 in downstream regions (Table 3 and Table S16). Among these changes, the variation at 122,044,848-122,044,850 positions of CSTA was a CTT deletion, observed exclusively in Denisovans (Table S16). This fell within the canonical DNA-binding motif for the Spi-1 proto-oncogene transcription factor (source: UCSC Genome Browser); therefore, it could probably affect the expression of the CSTA gene in the ancient hominin. Regarding CSTB, 9 nucleotide changes were reported in coding exons (6 synonymous and 3 nonsynonymous), whereas 125 fell in noncoding regions (55 within introns, 27 in upstream regions, 5 in the 5′UTR, 15 in the 3′UTR, and 23 in downstream regions) (Table 3 and Table S17). One missense variant for CSTA and 1 for CSTB found in ancient hominins were also reported as an SNP in modern humans (Table 3).
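The regional breakdowns reported for the cystatin loci in Sections 2.6.1 to 2.6.6 can be checked against the stated per-locus totals. The sketch below is a hedged consistency check rather than the authors' method: `CST_REGIONS` and `mismatches` are illustrative names, and a 5′UTR count of 0 is assumed for CST2 and CST5 because the text gives none for those loci.

```python
# Region counts transcribed from Sections 2.6.1-2.6.6.
# Columns: (introns, upstream, 5'UTR, 3'UTR, downstream, coding, reported total)
CST_REGIONS = {
    "CST1": (128, 19, 7, 12, 32, 29, 227),
    "CST2": (103, 15, 0, 8, 17, 24, 167),  # assumption: no 5'UTR count given
    "CST3": (329, 18, 9, 50, 29, 17, 452),
    "CST4": (130, 42, 4, 20, 43, 24, 263),
    "CST5": (118, 24, 0, 18, 17, 16, 193),  # assumption: no 5'UTR count given
    "CSTA": (346, 10, 5, 10, 17, 6, 394),
    "CSTB": (55, 27, 5, 15, 23, 9, 134),
}


def mismatches(regions):
    """Return the genes whose regional counts do not add up to the reported total."""
    return [gene for gene, (*parts, total) in regions.items() if sum(parts) != total]


print(mismatches(CST_REGIONS))  # an empty list means every breakdown is consistent
```

Under these assumptions every breakdown adds up to the total reported for its locus.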

2.7. Geographic Distribution of Genetic Variants in Modern Humans

Of note, the salivary protein genes tested were polymorphic in humans. The frequency of specific coding nonsynonymous genetic variants also differed between populations, as reported in the Geography of Genetic Variants Browser (https://popgen.uchicago.edu/ggv; accessed on 22 July 2022) (File S1) [29]. In particular, 20 genetic variants (three in the PRB1 gene, six in PRB2, one in PRB3, two in CST1, four in CST2, three in CST5, and one in CSTB; highlighted in red in Table 1, Table 2 and Table 3) displayed a different geographic distribution; specifically, rs554211998, rs201994479, rs34305575, rs6076122, rs111349461, rs55860552, rs568411970, rs145031249, and rs1799841 showed a peculiar allele frequency in African populations (File S1).

2.8. Evolutionary Pressure of Salivary Protein Genes

To investigate whether some of the salivary protein genes studied showed evidence of positive selection in anatomically modern humans, we performed a population branch statistics (PBS) analysis [30]. Our results showed no signal of recent selective pressure for the genes analysed, suggesting that variants in these genes did not affect individual fitness (File S2). We also implemented Tajima’s D test as an additional evolutionary analysis to evaluate the selective effects of each observed substitution. Tajima’s D values show comparable variance among the genes analysed. The D values were prevalently slightly negative or positive (ranging from −0.698 to 3.359) (File S3), confirming the absence of a selective sweep [31], which was already suggested by the PBS test.
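For readers unfamiliar with the two statistics named above, the sketch below shows their textbook forms in Python. It is an illustration under stated assumptions, not the software used in this study: the function names and the toy alignments are hypothetical, PBS is computed from three pairwise FST values via the branch-length transform T = −ln(1 − FST), and Tajima's D is computed from aligned sequences of equal length with no missing data.

```python
import math
from itertools import combinations


def pbs(fst_ab, fst_ac, fst_bc):
    """Population branch statistic for population A from pairwise FST values.

    Each FST is converted to a branch length T = -ln(1 - FST);
    the statistic for A is (T_AB + T_AC - T_BC) / 2.
    """
    t_ab, t_ac, t_bc = (-math.log(1.0 - f) for f in (fst_ab, fst_ac, fst_bc))
    return (t_ab + t_ac - t_bc) / 2.0


def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length sequences without missing data."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences (the variance term vanishes below that)")
    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical toy alignments (two sites, four sequences each).
rare_variants = ["AA", "CA", "AC", "AA"]      # both variants are singletons
balanced_variants = ["AA", "AA", "CC", "CC"]  # both variants at 50% frequency

print(tajimas_d(rare_variants))      # negative: excess of rare variants
print(tajimas_d(balanced_variants))  # positive: excess of intermediate-frequency variants
print(pbs(0.5, 0.5, 0.0))            # long branch leading to population A
```

In this framing, a strongly negative D is the pattern expected after a selective sweep, while values close to zero or positive do not point to one.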

Compared to modern humans, Neanderthal and Denisovan genomes showed evidence of ancient interbreeding [32], leading to an uneven distribution of introgressed chromosomal regions because of natural selection [33]. To investigate whether some of the salivary protein gene variants studied might be due to interbreeding, we used two databases of archaic introgression based on a comparison with modern genomes from the 1000 Genomes Project [34] and the Estonian Biocentre collection [35], which also reported data from previous studies [33,36]. However, the considered genes were not encompassed within the chromosomal regions highlighted in the databases and, therefore, did not show an apparent sign of adaptive introgression from archaic hominins.

3. Discussion

The different dietary habits of archaic hominins and modern humans have been mostly attributed to the changes in the availability of natural food resources, the oral bacterial community (microbiota), and climatic conditions [37,38]. A role for salivary proteins can also be inferred, as they are known to be implicated in the modulation of the microbiome of the oral cavity and of the entire gastrointestinal tract, as well as in taste perception [39]. aPRPs can promote the attachment of several important bacteria, such as Actinomyces viscosus, Bacteroides gingivalis, and some strains of Streptococcus mutans. Moreover, both aPRPs and statherin promote the colonization of oral surfaces by Porphyromonas gingivalis [40]. It has been reported that salivary proteins may modulate oral health and homeostasis, maintain a stable ecosystem, and inhibit the growth of cariogenic bacteria [41,42]. Recently, 258 salivary proteins were found to be differentially expressed between caries-free and caries-active children [43]. Salivary proteins are also involved in taste perception. In particular, the salivary bPRPs II-2 and Ps-1 contribute to bitter taste sensitivity [44]. Also, some salivary peptides belonging to the bPRP and histatin families can bind polyphenols in tannin-rich foods, thus evoking the typical astringent sensation [44]. Salivary proteins play an important role in affecting sweet [45], salt [46], and umami [47] tastes, along with fat, salt, and bitter acceptance [48,49]. Cystatins are also thought to affect taste perception, as lower salivary levels of these peptides may enhance proteolysis, which would affect the mucosal pellicle lining of the oral cavity, thereby increasing the accessibility of tastants to taste receptors [49]. Interestingly, most of these proteins have been shown to be modulated in pathological conditions, including tumors and inflammation, suggesting that they may serve as clinically relevant biomarkers [5].

Therefore, the hypothesis has been raised that the evolutionary changes that occurred in the structure of these proteins could be associated with the different dietary habits of archaic hominins. In this regard, mutations in different bitter taste receptor genes (namely TAS2R62, TAS2R64, and TAS2R38) and the masticatory myosin gene MYH16, along with the duplication of the salivary amylase gene AMY1 that has occurred in recent human evolution, have been associated with variations in taste sensitivity and the shift toward the food cooking habits of modern humans [50].

Based on this emerging background, in this study, we identified the nucleotide substitutions fixed in the gene loci coding for the main salivary proteins in modern humans compared to ancient hominin species (Neanderthals and Denisovans) and inferred their functional consequences.

By mapping over 3400 nucleotide substitutions, we have shown that the majority (87.7%) of the changes detectable in the genes expressing the most important salivary proteins (proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases) of modern humans, compared with Neanderthals and Denisovans, map within noncoding regions.

Quite unexpectedly, our data also showed the presence of nucleotide variations affecting the coding sequence of all 17 gene loci analysed. Overall, the frequency of coding variations in these genomic loci is far higher than the general rate found throughout the genome, since previous studies highlighted that relatively few amino acid changes have become fixed in recent human evolution [51,52]. To the best of our knowledge, this study provides the first description of coding nucleotide changes that occurred in salivary protein genes during the recent evolutionary divergence of modern humans from Neanderthals and Denisovans. Focusing on these missense variations, we hypothesized the possible effects they could have had on protein structure, processing, and function. Of the 307 missense changes found in the coding regions of the tested genes, 92 were predicted to have a potentially deleterious effect on protein function.

The changes identified in the PRB1 and PRB2 genes are worth particular attention and could be interpreted in light of the extant knowledge of the biology of the encoded proteins. As already mentioned, the PRB protein family is highly polymorphic and, despite being common to all mammals, the proteins belonging to this family feature significant structural differences among species. For instance, the peptides generated by the convertase cleavage span 50 to 90 amino acids in length in humans and 10 to 40 in pigs, with appreciable variations in the peptide sequences [53]. Therefore, bPRPs appear to be non-conserved across species, probably because they are mostly implicated in taste perception and underwent a deep transformation during evolution due to the changing habits and habitats of the species [44]. Interestingly, our results showed that three nucleotide substitutions annotated in the archaic hominins’ PRB1 and PRB2 genes affect specific arginine residues within the consensus sequences of the polypeptide, which are recognized by the pro-protein convertases responsible for their cleavage. These changes could have determined the presence of fused proteins in the archaic hominins’ proteome. The putative “PRB1 salivary archaic fusion 1 peptide” and “PRB2 salivary archaic fusion 2 peptide” could possibly have been associated with additional and/or alternative functions able to influence the eating habits of extinct hominins. In addition, we have also identified a sequence change in the PRB2 gene that instead generates a new pro-protein convertase consensus sequence in the encoded peptide. As a result, ancient hominins could have expressed two smaller peptides, the “PRB2 salivary archaic cleavage 1 peptide” and the “PRB2 salivary archaic cleavage 2 peptide”, possibly exerting alternative functions, which deserve further functional studies.
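To illustrate how a single arginine substitution can abolish or create a convertase recognition site, the sketch below scans a peptide for a cleavage motif and splits it accordingly. The motif R-X-X-R is a generic pro-protein convertase pattern adopted here purely as an assumption (the consensus relevant to bPRPs may differ), and the peptide sequences are invented for the example rather than taken from PRB1 or PRB2.

```python
import re

# Assumed generic motif: Arg, any two residues, Arg (cleavage after the last Arg).
# The lookahead allows overlapping motifs to be detected.
MOTIF = re.compile(r"(?=(R..R))")

def cleavage_sites(peptide):
    """Return the 1-based positions of the Arg after which cleavage is predicted."""
    return [m.start() + 4 for m in MOTIF.finditer(peptide)]

def fragments(peptide):
    """Split the peptide after each predicted cleavage position."""
    cuts = [0] + cleavage_sites(peptide) + [len(peptide)]
    return [peptide[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]

# Invented sequences: the second differs only by an Arg -> Gln change in the motif.
with_site = "GPPRSARSPPGK"
without_site = "GPPRSAQSPPGK"

print(fragments(with_site))     # ['GPPRSAR', 'SPPGK']
print(fragments(without_site))  # ['GPPRSAQSPPGK']
```

Losing the arginine leaves a single uncleaved (fused) product, while the reverse change introduces a site and yields two shorter peptides, which mirrors the two scenarios discussed for the archaic PRB1 and PRB2 variants.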

The missense nucleotide substitutions annotated in the remaining salivary protein genes described in this study (aPRPs, histatins, amylases, statherin, P-B peptide, and cystatins) could be interpreted, at least in part, considering the putative changes that they can cause in post-translational protein processing, sorting, localization, and trafficking toward secretion. In addition, all the missense variations that introduce or remove a cysteine residue on the archaic cystatins, most likely affecting the conserved sequences involved in the protein-protein binding [53], could also influence protein function.

We also annotated the nucleotide variations fixed within the noncoding regions of the tested genes in modern humans, given that these could reasonably affect the expression levels of salivary proteins by changing the affinity of transcriptional regulators for promoter, enhancer, and/or silencer elements, or could alter splicing by changing splice site consensus sequences, leading to the formation of alternative coding transcripts. They could also affect post-transcriptional regulation mechanisms, such as the binding of noncoding regulatory RNAs, leading to the varying protein types and amounts that emerged during recent evolution. Specifically, two nucleotide substitutions found in the CST2 and CSTA gene loci appear to fall within the canonical DNA-binding motifs for specific transcription factors, which could most likely intervene in the modulation of their expression. We also annotated 216 changes in the 3′ untranslated regions of 16 of the 17 genes analysed (all but AMY1A). These substitutions might instead condition the binding of specific microRNAs targeting salivary protein transcripts, modulating their stability and translation.

Lastly, 34.9% of the nonsynonymous nucleotide substitutions identified in this study appear to be frequent in the modern human genome, where they are annotated as single nucleotide polymorphisms (SNPs). In addition, some of these coding genetic variants display a different geographic distribution in humans. This observation reduces the evolutionary significance of such changes, which are to be considered in light of the polymorphic nature of these genomic loci. However, taken together, variants showing alternative nucleotide fixation in modern vs. archaic humans represent 7.3% of all the nucleotide substitutions reported in the study.

Also, our results do not suggest any significant evolutionary pressure or sign of adaptive introgression from archaic hominins on the tested genes.

4. Materials and Methods

4.1. Nucleotide Variants Annotation

In order to annotate all the nucleotide variants within the gene loci of the salivary proteins of interest, we compared modern human sequences with Altai Neanderthals (downloaded from http://cdna.eva.mpg.de/Neanderthal/altai/AltaiNeanderthal/bam/, accessed on 2 May 2020), Chagyrskaya Neanderthals (Index of/neandertal/Chagyrskaya/BAM (mpg.de), accessed on 9 December 2022), Vindija Neanderthals (Index of/neandertal/Vindija/bam/Pruefer_etal_2017/Vindija33.19 (mpg.de), accessed on 9 December 2022), and Denisova sequences (http://cdna.eva.mpg.de/denisova/alignments/, accessed on 2 May 2020) [54,55]. The fossil remains, aged between 50,000 and 30,000 years, come from two distinct geographical areas. The female Neanderthal sample from Vindija (Croatia), in the Western Balkans, yielded a 30× genome coverage [56]. The other samples came from two different sites in the Altai Mountains in Siberia (Russia): the genomic data of a female Neanderthal (at 52× coverage) [57] and a juvenile female Denisovan individual (at 30× coverage) [55] came from the Denisova cave, and another female sample came from the Chagyrskaya cave, located about 100 km westward, and yielded a genome of 27× coverage [58]. In particular, we aligned the sequences of modern humans and ancient hominins by means of the Integrative Genomics Viewer (IGV) tool (version 2.3.72) [59,60,61]. Note that the reference genomes annotated in this database are set on the hg19 genome assembly coordinates. To account for the damage and fragmentation to which the ancient hominin DNA was subjected, we annotated, for each gene of interest, all the nucleotide substitutions with a frequency greater than 10% and a minimum coverage of 10 counts in coding, noncoding, and regulatory sequences (i.e., 5′ and 3′ untranslated and flanking upstream and downstream regulatory regions).
Of note, the variant frequency indicates the percentage of reads carrying that substitution in ancient hominins, as reported by the IGV tool, considering the depth (coverage) of the reads displayed at each locus. For each tested gene, a region of approximately 500 bp upstream of the first exon and downstream of the last exon was screened to annotate nucleotide substitutions within regulatory regions able to affect the gene expression rate. The precise hg19 genomic coordinates for each tested gene locus were as follows: PRB1 locus 11,509,000–11,504,200 on chromosome 12; PRB2 locus 11,549,000–11,544,000 on chromosome 12; PRB3 locus 11,423,140–11,418,300 on chromosome 12; PRB4 locus 11,463,900–11,459,500 on chromosome 12; PRH2 locus 11,081,500–11,087,950 on chromosome 12; HTN1 locus 70,915,750–70,925,000 on chromosome 4; HTN3 locus 70,893,670–70,902,700 on chromosome 4; AMY1A locus 104,239,500–104,229,500 on chromosome 1; STATH locus 70,861,200–70,868,790 on chromosome 4; SMR3B locus 71,248,550–71,256,400 on chromosome 4; CST1 locus 23,732,000–23,727,600 on chromosome 20; CST2 locus 23,807,800–23,803,900 on chromosome 20; CST3 locus 23,619,100–23,606,800 on chromosome 20; CST4 locus 23,670,200–23,665,700 on chromosome 20; CST5 locus 23,860,900–23,856,000 on chromosome 20; CSTA locus 122,043,600–122,061,300 on chromosome 3; and CSTB locus 45,196,800–45,193,000 on chromosome 21.
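The two annotation thresholds (variant frequency greater than 10% and a coverage of at least 10 counts) can be expressed as a simple filter on the read counts at a position. The sketch below is only illustrative: the annotation in this study was carried out by inspection in IGV, the function name and the input format are hypothetical, and the coverage threshold is interpreted here as total read depth at the position.

```python
def call_substitutions(base_counts, reference, min_freq=0.10, min_depth=10):
    """Return {base: frequency} for non-reference bases passing both thresholds.

    base_counts: dict mapping base -> number of reads supporting it (hypothetical format)
    reference: the modern human reference base at this position
    """
    depth = sum(base_counts.values())
    # Skip positions whose total coverage is below the minimum.
    if depth < min_depth:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference and count / depth > min_freq
    }

# 4 of 30 reads (about 13%) support T: the substitution is retained.
print(call_substitutions({"C": 26, "T": 4}, "C"))
# 2 of 30 reads (about 7%) support T: below the frequency threshold.
print(call_substitutions({"C": 28, "T": 2}, "C"))
```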

The annotation of all variations, with their corresponding frequency in present-day human populations, was collected by integrating information from both the dbSNP (Single Nucleotide Polymorphism Database; https://www.ncbi.nlm.nih.gov/snp, accessed on 15 July 2020) and the Ensembl (http://www.ensembl.org/index.html, accessed on 15 July 2020) databases. In particular, the frequency reported is that of the Allele Frequency Aggregator (ALFA New). Regulatory regions in the gene loci were analysed using the information available in the UCSC Genome Browser database (https://genome.ucsc.edu, accessed on 15 July 2020).

The coding sequences of salivary proteins were extracted from the publicly available UniProtKB database (https://www.uniprot.org/, accessed on 15 July 2020), with the following primary accession numbers: PRB1: P04280; PRB2: P02812; PRB3: Q04118; PRB4: P10163; PRH2: P02810; HTN1: P15515; HTN3: P15516; STATH: P02808; AMY1A: P0DUB6; P-B: P02814; CST1: P01037; CST2: P09228; CST3: P01034; CST4: P01036; CST5: P28325; CSTA: P01040; CSTB: P04080.

4.2. Protein Data Analysis

The potential impact of the amino acid substitutions on salivary protein function was predicted by SIFT (sorting intolerant from tolerant) version 5.1.1 using the Genome tool (SIFT nonsynonymous single nucleotide variants (genome-scale)), available at the SIFT website (http://sift.jcvi.org/, accessed on 20 June 2022). The SIFT algorithm is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST [62]. A SIFT score < 0.05 indicates an amino acid substitution predicted to be deleterious to protein function.
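The cutoff stated above translates into a one-line classification rule. The sketch below applies it to the SIFT scores of the three PRB1 substitutions listed in Table 1; the function name is an assumption, and the labels follow the wording used in that table.

```python
def sift_label(score, threshold=0.05):
    """Label a SIFT score using the cutoff stated in the text (score < 0.05)."""
    return "Damaging" if score < threshold else "Tolerated"

# SIFT scores of the three PRB1 substitutions reported in Table 1.
for score in (0.02, 0.72, 0.06):
    print(score, sift_label(score))
# 0.02 Damaging
# 0.72 Tolerated
# 0.06 Tolerated
```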

4.3. Selective Pressure Analysis

To detect any possible trace of selective pressure, PBS was applied. PBS is a three-population statistical test based on the FST fixation index, and it has proven to be one of the best methods of detecting signs of recent natural selection on genomes [31]. As for the choice of populations, we used three geographically distant populations (CEU for Europe, CHB for Asia, and YRI for Africa), which are the most commonly used [63,64] and were among the first populations released by the 1000 Genomes Project, Phase 1 [64].

FST among the three possible population pairs (CEU, CHB, and YRI) was calculated with VCFtools v0.1.16 [65] using the VCF files of each gene under scrutiny. The variants were first filtered with Plink 1.9 [66] to keep only those with MAF ≥ 0.05. Then, PBS values were computed and plotted with R Studio software (R Core Team 2021, https://www.R-project.org, accessed on 2 December 2022).
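For reference, the PBS of a focal population is obtained from the three pairwise FST values by log-transforming each into a branch length, T = −log(1 − FST), and combining them as PBS = (T_AB + T_AC − T_BC)/2 [30]. The study performed this step in R; the Python sketch below only illustrates the arithmetic, with function names and FST values chosen for the example.

```python
import math

def branch_length(fst):
    """Log-transform FST into a divergence-time-like branch length."""
    return -math.log(1.0 - fst)

def pbs(fst_ab, fst_ac, fst_bc):
    """PBS for focal population A, given pairwise FST with populations B and C."""
    t_ab, t_ac, t_bc = (branch_length(f) for f in (fst_ab, fst_ac, fst_bc))
    return (t_ab + t_ac - t_bc) / 2.0

# Hypothetical FST values, with CEU as the focal population.
fst_ceu_chb, fst_ceu_yri, fst_chb_yri = 0.10, 0.15, 0.16
print(round(pbs(fst_ceu_chb, fst_ceu_yri, fst_chb_yri), 4))
```

A large PBS indicates that allele frequencies changed mostly on the branch leading to the focal population; values close to zero, as found for the genes analysed here, give no indication of population-specific selection.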

5. Conclusions

In conclusion, the nucleotide substitutions described in this study, which have putatively affected the amino acid composition, the post-translational modification, and/or the gene expression levels of salivary proteins, might have generated novel functional features and a different expression ratio among the several components of the salivary proteome. Given the largely unknown functional roles of most salivary proteins, we may only speculate that these changes could have ultimately modified the entire homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. Our data may pave the way to unravelling the evolutionary processes through which changes in salivary composition have affected oral cavity homeostasis. This knowledge could provide additional novel cues toward a better understanding of the ability of different species to adapt to different and changing environments.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms241915010/s1.

Author Contributions

Conceptualization: M.C. and O.P.; data elaboration and collection, L.D.P., M.C., M.B., B.M. and A.O.; manuscript editing, L.D.P., W.L., M.C., B.M., T.C., O.P. and S.S. All authors contributed to the discussion and revision of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the FIR 2021 funds (Cagliari, Italy) to T.C. and the “Linea D.1–D.3.1” funds from the Università Cattolica del Sacro Cuore (Rome, Italy) to L.D.P., W.L., and O.P.

Data Availability Statement

All data reported in this manuscript are shown in the results section and further supported by the extended datasets provided in the supplementary files. No new primary datasets to be deposited have been generated.

Acknowledgments

We thank Luca Pagani (Università di Padova) for their useful advice on adaptive introgression.

Conflicts of Interest

The authors declare no conflict of interest.

References

Cabras, T.; Iavarone, F.; Manconi, B.; Olianas, A.; Sanna, M.T.; Castagnola, M.; Messana, I. Top-down analytical platforms for the characterization of the human salivary proteome. Bioanalysis 2014, 6, 563–581.
Bandhakavi, S.; Stone, M.D.; Onsongo, G.; Van Riper, S.K.; Griffin, T.J. A Dynamic Range Compression and Three-Dimensional Peptide Fractionation Analysis Platform Expands Proteome Coverage and the Diagnostic Potential of Whole Saliva. J. Proteome Res. 2009, 8, 5590–5600.
Vila, T.; Rizk, A.M.; Sultan, A.S.; Jabra-Rizk, M.A. The power of saliva: Antimicrobial and beyond. PLoS Pathog. 2019, 15, e1008058.
Ngo, L.H.; Veith, P.D.; Chen, Y.Y.; Chen, D.; Darby, I.B.; Reynolds, E.C. Mass Spectrometric Analyses of Peptides and Proteins in Human Gingival Crevicular Fluid. J. Proteome Res. 2010, 9, 1683–1693.
Boroumand, M.; Olianas, A.; Cabras, T.; Manconi, B.; Fanni, D.; Faa, G.; Desiderio, C.; Messana, I.; Castagnola, M. Saliva, a bodily fluid with recognized and potential diagnostic applications. J. Sep. Sci. 2021, 44, 3677–3690.
Beeley, J.A. Basic proline-rich proteins: Multifunctional defence molecules? Oral Dis. 2001, 7, 69–70.
Hajishengallis, G.; Russell, M.W. Innate Humoral Defense Factors. Mucosal Immunol. 2015, 1, 251–270.
Lyons, K.M.; Azen, E.A.; Goodman, P.A.; Smithies, O. Many protein products from a few loci: Assignment of human salivary proline-rich proteins to specific loci. Genetics 1988, 120, 255–265.
Padiglia, A.; Orrù, R.; Boroumand, M.; Olianas, A.; Manconi, B.; Sanna, M.T.; Desiderio, C.; Iavarone, F.; Liori, B.; Messana, I.; et al. Extensive Characterization of the Human Salivary Basic Proline-Rich Protein Family by Top-Down Mass Spectrometry. J. Proteome Res. 2018, 17, 3292–3307.
Manconi, B.; Castagnola, M.; Cabras, T.; Olianas, A.; Vitali, A.; Desiderio, C.; Sanna, M.T.; Messana, I. The intriguing heterogeneity of human salivary proline-rich proteins. J. Proteom. 2016, 134, 47–56.
Lyons, K.M.; Stein, J.H.; Smithies, O. Length polymorphisms in human proline-rich protein genes generated by intragenic unequal crossing over. Genetics 1988, 120, 267–278.
Azen, E.A.; Amberger, E.; Fisher, S.; Prakobphol, A.; Niece, R.L. PRB1, PRB2, and PRB4 coded polymorphisms among human salivary concanavalin-A binding, II-1, and Po proline-rich proteins. Am. J. Hum. Genet. 1996, 58, 143–153.
Messana, I.; Cabras, T.; Pisano, E.; Sanna, M.T.; Olianas, A.; Manconi, B.; Pellegrini, M.; Paludetti, G.; Scarano, E.; Fiorita, A.; et al. Trafficking and Postsecretory Events Responsible for the Formation of Secreted Human Salivary Peptides: A Proteomics Approach. Mol. Cell. Proteom. 2008, 7, 911–926.
Jensen, J.L.; Lamkin, M.S.; Troxler, R.F.; Oppenheim, F.G. Multiple forms of statherin in human salivary secretions. Arch. Oral Biol. 1991, 36, 529–534.
Inzitari, R.; Cabras, T.; Rossetti, D.V.; Fanali, C.; Vitali, A.; Pellegrini, M.; Paludetti, G.; Manni, A.; Giardina, B.; Messana, I.; et al. Detection in human saliva of different statherin and P-B fragments and derivatives. Proteomics 2006, 6, 6370–6379.
Cabras, T.; Inzitari, R.; Fanali, C.; Scarano, E.; Patamia, M.; Sanna, M.T.; Pisano, E.; Giardina, B.; Castagnola, M.; Messana, I. HPLC–MS characterization of cyclo-statherin Q-37, a specific cyclization product of human salivary statherin generated by transglutaminase 2. J. Sep. Sci. 2006, 29, 2600–2608.
Torres, P.; Castro, M.; Reyes, M.; Torres, V. Histatins, wound healing, and cell migration. Oral Dis. 2018, 24, 1150–1160.
Castagnola, M.; Inzitari, R.; Rossetti, D.V.; Olmi, C.; Cabras, T.; Piras, V.; Nicolussi, P.; Sanna, M.T.; Pellegrini, M.; Giardina, B.; et al. A Cascade of 24 Histatins (Histatin 3 Fragments) in Human Saliva: Suggestion for a Pre-Secretory Sequential Cleavage Pathway. J. Biol. Chem. 2004, 279, 41436–41443.
Wang, G. Human Antimicrobial Peptides and Proteins. Pharmaceuticals 2014, 7, 545–594.
Dickinson, D.P. Cysteine peptidases of mammals: Their biological roles and potential effects in the oral cavity and other tissues in health and disease. Crit. Rev. Oral Biol. Med. 2002, 13, 238–275.
Manconi, B.; Liori, B.; Cabras, T.; Vincenzoni, F.; Iavarone, F.; Castagnola, M.; Messana, I.; Olianas, A. Salivary Cystatins: Exploring New Post-Translational Modifications and Polymorphisms by Top-Down High-Resolution Mass Spectrometry. J. Proteome Res. 2017, 16, 4196–4207.
Perry, G.H.; Dominy, N.J.; Claw, K.G.; Lee, A.S.; Fiegler, H.; Redon, R.; Werner, J.; Villanea, F.A.; Mountain, J.L.; Misra, R.; et al. Diet and the evolution of human amylase gene copy number variation. Nat. Genet. 2007, 39, 1256–1260.
Polley, S.; Louzada, S.; Forni, D.; Sironi, M.; Balaskas, T.; Hains, D.S.; Yang, F.; Hollox, E.J. Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc. Natl. Acad. Sci. USA 2015, 112, 5105–5110.
Xu, D.; Pavlidis, P.; Taskent, R.O.; Alachiotis, N.; Flanagan, C.; DeGiorgio, M.; Blekhman, R.; Ruhl, S.; Gokcumen, O. Archaic Hominin Introgression in Africa Contributes to Functional Salivary MUC7 Genetic Variation. Mol. Biol. Evol. 2017, 34, 2704–2715.
Xu, D.; Pavlidis, P.; Thamadilok, S.; Redwood, E.; Fox, S.; Blekhman, R.; Ruhl, S.; Gokcumen, O. Recent evolution of the salivary mucin MUC7. Sci. Rep. 2016, 6, 31791.
Thamadilok, S.; Choi, K.S.; Ruhl, L.; Schulte, F.; Kazim, A.L.; Hardt, M.; Gokcumen, O.; Ruhl, S. Human and Nonhuman Primate Lineage-Specific Footprints in the Salivary Proteome. Mol. Biol. Evol. 2020, 37, 395–405.
Edwards, A.W.F. The Genetical Theory of Natural Selection. Genetics 2000, 154, 1419–1426.
Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. USA 2010, 107, 961–968.
Marcus, J.H.; Novembre, J. Visualizing the geography of genetic variants. Bioinformatics 2017, 33, 594–595.
Yi, X.; Liang, Y.; Huerta-Sanchez, E.; Jin, X.; Cuo, Z.X.; Pool, J.E.; Xu, X.; Jiang, H.; Vinckenbosch, N.; Korneliussen, T.S.; et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 2010, 329, 75–78.
Tajima, F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989, 123, 585–595.
Skoglund, P.; Jakobsson, M. Archaic human ancestry in East Asia. Proc. Natl. Acad. Sci. USA 2011, 108, 18301–18306.
Sankararaman, S.; Mallick, S.; Patterson, N.; Reich, D. The Combined Landscape of Denisovan and Neanderthal Ancestry in Present-Day Humans. Curr. Biol. 2016, 26, 1241–1247.
Racimo, F.; Marnetto, D.; Huerta-Sánchez, E. Signatures of Archaic Adaptive Introgression in Present-Day Human Populations. Mol. Biol. Evol. 2017, 34, 296–317.
Jagoda, E.; Lawson, D.J.; Wall, J.D.; Lambert, D.; Muller, C.; Westaway, M.; Leavesley, M.; Capellini, T.D.; Mirazón Lahr, M.; Gerbault, P.; et al. Disentangling Immediate Adaptive Introgression from Selection on Standing Introgressed Variation in Humans. Mol. Biol. Evol. 2018, 35, 623–630.
Vernot, B.; Akey, J.M. Resurrecting surviving Neandertal lineages from modern human genomes. Science 2014, 343, 1017–1021.
Weyrich, L.S.; Duchene, S.; Soubrier, J.; Arriola, L.; Llamas, B.; Breen, J.; Morris, A.G.; Alt, K.W.; Caramelli, D.; Dresely, V.; et al. Neanderthal behaviour, diet, and disease inferred from ancient DNA in dental calculus. Nature 2017, 544, 357–361.
El Zaatari, S.; Grine, F.E.; Ungar, P.S.; Hublin, J.J. Neandertal versus Modern Human Dietary Responses to Climatic Fluctuations. PLoS ONE 2016, 11, e0153277.
Cornejo Ulloa, P.; van der Veen, M.H.; Krom, B.P. Review: Modulation of the oral microbiome by the host to promote ecological balance. Odontology 2019, 107, 437–448.
Lamont, R.J.; Jenkinson, H.F. Subgingival colonization by Porphyromonas gingivalis. Oral Microbiol. Immunol. 2000, 15, 341–349.
Laputková, G.; Schwartzová, V.; Bánovčin, J.; Alexovič, M.; Sabo, J. Salivary Protein Roles in Oral Health and as Predictors of Caries Risk. Open Life Sci. 2018, 13, 174–200.
Lynge Pedersen, A.M.; Belstrøm, D. The role of natural salivary defences in maintaining a healthy oral microbiota. J. Dent. 2019, 80, S3–S12.
Chen, W.; Jiang, Q.; Yan, G.; Yang, D. The oral microbiome and salivary proteins influence caries in children aged 6 to 8 years. BMC Oral Health 2020, 20, 295.
Cabras, T.; Melis, M.; Castagnola, M.; Padiglia, A.; Tepper, B.J.; Messana, I.; Tomassini Barbarossa, I. Responsiveness to 6-n-Propylthiouracil (PROP) Is Associated with Salivary Levels of Two Specific Basic Proline-Rich Proteins in Humans. PLoS ONE 2012, 7, e30962.
Rodrigues, L.; Costa, G.; Cordeiro, C.; Pinheiro, C.; Amado, F.; Lamy, E. Salivary proteome and glucose levels are related with sweet taste sensitivity in young adults. Food Nutr. Res. 2017, 61, 1389208.
Stolle, T.; Grondinger, F.; Dunkel, A.; Meng, C.; Médard, G.; Kuster, B.; Hofmann, T. Salivary Proteome Patterns Affecting Human Salt Taste Sensitivity. J. Agric. Food Chem. 2017, 65, 9275–9286.
Scinska-Bienkowska, A.; Wrobel, E.; Turzynska, D.; Bidzinski, A.; Jezewska, E.; Sienkiewicz-Jarosz, H.; Golembiowska, K.; Kostowski, W.; Kukwa, A.; Plaznik, A.; et al. Glutamate concentration in whole saliva and taste responses to monosodium glutamate in humans. Nutr. Neurosci. 2006, 9, 25–31.
Méjean, C.; Morzel, M.; Neyraud, E.; Issanchou, S.; Martin, C.; Bozonnet, S.; Urbano, C.; Schlich, P.; Hercberg, S.; Péneau, S.; et al. Salivary Composition Is Associated with Liking and Usual Nutrient Intake. PLoS ONE 2015, 10, e0137473.
Morzel, M.; Chabanet, C.; Schwartz, C.; Lucchi, G.; Ducoroy, P.; Nicklaus, S. Salivary protein profiles are linked to bitter taste acceptance in infants. Eur. J. Pediatr. 2014, 173, 575–582.
Perry, G.H.; Kistler, L.; Kelaita, M.A.; Sams, A.J. Insights into hominin phenotypic and dietary evolution from ancient DNA sequence data. J. Hum. Evol. 2015, 79, 55–63.
Green, R.E.; Krause, J.; Briggs, A.W.; Maricic, T.; Stenzel, U.; Kircher, M.; Patterson, N.; Li, H.; Zhai, W.; Fritz, M.H.; et al. A Draft Sequence of the Neandertal Genome. Science 2010, 328, 710–722.
Burbano, H.A.; Hodges, E.; Green, R.E.; Briggs, A.W.; Krause, J.; Meyer, M.; Good, J.M.; Maricic, T.; Johnson, P.L.; Xuan, Z.; et al. Targeted Investigation of the Neandertal Genome by Array-Based Sequence Capture. Science 2010, 328, 723–725.
Bode, W.; Engh, R.; Musil, D.; Thiele, U.; Huber, R.; Karshikov, A.; Brzin, J.; Kos, J.; Turk, V. The 2.0 A X-ray crystal structure of chicken egg white cystatin and its possible mode of interaction with cysteine proteinases. EMBO J. 1988, 7, 2593–2599.
Mednikova, B.B. A Proximal Pedal Phalanx of a Paleolithic Hominin from Denisova Cave, Altai. Archaeol. Ethnol. Anthropol. Eurasia 2011, 39, 129–138.
Meyer, M.; Kircher, M.; Gansauge, M.T.; Li, H.; Racimo, F.; Mallick, S.; Schraiber, J.G.; Jay, F.; Prüfer, K.; de Filippo, C.; et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 2012, 338, 222–226.
Prüfer, K.; de Filippo, C.; Grote, S.; Mafessoni, F.; Korlević, P.; Hajdinjak, M.; Vernot, B.; Skov, L.; Hsieh, P.; Peyrégne, S.; et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 2017, 358, 655–658.
Prüfer, K.; Racimo, F.; Patterson, N.; Jay, F.; Sankararaman, S.; Sawyer, S.; Heinze, A.; Renaud, G.; Sudmant, P.H.; de Filippo, C.; et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 2014, 505, 43–49.
Mafessoni, F.; Grote, S.; de Filippo, C.; Slon, V.; Kolobova, K.A.; Viola, B.; Markin, S.V.; Chintalapati, M.; Peyrégne, S.; Skov, L.; et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl. Acad. Sci. USA 2020, 117, 15132–15136.
Robinson, J.T.; Thorvaldsdóttir, H.; Winckler, W.; Guttman, M.; Lander, E.S.; Getz, G.; Mesirov, J.P. Integrative genomics viewer. Nat. Biotechnol. 2011, 29, 24–26.
Thorvaldsdottir, H.; Robinson, J.T.; Mesirov, J.P. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Brief. Bioinform. 2013, 14, 178–192.
Robinson, J.T.; Thorvaldsdóttir, H.; Wenger, A.M.; Zehir, A.; Mesirov, J.P. Variant Review with the Integrative Genomics Viewer. Cancer Res. 2017, 77, e31–e34.
Ng, P.C. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003, 31, 3812–3814.
Pfeifer, B.; Alachiotis, N.; Pavlidis, P.; Schimek, M.G. Genome scans for selection and introgression based on k-nearest neighbour techniques. Mol. Ecol. Resour. 2020, 20, 1597–1609.
Bhatia, G.; Patterson, N.; Pasaniuc, B.; Zaitlen, N.; Genovese, G.; Pollack, S.; Mallick, S.; Myers, S.; Tandon, A.; Spencer, C.; et al. Genome-wide comparison of African-ancestry populations from CARe and other cohorts reveals signals of natural selection. Am. J. Hum. Genet. 2011, 89, 368–381.
Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCFtools. Bioinformatics 2011, 27, 2156–2158.
Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P.I.; Daly, M.J.; et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575.

Figure 1. Schematic representation of basic proline-rich protein genes and encoded proteins: PRB1 (A), PRB2 (B), PRB3 (C), PRB4 (D). For each protein, the genetic allelic variants (S, small; M, medium; L, large; and VL, very large) are shown in the left-hand column; the resulting alternative proteoforms are shown in the right-hand column as blocks, with the corresponding symbol on top. Vertical dashed lines indicate the pro-protein convertase cleavage sites with the positions of the corresponding Arg (R) residues. The P enclosed in a circle denotes phosphorylation sites; amino acid substitutions are shown for selected isoforms. See text for additional details.

Figure 2. Schematic representation of acidic proline-rich proteins (A) and cystatins (B). For each protein, the allelic variants (S, small; M, medium; L, large; and VL, very large) are shown in the left-hand column; the resulting alternative proteoforms are shown in the right-hand column as blocks, with the corresponding symbols on top. All cystatin proteoforms feature two disulfide bridges (indicated by brackets between Cys residues), oxidation (ox) sites, and phosphorylation (P) sites. Vertical dashed lines indicate the pro-protein convertase cleavage sites, with the positions of the corresponding Arg (R) residues. A circled P denotes a phosphorylation site; ox, oxidation site; p-E, N-terminal pyroglutamic acid. Amino acid substitutions are shown for selected isoforms. See text for additional details.

Figure 3. Nucleotide substitutions in salivary protein genes. The pie chart shows the types and counts of the 3472 nucleotide substitutions identified across the 17 tested salivary genes. Of the 428 substitutions found in coding regions, 307 were nonsynonymous. See text for additional details.
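As a quick check on the proportions implied by the counts in this caption, the following minimal sketch computes the coding share of all substitutions and the nonsynonymous share of coding substitutions. Only the three counts come from the caption; the variable names are illustrative.

```python
# Counts taken from the Figure 3 caption.
total_substitutions = 3472
coding_substitutions = 428
nonsynonymous = 307

# Share of all substitutions that fall in coding regions.
coding_share = coding_substitutions / total_substitutions
# Share of coding substitutions that change the encoded amino acid.
nonsyn_share = nonsynonymous / coding_substitutions
# Coding substitutions that are not counted as nonsynonymous.
other_coding = coding_substitutions - nonsynonymous

print(f"coding / all substitutions: {coding_share:.1%}")
print(f"nonsynonymous / coding:     {nonsyn_share:.1%}")
print(f"remaining coding changes:   {other_coding}")
```

On these figures, roughly one substitution in eight lies in a coding region, and about seven in ten coding substitutions are nonsynonymous.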

Figure 4. Predicted archaic hominins’ PRB-1 (panel (a)) and PRB-2 (panels (b–d)) protein variants.

Table 1. Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on PRB1, PRB2, PRB3, and PRB4 gene loci.

Chromosome Position (hg19)	Gene Region	Modern Human	Altai Neanderthal (Variant Frequency ^a)	Chagyrskaya Neanderthal (Variant Frequency ^a)	Vindija Neanderthal (Variant Frequency ^a)	Denisovan (Variant Frequency ^a)	Codon→Amino Acid	SNP id	SNP Total Frequency (ALFA)	SIFT Results (Score)
PRB1 (reverse reading, chromosome 12)
11,507,477	Exon 2 (II-2)	CTT	CTT (100%)	TTT (13%)	TTT (7%) *	CTT (100%)	GAA→E₁₀ AAA→K₁₀	n.a.	n.a.	Damaging (0.02)
11,507,464	Exon 2 (II-2)	AGG	AGG (100%)	AGG (100%)	AAG (12%)	AGG (100%)	UCC→S₁₄ UUC→F₁₄	rs1173856027	A = 0%	Tolerated (0.72)
11,506,888	Exon 3 (II-2)	GGG	GGG (100%)	GGG (100%)	GAG (12%)	GGG (100%)	CCC→P₃₅ CUC→L₃₅	n.a.	n.a.	Tolerated (0.06)
11,506,856	Exon 3 (II-2)	GGG	GGG (100%)	AGG (11%)	GGG (100%)	GGG (100%)	CCC→P₄₅ UCC→S₄₅	rs762910991	A = 0.003%	Tolerated (0.17)
11,506,853	Exon 3 (II-2)	GGT	TGT (3%) *	GGT (100%)	AGT (15%)	GGT (100%)	CCA→P₄₆ UCA→S₄₆	rs745726339	A = 0%	Damaging (0)
11,506,852	Exon 3 (II-2)	GGT	GGT (100%)	GGT (100%)	GAT (11%)	GGT (100%)	CCA→P₄₆ CUA→L₄₆	n.a.	n.a.	Damaging (0)
11,506,804	Exon 3 (II-2)	GTT	GAT (61%)	GAT (63%)	GAT (60%)	GTT (100%)	CAA→Q₆₂ CUA→L₆₂	n.a.	n.a.	Tolerated (0.29)
11,506,801	Exon 3 (II-2)	CCT	CCT (100%)	CTT (11%)	CTT (5%) *	CCT (100%)	GGA→G₆₃ GAA→E₆₃	n.a.	n.a.	Damaging (0.01)
11,506,790	Exon 3 (II-2)	GTT	GTT (100%)	ATT (11%)	ATT (6%) *	GTT (100%)	CAA→Q₆₇ UAA→stop	rs1409612167	A = 0%	Damaging due to stop
11,506,784	Exon 3 (II-2)	CTG	CTG (100%)	CTG (100%)	TTG (13%)	CTG (100%)	GAC→D₆₉ AAC→N₆₉	rs554211998	T = 0%	Tolerated (0.95)
11,506,774	Exon 3 (II-2)	GCT	GTT (13%)	GTT (8%) *	GTT (6%) *	GTT (9%) *	CGA→R₇₂ CAA→Q₇₂	rs202083397	T = 10.6%	Tolerated (0.08)
11,506,766	Exon 3 (II-2)	GCT	GCT (100%)	GCT (100%)	ACT (12%)	GCT (100%)	CGA→R₇₅ UGA→stop	rs766131639	A = 0%	Damaging due to stop
11,506,730	Exon 3 (Ps-2)	GTT	GTT (100%)	ATT (16%)	GTT (100%)	GTT (100%)	CAA→Q₁₂ UAA→stop	n.a.	n.a.	Damaging due to stop
11,506,723	Exon 3 (Ps-2)	CCA	CCA (100%)	CTA (12%)	CTA (3%) *	CCA (100%)	GGU→G₁₄ GAU→D₁₄	rs534597111	T = 0%	NS
11,506,669	Exon 3 (Ps-2)	GGT	GTT (39%)	GTT (36%)	GTT (55%)	GTT (26%)	CCA→P₃₂ CAA→Q₃₂	rs772365043	C = 0%	NS
11,506,618	Exon 3 (Ps-2)	CCT	CCT (100%)	CTT (17%)	CTT (3%) *	CCT (100%)	GGA→G₄₉ GAA→E₄₉	n.a.	n.a.	NS
11,506,612	Exon 3 (Ps-2)	GGG	GGG (100%)	GAG (11%)	GGG (100%)	GGG (100%)	CCC→P₅₁ CUC→L₅₁	n.a.	n.a.	NS
11,506,577	Exon 3 (IB-6)	GGA	GGA (100%)	AGA (13%)	GGA (100%)	GGA (100%)	CCU→P₂ UCU→S₂	n.a.	n.a.	NS
11,506,514	Exon 3 (IB-6)	GGA	GGA (100%)	AGA (6%) *	AGA (11%)	GGA (100%)	CCU→P₂₃ UCU→S₂₃	n.a.	n.a.	NS
11,506,492	Exon 3 (IB-6)	GGT	GGT (100%)	GGT (100%)	GAT (13%)	GGT (100%)	CCA→P₃₀ CUA→L₃₀	n.a.	n.a.	NS
11,506,490	Exon 3 (IB-6)	GGG	AGG (5%) *	AGG (18%)	AGG (8%) *	GGG (100%)	CCC→P₃₁ UCC→S₃₁	n.a.	n.a.	NS
11,506,486	Exon 3 (IB-6)	GGT	GGT (100%)	GGT (100%)	GTT (18%)	GGT (100%)	CCA→P₃₂ CAA→Q₃₂	rs755622101	T = 1.3%	NS
11,506,473	Exon 3 (Ps-2)	TTC	TTG (100%)	TTG (83%) **	TTG (100%) **	TTG (75%) **	AAG→K₃₇ AAC→N₃₇	rs61930109	G = 72.1%	NS
11,506,403	Exon 3 (Ps-2)	AGG	GGG (50%) **	GGG (50%) **	AGG (100%) **	GGG (100%)	UCC→S₅₉ CCC→P₅₉	n.a.	n.a.	NS
11,506,370	Exon 3 (Ps-2)	GGG	GGG (100%)	GGG (100%)	AGG (21%)	GGG (100%)	CCC→P₇₀ UCC→S₇₀	rs774158904	A = 0%	NS
11,506,369	Exon 3 (Ps-2)	GGG	GGG (93%)	GGG (100%)	GAG (16%)	GGG (100%)	CCC→P₇₁ CUC→L₇₁	rs369001998	A = 0.007%	NS
11,506,339	Exon 3 (Ps-2)	GGG	GGG (97%)	GAG (5%) *	GAG (23%)	GGG (100%)	CCC→P₈₁ CUC→L₈₁	n.a.	n.a.	NS
11,506,333	Exon 3 (Ps-2)	GGA	GGA (100%)	GAA (5%) *	GAA (11%)	GGA (100%)	CCU→P₈₃ CUU→L₈₃	n.a.	n.a.	NS
11,506,309	Exon 3 (Ps-2)	GGT	GAT (4%) *	GAT (6%) *	GAT (17%)	GGT (100%)	CCA→P₉₁ CUA→L₉₁	n.a.	n.a.	Damaging (0.01)
11,506,303	Exon 3 (Ps-2)	GGT	GTT (3%) *	GTT (13%)	GGT (100%)	GGT (100%)	CCA→P₉₃ CAA→Q₉₃	rs201682460	T = 2.8%	Damaging (0)
11,506,301	Exon 3 (Ps-2)	GTT	ATT (4%) *	GTT (100%)	ATT (15%)	GTT (100%)	CAA→Q₉₄ UAA→stop	n.a.	n.a.	Damaging due to stop
11,506,285	Exon 3 (Ps-2)	GGA	GGA (100%)	GGA (100%)	GAA (14%)	GGA (100%)	CCU→P₉₉ CUU→L₉₉	n.a.	n.a.	Damaging (0.01)
11,506,283	Exon 3 (Ps-2)	GTT	GTT (100%)	ATT (14%)	ATT (13%)	GTT (100%)	CAA→Q₁₀₀ UAA→stop	n.a.	n.a.	Damaging due to stop
11,506,250	Exon 3 (Ps-2)	GGT	GGT (100%) **	GGT (100%)	AGT (14%)	GGT (100%)	CCA→P₁₁₁ UCA→S₁₁₁	n.a.	n.a.	Tolerated (0.08)
11,506,249	Exon 3 (Ps-2)	GGT	GGT (100%) **	GGT (100%)	GAT (13%)	GGT (100%)	CCA→P₁₁₁ CUA→L₁₁₁	rs1208300501	A = 0%	Tolerated (0.09)
11,506,246	Exon 3 (Ps-2)	GGG	GGG (100%) **	GAG (18%)	GGG (100%)	GGG (100%)	CCC→P₁₁₂ CUC→L₁₁₂	rs1303924609	A = 0%	Damaging (0.02)
11,506,241	Exon 3 (Ps-2)	GTT	GTT (100%) **	GTT (100%)	ATT (14%)	GTT (100%)	CAA→Q₁₁₄ UAA→stop	rs751826141	A = 0%	Damaging due to stop
11,506,217	Exon 3 (IB-6)	CGG	GGG (67%) **	GGG (17%) **	GGG (25%)	CGG (100%)	GCC→A₆₁ CCC→P₆₁	rs771648794	G = 0.04%	Tolerated (1)
11,506,154	Exon 3 (IB-6)	GGG	GGG (100%)	AGG (17%)	AGG (4%) *	GGG (100%)	CCC→P₈₂ UCC→S₈₂	n.a.	n.a.	Tolerated (0.15)
11,506,150	Exon 3 (IB-6)	GGT	GGT (100%)	GAT (14%)	GGT (100%)	GAT (6%) *	CCA→P₈₃ CUA→L₈₃	rs747444571	A = 0%	Damaging (0.03)
11,506,079	Exon 3 (IB-6)	GGA	GGA (100%)	GGA (100%)	AGA (13%)	GGA (100%)	CCU→P₁₀₇ UCU→S₁₀₇	n.a.	n.a.	Tolerated (0.06)
11,506,075	Exon 3 (IB-6)	GGA	GGA (100%)	GGA (100%)	GAA (13%)	GGA (100%)	CCU→P₁₀₈ CUU→L₁₀₈	n.a.	n.a.	Damaging (0.01)
11,506,070	Exon 3 (IB-6)	CCC	CCC (100%)	CCC (100%)	TCC (12%)	CCC (100%)	GGG→G₁₁₀ AGG→R₁₁₀	n.a.	n.a.	Tolerated (0.3)
11,506,057	Exon 3 (IB-6)	AGG	AGG (100%)	AAG (11%)	AAG (5%) *	AGG (100%)	UCC→S₁₁₄ UUC→F₁₁₄	n.a.	n.a.	Damaging (0.03)
11,506,052	Exon 3 (IB-6)	GGA	GGA (100%)	AGA (10%) *	AGA (18%)	GGA (100%)	CCU→P₁₁₆ UCU→S₁₁₆	rs1372423355	A = 0%	Tolerated (0.06)
PRB2 (reverse reading, chromosome 12)
11,548,429	Exon 1 (Signal)	CGG	CGG (100%)	CAG (3%) *	CAG (13%)	CGG (100%)	GCC→A_11(sp) GUC→V_11(sp)	rs1415819382	A = 0%	Damaging (0)
11,547,429	Exon 2 (IB-1)	CCT	TCT (4%) *	CCT (100%)	TCT (12%)	CCT (100%)	GGA→G₁₈ AGA→R₁₈	n.a.	n.a.	Damaging (0.2)
11,546,899	Exon 3 (IB-1)	CCT	CCT (100%)	CTT (11%)	CCT (100%)	CCT (100%)	GGA→G₂₂ GAA→E₂₂	rs188924826	T = 0.007%	Tolerated (0.1)
11,546,894	Exon 3 (IB-1)	GGG	GGG (100%)	AGG (14%)	GGG (100%)	GGG (100%)	CCC→P₂₄ UCC→S₂₄	n.a.	n.a.	Tolerated (0.73)
11,546,872	Exon 3 (IB-1)	GGA	GGA (100%)	GGA (100%)	GAA (11%)	GGA (100%)	CCU→P₃₁ CUU→L₃₁	rs748769813	A = 0%	Tolerated (0.46)
11,546,830	Exon 3 (IB-1)	GGG	GGG (100%)	GAG (9%) *	GAG (17%)	GGG (100%)	CCC→P₄₅ CUC→L₄₅	n.a.	n.a.	Tolerated (0.1)
11,546,828	Exon 3 (IB-1)	GGT	AGT (3%) *	GGT (100%)	AGT (17%)	GGT (100%)	CCA→P₄₆ UCA→S₄₆	rs755161117	A = 0.007%	Tolerated (0.36)
11,546,825	Exon 3 (IB-1)	GTT	GTT (97%)	GTT (100%)	ATT (17%)	GTT (100%)	CAA→Q₄₇ UAA→stop	n.a.	n.a.	Damaging due to stop
11,546,810	Exon 3 (IB-1)	GGA	GGA (100%)	GGA (100%)	AGA (13%)	GGA (100%)	CCU→P₅₂ UCU→S₅₂	rs1347881375	A = 0%	Tolerated (0.97)
11,546,809	Exon 3 (IB-1)	GGA	GGA (100%)	GAA (6%) *	GAA (12%)	GGA (100%)	CCU→P₅₂ CUU→L₅₂	n.a.	n.a.	Tolerated (0.3)
11,546,807	Exon 3 (IB-1)	GTT	GTT (97%)	ATT (11%)	ATT (11%)	GTT (100%)	CAA→Q₅₃ UAA→stop	n.a.	n.a.	Damaging due to stop
11,546,792	Exon 3 (IB-1)	GGA	GGA (100%)	AGA (18%)	GGA (100%)	GGA (100%)	CCU→P₅₈ UCU→S₅₈	n.a.	n.a.	Tolerated (0.76)
11,546,780	Exon 3 (IB-1)	GGT	GGT (100%)	GGT (100%)	AGT (12%)	GGT (100%)	CCA→P₆₂ UCA→S₆₂	n.a.	n.a.	Tolerated (0.64)
11,546,770	Exon 3 (IB-1)	GGT	GGT (100%)	GGT (100%)	GAT (13%)	GGT (100%)	CCA→P₆₅ CUA→L₆₅	n.a.	n.a.	Tolerated (1)
11,546,764	Exon 3 (IB-1)	GGT	GGT (100%)	GGT (96%)	GAT (12%)	GGT (100%)	CCA→P₆₇ CAA→Q₆₇	rs201994479	T = 0.008%	Tolerated (0.43)
11,546,732	Exon 3 (IB-1)	GGA	GGA (100%)	GGA (100%)	AGA (13%)	GGA (100%)	CCU→P₇₈ UCU→S₇₈	n.a.	n.a.	Tolerated (0.38)
11,546,716	Exon 3 (IB-1)	GTT	GAT (4%) *	GAT (14%)	GTT (97%)	GTT (100%)	CAA→Q₈₃ CUA→L₈₃	n.a.	n.a.	Tolerated (0.32)
11,546,686	Exon 3 (IB-1)	GCT	GTT (42%)	GTT (39%)	GTT (51%)	GTT (29%)	CGA→R₉₃ CAA→Q₉₃	rs76832300	n.a.	Tolerated (0.5)
11,546,677	Exon 3 (IB-1)	GCT	GCT (100%)	GCT (100%)	GCT (100%)	GTT (24%)	CGA→R₉₆ CAA→Q₉₆	rs201144571	T = 0.08%	Tolerated (0.47)
11,546,647	Exon 3 (P-J)	GGG	GGG (100%)	GGG (100%)	GAG (15%)	GGG (100%)	CCC→P₁₀ CUC→L₁₀	n.a.	n.a.	Tolerated (0.18)
11,546,642	Exon 3 (P-J)	GTT	GTT (100%)	GTT (100%)	ATT (17%)	GTT (100%)	CAA→Q₁₂ UAA→stop	n.a.	n.a.	Damaging due to stop
11,546,627	Exon 3 (P-J)	GGA	AGA (3%) *	AGA (11%)	AGA (5%) *	GGA (100%)	CCU→P₁₇ UCU→S₁₇	n.a.	n.a.	Tolerated (0.45)
11,546,618	Exon 3 (P-J)	GGA	GGA (100%)	GGA (93%)	AGA (17%)	GGA (100%)	CCU→P₂₀ UCU→S₂₀	n.a.	n.a.	Tolerated (0.81)
11,546,617	Exon 3 (P-J)	GGA	GGA (100%)	GGA (100%)	GAA (17%)	GGA (100%)	CCU→P₂₀ CUU→L₂₀	rs780517289	A = 0%	Tolerated (0.82)
11,546,615	Exon 3 (P-J)	GGT	GGT (100%)	AGT (12%)	AGT (8%) *	GGT (100%)	CCA→P₂₁ UCA→S₂₁	n.a.	n.a.	Tolerated (0.39)
11,546,614	Exon 3 (P-J)	GGT	GGT (100%)	GAT (11%)	GGT (100%)	GGT (100%)	CCA→P₂₁ CUA→L₂₁	n.a.	n.a.	Tolerated (0.29)
11,546,585	Exon 3 (P-J)	GGG	GGG (100%)	GGG (100%)	AGG (13%)	GGG (100%)	CCC→P₃₁ UCC→S₃₁	n.a.	n.a.	Tolerated (0.53)
11,546,581	Exon 3 (P-J)	GGT	GTT (6%) *	GTT (13%)	GGT (100%)	GGT (100%)	CCA→P₃₂ CAA→Q₃₂	n.a.	n.a.	Damaging (0.05)
11,546,566	Exon 3 (P-J)	TTT	TCT (8%) *	TCT (12%)	TTT (100%)	TTT (100%)	AAA→K₃₇ AGA→R₃₇	rs746515947	C = 0%	Tolerated (1)
11,546,462	Exon 3 (IB-8a)	GGG	GGG (100%)	AGG (13%)	GGG (100%)	GGG (100%)	CCC→P₉ UCC→S₉	rs201392419	A = 0%	Tolerated (0.58)
11,546,395	Exon 3 (IB-8a)	GGT	GTT (16%)	GTT (10%) *	GTT (13%)	GTT (4%) *	CCA→P₃₁ CAA→Q₃₁	rs11054277	T = 0.01%	Damaging (0)
11,546,380	Exon 3 (IB-8a)	TTT	TCT (17%)	TCT (14%)	TCT (6%) *	TTT (100%)	AAA→K₃₇ AGA→R₃₇	rs11054276	C = 0.01%	Tolerated (1)
11,546,381	Exon 3 (IB-8a)	TTT	TTT (100%)	CTT (100%)	TTT (100%)	GTT (13%)	AAA→K₃₇ CAA→Q₃₇	rs201455726	G = 0.2%	Tolerated (0.42)
11,546,369	Exon 3 (IB-8a)	GGG	GGG (100%)	AGG (12%)	GGG (100%)	GGG (100%)	CCC→P₄₁ UCC→S₄₁	rs1238238576	A = 0%	Tolerated (0.42)
11,546,347	Exon 3 (IB-8a)	GTT	GAT (6%) *	GAT (4%) *	GAT (15%)	GTT (100%)	CAA→Q₄₈ CUA→L₄₈	n.a.	n.a.	Tolerated (0.32)
11,546,342	Exon 3 (IB-8a)	GGT	GGT (100%)	GGT (100%)	AGT (18%)	GGT (100%)	CCA→P₅₀ UCA→S₅₀	n.a.	n.a.	Tolerated (0.41)
11,546,327	Exon 3 (IB-8a)	CTG	CTG (100%)	TTG (11%)	TTG (18%)	CTG (100%)	GAC→D₅₅ AAC→N₅₅	n.a.	n.a.	Tolerated (0.28)
11,546,314	Exon 3 (IB-8a)	GTT	GCT (87%)	GCT (77%)	GCT (67%)	GCT (94%)	CAA→Q₅₉ CGA→R₅₉	rs34305575	C = 7.6%	Tolerated (0.35)
11,546,309	Exon 3 (IB-8a)	CGG	GGG (12%)	GGG (13%)	GGG (18%)	GGG (5%) *	GCC→A₆₁ CCC→P₆₁	rs201308939	G = 3.8%	Tolerated (0.25)
11,546,305	Exon 3 (IB-8a)	GCT	GTT (3%) *	GCT (100%)	GTT (11%)	GCT (100%)	CGA→R₆₂ CAA→Q₆₂	rs199748368	T = 0.07%	Tolerated (0.46)
11,546,300	Exon 3 (IB-8a)	GGA	GGA (100%)	AGA (13%)	GGA (100%)	GGA (100%)	CCU→P₆₄ UCU→S₆₄	rs755713521	n.a.	Tolerated (0.66)
11,546,294	Exon 3 (IB-8a)	CCT	CCT (100%)	TCT (13%)	CCT (100%)	CCT (100%)	GGA→G₆₆ AGA→R₆₆	n.a.	n.a.	Damaging (0.03)
11,546,279	Exon 3 (IB-8a)	GGT	AGT (2%) *	GGT (100%)	AGT (13%)	GGT (100%)	CCA→P₇₁ UCA→S₇₁	n.a.	n.a.	Tolerated (0.67)
11,546,278	Exon 3 (IB-8a)	GGT	GAT (2%) *	GGT (100%)	GAT (13%)	GGT (100%)	CCA→P₇₁ CUA→L₇₁	rs766408532	n.a.	Tolerated (0.26)
11,546,246	Exon 3 (IB-8a)	GGG	GGG (100%)	GGG (100%)	AGG (14%)	GGG (100%)	CCC→P₈₂ UCC→S₈₂	rs1440556057	A = 0.0004%	Tolerated (0.42)
11,546,245	Exon 3 (IB-8a)	GGG	GGG (97%)	GAG (7%) *	GAG (26%)	GAG (7%) *	CCC→P₈₂ CUC→L₈₂	rs1262267049	A = 0.0004%	Tolerated (0.15)
11,546,213	Exon 3 (IB-8a)	GGG	GGG (100%)	AGG (8%) *	AGG (25%)	GGG (100%)	CCC→P₉₃ UCC→S₉₃	rs1408969762	n.a.	Tolerated (0.26)
11,546,187	Exon 3 (IB-8a)	GTT	GTT (96%)	GTC (10%) *	GTC (12%)	GTC (4%) *	CAA→Q₁₀₁ CAC→H₁₀₁	n.a.	n.a.	Tolerated (0.23)
11,546,161	Exon 3 (IB-8a)	GTT	GAT (21%)	GTT (100%)	GAT (30%)	GTT (100%)	CAA→Q₁₁₀ CUA→L₁₁₀	n.a.	n.a.	Tolerated (0.61)
11,546,089	Exon 3 (P-F)	GGG	GGG (100%)	GAG (17%) **	GAG (17%)	GGG (100%)	CCC→P₁₀ CUC→L₁₀	n.a.	n.a.	Tolerated (0.61)
11,546,084	Exon 3 (P-F)	GTT	GTT (100%)	GTT (100%)	ATT (15%)	GTT (100%)	CAA→Q₁₂ UAA→stop	n.a.	n.a.	Damaging due to stop
11,546,059	Exon 3 (P-F)	GGG	GGG (100%)	GAG (7%) *	GAG (21%)	GGG (100%)	CCC→P₂₀ CUC→L₂₀	n.a.	n.a.	Tolerated (0.19)
11,546,050	Exon 3 (P-F)	GGA	GTA (4%) *	GTA (13%)	GGA (100%)	GTA (7%) *	CCU→P₂₃ CAU→H₂₃	n.a.	n.a.	Tolerated (0.56)
11,546,027	Exon 3 (P-F)	GGG	GGG (100%)	AGG (11%)	AGG (7%) *	GGG (100%)	CCC→P₃₁ UCC→S₃₁	rs1201001162	n.a.	Tolerated (0.61)
11,546,023	Exon 3 (P-F)	GGT	GGT (100%)	GTT (5%) *	GTT (13%)	GTT (4%) *	CCA→P₃₂ CAA→Q₃₂	rs201391404	T = 0.059%	Damaging (0.03)
11,546,009	Exon 3 (P-F)	TTT	TTT (100%)	TTT (100%)	TTT (95%)	GTT (12%)	AAA→K₃₇ CAA→Q₃₇	n.a.	n.a.	Tolerated (0.26)
11,545,975	Exon 3 (P-F)	GTT	GAT (2%) *	GAT (16%)	GAT (33%)	GTT (100%)	CAA→Q₄₈ CUA→L₄₈	n.a.	n.a.	Tolerated (0.31)
11,545,964	Exon 3 (P-F)	GGT	GGT (100%)	CGT (20%)	CGT (22%)	CGT (19%)	CCA→P₅₁ GCA→A₅₁	n.a.	n.a.	Tolerated (0.74)
11,545,904	Exon 3 (P-H)	GGG	GGG (100%)	AGG (3%) *	AGG (11%)	GGG (100%)	CCC→P₁₀ UCC→S₁₀	n.a.	n.a.	Tolerated (0.8)
11,545,868	Exon 3 (P-H)	GGA	GGA (100%)	GGA (100%)	AGA (13%)	GGA (100%)	CCU→P₂₂ UCU→S₂₂	n.a.	n.a.	Tolerated (0.69)
11,545,814	Exon 3 (P-H)	GTC	GTC (100%)	ATC (4%) *	ATC (12%)	GTC (100%)	CAG→Q₄₀ UAG→stop	n.a.	n.a.	Damaging due to stop
11,545,802	Exon 3 (P-H)	GCG	GCG (100%)	GCG (100%)	ACG (11%)	GCG (100%)	CGC→R₄₄ UGC→C₄₄	rs748815572	A = 0%	Tolerated (0.07)
11,545,793	Exon 3 (P-H)	GTT	GTT (100%)	ATT (12%)	GTT (100%)	GTT (100%)	CAA→Q₄₇ UAA→stop	n.a.	n.a.	Damaging due to stop
11,545,790	Exon 3 (P-H)	CCC	CCC (100%)	CCC (100%)	TCC (13%)	CCC (100%)	GGG→G₄₈ AGG→R₄₈	n.a.	n.a.	Tolerated (0.7)
PRB3 (reverse reading, chromosome 12)
11,422,578	Exon 1 (Signal)	CGG	CGG (100%)	CAG (14%)	CAG (3%) *	CGG (100%)	GCC→A_8(sp) GUC→V_8(sp)	rs1337927316	n.a.	Tolerated (0.06)
11,421,578	Exon 2 (Gl-5)	AGG	AGG (100%)	AAG (11%)	AAG (11%)	AGG (100%)	UCC→S₁₄ UUC→F₁₄	n.a.	n.a.	Tolerated (0.32)
11,421,002	Exon 3 (Gl-5)	GGG	GGG (100%)	AGG (11%)	AGG (4%) *	GGG (100%)	CCC→P₄₅ UCC→S₄₅	rs533382585	n.a.	Damaging (0.04)
11,420,989	Exon 3 (Gl-5)	CCG	CCG (100%)	CTG (14%)	CTG (5%) *	CCG (96%)	GGC→G₄₉ GAC→D₄₉	n.a.	n.a.	Damaging (0)
11,420,975	Exon 3 (Gl-5)	CCA	TCA (2%) *	TCA (17%)	CCA (100%)	CCA (100%)	GGU→G₅₄ AGU→S₅₄	rs1197023343	n.a.	Tolerated (0.12)
11,420,974	Exon 3 (Gl-5)	CCA	CCA (100%)	CTA (8%) *	CTA (21%)	CCA (100%)	GGU→G₅₄ GAU→D₅₄	n.a.	n.a.	Tolerated (0.19)
11,420,971	Exon 3 (Gl-5)	GGG	GGG (100%)	GGG (100%)	GAG (11%)	GGG (100%)	CCC→P₅₅ CUC→L₅₅	n.a.	n.a.	Damaging (0.02)
11,420,956	Exon 3 (Gl-5)	CCT	CCT (98%)	CCT (100%)	CTT (14%)	CCT (100%)	GGA→G₆₀ GAA→E₆₀	rs745804122	T = 0%	Tolerated (0.06)
11,420,945	Exon 3 (Gl-5)	CCT	CCT (100%)	CCT (100%)	TCT (14%)	TCT (4%) *	GGA→G₆₄ AGA→R₆₄	rs781151188	T = 0%	Damaging (0.02)
11,420,939	Exon 3 (Gl-5)	GGG	GGG (100%)	AGG (11%) **	AGG (11%)	GGG (100%)	CCC→P₆₆ UCC→S₆₆	n.a.	n.a.	Damaging (0.04)
11,420,927	Exon 3 (Gl-5)	CCT	CCT (100%)	CCT (100%)	TCT (11%)	CCT (100%)	GGA→G₇₀ AGA→R₇₀	n.a.	n.a.	Damaging (0)
11,420,926	Exon 3 (Gl-5)	CCT	CCT (100%)	CCT (100%)	CTT (16%)	CCT (100%)	GGA→G₇₀ GAA→E₇₀	n.a.	n.a.	Damaging (0)
11,420,906	Exon 3 (Gl-5)	GGT	GGT (100%)	GGT (100%)	AGT (12%)	GGT (100%)	CCA→P₇₇ UCA→S₇₇	n.a.	n.a.	Damaging (0.04)
11,420,899	Exon 3 (Gl-5)	GCA	GTA (73%)	GCA (100%)	GTA (65%)	GTA (80%)	CGU→R₇₉ CAU→H₇₉	rs769836435	T = 0.02%	Tolerated (0.59)
11,420,896	Exon 3 (Gl-5)	GGC	GGC (100%)	GGC (100%)	GAC (13%)	GGC (100%)	CCG→P₈₀ CUG→L₈₀	n.a.	n.a.	Tolerated (0.09)
11,420,836	Exon 3 (Gl-5)	GCA	GTA (7%) *	GTA (5%) *	GTA (9%) *	GTA (22%)	CGU→R₁₀₀ CAU→H₁₀₀	n.a.	n.a.	Tolerated (0.24)
11,420,815	Exon 3 (Gl-5)	GGT	GTT (18%)	GGT (100%)	GGT (96%)	GGT (100%)	CCA→P₁₀₇ CAA→Q₁₀₇	rs201963893	T = 0%	Tolerated (0.45)
11,420,803	Exon 3 (Gl-5)	CCT	CCT (100%)	CCT (100%)	CTT (15%)	CCT (100%)	GGA→G₁₁₁ GAA→E₁₁₁	n.a.	n.a.	Tolerated (0.41)
11,420,800	Exon 3 (Gl-5)	CCT	CCT (97%)	CCT (100%)	CTT (11%)	CCT (100%)	GGA→G₁₁₂ GAA→E₁₁₂	n.a.	n.a.	Damaging (0.01)
11,420,780	Exon 3 (Gl-5)	GGC	GGC (100%)	AGC (11%)	GGC (100%)	GGC (100%)	CCG→P₁₁₉ UCG→S₁₁₉	n.a.	n.a.	Damaging (0.04)
11,420,779	Exon 3 (Gl-5)	GGC	GAC (4%) *	GAC (6%) *	GAC (35%)	GGC (100%)	CCG→P₁₁₉ CUG→L₁₁₉	n.a.	n.a.	Damaging (0.03)
11,420,728	Exon 3 (Gl-5)	AGG	AAG (4%) *	AGG (100%)	AAG (11%)	AGG (100%)	UCC→S₁₃₆ UUC→F₁₃₆	n.a.	n.a.	Damaging (0.04)
11,420,716	Exon 3 (Gl-5)	GGC	GAC (4%) *	GGC (100%)	GAC (17%)	GGC (100%)	CCG→P₁₄₀ CUG→L₁₄₀	n.a.	n.a.	Tolerated (0.12)
11,420,687	Exon 3 (Gl-5)	GGG	GGG (98%)	AGG (15%)	GGG (100%)	GGG (100%)	CCC→P₁₅₀ UCC→S₁₅₀	n.a.	n.a.	Tolerated (0.15)
11,420,686	Exon 3 (Gl-5)	GGG	GGG (98%)	GAG (8%) *	GAG (18%)	GGG (100%)	CCC→P₁₅₀ CUC→L₁₅₀	n.a.	n.a.	Tolerated (0.15)
11,420,614	Exon 3 (Gl-2)	CCT	CCT (100%)	CCT (100%)	CTT (11%)	CCT (100%)	GGA→G₁₃₂ GAA→E₁₃₂	rs768625455	n.a.	NS
11,420,597	Exon 3 (Gl-2)	CCA	CCA (100%)	CCA (100%)	TCA (13%)	CCA (100%)	GGU→G₁₃₈ AGU→S₁₃₈	rs780713977	n.a.	Tolerated (0.09)
11,420,588	Exon 3 (Gl-2)	GGA	AGA (4%) *	AGA (10%) *	AGA (16%)	GGA (100%)	CCU→P₁₄₁ UCU→S₁₄₁	n.a.	n.a.	Tolerated (0.78)
11,420,495	Exon 3 (Gl-2)	GGT	AGT (12%)	AGT (3%) *	AGT (6%) *	AGT (14%)	CCA→P₁₇₂ UCA→S₁₇₂	n.a.	n.a.	Tolerated (0.14)
11,420,308	Exon 4 (Gl-2)	GGG	GGG (100%)	AGG (17%)	GGG (100%)	GGG (100%)	CCC→P₂₃₄ UCC→S₂₃₄	rs760324380	A = 0.0008%	Tolerated (0.09)
11,420,307	Exon 4 (Gl-2)	GGG	GGG (100%)	GAG (12%)	GGG (100%)	GGG (100%)	CCC→P₂₃₄ CUC→L₂₃₄	n.a.	n.a.	Damaging (0.03)
11,420,304	Exon 4 (Gl-2)	GGT	GGT (100%)	GAT (12%)	GGT (100%)	GGT (100%)	CCA→P₂₃₅ CUA→L₂₃₅	n.a.	n.a.	Damaging (0.01)
11,420,281	Exon 4 (Gl-2)	GCA	GCA (100%)	ACA (13%)	ACA (10%) *	GCA (100%)	CGU→R₂₄₃ UGU→C₂₄₃	rs758570507	A = 0%	Damaging (0.05)
11,420,278	Exon 4 (Gl-2)	GGG	GGG (100%)	GGG (100%)	AGG (11%)	GGG (100%)	CCC→P₂₄₄ UCC→S₂₄₄	n.a.	n.a.	Tolerated (0.27)
11,420,182	Exon 4 (Gl-2)	GGT	GGT (100%)	GGT (100%)	AGT (11%)	GGT (100%)	CCA→P₂₇₇ UCA→S₂₇₇	rs755939114	A = 0%	Tolerated (0.06)
11,420,170	Exon 4 (Gl-2)	CCC	CCC (100%)	CCC (100%)	TCC (11%)	CCC (100%)	GGG→G₂₈₀ AGG→R₂₈₀	n.a.	n.a.	Tolerated (0.07)
11,420,161	Exon 4 (Gl-2)	GGT	GGT (100%)	GGT (100%)	AGT (13%)	GGT (100%)	CCA→P₂₈₃ UCA→S₂₈₃	n.a.	n.a.	Tolerated (0.21)
11,420,160	Exon 4 (Gl-2)	GGT	GGT (100%)	GGT (100%)	GAT (19%)	GGT (100%)	CCA→P₂₈₃ CUA→L₂₈₃	n.a.	n.a.	Tolerated (0.09)
11,420,154	Exon 4 (Gl-2)	TCT	TTT (3%) *	TCT (100%)	TTT (11%)	TCT (100%)	AGA→R₂₈₅ AAA→K₂₈₅	n.a.	n.a.	Tolerated (0.63)
PRB4 (reverse reading, chromosome 12)
11,463,280	Exon 1 (PGA)	TCA	TGA (100%)	TGA (100%)	TGA (97%)	TGA (100%)	AGU→S₂ ACU→T₂	n.a.	n.a.	Tolerated (0.83)
11,461,801	Exon 3 (PGA)	GCT	GCT (98%)	GCT (97%)	GTT (13%)	GCT (100%)	CGA→R₂₃ CAA→Q₂₃	n.a.	n.a.	Tolerated (0.57)
11,461,772	Exon 3 (PGA)	GCA	GCA (100%)	GCA (96%)	ACA (12%)	GCA (100%)	CGU→R₃₃ UGU→C₃₃	rs77775235	A = 0%	Tolerated (0.06)
11,461,769	Exon 3 (PGA)	GGG	TGG (5%) *	TGG (9%) *	TGG (5%) *	TGG (13%)	CCC→P₃₄ ACC→T₃₄	rs144658455	T = 0%	Tolerated (0.53)
11,461,745	Exon 3 (PGA)	GTT	CTT (8%) *	CTT (8%) *	CTT (5%) *	CTT (12%)	CAA→Q₄₂ GAA→E₄₂	rs76859544	C = 6.8%	Tolerated (1)
11,461,742	Exon 3 (PGA)	CCT	TCT (10%) *	TCT (27%)	TCT (11%)	TCT (7%) *	GGA→G₄₃ AGA→R₄₃	rs776943151	T = 0.05%	Tolerated (0.45)
11,461,706	Exon 3 (PGA)	GGG	TGG (14%)	TGG (23%)	TGG (13%)	TGG (20%)	CCC→P₅₅ ACC→T₅₅	rs12308381	T = 21.6%	Tolerated (0.12)
11,461,675	Exon 3 (PGA)	GCT	GGT (1%) *	GGT (2%) *	GGT (2%) *	GGT (28%)	CGA→R₆₅ CCA→P₆₅	rs75743553	G = 0%	Tolerated (0.32)
11,461,673	Exon 3 (PGA)	GGG	GGG (99%)	AGG (13%)	AGG (2%) *	GGG (100%)	CCC→P₆₆ UCC→S₆₆	rs1332850459	A = 0%	Tolerated (0.25)
11,461,580	Exon 3 (PGA)	TGG	GGG (65%)	GGG (52%)	GGG (24%)	GGG (54%)	ACC→T₉₇ CCC→P₉₇	n.a.	n.a.	Tolerated (0.81)
11,461,570	Exon 3 (PGA)	GGA	GTA (51%)	GTA (54%)	GTA (8%) *	GTA (47%)	CCU→P₁₀₀ CAU→H₁₀₀	n.a.	n.a.	Tolerated (0.59)
11,461,553	Exon 3 (PGA)	TCT	CCT (13%)	CCT (15%)	TCT (100%)	CCT (24%)	AGA→R₁₀₆ GGA→G₁₀₆	n.a.	n.a.	Tolerated (0.84)
11,461,550	Exon 3 (PGA)	GGT	GGT (100%)	AGT (17%)	GGT (100%)	GGT (100%)	CCA→P₁₀₇ UCA→S₁₀₇	n.a.	n.a.	Tolerated (0.50)
11,461,549	Exon 3 (PGA)	GGT	GCT (13%)	GCT (6%) *	GGT (100%)	GCT (13%)	CCA→P₁₀₇ CGA→R₁₀₇	n.a.	n.a.	Tolerated (0.9)
11,461,525	Exon 3 (PGA)	AGG	AGG (100%)	AAG (100%)	AAG (100%)	AGG (100%)	UCC→S₁₁₅ UUC→F₁₁₅	n.a.	n.a.	Damaging (0.04)
11,461,513	Exon 3 (PGA)	GGT	GGT (100%)	GAT (10%) *	GAT (11%)	GGT (100%)	CCA→P₁₁₉ CUA→L₁₁₉	n.a.	n.a.	Damaging (0.04)
11,461,471	Exon 3 (PGA)	CCA	CCA (100%)	CTA (4%) *	CTA (14%)	CCA (100%)	GGU→G₁₃₃ GAU→D₁₃₃	n.a.	n.a.	Tolerated (0.46)
11,461,421	Exon 3 (PGA)	GGG	GGG (100%)	AGG (5%) *	AGG (6%) *	AGG (100%)	CCC→P₁₅₀ UCC→S₁₅₀	n.a.	n.a.	Tolerated (0.18)
11,461,420	Exon 3 (PGA)	GGG	GGG (100%)	GAG (11%)	GGG (100%)	GGG (100%)	CCC→P₁₅₀ CUC→L₁₅₀	n.a.	n.a.	Tolerated (0.1)
11,461,412	Exon 3 (PGA)	CTT	CTT (100%)	TTT (14%)	CTT (100%)	CTT (100%)	GAA→E₁₅₃ AAA→K₁₅₃	n.a.	n.a.	Tolerated (0.85)
11,461,319	Exon 4 (P-D P32A)	GGA	GGA (97%)	AGA (9%) *	AGA (11%)	GGA (100%)	CCU→P₂₃ UCU→S₂₃	n.a.	n.a.	Tolerated (0.55)
11,461,309	Exon 4 (P-D P32A)	GGT	GGT (100%)	GGT (100%)	GAT (11%)	GGT (100%)	CCA→P₂₆ CUA→L₂₆	n.a.	n.a.	Damaging (0.01)
11,461,229	Exon 4 (P-D P32A)	GGA	GGA (100%)	AGA (13%)	AGA (4%) *	GGA (100%)	CCU→P₅₄ UCU→S₅₄	n.a.	n.a.	Tolerated (0.13)

^a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10%; ** counts < 10; n.a.: not available; NS: not scored. The variants fixed at 100% in modern humans compared with ancient hominins are highlighted in light orange. The genomic variants whose frequencies show a different geographic distribution among humans are in red text.
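The "Codon→Amino Acid" column can be reproduced from the genomic triplets with a short sketch. It assumes, as the rows above suggest, that for genes marked "reverse reading" the mRNA codon is the base-wise complement of the listed triplet, while for "direct reading" genes the triplet is used as is. The function names `triplet_to_codon` and `describe_change` are hypothetical and not part of the study's pipeline.

```python
from itertools import product

# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def triplet_to_codon(triplet, reverse_reading):
    """Convert a genomic triplet, as listed in the table, to an mRNA codon."""
    dna = triplet.upper()
    if reverse_reading:
        # Assumption: the table lists reverse-strand triplets so that the
        # base-wise complement gives the codon in reading order.
        dna = dna.translate(COMPLEMENT)
    return dna.replace("T", "U")


def describe_change(ref_triplet, alt_triplet, position, reverse_reading=True):
    """Return a label such as 'E10K', or 'Q67*' for a stop codon."""
    ref_aa = CODON_TABLE[triplet_to_codon(ref_triplet, reverse_reading)]
    alt_aa = CODON_TABLE[triplet_to_codon(alt_triplet, reverse_reading)]
    return f"{ref_aa}{position}{alt_aa}"


# First PRB1 row: CTT -> TTT on the reverse strand, residue 10.
print(describe_change("CTT", "TTT", 10))
# A PRB1 stop-gain row: GTT -> ATT, residue 67.
print(describe_change("GTT", "ATT", 67))
```

Residue positions are passed in as given in the table; the sketch does not derive them from genomic coordinates.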

Table 2. Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on PRH2, HTN1, HTN3, AMY1A, STATH, and SMR3B gene loci.

Chromosome Position (hg19)	Gene Region	Modern Human	Altai Neanderthal (Variant Frequency ^a)	Chagyrskaya Neanderthal (Variant Frequency ^a)	Vindija Neanderthal (Variant Frequency ^a)	Denisovan (Variant Frequency ^a)	Codon→Amino Acid	SNP id	SNP Total Frequency (ALFA)	SIFT Results (Score)
PRH2 (direct reading, chromosome 12)
11,082,885	Exon 2 (PRP-1)	GTT	ATT (2%) *	ATT (12%)	ATT (4%) *	GTT (100%)	GUU→V₁₂ AUU→I₁₂	rs776898585	A = 0%	NS
11,082,894	Exon 2 (PRP-1)	GTA	GTA (100%)	ATA (12%)	ATA (10%) *	GTA (100%)	GUA→V₁₅ AUA→I₁₅	n.a.	n.a.	Tolerated (0.26)
11,083,305	Exon 3 (PRP-1)	CCA	CCA (98%)	TCA (14%)	TCA (14%)	CCA (100%)	CCA→P₃₃ UCA→S₃₃	n.a.	n.a.	Tolerated (0.07)
11,083,318	Exon 3 (PRP-1)	GGA	GGA (100%)	GAA (14%)	GGA (100%)	GGA (100%)	GGA→G₃₇ GAA→E₃₇	n.a.	n.a.	Tolerated (0.07)
11,083,323	Exon 3 (PRP-1)	CAA	CAA (100%)	TAA (8%) *	TAA (12%)	CAA (100%)	CAA→Q₃₉ UAA→stop	n.a.	n.a.	Damaging due to stop
11,083,426	Exon 3 (PRP-1)	GGA	GGA (100%)	GGA (100%)	GAA (11%)	GGA (100%)	GGA→G₇₃ GAA→E₇₃	n.a.	n.a.	Damaging (0.02)
11,083,431	Exon 3 (PRP-1)	CCA	CCA (100%)	TCA (13%)	TCA (8%) *	TCA (6%) *	CCA→P₇₅ UCA→S₇₅	n.a.	n.a.	Tolerated (0.23)
11,083,452	Exon 3 (PRP-1)	GGA	GGA (100%)	AGA (6%) *	AGA (14%)	GGA (100%)	GGA→G₈₂ AGA→R₈₂	n.a.	n.a.	Damaging (0.01)
11,083,455	Exon 3 (PRP-1)	GGC	GGC (100%)	AGC (17%)	GGC (100%)	GGC (100%)	GGC→G₈₃ AGC→S₈₃	n.a.	n.a.	NS
11,083,488	Exon 3 (PRP-1)	GGA	GGA (100%)	GGA (100%)	AGA (11%)	GGA (100%)	GGA→G₉₄ AGA→R₉₄	n.a.	n.a.	Damaging (0.04)
11,083,531	Exon 3 (PRP-1)	AGG	AGG (100%)	AGG (100%)	AAG (18%)	AGG (100%)	AGG→R₁₀₈ AAG→K₁₀₈	n.a.	n.a.	NS
11,083,536	Exon 3 (PRP-1)	CAA	CAA (100%)	TAA (11%)	CAA (100%)	CAA (100%)	CAA→Q₁₁₀ UAA→stop	n.a.	n.a.	NS
11,083,545	Exon 3 (PRP-1)	CCC	CCC (100%)	TCC (12%)	TCC (6%) *	CCC (100%)	CCC→P₁₁₃ UCC→S₁₁₃	rs1289206423	T = 0%	NS
11,083,551	Exon 3 (PRP-1)	CAG	CAG (97%)	CAG (100%)	TAG (13%)	CAG (100%)	CAG→Q₁₁₅ UAG→stop	n.a.	n.a.	NS
11,083,570	Exon 3 (PRP-1)	GGT	GGT (100%)	GAT (18%)	GGT (100%)	GGT (100%)	GGU→G₁₂₁ GAU→D₁₂₁	n.a.	n.a.	NS
11,083,575	Exon 3 (PRP-1)	CCC	CCC (96%)	TCC (8%) *	TCC (15%)	CCC (100%)	CCC→P₁₂₃ UCC→S₁₂₃	n.a.	n.a.	NS
11,083,581	Exon 3 (PRP-1)	CCT	CCT (100%)	TCT (20%)	TCT (8%) *	CCT (100%)	CCU→P₁₂₅ UCU→S₁₂₅	n.a.	n.a.	NS
11,083,582	Exon 3 (PRP-1)	CCT	CCT (100%)	CTT (13%)	CTT (8%) *	CCT (100%)	CCU→P₁₂₅ CUU→L₁₂₅	n.a.	n.a.	NS
11,083,605	Exon 3 (PRP-1)	CCA	CCA (100%)	TCA (11%)	CCA (100%)	CCA (100%)	CCA→P₁₃₃ UCA→S₁₃₃	rs1343870622	T = 0%	NS
11,083,618	Exon 3 (PRP-1)	GGG	GGG (100%)	GAG (11%)	GGG (100%)	GGG (100%)	GGG→G₁₃₇ GAG→E₁₃₇	n.a.	n.a.	NS
11,083,635	Exon 3 (PRP-1)	CCT	CCT (100%)	CCT (100%)	TCT (16%)	CCT (100%)	CCU→P₁₄₃ UCU→S₁₄₃	n.a.	n.a.	NS
11,083,636	Exon 3 (PRP-1)	CCT	CCT (100%)	CCT (100%)	CTT (11%)	CCT (100%)	CCU→P₁₄₃ CUU→L₁₄₃	n.a.	n.a.	NS
11,083,663	Exon 3 (C-term removal)	TCT	TCT (100%)	TCT (100%)	TTT (17%)	TCT (100%)	UCU→S_152(rem) UUU→F_152(rem)	rs746351335	n.a.	NS
HTN1 (direct reading, chromosome 4)
70,920,165	Exon 4	CAT	CAT (100%)	TAT (2%) *	TAT (13%)	CAT (100%)	CAU→H₁₅ UAU→Y₁₅	n.a.	n.a.	Tolerated (0.37)
70,921,215	Exon 5	GAA	GAA (100%)	AAA (3%) *	AAA (11%)	GAA (100%)	GAA→E₁₆ AAA→K₁₆	n.a.	n.a.	NS
70,921,234	Exon 5	CGA	CAA (2%) *	CAA (58%)	CAA (3%) *	CGA (100%)	CGA→R₃₂ CAA→Q₃₂	rs375127098	A = 0.014%	NS
HTN3 (direct reading, chromosome 4)
70,896,460	Exon 2 (Signal)	ATG	ATG (100%)	ATA (11%)	ATG (100%)	ATG (100%)	AUG→M_0(sp) AUA→I_0(sp)	n.a.	n.a.	NS
70,897,696	Exon 3 (Signal)	GGA	GGA (100%)	AGA (12%)	AGA (4%) *	GGA (100%)	GGA→G_17(sp) AGA→R_17(sp)	rs1254624179	n.a.	NS
AMY1A (reverse reading, chromosome 1)
104,238,248	Exon 2 (Signal)	ACC	ACC (100%)	ACC (100%)	ATC (15%)	ACC (100%)	UGG→W_4(sp) UAG→stop	n.a.	n.a.	Damaging due to stop
104,238,189	Exon 2	GCT	GCT (100%)	ACT (13%)	ACT (20%) **	GCT (100%)	CGA→R₁₀ UGA→stop	n.a.	n.a.	Damaging due to stop
104,237,696	Exon 3	ACC	ACC (100%)	ACC (100%)	ATC (17%)	ACC (100%)	UGG→W₅₉ UAG→stop	n.a.	n.a.	Damaging due to stop
104,237,685	Exon 3	GTT	GTT (100%)	GTT (100%)	ATT (14%)	GTT (100%)	CAA→Q₆₃ UAA→stop	n.a.	n.a.	Damaging due to stop
104,237,626	Exon 3	TAC	TAC (100%)	TAC (100%)	TAT (15%)	TAC (100%)	AUG→M₈₂ AUA→I₈₂	n.a.	n.a.	Damaging (0.01)
104,236,795	Exon 4	GCA	GCA (100%)	GCA (100%)	ACA (13%)	GCA (100%)	CGU→R₉₂ UGU→C₉₂	n.a.	n.a.	Damaging (0)
104,236,666	Exon 4	CTA	CTA (100%)	CTA (100%)	TTA (11%)	CTA (100%)	GAU→D₁₃₅ AAU→N₁₃₅	n.a.	n.a.	Tolerated (0.08)
104,236,654	Exon 4	CCA	CCA (100%)	TCA (5%) *	TCA (11%)	CCA (100%)	GGU→G₁₃₉ AGU→S₁₃₉	n.a.	n.a.	Tolerated (0.6)
104,236,152	Exon 5	CAG	CAG (100%)	TAG (15%)	TAG (20%)	CAG (100%)	GUC→V₁₅₇ AUC→I₁₅₇	n.a.	n.a.	Tolerated (0.17)
104,236,146	Exon 5	CTA	CTA (100%)	TTA (8%) *	TTA (12%)	CTA (100%)	GAU→D₁₅₉ AAU→N₁₅₉	n.a.	n.a.	Tolerated (1)
104,236,139	Exon 5	GCA	GTA (4%) *	GTA (7%) *	GTA (12%)	GCA (100%)	CGU→R₁₆₁ CAU→H₁₆₁	n.a.	n.a.	Damaging (0.01)
104,236,080	Exon 5	CTT	CTT (100%)	CTT (100%)	TTT (13%)	CTT (100%)	GAA→E₁₈₁ AAA→K₁₈₁	n.a.	n.a.	Tolerated (0.11)
104,235,996	Exon 5	CGT	CGT (96%)	CGT (100%)	TGT (13%)	CGT (100%)	GCA→A₂₀₉ ACA→T₂₀₉	n.a.	n.a.	Tolerated (0.27)
104,235,164	Exon 6	CTC	CTC (100%)	CTC (100%)	TTC (11%)	CTC (100%)	GAG→E₂₄₀ AAG→K₂₄₀	n.a.	n.a.	Damaging (0.01)
104,235,148	Exon 6	TCA	TCA (100%)	TCA (100%)	TTA (18%)	TCA (100%)	AGU→S₂₄₅ AAU→N₂₄₅	n.a.	n.a.	Tolerated (0.52)
104,235,083	Exon 6	GCG	ACG (3%) *	ACG (6%) *	ACG (12%)	GCG (100%)	CGC→R₂₆₇ UGC→C₂₆₇	n.a.	n.a.	Damaging (0)
104,234,224	Exon 7	CCT	CCT (100%)	CCT (100%)	CTT (13%)	CCT (100%)	GGA→G₂₈₁ GAA→E₂₈₁	n.a.	n.a.	Damaging (0)
104,234,218	Exon 7	CCA	CCA (100%)	CTA (13%)	CTA (15%)	CCA (100%)	GGU→G₂₈₃ GAU→D₂₈₃	n.a.	n.a.	Tolerated (0.25)
104,234,129	Exon 7	GAA	GAA (100%)	AAA (13%)	GAA (100%)	GAA (100%)	CUU→L₃₁₃ UUU→F₃₁₃	n.a.	n.a.	Damaging (0)
104,234,125	Exon 7	TGG	TGG (100%)	TAG (17%)	TGG (100%)	TGG (100%)	ACC→T₃₁₄ AUC→I₃₁₄	n.a.	n.a.	Damaging (0)
104,233,978	Exon 8	GGA	GGA (100%)	AGA (13%)	AGA (11%)	GGA (100%)	CCU→P₃₃₂ UCU→S₃₃₂	n.a.	n.a.	Damaging (0.05)
104,233,977	Exon 8	GGA	GGA (100%)	GAA (6%) *	GAA (11%)	GGA (100%)	CCU→P₃₃₂ CUU→L₃₃₂	n.a.	n.a.	Damaging (0)
104,233,963	Exon 8	GCT	GCT (100%)	GCT (100%)	ACT (14%)	GCT (100%)	CGA→R₃₃₇ UGA→stop	rs19955486	A = 0.08%	Damaging due to stop
104,231,858	Exon 9	ACA	ACA (100%)	ACA (100%)	ATA (11%)	ACA (100%)	UGU→C₃₇₈ UAU→Y₃₇₈	n.a.	n.a.	Damaging (0)
104,231,680	Exon 10	CAC	CAC (100%)	TAC (4%) *	TAC (20%)	CAC (100%)	GUG→V₄₀₁ AUG→M₄₀₁	n.a.	n.a.	Damaging (0)
104,231,643	Exon 10	CCC	CCC (100%)	CTC (5%) *	CTC (11%)	CCC (100%)	GGG→G₄₁₃ GAG→E₄₁₃	n.a.	n.a.	Damaging (0.02)
104,231,622	Exon 10	CCC	CCC (100%)	CCC (100%)	CTC (13%)	CCC (100%)	GGG→G₄₂₀ GAG→E₄₂₀	n.a.	n.a.	Tolerated (0.08)
104,230,237	Exon 11	TGA	TGA (100%)	TGA (100%)	TAA (13%)	TGA (100%)	ACU→T₄₄₂ AUU→I₄₄₂	n.a.	n.a.	Damaging (0)
104,230,129	Exon 11	AGA	AGA (100%)	AGA (100%)	AAA (13%)	AGA (100%)	UCU→S₄₇₈ UUU→F₄₇₈	n.a.	n.a.	Tolerated (0.62)
STATH (direct reading, chromosome 4)
70,866,583	Exon 5	GGG	GGG (100%)	AGG (13%)	AGG (3%) *	GGG (100%)	GGG→G₁₇ AGG→R₁₇	n.a.	n.a.	n.a.
70,866,616	Exon 5	CCA	CCA (98%)	CCA (100%)	TCA (11%)	TCA (3%) *	CCA→P₂₈ UCA→S₂₈	n.a.	n.a.	n.a.
70,866,626	Exon 5	CCA	CCA (100%)	CTA (15%)	CCA (100%)	CCA (96%)	CCA→P₃₁ CUA→L₃₁	n.a.	n.a.	n.a.
70,866,628	Exon 5	CAA	CAA (100%)	TAA (15%)	CAA (100%)	CAA (100%)	CAA→Q₃₂ UAA→stop	n.a.	n.a.	Damaging due to stop
SMR3B (direct reading, chromosome 4)
71,255,405	Exon 3	AGG	AGG (100%)	AGG (100%)	AAG (12%)	AGG (100%)	AGG→R₅ AAG→K₅	rs777831757	A = 0%	NS
71,255,444	Exon 3	CCT	CCT (100%)	CTT (12%)	CTT (3%) *	CCT (100%)	CCU→P₁₈ CUU→L₁₈	n.a.	n.a.	NS
71,255,495	Exon 3	GGG	GGG (100%)	GGG (94%)	GAG (17%)	GGG (100%)	GGG→G₃₅ GAG→E₃₅	n.a.	n.a.	NS

^a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available; NS: not scored.
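The flagging rule in this footnote can be sketched as follows. The sketch assumes that "counts" means the total number of reads covering the site and that the low-coverage flag takes precedence over the low-frequency flag; neither point is stated explicitly in the footnote. The read counts in the examples are invented for illustration, since the tables report percentages only, and both function names are hypothetical.

```python
def variant_frequency_flag(variant_reads, depth):
    """Return (frequency in percent, flag) for one site in one genome."""
    if depth <= 0:
        raise ValueError("depth must be a positive read count")
    frequency = 100 * variant_reads / depth
    if depth < 10:
        # Assumption: '**' marks sites covered by fewer than 10 reads.
        flag = "**"
    elif frequency <= 10:
        # '*' marks variants seen in at most 10% of the reads.
        flag = "*"
    else:
        flag = ""
    return frequency, flag


def format_cell(triplet, variant_reads, depth):
    """Format a table cell such as 'TTT (7%) *'."""
    frequency, flag = variant_frequency_flag(variant_reads, depth)
    cell = f"{triplet} ({frequency:.0f}%)"
    return f"{cell} {flag}" if flag else cell


# Illustrative read counts only.
print(format_cell("TTT", 2, 15))
print(format_cell("TTT", 1, 15))
print(format_cell("GGG", 3, 6))
```

Keeping the flag separate from the percentage makes it easy to filter out low-confidence calls before comparing genomes.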

Table 3. Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on CST1, CST2, CST3, CST4, CST5, CSTA, and CSTB gene loci.

Chromosome Position (hg19)	Gene Region	Modern Human	Altai Neanderthal (Variant Frequency ^a)	Chagyrskaya Neanderthal (Variant Frequency ^a)	Vindija Neanderthal (Variant Frequency ^a)	Denisovan (Variant Frequency ^a)	Codon→Amino Acid	SNP id	SNP Total Frequency (ALFA)	SIFT Results (Score)
CST1 (reverse reading, chromosome 20)
23,731,494	Exon 1 (Signal)	ATA	GTA (100%)	GTA (95%)	GTA (100%)	GTA (100%)	UAU→Y_3(sp) CAU→H_3(sp)	rs6076122	G = 71.1%	Tolerated (0.11)
23,731,463	Exon 1 (Signal)	TGG	TAG (2%) *	TAG (13%)	TAG (5%) *	TGG (100%)	ACC→T_13(sp) AUC→I_13(sp)	n.a.	n.a.	Tolerated (0.39)
23,731,455	Exon 1 (Signal)	CAC	CAC (100%)	CAC (100%)	TAC (16%)	CAC (100%)	GUG→V_16(sp) AUG→M_16(sp)	n.a.	n.a.	Tolerated (0.23)
23,731,446	Exon 1 (Signal)	CGG	CGG (100%)	CGG (100%)	TGG (11%)	CGG (100%)	GCC→A_19(sp) ACC→T_19(sp)	rs1425228752	T = 0.001%	Damaging (0.01)
23,731,439	Exon 1	TCG	TCG (100%)	TTG (6%) *	TTG (14%)	TCG (100%)	AGC→S₂ AAC→N₂	n.a.	n.a.	Tolerated (0.15)
23,731,428	Exon 1	CTC	CTC (100%)	CTC (100%)	TTC (21%)	CTC (100%)	GAG→E₆ AAG→K₆	rs1292698911	T = 0.0004%	Tolerated (0.66)
23,731,394	Exon 1	CGT	CGT (100%)	CAT (13%)	CGT (100%)	CGT (100%)	GCA→A₁₇ GUA→V₁₇	n.a.	n.a.	Tolerated (0.25)
23,731,344	Exon 1	CTC	TTC (3%) *	CTC (100%)	TTC (11%)	TTC (3%) *	GAG→E₃₄ AAG→K₃₄	rs368203290	T = 0.008%	Tolerated (0.07)
23,731,307	Exon 1	GCA	GCA (100%)	GTA (14%)	GCA (100%)	GTA (6%) *	CGU→R₄₆ CAU→H₄₆	rs758187154	T = 0%	Damaging (0.01)
23,731,281	Exon 1	GTT	GTT (100%)	GTT (100%)	ATT (13%)	GTT (100%)	CAA→Q₅₅ UAA→stop	n.a.	n.a.	Damaging due to stop
23,729,759	Exon 2	CCC	CCC (100%)	CCC (100%)	CGC (26%)	CCC (100%)	GGG→G₅₉ GCG→A₅₉	n.a.	n.a.	Tolerated (1)
23,729,700	Exon 2	GGG	GGG (100%)	GGG (100%)	AGG (11%)	GGG (100%)	CCC→P₇₉ UCC→S₇₉	n.a.	n.a.	Tolerated (0.38)
23,729,699	Exon 2	GGG	GGG (100%)	GAG (3%) *	GAG (11%)	GGG (100%)	CCC→P₇₉ CUC→L₇₉	rs756782667	A = 0%	Tolerated (0.06)
23,729,687	Exon 2	TGG	TGG (100%)	TAG (16%)	TAG (4%) *	TGG (100%)	ACC→T₈₃ AUC→I₈₃	n.a.	n.a.	Damaging (0.02)
23,728,503	Exon 3	GGG	GGG (100%)	AGG (11%)	AGG (3%) *	GGG (100%)	CCC→P₁₀₆ UCC→S₁₀₆	rs754531104	A = 0.004%	Tolerated (0.09)
23,728,494	Exon 3 (Cys-SN)	TTG	CTG (10%) *	CTG (11%)	CTG (14%)	CTG (4%) *	AAC→N₁₀₉ GAC→D₁₀₉	rs3188319	C = 0.004%	Tolerated (1)
23,728,490	Exon 3	TCT	TTT (2%) *	TTT (14%)	TCT (100%)	TCT (100%)	AGA→R₁₁₀ AAA→K₁₁₀	n.a.	n.a.	Tolerated (1)
23,728,487	Exon 3	TCC	TCC (100%)	TTC (13%)	TTC (7%) *	TCC (100%)	AGG→R₁₁₁ AAG→K₁₁₁	rs3188320	T = 0%	Tolerated (0.85)
CST2 (reverse reading, chromosome 20)
23,807,260	Exon 1 (Signal)	CGG	CGG (100%)	CGG (100%)	CAG (14%)	CGG (100%)	GCC→A_12(sp) GUC→V_12(sp)	rs1411653443	A = 0.007%	Damaging (0.02)
23,807,257	Exon 1 (Signal)	TGG	TGG (100%)	TAG (14%)	TGG (100%)	TGG (100%)	ACC→T_13(sp) AUC→I_13(sp)	n.a.	n.a.	Tolerated (0.43)
23,807,245	Exon 1 (Signal)	CGG	CGG (100%)	CAG (14%)	CGG (100%)	CGG (100%)	GCC→A_17(sp) GUC→V_17(sp)	n.a.	n.a.	Tolerated (0.1)
23,807,231	Exon 1	GGG	GGG (100%)	AGG (14%)	AGG (8%) *	GGG (100%)	CCC→P₃ UCC→S₃	n.a.	n.a.	Tolerated (1)
23,807,162	Exon 1	GCA	ACA (95%)	ACA (100%)	ACA (100%)	ACA (8%) *	CGU→R₂₆ UGU→C₂₆	rs111349461	A = 0.06%	Damaging (0.05)
23,807,138	Exon 1	CTC	TTC (3%) *	TTC (12%)	TTC (6%) *	CTC (100%)	GAG→E₃₄ AAG→K₃₄	rs541427772	T = 0.017%	Tolerated (0.07)
23,807,102	Exon 1	GCG	ACG (3%) *	GCG (100%)	ACG (11%)	GCG (100%)	CGC→R₄₆ UGC→C₄₆	rs112783512	A = 0.019%	Tolerated (0.07)
23,807,093	Exon 1	GCC	GCC (100%)	ACC (4%) *	ACC (20%)	GCC (100%)	CGG→R₄₉ UGG→W₄₉	rs55860552	A = 0.12%	Damaging (0)
23,807,084	Exon 1	GCT	GCT (100%)	ACT (5%) *	ACT (15%)	GCT (100%)	CGA→R₅₂ UGA→stop	rs568411970	A = 0%	Damaging due to stop
23,807,077	Exon 1	TCC	TCC (100%)	TCC (100%)	TTC (13%)	TCC (100%)	AGG→R₅₄ AAG→K₅₄	n.a.	n.a.	Tolerated (0.34)
23,807,075	Exon 1	CTC	CTC (100%)	TTC (12%)	TTC (12%)	CTC (100%)	GAG→E₅₅ AAG→K₅₅	n.a.	n.a.	Tolerated (1)
23,805,930	Exon 2	TAT	CAT (7%) *	CAT (5%) *	CAT (14%)	CAT (4%) *	AUA→I₆₇ GUA→V₆₇	rs199856966	C = 0.004%	Tolerated (1)
23,805,917	Exon 2	GCT	GTT (2%) *	GTT (13%)	GTT (5%) *	GTT (2%) *	CGA→R₇₁ CAA→Q₇₁	rs150428155	T = 0.008%	Damaging (0.01)
23,805,878	Exon 2	ACA	ACA (100%)	ACA (97%)	ATA (14%)	ACA (100%)	UGU→C₈₄ UAU→Y₈₄	n.a.	n.a.	Damaging (0)
23,805,875	Exon 2	CGG	CGG (100%)	CAG (15%)	CAG (2%) *	CGG (100%)	GCC→A₈₅ GUC→V₈₅	n.a.	n.a.	Tolerated (0.06)
23,804,730	Exon 3	ACG	ACG (100%)	ATG (7%) *	ATG (11%)	ACG (100%)	UGC→C₉₈ UAC→Y₉₈	n.a.	n.a.	Damaging (0)
23,804,702	Exon 3	ACC	ACC (100%)	ACT (12%)	ACC (100%)	ACC (100%)	UGG→W₁₀₇ UGA→stop	rs1380420803	n.a.	Damaging due to stop
23,804,691	Exon 3	TAC	TCC (13%)	TCC (10%) *	TCC (9%) *	TAC (100%)	AUG→M₁₁₁ AGG→R₁₁₁	rs202150666	C = 0.01%	Tolerated (0.31)
CST3 (reverse reading, chromosome 20)
23,618,472	Exon 1 (Signal)	GAG	GAG (100%)	AAG (8%) *	AAG (15%)	GAG (100%)	CUC→L_8(sp) UUC→F_8(sp)	rs1285248919	n.a.	Damaging (0)
23,618,433	Exon 1	GGG	GGG (100%)	GGG (100%)	AGG (13%)	GGG (100%) **	CCC→P_22(sp) UCC→S_22(sp)	n.a.	n.a.	Tolerated (0.5)
23,618,370	Exon 1	CAC	CAC (100%)	CAC (100%)	TAC (13%)	CAC (100%)	GUG→V₁₈ AUG→M₁₈	n.a.	n.a.	Tolerated (0.11)
23,618,358	Exon 1	CCA	CCA (100%)	TCA (22%)	TCA (4%) *	CCA (100%)	GGU→G₂₂ AGU→S₂₂	n.a.	n.a.	Tolerated (0.48)
23,618,357	Exon 1	CCA	CCA (100%)	CTA (11%)	CCA (100%)	CCA (100%)	GGU→G₂₂ GAU→D₂₂	n.a.	n.a.	Tolerated (0.56)
23,618,295	Exon 1	GTG	GTG (100%)	GTG (100%)	ATG (13%)	GTG (100%)	CAC→H₄₃ UAC→Y₄₃	n.a.	n.a.	Tolerated (1)
23,615,994	Exon 2	CCC	CTC (3%) *	CCC (100%)	CTC (13%)	CCC (100%)	GGG→G₅₉ GAG→E₅₉	n.a.	n.a.	Damaging (0.01)
23,614,564	Exon 3	GTC	GTC (100%)	GTC (100%)	ATC (13%)	GTC (100%)	CAG→Q₁₁₈ UAG→stop	n.a.	n.a.	Damaging due to stop
CST4 (reverse reading, chromosome 20)
23,669,566	Exon 1 (Signal)	TGG	TGG (100%)	TAG (7%) *	TAG (11%)	TGG (100%)	ACC→T_13(sp) AUC→I_13(sp)	rs770415022	n.a.	Tolerated (0.37)
23,669,561	Exon 1 (Signal)	CGA	CGA (100%)	CGA (100%)	CGA (100%)	AGA (100%)	GCU→A_15(sp) UCU→S_15(sp)	n.a.	n.a.	Tolerated (0.39)
23,669,539	Exon 1	AGG	AGG (100%)	AAG (5%) *	AAG (13%)	AGG (100%)	UCC→S₃ UUC→F₃	n.a.	n.a.	Tolerated (0.08)
23,669,470	Exon 1	GCA	GCA (100%)	GTA (15%)	GCA (100%)	GTA (17%)	CGU→R₂₆ CAU→H₂₆	rs201273557	T = 0.01%	Tolerated (0.08)
23,669,462	Exon 1	GTG	GTG (100%)	GTG (100%)	ATG (18%)	GTG (100%)	CAC→H₂₉ UAC→Y₂₉	n.a.	n.a.	Tolerated (0.06)
23,669,408	Exon 1	GGC	GGC (100%)	AGC (12%)	GGC (100%)	GGC (100%)	CCG→P₄₇ UCG→S₄₇	n.a.	n.a.	Tolerated (0.06)
23,667,835	Exon 2	AAA	CAA (97%)	CAA (100%)	CAA (90%)	AAA (100%)	UUU→F₅₈ GUU→V₅₈	rs145608577	C = 0.2%	Tolerated (1)
23,667,828	Exon 2	CCC	CCC (100%)	CTC (18%)	CCC (100%)	CCC (100%)	GGG→G₆₀ GAG→E₆₀	rs144556333	T = 0.007%	Damaging (0)
23,667,826	Exon 2	CAC	CAC (100%)	TAC (10%) *	TAC (27%)	CAC (100%)	GUG→V₆₁ AUG→M₆₁	n.a.	n.a.	Tolerated (0.24)
23,667,808	Exon 2	CAT	CAT (100%)	TAT (13%)	CAT (100%)	TAT (4%) *	GUA→V₆₇ AUA→I₆₇	rs774067751	T = 0.007%	Tolerated (0.23)
23,667,792	Exon 2	TGG	TGG (100%)	TAG (13%)	TGG (100%)	TGG (100%)	ACC→T₇₂ AUC→I₇₂	n.a.	n.a.	Damaging (0)
23,667,783	Exon 2	TGG	TGG (100%)	TGG (95%)	TAG (15%)	TGG (100%)	ACC→T₇₅ AUC→I₇₅	rs760057501	A = 0%	Damaging (0.01)
23,666,565	Exon 3	TAC	TCC (88%)	TCC (14%)	TCC (80%)	TAC (100%)	AUG→M₁₁₁ AGG→R₁₁₁	rs779547810	C = 0%	Tolerated (0.87)
CST5 (reverse reading, chromosome 20)
23,860,243	Exon 1	AGC	AAC (3%) *	AGC (100%)	AAC (11%)	AAC (5%) *	UCG→S₄ UUG→L₄	rs145031249	A = 0.011%	Tolerated (0.27)
23,860,211	Exon 1	GTA	GTA (100%)	GTA (100%)	ATA (12%)	GTA (100%)	CAU→H₁₅ UAU→Y₁₅	n.a.	n.a.	Tolerated (1)
23,860,199	Exon 1	GAG	GAG (100%)	AAG (11%)	GAG (100%)	GAG (100%)	CUC→L₁₉ UUC→F₁₉	rs370924959	A = 0%	Tolerated (0.66)
23,860,178	Exon 1	ACA	GCA (93%)	GCA (100%)	GCA (95%)	GCA (100%)	UGU→C₂₆ CGU→R₂₆	rs1799841	G = 43.2%	Tolerated (1)
23,860,174	Exon 1	CGG	CGG (100%)	CGG (100%)	CAG (11%)	CGG (100%)	GCC→A₂₇ GUC→V₂₇	n.a.	n.a.	Tolerated (0.18)
23,860,130	Exon 1	CTA	CTA (100%)	CTA (100%)	TTA (14%)	CTA (100%)	GAU→D₄₂ AAU→N₄₂	rs1257216384	n.a.	Tolerated (0.11)
23,860,093	Exon 1	CGG	CGG (100%)	CGG (100%)	CAG (11%)	CGG (100%)	GCC→A₅₄ GUC→V₅₄	n.a.	n.a.	Tolerated (0.11)
23,858,200	Exon 2	TGG	TGG (100%)	TAG (22%)	TGG (100%)	TGG (100%)	ACC→T₇₆ AUC→I₇₆	rs41282292	A = 0.061%	Damaging (0)
CSTA (direct reading, chromosome 3)
122,044,197	Exon 1	GTT	GTT (100%)	ATT (11%)	GTT (100%)	GTT (100%)	GUU→V₂₀ AUU→I₂₀	rs778366890	A = 0%	Tolerated (0.23)
122,056,400	Exon 2	CCA	CCA (100%)	CCA (100%)	TCA (12%)	CCA (100%)	CCA→P₂₅ UCA→S₂₅	n.a.	n.a.	Tolerated (0.74)
122,060,361	Exon 3	CTT	CTT (100%)	CTT (100%)	TTT (16%)	CTT (100%)	CUU→L₈₂ UUU→F₈₂	n.a.	n.a.	Damaging (0)
122,060,373	Exon 3	CAG	CAG (100%)	CAG (100%)	TAG (12%)	CAG (100%)	CAG→Q₈₆ UAG→stop	n.a.	n.a.	Damaging due to stop
CSTB (reverse reading, chromosome 21)
45,194,562	Exon 2	CGC	TGC (2%) *	TGC (11%)	CGC (100%)	CGC (100%)	GCG→A₄₉ ACG→T₄₉	rs559906825	T = 0.007%	Damaging (0)
45,194,138	Exon 3	TGG	TGG (98%)	TCG (13%)	TGG (95%)	TGG (100%)	ACC→T₈₁ AGC→S₈₁	n.a.	n.a.	Tolerated (0.65)
45,194,132	Exon 3	AGA	AGA (100%)	AGA (100%)	AAA (15%)	AGA (100%)	UCU→S₈₃ UUU→F₈₃	n.a.	n.a.	Tolerated (0.1)

^a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available. The variants fixed at 100% in modern humans compared with ancient hominins are highlighted in light orange. The genomic variants whose frequencies show a different geographic distribution among humans are in red text.
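The relationship between the triplet columns and the "Codon→Amino Acid" column of Table 3 can be illustrated with a minimal Python sketch. It assumes, as the rows themselves suggest but the text does not state explicitly, that for genes marked "reverse reading" the displayed triplet is complemented base by base (without reversing its order) to give the mRNA codon, whereas for "direct reading" genes the triplet is used as shown. The function names are hypothetical and are not part of the authors' pipeline; the `is_low_frequency` helper only mirrors the "*" rule stated in the footnote.

```python
# Illustrative sketch only; helper names are hypothetical, not from the article.

# ASSUMPTION (inferred from the table rows): for "reverse reading" genes the
# displayed triplet is complemented position by position, without reversal.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Standard genetic code in the conventional U, C, A, G order; "*" is a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def codon_from_triplet(triplet: str, reverse_reading: bool) -> str:
    """Return the mRNA codon for a DNA triplet as displayed in the table."""
    dna = triplet.upper()
    if reverse_reading:
        dna = "".join(COMPLEMENT[base] for base in dna)
    return dna.replace("T", "U")


def amino_acid(triplet: str, reverse_reading: bool) -> str:
    """Translate a displayed triplet into a one-letter amino acid ("*" = stop)."""
    return CODON_TABLE[codon_from_triplet(triplet, reverse_reading)]


def is_low_frequency(percent: float) -> bool:
    """Footnote rule: variants with frequency <= 10% carry a single asterisk."""
    return percent <= 10


# CST1, position 23,731,494 (reverse reading): ATA -> UAU (Y), GTA -> CAU (H)
print(codon_from_triplet("ATA", True), amino_acid("ATA", True))
print(codon_from_triplet("GTA", True), amino_acid("GTA", True))
# CSTA, position 122,060,373 (direct reading): CAG -> Q, TAG -> stop
print(amino_acid("CAG", False), amino_acid("TAG", False))
```

Under this assumption the sketch reproduces the tabulated codons for the rows checked here, for example Y→H at CST1 position 23,731,494 and the premature stop at CSTA position 122,060,373; it is intended only as a reading aid for the table, not as a replacement for the SIFT annotation.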

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

