Summary
How new gene functions evolve is a fundamental question.
Brief introduction
Parallel, independent evolution leading to similar genetic variation has been discussed as a common driver of convergent responses to adaptive pressures (1).
Recent studies have shown that mucin genes are grouped according to their function rather than shared evolutionary ancestry and may be particularly prone to convergent evolution (8, 9).
Most functionally similar genes arise from the duplication of a common ancestral gene (17).
Results and discussion
Multiple instances of de novo mucin evolution at the SCPP locus
To establish a basis for studying the evolution of mucins, we constructed a simple but conservative bioinformatics pipeline that identifies putative mucin genes in a given genome by searching the available gene annotations and confirms mucin function by verifying the presence of exonic repeats rich in proline (P), threonine (T), and serine (S).
We found that four ferret-specific mucins are localized at the secretory calcium-binding phosphoprotein (SCPP) locus (flanked in humans by CSN1S1 at the 5′ end and AMTN at the 3′ end).
Orphan mucin genes in the SCPP locus evolved independently
The evolution of genes within the SCPP locus has been discussed in the context of calcium-binding proteins, which are important for the mineralization of bones and teeth and are major protein components of milk and saliva (19).
Next, we asked whether the putative mucin genes we identified encode proteins with functional mucin properties (Figure 2A).
Muc10 as an example of de novo mucin evolution
The identification of multiple novel mucin genes in the SCPP locus provides a unique opportunity to ask whether these genes evolved through neofunctionalization after gene duplication (17), as de novo genes from noncoding sequences (23–25), or through other mechanisms (Figure 3A).
We summarized the candidate models of mucin evolution (Figure 3A) and first asked whether Muc10 is the product of a recent duplication event.
We first tested this hypothesis by manually comparing the peptide sequences of human PROL1 with mouse and rat MUC10 (Prol1 gene). The analysis showed that these proteins share about 60% and 33% homology at the 5′ and 3′ ends, respectively, the former corresponding to the signal peptide (Figure 3B). The homology does not extend to the middle region of the MUC10 protein, which contains at least nine repeats, each 39 base pairs (bp) (13 amino acids) long and about 85% identical to one another. These exonic repeats are not present in any primate PROL1 protein (Figure 3B). Further analysis showed that these repeats are rich in T and S amino acids (Figure 2B). To ensure the validity of these observations, we amplified and sequenced the repeat region of Prol1/Muc10 from a mouse sample (C57BL/6J strain).
The sequenced repeat region is identical to the mouse reference genome sequence (sequence file S1), but no homologous sequence exists in non-rodent genomes, which further supports that these repeats were acquired in the ancestor of mice and rats.
Along with the gain of mucin function, the tissue expression patterns of the orthologous PROL1 and Muc10 genes also changed substantially. In particular, PROL1 is mainly expressed in the human lacrimal gland and is almost absent from other tissues. In contrast, MUC10 in mice and rats is abundantly expressed in saliva (31) and is almost absent from the lacrimal glands (32). It appears that the regulation of Muc10 in the ancestor of mice and rats evolved to acquire strong salivary gland-specific expression.
To explain the expression of Muc10 in mice, we considered two scenarios (Figure 3C). First, it is plausible that Muc10, derived from the Prol1 precursor, adopted the regulatory features of MUC7 after MUC7 was lost in the mouse and rat lineage. In this case, we expect the expression of MUC7 in humans and Muc10 in mice to be similar. Second, it is possible that the regulation of Muc10 evolved independently in the mouse and rat lineage, resulting in an expression trend different from that of human MUC7. To distinguish between these two scenarios, we performed immunohistochemical staining of MUC10 and MUC7 in mouse and human salivary gland tissues (Figure 3D). Consistent with previous studies (31), we found that MUC10 is expressed only in the submandibular glands in mice, whereas in humans MUC7 is expressed in both the submandibular and sublingual glands. In addition, while MUC10 is expressed in all cell types of the mouse submandibular gland, MUC7 is expressed by specific cell populations in the gland. Overall, the expression patterns of MUC10 and MUC7 differ at both the tissue and the cell level, suggesting that the regulatory mechanism of Muc10 is likely to have evolved independently in the mouse lineage.
Lineage-specific mucins evolved from proline-rich precursors
Based on the transition from Prol1 to Muc10 in the rodent lineage, we hypothesized that other novel mucins may also have evolved from proline-rich proteins. Specifically, we are interested in three genes, namely PROL1 (recently renamed OPRPN), SMR3A (previously PROL5), and SMR3B (previously PROL3). They are adjacent to each other in the SCPP locus and may share a common ancestry. To test whether these genes constitute precursors of novel mucins, we looked for sequence homology between these three proteins and the 28 newly identified lineage-specific mucins. We found at least five lineage-specific mucins that resemble proline-rich non-mucin protein sequences in closely related species (Figure 4, Figure S4, and Table S1).
We also found that they retain the signal peptides of their precursors (60% to 84% amino acid homology) but evolved TS-rich repeats in a lineage-specific manner (Figure 4). For example, similar to mice and rats, rhinos have significantly higher T and S amino acid content in PROL1 than other species (Wilcoxon test; P < 0.002198) (Figure 2B). However, PROL1 in rhinos and MUC10 in the mouse and rat lineages share little sequence homology, suggesting that the T and S richness of these proteins is unlikely to be identical by descent. The emergence of new gene functions is often considered a rare phenomenon. It is therefore noteworthy that in two distant mammalian lineages, rhinoceroses and mice, evolution has produced a new mucin gene from the same ancestral gene, Prol1. These observations are consistent with an evolutionary scenario in which the ancestral secreted proline-rich protein PROL1 independently acquired mucin function in two different lineages, rather than through neofunctionalization after gene duplication or de novo gene evolution from noncoding sequences.
Our observations offer several avenues for future research. For example, we found two novel mucins in the pangolin genome, PROL1 and SMR3A/B, that are enriched with T- and S-rich exonic repeats in pangolins (Figure S4).
This is an interesting observation, as these lineage-specific mucins may have contributed to the unusual stickiness of pangolin saliva, a property that was most likely selected to accommodate the animal's insectivorous habits (33). Thus, our findings suggest that the recurrent evolution of mucin genes follows the mechanism we outlined for the evolution of MUC10 in the mouse and rat lineages, in which a secreted proline-rich protein acquires T- and S-rich exonic repeats (Figure 3A). In conclusion, we believe that the presence of proline-rich secreted proteins at the SCPP locus promotes the evolution of mucins.
Rapid evolution of mucin exon repeat sequences
In our previous analysis of MUC7 in mammals, we found that its exonic repeats retain their T and S content but differ greatly in copy number within and between species (28). Our results for MUC7 contrast with other exonic repeats in the genome, which occur in more than 10% of all protein-coding genes and are generally highly conserved at the nucleotide and copy number levels (34, 35). Based on these results, we hypothesized that the exonic repeats of mucins vary in copy number as a way of modulating the overall glycosylation of the mucin in response to various selective pressures, including dietary and pathogenic changes. If this hypothesis is true, we expect to observe substantial copy number variation of mucin repeats between species, while the T and S content of individual repeats remains unchanged over time.
We first studied the copy number variation of mucin repeats among mammals (Figure 4 and Table S1). We found that the number of mucin repeats ranges widely, from 3 in the seal Muc19-like gene to 42 in the carnivore Muc2-like/Smr3a gene, independent of repeat length (Figure S5) or the mechanism of copy number change (Figure S6). In addition, we found several examples in which the copy number of certain repeats evolved in a species-specific manner. For example, a maximum likelihood tree of the Muc10 repeats in mice and rats divides the repeats of each species into distinct clusters with high confidence (Figure S6). This finding suggests that exonic repeat copy numbers expanded independently in the mouse and rat lineages. We have previously reported lineage-specific gains and losses of repeat copies in primate MUC7 (28). Overall, the copy number variation of mucin exonic repeats that we observed is consistent with the adaptive hypothesis described above.
Next, we examined our second expectation, namely that the T and S content of mucin exonic repeats remains unchanged over the course of evolution. We focused on Muc10 in rodents and Muc2-like in felines, for which a reliable alignment of individual repeat units is possible. By measuring the number of synonymous and non-synonymous nucleotide differences between repeat units, we observed that non-synonymous changes affecting T and S amino acids occur less frequently than expected from the number of synonymous changes (R² < 0.15; Figure S7). This finding suggests that the T and S content of the repeats was maintained and did not follow neutral expectations. For amino acids other than T and S, we observed the neutrally expected ratio of non-synonymous differences (R² > 0.65; Figure S7). Overall, as exemplified by the exonic repeats of MUC7 (28), Muc10, and Muc2-like, mucin repeats adaptively retain their T and S amino acid content, indicating that lineage-specific mucins evolved under selective constraint to retain O-glycosylation.
Lineage-specific mucins contribute to variation in the mammalian salivary glycoproteome
Previous studies of mucins, mainly in humans, have classified mucins as membrane-bound or secreted (36, 37). Given that the SCPP gene family is composed primarily of genes encoding secreted proteins, we hypothesized that lineage-specific mucins that evolved at this locus would also be secreted. We tested this hypothesis bioinformatically and found that all novel lineage-specific mucins were predicted to be secreted (see Materials and Methods; Table S1).
In addition, we did not find transmembrane domains in any of the lineage-specific mucins, which supports that they may be secreted proteins.
We verified previous work (26) showing that the SCPP mucins MUC7 and MUC10 are abundantly and specifically expressed in the salivary glands of humans and mice, respectively (Figure 3D). Therefore, we investigated whether other lineage-specific mucins are also expressed in the salivary glands. With the exception of MUC7 in humans and MUC10 in mice, immunohistochemistry or Western blot analysis of lineage-specific mucins is difficult because validated commercial antibodies are lacking. However, despite limited cross-species expression data from salivary glands, we were able to detect salivary gland expression of some lineage-specific mucins, including a bat mucin-like gene, a cow Muc2-like gene, and the novel pangolin gene_9802, using available RNA sequencing (RNA-seq) data (Figures S4 and S8). To further investigate the presence of mucins in saliva, we performed liquid chromatography-mass spectrometry (LC-MS) analysis of whole saliva from humans, mice, rats, pigs, cows, dogs, and ferrets (see Materials and Methods; Figure 5A). In addition to mucins known to be expressed in saliva, such as MUC5B, MUC7, MUC19, and MUC10, we also found known mucins that had not previously been reported in saliva, such as MUC4, MUC21, MUC13, MUC2, and MUC16 (Figure 5A). In addition, we found that eight lineage-specific mucins are secreted in the saliva of dogs, ferrets, and cows (Figure 5, A and B, and Table S2).
To verify experimentally whether the retention of T and S amino acids observed at the sequence level in lineage-specific mucins translates into protein glycosylation, we separated salivary proteins by tris-acetate SDS-polyacrylamide gel electrophoresis (SDS-PAGE), followed by periodic acid-Schiff (PAS) staining, which reveals glycosylated proteins (see Materials and Methods; Figure 5C) (27, 29). By comparing the electrophoretic band patterns of salivary proteins in pigs, cows, ferrets, dogs, rats, mice, and humans, we detected a high degree of diversity in glycosylated protein bands among the species studied. To confirm at the amino acid sequence level that strongly stained bands represent mucins, we excised PAS-stained bands individually and analyzed them by mass spectrometry (see Materials and Methods; Figure 5C). We were able to confirm the abundant presence in saliva of most mucins identified by LC-MS (Figure 5C and Table S2). Among lineage-specific mucins, in addition to MUC7 and MUC10, we identified SMR3A in the saliva of dogs and ferrets, the proteoglycosacid protein in dog saliva, and MUC5AC-like in the saliva of ferrets, all of which were bioinformatically predicted to be glycosylated.
An unexpected but interesting result of the SDS-PAGE analysis was the high degree of variation in the glycosylated protein content of mammalian saliva samples. Our current method has limited ability to distinguish between mucins and other glycoproteins. Therefore, linking glycoprotein variation among mammals to mucins remains a hypothesis that requires further research, perhaps using recently available methods of mucin purification (38). Nonetheless, previous studies have shown that, within the size range of our SDS-PAGE, the most intensely PAS-stained glycosylated proteins in human saliva are MUC5B and MUC7 (27, 39, 40). Therefore, our findings provide evidence that at least some of the observed differences are driven by mucins. For example, ferret saliva produces at least four times as many glycosylated bands as human saliva (Figure 5B). This is consistent with our finding that, among the species we surveyed, ferrets have the largest number of lineage-specific mucins (Figure 1). In addition to lineage-specific mucins, we found that multiple mucin genes with homologous sequences in almost all mammals are expressed in a species-specific manner in ferret saliva. These observations in ferrets provide another piece of evidence that the high diversity of mucin proteins in mammalian saliva evolved both by acquiring new mucin genes and by repurposing existing mucins to be expressed and secreted in saliva (Figure 5B).
Establishment of a model of mucin evolution
We documented multiple instances of independent evolution of mucin function in different mammals and showed that most of these newly discovered mucins are located within the SCPP locus. Unusually, the repeated evolution of this gene function at a particular locus does not occur through the duplication of entire genes. Therefore, we constructed a model of mucin evolution (Figure 6) in which non-mucin genes encoding proline-rich secreted proteins act as the building blocks of new mucins. This hypothesis makes biological sense because proline-rich proteins are structurally (rigid because of the abundance of proline) and functionally (secreted proteins) similar to mucins. They differ from mucins simply because they lack the T- and S-rich exonic repeats that are the main targets of O-glycosylation. Therefore, these genes have the potential to rapidly acquire mucin function by repeatedly adding exonic repeats. Our study provides an initial and conservative map with an emphasis on the SCPP locus. We conducted a parallel analysis of the recently available, biochemically guided "mucinome" database and came to similar conclusions, but also identified other candidates for lineage-specific mucin formation (Figure S9). Therefore, a more thorough effort is needed to extend this analysis to other species and loci.
Our proposed model of mucin evolution has three broader implications. First, it highlights exon duplication as a major driver of rapid evolution and functional diversity (41). Second, it reveals proline-rich proteins as precursors of mucins. Third, it argues that glycosylation is a possible force in the adaptive evolution of mammals (42). Our model is consistent with the growing recognition of recurrence, convergence, and reversal as common themes of molecular evolution (43).
In addition to these mechanistic insights, our findings raise the question: What adaptive forces led to the retention of new mucin genes? One clue comes from the expression of these mucins in saliva. In humans, mucin function in saliva is associated with pathogen binding, mucus layer formation, facilitating digestion, and providing viscosity and lubricity to saliva. Therefore, it is plausible that the new mucins have beneficial effects on immunity, diet, and the mechanical properties of saliva. Previous work, including our own, has shown that O-glycans on mucins interact with pathogens (39). Secreted mucins are thought to act as decoys (21) that saturate pathogen receptors in the secretions, thus preventing pathogens from binding to tissue surfaces. They can also "tame" pathogenic behaviors, promoting more symbiotic interactions between microbes and host organisms (44, 45). The overall density, size, structure, and spatial distribution of mucin O-glycans determine the range of interactions with pathogens (39, 46), so that individual mucins may evolve to target specific microorganisms (47). For example, sialic acid residues, as terminal components of mucin O-glycans, provide molecular motifs for the recognition of specific pathogens (48, 49). These motifs often change in an evolutionary arms race (49, 50). Thus, lineage-specific mucins may bind to, or be bound by, specific pathogens in a lineage-specific manner, and changes in the copy number of their exonic repeats can fine-tune glycosylation, which may help keep up with changing pathogenic pressures.
The evolution of mucins may also be related to the digestion and perception of different foods by different species. Mucins in saliva can interact directly with food, altering how it is perceived (51, 52). In addition, mucins can interact with, and may alter, the microbial composition of the gastrointestinal tract (53) and thus affect digestion (54). It has been suggested that oral and gut microbes compete in their interactions with gastrointestinal mucins (55). Thus, some mucins may be adaptively maintained in a particular lineage because of selective pressures shaped jointly by diet and the gastrointestinal microbiota. Mucins also play a key role in determining the physical properties of body fluids and their function in forming tissue barriers. Therefore, an exciting area of future research will be to study the relationship between the activity of novel mucins in saliva and the physical properties of saliva, such as viscosity, lubricity, and spinnbarkeit (56).
In summary, our study establishes a mechanism by which the shared functional and structural properties of a gene cluster promote the recurrent evolution of mucin function in otherwise evolutionarily unrelated genes. Our findings provide mechanistic insights into the de novo formation of mucins and how it generates diversity among mucins. We also open up avenues for future work to characterize the function, formation mechanisms, and adaptive effects of mucins and, at a broader level, to study the evolution of new gene functions.
Materials and methods
Preliminary identification of candidate mucins
Gene and protein annotations were downloaded from the National Center for Biotechnology Information (NCBI) genome database at ftp://ftp.ncbi.nih.gov/genomes/ (accessed May 26, 2021). Putative mucins were extracted from this dataset by searching for the keywords "muc", "mucin", "mucin like", and "mucin domain containing". Each of the species queried (human, mouse, cow, and ferret) contains some putative mucin genes that are not annotated in the mucin database at www.medkem.gu.se/mucinbiology/databases/ (accessed May 26, 2021).
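The keyword screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a dictionary with "id" and "description" keys) and the function names are hypothetical and are not taken from the study; only the four keywords come from the text.

```python
import re

# Keywords quoted in the text. Everything else here is a hypothetical illustration.
KEYWORDS = ("mucin domain containing", "mucin like", "mucin", "muc")
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def is_candidate_mucin(description: str) -> bool:
    """Return True if an annotation description contains any mucin keyword.

    Hyphens are treated as spaces so that "mucin-like" matches "mucin like".
    The match is deliberately permissive (substring search), so downstream
    filtering is still required.
    """
    normalized = description.replace("-", " ")
    return _PATTERN.search(normalized) is not None


def filter_candidates(records):
    """Keep the annotation records whose description mentions a mucin keyword."""
    return [r for r in records if is_candidate_mucin(r["description"])]
```

Because the shortest keyword, "muc", is a substring of the others, such a screen is broad by design, which is consistent with the text treating the hits only as putative mucins to be confirmed in later steps.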
BLAST search for homologous sequences
Once we had a list of candidate mucin genes from the keyword search above, we used NCBI BLAST to determine the presence or absence of each candidate mucin in the human, mouse, cow, and ferret reference genomes. This step allowed us to verify annotations and to distinguish between lineage-specific genes and orthologous genes. Briefly, protein sequences were downloaded from UniProt and NCBI and searched against each species using BLASTp (non-redundant protein sequences). The BLAST scoring parameters (57) were as follows: matrix, BLOSUM62; gap costs, existence 11 and extension 1; compositional adjustments, conditional compositional score matrix adjustment, as described elsewhere (58). BLAST hits were assessed based on maximum score, total score, query coverage (>30%), e value (<0.01), and percent identity (>20%). Next, we identified gene annotations in the region of the corresponding reference genome with the highest homology to the candidate protein sequence. In addition, we used the NCBI and UCSC Genome Browsers to compare the genomic locations of these putative genes relative to other known mucin genes to determine syntenic locations. We note that the pipeline underlying Figure 1 is conservative and relies on the accuracy of gene annotation and the quality of the assembly. We believe that while our main observations will remain unchanged, further validation is needed to construct a final map of mammalian mucin content. For example, tandem repeats are particularly difficult to assemble and may therefore be missing from some reference genomes. The recently released human assembly from the T2T Consortium (59), arguably the most accurate mammalian reference genome, identifies two new mucins in the human genome, MUC3B and MUC22-like. These are not included in our dataset.
Therefore, it is clear that future assemblies based on long-read sequencing in other mammals will compensate for these shortcomings and expand our understanding of mucins.
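The hit-assessment thresholds quoted above (query coverage > 30%, e value < 0.01, percent identity > 20%) can be expressed as a simple filter. The sketch below is a minimal illustration, not the authors' code: the `BlastHit` container and function names are hypothetical, and the values are assumed to have already been parsed from BLAST output.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical container for one parsed BLASTp hit."""
    subject: str
    query_coverage: float    # percent of the query covered by the alignment
    evalue: float
    percent_identity: float


def passes_thresholds(hit, min_coverage=30.0, max_evalue=0.01, min_identity=20.0):
    """Apply the three numeric cutoffs stated in the text."""
    return (
        hit.query_coverage > min_coverage
        and hit.evalue < max_evalue
        and hit.percent_identity > min_identity
    )


def filter_hits(hits):
    """Return only the hits that satisfy all three cutoffs."""
    return [h for h in hits if passes_thresholds(h)]
```

The text also mentions maximum and total scores as assessment criteria but gives no cutoffs for them, so they are left out of this sketch.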
Study of mucin properties
We set up a two-pronged pipeline to confirm mucin properties in these putative mucin candidates. An important feature of mucins is that the repeats in their open reading frames are confined to a domain (8). In our pipeline, we used Tandem Repeats Finder to search for repeats in the candidate mucins of all four mammalian query species (60). The algorithm identifies repeated motifs in a given sequence. One problem is that the motif is difficult to define (for example, a single tandem repeat array can contain several repeated motifs) (e.g., Figure S6). For consistency, we reported all motifs (tandem repeats ≥ 3) using the longest motif unit in our analysis.
Next, we located domains rich in proline, threonine, and serine, an important feature of mucins. We used a Perl script called PTSpred (61). PTSpred slides a window (50 to 200 amino acids) along a given protein sequence and calculates the percentage of proline, threonine, and serine amino acids within the window. We used the recommended thresholds to identify PTS domains. A protein was classified as a novel (lineage-specific) mucin only if it met all of the following criteria: predicted O-glycosylation sites making up more than 4% of the peptide, T and S content greater than 20% of the peptide sequence, repeats contained within a domain of the gene, and proline-, threonine-, and serine-rich sequence concentrated in the exonic repeats.
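The sliding-window calculation described for PTSpred can be illustrated with a short Python sketch. This is not the PTSpred script itself: the function names are hypothetical, the 50-residue window is simply the lower end of the range given in the text, and the 0.4 cutoff is an arbitrary placeholder because the text does not state the recommended thresholds.

```python
def pts_fraction(window: str) -> float:
    """Fraction of residues in the window that are proline, threonine, or serine."""
    return sum(window.count(aa) for aa in "PTS") / len(window)


def pts_windows(seq: str, window: int = 50, threshold: float = 0.4):
    """Slide a fixed-size window along a protein sequence.

    Returns (start, fraction) for every window whose P+T+S fraction is at
    least `threshold`. The default threshold is a placeholder, not a
    recommended value. Sequences shorter than the window yield no results.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        frac = pts_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits
```

Overlapping windows that pass the cutoff would normally be merged into contiguous domains; that step is omitted here to keep the sketch short.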
Determination of the secretory potential of proteins
To identify signal peptides in the protein sequences, we used SignalP 5.0 (62), available at www.cbs.dtu.dk/services/SignalP/, with standard parameters for prediction. In addition, we searched for known mucin domains [such as von Willebrand factor-like, epidermal growth factor-like, and sea urchin sperm protein, enterokinase, and agrin (SEA) domains (8)] using Pfam 32.0 (https://pfam.xfam.org/) (63). The algorithm uses multiple sequence alignments and hidden Markov models to predict these regions. We also used TMHMM to look for transmembrane helices in the novel mucins (www.cbs.dtu.dk/services/TMHMM/) (64). In addition, to determine the likelihood that the novel mucins are secreted, we used the SRTpred server (65), available at https://webs.iitd.edu.in/raghava/srtpred/home.html. In short, this tool uses machine learning algorithms to measure the secretion potential of proteins, with positive values indicating secretion. We also validated these results with OutCyte (available at www.outcyte.com/) (66), which likewise uses machine learning to estimate secretion potential. In particular, a score of 0.5 or higher indicates likely secretion. Table S1 reports the results for SRTpred and OutCyte.
Determination of protein O-glycosylation potential
O-glycosylation sites were predicted with SPRINT-Gly (available via https://doi.org/10.1093/bioinformatics/btz215) (67). This deep neural network method predicts the likelihood that a T or S residue will be O-glycosylated based on the amino acid sequence in a given window. Briefly, the algorithm scans the T and S amino acids in each protein sequence and generates a window containing the 4 amino acids upstream and the 4 amino acids downstream of each identified T or S. It then assigns a probability of O-glycosylation based on this window and on previously confirmed O-glycosylated peptides in humans and mice. To further support the SPRINT-Gly predictions of potential O-glycosylation sites, we used NetOGlyc 4.0 (available at www.cbs.dtu.dk/services/NetOGlyc/) (68), which estimates potential O-glycosylation across mammalian species and was trained on O-glycosylation experiments in human cell lines. The results of the two algorithms are consistent. However, we found that SPRINT-Gly provides a more stringent prediction of O-glycosylation, so we chose to report the results of this more conservative algorithm in the figures.
Identification of additional lineage-specific mucins and their putative orthologs
As described in the main text, we identified a region of 250 to 300 kb (depending on species) between the CSN3 and AMTN genes within the SCPP locus as a hot spot for lineage-specific mucins. We then expanded our search for lineage-specific genes within this locus to other mammals (49 mammals in total). In particular, we identified the gene annotations in this hotspot region and downloaded the protein sequences. We then classified the genes by running these protein sequences through our mucin detection pipeline, including determining exonic repeats and the O-glycosylation potential of these repeats, as described above. Next, we used BLAST, with the same parameters as in the initial screening above, to search for homologous sequences of each candidate mucin in other mammalian species. This process allowed us to identify 28 lineage-specific mucins, as described in Table S1.
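The window-generation step described above (each T or S residue with 4 residues on either side) can be sketched as follows. This illustrates only how such windows are extracted; it does not reproduce SPRINT-Gly's neural network or its scoring, and the function name and the truncation of windows at sequence ends are assumptions made for the example.

```python
def ts_windows(seq: str, flank: int = 4):
    """Collect a sequence window around every threonine (T) or serine (S).

    Returns a list of (position, residue, window) tuples, where the window
    spans up to `flank` residues on each side of the T or S. Windows near
    the sequence ends are simply truncated (an assumption of this sketch).
    """
    windows = []
    for i, aa in enumerate(seq):
        if aa in "TS":
            start = max(0, i - flank)
            windows.append((i, aa, seq[start:i + flank + 1]))
    return windows
```

Each extracted window would then be passed to a trained classifier, which is the part that the published tool provides.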
Identification of precursors of lineage-specific mucins
We wanted to test the hypothesis that at least some lineage-specific mucins evolved from existing genes that did not have TS-rich repeats, just as MUC10 evolved from a proline-rich ancestral protein precursor (Figure 3). To do this, we combined gene annotation, BLAST searches, and RNA-seq data to thoroughly search for precursor protein sequences of the 28 lineage-specific mucins in mammals. It is worth noting that every precursor we identified was a proline-rich protein. Because of the repetitive nature of lineage-specific mucins, our search was not straightforward. First, repeat content increases the uncertainty of BLAST similarity searches, thereby reducing statistical power. Second, because the repeats are rich in PTS, false-positive BLAST hits are possible. Therefore, to avoid including repeats in the initial BLAST search, we used the first 30 amino acids, which roughly correspond to the signal peptide of the secreted protein. Next, we manually compared lineage-specific mucins with their presumed ancestral orthologs to identify specific regions of sequence similarity, as shown in Figure 4. We describe below the details of our search for each lineage-specific protein and the proline-rich precursors we identified. Overall, our pipeline is conservative, and other lineage-specific mucins may also have proline-rich precursors that we did not detect in this study.
Carnivore Muc2-like
To determine the ancestral origin of the carnivore lineage-specific mucin (annotated as MUC2-like in cats but as SMR3A in ferrets and dogs; Figure S2, line 7), we used BLAST to search the first 30 amino acids of the MUC2-like protein sequence of the domestic cat (felCat9) against human (taxid: 9606, hg38).
We started with a BLAST search against the human genome because gene annotation and protein sequence accuracy are best for humans, and there may be unknown biases in other species. We found significant hits to the SMR3A and SMR3B genes (e = 6 × 10⁻⁸). We then manually compared human SMR3A and SMR3B with the cat MUC2-like protein sequence and found that SMR3A had two highly similar regions, while SMR3B had only one. We then used BLAST again to verify these individual aligned regions (see the alignment in sequence file S1 and Figure 4 for e values; e < 10⁻³⁰). Incidentally, a new assembly update released during revision now annotates this gene in cats as SMR3A.
Even-toed ungulate Muc2-like
We were able to trace one lineage-specific mucin found in even-toed ungulates (cattle, sheep, camels, alpacas, and antelopes; Figure S2, line 1) to the ancestral proline-rich SMR3B protein. Similar to the pipeline above, we first ran a BLAST search of the first 30 amino acids of this lineage-specific mucin against human and found a significant hit to the SMR3B gene (e = 0.001).
We then narrowed our search to an outgroup lineage, the odd-toed ungulates (taxid: 9787). The most significant hit was SMR3B in donkey (e = 3 × 10⁻¹²). We verified that the donkey SMR3B is not duplicated. Next, we manually aligned the cow MUC2 and donkey SMR3B sequences and used BLAST to obtain the e values of the distinct regions, which are reported in Figure 4 and sequence file S1.
Rodent MUC10
We found that a BLAST search of the first 30 amino acids of the protein hit human PROL1 (e = 0.046).
Following the previous examples, we compared the amino acid sequences in mice and humans and used BLAST searches to identify similar regions and assess their significance. We found that doing the same with rats produced lower e values, which are now reported in Figure 3B. Notably, gene annotations have led to confusion about the evolutionary origin of these genes. For example, consistent with our results, the latest gene annotation of the mouse reference genome refers to the gene encoding MUC10 as Prol1. However, the latest human gene annotation update refers to PROL1 in humans as OPRPN.
Rhino PROL1
When we ran a BLAST search of the first 30 amino acids of rhino PROL1 against humans, we did not find any significant hits. Instead, relying on the gene annotations in the reference genomes, we aligned rhino PROL1 with human PROL1 (now OPRPN). We found multiple well-aligned sections, which we interrogated in detail using BLAST, and found that some of them were highly significant hits (e < 10⁻⁶). We report these in Figure 4 and sequence file S1.
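The search strategy used across these lineages (take the first 30 amino acids of a lineage-specific mucin, search it against a target species, and keep the low e value hits) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names and the 0.05 cutoff are hypothetical, and the hit list restates e values reported in the text (with the garbled exponents read as negative) rather than calling BLAST.

```python
from typing import NamedTuple

QUERY_LENGTH = 30       # first 30 amino acids, as described in the text
E_VALUE_CUTOFF = 0.05   # assumed cutoff for illustration; not stated in the source


class Hit(NamedTuple):
    query: str
    subject: str
    e_value: float


def query_prefix(protein: str, length: int = QUERY_LENGTH) -> str:
    """Return the N-terminal residues that would be used as the search query."""
    cleaned = "".join(protein.split()).upper()
    return cleaned[:length]


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits at or below the cutoff, most significant (smallest e) first."""
    kept = [h for h in hits if h.e_value <= cutoff]
    return sorted(kept, key=lambda h: h.e_value)


# E values as reported in the text; exponents are read as negative.
reported = [
    Hit("cat MUC2-like", "human SMR3A/SMR3B", 6e-8),
    Hit("ungulate mucin", "human SMR3B", 1e-3),
    Hit("ungulate mucin", "donkey SMR3B", 3e-12),
    Hit("rodent Muc10", "human PROL1", 0.046),
]

for hit in significant_hits(reported):
    print(f"{hit.query} -> {hit.subject}: e = {hit.e_value:g}")
```

A stricter cutoff, for example `significant_hits(reported, cutoff=1e-3)`, would drop the rodent Muc10 hit, which shows how sensitive such a filter is to the chosen threshold.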
Sequence amplification and validation
Mouse Prol1/Muc10 genomic sequences were amplified by polymerase chain reaction (PCR) and Sanger sequenced using standard methods. The primer sequences and sequencing results can be found in sequence file S1. Our sequenced region and the mouse (mm10) reference genome did not differ in repeat number or in nucleotide sequence.
Phylogenetic and synonymous and non-synonymous site analysis
Lineage-specific mucin sequences found in rodents (Muc10) and cats (MUC2-like) were downloaded from NCBI. The repeats contained in the repeat domain were manually curated in TextWrangler and aligned with CLUSTALW (69) in MEGA (70). A maximum likelihood phylogenetic tree was constructed using 100 bootstrap replicates. The repeat sequences were then analyzed with MEGA's pairwise distance calculator to determine changes in synonymous and nonsynonymous sites within and between rodents and felines.
RNA-seq data mining
The RNA sequencing (RNA-seq) data used to construct Figure S8 were taken from the RNA-seq exon coverage track of the NCBI Genome Data Viewer (www.ncbi.nlm.nih.gov/genome/gdv/). This database contains comprehensive RNA-seq data from a variety of tissues and species. To determine whether a gene has observable tissue expression, we used a "housekeeping" gene, PSMB2, which is known to be expressed in all tissues of all placental mammals (71). If a gene was expressed within an order of magnitude of PSMB2, we considered the gene "expressed" in that tissue.
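The expression rule described above can be written as a small check. The sketch below reads "within an order of magnitude of PSMB2" as "at least one tenth of the PSMB2 level", which is one interpretation and an assumption here; the function name and the coverage numbers are hypothetical and are not taken from Figure S8.

```python
def is_expressed(gene_coverage: float, psmb2_coverage: float, fold: float = 10.0) -> bool:
    """Call a gene expressed when its coverage is at least 1/fold of PSMB2's.

    Assumption: genes expressed far above PSMB2 also count as expressed.
    """
    if psmb2_coverage <= 0:
        raise ValueError("PSMB2 coverage must be positive to serve as a reference")
    return gene_coverage * fold >= psmb2_coverage


# Hypothetical exon coverage values per tissue: (gene, PSMB2).
coverage = {
    "salivary gland": (850.0, 120.0),
    "liver": (4.0, 150.0),
    "lung": (16.0, 140.0),
}

for tissue, (gene, psmb2) in coverage.items():
    label = "expressed" if is_expressed(gene, psmb2) else "not expressed"
    print(f"{tissue}: {label}")
```

Comparing against a housekeeping gene in the same tissue, rather than against a fixed coverage value, makes the call less sensitive to differences in sequencing depth between datasets.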
Saliva collection
Saliva samples were collected from individual humans, mice, rats, pigs, cows, dogs, and ferrets and stored at −80 °C. Human subjects: Human saliva was collected by passive drooling according to a protocol approved by the University at Buffalo Human Subjects Institutional Review Board (IRB) (study no. 030-505616). All human participants provided informed consent. Samples from other mammals were collected in collaboration with colleagues and other research institutions. For a more detailed description of the collection methods used for the different mammalian species, see (iii).
SDS-PAGE separation of salivary proteins and PAS staining of glycosylated components
Samples were denatured under reducing conditions: 4× tris-acetate buffer (NuPAGE, Invitrogen, Carlsbad, CA) and 2.5% β-mercaptoethanol (by sample volume) were added, and the samples were boiled in water for 10 min. Equal amounts of total protein (15 μg per lane) were separated by SDS-PAGE using a 3 to 8% gradient tris-acetate mini gel (NuPAGE, Invitrogen, Carlsbad, CA). Glycosylated protein bands were visualized by PAS staining as previously described (40). Stained gels were imaged in transparency mode using a flatbed scanner (ImageScanner III, GE Healthcare).
Saliva sample preparation for mass spectrometry
Saliva samples were prepared using surfactant-aided precipitation/on-pellet digestion (71). Briefly, 50 μg of protein was taken from each saliva sample, and SDS was added to a final concentration of 0.5%.
Samples were sequentially reduced with 10 mM dithiothreitol (DTT) at 56 °C for 30 min and alkylated with 25 mM iodoacetamide (IAM) at 37 °C for 30 min, both in a covered thermomixer (Eppendorf). Six volumes of chilled acetone were then added to each sample under vigorous vortexing, and the mixture was incubated at −20 °C for 3 h. After centrifugation at 18,000g for 30 min at 4 °C, the samples were decanted and the pelleted protein was gently washed with 500 μl of methanol. After 1 min of air drying, 40 μl of 50 mM tris-formic acid (tris-FA; pH 8.4) was added to the pellet, followed by a total volume of 10 μl of trypsin [0.25 μg/μl, dissolved in 50 mM tris-FA (pH 8.4)], for a 6-h tryptic digestion at 37 °C with continuous shaking. Digestion was stopped by adding 0.5 μl of FA, and the digest was centrifuged at 18,000g for 30 min at 4 °C. The supernatant was carefully transferred to LC vials for analysis.
Excision of protein gel bands and preparation for mass spectrometry
Excised gel band samples were prepared using in-gel digestion. Gel bands were first cut into smaller cubes (1 to 2 mm per side) with a clean scalpel and then transferred to new Eppendorf tubes.
Gel cubes were dehydrated by incubating in 500 μl of acetonitrile (ACN) for 5 min with continuous rotation and then discarding the liquid (all dehydration steps below follow the same procedure unless otherwise specified).
After incubation in 500 μl of 50% ACN in 50 mM tris-FA (pH 8.4) overnight, the gel cubes were dehydrated three times and held in a thermomixer at 37 °C for 5 min to completely evaporate the remaining ACN. Samples were sequentially reduced in 100 μl of 10 mM DTT at 56 °C for 30 min and alkylated in 100 μl of 25 mM IAM at 37 °C for 30 min, both with continuous shaking in a covered thermomixer. The gel cubes were then dehydrated three times and incubated on ice for 30 min in 200 μl of trypsin (0.0125 μg/μl in tris-FA). Excess trypsin was then removed and replaced with 200 μl of tris-FA, and the samples were incubated overnight at 37 °C with continuous shaking. Digestion was stopped by adding 20 μl of 5% FA, followed by incubation for 15 min under constant vortexing, and the liquid was transferred to a new tube. The gel cubes were then dehydrated sequentially with 500 μl of 50% ACN in 50 mM tris-FA and 500 μl of ACN for 15 min each, and the liquid from all three steps was combined. The peptide digest was dried in a SpeedVac and reconstituted in 50 μl of 1% ACN and 0.05% trifluoroacetic acid in ddH2O with gentle vortexing for 10 min. Samples were centrifuged at 18,000g for 30 min at 4 °C, and the supernatant was carefully transferred to LC vials for analysis.
LC-MS analysis
The LC-MS system consisted of a Dionex UltiMate 3000 nano LC system, a Dionex UltiMate 3000 micro LC system with a WPS-3000 autosampler, and an Orbitrap Fusion Lumos mass spectrometer. A large inner diameter (i.d.) trapping column (300 μm i.d. × 5 mm) was installed before the nano LC column (75 μm i.d. × 65 cm, packed with 2.5 μm XSelect CSH C18 material) for large-volume sample loading, cleanup, and delivery. For each sample, 4 μl of derived peptides was injected for LC-MS analysis. Mobile phases A and B were 0.1% FA in 2% ACN and 0.1% FA in 88% ACN, respectively. The 180-min LC gradient profile was 4% B for 3 min, 4 to 11% B for 5 min, 11 to 32% B for 117 min, 32 to 50% B for 10 min, 50 to 97% B for 1 min, and 97% B for 17 min, followed by equilibration to 4% B for 27 min. The mass spectrometer was operated in data-dependent acquisition mode with a maximum duty cycle of 3 s. MS1 spectra were acquired in the m/z range of 400 to 1500 at 120k resolution with the Orbitrap. Automatic gain control and maximum injection time were set to 175% and 50 ms, and dynamic exclusion was set to 60 s with a ±10 ppm tolerance. Precursor ions were isolated with the quadrupole using a 1.2-Th m/z window and fragmented by higher-energy collisional dissociation at 30% energy. MS2 spectra were acquired in the ion trap at a rapid scan rate with a maximum injection time of 35 ms. Detailed LC-MS settings and related information are described in a previous publication by Shen et al. (72).
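As a quick consistency check, the gradient steps listed above can be tabulated and summed to confirm that they add up to the stated 180 min. The sketch below is illustrative only: it assumes each step is a linear ramp in %B, and it models the final equilibration as a hold at 4% B, which the text does not spell out.

```python
# Each step: (start %B, end %B, duration in minutes), read from the text.
GRADIENT = [
    (4, 4, 3),
    (4, 11, 5),
    (11, 32, 117),
    (32, 50, 10),
    (50, 97, 1),
    (97, 97, 17),
    (4, 4, 27),   # re-equilibration, modeled here as a hold at 4% B (assumption)
]


def total_minutes(steps) -> int:
    """Sum the durations of all gradient steps."""
    return sum(duration for _, _, duration in steps)


def percent_b_at(steps, t: float) -> float:
    """Linearly interpolate %B at time t (minutes from the start of the run)."""
    if t < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for start, end, duration in steps:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return start + (end - start) * fraction
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")


print(total_minutes(GRADIENT))        # total run time in minutes
print(percent_b_at(GRADIENT, 66.5))   # midpoint of the long 11-32% B ramp
```

Most of the run (117 of 180 min) is spent on the shallow 11 to 32% B ramp, which is where the bulk of tryptic peptides would be expected to elute.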
LC-MS files were searched against the UniProt protein sequence database combined with the putative mucin sequences of the corresponding species predicted in this study (sequence file S1) (Swiss-Prot: Homo sapiens and Mus musculus; Swiss-Prot + TrEMBL: Rattus norvegicus, Bos taurus, Canis lupus familiaris, Mustela putorius furo, and Sus scrofa) using Sequest HT embedded in Proteome Discoverer 1.4 (Thermo Fisher Scientific). To estimate and control the false discovery rate (FDR), a target-decoy search approach with a concatenated database of forward and reverse protein sequences was applied. Search parameters included (i) precursor ion mass tolerance: 20 ppm; (ii) product ion mass tolerance: 0.8 Da; (iii) maximum number of missed cleavages per peptide: 2; (iv) fixed modification: cysteine carbamidomethylation; and (v) dynamic modifications: methionine oxidation and peptide N-terminal acetylation. Peptide filtering, protein inference and grouping, and FDR control were performed in Scaffold v5.0.0.0 (Proteome Software Inc.).
Protein identification criteria included 1% protein/peptide FDR and ≥2 peptides per protein. Lists of proteins containing relative protein abundance (spectral counts) and sequence coverage were exported from Scaffold and manually curated in R using custom scripts. The parameters described here, including the 0.8-Da mass tolerance for MS2, are routinely used in the field [see, e.g., (73)]. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (74) via the PRIDE partner repository with the dataset identifier pxd03197.
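The two filtering ideas named above, FDR estimation from a target-decoy search and a minimum of two peptides per protein, can be illustrated with a generic decoy-counting sketch. This does not reproduce how Scaffold or Sequest HT compute their statistics; the function names, scores, peptide strings, and the `REV_` decoy label are all hypothetical toy values.

```python
from collections import defaultdict


def decoy_fdr(psms, score_cutoff: float) -> float:
    """Estimate FDR as (decoy matches) / (target matches) at or above the cutoff."""
    passing = [p for p in psms if p["score"] >= score_cutoff]
    targets = sum(1 for p in passing if not p["is_decoy"])
    decoys = sum(1 for p in passing if p["is_decoy"])
    return decoys / targets if targets else 0.0


def proteins_with_min_peptides(psms, score_cutoff: float, min_peptides: int = 2):
    """Return target proteins supported by at least min_peptides distinct peptides."""
    peptides = defaultdict(set)
    for p in psms:
        if p["score"] >= score_cutoff and not p["is_decoy"]:
            peptides[p["protein"]].add(p["peptide"])
    return sorted(name for name, seqs in peptides.items() if len(seqs) >= min_peptides)


# Toy peptide-spectrum matches; all values are made up for illustration.
psms = [
    {"peptide": "AAK", "protein": "MUC10", "score": 3.2, "is_decoy": False},
    {"peptide": "TTPR", "protein": "MUC10", "score": 2.9, "is_decoy": False},
    {"peptide": "SSLK", "protein": "SMR3B", "score": 2.5, "is_decoy": False},
    {"peptide": "KLSS", "protein": "REV_SMR3B", "score": 1.1, "is_decoy": True},
    {"peptide": "GGR", "protein": "PROL1", "score": 0.9, "is_decoy": False},
]

print(decoy_fdr(psms, score_cutoff=2.0))
print(proteins_with_min_peptides(psms, score_cutoff=2.0))
```

In practice the score cutoff would be raised until the estimated FDR falls to the chosen level (1% in the text), and only then would the peptide-count filter be applied.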
Parallel detection of lineage-specific mucin evolution using the mucinome database
Our pipeline uses the general definition of mucins, proteins containing highly O-glycosylated tandem repeats rich in T and S, as the starting point for the bioinformatic analysis. Recently, however, a biochemically guided mucin classification (38) was published, providing another starting database for human mucins. We conducted a parallel analysis of the top 50 genes ranked by the "mucin" attribute in this database. Specifically, of these 50 genes, we identified 28 that fit our definition of human mucin (i.e., tandem repeats rich in T and S).
All of these genes have previously been identified as having very high levels of O-glycosylation, so we did not perform additional analysis of this property. Of the 28 putative mucin genes, 15 were already included in our previous analysis, including well-described human mucin genes such as MUC5B and MUC2. In addition, based on our definition and the biochemical properties in the mucinome database, we identified 13 genes that were not previously labeled as mucin genes but exhibited all the characteristics of mucin genes. Furthermore, we found that 6 of these 13 genes retained mucin repeat domains across the mammals we studied, while 7 may have evolved mucin repeat domains in a lineage-specific manner (Figure S9).
These results provide additional candidates for future studies to validate the functional and evolutionary relevance of these putative mucin genes.
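The working definition used here (tandem repeats rich in T and S) can be illustrated with a simple sequence check. This is a rough sketch under stated assumptions, not the authors' pipeline: it only detects exact back-to-back repeat copies, whereas real mucin repeats are often imperfect, and the function names, the thresholds (three copies, 30% S+T), and the toy sequences are hypothetical.

```python
def st_fraction(seq: str) -> float:
    """Fraction of residues that are serine (S) or threonine (T)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("S") + seq.count("T")) / len(seq)


def longest_tandem_run(seq: str, unit_len: int) -> int:
    """Largest number of back-to-back exact copies of any unit of length unit_len."""
    seq = seq.upper()
    best = 0
    for start in range(len(seq) - unit_len + 1):
        unit = seq[start:start + unit_len]
        copies = 1
        pos = start + unit_len
        while seq[pos:pos + unit_len] == unit:
            copies += 1
            pos += unit_len
        best = max(best, copies)
    return best


def looks_mucin_like(seq: str, unit_len: int, min_copies: int = 3, min_st: float = 0.3) -> bool:
    """Assumed rule of thumb: enough tandem copies and enough S/T content."""
    return longest_tandem_run(seq, unit_len) >= min_copies and st_fraction(seq) >= min_st


repeat_rich = "PTTSA" * 4              # toy S/T-rich tandem repeat
control = "MKLLVAGGDEMKLLVAGGDE"       # toy sequence with no S/T and no 5-residue tandem repeat

print(looks_mucin_like(repeat_rich, unit_len=5))
print(looks_mucin_like(control, unit_len=5))
```

A sequence-only screen like this can nominate candidates, but glycosylation and expression evidence, as described in the sections above, would still be needed to support a mucin assignment.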
Statistics
P values in Figure 2 and Figure S9 were determined using the Wilcoxon test. All other statistics performed are described in the appropriate methods sections above.
Figures and analysis
All statistical analyses were performed in R. All figures were created using R in RStudio, Keynote, and BioRender.
Ethics
Human subjects: Human saliva was collected by passive drooling according to a protocol approved by the University at Buffalo Human Subjects IRB (study no. 030-505616). All human participants provided informed consent. Animal experimentation: Samples from other animals were collected in collaboration with colleagues and other research institutions. Samples from all live animals used in this study were collected using minimally invasive methods, such as saliva collection kits or passive drooling. For a description of sample sources and collection methods, please refer to (iii) in the Acknowledgments section.
This article is an English version of an article which is originally in the Chinese language on echemi.com and is provided for information purposes only.