Article

Molecular Diversity and Evolution of Antimicrobial Peptides in Musca domestica

1
Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
2
University of Chinese Academy of Sciences, Beijing 100049, China
*
Author to whom correspondence should be addressed.
Diversity 2021, 13(3), 107; https://doi.org/10.3390/d13030107
Submission received: 12 January 2021 / Revised: 20 February 2021 / Accepted: 21 February 2021 / Published: 1 March 2021

Abstract

As a worldwide sanitary insect pest, the housefly Musca domestica can carry and transmit more than 100 human pathogens without suffering any illness itself, indicative of the high efficiency of its innate immune system. Antimicrobial peptides (AMPs) are the effectors of the innate immune system of multicellular organisms and establish the first line of defense to protect hosts from microbial infection. To explore the molecular diversity of the M. domestica AMPs and related evolutionary basis, we conducted a systematic survey of its full AMP components based on a combination of computational approaches. These components include the cysteine-containing peptides (MdDefensins, MdEppins, MdMuslins, MdSVWCs and MdCrustins), the linear α-helical peptides (MdCecropins) and the specific amino acid-rich peptides (MdDomesticins, MdDiptericins, MdEdins and MdAttacins). On this basis, we identified multiple genetic mechanisms that could have shaped the molecular and structural diversity of the M. domestica AMPs, including: (1) Gene duplication; (2) Exon duplication via shuffling; (3) Protein terminal variations; (4) Evolution of disulfide bridges via compensation. Our results not only enlarge the insect AMP family members, but also offer a basic platform for further studying the roles of such molecular diversity in contributing to the high efficiency of the housefly antimicrobial immune system.

1. Introduction

Insects account for 90% of all extant animal organisms in the world [1,2] and co-exist with a variety of microorganisms in different environments [3]. Therefore, they need to evolve a potent defense system for clearing potential invaders. Antimicrobial peptides (AMPs) are effectors of the innate immunity against bacteria, fungi, parasites and viruses [4], and exhibit some common properties (e.g., cationicity, hydrophobicity and amphipathicity) that underlie their antimicrobial activity [5]. Many AMPs are induced in the insect immune organs (e.g., fat body) in response to microbial infections. They are secreted into the hemolymph, where they reach concentrations between 0.1 and 100 μM and inhibit the growth of foreign microorganisms [6].
Like their counterparts in non-insect organisms, insect AMPs are classified into three distinct structural classes: the cysteine-rich peptides (e.g., drosomycins and insect defensins, two subfamilies of defensins in Drosophila) [7,8,9]; the peptides adopting an α-helical conformation (e.g., cecropins and moricins) [10]; and the peptides with an unusual bias in certain amino acids, such as proline-rich peptides (e.g., metchnikowins, apidaecins, drosocins, and lebocins) [11,12,13] and glycine-rich peptides/proteins (e.g., diptericins, attacins and gloverins) [6]. In Drosophila, AMPs were initially divided into three functional classes based on their target specificity: antifungal AMPs (drosomycins and metchnikowins), anti-Gram-positive bacterial AMPs (Defensin) and anti-Gram-negative bacterial AMPs (cecropins, drosocin, attacins, diptericins and MPAC) [6]. However, subsequent studies demonstrated that some of them overlap in function. For example, although the prototypical Drosomycin is a strictly antifungal defensin [6,9], homologs from Drosophila takahashii possess antibacterial activity [14], suggesting that functional diversification occurred in these drosomycin-type defensins. Similarly, insect defensins and cecropins, previously defined as strictly antibacterial, were later found to have antifungal activity [15,16].
Insects are an important resource for understanding the basic biology of the immune system and for searching for new peptides for anti-infective drugs. In recent years, computational approaches have been applied to discover AMPs in species whose whole genomes have been sequenced [17,18,19,20]. Musca domestica is a worldwide sanitary insect pest whose larvae often feed on microbe-rich, decaying organic materials and whose adults are major vectors of pathogens causing human or animal diseases. Thus, it might have evolved more AMPs [21]. Thanks to the release of its whole-genome sequence [22], we have an opportunity to survey the AMPs in a vector insect and study their evolution.
Here, we report the molecular diversity of the M. domestica AMPs based on a systematic database search, which provides a special perspective for understanding how the antimicrobial immune system evolves in a species with tenacious vitality. We found that, unlike Drosophila, which has a limited number of AMPs [6], M. domestica has greatly expanded its AMP repertoire via multiple genetic mechanisms that create structural diversity and that together would have shaped its highly efficient antimicrobial immune system. These include: (1) Gene duplication; (2) Exon duplication via shuffling; (3) Protein terminal variations; (4) Evolution of disulfide bridges via compensation. A similar phenomenon was observed previously in the evolution of AMPs in the parasitoid Nasonia vitripennis [20], suggesting that these two insect species could be subject to similar selective pressures driving the evolution of their antimicrobial systems.

2. Materials and Methods

2.1. Database Searches

The strategies for gene discovery used in this study are provided in the supplementary information (Supplementary Information Figure S1). In brief, potential AMPs were first searched in the proteome of M. domestica, which was downloaded from the Genome Database (up to September 2, 2020) (https://www.ncbi.nlm.nih.gov/) and filtered for sequences shorter than 100 amino acids that carry a signal peptide. Potential AMPs among them were predicted with the Collection of Anti-microbial Peptides (CAMPR3) server (http://www.camp.bicnirrh.res.in/prediction.php) and then used as templates to search the non-redundant sequence database by BLASTP (https://blast.ncbi.nlm.nih.gov/Blast.cgi) until no new peptides appeared. Secondly, these newly discovered peptides were used as queries to mine the whole genome shotgun and nucleotide collection databases (https://blast.ncbi.nlm.nih.gov/Blast.cgi) by TBLASTN. Gene Runner (http://www.generunner.net/) was employed to translate a complete open reading frame from a selected nucleotide sequence. Retrieved sequences with a signal peptide and a classical AMP signature were used as queries in new rounds of TBLASTN and BLASTP searches, and the procedure was repeated until no new hit appeared. Thirdly, the BLASTP and TBLASTN programs were used to identify orthologues of known Drosophila melanogaster peptides in the M. domestica database. Finally, a protein pattern search with the PHI-BLAST algorithm (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was conducted to find M. domestica AMPs based on the cysteine arrangement pattern of defensins, i.e., the CXXC/CXC motif [23].
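The length filter and the cysteine-motif screen described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the pipeline used in this study: the function names and toy sequences are hypothetical, the CXXC/CXC motif is read here as a CXXC element followed downstream by a CXC element, and signal-peptide prediction (done with SignalP) and the BLAST iterations are not reproduced.

```python
import re

def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records

def short_peptides(records, max_len=100):
    """Keep precursors shorter than max_len residues (threshold from Section 2.1)."""
    return {name: seq for name, seq in records.items() if len(seq) < max_len}

# Assumed reading of the defensin cysteine signature:
# a CXXC element followed somewhere downstream by a CXC element.
CXXC_CXC = re.compile(r"C..C.*C.C")

def has_defensin_signature(seq):
    """Return True if the sequence contains the assumed CXXC ... CXC signature."""
    return CXXC_CXC.search(seq) is not None

# Toy input (hypothetical sequences, for illustration only).
toy = ">pepA\nMKCAACGGGGCTC\n>pepB\nMKAAAG\n"
records = parse_fasta(toy)
candidates = [n for n, s in short_peptides(records).items() if has_defensin_signature(s)]
```

In practice the motif screen would only be a pre-filter; hits still need to be checked for a signal peptide and compared against known AMPs.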

2.2. Characteristics Identification

All the potential AMP-like peptides were submitted to the SignalP 5.1 server (http://www.cbs.dtu.dk/services/SignalP/) for signal peptide prediction. Pro-peptides were detected with the ProP 1.0 server (http://www.cbs.dtu.dk/services/ProP/). The net charge (NC) at pH 7, molecular weight (MW) and isoelectric point (pI) were predicted with the PROTEIN CALCULATOR v3.4 server (http://protcalc.sourceforge.net/). Secondary structures were then predicted with the ESPript 3.0 server (ESPript 3.x / ENDscript 2.x, ibcp.fr) [24].
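For orientation, the net charge at a given pH can be approximated from the Henderson-Hasselbalch equation. The sketch below is not the PROTEIN CALCULATOR implementation: the pKa values are an assumed, textbook-style set, and cysteine and tyrosine side chains are ignored for simplicity (disulfide-bridged cysteines do not ionize), so values will differ slightly from those in Table A1.

```python
# Assumed, illustrative pKa values; the server used in this study may use others.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1}
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.0

def net_charge(seq, ph=7.0):
    """Approximate net charge of a peptide with free termini at the given pH."""
    # Fraction of protonated (charged) basic groups.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    # Fraction of deprotonated (charged) acidic groups.
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue in seq.upper():
        if residue in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return positive - negative
```

A call such as `net_charge("GKRKG")` gives a value close to +3, reflecting the three basic residues.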

2.3. Phylogenetic Tree Construction

Multiple sequences were aligned with MUSCLE (https://www.ebi.ac.uk/Tools/msa/muscle/) and the alignments were used to build phylogenetic trees by iqtree-2.0-rc2 with substitution models BLOSUM62 and PMB. Phylogenetic testing included 1000 replicates of Ultrafast bootstrap (UFBoot) and 1000 replicates of SH-aLRT to provide support for tree branches [25]. In our study, both BLOSUM62 and PMB models generated very similar results with good agreement. The trees presented here were prepared by Evolview v2 (https://evolgenius.info/evolview-v2).

2.4. Structure Modeling and Analysis

The three-dimensional (3D) structures of the M. domestica AMPs described here were predicted by the I-TASSER server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) and evaluated by Verify3D. Except for the homodimers of MdDefensin20, MdMuslin1 and MdMuslin26, which are displayed by PyMol (https://pymol.org/2/), all structural images are displayed by MolMol (https://sourceforge.net/projects/molmol/). The wheel projection was performed online using Helical Wheel Projections (http://rzlab.ucr.edu/scripts/wheel/wheel.cgi).
To detect whether cysteines in a specific M. domestica AMP would form a disulfide bridge, we first built its initial structure with I-TASSER and then refined the structure by molecular dynamics (MD) simulations or energy minimization. For MdEppin35-1, the model was first obtained on I-TASSER with position restraints between Cys2 (position 15) and Cys6 (position 45) and between Cys3 (position 32) and Cys5 (position 42). MdMuslin1 and MdMuslin26 were modeled with the potential disulfide bridges in MdMuslin2 and MdMuslin25, respectively, and their initial homodimer structures were assembled by Z-DOCK (http://zdock.umassmed.edu/) [26]. The MdMuslin1 homodimer was subjected to MD simulations (50 ns) in GROMACS 2020.1 with the OPLS (Optimized Potential for Liquid Simulations)-AA/L all-atom force field (2001 amino acid dihedrals) [27]. The peptide was placed in a cubic box with at least 1 nm between the peptide and the box surface and solvated with SPC water, and sodium and chloride ions were added to neutralize the total system charge. The solvated structure was energy minimized by 5000 steps of steepest descent, terminating at a maximum force of less than 1000 kJ/mol/nm. The system was then equilibrated for 100 ps at constant number of particles, volume, and temperature (NVT) and for 100 ps at constant number of particles, pressure, and temperature (NPT). The temperature was maintained at 300 K by the velocity-rescaling method and the pressure at 1 bar by the Parrinello-Rahman method. The particle mesh Ewald method was used for long-range electrostatic interactions and the linear constraint solver (LINCS) algorithm constrained all bonds. Trajectories were saved every 2 fs for analysis. A homodimer snapshot was extracted from the simulations for constructing the inter-monomer disulfide bridge with Swiss-PdbViewer (https://spdbv.vital-it.ch/).
Energy minimizations were performed with the force fields [28] implemented in the MOE2019 software (OPLS-AA for MdEppin35-1, AMBER 10 for MdDefensin20, MdMuslin1 and MdMuslin26). These homodimers were analyzed by PDBsum (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) to evaluate their quality.

2.5. Positive Selection Analysis

Codon-substitution models were used to estimate the nonsynonymous-to-synonymous rate ratio (ω = dN/dS) with CODEML implemented in the PAML software package [29,30]. Among these models, M0 assumes that all sites have the same ω ratio and is used as a control. Two pairs of codon-based likelihood models (M1a/M2a, M7/M8) were chosen for two likelihood ratio tests (LRTs). M1a (nearly neutral model) assumes a proportion p0 of conserved sites with 0 < ω0 < 1 and a proportion p1 = 1 − p0 of neutral sites with ω1 = 1; M2a (positive selection model) adds an extra class of sites with the proportion p2 = 1 − p0 − p1 and with ω estimated from the data. M7 (β distribution model) does not allow for positively selected sites, whereas M8 (β and ω model) adds an extra class of sites to M7 that allows for ω > 1, i.e., the presence of positively selected sites. Posterior probabilities were calculated using the Bayes Empirical Bayes (BEB) method [31].
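Each LRT compares twice the log-likelihood difference between the nested models against a chi-square distribution. The sketch below assumes two degrees of freedom, the conventional choice for both the M1a/M2a and M7/M8 comparisons (the text above does not state the value used), and the log-likelihoods in the example are hypothetical rather than taken from Table 1 or Table 2.

```python
import math

def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test with 2 degrees of freedom.

    lnl_null: log-likelihood of the null model (e.g., M1a or M7)
    lnl_alt:  log-likelihood of the alternative model (e.g., M2a or M8)
    Returns (test statistic 2*delta_lnL, p-value).
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value

# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = lrt_two_df(lnl_null=-1000.0, lnl_alt=-995.0)
```

With these made-up values the statistic is 10 and the null model would be rejected at the 1% level; sites are then inspected with BEB only when the LRT is significant.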

2.6. Co-Evolutionary Analysis

Multiple sequence alignments (MSAs) of AMPs from the housefly and a representative structure were submitted to MISTIC2 for co-evolutionary analysis [32] (https://mistic2.leloir.org.ar). Four covariation methods, (1) corrected mutual information (MIp), (2) mean field direct coupling analysis (mfDCA), (3) pseudo-likelihood maximization DCA (plmDCA) and (4) multivariate Gaussian modeling DCA (gaussianDCA), were chosen to analyze the interrelationships of residues in a protein sequence, which can identify structurally or functionally important positions. The same sequences were also submitted to the WebLogo server [33] (http://weblogo.berkeley.edu/logo.cgi) to create sequence logos using default parameters.
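To make the first of these covariation measures concrete, the sketch below computes mutual information between alignment columns and applies the average product correction that gives MIp. It is a simplified illustration, not the MISTIC2 implementation: gaps are treated as ordinary symbols, no sequence weighting or pseudocounts are applied, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

def mip_scores(alignment):
    """Return {(i, j): MIp} for all column pairs of an alignment.

    alignment: list of equal-length aligned sequences with at least two columns.
    MIp(i, j) = MI(i, j) - mean_MI(i) * mean_MI(j) / overall_mean_MI
    """
    columns = list(zip(*alignment))
    n_cols = len(columns)
    mi = {(i, j): mutual_information(columns[i], columns[j])
          for i, j in combinations(range(n_cols), 2)}
    # Mean MI of each column with every other column.
    col_mean = []
    for i in range(n_cols):
        values = [mi[(min(i, j), max(i, j))] for j in range(n_cols) if j != i]
        col_mean.append(sum(values) / len(values))
    overall = sum(mi.values()) / len(mi)
    if overall == 0.0:
        return {pair: 0.0 for pair in mi}
    return {(i, j): value - col_mean[i] * col_mean[j] / overall
            for (i, j), value in mi.items()}

# Toy alignment: columns 0 and 1 co-vary, column 2 varies independently.
scores = mip_scores(["ACA", "ACG", "GTA", "GTG"])
```

In the toy alignment the pair (0, 1) receives the highest score, which is the kind of signal used to flag coupled positions.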

3. Results

Using a combination of computational approaches, we have substantially enlarged the Musca domestica AMP repertoire to 186 AMP-like peptides/proteins (Table A1) [22], of which 148 are considered newly discovered members. These components include the cysteine-containing peptides (MdDefensins, MdEppins, MdMuslins, MdSVWCs and MdCrustins); the linear α-helical peptides (i.e., MdCecropins); and the specific amino acid-rich AMPs (i.e., MdDomesticins, MdDiptericins, MdEdins and MdAttacins). Their characteristics, including length, pI and net charge at pH 7.0, are provided in Table A1. Our results indicate that most AMPs described here are smaller than 150 amino acids in length and have some typical AMP features. Some putative AMPs are longer because of internal duplication. These peptides are described in detail as follows:

3.1. Cysteine-Containing Peptides

3.1.1. MdDefensins

Defensins are approximately 4 kDa AMPs with three or four conserved disulfide bridges, which exist in nearly all multicellular organisms [34]. Based on their structural characteristics, defensins can be classified into two distinct superfamilies called cis- and trans-defensins. The former includes those with the cysteine-stabilized α-helix/β-sheet (CSαβ) fold produced by plants, fungi and invertebrates; the latter includes α-defensins, β-defensins and θ-defensin from vertebrates as well as big defensins from invertebrates. In insects, Tian and colleagues found three different types of defensin in N. vitripennis, including classical insect-type defensins (CITDs), nasonins and navitricins [20]. Insect defensins are composed of an N-terminal loop and an α-helix, followed by an antiparallel β-sheet. Defensins in insects act on Gram-positive bacteria by forming voltage-dependent channels that disrupt the permeability barrier of the cytoplasmic membrane, resulting in cytoplasmic potassium loss [35]. In addition, some insect defensins can kill the Gram-negative Escherichia coli and some fungi [15,36].
In the housefly, there are a total of 21 defensin-like AMPs (named MdDefensins), of which 11 are new members described here (Figure 1a and Figure S2). Their mature peptides are composed of 40–65 residues and contain three disulfide bridges, with net charges ranging from 0.9 to +4.2 (Table A1). Based on our phylogenetic tree analysis, these peptides can be divided into three groups (Figure 1b): Group I includes MdDefensin1-MdDefensin9, which all belong to the CITDs; Group II includes MdDefensin10-MdDefensin15, which all lack a classical pro-peptide; and Group III includes MdDefensin16-MdDefensin21, which all contain a short n-loop without a pro-peptide. Among them, some members (e.g., MdDefensin4, -6, -13, -15 and -16) have been shown to be transcriptionally active after body wall injury [37]. MdDefensin1-MdDefensin16, MdDefensin18, and MdDefensin19 share the precursor organization of CITDs, which comprises a signal peptide, an acidic propeptide ending with an R/KXKR motif (X denoting F, Q or Y) or its variants, and a mature peptide. This motif is lost in MdDefensin17, -20 and -21, which leads to failure in propeptide processing and thus generates an extended N-terminus (Figure 1a and Figure S2). The 3D structures of four representative MdDefensins with different N-terminal lengths (Figure 1a) show that they all adopt a typical CSαβ fold (Figure 1b). The α-helix spans residues L15-I21 in MdDefensin2, G2-L28 in MdDefensin10, G24-L34 in MdDefensin17, and N18-I26 in MdDefensin20, and a hydrophobic cluster is present in MdDefensin2 (L15, A17 and A18), MdDefensin10 (W27 and L28), MdDefensin17 (W26, M29, and L34) and MdDefensin20 (L20, L23, I26). The two β-strands constitute an antiparallel sheet linked by a loop that commonly forms a functional γ-core contributing to the antimicrobial activity in some defensins [38]. The N-terminally extended region in MdDefensin17 folds into a short two-stranded antiparallel sheet followed by an α-helix.
This unique subdomain structure is found for the first time in an insect defensin [39]. MdDefensin20 has evolved a free cysteine that is not involved in the intramolecular disulfide bridges (Figure 1). Compared with the CITDs, MdDefensin17-MdDefensin21 have a shorter n-loop (Figure 1b and Figure S2), analogous to the antibacterial ancient invertebrate-type defensins (AITDs) [40]. In our analysis, the M1a/M2a models identified Ser7 (numbered according to MdDefensin1) as a positively selected site (Table 1). The lack of positive selection signals in M7/M8 may be due to the sparse sampling of species and low sequence divergence.

3.1.2. MdEppins

Serine protease inhibitors (SPIs) exist in all organisms and participate in many important metabolic processes, such as blood coagulation, fibrinolysis, inflammation and immunity. The inhibitors are categorized into four groups (Kazal, Kunitz, Serpin and α-macroglobulin), all with a disulfide-rich α/β fold and a P1 site [41]. They show inhibitory activity against a broad spectrum of enzymes, e.g., trypsin, chymotrypsin, plasmin, elastase and microbial serine proteases. The P1 site plays an important role in the specificity and binding strength of serine protease inhibitors because of its exposure in the protease-binding loop [42]. Kunitz serine protease inhibitors (KuSPIs) are extensively distributed in microbes, plants, insects and mammals. In recent years, many studies have focused on their antimicrobial activity. For example, human eppin, which comprises two potential protease inhibitory domains (a whey acidic protein (WAP), or four-disulfide core, domain and a Kunitz domain), has been reported to kill Gram-negative bacteria [43]. IPS1-3, a KuSPI isolated from the cell-free hemolymph of Galleria mellonella larvae, can be induced in response to the injected fungal elicitor zymosan [44]. In insect silk, KuSPIs can inhibit bacterial and fungal proteinases [45].
The Eppin family contains 35 members in the housefly (herein named MdEppin1–MdEppin35) and shares a conserved domain with the KuSPI family. Among them, MdEppin35 contains nine KuSPI domains (named MdEppin35-1 to MdEppin35-9) (Figure 2a and Figure S3). The pattern can be drawn as CX8-14CX15-17CX6-7GCX12-13CX3C with three disulfide bridges (C1-C6, C2-C4 and C3-C5) (Figure 2a and Figure S3), in which C1-C6 and C3-C5 are essential for the maintenance of a native conformation whereas the third one (C2-C4) appears to be involved in stabilizing the binding domains in the loops containing the active site (P1) [46]. The phylogenetic tree reveals that MdEppin1-MdEppin8, MdEppin35-1 and MdEppin35-9 share high similarity with the Kunitz domain of human eppin, whereas MdEppin35-5 and MdEppin35-6 are separated as single taxa (Figure 2b). In MdEppin35-1, an N-terminal deletion led to the loss of the first two cysteines. Instead, two C-terminal cysteines have evolved, which could compensate for the loss by forming new disulfide bridges that stabilize its structure, as verified by our structural modeling (Figure 2a,c).
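The cysteine-spacing notation used for these families can be turned into a regular expression for screening candidate sequences. The sketch below assumes that `Xm-n` means between m and n residues of any kind, `Xm` exactly m residues and a bare `X` exactly one residue; the function name and the synthetic test sequence are hypothetical, and a match does not by itself imply the stated disulfide connectivity.

```python
import re

# One token is either a spacer (X, Xm or Xm-n) or a single fixed residue letter.
_TOKEN = re.compile(r"X(\d+)(?:-(\d+))?|X|[A-WYZ]")

def pattern_to_regex(pattern):
    """Convert a spacing pattern such as 'CX8-14CX3C' into a compiled regex."""
    parts, position = [], 0
    for match in _TOKEN.finditer(pattern):
        if match.start() != position:
            raise ValueError("unexpected character at index %d" % position)
        position = match.end()
        token = match.group(0)
        if token.startswith("X"):
            low = match.group(1) or "1"
            high = match.group(2) or low
            parts.append("[A-Z]{%s,%s}" % (low, high))
        else:
            parts.append(token)
    if position != len(pattern):
        raise ValueError("unexpected character at index %d" % position)
    return re.compile("".join(parts))

KUNITZ_PATTERN = "CX8-14CX15-17CX6-7GCX12-13CX3C"  # pattern quoted in the text
kunitz_regex = pattern_to_regex(KUNITZ_PATTERN)

# Synthetic sequence built to satisfy the spacing (not a real MdEppin).
synthetic = "C" + "A" * 8 + "C" + "A" * 15 + "C" + "A" * 6 + "GC" + "A" * 12 + "C" + "AAA" + "C"
```

The same helper accepts the MdMuslin and MdSVWC patterns given later in the text, since they use the same notation.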
Evolutionary analysis identified two positively selected sites (K12 and L18, shown in MdEppin1) whose mutations might be relevant to their functional divergence (Table 2). Consistently, L18 was also identified as an essential site potentially related to the activity of MdEppin, as analyzed by MISTIC. In addition, this analysis suggests its possible connection with other amino acids, including K12, R16 (active site P1), I19, P20 and E33 (Figure 2d,e).

3.1.3. MdMuslins

Kazal-type serine proteinase inhibitors (KaSPIs) were first isolated by Kazal and colleagues from the pancreas [41]. KaSPIs have been identified in many insects such as mosquitos [47], Drosophila [48] and locusts [49]. They have broad activity in various biological and physiological processes in many organisms, such as blood coagulation and innate immunity. Interestingly, KaSPIs exhibit antibacterial activity in response to microbial infection. The recombinant CsKSPI inhibits the growth of Gram-positive and Gram-negative bacteria [50], and PSKP-1 and its variants reduce E. coli mobility and cell agglutination [51].
In the housefly, 24 muslins (named MdMuslin1-MdMuslin24) are identified as containing the typical Kazal domain with three disulfide bridges, and nine muslins (named MdMuslin25–MdMuslin33) contain four disulfide bridges (Figure 3a and Figure S4). Among them, four muslins (MdMuslin15, MdMuslin16, MdMuslin23 and MdMuslin24) contain two Kazal domains, which are named -1 and -2. Our phylogenetic tree reveals four types of MdMuslins (Figure 3b), with each type clustering together, in favor of their monophyletic origin. Of them, types I to III contain three disulfide bridges (i.e., C1-C5, C2-C4, C3-C6), a connectivity that differs from that of the peptides containing the Kunitz domain. Their sequence pattern can be drawn as CX1-3CX5PVCX0-5GX6-9NX1-5CX3-6CX7-22C. In the tree, the two domains in the paralogous MdMuslin15 and MdMuslin16 are classified into two different types, indicating that significant divergence occurred after domain duplication. Similarly, in the other two members with two Kazal domains (MdMuslin23 and MdMuslin24), domain 2 is categorized into type III and domain 1 into type IV. For the eight-cysteine-containing members, except MdMuslin23-1 and MdMuslin24-1, the sequence pattern can be described as CX1-3CX5PVCX5-6GX5-6CX3NXCX6CX7-12CX2-5C (Figure 3c).
In MdMuslin1 the first cysteine is replaced by a serine, and in MdMuslin26 the fifth cysteine is lost; in both cases the original first disulfide bridge is disrupted. In these two molecules, the free cysteines are exposed on the molecular surface, as revealed by our structural models (Figure 3c), suggesting that they might participate in the formation of a homodimer. The conservation analysis indicates that R10 (the P1 site, numbered according to MdMuslin2) has an inner relationship with P7, N11 and P13 (Figure 3d,e).

3.1.4. SVWC Domain AMPs

Single domain von Willebrand factor type C (SVWC) proteins mostly contain eight cysteines. Although expression of the Bombyx mori BmSVWC gene decreased in the cuticle when the insect was infected with fungi [52], granularin from the snail Lymnaea stagnalis was up-regulated during parasitization by the avian schistosome Trichobilharzia ocellata, in favor of a role of this class of proteins in the molluscan internal defense response [53].
In the housefly, the SVWC-type AMPs are expanded to a family comprising 29 small, single-domain secreted proteins (named MdSVWC1-MdSVWC29) (Figure 4a and Figure S5). Among them, MdSVWC29 is unique in that it contains two classical fragments, named MdSVWC29-1 and MdSVWC29-2. The family displays a consensus pattern of CX18-23CX4CX10-12CX7-10CX11-14CCX1-5C. The members are diverse in sequence, but the eight cysteines are conserved throughout the group and form four disulfide bridges (i.e., C1-C3, C2-C6, C4-C7, and C5-C8). Two conserved introns (phase 1 and phase 2) are located at nearly identical positions in all peptides whose gene structures are available, supporting their common origin (Figure S5). Based on our phylogenetic tree analysis, these MdSVWC peptides can be divided into two distinct groups (Figure 4b). The predicted 3D structures of MdSVWC1 and MdSVWC17 reveal a typical structure in their N-termini that contains a four-stranded β-sheet (residues M3-F5, T12-E14, S18-E20, R27-T29 in MdSVWC1 and C6-V8, K11-V13, G16-H21, T27-D32 in MdSVWC17). Residues P37-L41 are folded into an α-helix in MdSVWC1, but G36-E41 in MdSVWC17 are folded into a β-sheet. In addition, the C-terminus of MdSVWC1 forms a β-sheet (residues K58-D60 and F77-C79) and an α-helix (Y87-V91) (Figure 4c).

3.1.5. Crustin-like AMPs

Crustins are antibacterial proteins whose precursor comprises a signal peptide at the N-terminus and a whey acidic protein (WAP) domain at the C-terminus [54,55]. Crustins have been identified in diverse invertebrate animals, with a WAP domain of approximately 50 amino acids containing a conserved motif and a four-disulfide core (4-DSC) in the C-terminus. In previous studies, crustins were divided into four types. Type I contains a cysteine-rich domain, and type II contains a glycine-rich domain in the N-terminus. Type III crustins contain only one WAP domain and are thus also named single WAP domain-containing peptides (SWDs). Type IV contains a cysteine-rich region, an aromatic-rich region and a WAP domain in the C-terminus; this type of crustin is exclusively present in ants. Crustins are widely regarded as antimicrobial molecules since they can kill Gram-positive bacteria [56,57] and some fungi [58]. Moreover, LcSWD3 isolated from L. vannamei may contribute to the antiviral immune response [59].
Even though the mechanism of action of crustins on pathogens remains unknown, it appears clear that the common WAP domain in all crustins plays a key role in their antibacterial effect. This is supported by several previous observations: (1) the crustin in Fenneropenaeus chinensis, which only contains a glycine-rich region without a WAP domain, exhibits no antibacterial activity [60]; (2) SWAM1 and SWAM2 in mice, which belong to the type III crustins, can inhibit the growth of both E. coli and S. aureus [61]. Additionally, the reduction and alkylation of the cysteine residues in the WAP domain of a crustin-like peptide from a snake venom destroyed its antimicrobial activity [62].
The housefly crustins (named MdCrustins) are identified as types V and VI based on their sequence characteristics. In comparison with all other crustins, these peptides have developed an extra C-terminal region accompanying the loss of the cysteine-rich or glycine-rich region located between the signal peptide and the WAP region (Figure 5a; see sequences in Figure S6). In the phylogenetic tree, the housefly crustins are clustered together with type III crustins (Figure 5a), supporting their close evolutionary relationship. Structural modeling suggests that the WAP domain in both MdCrustin1 and MdCrustin3 may form four disulfide bridges with a connectivity pattern as C1-C6, C2-C7, C3-C5 and C4-C8. Compared with MdCrustin1, the C-terminal region of MdCrustin3 may be more rigid given that it folds into several β-strands stabilized by two disulfide bridges (C9-C10 and C11-C12) (Figure 5b).

3.2. Linear α-helical Peptides

Cecropins

Cecropins are a group of classical linear α-helical peptides containing 31–39 residues with a molecular weight of about 4 kDa. The first cecropin was isolated from Hyalophora cecropia [63], and cecropins were later found in many insect orders, such as Diptera [10,64,65,66,67,68], Lepidoptera [69] and Coleoptera [70], but not in Hemiptera [4]. Cecropins display a broad spectrum of activity against Gram-positive and Gram-negative bacteria, fungi and HIV [63,71]. In some cecropins (e.g., cecropin A and papiliocin), Trp2 and Phe5 were found to contribute to interactions with the negatively charged bacterial membrane [72], whereas Gly1, Trp2, Lys4 and Lys5 in sarcotoxin-IA are important for binding to lipid A of LPS [73,74]. In cecropins, the hinge region disrupting the long helix is important for their structural flexibility [75].
In the housefly, 11 cecropins have been identified to contain a typical precursor organization [64]. Using our method, we found another five homologs that share similarity with those found in Diptera (Figure 6a). As reflected by MdCecropin2, in a hydrophobic environment, these peptides adopt an α-helical conformation, in which two α-helices (W2-Q23 and G26-G40) are joined by a flexible hinge comprising G24 and L25 (Figure 6b). The N-terminal helix is strongly basic and the C-terminal one is hydrophobic (Figure 6b).

3.3. Specific Amino Acid-Rich AMPs

3.3.1. Domesticins, Diptericins and Edins

Domesticins are a class of proline-rich AMPs active against Gram-positive and Gram-negative bacteria and some fungi [76,77,78]. In our data mining, we found a new domesticin, named MdDomesticin2 (Figure 7 and Figure S7), with a proline content of 27.5%. Diptericins are glycine-rich peptides of about 9 kDa that are active on Gram-negative bacteria and were initially isolated from the fly Phormia terranovae [79]. Some studies have shown that they are bacterially induced through the IMD signaling pathway [80]. In addition, Diptericin in mosquito larvae is up-regulated after Sindbis virus infection [81]. Two housefly diptericin genes that were not named previously but have been found to be transcriptionally active when housefly larvae and pupae were injured [37] are named here MdDiptericinD and MdDiptericinD1. Their proteins share high similarity with the other four previously named diptericins (Figure 7 and Figure S8). These diptericins can be clearly divided into two domains: a proline-rich domain and a glycine-rich domain. We found that a phase 0 intron disrupts the proline-rich domain of MdDiptericinD and D1 (Figure S8).
Edins are a class of inducible insect AMPs [82,83] with a precursor comprising a signal peptide, a propeptide ending with an RXXR motif and a mature glycine-rich domain. Interestingly, in the housefly MdEdin6-MdEdin10 have two Gly-rich domains, in contrast to the single glycine-rich domain in the orthologs from other insects and in MdEdin1-MdEdin5 of the housefly (Figure 7 and Figure S9).
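Residue content figures such as the proline percentage quoted above are simple composition ratios. A minimal sketch follows; the function name and the toy sequence are hypothetical, and the MdDomesticin2 sequence itself is given in Figure S7 rather than reproduced here.

```python
def residue_percent(seq, residue):
    """Percentage of positions in seq occupied by the given one-letter residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count(residue.upper()) / len(seq)

# Toy peptide for illustration only: 4 prolines in 10 residues.
toy_proline_content = residue_percent("PRPGPKPAAG", "P")
```

The same function applies to the glycine proportions compared between the housefly and fruit fly attacin domains.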

3.3.2. Attacins

Attacins are a class of 20–23 kDa proteins found in Lepidoptera and Diptera. Attacins have two types, the basic attacins (A-D) and acidic attacins (E-F) [6]. All attacins share a high similarity in their amino acid sequences, but more aspartic acids exist in the acidic attacins. The precursor organization of attacins contains a signal peptide, a propeptide ending with a conserved RXXR, an attacin-N domain, an attacin-C domain (also called G1 domain and G2 domain). In the housefly, 16 attacins can be divided into three types (named MdAttacinA, MdAttacinC and MdAttacinD), in which three MdAttacinA, one MdAttacinC (Herein named MdAttacinC2) and one MdAttacinD (MdAttacinD4) have been named. Not like attacinA and attacinB in Drosophila, MdAttacinAs in the housefly lacks a propeptide. Their glycine proportion in the N-domain is universally higher than that in the N-domain of the fruit fly counterparts (Figure S10). In our study, we found two new MdAttacinCs (MdAttacinC6 and MdAttacin7) (Figure S11). In addition, although some MdAttacinCs have been reported [37], we found that the original MdAttacinC3 contains two whole attacin sequences and thus named C3-1 and C3-2 (Figure 8). MdAttacinCs contain a longer propeptide ending with an RXXR motif. The glycine residues in the attacin-N domain are much less than those in the domain of the fruitfly counterparts, but higher in attacin-G1 and G2 domain (Figure S11). MdAttacinDs lack a signal peptide (Figure S12). However, like MdAttacinA3, MdAttacinD3 (herein named) was also expressed in larval tissues after injury [37]. Compared with the fruit fly attacinDs, we found that the glycine percentages largely varied due to sequence divergence.
Our phylogenetic analysis based on the MSA of diptericins, domesticins and attacinCs, a group of proline-rich AMPs described above, reveals a close relationship between MdAttacinC and the D. melanogaster attacinC and a paralogous relationship between diptericins and domesticins (Figure 9). In a similar manner, we analyzed edins, diptericins and attacins, a group of glycine-rich AMPs described above. Their domains can be divided into eight groups based on the evolutionary tree (Figure 9). The Attacin-N domains of type attacinA are grouped together (named the AttacinA-N group) with the Attacin-N domains of attacinA–D from the fruit fly (Figure 9). The N-domains of the housefly attacinDs, except attacinD4, cluster with the AttacinA-N group. The Attacin-N domains of type attacinC and of attacinD4 cluster together (named the AttacinC-N group). The AttacinA-G1 group contains the G1 domains of attacinA and attacinD1. The AttacinC-G1 group contains the G1 domains of attacinC and attacinD4. The AttacinA-G2 group contains the G2 domains of attacinA and attacinD1–4 as well as of attacinA–D from D. melanogaster. The AttacinC-G2 group contains the G2 domains of attacinC and attacinD4. The diptericin group contains only one Gly-rich domain. All Edins are grouped together (Figure 9). These results reveal that edins and the G2 domain of attacinC have the closest evolutionary relationship and that diptericins are closer to the G2 domain of attacinA. Although MdEdin10 was previously named an attacin, we found that it is more closely related to the Edin family according to the evolutionary tree (Figure 9). The domain architectures of the different proline-rich and glycine-rich AMPs are presented in Figure 10.

4. Discussion

Although some AMPs have been identified previously in M. domestica [77,84,85], more potential AMPs may be discovered by surveying its whole genome and related databases. Here, we identified 148 potential AMPs of different structural types in M. domestica. Compared with D. melanogaster, M. domestica has evolved a more complex set of antimicrobial components, diversified in number, protein size and structure. For instance, the housefly has 21 defensins with n-loops of variable length, whereas D. melanogaster has only one defensin. Besides the increase in number, structural alteration by extension or truncation also contributes to the diversity. For example, some members of the MdEppins, MdMuslins and MdEdins have extended C-termini compared with their counterparts in D. melanogaster. The question that interests us most is which evolutionary or genetic mechanisms have shaped this diversity in the housefly.

4.1. Gene Duplication

Gene duplication has been found in nearly all organisms. In the M. domestica genome, at least seven AMP families exhibit adjacent chromosomal locations with conserved gene structure, which can be considered a consequence of tandem duplication (Figure 11). Gene duplication followed by positive selection is important for the creation of new biological functions, as has been observed in the evolution of insect AMP multigene families [20]. In the housefly, we detected some positive selection signals in the MdDefensin and MdEppin families (Table 1 and Table 2) but not in other families. In addition, as mentioned above, some structural variations are also observed among different paralogs in the housefly defensin family via dynamic insertions/deletions (indels) in their n-loop (Figure S2). Such changes could have an impact on their antimicrobial activity [20].
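The adjacency criterion can be illustrated with a short Python sketch that groups genes lying close together on the same contig, using the WGS coordinates of several MdDefensins listed in Table A1. The 10 kb distance threshold and the function names are assumptions made for illustration only, and the sketch does not test for gene structure conservation, which the analysis above also requires.

```python
from collections import defaultdict

def parse_locus(locus):
    """Parse 'contig:start-end' into (contig, start, end) with start <= end."""
    contig, span = locus.rsplit(":", 1)
    a, b = (int(x) for x in span.split("-"))
    return contig, min(a, b), max(a, b)

def tandem_clusters(loci, max_gap=10_000):
    """Group genes on one contig whose neighbours lie within max_gap bp.

    max_gap is an assumed threshold, not a value taken from the study.
    Only groups with at least two genes are returned.
    """
    by_contig = defaultdict(list)
    for name, locus in loci.items():
        contig, start, end = parse_locus(locus)
        by_contig[contig].append((start, end, name))

    clusters = []
    for contig, genes in by_contig.items():
        genes.sort()  # order genes by start coordinate
        current = [genes[0]]
        for gene in genes[1:]:
            if gene[0] - current[-1][1] <= max_gap:
                current.append(gene)
            else:
                if len(current) > 1:
                    clusters.append((contig, [g[2] for g in current]))
                current = [gene]
        if len(current) > 1:
            clusters.append((contig, [g[2] for g in current]))
    return clusters

# WGS coordinates as listed in Table A1.
loci = {
    "MdDefensin4": "AQPM01000006.1:16485-16782",
    "MdDefensin6": "AQPM01000006.1:14681-14958",
    "MdDefensin7": "AQPM01000006.1:10815-10994",
    "MdDefensin8": "AQPM01000006.1:19117-19389",
    "MdDefensin10": "AQPM01000006.1:27384-27647",
    "MdDefensin13": "AQPM01000009.1:1990-2309",
}
print(tandem_clusters(loci))
```

With this threshold the five genes on AQPM01000006.1 fall into one group, while MdDefensin13, the only listed gene on its contig, is not reported.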

4.2. Exon Duplication via Shuffling

Exon duplication-mediated internal repeats play an important role in the evolution of proteins, as they can markedly increase complexity and likely contribute to the emergence of new functions. In the housefly genome, we identified an unusual eppin protein (MdEppin35) that carries nine repeats of the Kunitz domain without a protease processing signal. In the MdMuslin family, MdMuslin15, MdMuslin16, MdMuslin23 and MdMuslin24 each contain two Kazal domains. Moreover, in the MdEdin family, MdEdin6–MdEdin10 contain two glycine-rich domains (Figure 12a–c). Since two introns of the same phase (i.e., phase 0) correspond to the boundaries of the MdEppin35-1 domain, we speculate that the evolution of multiple Kunitz domains might be a result of exon shuffling, as illustrated in the schematic diagram (Figure 12d). In this case, the lack of introns at other domain boundaries could be explained by intron loss during evolution [86].
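The intron-phase argument can be made concrete with a small Python sketch. An intron's phase is the number of coding bases upstream of it modulo 3, and an exon flanked by two introns of the same phase (a symmetric exon) can be duplicated or inserted without disturbing the reading frame. The exon lengths below are hypothetical and are not taken from the MdEppin35 gene.

```python
def intron_phases(exon_cds_lengths):
    """Phase of each intron: coding bases upstream of it, modulo 3."""
    phases, total = [], 0
    for length in exon_cds_lengths[:-1]:  # no intron follows the last exon
        total += length
        phases.append(total % 3)
    return phases

def symmetric_exons(exon_cds_lengths):
    """0-based indices of internal exons flanked by introns of equal phase."""
    phases = intron_phases(exon_cds_lengths)
    return [
        i
        for i in range(1, len(exon_cds_lengths) - 1)
        if phases[i - 1] == phases[i]
    ]

# Hypothetical coding lengths (bp) of four consecutive exons.
exons = [60, 174, 180, 91]
print(intron_phases(exons))    # phase of introns 1 to 3
print(symmetric_exons(exons))  # exons that could be shuffled in frame
```

In this toy gene all three introns are in phase 0, so both internal exons are symmetric (class 0-0), which is the configuration reported above for the MdEppin35-1 domain boundaries.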

4.3. Protein Terminal Variations

Compared with their paralogs, some members of three housefly AMP families (i.e., MdCrustins, MdDefensins and MdEppins) have altered terminal lengths through truncation or extension. For example, compared with the ancestral state present in the insect lineage (e.g., MdCrustin3, MdCrustin4 and ArWaprinThr1), MdCrustin1 and MdCrustin2 have evolved a truncated C-terminus (Figure 5) through the deletion of a disulfide-bonded sub-domain structure (Figure 5 and Figure S6). Among the MdDefensins, several members have extended their N-termini via the loss of a propeptide processing signal, and the extended fragment in MdDefensin17 forms an isolated sub-domain structure. Several MdEppins (e.g., MdEppin4, MdEppin6, MdEppin25, MdEppin33 and MdEppin34) have an extended C-terminus of 20–103 amino acids (Figure S3). These observations indicate clear structural diversification among different paralogs of housefly AMPs and could hint at their functional divergence, an open question to be answered in the future.

4.4. Evolution of New Disulfide Bridges

Disulfide bridges are important to both protein structure and function [87]. In this work, we found that several M. domestica AMPs have an odd number of cysteines. For example, MdDefensin20 has evolved one additional cysteine in its N-terminus (Figure 1), whereas MdMuslin1 and MdMuslin26 each carry a mutation of one cysteine to a non-cysteine residue (Figure 3c). For a secreted protein, the presence of a free cysteine is often detrimental to structural stability because of air oxidation, especially when the residue is exposed on the molecular surface. We thus speculated that this free cysteine might be involved in the formation of a homodimer, as previously observed in scorpion venom lipolysis activating peptides [88]. This speculation is supported by our structural modeling, in which one intermolecular disulfide bridge is clearly formed between two MdDefensin20 monomers (Figure 13a). A Ramachandran plot indicates that in this homodimer almost all φ/ψ torsion angles fall in the favored or additionally allowed regions, with the exception of several residues (Figure 13b). In MdMuslin1, the fifth cysteine of the monomer can no longer form an intramolecular disulfide bridge, but an intermolecular disulfide bridge was instead predicted by the ZDOCK method (Figure 13c). The Ramachandran plot indicates that in this homodimer almost all φ/ψ torsion angles fall in the favored or additionally allowed regions, except for a serine at position 5 (Figure 13d). A similar observation was made for MdMuslin26, in which the loss of the sixth cysteine leaves the first cysteine unpaired, so that one predicted intermolecular disulfide bridge links the two monomers into a homodimer (Figure 13e,f).
Of the two types of MdMuslins, the type I peptides are shared with orthologs from a diversity of species, whereas the type II peptides are shared only with orthologs from dipteran species. This observation suggests that the former might represent the ancestral type, from which the latter emerged via the evolutionary gain of one additional disulfide bridge in the common ancestor of Diptera. Since members with this disulfide bridge all have a long C-terminus whereas members without it show more divergence in C-terminal length, it appears that gaining a disulfide bridge during evolution could help stabilize the size of a protein. In MdEppin35-1, the two unusual cysteines are located at the C-terminus. Owing to a deletion in its N-terminus, this peptide has lost the first two cysteine residues compared with other paralogs (Figure 2), which disrupts the original disulfide bridge pattern and would probably render it a pseudogene through loss of structural stability. However, our structural modeling indicates that the two C-terminal cysteine residues can provide new pairings for disulfide bridge formation (Figure 2), thereby rescuing the gene by restoring its structural stability. These observations suggest that when structurally important cysteines are substituted or deleted during protein evolution, compensation might be one outcome.
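A simple way to screen for peptides with unpaired cysteines, as discussed in this section, is to count the cysteines in each mature peptide and flag those with an odd number, since at least one cysteine must then remain unpaired within the monomer. The Python sketch below does only this. The peptide sequences are hypothetical, and an odd count by itself says nothing about whether an intermolecular bridge actually forms, which in this work was assessed by structural modeling.

```python
def cysteine_positions(seq):
    """1-based positions of all cysteine residues in a peptide sequence."""
    return [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]

def has_unpaired_cysteine(seq):
    """True when the cysteine count is odd, so one cannot pair in the monomer."""
    return len(cysteine_positions(seq)) % 2 == 1

# Hypothetical peptides for illustration, not housefly sequences.
six_cys = "ATCDLLSACAAHCLGKGYCNGKAVCVCR"  # six cysteines, three possible bridges
seven_cys = "GC" + six_cys                # one extra N-terminal cysteine

for name, seq in [("six_cys", six_cys), ("seven_cys", seven_cys)]:
    print(name, cysteine_positions(seq), has_unpaired_cysteine(seq))
```

An even count does not guarantee that every cysteine is paired either, as the MdEppin35-1 case shows, so the flag is best read as a first filter before modeling.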

5. Conclusions

In prior studies, some insect-derived AMPs were found in the housefly by biochemical purification combined with functional identification, as well as by the analysis of genomic sequences. In this work, we used a combination of computational approaches to establish a relatively complete M. domestica peptidome associated with its antimicrobial immunity, which largely expands the repertoire of AMP-like molecules in a sanitary insect pest. These molecules exhibit considerable diversity in gene number and structural type, with some new architectures that may assemble as homodimers, display repeats of multiple homologous domains, or even adopt a novel fold different from the ancestral state. Such diversity can be attributed to several evolutionary scenarios involving gene and exon duplication, terminal variations and disulfide bridge reconstruction. Although the housefly is phylogenetically close to Drosophila (both belong to the order Diptera), the evolution of their antimicrobial immune systems differs remarkably. Compared with Drosophila, the housefly seems to have increased the complexity of the system, a situation similar to that previously reported in the parasitoid Nasonia vitripennis. This might be a result of evolutionary convergence under the selective pressures of parasitism and of a dirty, microbe-rich environment. Our work offers a basic platform to further study the immune and evolutionary significance of these newly discovered AMPs and the role of molecular and structural diversity in contributing to the immune response of houseflies.

Supplementary Materials

The following are available online at https://www.mdpi.com/1424-2818/13/3/107/s1, Figure S1: The strategy for database searches for putative M. domestica antimicrobial peptides, Figure S2: Multiple sequence alignment (MSA) of defensins, Figure S3: MSA of MdEppins, Figure S4: MSA of MdMuslins, Figure S5: MSA of SVWC AMPs, Figure S6: MSA of Crustins, Figure S7: Domesticins, Figure S8: MSA of Diptericins, Figure S9: MSA of Edins, Figure S10: MSA of AttacinA, Figure S11: MSA of AttacinC, Figure S12: MSA of AttacinD.

Author Contributions

S.Z. conceived and designed research. S.Q. performed sequence, structural and evolutionary analyses. B.G. performed energy minimization analyses of peptide structures. S.Q. wrote the manuscript with assistance from all other authors. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 31870766) to S.Z.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data supporting the reported results can be found at https://www.ncbi.nlm.nih.gov/.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Characteristics of AMPs in Musca domestica.
Name | WGS/EST | GenBank No. | Type | ORF size (aa) | MP size (aa) | MW (Da) | Net charge | pI
MdDefensin1 # |  | AIP98387.1 | Csαβ | 93 | 41 | 4158.74 | 3.2 | 8.29
MdDefensin2 # | AY260152.1 | AAP33451.1 | Csαβ | 92 | 40 | 3995.56 | 3.2 | 8.31
MdDefensin3 # | KJ867444.1 | AIL24687.1 | Csαβ | 99 | 40 | 3995.56 | 3.2 | 8.31
MdDefensin4 # | AQPM01000006.1:16485-16782 | XP_005174767.1 | Csαβ | 93 | 40 | 3995.56 | 3.2 | 8.31
MdDefensin5 # | EF175879.1 | ABM66377.1 | Csαβ | 93 | 40 | 3995.56 | 3.2 | 8.31
MdDefensin6 # | AQPM01000006.1:14681-14958 | XP_005174766.1 | Csαβ | 92 | 40 | 3996.56 | 3.2 | 8.31
MdDefensin7 # | AQPM01000006.1:10815-10994 | AGS57597.1 | Csαβ | 91 | 40 | 4009.59 | 3.2 | 8.31
MdDefensin8 | AQPM01000006.1:19117-19389 | XP_005174768.1 | Csαβ | 91 | 40 | 4228.88 | 4.2 | 8.55
MdDefensin9 | KM047667.1 |  | Csαβ | 97 | 41 | 4158.74 | 3.2 | 8.29
MdDefensin10 | AQPM01000006.1:27384-27647 |  | Csαβ | 88 | 44 | 4642.39 | 3.9 | 8.29
MdDefensin11 | NDYK01163469.1:1738-2002 |  | Csαβ | 72 | 44 | 4642.39 | 3.9 | 8.29
MdDefensin12 | NDYK01012409.1:563-805 |  | Csαβ | 81 | 41 | 4454.16 | 2.7 | 8.29
MdDefensin13 # | AQPM01000009.1:1990-2309 | XP_005174769.1 | Csαβ | 83 | 42 | 4383.17 | 2.0 | 8.03
MdDefensin14 | NDYK01067664.1:1236-24 |  | Csαβ | 83 | 42 | 4383.17 | 2.0 | 8.03
MdDefensin15 # | AQPM01000008.1:929-1186 | XP_011291282.1 | Csαβ | 86 | 41 | 4232.99 | 2.2 | 8.03
MdDefensin16 # | AQPM01000010.1:7421-7708 | XP_011292449.1 | Csαβ | 96 | 43 | 4707.76 | 6.0 | 8.99
MdDefensin17 |  | XP_011290793.1 | Csαβ | 75 | 52 | 5492.29 | 1.0 | 7.66
MdDefensin18 | AQPM01000010.1:5050-5322 | XP_019893215.1 | Csαβ | 91 | 41 | 4495.13 | 3.0 | 8.29
MdDefensin19 | AQPM01000010.1:909-1677 |  | Csαβ | 84 | 38 | 4360.19 | 2.7 | 8.29
MdDefensin20 | NDYK01134510.1:970-1187 |  | Csαβ | 64 | 45 | 5075.93 | 0.9 | 7.61
MdDefensin21 | AQPM01000007.1:876-1139 |  | Csαβ | 88 | 65 | 7013.82 | 1.0 | 7.66
MdEppin1 |  | XP_005182439.1 | Kunitz domain | 120 | 90 | 10213.55 | −1.2 | 5.55
MdEppin2 | NDYK01073931.1:1975-2334 |  | Kunitz domain | 103 | 73 | 8437.57 | 1.7 | 7.98
MdEppin3 | AQPM01068620.1:8089-8448 |  | Kunitz domain | 103 | 73 | 8437.57 | 1.7 | 7.98
MdEppin4 | AQPM01069148.1:21728-22230 | XP_005182685.1 | Kunitz domain | 168 | 149 | 16683.41 | −19 | 4.53
MdEppin5 |  | XP_005192098.1 | Kunitz domain | 108 | 80 | 9321.66 | 4.7 | 8.71
MdEppin6 |  | XP_005192099.1 | Kunitz domain | 201 | 173 | 19487.26 | −1.2 | 5.73
MdEppin7 | NDYK01008682:1427-1791 |  | Kunitz domain | 99 | 75 | 8489.55 | 1.2 | 7.66
MdEppin8 |  | XP_005182689.1 | Kunitz domain | 93 | 74 | 8636.78 | 7.0 | 9.23
MdEppin9 | AQPM01082327.1:250-573 |  | Kunitz domain | 108 | 91 | 10121.48 | −1.1 | 5.97
MdEppin10 |  | XP_019893036.1 | Kunitz domain | 83 | 66 | 7318.19 | 1.0 | 7.66
MdEppin11 | AQPM01068767.1:4133-4939 |  | Kunitz domain | 83 | 60 | 7255.20 | 7.9 | 9.13
MdEppin12 | AQPM01058660.1:4502-4813 |  | Kunitz domain | 104 | 83 | 9255.26 | −2.1 | 5.27
MdEppin13 |  | XP_011291014.1 | Kunitz domain | 84 | 63 | 6886.53 | −6.0 | 4.30
MdEppin14 | AQPM01069148.1:1494-1791 | XP_005182706.1 | Kunitz domain | 85 | 63 | 6862.55 | −3.0 | 4.87
MdEppin15 | AQPM01069148.1:7578-7914 | XP_005182682.2 | Kunitz domain | 91 | 63 | 6966.53 | −8.0 | 4.26
MdEppin16 | AQPM01069148.1:12552-12854 | XP_005182683.1 | Kunitz domain | 82 | 63 | 6946.74 | −2.0 | 5.27
MdEppin17 | AQPM01069148.1:3783-4100 | XP_005182681.1 | Kunitz domain | 84 | 64 | 7101.81 | −2.8 | 5.36
MdEppin18 | AQPM01069148.1:15772-16090 |  | Kunitz domain | 86 | 66 | 7424.27 | −1.0 | 5.97
MdEppin19 |  | XP_005182684.1 | Kunitz domain | 83 | 63 | 7036.78 | −3.0 | 5.05
MdEppin20 | NDYK01230132:2648-2965 |  | Kunitz domain | 84 | 64 | 7098.93 | −1.8 | 5.78
MdEppin21 | NDYK01230132:3440-3753 |  | Kunitz domain | 87 | 67 | 7507.45 | −1.6 | 6.25
MdEppin22 |  | XM_005182623.3 | Kunitz domain | 83 | 64 | 7162.96 | −2.5 | 5.69
MdEppin23 |  | XP_005182678.1 | Kunitz domain | 79 | 59 | 6444.22 | −3.0 | 4.94
MdEppin24 | AQPM01069146.1:10524-10829 | XP_011292137.1 | Kunitz domain | 78 | 59 | 6581.38 | 2.7 | 8.27
MdEppin25 | AQPM01069146.1:7808-8193 | XP_019892012.1 | Kunitz domain | 107 | 88 | 10130.34 | −5.2 | 4.75
MdEppin26 |  | XP_019892010.1 | Kunitz domain | 78 | 59 | 6468.20 | −3.0 | 4.94
MdEppin27 |  | XP_005190040.1 | Kunitz domain | 88 | 68 | 7497.46 | 7.7 | 9.41
MdEppin28 | NDYK01190170:236-565 |  | Kunitz domain | 82 | 62 | 7018.89 | 8.7 | 9.70
MdEppin29 | AQPM01002184.1:2478-2770 | XP_005192097.1 | Kunitz domain | 77 | 57 | 6485.14 | −1.0 | 5.97
MdEppin30 |  | XP_019895340.1 | Kunitz domain | 88 | 68 | 7688.60 | 4.2 | 8.50
MdEppin31 | AQPM01011896.1:10591-10914 |  | Kunitz domain | 85 | 65 | 7374.26 | 4.2 | 8.50
MdEppin32 | NDYK01025868:2318-2646 |  | Kunitz domain | 85 | 65 | 7445.39 | 6.2 | 8.95
MdEppin33 |  | XP_005182687.1 | Kunitz domain | 145 | 120 | 12984.71 | 7.7 | 9.13
MdEppin34 | AQPM01069148.1:24856-25351 | XP_019892019.1 | Kunitz domain | 146 | 121 | 13055.78 | 7.7 | 9.13
MdEppin35-1 | NDYK01174947.1:14053-17900 |  | Kunitz domain | 638 | 46 | 5130.92 | 0.4 | 7.23
MdEppin35-2 |  |  | Kunitz domain |  | 74 | 8622.47 | −5.0 | 4.80
MdEppin35-3 |  |  | Kunitz domain |  | 86 | 9705.43 | −9.0 | 4.46
MdEppin35-4 |  |  | Kunitz domain |  | 84 | 9442.15 | −14.2 | 4.05
MdEppin35-5 |  |  | Kunitz domain |  | 58 | 6481.97 | −7.0 | 4.23
MdEppin35-6 |  |  | Kunitz domain |  | 65 | 7352.07 | −4.8 | 4.77
MdEppin35-7 |  |  | Kunitz domain |  | 60 | 6626.16 | −4.0 | 4.70
MdEppin35-8 |  |  | Kunitz domain |  | 71 | 8226.17 | −0.3 | 6.44
MdEppin35-9 |  |  | Kunitz domain |  | 81 | 9210.26 | 0.7 | 7.61
MdMuslin1 | AQPM01019378.1:40908-41220 |  | Kazal domain | 81 | 61 | 6923.73 | −2.7 | 5.36
MdMuslin2 | AQPM01019378.1:43260-44056 |  | Kazal domain | 75 | 56 | 6374.40 | −0.1 | 6.95
MdMuslin3 | AQPM01019378.1:47740-48051 |  | Kazal domain | 81 | 62 | 6880.61 | −4.3 | 4.50
MdMuslin4 | AQPM01019378.1:52156-52447 |  | Kazal domain | 76 | 57 | 6384.33 | −1.0 | 5.88
MdMuslin5 | AQPM01019378.1:56578-56881 | XP_005175924.1 | Kazal domain | 78 | 59 | 6608.54 | −3.0 | 4.98
MdMuslin6 | AQPM01019378.1:57384-57671 | XP_005175923.1 | Kazal domain | 73 | 53 | 5905.82 | 5.0 | 8.73
MdMuslin7 | NDYK01036986.1:1550-1765 |  | Kazal domain | 70 | 51 | 5592.32 | −1.3 | 5.12
MdMuslin8 | NDYK01185191.1:567-854 | XP_011294170.1 | Kazal domain | 75 | 56 | 6360.27 | 4.2 | 8.52
MdMuslin9 |  | XP_005175922.1 | Kazal domain | 75 | 56 | 6165.88 | −1.3 | 5.27
MdMuslin10 | AQPM01042957.1:886-1101 | XP_005175386.1 | Kazal domain | 72 | 48 | 5340.00 | 0.7 | 7.61
MdMuslin11 | NDYK01172454.1:1382-1630 | XP_005190386.1 | Kazal domain | 83 | 63 | 6751.57 | −1.0 | 5.97
MdMuslin12 | NDYK01221433.1:787-1002 | XP_005188808.1 | Kazal domain | 82 | 49 | 5613.51 | 3.7 | 8.50
MdMuslin13 |  | XP_005178654.1 | Kazal domain | 136 | 117 | 12354.50 | 4.0 | 8.55
MdMuslin14 |  | XP_005188224.1 | Kazal domain | 79 | 59 | 6456.06 | −5.3 | 4.49
MdMuslin15-1 |  | XP_005188061.1 | Kazal domain | 154 | 76 | 8566.70 | −3.3 | 4.77
MdMuslin15-2 |  |  | Kazal domain |  | 57 | 6219.32 | 1.7 | 7.98
MdMuslin16-1 | AQPM01094907.1:8864-9376 |  | Kazal domain | 148 | 77 | 8667.80 | −3.3 | 4.77
MdMuslin16-2 |  |  | Kazal domain |  | 50 | 5537.53 | 1.7 | 7.98
MdMuslin17 | NDYK01122947.1:2962-3420 | XP_005190966.1 | Kazal domain | 88 | 67 | 7092.01 | 0.7 | 7.61
MdMuslin18 |  | XP_005190967.1 | Kazal domain | 98 | 70 | 7304.27 | 4.7 | 8.80
MdMuslin19 | NDYK01014763:1939-2223 |  | Kazal domain | 95 | 67 | 7112.23 | 6.7 | 9.30
MdMuslin20 |  | XP_019893634.1 | Kazal domain | 83 | 62 | 6943.64 | −8.0 | 4.16
MdMuslin21 |  | XP_011296278.1 | Kazal domain | 98 | 77 | 8811.17 | 4.2 | 8.52
MdMuslin22 |  | XP_005188497.1 | Kazal domain | 91 | 72 | 7752.70 | 2.0 | 7.98
MdMuslin23-1 |  | XP_005190965.2 | Kazal domain | 136 | 57 | 6848.86 | 3.0 | 8.27
MdMuslin23-2 |  |  | Kazal domain |  | 52 | 5968.77 | 0.7 | 7.61
MdMuslin24-1 |  | XP_019894976.1 | Kazal domain | 135 | 61 | 7210.20 | −1.3 | 5.41
MdMuslin24-2 |  |  | Kazal domain |  | 52 | 6100.76 | −0.5 | 6.72
MdMuslin25 | AQPM01000599.1:6069-6748 |  | Kazal domain | 87 | 64 | 7410.43 | 10.4 | 10.09
MdMuslin26 | NDYK01190177.1:7-310 |  | Kazal domain | 77 | 56 | 6388.29 | 5.2 | 8.65
MdMuslin27 | AQPM01000601.1:7182-7517 | XP_005190993.1 | Kazal domain | 87 | 64 | 7268.36 | 9.7 | 10.89
MdMuslin28 | AQPM01000604.1:11632-11959 | XP_011295627.1 | Kazal domain | 89 | 66 | 7438.50 | 7.9 | 9.30
MdMuslin29 | AQPM01000605.1:5737-6069 |  | Kazal domain | 87 | 64 | 7267.41 | 8.9 | 9.30
MdMuslin30 |  | XP_005190994.1 | Kazal domain | 90 | 67 | 7619.84 | 9.9 | 9.51
MdMuslin31 | NDYK01044765.1:1300-1610 |  | Kazal domain | 83 | 61 | 7115.23 | 10.9 | 10.61
MdMuslin32 | AQPM01000603.1:4008-4322 |  | Kazal domain | 84 | 59 | 6658.82 | 5.9 | 8.66
MdMuslin33 | AQPM01000603.1:639-963 |  | Kazal domain | 89 | 66 | 7007.07 | 7.1 | 8.90
MdSVWC1 |  | XP_005183514.1 | SVWC | 149 | 130 | 14382.45 | −4.3 | 5.08
MdSVWC2 |  | XP_011295325.1 | SVWC | 102 | 81 | 8733.00 | −1.1 | 5.88
MdSVWC3 | NDYK01201822.1:136-580 | XP_005175282.1 | SVWC | 102 | 81 | 8741.00 | −1.1 | 5.88
MdSVWC4 |  | XP_005189656.1 | SVWC | 134 | 103 | 11455.80 | −3.3 | 5.01
MdSVWC5 | AQPM01015706.1:698-1200 |  | SVWC | 112 | 94 | 10684.30 | −1.6 | 6.25
MdSVWC6 |  | XP_005175408.1 | SVWC | 120 | 100 | 11347.19 | 0.4 | 7.19
MdSVWC7 | AQPM01092559.1:3374-5544 | XP_005187657.1 | SVWC | 182 | 163 | 19163.26 | 1.9 | 7.87
MdSVWC8 | AQPM01024958.1:41-530 | XP_005190002.1 | SVWC | 113 | 94 | 10879.12 | −2.6 | 5.78
MdSVWC9 | AQPM01015051.1:3997-5502 | XP_005175283.1 | SVWC | 111 | 87 | 10386.73 | −2.1 | 6.34
MdSVWC10 | AQPM01015049.1:461-992 |  | SVWC | 100 | 80 | 9437.82 | 7.4 | 8.88
MdSVWC11 |  | XP_011295271.1 | SVWC | 104 | 84 | 9971.32 | 0.6 | 7.26
MdSVWC12 | AQPM01015050.1:115-2228 | XP_005175281.1 | SVWC | 104 | 79 | 9215.33 | −1.6 | 6.25
MdSVWC13 |  | XP_005175284.1 | SVWC | 88 | 68 | 7510.65 | 3.9 | 8.29
MdSVWC14 | AQPM01081312.1:14881-15505 |  | SVWC | 102 | 79 | 8904.37 | 0.9 | 7.52
MdSVWC15 | NW_004765359.1:113654-114278 | JZ121963.1 | SVWC | 95 | 72 | 8095.36 | −0.1 | 6.91
MdSVWC16 |  | XP_005184317.1 | SVWC | 124 | 96 | 10843.56 | 0.2 | 7.09
MdSVWC17 | AQPM01092395.1:1108-1569 | XP_005187600.1 | SVWC | 121 | 102 | 11254.72 | −0.4 | 6.86
MdSVWC18 |  | XP_005180761.1 | SVWC | 107 | 88 | 9682.02 | 4.2 | 8.31
MdSVWC19 | AQPM01056437.1:2396-2894 | XP_011290794.1 | SVWC | 113 | 89 | 9482.15 | −0.8 | 6.44
MdSVWC20 | AQPM01095428.1:3680-4100 | XP_005188179.1 | SVWC | 102 | 84 | 9247.47 | −5.1 | 4.75
MdSVWC21 | AQPM01060615.1:2747-3172 | XP_005180301.1 | SVWC | 103 | 85 | 9126.49 | −1.6 | 6.25
MdSVWC22 | AQPM01095425.1:2644-3102 | XP_005188177.1 | SVWC | 113 | 95 | 10296.72 | −4.6 | 5.10
MdSVWC23 |  | XP_011294423.1 | SVWC | 120 | 102 | 11126.55 | −4.6 | 5.10
MdSVWC24 | AQPM01095428.1:13247-13749 | XP_011294424.1 | SVWC | 105 | 85 | 9320.74 | 4.2 | 8.31
MdSVWC25 | AQPM01095423.1:787-1232 | XP_005188176.1 | SVWC | 106 | 84 | 8913.27 | 3.2 | 8.12
MdSVWC26 | AQPM01019310.1:3498-3948 | XP_005175918.1 | SVWC | 106 | 84 | 9408.71 | −0.6 | 6.72
MdSVWC27 | AQPM01095427.1:815-1458 | XP_005188181.1 | SVWC | 115 | 95 | 10876.26 | −6.9 | 4.77
MdSVWC28 | AQPM01095425.1:13938-14587 |  | SVWC | 115 | 95 | 10891.33 | −7.8 | 4.67
MdSVWC29-1 |  | XP_005188180.2 | SVWC | 137 | 117 | 13288.15 | −7.9 | 4.60
MdSVWC29-2 |  |  | SVWC | 101 | 83 | 9053.29 | −0.8 | 6.48
MdCrustin1 | AQPM01030484.1:548-1269 |  | wappin domain | 100 | 67 | 7493.40 | 2.9 | 8.10
MdCrustin2 |  | XP_011295532.1 | wappin domain | 92 | 67 | 7493.40 | 2.9 | 8.10
MdCrustin3 |  | XP_005190815.1 | wappin domain | 115 | 94 | 9819.93 | 5.8 | 8.37
MdCrustin4 |  | XP_011295531.1 | wappin domain | 120 | 95 | 10541.92 | 6.8 | 8.48
MdCecropin1 # |  | ABB17292.1 | α-helix | 63 | 40 | 4271.97 | 5.1 | 10.56
MdCecropin2 # | AQPM01058001.1:775-1030 | XP_005179713.1 | α-helix | 64 | 41 | 4342.06 | 6.1 | 10.66
MdCecropin3 # | AQPM01058001.1:8963-9222 | XP_005179700.1 | α-helix | 64 | 41 | 4386.11 | 6.1 | 10.66
MdCecropin4 # | AQPM01058004.1:2369-2661 | XP_019890986.1 | α-helix | 63 | 40 | 4257.94 | 5.1 | 10.56
MdCecropin5 | NDYK01010340.1:75-311 |  | α-helix | 64 | 41 | 4461.10 | 4.2 | 10.94
MdCecropin6 # | AQPM01058004.1:5352-5771 | XP_005179717.1 | α-helix | 64 | 41 | 4370.11 | 6.1 | 10.66
MdCecropin7 # | AQPM01058000.1:20546-20804 | XP_005179712.1 | α-helix | 63 | 41 | 4356.05 | 6.1 | 11.12
MdCecropin8 # | AQPM01058001.1:4386-4681 | AXG50148.1 | α-helix | 64 | 41 | 4356.09 | 6.1 | 10.66
MdCecropin9 # | AQPM01058006.1:1083-1338 | XP_005179718.1 | α-helix | 64 | 41 | 4461.10 | 4.2 | 10.94
MdCecropin10 # | AQPM01058008.1:3578-4516 | XP_005179719.1 | α-helix | 62 | 41 | 4464.06 | 3.2 | 10.28
MdCecropin11 # | AQPM01058004.1:928-1210 | AIW52264.1 | α-helix | 64 | 41 | 4356.09 | 6.1 | 10.66
MdCecropin12 | JZ121081.1 |  | α-helix | 63 | 40 | 4227.91 | 5.1 | 10.56
MdCecropin13 | ES608288.1 |  | α-helix | 64 | 41 | 4341.12 | 6.1 | 10.66
MdCecropin14 # | AQPM01100428.1:3910-4122 | XP_011294761.1 | α-helix | 69 | 44 | 4546.24 | 2.9 | 9.34
MdCecropin15 | AQPM01100427.1:1792-2047 | XP_019894290.1 | α-helix | 69 | 44 | 4518.23 | 3.9 | 9.72
MdCecropin16 | AQPM01100428.1:8445-8711 |  | α-helix | 69 | 44 | 4560.27 | 2.9 | 9.34
MdDiptericin1 # | FJ748596.1:148-344 | ACN61637.1 | Proline and Glycine rich | 99 | 79 | 8721.39 | 1.4 | 8.50
MdDiptericin2 # | FJ794602.1:25-321 | ACO35257.1 | Proline and Glycine rich | 99 | 79 | 8721.39 | 1.4 | 8.50
MdDiptericin3 # | KM205631.1:51-347 |  | Proline and Glycine rich | 99 | 79 | 8596.58 | 3.5 | 8.69
MdDiptericin4 # | FJ795370.1:65-364 | ACN93798.1 | Proline and Glycine rich | 99 | 79 | 8725.34 | 1.4 | 8.50
MdDiptericinD # | AQPM01092243.1:221-1086 | NP_001295957.2 | Proline and Glycine rich | 99 | 79 | 8770.47 | 1.4 | 8.41
MdDiptericinD1 # | AQPM01092241.1:4245-4608 | XP_005187575.1 | Proline and Glycine rich | 99 | 79 | 8711.36 | 1.4 | 8.50
MdDomesticin1 # |  | AHA56721.1 | Proline rich | 65 | 40 | 4583.33 | 6.9 | 11.41
MdDomesticin2 | AQPM01056449.1:10440-12589 |  | Proline rich | 65 | 40 | 4525.20 | 5.9 | 10.98
MdEdin1 | AQPM01067938.1:2033-2341 |  | Glycine rich | 102 | 62 | 6987.33 | −0.9 | 6.67
MdEdin2 | AQPM01067938.1:5150-5500 |  | Glycine rich | 116 | 65 | 7321.74 | 2.4 | 9.20
MdEdin3 | AQPM01067936.1:1742-2049 |  | Glycine rich | 101 | 61 | 6827.11 | −1.9 | 6.34
MdEdin4 | AQPM01067936.1:4517-4852 |  | Glycine rich | 120 | 65 | 7331.79 | 3.6 | 9.70
MdEdin5 | JZ121894.1 |  | Glycine rich | 116 | 65 | 7359.79 | 2.4 | 9.25
MdEdin6 | AQPM01067939.1:3916-4450 |  | Glycine rich | 177 | 127 | 13760.56 | 1.1 | 7.66
MdEdin7 | NDYK01101123.1:1387-1910 |  | Glycine rich | 174 | 127 | 13721.52 | 1.9 | 8.50
MdEdin8 | AQPM01067938.1:9198-9731 |  | Glycine rich | 177 | 127 | 13752.54 | 0.7 | 7.47
MdEdin9 | AQPM01067938.1:15033-15560 |  | Glycine rich | 174 | 125 | 13612.44 | 3.6 | 9.34
MdEdin10 # |  | AFP64086.1 | Glycine rich | 175 | 127 | 13740.57 | 1.7 | 8.50
MdAttacinA1 # |  | XP_011296530.1 | Glycine rich | 208 | 188 | 20002.01 | 6.9 | 9.66
MdAttacinA2 | AQPM01013309.1:3192-3887 | XP_019890218.1 | Glycine rich | 208 | 188 | 19613.34 | 6.9 | 9.81
MdAttacinA3 # |  | AAY59540.1 | Glycine rich | 208 | 188 | 19688.41 | 6.9 | 9.81
MdAttacinA4 # |  | AAR23786.1 | Glycine rich | 208 | 188 | 19672.41 | 6.9 | 9.81
MdAttacinA5 |  | XP_019890219.1 | Glycine rich | 208 | 188 | 19718.44 | 6.9 | 9.81
MdAttacinC1 # | AQPM01059487.1:2038-2826 |  | Proline and Glycine rich | 241 | 192 | 20420.06 | 2.9 | 8.92
MdAttacinC2 # |  | ACO35258.1 | Proline and Glycine rich | 241 | 192 | 20334.91 | 1.9 | 8.41
MdAttacinC3-1 |  | XP_005180079.2 | Proline and Glycine rich | 250 | 201 | 21385.12 | 5.1 | 9.34
MdAttacinC3-2 |  |  | Proline and Glycine rich | 241 | 192 | 20449.11 | 4.1 | 9.23
MdAttacinC4 # | AQPM01059487.1:7828-8614 | XP_005180076.1 | Proline and Glycine rich | 241 | 192 | 20383.93 | 3.1 | 8.92
MdAttacinC5 # | NDYK01054543.1:3986-4696 |  | Proline and Glycine rich | 241 | 192 | 19973.48 | 5.1 | 9.34
MdAttacinC6 | AQPM01059487.1:3997-5074 |  | Proline and Glycine rich | 241 | 192 | 20349.91 | 3.1 | 8.92
MdAttacinC7 | JZ121354.1:31-753 |  | Proline and Glycine rich | 241 | 192 | 20384.91 | 1.7 | 8.41
MdAttacinD1 |  | XP_011296538.1 | Glycine rich | 181 | 181 | 19122.94 | 8.1 | 9.91
MdAttacinD2 |  | XP_005178516.1 | Glycine rich | 189 | 189 | 19412.19 | 5.3 | 9.41
MdAttacinD3 # | NDYK01109436.1:1208-1899 | XP_005178550.1 | Glycine rich | 191 | 191 | 19712.79 | 7.8 | 9.70
MdAttacinD4 # |  | AFP64340.1 | Glycine rich | 197 | 197 | 21110.83 | 6.1 | 9.65
Note: Previously known sequences are labeled by “#”.

References

  1. Hoffmann, J.A. Innate immunity of insects. Curr. Opin. Immunol. 1995, 7, 4–10. [Google Scholar] [CrossRef]
  2. Stork, N.E.; McBroom, J.; Gely, C.; Hamilton, A.J. New approaches narrow global species estimates for beetles, insects, and terrestrial arthropods. Proc. Natl. Acad. Sci. USA 2015, 112, 7519–7523. [Google Scholar] [CrossRef] [Green Version]
  3. Lemaitre, B.; Hoffmann, J. The host defense of Drosophila melanogaster. Annu. Rev. Immunol. 2007, 25, 697–743. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Flores-Villegas, A.L.; Salazar-Schettino, P.M.; Cordoba-Aguilar, A.; Gutierrez-Cabrera, A.E.; Rojas-Wastavino, G.E.; Bucio-Torres, M.I.; Cabrera-Bravo, M. Immune defence mechanisms of triatomines against bacteria, viruses, fungi and parasites. Bull. Entomol. Res. 2015, 105, 523–532. [Google Scholar] [CrossRef] [PubMed]
  5. Park, Y.; Hahm, K.S. Antimicrobial peptides (AMPs): Peptide structure and mode of action. J. Biochem. Mol. Biol. 2005, 38, 507–516. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Imler, J.L.; Bulet, P. Antimicrobial peptides in Drosophila: Structures, activities and gene regulation. Chem. Immunol. Allergy 2005, 86, 1–21. [Google Scholar] [PubMed]
  7. Zhang, Z.T.; Zhu, S.Y. Drosomycin, an essential component of antifungal defence in Drosophila. Insect. Mol. Biol. 2009, 18, 549–556. [Google Scholar] [CrossRef]
  8. Hanson, M.A.; Lemaitre, B. New insights on Drosophila antimicrobial peptide function in host defense and beyond. Curr. Opin. Immunol. 2020, 62, 22–30. [Google Scholar] [CrossRef]
  9. Bulet, P.; Hetru, C.; Dimarcq, J.L.; Hoffmann, D. Antimicrobial peptides in insects; structure and function. Dev. Comp. Immunol. 1999, 23, 329–344. [Google Scholar] [CrossRef]
  10. Kaushal, A.; Gupta, K.; Shah, R.; van Hoek, M.L. Antimicrobial activity of mosquito cecropin peptides against Francisella. Dev Comp. Immunol. 2016, 63, 171–180. [Google Scholar] [CrossRef]
  11. Casteels, P.; Ampe, C.; Jacobs, F.; Vaeck, M.; Tempst, P. Apidaecins: Antibacterial peptides from honeybees. EMBO J. 1989, 8, 2387–2391. [Google Scholar] [CrossRef]
  12. Charlet, M.; Lagueux, M.; Reichhart, J.M.; Hoffmann, D.; Braun, A.; Meister, M. Cloning of the gene encoding the antibacterial peptide drosocin involved in Drosophila immunity. Eur. J. Biochem. 1996, 241, 699–706. [Google Scholar] [CrossRef] [PubMed]
  13. Chowdhury, S.; Taniai, K.; Hara, S.; Kadonookuda, K.; Kato, Y.; Yamamoto, M.; Xu, J.; Choi, S.K.; Debnath, N.C.; Choi, H.K.; et al. cDNA cloning and gene expression of lebocin, a novel member of antibacterial peptides from the silkworm, Bombyx mori. Biochem. Biophys. Res. Commun. 1995, 214, 271–278. [Google Scholar] [CrossRef]
  14. Gao, B.; Zhu, S. The drosomycin multigene family: Three-disulfide variants from Drosophila takahashii possess antibacterial activity. Sci. Rep. 2016, 6, 32175–32186. [Google Scholar] [CrossRef] [Green Version]
  15. Thevissen, K.; Kristensen, H.H.; Thomma, B.P.; Cammue, B.P.; François, I.E. Therapeutic potential of antifungal plant and insect defensins. Drug Discov. Today 2007, 12, 966–971. [Google Scholar] [CrossRef] [PubMed]
  16. Mylonakis, E.; Podsiadlowski, L.; Muhammed, M.; Vilcinskas, A. Diversity, evolution and medical applications of insect antimicrobial peptides. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2016, 371, 1695–1705. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Christophides, G.K.; Zdobnov, E.; Barillas-Mury, C.; Birney, E.; Blandin, S.; Blass, C.; Brey, P.T.; Collins, F.H.; Danielli, A.; Dimopoulos, G.; et al. Immunity-related genes and gene families in Anopheles gambiae. Science 2002, 298, 159–165. [Google Scholar] [CrossRef] [Green Version]
  18. Tanaka, H.; Ishibashi, J.; Fujita, K.; Nakajima, Y.; Sagisaka, A.; Tomimoto, K.; Suzuki, N.; Yoshiyama, M.; Kaneko, Y.; Iwasaki, T.; et al. A genome-wide analysis of genes and gene families involved in innate immunity of Bombyx mori. Insect. Biochem. Mol. Biol. 2008, 38, 1087–1110. [Google Scholar] [CrossRef]
  19. Waterhouse, R.M.; Kriventseva, E.V.; Meister, S.; Xi, Z.; Alvarez, K.S.; Bartholomay, L.C.; Barillas-Mury, C.; Bian, G.; Blandin, S.; Christensen, B.M.; et al. Evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes. Science 2007, 316, 1738–1743. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  20. Tian, C.; Gao, B.; Fang, Q.; Ye, G.; Zhu, S. Antimicrobial peptide-like genes in Nasonia vitripennis: A genomic perspective. BMC Genomics 2010, 11, 187. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Niu, Y.; Zheng, D.; Yao, B.; Cai, Z.; Zhao, Z.; Wu, S.; Cong, P.; Yang, D. A novel bioconversion for value-added products from food waste using Musca domestica. Waste Manag. 2017, 61, 455–460. [Google Scholar] [CrossRef]
  22. Scott, J.G.; Warren, W.C.; Beukeboom, L.W.; Bopp, D.; Clark, A.G.; Giers, S.D.; Hediger, M.; Jones, A.K.; Kasai, S.; Leichter, C.A.; et al. Genome of the house fly, Musca domestica L. a global vector of diseases with adaptations to a septic environment. Genome Biol. 2014, 15, 466–482. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  23. Zhu, S.; Gao, B.; Tytgat, J. Phylogenetic distribution, functional epitopes and evolution of the CSαβ superfamily. Cell Mol. Life Sci. 2005, 62, 2257–2269. [Google Scholar] [CrossRef] [PubMed]
  24. Robert, X.; Gouet, P. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res. 2014, 42, W320–W324. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Zhu, S.; Gao, B.; Peigneur, S.; Tytgat, J. How a scorpion toxin selectively captures a prey sodium channel: The molecular and evolutionary basis uncovered. Mol. Biol. Evol. 2020, 37, 3149–3164. [Google Scholar] [CrossRef]
  26. Pierce, B.G.; Wiehe, K.; Hwang, H.; Kim, B.H.; Vreven, T.; Weng, Z. ZDOCK server: Interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics 2014, 30, 1771–1773. [Google Scholar] [CrossRef]
  27. Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A.E.; Berendsen, H.J. GROMACS: Fast, flexible, free. J. Comput. Chem. 2005, 26, 1701–1718. [Google Scholar] [CrossRef]
  28. Kaminski, G.A.; Friesner, R.A.; Tirado-Rives, J.; Jorgensen, W.L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 2001, 105, 6474–6487. [Google Scholar] [CrossRef]
  29. Zhu, S.; Gao, B. Positive selection in cathelicidin host defense peptides: Adaptation to exogenous pathogens or endogenous receptors? Heredity 2017, 118, 453–465. [Google Scholar] [CrossRef] [Green Version]
  30. Yang, Z.; Nielsen, R.; Goldman, N.; Pedersen, A.M.K. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 2000, 155, 431–449. [Google Scholar]
  31. Yang, Z.; Wong, W.S.W.; Nielsen, R. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol. Biol. Evol. 2005, 22, 1107–1118. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Colell, E.A.; Iserte, J.A.; Simonetti, F.L.; Marino-Buslje, C. MISTIC2: Comprehensive server to study coevolution in protein families. Nucleic Acids Res. 2018, 46, W323–W328. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Crooks, G.E.; Hon, G.; Chandonia, J.M.; Brenner, S.E. WebLogo: A sequence logo generator. Genome Res. 2004, 14, 1188–1190. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  34. Zhou, W.; Gao, B.; Zhu, S. Did cis- and trans-defensins derive from a common ancestor? Immunogenetics 2019, 71, 61–69. [Google Scholar] [CrossRef]
  35. Cociancich, S.; Ghazi, A.; Hetru, C.; Hoffmann, J.A.; Letellier, L. Insect defensin, an inducible antibacterial peptide, forms voltage-dependent channels in Micrococcus luteus. J. Biol. Chem. 1993, 268, 19239–19245. [Google Scholar] [CrossRef]
  36. Lee, Y.S.; Yun, E.K.; Jang, W.S.; Kim, I.; Lee, J.H.; Park, S.Y.; Ryu, K.S.; Seo, S.J.; Kim, C.H.; Lee, I.H. Purification, cDNA cloning and expression of an insect defensin from the great wax moth, Galleria mellonella. Insect. Mol. Biol. 2004, 13, 65–72. [Google Scholar] [CrossRef]
  37. Andoh, M.; Ueno, T.; Kawasaki, K. Tissue-dependent induction of antimicrobial peptide genes after body wall injury in house fly (Musca domestica) larvae. Drug Discov. Ther. 2018, 12, 355–362. [Google Scholar] [CrossRef] [Green Version]
  38. Yount, N.Y.; Yeaman, M.R. Multidimensional signatures in antimicrobial peptides. Proc. Natl. Acad. Sci. USA 2004, 101, 7363–7368. [Google Scholar] [CrossRef] [Green Version]
  39. Koehbach, J. Structure-Activity Relationships of Insect Defensins. Front. Chem. 2017, 5, 45. [Google Scholar] [CrossRef] [Green Version]
  40. Zhu, S. Discovery of six families of fungal defensin-like peptides provides insights into origin and evolution of the CSαβ defensins. Mol. Immunol. 2008, 45, 828–838. [Google Scholar] [CrossRef]
  41. Laskowski, M., Jr.; Kato, I. Protein inhibitors of proteinases. Annu. Rev. Biochem. 1980, 49, 593–626. [Google Scholar] [CrossRef] [PubMed]
  42. Kanost, M.R. Serine proteinase inhibitors in arthropod immunity. Dev. Comp. Immunol. 1999, 23, 291–301. [Google Scholar] [CrossRef]
  43. McCrudden, M.T.; Dafforn, T.R.; Houston, D.F.; Turkington, P.T.; Timson, D.J. Functional domains of the human epididymal protease inhibitor, eppin. FEBS J. 2008, 275, 1742–1750. [Google Scholar] [CrossRef]
  44. Fröbius, A.C.; Kanost, M.R.; Götz, P.; Vilcinskas, A. Isolation and characterization of novel inducible serine protease inhibitors from larval hemolymph of the greater wax moth Galleria mellonella. Eur. J. Biochem. 2000, 267, 2046–2053. [Google Scholar] [CrossRef] [PubMed]
  45. Nirmala, X.; Kodrík, D.; Žurovec, M.; Sehnal, F. Insect silk contains both a Kunitz-type and a unique Kazal-type proteinase inhibitor. Eur. J. Biochem. 2001, 268, 2064–2073. [Google Scholar] [CrossRef] [Green Version]
  46. de Magalhaes, M.T.Q.; Mambelli, F.S.; Santos, B.P.O.; Morais, S.B.; Oliveira, S.C. Serine protease inhibitors containing a Kunitz domain: Their role in modulation of host inflammatory responses and parasite survival. Microbes Infect. 2018, 20, 606–609. [Google Scholar] [CrossRef]
  47. Watanabe, R.M.; Soares, T.S.; Morais-Zani, K.; Tanaka-Azevedo, A.M.; Maciel, C.; Capurro, M.L.; Torquato, R.J.; Tanaka, A.S. A novel trypsin Kazal-type inhibitor from Aedes aegypti with thrombin coagulant inhibitory activity. Biochimie 2010, 92, 933–939. [Google Scholar] [CrossRef]
  48. Niimi, T.; Yokoyama, H.; Goto, A.; Beck, K.; Kitagawa, Y. A Drosophila gene encoding multiple splice variants of Kazal-type serine protease inhibitor-like proteins with potential destinations of mitochondria, cytosol and the secretory pathway. Eur. J. Biochem. 1999, 266, 282–292. [Google Scholar] [CrossRef] [Green Version]
  49. Brillard-Bourdet, M.; Hamdaoui, A.; Hajjar, E.; Boudier, C.; Reuter, N.; Ehret-Sabatier, L.; Bieth, J.G.; Gauthier, F. A novel locust (Schistocerca gregaria) serine protease inhibitor with a high affinity for neutrophil elastase. Biochem. J. 2006, 400, 467–476. [Google Scholar] [CrossRef] [Green Version]
  50. Kumaresan, V.; Harikrishnan, R.; Arockiaraj, J. A potential Kazal-type serine protease inhibitor involves in kinetics of protease inhibition and bacteriostatic activity. Fish Shellfish Immunol. 2015, 42, 430–438. [Google Scholar] [CrossRef]
  51. Gebhard, L.G.; Carrizo, F.U.; Stern, A.L.; Burgardt, N.I.; Faivovich, J.; Lavilla, E.; Ermacora, M.R. A Kazal prolyl endopeptidase inhibitor isolated from the skin of Phyllomedusa sauvagii. Eur. J. Biochem. 2004, 271, 2117–2126. [Google Scholar] [CrossRef] [Green Version]
  52. Han, F.; Lu, A.; Yuan, Y.; Huang, W.; Beerntsen, B.T.; Huang, J.; Ling, E. Characterization of an entomopathogenic fungi target integument protein, Bombyx mori single domain von Willebrand factor type C, in the silkworm, Bombyx mori. Insect Mol. Biol. 2017, 26, 308–316. [Google Scholar] [CrossRef] [PubMed]
  53. Smit, A.B.; de Jong-Brink, M.; Li, K.W.; Sassen, M.M.J.; Spijker, S.; van Elk, R.; Buijs, S.P.; van Minnen, J.; van Kesteren, R.E. Granularin, a novel molluscan opsonin comprising a single vWF type C domain is up-regulated during parasitation. FASEB J. 2004, 18, 845–847. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  54. Smith, V.J.; Fernandes, J.M.; Kemp, G.D.; Hauton, C. Crustins: Enigmatic WAP domain-containing antibacterial proteins from crustaceans. Dev. Comp. Immunol. 2008, 32, 758–772. [Google Scholar] [CrossRef] [Green Version]
  55. Vargas-Albores, F.; Martinez-Porchas, M. Crustins are distinctive members of the WAP-containing protein superfamily: An improved classification approach. Dev. Comp. Immunol. 2017, 76, 9–17. [Google Scholar] [CrossRef]
  56. Afsal, V.V.; Antony, S.P.; Sathyan, N.; Philip, R. Molecular characterization and phylogenetic analysis of two antimicrobial peptides: Anti-lipopolysaccharide factor and crustin from the brown mud crab, Scylla serrata. Results Immunol. 2011, 1, 6–10. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  57. Brockton, V.; Hammond, J.A.; Smith, V.J. Gene characterisation, isoforms and recombinant expression of carcinin, an antibacterial protein from the shore crab, Carcinus maenas. Mol. Immunol. 2007, 44, 943–949. [Google Scholar] [CrossRef] [Green Version]
  58. Antony, S.P.; Singh, I.S.; Sudheer, N.S.; Vrinda, S.; Priyaja, P.; Philip, R. Molecular characterization of a crustin-like antimicrobial peptide in the giant tiger shrimp, Penaeus monodon, and its expression profile in response to various immunostimulants and challenge with WSSV. Immunobiology 2011, 216, 184–194. [Google Scholar] [CrossRef]
  59. Yang, L.; Niu, S.; Gao, J.; Zuo, H.; Yuan, J.; Weng, S.; He, J.; Xu, X. A single WAP domain (SWD)-containing protein with antiviral activity from Pacific white shrimp Litopenaeus vannamei. Fish Shellfish Immunol. 2018, 73, 167–174. [Google Scholar] [CrossRef]
  60. Zhang, J.; Li, F.; Wang, Z.; Xiang, J. Cloning and recombinant expression of a crustin-like gene from Chinese shrimp, Fenneropenaeus chinensis. J. Biotechnol. 2007, 127, 605–614. [Google Scholar] [CrossRef]
  61. Hagiwara, K.; Kikuchi, T.; Endo, Y.; Usui, K.; Takahashi, M.; Shibata, N.; Kusakabe, T.; Xin, H.; Hoshi, S.; Miki, M.; et al. Mouse SWAM1 and SWAM2 are antibacterial proteins composed of a single whey acidic protein motif. J. Immunol. 2003, 170, 1973–1979. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  62. Nair, D.G.; Fry, B.G.; Alewood, P.; Kumar, P.P.; Kini, R.M. Antimicrobial activity of omwaprin, a new member of the waprin family of snake venom proteins. Biochem. J. 2007, 402, 93–104. [Google Scholar] [CrossRef]
  63. Steiner, H.; Hultmark, D.; Engström, A.; Bennich, H.; Boman, H.G. Sequence and specificity of two antibacterial proteins involved in insect immunity. Nature 1981, 292, 246–248. [Google Scholar] [CrossRef]
  64. Peng, J.; Wu, Z.; Liu, W.; Long, H.; Zhu, G.; Guo, G.; Wu, J. Antimicrobial functional divergence of the cecropin antibacterial peptide gene family in Musca domestica. Parasit. Vectors 2019, 12, 537–546. [Google Scholar] [CrossRef] [PubMed]
  65. Boulanger, N.; Munks, R.J.; Hamilton, J.V.; Vovelle, F.; Brun, R.; Lehane, M.J.; Bulet, P. Epithelial innate immunity. A novel antimicrobial peptide with antiparasitic activity in the blood-sucking insect Stomoxys calcitrans. J. Biol. Chem. 2002, 277, 49921–49926. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  66. Vizioli, J.; Bulet, P.; Charlet, M.; Lowenberger, C.; Blass, C.; Muller, H.M.; Dimopoulos, G.; Hoffmann, J.; Kafatos, F.C.; Richman, A. Cloning and analysis of a cecropin gene from the malaria vector mosquito, Anopheles gambiae. Insect Mol. Biol. 2000, 9, 75–84. [Google Scholar] [CrossRef] [Green Version]
  67. Ekengren, S.; Hultmark, D. Drosophila cecropin as an antifungal agent. Insect Biochem. Mol. Biol. 1999, 29, 965–972. [Google Scholar] [CrossRef]
  68. Okada, M.; Natori, S. Primary structure of sarcotoxin I, an antibacterial protein induced in the hemolymph of Sarcophaga peregrina (flesh fly) larvae. J. Biol. Chem. 1985, 260, 7174–7177. [Google Scholar] [CrossRef]
  69. Ouyang, L.; Xu, X.; Freed, S.; Gao, Y.; Yu, J.; Wang, S.; Ju, W.; Zhang, Y.; Jin, F. Cecropins from Plutella xylostella and their interaction with Metarhizium anisopliae. PLoS ONE 2015, 10, e0142451. [Google Scholar] [CrossRef] [PubMed]
  70. Saito, A.; Ueda, K.; Imamura, M.; Atsumi, S.; Tabunoki, H.; Miura, N.; Watanabe, A.; Kitami, M.; Sato, R. Purification and cDNA cloning of a cecropin from the longicorn beetle, Acalolepta luxuriosa. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 2005, 142, 317–323. [Google Scholar] [CrossRef]
  71. Kim, J.K.; Lee, E.; Shin, S.; Jeong, K.W.; Lee, J.Y.; Bae, S.Y.; Kim, S.H.; Lee, J.; Kim, S.R.; Lee, D.G.; et al. Structure and function of papiliocin with antimicrobial and anti-inflammatory activities isolated from the swallowtail butterfly, Papilio xuthus. J. Biol. Chem. 2011, 286, 41296–41311. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  72. Lee, E.; Jeong, K.W.; Lee, J.; Shin, A.; Kim, J.-K.; Lee, J.; Lee, D.G.; Kim, Y. Structure-activity relationships of cecropin-like peptides and their interactions with phospholipid membrane. BMB Rep. 2013, 46, 282–287. [Google Scholar] [CrossRef]
  73. Yagi-Utsumi, M.; Yamaguchi, Y.; Boonsri, P.; Iguchi, T.; Okemoto, K.; Natori, S.; Kato, K. Stable isotope-assisted NMR characterization of interaction between lipid A and sarcotoxin IA, a cecropin-type antibacterial peptide. Biochem. Biophys. Res. Commun. 2013, 431, 136–140. [Google Scholar] [CrossRef] [PubMed]
  74. Okemoto, K.; Nakajima, Y.; Fujioka, T.; Natori, S. Participation of two N-terminal residues in LPS neutralizing activity of sarcotoxin IA. J. Biochem. 2002, 131, 277–281. [Google Scholar] [CrossRef] [PubMed]
  75. Oh, D.; Shin, S.Y.; Lee, S.; Kang, J.H.; Kim, S.D.; Ryu, P.D.; Hahm, K.S.; Kim, Y. Role of the hinge region and the tryptophan residue in the synthetic antimicrobial peptides, cecropin A(1–8)-magainin 2(1–12) and its analogues, on their antibiotic activities and structures. Biochemistry 2000, 39, 11855–11864. [Google Scholar] [CrossRef] [PubMed]
  76. Pei, Z.; Sun, X.; Tang, Y.; Wang, K.; Gao, Y.; Ma, H. Cloning, expression, and purification of a new antimicrobial peptide gene from Musca domestica larva. Gene 2014, 549, 41–45. [Google Scholar] [CrossRef]
  77. Tang, T.; Li, X.; Yang, X.; Yu, X.; Wang, J.; Liu, F.; Huang, D. Transcriptional response of Musca domestica larvae to bacterial infection. PLoS ONE 2014, 9, e104867. [Google Scholar] [CrossRef] [Green Version]
  78. Scocchi, M.; Tossi, A.; Gennaro, R. Proline-rich antimicrobial peptides: Converging to a non-lytic mechanism of action. Cell Mol. Life Sci. 2011, 68, 2317–2330. [Google Scholar] [CrossRef] [PubMed]
  79. Dimarcq, J.L.; Zachary, D.; Hoffmann, J.A.; Hoffmann, D.; Reichhart, J.M. Insect immunity: Expression of the two major inducible antibacterial peptides, defensin and diptericin, in Phormia terranovae. EMBO J. 1990, 9, 2507–2515. [Google Scholar] [CrossRef] [PubMed]
  80. Lee, J.H.; Cho, K.S.; Lee, J.; Yoo, J.; Lee, J.; Chung, J. Diptericin-like protein: An immune response gene regulated by the anti-bacterial gene induction pathway in Drosophila. Gene 2001, 271, 233–238. [Google Scholar] [CrossRef]
  81. Kim, C.H.; Muturi, E.J. Effect of larval density and Sindbis virus infection on immune responses in Aedes aegypti. J. Insect Physiol. 2013, 59, 604–610. [Google Scholar] [CrossRef]
  82. Vanha-Aho, L.M.; Anderl, I.; Vesala, L.; Hultmark, D.; Valanne, S.; Ramet, M. Edin expression in the fat body is required in the defense against parasitic wasps in Drosophila melanogaster. PLoS Pathog. 2015, 11, e1004895. [Google Scholar] [CrossRef] [PubMed]
  83. Verleyen, P.; Baggerman, G.; D’Hertog, W.; Vierstraete, E.; Husson, S.J.; Schoofs, L. Identification of new immune induced molecules in the haemolymph of Drosophila melanogaster by 2D-nanoLC MS/MS. J. Insect Physiol. 2006, 52, 379–388. [Google Scholar] [CrossRef] [PubMed]
  84. Mura, M.E.; Ruiu, L. Brevibacillus laterosporus pathogenesis and local immune response regulation in the house fly midgut. J. Invertebr. Pathol. 2017, 145, 55–61. [Google Scholar] [CrossRef]
  85. Kawasaki, K.; Andoh, M. Properties of induced antimicrobial activity in Musca domestica larvae. Drug Discov. Ther. 2017, 11, 156–160. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  86. Mourier, T.; Jeffares, D.C. Eukaryotic intron loss. Science 2003, 300, 1393. [Google Scholar] [CrossRef] [Green Version]
  87. Hogg, P.J. Disulfide bonds as switches for protein function. Trends Biochem. Sci. 2003, 28, 210–214. [Google Scholar] [CrossRef]
  88. Zhu, S.; Gao, B. Molecular characterization of a new scorpion venom lipolysis activating peptide: Evidence for disulfide bridge-mediated functional switch of peptides. FEBS Lett. 2006, 580, 6825–6836. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Defensin-like AMPs. (a) Comparison of representative MdDefensins (for the full set of MdDefensin sequences, see Figure S2) and defensins from other insects. Previously known sequences are labeled with a red “#”. Cysteines are shaded in yellow and the conserved glycines in grey. Basic residues (K, R, H) and acidic residues (E, D) are highlighted in blue and red, respectively. The extended N-terminus in MdDefensin10, MdDefensin17, and MdDefensin20 is shaded in cyan. Structural elements, including three loops (designated as n-loop, m-loop and c-loop), the α-helix, the two-stranded β-sheet, the γ-core motif and the three conserved disulfides, are indicated at the bottom. The free cysteine (Cys1) is underlined. Dm: Drosophila melanogaster (GenBank: P36192.1), Lucifensin: Lucilia sericata (PDB: 2LLD), Pt: Protophormia terraenovae (GenBank: P10891.2), Sb: Simulium bannaense (GenBank: AJP36711.1). (b) Phylogenetic tree and representative 3D structures of the defensins from the housefly and other insects. This tree was inferred using iqtree-2.0-rc2. Significant bootstrap values are indicated by a black circle for SH-aLRT > 90 and a white circle for UFBoot > 85. See GenBank IDs and other details for each AMP in Table A1. The structures are shown as ribbons by MolMol, with their N- and C-termini labeled. The n-loops of all the structures and the free cysteine (Cys1) in MdDefensin20 are indicated. The extra N-terminal domain in MdDefensin17 is circled in blue.
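The two support cut-offs used in the tree panels (SH-aLRT > 90 and UFBoot > 85) can be illustrated with a minimal sketch. It assumes that each internal node carries a label of the form "SH-aLRT/UFBoot", which is how IQ-TREE writes support when both tests are requested; the function name `support_markers` is hypothetical, and how the published figures draw a branch that passes both cut-offs is not stated in the caption.

```python
def support_markers(label, alrt_cutoff=90.0, ufboot_cutoff=85.0):
    """Return the circle colours that apply to one node label.

    `label` is assumed to look like "95.2/88" (SH-aLRT first, UFBoot second).
    The comparisons are strict (">"), matching the caption wording.
    """
    alrt_text, ufboot_text = label.split("/")
    markers = []
    if float(alrt_text) > alrt_cutoff:
        markers.append("black")   # SH-aLRT above the cut-off
    if float(ufboot_text) > ufboot_cutoff:
        markers.append("white")   # UFBoot above the cut-off
    return markers


if __name__ == "__main__":
    # Toy labels, not values taken from the published trees.
    for node_label in ("95.2/88", "90/100", "72.1/60"):
        print(node_label, support_markers(node_label))
```

Returning a list rather than a single colour keeps the two tests independent, so a caller can decide how to render nodes supported by both.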
Figure 2. Eppin-like AMPs. (a) Comparison of representative MdEppins (for the full set of MdEppin sequences, see Figure S3) and Kunitz-domain-type AMPs from other species. Cysteines involved in the formation of disulfides are colored in yellow. Identical residues are shaded in grey. Basic residues (K, R, H) and acidic residues (E, D) are highlighted in blue and red, respectively. The highly conserved domain in eppins is boxed in green. The P1 amino acids are italicized and shaded in cyan. Conserved disulfides, the α-helix and the two-stranded β-sheets are indicated at the bottom, while the disulfides in MdEppin35-1 are displayed above its sequence, with the newly emerged ones shown in dark red. Gm: Galleria mellonella (GenBank: AAK40037.1), Pp: Pseudechis porphyriacus (GenBank: sp_B5G6G6.1), Hm: Homo sapiens (GenBank: AAG00547.1). Phase 1 and phase 2 introns are boxed in green and red, respectively, and phase 0 introns are shown by black lines. & indicates that only the signal peptide and Kunitz domain of human eppin are displayed. (b) Phylogenetic tree constructed from the alignment of amino acid sequences presented in Figure S3 by iqtree with a maximum-likelihood method. Branches with a significant bootstrap value are indicated by black circles for SH-aLRT > 90 and white circles for UFBoot > 85. (c) 3D models of MdEppin-1 and MdEppin35-1. The disulfides are shown as colored sticks (blue for MdEppin-1 and red for MdEppin35-1) with the conserved one indicated by a blue arrow. (d) Circos visualization of coevolution in MdEppins. Amino acid names and positions are in the outer ring. Conservation (second ring) from light blue (lower) to red (higher); cScore (third ring) from yellow (lower) to violet (higher); pScore (inner ring) from green (lower) to red (higher). Inner lines are the top 5% covariation scores. (e) WebLogo of MdEppins; the color of positions by cScore, from yellow (lower) to violet (higher), is shown on top with the distance displayed. The P1 site is marked by a blue arrow and the positively selected sites by turquoise arrows. The highly conserved domain is boxed in green.
Figure 3. MdMuslins-like AMPs. (a) Multiple sequence alignment (MSA) of representative MdMuslins (for the full set of MdMuslins, see Figure S4) and Kazal domains from other species. Two serine residues mutated from the conserved cysteines are circled. Cysteines involved in the formation of disulfides are colored in yellow. Conservative replacements are shaded in grey. Basic residues (K, R, H) and acidic residues (E, D) are highlighted in blue and red, respectively. The P1 amino acids are italicized and shaded in cyan. Residues split by phase 1 introns are shaded in green. The α-helix and β-sheets are indicated at the bottom, together with the potential disulfides shown by black lines and the fourth bridge by a dotted line. Aae: Aedes aegypti (GenBank: ABF18209.1), Aal: Aedes albopictus (GenBank: JAC06964.1), Ac: Apis cerana (GenBank: AGW24880.1), Cs: Channa striata (GenBank: CDG86164.1), Cp: Culex pipiens pallens (GenBank: AFN41343.1), DaCOW: Drosophila ananassae (GenBank: XP_001953960.2:219-270), Df: Drosophila ficusphila (GenBank: XP_017047190.1), Ds: Drosophila simulans (GenBank: XP_002105007.1). (b) Phylogenetic tree constructed from the alignment of amino acid sequences presented in Figure S4 by iqtree with a maximum-likelihood method. Branches with a significant bootstrap value are indicated by a black circle for SH-aLRT > 90 and a white circle for UFBoot > 85. Muslins can be divided into four groups denoted by different colors. (c) 3D structures of MdMuslin1, MdMuslin2, MdMuslin25 and MdMuslin26. The disulfides are shown as blue sticks. The unpaired cysteine residues in MdMuslin1 (Cys5) and MdMuslin26 (Cys1) are displayed. C4-C8 in MdMuslin25 and MdMuslin26 is indicated by a black arrow. (d) The coevolution of MdMuslins in M. domestica. (e) WebLogo of MdMuslins. The results indicate that the residues in MdMuslins are variable; the potential sites likely contributing to the evolution of P1 (blue) are displayed with the distance (color of the cScore) and position (turquoise).
Figure 4. MdSVWC-domain peptides. (a) Comparison of representative MdSVWCs (for the full set of sequences, see Figure S5) and homologs from other species. Cysteines involved in the formation of disulfide bridges are colored in yellow. Basic residues (K, R, H) and acidic residues (E, D) are highlighted in blue and red, respectively. Disulfide bridges, the α-helix and the β-sheets are shown at the bottom. Residues split by phase 1 and 2 introns are shaded in green and red, respectively. Bm: Bombyx mandarina (GenBank: XP_028031732.1), Dw: Drosophila willistoni (GenBank: XP_015032364.1), Granularin: Lymnaea stagnalis (GenBank: AAS20460.1), Lv: Litopenaeus vannamei (GenBank: HQ541159.1). (b) Phylogenetic tree constructed from the alignment of amino acid sequences presented in Figure S5. Branches with a significant bootstrap value are indicated by a black circle for SH-aLRT > 90 and a white circle for UFBoot > 85. (c) The 3D models of MdSVWC1 and MdSVWC17. The four-β-stranded sheet and the α-helix are located at the N-terminus of MdSVWC1. In MdSVWC17, the N-terminal α-helix is replaced by a β-strand.
Figure 5. The phylogenetic tree and structural models of crustins. (a) Phylogenetic tree of crustins (details in Figure S6) inferred using iqtree-1.62. Branches with a significant bootstrap value are indicated by a black circle for SH-aLRT > 90 and a white circle for UFBoot > 85. Members belonging to the same subtype are clustered together and their branches are marked by the same color, with six subgroups designated. The predicted domains are also shown. (b) Structures of MdCrustin1 (yellow) and MdCrustin3 (blue). Disulfides are shown as green sticks and the C-terminal cysteine-rich domain in MdCrustin3 is circled in red.
Figure 6. Cecropin-like peptides. (a) MSA of cecropins. Basic residues (K, R, H) and acidic residues (E, D) are highlighted in blue and red, respectively. Conserved residues are shaded in cyan. # represents the previously known peptides. Dm: Drosophila melanogaster (NP_524588.1). The black line denotes the position of a conserved phase 0 intron. (b) Helical-wheel and sphere diagrams of MdCecropin2. Left: The helical wheel projection showing the amphiphilic character of the cecropin. By default, the output presents hydrophilic residues as circles, hydrophobic residues as diamonds, potentially negatively charged residues as triangles, and potentially positively charged residues as pentagons. Hydrophobicity is also color coded: the most hydrophobic residue is green, and the amount of green decreases in proportion to the hydrophobicity, with zero hydrophobicity coded as yellow. Hydrophilic residues are coded red, with pure red being the most hydrophilic (uncharged) residue and the amount of red decreasing in proportion to the hydrophilicity. The potentially charged residues are light blue. Middle: Cartoon model of MdCecropin2. The residues are colored as follows: hydrophilic residues in cyan, positive ones in blue, negative ones in red, and hydrophobic ones in gray. The glycine and leucine residues at positions 24 and 25, which connect the two helices, are also shown as sticks. Right: The sphere diagram shows the structure of MdCecropin2 with the same color code as the cartoon model.
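The geometry behind a helical wheel projection can be sketched in a few lines. The example below assumes an ideal α-helix with 3.6 residues per turn, so each successive residue is rotated by 100° around the helix axis; the function name `helical_wheel_angles` and the toy sequence are hypothetical, and the plotting tool used for panel (b) may follow different conventions for the starting angle or direction.

```python
def helical_wheel_angles(sequence, degrees_per_residue=100.0):
    """Pair each residue with its angular position on the wheel.

    Assumes an ideal alpha-helix (360 / 3.6 = 100 degrees per residue),
    with the first residue placed at 0 degrees.
    """
    return [
        (residue, (index * degrees_per_residue) % 360.0)
        for index, residue in enumerate(sequence)
    ]


if __name__ == "__main__":
    # Toy sequence for illustration only, not MdCecropin2.
    for residue, angle in helical_wheel_angles("AKLGF"):
        print(f"{residue}: {angle:.0f} degrees")
```

Residues whose angles fall on the same side of the wheel sit on the same face of the helix, which is why clustering of hydrophobic and charged residues on opposite sides indicates amphiphilicity.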
Figure 7. MSA of specific amino acid-rich AMPs. (a) Proline-rich AMPs. Prolines are shaded in pink and the RXXR motifs are boxed in red. (b,c) Glycine-rich AMPs (G1 and G2 domains). Glycines are highlighted in dark red. Basic residues (K, R, H) and acidic residues (E, D) are highlighted in blue and red, respectively. # represents the previously known peptides. The phase 2 introns are shaded in red. The details of these peptides, including Domesticin, Diptericin, Edin, AttacinA and AttacinC, and AttacinD, are provided in Supplementary Information Figures S7–S12, respectively. DsIp18: Drosophila serrata (GenBank: XP_020804442.1), DmDiptericin: Drosophila melanogaster (GenBank: AAB82521.1), DmEdin: Drosophila melanogaster (GenBank: NP_730278.1), DmAttacinA: Drosophila melanogaster (GenBank: ABS52579.1), DmAttacinC: Drosophila melanogaster (GenBank: NP_523729.3), DmAttacinD: Drosophila melanogaster (GenBank: NP_524391.2).
Figure 8. Phylogenetic tree of proline-rich AMPs in insects. The tree was constructed based on the MSA of proline-rich peptides in M. domestica and other insects. Branches with significant support are indicated by black circles for SH-aLRT > 90 and white circles for UFBoot > 85. In the sequences, prolines are shaded in pink and the percentage of proline residues is given. The percentage identity to Formaecin-1 is given as well. The RxRR motifs are boxed in red. # represents previously known peptides and “a” represents known mature peptides. The taxonomy is displayed on the right. BmLebocin: Bombyx mori (GenBank: AAB35218.1), DmAttacinA: Drosophila melanogaster (GenBank: ABS52579.1), DmAttacinB: Drosophila melanogaster (GenBank: Q9V751.2), DmAttacinC: Drosophila melanogaster (GenBank: NP_523729.3), DmDiptericin: Drosophila melanogaster (GenBank: AAB82521.1), DmDrosocin: Drosophila melanogaster (GenBank: CAA79936.1), DmMetchnikowin: Drosophila melanogaster (GenBank: NP_523752.1), HvHeliocin: Heliothis virescens (GenBank: P83427.1), Mg: Myrmecia gulosa (GenBank: P81438), PaPyrrhocoricin: Pyrrhocoris apterus (GenBank: P37362), PpMetalnikowin: Palomena prasina (GenBank: P80408).
Figure 9. Phylogenetic tree of the glycine-rich domain in the MdAttacin, MdEdin and MdDiptericin families. The G1 domain of attacins A–D from the fruit fly was used as an outgroup. Different groups are shaded in different colors. The N-domain, G1 domain and G2 domain are abbreviated as N, G1 and G2. Branches with significant support are indicated by black circles for SH-aLRT > 90 and white circles for UFBoot > 85. # represents previously known peptides.
Figure 10. Structural domains of the specific amino acid-rich AMPs. Different domains are presented in different colors. Notes are shown at the bottom right. # represents previously known peptides.
Figure 11. Gene duplication in the M. domestica AMP genes. (a) The clusters of MdDefensins. The two clusters are located on GenBank ID AQPM01000006.1 and AQPM01000010.1, respectively. (b) The clusters of MdEppins. The big cluster is located on GenBank ID AQPM01069148.1, and the two small clusters are located on GenBank ID NDYK01230132.1 and AQPM01069146.1, respectively. (c) The cluster of MdMuslins (AQPM01019378.1 and AQPM01000603.1). (d) The clusters of MdSVWCs (AQPM01095428.1 and AQPM01095425.1). (e) The clusters of MdCecropins (AQPM01058001.1, AQPM01058004.1 and AQPM01100428.1). (f) The clusters of MdEdins (AQPM01067938.1 and AQPM01067936.1). (g) The MdAttacinC cluster (AQPM01059487.1). Chromosome fragments are shown as lines and genes coding for AMPs are represented by boxes. Introns are indicated by triangles in different colors.
Figure 12. Domain repeats in different M. domestica AMPs. (a) MdEppin35. (b) MdMuslin16. (c) MdEdin6. The structure of each repeated domain is also shown. The disulfides are shown as blue sticks. Introns are indicated by triangles, with phase 1 in green, phase 2 in red, and phase 0 in blue. (d) Schematic diagram showing a potential exon-shuffling event creating the multiple-domain MdEppin35. In this process, the two same-phase introns (here phase 0) located at the boundaries of the domain could contribute to its insertion into an ancestor via intron-mediated shuffling.
Figure 13. The homodimers of MdDefensin20, MdMuslin1 and MdMuslin26. (a,c,e) The 3D structures, with the interchain disulfide bridges built between two monomers denoted by blue cyan arrows. (b,d,f) Ramachandran plot analysis with PROCHECK. In these plots, almost all φ/ψ torsion angles are found in the favored or additionally allowed regions and only a few residues are in a disallowed region.
Table 1. Maximum likelihood estimates of parameters and sites inferred to be under positive selection in the M. domestica defensins.
| Model | l | LRT | Parameters | Positively Selected Sites |
|---|---|---|---|---|
| M0 | –918.761 | | ω = 0.16294 | None |
| M1a | –891.66 | | ρ0 = 0.24997, ω0 = 0.00000; ρ1 = 0.75003, ω1 = 1.00000 | Not allowed |
| M2a | –888.78 | 5.74 | ρ0 = 0.24996, ω0 = 0.00000; ρ1 = 0.66063, ω1 = 1.00000; ρ2 = 0.08941, ω2 = 3.89787 | 7 S *, 22 L |
| M7 | –875.388 | | p = 0.39543, q = 1.05516 | Not allowed |
| M8 | –875.26 | 0.26 | p0 = 0.97856, p = 0.41043, q = 1.16670; (p1 = 0.02144), ω = 2.30984 | 7 S |
Note: l is the log likelihood; LRT is the likelihood ratio test statistic, which is twice the log likelihood difference (2Δl) between the null models (M1a and M7) and their alternative models (M2a and M8): M1a/M2a = 5.74 (df = 2, p = 0.05670); M7/M8 = 0.26 (p = 1). Positively selected sites identified by the BEB method under M2a with posterior probabilities P > 0.6 are shown, and those with P > 0.95 are indicated by “*”.
Table 2. Maximum likelihood estimates of parameters and sites inferred to be under positive selection in the M. domestica eppins.
| Model | l | LRT | Parameters | Positively Selected Sites |
|---|---|---|---|---|
| M0 | –5115.94 | | ω = 0.10448 | None |
| M1a | –4880.09 | | ρ0 = 0.32678, ω0 = 0.05181; ρ1 = 0.67322, ω1 = 1.00000 | Not allowed |
| M2a | –4872.92 | 11.22 | ρ0 = 0.32607, ω0 = 0.05760; ρ1 = 0.52462, ω1 = 1.00000; ρ2 = 0.14931, ω2 = 2.14278 | 7R, 12K *, 14V, 18L *, 20P, 33E, 47P, 57Q |
| M7 | –4765.64 | | p = 0.52104, q = 2.05131 | Not allowed |
| M8 | –4765.64 | 0.00 | p0 = 1.00000, p = 0.52104, q = 2.05131; (p1 = 0.00000), ω = 1.00000 | None |
Note: 2Δl between the null models (M1a and M7) and their alternative models (M2a and M8): M1a/M2a = 11.22 (df = 2, p = 0.00366); M7/M8 = 0.00 (p = 1). Positively selected sites identified by the BEB method under M2a with posterior probabilities P > 0.6 are shown, and those with P > 0.95 are indicated by “*”.
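The M1a/M2a p-values quoted in the notes to Tables 1 and 2 can be checked by hand. The following is a minimal sketch, assuming the LRT statistic 2Δl is compared against a chi-square distribution with 2 degrees of freedom (the df stated in the notes), for which the upper-tail probability reduces to exp(−x/2). The function name is illustrative and does not come from the analysis software used in the study; the statistics are taken as reported in the table notes.

```python
import math


def lrt_p_value_df2(statistic: float) -> float:
    """Upper-tail chi-square probability for 2 degrees of freedom.

    For df = 2 the chi-square survival function is exactly exp(-x / 2),
    so no statistics library is needed.
    """
    if statistic < 0:
        raise ValueError("The LRT statistic (2 * delta log-likelihood) must be >= 0")
    return math.exp(-statistic / 2.0)


# Reported M1a vs. M2a statistics (2 * delta l) from the table notes.
reported = {
    "Table 1, defensins (M1a/M2a)": 5.74,
    "Table 2, eppins (M1a/M2a)": 11.22,
}

for label, statistic in reported.items():
    p = lrt_p_value_df2(statistic)
    print(f"{label}: 2*delta_l = {statistic:.2f}, p = {p:.5f}")
```

With these inputs, the defensin comparison falls just above the 0.05 threshold and the eppin comparison falls well below it, in line with the p-values given in the notes.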
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Qi, S.; Gao, B.; Zhu, S. Molecular Diversity and Evolution of Antimicrobial Peptides in Musca domestica. Diversity 2021, 13, 107. https://doi.org/10.3390/d13030107
