Article

The Adeno-Associated Virus Replication Protein Rep78 Contains a Strictly C-Terminal Sequence Motif Conserved Across Dependoparvoviruses

by David G. Karlin 1,2
1 Division Phytomedicine, Thaer-Institute of Agricultural and Horticultural Sciences, Humboldt-Universität zu Berlin, Lentzeallee 55/57, D-14195 Berlin, Germany
2 Independent Researcher, 13000 Marseille, France
Viruses 2024, 16(11), 1760; https://doi.org/10.3390/v16111760
Submission received: 23 October 2024 / Revised: 7 November 2024 / Accepted: 8 November 2024 / Published: 12 November 2024
(This article belongs to the Special Issue Virology and Immunology of Gene Therapy)

Abstract

Adeno-Associated Viruses (AAVs, genus Dependoparvovirus) are the leading gene therapy vector. Until recently, efforts to enhance their capacity for gene delivery had focused on their capsids. However, efforts are increasingly shifting towards improving the viral replication protein, Rep78. We discovered that Rep78 and its shorter isoform Rep52 contain a strictly C-terminal sequence motif, DDx3EQ, conserved in most dependoparvoviruses. The motif is highly negatively charged and devoid of prolines. Its wide conservation suggests that it is required for the life cycle of dependoparvoviruses. Despite its short length, the motif’s strictly C-terminal position has the potential to endow it with a high recognition specificity. A candidate target of the DDx3EQ motif might be the DNA-binding interface of the origin-binding domain of Rep78, which is highly positively charged. Published studies suggest that this motif is not required for recombinant AAV production, but that substitutions within it might improve production.

1. Introduction

Adeno-Associated Viruses (AAVs, genus Dependoparvovirus) are the leading vector for delivering gene therapies [1,2,3,4]. Recombinant AAVs can package foreign genes into their capsid [1,5,6], and, until recently, efforts to enhance gene delivery had focused on tailoring and improving capsids. However, efforts are increasingly shifting to improving the viral replication protein encoded by the rep gene [7,8,9].
Rep encodes four protein isoforms (Figure 1) thanks to a combination of alternative promoters and alternative splicing sites [10]: two long isoforms (Rep78 and Rep68) and two short isoforms (Rep52 and Rep40). The larger Rep proteins, Rep78 and Rep68, are required to replicate the genome, while Rep52 and Rep40 facilitate the packaging of the genome [11]. Rep78 and Rep68 are sufficient for recombinant AAV production [12].
Three main regions have been delineated in Rep78 (Figure 1): an origin-binding domain [13], a helicase domain [14], and a C-terminal region, predicted to contain zinc fingers [15]. While the origin-binding and helicase domains have been systematically investigated, there has been no in-depth sequence analysis of the C-terminus beyond the putative identification of three zinc fingers [15]. Here, we examined sequences of Rep78 across all dependoparvoviruses, beyond the usual ones employed in gene therapy (AAV1 to AAV13), and discovered that Rep78 contains a strictly C-terminal motif conserved in most dependoparvoviruses.
Figure 1. The representation is to scale. Znf: zinc finger. The DDx3EQ motif was discovered in the present study. The numbering of all Rep proteins is given with reference to Rep78.

2. Materials and Methods

2.1. Protein Sequence Analysis

We extracted Dependoparvovirus sequences from NCBI’s Genbank [16] on 1 July 2024. We used Psi-Coffee [17] for multiple sequence alignment. Alignments are shown with Jalview [18] using the ClustalX colouring scheme [19]. We used flDPnn [20] for predicting disordered regions.
The compositional bias of the DDx3EQ motif was assessed using Composition Profiler [21] against two datasets, (1) SwissProt (version 51) [22] and (2) a dataset composed of all dependoparvovirus Rep78 sequences (available in File S2), after having removed their DDx3EQ motif, i.e., the last C-terminal 7 aa in each sequence.

2.2. Sequence Motif Searches

We looked for known motifs similar to the DDx3EQ motif using Comparimotif [23] and Tomtom [24].
We used Comparimotif to scan the ELM database [25] (March 2022 release) with the regular expression [DEP]D[^P][^P][^P]EQ$, in which [^P] corresponds to any aa except P, and ‘$’ specifies that the motif must be C-terminal. The request was made through a RESTful API: https://slim.icr.ac.uk/restapi/rest/get/comparimotif?task=run_comparimotif&motif=[DEP]D[^P][^P][^P]EQ$, accessed on 19 August 2024.
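As an illustration of what this regular expression accepts and rejects, the following minimal sketch applies it with Python's standard `re` module. It is not the Comparimotif implementation; the function name `has_cterminal_motif` is hypothetical, and the last two test strings are made-up negative controls. The first three heptapeptides are the C-termini quoted in this article.

```python
import re

# Regular expression quoted in the text: [^P] is any residue except proline,
# and `$` anchors the match to the end of the sequence (the C-terminus).
CTERM_MOTIF = re.compile(r"[DEP]D[^P][^P][^P]EQ$")


def has_cterminal_motif(sequence: str) -> bool:
    """Return True if the sequence ends with a DDx3EQ-like motif."""
    return CTERM_MOTIF.search(sequence.strip().upper()) is not None


examples = {
    "AAV2 Rep78 (last 7 aa)": "DDCIFEQ",
    "Enterovirus D protease 2A (last 7 aa)": "EDDAMEQ",
    "Human Cep57L1 (last 7 aa)": "DDIMWEQ",
    "Made-up control: motif not at the C-terminus": "DDCIFEQA",
    "Made-up control: proline within x3": "DDCPFEQ",
}
for label, tail in examples.items():
    print(f"{label}: {has_cterminal_motif(tail)}")
```

The two controls fail for different reasons: the first because `$` requires the Q to be the final residue, the second because `[^P]` excludes proline at positions 3 to 5.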
We also used Tomtom [24] to scan the database Prosite (April 2021 release) [26].
We looked for proteins containing the DDx3EQ motif using Patternsearch, run from the web-based version of the MPI toolkit [27] (https://toolkit.tuebingen.mpg.de/), accessed on 5 August 2024, against three databases that are subsets of Genbank [16]: (1) the database nr_vir70_12Mar, containing viral proteins clustered at 70% sequence identity on 12 March 2024; (2) the database Homo Sapiens_4Jul, containing Homo sapiens proteins on 4 July 2024; and (3) the database PDB_nr_12_Mar, containing proteins with an experimentally solved 3D structure on 12 March 2024. We used, as input, the regular expression [DEP]-D-{P}-{P}-{P}-E-Q>, which follows the Prosite syntax [28], in which {P} corresponds to an excluded P aa and ‘>’ specifies that the motif must be C-terminal.
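The Prosite-style pattern and the regular expression used with Comparimotif describe the same motif. The sketch below makes the correspondence explicit by translating a small subset of the Prosite syntax into a Python regular expression. The helper `prosite_to_regex` is hypothetical and written for illustration only; it is not part of Patternsearch or ScanProsite, and it handles only the constructs listed in its docstring.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a small subset of Prosite pattern syntax into a Python regex.

    Handled: '-' separators, 'x' wildcards, [..] allowed sets,
    {..} excluded sets, (n) repeat counts, and the '<' / '>' terminal anchors.
    """
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    core = pattern.strip("<>").rstrip(".")
    parts = []
    for element in core.split("-"):
        # Split off an optional repeat count, as in x(3)
        match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        token, count = match.group(1), match.group(2)
        if token == "x":
            token = "."                       # any residue
        elif token.startswith("{") and token.endswith("}"):
            token = "[^" + token[1:-1] + "]"  # excluded residues
        parts.append(token + (f"{{{count}}}" if count else ""))
    prefix = "^" if anchored_start else ""
    suffix = "$" if anchored_end else ""
    return prefix + "".join(parts) + suffix


regex = prosite_to_regex("[DEP]-D-{P}-{P}-{P}-E-Q>")
print(regex)
print(bool(re.search(regex, "DDCIFEQ")))
```

Under these assumptions, the translated pattern is character-for-character identical to the regular expression given for the Comparimotif search.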

2.3. 3D Structure Prediction and Visualization

We predicted the 3D structure of the C-terminal region of AAV2 Rep78 using Alphafold3 [29] with 3 zinc atoms. AlphaFold3 outputs a measure of reliability of the 3D structure for each aa, pLDDT. pLDDT ≥ 0.70 corresponds to a reliable prediction and pLDDT ≥ 0.90 corresponds to a highly reliable prediction (expected to be competitive with an experimentally solved 3D structure) [30]. Structures were visualized using ChimeraX (version 1.8) [31].
We also used Alphafold3 to predict the 3D structure of a putative complex between the origin-binding domain and the C-terminal DDx3EQ peptide of AAV2 Rep78 (D615DCIFEQ621). Alphafold3 provides a measure of reliability of the interaction, ipTM. An ipTM > 0.8 indicates a reliably predicted interaction, 0.8 ≥ ipTM ≥ 0.6 corresponds to a “gray zone” in which predictions may be correct or incorrect, and ipTM < 0.6 indicates an unreliable prediction [32,33].
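For readers who wish to apply the same cut-offs programmatically, the thresholds above can be written as two small functions. This is a sketch only: the function names and the label for pLDDT values below 0.70 are our own, and the scores are assumed to be on the 0 to 1 scale used in this section (scores reported on a 0 to 100 scale would need to be divided by 100 first).

```python
def classify_plddt(plddt: float) -> str:
    """Classify a per-residue pLDDT score (0-1 scale, as used in the text)."""
    if plddt >= 0.90:
        return "highly reliable"
    if plddt >= 0.70:
        return "reliable"
    return "below reliability threshold"  # label chosen here, not in the text


def classify_iptm(iptm: float) -> str:
    """Classify an interface ipTM score using the cut-offs given in the text."""
    if iptm > 0.8:
        return "reliable"
    if iptm >= 0.6:
        return "gray zone"
    return "unreliable"


# 0.55 is the ipTM reported in the Discussion for the putative complex
for score in (0.85, 0.8, 0.6, 0.55):
    print(score, classify_iptm(score))
```

Note that the boundaries are asymmetric: an ipTM of exactly 0.8 falls in the gray zone, whereas a pLDDT of exactly 0.70 already counts as reliable.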

3. Results

3.1. The C-Terminal Region of Rep78 Contains 3 Predicted Zinc Fingers and Flexible Regions

We analyzed the Rep78 protein of AAV2, the Dependoparvovirus model species (Genbank accession number YP_680423.1, see Table 1). The C-terminal region of Rep78 starts with a linker predicted to be disordered (aa 493–521 in AAV2, see Figure 1). We modelled the 3D structure of the remaining C-terminal part (aa 522–621) using Alphafold3 [29] (the coordinates of the model are in File S1). The model contains two regions reliably predicted to adopt a fixed 3D structure (in red in Figure 2A; see also Figure 1):
(1) aa 525–573 are composed of two zinc fingers (named 1 and 2) of the CHCC type (Figure 2A, left). These zinc fingers are predicted to adopt a fixed conformation relative to each other (Figure 2B).
(2) aa 587–612 are composed of a third zinc finger, also of the CHCC type, followed by a predicted α-helix (Figure 2A, right).
All of the three zinc fingers follow the consensus sequence C-x(2)-H-x(n)-C-x(2)-C, corresponding to an unusual type of zinc finger, found, for example, in the mengovirus Leader protein and in archaeal transcription factors [34], and divergent from classical zinc fingers [35,36].
The remaining regions are not reliably modelled using Alphafold3, despite being predicted to be ordered, which indicates that they are conformationally flexible; they are visible as blue or white ribbons in Figure 2.

3.2. The C-Terminal Region of Dependoparvoviral Rep78 and Rep52 Contains a Conserved Motif, DDx3EQ, Not Similar to a Known Motif

The C-terminal region of Rep78 is highly variable in sequence across dependoparvoviruses, as shown in Figure 3 (see also File S2). However, we noticed that in almost all dependoparvoviruses, it contains a D-D-x(3)-E-Q sequence motif (in which x(3) represents a consecutive stretch of any three aa) at the C-terminus (aa 615–621 in AAV2). The motif is shown in Figure 3, right, and for simplicity, we will refer to it as DDx3EQ.
In all dependoparvoviruses, the last aa of the motif, Q, is also the last aa of Rep78. This strictly C-terminal position confers a markedly enhanced specificity to motifs [37] (see Section 4).
Only a handful of dependoparvovirus Rep78 proteins do not have the DDx3EQ motif (File S2), namely the related viruses desmodus rotundus dependoparvovirus (Dependoparvovirus chiropteran2) [38] and feline dependoparvovirus (Dependoparvovirus carnivoran1) [39], the canary dependoparvoviruses 1 and 2 [40], and five bird dependoparvoviruses [41]: isolates ltt164par2 (Genbank accession number QLF86430.1), sis142par1 (QKE54964.1), zftwig05par3 (QKN88780.1), wpk049par01 (QKE60686.1), and avian AAV isolate BR_DF12 (YP_010802670.1). The latter presents a striking case. Its rep gene contains a long (1803 nucleotides) reading frame overlapping that of Rep78, which encodes a potential protein of 243 aa ending with a C-terminal DDx3EQ motif. The sequence of that protein and its location within Rep are presented in File S3.
Finally, we found that the DDx3EQ motif is not similar to any known motif, according to both Comparimotif [23] and Tomtom [24] (see Section 2).
Figure 3. The variable C-terminal region of dependoparvovirus Rep78 contains a DDx3EQ motif. Top panel: Sequence alignment of the C-terminal region of Rep78 among representative dependoparvoviruses. Note its high variability and the conserved DDx3EQ motif at the very C-terminus. Bottom panel: sequence logo of the DDx3EQ motif, made using WebLogo [42].

3.3. The Motif Contains Three Strictly Conserved aa, Is Highly Negatively Charged, and Is Devoid of Prolines

The frequency of each aa at each position of the motif is shown in Figure 3, bottom panel. Three positions are strictly conserved (Figure 3, bottom panel): an aspartate in position 2 (D616 in AAV2), a glutamate in position 6 (E620), and a glutamine in position 7 (Q621). Position 1 almost exclusively contains an aspartate (D615), rarely a glutamate (also negatively charged) or a proline. Position 3 is enriched in hydrophobic aa; position 4 is enriched in negatively charged aa, depleted in hydrophobic aa, and contains no positively charged aa; and position 5 is enriched in polar aa, particularly in charged ones.
The motif is significantly (p < 0.005) enriched in negatively charged aa compared both to the protein database SwissProt and to the rest of Rep78 (see Section 2); its negative charge is expected to be further increased by its C-terminal carboxylate ion (COO−).
Strikingly, the motif is completely devoid of prolines, except at position 1 in anser anser dependoparvovirus (Figure 3) and a few related species, suggesting that forming an α-helix might be required for its function.
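The charge and proline content of an individual motif instance can be tallied with a few lines of code. The sketch below is a crude approximation, not the Composition Profiler analysis: it assumes integer side-chain charges at neutral pH (D and E as −1, K and R as +1, histidine treated as neutral) and adds −1 for the free C-terminal carboxylate. The function name `motif_summary` is hypothetical.

```python
NEGATIVE = set("DE")  # aspartate, glutamate
POSITIVE = set("KR")  # lysine, arginine (histidine treated as neutral here)


def motif_summary(motif: str) -> dict:
    """Count charged residues and prolines in a strictly C-terminal motif."""
    motif = motif.upper()
    negative = sum(aa in NEGATIVE for aa in motif)
    positive = sum(aa in POSITIVE for aa in motif)
    return {
        "negative_residues": negative,
        "positive_residues": positive,
        # Side-chain charges plus -1 for the free C-terminal carboxylate
        "approx_net_charge": positive - negative - 1,
        # 1-based motif positions occupied by proline
        "proline_positions": [i for i, aa in enumerate(motif, start=1) if aa == "P"],
    }


# AAV2 Rep78 motif, aa 615-621
print(motif_summary("DDCIFEQ"))
```

For the AAV2 motif, this gives three negatively charged residues, no positively charged residue, no proline, and an approximate net charge of −4 once the C-terminal carboxylate is included.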

3.4. A Conserved DDx3EQ Motif Is Found in One Protein from a Eukaryotic Virus and in One Human Protein

To obtain clues regarding the function of the DDx3EQ motif, we searched for other proteins from either eukaryotic viruses or humans that would have the motif conserved in at least another species (see Section 2).
In eukaryotic viruses other than dependoparvoviruses, we could only identify one protein with a conserved C-terminal DDx3EQ motif, the protease 2A from the genus Enterovirus. As an example, the C-terminus of the protease 2A of enterovirus D (NP_740416.1) is EDdamEQ, i.e., with an E in position 1 instead of the D most commonly found in dependoparvovirus Rep78 (see Figure 3, bottom panel).
The motif is found in the species Enterovirus A–B, D, F, and H–J, but not in Enterovirus E, G, and K, nor in the three species Rhinovirus A, B, and C. In Enterovirus C, the motif is degenerate, i.e., there is an E in position 2 instead of the strictly conserved D. File S4 presents an alignment of enterovirus proteases 2A that have the DDx3EQ motif.
The enteroviral protease 2A is cleaved immediately after the conserved Q of the motif by another enteroviral protease, 3C [43]. Apart from this Q, no position of the DDx3EQ motif corresponds to the cleavage specificity of 3C, whose main specificity determinant is an A three aa upstream of the Q at which the cleavage occurs (i.e., in position 4 of the motif) [44]. Therefore, the presence of the motif in the 2A protease does not stem from a requirement for cleavage by the 3C protease. Interestingly, removing the 5 aa immediately upstream of the C-terminal Q (i.e., most of the motif) from the 2A protease of poliovirus (Enterovirus C) is lethal without affecting its protease function [45].
The DDx3EQ motif forms a coil with no regular secondary structure in the 2A protease of coxsackievirus B4 (Enterovirus B) [46], similar to the Alphafold3 prediction for the AAV2 Rep78 motif (Figure 2A). The motif is not visible in the structure of the related coxsackievirus B3 2A protease, suggesting that it is flexible [47].
Finally, we could only identify a single human protein with a conserved C-terminal DDx3EQ motif: Cep57L1 (Centrosomal protein 57 kDa-like protein 1, Uniprot accession number Q8IYX8). Its last 7 C-terminal aa are DDimwEQ. The motif is conserved across amniotes (clade Amniota). File S5 presents an alignment of Cep57L1 orthologs that have the DDx3EQ motif. Cep57L1 contributes to maintaining centriole engagement during interphase [48]. No functional data are available regarding the role of its C-terminus, to our knowledge, and in a recent study, Cep57L1 was not part of the proteins identified as having mutations in their C-terminus that cause disease in humans [49].

4. Discussion

4.1. The DDx3EQ Motif Should Have a High Binding Specificity Despite Its Short Length, and Is Probably Essential for AAV Replication

Only a handful of strictly C-terminal sequence motifs have been described in eukaryotic viruses [37,50]. The C-terminal position confers a high binding specificity to these motifs, even when relatively short, because only one free carboxy group is found in each protein, at the C-terminus, where it can be recognized by specialized enzymes. For example, with the average length of a human protein being ~600 aa, a motif containing a glutamine with a free C-terminal carboxy group is found 600 times less frequently than a glutamine within a non-C-terminal motif [37].
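The factor of ~600 can be recovered with a back-of-the-envelope estimate. We assume, as a simplification introduced here for illustration, that glutamine occurs with the same frequency $f_Q$ at every position of a protein of length $L$, so that exactly one of the $L$ positions carries the free carboxy group:

```latex
\[
\frac{\text{expected C-terminal Q per protein}}
     {\text{expected non-C-terminal Q per protein}}
  = \frac{f_Q \cdot 1}{f_Q \cdot (L - 1)}
  = \frac{1}{L - 1}
  \approx \frac{1}{600}
  \quad \text{for } L \approx 600 .
\]
```

The frequency $f_Q$ cancels, so under this assumption the enrichment in specificity depends only on protein length, not on the identity of the C-terminal residue.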
Given the high rate of evolution of viral proteins, the DDx3EQ motif is most probably essential for dependoparvoviruses, since it is conserved in almost the whole genus. We cannot infer its function from published experimental studies, since, to our knowledge, no study tested the effect of substitutions or deletions of the very C-terminus of Rep78 (aa 608–621, beyond zinc finger 3) on the replication of wild-type AAVs. The furthest downstream substitution we are aware of is in aa 607 in zinc finger 3 [15], and the second most downstream substitution we are aware of concerns aa 540 [51].
Interestingly, we could only identify a single viral protein (the enterovirus protease 2A) and a single human protein (Cep57L1) with a conserved DDx3EQ motif. We note that the presence of the motif in these proteins may be coincidental and does not imply functional similarity to Rep78.

4.2. Hypothesis: The DDx3EQ Motif May Bind the DNA-Binding Interface of the Origin-Binding Domain of Rep78

“No one believes a hypothesis except its originator, but everyone believes an experiment except the experimenter” (William Ian Beardmore Beveridge). A study on human C-terminal motifs found that they typically have one of three functions, in decreasing order of frequency [52]: (1) directing post-translational modification [53]; (2) binding another protein(s); (3) directing trafficking through the cell [37]. We can only provide a meaningful hypothesis regarding the second function, i.e., binding a protein.
Given the considerable rate of sequence evolution of viral proteins, the fact that such a short motif contains three strictly conserved aa suggests that it binds either a cellular protein or a highly conserved region of a viral protein. A prime candidate would be the DNA-binding interface of the origin-binding domain of Rep78, which is well conserved in sequence and positively charged, while the DDx3EQ motif is highly negatively charged. This hypothesis is biologically meaningful since (a) the DDx3EQ motif and the origin-binding domain are always in close proximity, being part of the same protein, and (b) binding the motif would provide a mechanism for regulating the interaction of this domain with the inverted terminal repeats during the replication cycle [54]. In this scenario, the DDx3EQ motif of Rep52, the shorter isoform of Rep78, could not interact in cis with the origin-binding domain, since Rep52 is devoid of this domain (Figure 1). We emphasize that this scenario is merely proposed as a biological hypothesis meant to guide experiments.
Attempts to confirm or refute our hypothesis using Alphafold3 were unsuccessful; Alphafold3 output a reliability estimator (ipTM) of only 0.55 for the interaction between the origin-binding domain and the DDx3EQ peptide of AAV2 Rep78. This value indicates an unreliable prediction (see Section 2), which cannot determine whether the complex exists or not.

4.3. The DDx3EQ Motif Might Not Be Necessary for Recombinant AAV Production, but Substitutions Within It Improve Production

The DDx3EQ motif is found in most taxa of AAVs relevant for gene therapy [55], i.e., AAV 1–4 and 6–13 (Dependoparvovirus primate1), AAV5 (Dependoparvovirus mammalian1), and porcine AAV1 (unclassified) [56]. A recent study found that in the absence of Rep78, Rep68 was not sufficient for efficient recombinant AAV production [8], indicating that the C-terminal region of Rep78 is also required. It would be interesting to determine whether the DDx3EQ motif in particular contributes to this requirement.
In that regard, a recent study systematically tested the effect of all single aa substitutions in Rep78 and Rep68 on the production of recombinant AAVs [9]. Substitutions of conserved positions of the motif or introduction of prolines (normally absent from the motif) did not result in significantly lower production, indicating that the DDx3EQ motif is not necessary for production of recombinant AAVs, at least in the conditions tested. Intriguingly, in that study, not only were some substitutions neutral, but most even had a mildly beneficial effect on recombinant AAV production (i.e., in Figure 2 of [9], the last 7 aa of Rep78 form a red “patch”). We will detail these briefly below. This study tested two production platforms. In the first one, pCMV-Rep78/68, Rep68 and Rep78 were produced and mutated from one plasmid, and the other AAV proteins (Rep40, Rep52, and the capsid proteins) were produced from other plasmids. In the second platform, wtAAV2, all of the AAV proteins were produced from a single plasmid, and all four Rep proteins were thus mutated simultaneously.
Although numerous substitutions were mildly beneficial, only a few were significantly beneficial. In the first platform, no substitution had a significant fitness effect. In the wtAAV2 platform, three substitutions significantly improved production: I618T, affecting position 4 (T being observed at this position in some dependoparvoviruses, see Figure 3, bottom panel); F619N, affecting position 5 (N being seen at this position in some dependoparvoviruses); and E620S, which affects the strictly conserved E in position 6.
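The correspondence between AAV2 Rep78 residue numbers and motif positions (position 1 being D615) can be checked mechanically. The following sketch parses each substitution, verifies that its wild-type residue matches the AAV2 motif D615DCIFEQ621, and returns the motif position. The function name `motif_position` is hypothetical and serves only as an illustration.

```python
import re

MOTIF_START = 615           # residue number of motif position 1 in AAV2 Rep78
MOTIF_SEQUENCE = "DDCIFEQ"  # D615 ... Q621, as given in Section 2.3


def motif_position(substitution: str) -> int:
    """Map a substitution such as 'I618T' to its position (1-7) in the motif."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"Unrecognized substitution: {substitution!r}")
    wild_type, residue_number = match.group(1), int(match.group(2))
    position = residue_number - MOTIF_START + 1
    if not 1 <= position <= len(MOTIF_SEQUENCE):
        raise ValueError(f"{substitution} lies outside the motif")
    if MOTIF_SEQUENCE[position - 1] != wild_type:
        raise ValueError(f"{substitution}: wild-type residue does not match the motif")
    return position


for sub in ("I618T", "F619N", "E620S"):
    print(sub, "-> motif position", motif_position(sub))
```

The three substitutions map to positions 4, 5, and 6, respectively, in agreement with the numbering used in the text.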
In summary, the DDx3EQ motif might not be necessary for efficient production of recombinant AAVs, but substitutions within it have the potential to improve production. Note that recombinant AAV production, as measured in [9], and wild-type AAV replication are not identical processes. As such, it is possible that the DDx3EQ motif may be essential for wild-type AAV replication, but dispensable for recombinant AAV production.

4.4. Sequence Motifs Can Be Identified Even Within Highly Variable Protein Regions by Examining Alignment of Orthologs

As Figure 3 shows, the DDx3EQ motif is clearly visible in the alignment of Dependoparvovirus Rep78, even to a non-expert. Many such motifs can be identified in viral proteins using simple visual examination (e.g., soyuz1 and soyuz2 in Paramyxovirinae [57]).
Conversely, these motifs are not detectable even with advanced homology detection software commonly used to ascribe functions to viral proteins [58] (such as PSI-BLAST [59] or HHpred [60]), because they are too short (7–20 aa) and “hidden” within a highly variable region. Therefore, we recommend systematically aligning variable regions of orthologous proteins across suitable evolutionary distances (i.e., genus or subfamily) and examining them for conserved sequence motifs.
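Visual examination can be complemented by a simple column-wise scan of the alignment. The sketch below, which is not the procedure used in this study, reports the columns that are identical in all aligned sequences. The function name is hypothetical, and the toy input consists of the three C-terminal heptapeptides quoted in this article (AAV2 Rep78, enterovirus D protease 2A, and human Cep57L1), which come from unrelated proteins and merely stand in for a real alignment of orthologs.

```python
from collections import Counter


def strictly_conserved_columns(aligned: list[str]) -> dict[int, str]:
    """Return {1-based column: residue} for columns identical in all sequences.

    Columns containing a gap ('-') are never reported as conserved.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("All aligned sequences must have the same length")
    conserved = {}
    for index, column in enumerate(zip(*aligned), start=1):
        counts = Counter(column)
        if len(counts) == 1 and "-" not in counts:
            conserved[index] = column[0]
    return conserved


# Toy input: three heptapeptides quoted in the article, not a real ortholog alignment
toy_alignment = ["DDCIFEQ", "EDDAMEQ", "DDIMWEQ"]
print(strictly_conserved_columns(toy_alignment))
```

On this toy input, the scan reports D at position 2, E at position 6, and Q at position 7. A real analysis would take the full alignment (e.g., File S2) as input and would usually relax strict identity to a frequency threshold.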

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v16111760/s1, File S1: Alphafold3 model of AAV2 Rep78 C-terminal region aa 522–621; File S2: Multiple sequence alignment of Dependoparvovirus Rep78 proteins; File S3: putative ‘X protein’ from Avian AAV isolate BR_DF12 encoded by a reading frame overlapping that of Rep78 and ending with a DDx3EQ motif; File S4: Multiple sequence alignment of enterovirus 2A proteases that have a C-terminal DDx3EQ motif; File S5: Multiple sequence alignment of CEP57L1 orthologs that have a C-terminal DDx3EQ motif.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article and Supplementary Materials.

Acknowledgments

I thank N. Davey for his insights regarding the motif analysis and N. Jain, L. Galibert, and J. Qiu for their feedback on this manuscript.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Pupo, A.; Fernández, A.; Low, S.H.; François, A.; Suárez-Amarán, L.; Samulski, R.J. AAV vectors: The Rubik’s cube of human gene therapy. Mol. Ther. 2022, 30, 3515–3541.
  2. Wang, D.; Tai, P.W.L.; Gao, G. Adeno-associated virus vector as a platform for gene therapy delivery. Nat. Rev. Drug Discov. 2019, 18, 358–378.
  3. Weinmann, J.; Grimm, D. Next-generation AAV vectors for clinical use: An ever-accelerating race. Virus Genes 2017, 53, 707–713.
  4. Li, C.; Samulski, R.J. Engineering adeno-associated virus vectors for gene therapy. Nat. Rev. Genet. 2020, 21, 255–272.
  5. Aponte-Ubillus, J.J.; Barajas, D.; Peltier, J.; Bardliving, C.; Shamlou, P.; Gold, D. Molecular design for recombinant adeno-associated virus (rAAV) vector production. Appl. Microbiol. Biotechnol. 2018, 102, 1045–1054.
  6. Catalán-Tatjer, D.; Tzimou, K.; Nielsen, L.K.; Lavado-García, J. Unravelling the essential elements for recombinant adeno-associated virus (rAAV) production in animal cell-based platforms. Biotechnol. Adv. 2024, 73, 108370.
  7. Mietzsch, M.; Eddington, C.; Jose, A.; Hsi, J.; Chipman, P.; Henley, T.; Choudhry, M.; McKenna, R.; Agbandje-McKenna, M. Improved Genome Packaging Efficiency of Adeno-associated Virus Vectors Using Rep Hybrids. J. Virol. 2021, 95, e0077321.
  8. Johari, Y.B.; Pohle, T.H.; Whitehead, J.; Scarrott, J.M.; Liu, P.; Mayer, A.; James, D.C. Molecular design of controllable recombinant adeno-associated virus (AAV) expression systems for enhanced vector production. Biotechnol. J. 2024, 19, 2300685.
  9. Jain, N.K.; Ogden, P.J.; Church, G.M. Comprehensive mutagenesis maps the effect of all single-codon mutations in the AAV2 rep gene on AAV production. eLife 2024, 12, RP87730.
  10. Qiu, J.; Pintel, D. Processing of adeno-associated virus RNA. Front. Biosci. 2008, 13, 3101–3115.
  11. King, J.A. DNA helicase-mediated packaging of adeno-associated virus type 2 genomes into preformed capsids. EMBO J. 2001, 20, 3282–3291.
  12. Hölscher, C.; Kleinschmidt, J.A.; Bürkle, A. High-level expression of adeno-associated virus (AAV) Rep78 or Rep68 protein is sufficient for infectious-particle formation by a rep-negative AAV mutant. J. Virol. 1995, 69, 6880–6885.
  13. Im, D.S.; Muzyczka, N. The AAV origin binding protein Rep68 is an ATP-dependent site-specific endonuclease with DNA helicase activity. Cell 1990, 61, 447–457.
  14. Smith, R.H.; Kotin, R.M. The Rep52 gene product of adeno-associated virus is a DNA helicase with 3′-to-5′ polarity. J. Virol. 1998, 72, 4874–4881.
  15. Saudan, P. Inhibition of S-phase progression by adeno-associated virus Rep78 protein is mediated by hypophosphorylated pRb. EMBO J. 2000, 19, 4351–4361.
  16. Sayers, E.W.; Bolton, E.E.; Brister, J.R.; Canese, K.; Chan, J.; Comeau, D.C.; Farrell, C.M.; Feldgarden, M.; Fine, A.M.; Funk, K.; et al. Database Resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res. 2023, 51, D29–D38.
  17. Floden, E.W.; Tommaso, P.D.; Chatzou, M.; Magis, C.; Notredame, C.; Chang, J.-M. PSI/TM-Coffee: A web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases. Nucleic Acids Res. 2016, 44, W339–W343.
  18. Waterhouse, A.M.; Procter, J.B.; Martin, D.M.A.; Clamp, M.; Barton, G.J. Jalview Version 2—A multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25, 1189–1191.
  19. Procter, J.B.; Thompson, J.; Letunic, I.; Creevey, C.; Jossinet, F.; Barton, G.J. Visualization of multiple alignments, phylogenies and gene family evolution. Nat. Methods 2010, 7, S16–S25.
  20. Hu, G.; Katuwawala, A.; Wang, K.; Wu, Z.; Ghadermarzi, S.; Gao, J.; Kurgan, L. flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat. Commun. 2021, 12, 4438.
  21. Vacic, V.; Uversky, V.N.; Dunker, A.K.; Lonardi, S. Composition Profiler: A tool for discovery and visualization of amino acid composition differences. BMC Bioinform. 2007, 8, 211.
  22. Boutet, E.; Lieberherr, D.; Tognolli, M.; Schneider, M.; Bairoch, A. UniProtKB/Swiss-Prot. Methods Mol. Biol. 2007, 406, 89–112.
  23. Edwards, R.J.; Davey, N.E.; Shields, D.C. CompariMotif: Quick and easy comparisons of sequence motifs. Bioinformatics 2008, 24, 1307–1309.
  24. Gupta, S.; Stamatoyannopoulos, J.A.; Bailey, T.L.; Noble, W. Quantifying similarity between motifs. Genome Biol. 2007, 8, R24.
  25. Kumar, M.; Michael, S.; Alvarado-Valverde, J.; Mészáros, B.; Sámano-Sánchez, H.; Zeke, A.; Dobson, L.; Lazar, T.; Örd, M.; Nagpal, A.; et al. The Eukaryotic Linear Motif resource: 2022 release. Nucleic Acids Res. 2022, 50, D497–D508.
  26. Sigrist, C.J.A.; de Castro, E.; Cerutti, L.; Cuche, B.A.; Hulo, N.; Bridge, A.; Bougueleret, L.; Xenarios, I. New and continuing developments at PROSITE. Nucleic Acids Res. 2013, 41, D344–D347.
  27. Zimmermann, L.; Stephens, A.; Nam, S.-Z.; Rau, D.; Kübler, J.; Lozajic, M.; Gabler, F.; Söding, J.; Lupas, A.N.; Alva, V. A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. J. Mol. Biol. 2018, 430, 2237–2243.
  28. De Castro, E.; Sigrist, C.J.A.; Gattiker, A.; Bulliard, V.; Langendijk-Genevaux, P.S.; Gasteiger, E.; Bairoch, A.; Hulo, N. ScanProsite: Detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006, 34, W362–W365.
  29. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
  30. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
  31. Goddard, T.D.; Huang, C.C.; Meng, E.C.; Pettersen, E.F.; Couch, G.S.; Morris, J.H.; Ferrin, T.E. UCSF ChimeraX: Meeting modern challenges in visualization and analysis: UCSF ChimeraX Visualization System. Protein Sci. 2018, 27, 14–25.
  32. Evans, R.; O’Neill, M.; Pritzel, A.; Antropova, N.; Senior, A.; Green, T.; Žídek, A.; Bates, R.; Blackwell, S.; Yim, J.; et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021.
  33. Wee, J.; Wei, G.-W. Evaluation of AlphaFold 3’s Protein–Protein Complexes for Predicting Binding Free Energy Changes upon Mutation. J. Chem. Inf. Model. 2024, 64, 6676–6683.
  34. Cornilescu, C.C.; Porter, F.W.; Zhao, K.Q.; Palmenberg, A.C.; Markley, J.L. NMR structure of the mengovirus Leader protein zinc-finger domain. FEBS Lett. 2008, 582, 896–900.
  35. Krishna, S.S. Structural classification of zinc fingers: SURVEY AND SUMMARY. Nucleic Acids Res. 2003, 31, 532–550.
  36. Matthews, J.M.; Sunde, M. Zinc fingers—Folds for many occasions. IUBMB Life 2002, 54, 351–355.
  37. Sharma, S.; Schiller, M.R. The carboxy-terminus, a key regulator of protein function. Crit. Rev. Biochem. Mol. Biol. 2019, 54, 85–102.
  38. de Souza, W.M.; Dennis, T.; Fumagalli, M.J.; Araujo, J.; Sabino-Santos, G., Jr.; Maia, F.G.M.; Acrani, G.O.; De Oliveira Torres Carrasco, A.; Romeiro, M.F.; Modha, S. Novel Parvoviruses from Wild and Domestic Animals in Brazil Provide New Insights into Parvovirus Distribution and Diversity. Viruses 2018, 10, 143.
  39. Li, Y.; Gordon, E.; Idle, A.; Altan, E.; Seguin, M.A.; Estrada, M.; Deng, X.; Delwart, E. Virome of a Feline Outbreak of Diarrhea and Vomiting Includes Bocaviruses and a Novel Chapparvovirus. Viruses 2020, 12, 506.
  40. Zhang, Y.; Talukder, S.; Bhuiyan, M.S.A.; He, L.; Sarker, S. Opportunistic sampling of yellow canary (Crithagra flaviventris) has revealed a high genetic diversity of detected parvoviral sequences. Virology 2024, 595, 110081.
  41. Dai, Z.; Wang, H.; Wu, H.; Zhang, Q.; Ji, L.; Wang, X.; Shen, Q.; Yang, S.; Ma, X.; Shan, T. Parvovirus dark matter in the cloaca of wild birds. GigaScience 2022, 12, giad001.
  42. Crooks, G.E.; Hon, G.; Chandonia, J.-M.; Brenner, S.E. WebLogo: A sequence logo generator. Genome Res. 2004, 14, 1188–1190.
  43. Laitinen, O.H.; Svedin, E.; Kapell, S.; Nurminen, A.; Hytönen, V.P.; Flodström-Tullberg, M. Enteroviral proteases: Structure, host interactions and pathogenicity: Pathogenicity of enteroviral proteases. Rev. Med. Virol. 2016, 26, 251–267.
  44. Blom, N.; Hansen, J.; Brunak, S.; Blaas, D. Cleavage site analysis in picornaviral polyproteins: Discovering cellular targets by neural networks. Protein Sci. 1996, 5, 2203–2216.
  45. Li, X.; Lu, H.-H.; Mueller, S.; Wimmer, E. The C-terminal residues of poliovirus proteinase 2Apro are critical for viral RNA replication but not for cis- or trans-proteolytic cleavage. J. Gen. Virol. 2001, 82, 397–408.
  46. Baxter, N.J.; Roetzer, A.; Liebig, H.-D.; Sedelnikova, S.E.; Hounslow, A.M.; Skern, T.; Waltho, J.P. Structure and Dynamics of Coxsackievirus B4 2A Proteinase, an Enyzme Involved in the Etiology of Heart Disease. J. Virol. 2006, 80, 1451–1462.
  47. Peters, C.E.; Schulze-Gahmen, U.; Eckhardt, M.; Jang, G.M.; Xu, J.; Pulido, E.H.; Bardine, C.; Craik, C.S.; Ott, M.; Gozani, O.; et al. Structure-function analysis of enterovirus protease 2A in complex with its essential host factor SETD3. Nat. Commun. 2022, 13, 5282.
  48. Ito, K.K.; Watanabe, K.; Ishida, H.; Matsuhashi, K.; Chinen, T.; Hata, S.; Kitagawa, D. Cep57 and Cep57L1 maintain centriole engagement in interphase to ensure centriole duplication cycle. J. Cell Biol. 2021, 220, e202005153.
  49. FitzHugh, Z.T.; Schiller, M.R. Systematic Assessment of Protein C-Termini Mutated in Human Disorders. Biomolecules 2023, 13, 355.
  50. Sobhy, H. A Review of Functional Motifs Utilized by Viruses. Proteomes 2016, 4, 3.
  51. Di Pasquale, G.; Chiorini, J.A. PKA/PrKX activity is a modulator of AAV/adenovirus interaction. EMBO J. 2003, 22, 1716–1724. [Google Scholar] [CrossRef] [PubMed]
  52. Sharma, S.; Toledo, O.; Hedden, M.; Lyon, K.F.; Brooks, S.B.; David, R.P.; Limtong, J.; Newsome, J.M.; Novakovic, N.; Rajasekaran, S.; et al. The Functional Human C-Terminome. PLoS ONE 2016, 11, e0152731. [Google Scholar] [CrossRef] [PubMed]
  53. Chen, L.; Kashina, A. Post-translational Modifications of the Protein Termini. Front. Cell Dev. Biol. 2021, 9, 719590. [Google Scholar] [CrossRef]
  54. Hickman, A.B.; Ronning, D.R.; Perez, Z.N.; Kotin, R.M.; Dyda, F. The Nuclease Domain of Adeno-Associated Virus Rep Coordinates Replication Initiation Using Two Distinct DNA Recognition Interfaces. Mol. Cell 2004, 13, 403–414. [Google Scholar] [CrossRef] [PubMed]
  55. Issa, S.S.; Shaimardanova, A.A.; Solovyeva, V.V.; Rizvanov, A.A. Various AAV Serotypes and Their Applications in Gene Therapy: An Overview. Cells 2023, 12, 785. [Google Scholar] [CrossRef] [PubMed]
  56. Puppo, A.; Bello, A.; Manfredi, A.; Cesi, G.; Marrocco, E.; Della Corte, M.; Rossi, S.; Giunti, M.; Bacci, M.L.; Simonelli, F.; et al. Recombinant Vectors Based on Porcine Adeno-Associated Viral Serotypes Transduce the Murine and Pig Retina. PLoS ONE 2013, 8, e59025. [Google Scholar] [CrossRef]
  57. Karlin, D.; Belshaw, R. Detecting Remote Sequence Homology in Disordered Proteins: Discovery of Conserved Motifs in the N-Termini of Mononegavirales phosphoproteins. PLoS ONE 2012, 7, e31719. [Google Scholar] [CrossRef]
  58. Kuchibhatla, D.B.; Sherman, W.A.; Chung, B.Y.W.; Cook, S.; Schneider, G.; Eisenhaber, B.; Karlin, D.G. Powerful sequence similarity search methods and in-depth manual analyses can identify remote homologs in many apparently “orphan” viral proteins. J. Virol. 2014, 88, 10–20. [Google Scholar] [CrossRef]
  59. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. [Google Scholar] [CrossRef]
  60. Hildebrand, A.; Remmert, M.; Biegert, A.; Söding, J. Fast and accurate automatic structure prediction with HHpred: Structure Prediction with HHpred. Proteins 2009, 77, 128–132. [Google Scholar] [CrossRef]
Figure 1. Domain organization of the four proteins produced from the AAV2 rep gene.
Figure 2. Predicted 3D structure of the C-terminal region of AAV2 Rep78 (aa 522–621). (A) Structure predicted using AlphaFold 3. Zinc ions are shown as spheres. Regions for which AlphaFold 3 did not predict a fixed structure are represented as blue or white ribbons. This is the case, in particular, for the C-terminal DDx3EQ motif (see text), whose aa are shown for illustration only, since their positions are not reliably predicted. (B) Predicted Alignment Error (PAE). Green rectangles mark the regions of Rep78 in which all aa are predicted to have a fixed conformation with respect to each other.
Table 1. Rep78 proteins presented in Figure 2.
Common Name | Species or Taxon | GenBank Accession Number
AAV2 | Dependoparvovirus primate1 | YP_680423.1
AAV3 | Dependoparvovirus primate1 | NP_043940
AAV5 | Dependoparvovirus mammalian1 | YP_068408.1
AAV12 | Dependoparvovirus primate1 | DQ813647
AAV (isolate Croatia cul1_12) | Unclassified | QHY93489
AAV (isolate MHH-05-2015) | Unclassified | YP_009552823.1
AAV—Po1 [porcine AAV1] | Unclassified | ACN42943.1
Anser anser dependoparvovirus | Unclassified | QTE04020.1
Avian AAV (strain DA-1) | Dependoparvovirus avian1 | YP_077182.1
Bat AAV (strain YNM) | Dependoparvovirus chiropteran1 | YP_003858571.1
Bearded dragon parvovirus | Dependoparvovirus squamate2 | YP_009154712.1
California sea lion AAV1 | Dependoparvovirus pinniped1 | YP_009507366.1
Canine parvovirus (isolate ParvoviridaeDogfe340C1) | Unclassified (1) | WDW25820.1
Dependoparvovirus (isolate cfw059par1) | Unclassified | QKN88755.1
Marsupial AAV1 | Unclassified | AZP54391.1
Muscovy duck parvovirus | Dependoparvovirus anseriform1 | YP_068410.1
Parvoviridae (isolate swa134par3) | Unclassified | QKE54950.1
Psittacidae dependoparvovirus | Unclassified | QTE03943.1
Rhinolophus pusillus AAV (isolate BtAAV-CXC1) | Unclassified | QDX47269.1
Rhinolophus pusillus AAV1 (isolate Rp-BtAAV1_34C_MJ_YN_2012) | Unclassified | ATV81500.1
Serpentine AAV2 | Unclassified | ACJ66590.1
Snake parvovirus 1 | Dependoparvovirus squamate1 | YP_068093.1
Tadarida brasiliensis associated dependoparvovirus | Unclassified | UJO02142.1
AAV: Adeno-associated virus. (1) Erroneously classified as Protoparvovirus carnivoran1 in GenBank.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.