Article

Complete Genome Sequence of the Model Halovirus PhiH1 (ΦH1)

1 Computational Biology Group, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
2 Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Parkville, VIC 3052, Australia
3 Department of Biology; Microbiology and Archaea, TU Darmstadt, Schnittspahnstrasse 10, 64287 Darmstadt, Germany
4 Department of Microbiology, Immunobiology and Genetics, MFPL Laboratories, University of Vienna, Dr. Bohr-Gasse 9, Vienna 1030, Austria
* Author to whom correspondence should be addressed.
Genes 2018, 9(10), 493; https://doi.org/10.3390/genes9100493
Submission received: 11 September 2018 / Revised: 5 October 2018 / Accepted: 8 October 2018 / Published: 12 October 2018
(This article belongs to the Special Issue Genetics of Halophilic Microorganisms)

Abstract

The halophilic myohalovirus Halobacterium virus phiH (ΦH) was first described in 1982 and was isolated from a spontaneously lysed culture of Halobacterium salinarum strain R1. Until 1994, it was used extensively as a model to study the molecular genetics of haloarchaea, but only parts of the viral genome were sequenced during this period. Using Sanger sequencing combined with high-coverage Illumina sequencing, the full genome sequence of the major variant (phiH1) of this halovirus has been determined. The dsDNA genome is 58,072 bp in length and carries 97 protein-coding genes. We have integrated this information with the previously described transcription mapping data. PhiH could be classified into Myoviridae Type1, Cluster 4 based on capsid assembly and structural proteins (VIRFAM). The closest relative was Natrialba virus phiCh1 (φCh1), which shared 63% nucleotide identity and displayed a high level of gene synteny. This close relationship was supported by phylogenetic tree reconstructions. The complete sequence of this historically important virus will allow its inclusion in studies of comparative genomics and virus diversity.

1. Introduction

The temperate myovirus Halobacterium virus phiH (ΦH) infects the extremely halophilic archaeon Halobacterium salinarum strain R1 (DSM 671) and was isolated after the spontaneous lysis of a culture of its host [1]. Purified virions require 3.5 M NaCl for stability and have an isometric head of 64 nm diameter and a long, contractile tail (170 × 18 nm) with short tail fibres [1,2]. Virus preparations contain 3 major and 10 minor proteins [3]. The virus genome is linear dsDNA with a G+C content of 64%, contains a pac site, is about 3% terminally redundant and partially circularly permuted, and is estimated to be 59 kb in length [4,5]. In the provirus state, the genome is extrachromosomal, covalently closed and circular, and 57 kb in length [4]. While the virus has always been classified within the Myoviridae, the genus name has changed over the years from phiH-like viruses to Phihlikevirus, and most recently to Myohalovirus [6,7]. The species name itself has changed from Halobacterium phage phiH to Halobacterium virus phiH [6,7], but for convenience we will refer to it from here onwards simply as phiH, and to the analysed variant as phiH1 or halovirus phiH1.
The original lysate of phiH was found to consist of a mixture of several distinct variants that appeared to have arisen from the activity of insertion sequences. The predominant variant, phiH1, was plaque-purified, a restriction map was determined [5], and this variant was used for further study [3]. PhiH1 became a key model in the study of gene expression and regulation in haloarchaea and was instrumental in the development of genetic tools and methods in these extremophiles. Examples include the polyethylene glycol (PEG)-mediated transfection method [8], the pUBP1 cloning/expression vector [9], the identification of archaeal promoters, the mapping of transcription start and stop sites [10] and the analysis of gene regulation via repression [11,12]. The presence and function of antisense RNA in haloarchaea were first described in this virus [13]. An 11 kb invertible segment of the virus genome, called the L-region, was found to be flanked on one or both sides by the insertion sequence ISH1.8, and could also circularize to form a 12 kb plasmid (including one copy of ISH1.8), with subsequent loss of the remaining phage DNA [14]. A strain carrying this plasmid was immune to infection [14].
Unfortunately, work on phiH stopped in 1994 [15,16] but a related virus, Natrialba virus phiCh1 (φCh1), was described a few years later [17] and continues to be studied in the Witte laboratory [18,19]. PhiCh1 infects a haloalkaliphilic archaeon, Natrialba magadii, and the genomes of both host and virus are fully sequenced [18,20]. The provirus state of phiCh1 corresponds to plasmid pNMAG03 carried by Nab. magadii. A full comparison between phiCh1 and phiH1 was prevented as only parts of the phiH genome were ever determined. This deficit also prevented the inclusion of phiH in broad-ranging studies of virus diversity, taxonomy, and evolution. The aim of this study was to complete the phiH1 genome sequence and provide a thorough annotation. This will not only provide a better understanding of the results from previous studies on this virus but also allow complete genomic comparisons with a wealth of other datasets, including other sequenced viruses, haloarchaeal proviruses, metaviromic/metagenomic and environmental RNA sequences.

2. Materials and Methods

2.1. Virus DNA and Sequencing Methods

Purified phiH1 DNA [1] was originally provided to F. Pfeifer by Hans-Peter Klenk while both were working in the department of W. Zillig [21]. The DNA was stored frozen at −80 °C until use. Sequencing was performed in two stages. For the first stage, all available sequences of phiH1 were downloaded from National Center for Biotechnology Information (NCBI) [22] and imported into the Phred–Phrap–Consed package [23]. Overlapping sequences were assembled and primers designed to gather additional sequences using Sanger technology. This consisted either of primer-walking directly on virus DNA, or on polymerase chain reaction (PCR) amplimers, or PCR-sequencing across gaps. The resulting sequence reads were progressively assembled into contigs, base calls inspected manually and corrected where needed, and new primers designed for further rounds of sequencing until all gaps were closed. Except for overlaps, this approach left most of the previously published sequences unchecked.
In the second stage, short-read Illumina HiSeq sequencing of phiH1 DNA was performed (Max-Planck Genome Centre, Cologne, Germany). This returned 243 Mb of high-quality sequence data (coverage = 4200-fold). De novo assembly did not produce a single contig, owing to the short read lengths and the presence of repeat sequences within the viral genome, but reads could be confidently mapped to the genome sequence obtained in the first stage (Map to Reference option; Geneious mapper method) in order to improve the sequence reliability.
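The quoted coverage can be checked with simple arithmetic. The sketch below is only an illustration, not part of the original analysis; it assumes that all 243 Mb of sequence derive from the virus genome and uses the final genome length of 58,072 bp reported in Section 3.1. The function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the sequencing depth quoted in the text.
# Assumption: every sequenced base maps to the phiH1 genome.
GENOME_LENGTH_BP = 58_072      # circular phiH1 genome length (from the text)
TOTAL_SEQUENCED_BP = 243e6     # 243 Mb of Illumina data (from the text)


def mean_coverage(total_bases: float, genome_length: int) -> float:
    """Return the average number of times each genome position is sequenced."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_bases / genome_length


coverage = mean_coverage(TOTAL_SEQUENCED_BP, GENOME_LENGTH_BP)
print(f"mean coverage is about {coverage:.0f}-fold")
```

Under these assumptions the result is roughly 4200-fold, in line with the value given above.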

2.2. CRISPR Spacer Searches

The crass v0.3.12 software [24] was used to extract CRISPR spacer sequences from genomic/metagenomic data available at the NCBI SRA database (accessed 27 July 2018) [25], as described previously [26]. The data included all available genomes of members of the class Halobacteria, and metagenomes of hypersaline environments. CRISPR direct repeats (DR) identified by crass were used to search the CRISPRfinder database (accessed 25 July 2018) [27] for haloarchaea with matching or closely matching DRs.

2.3. Bioinformatic Methods

Gene annotation used a combination of gene prediction with GeneMarkS-2 [28] and manual refinement using database searches (BLASTp/BLASTn; nr databases) at the NCBI webserver [29]. Repeats were identified by BLASTn, dot-plot comparison in Yass [30], and with tools within the Geneious software suite [31]. Circos plots were generated with the circoletto webserver [32]. Plots are coloured by the ‘score/max’ ratio of tBLASTx bitscores (real score/maximal score). Colours are: blue ≤ 0.25, green ≤ 0.50, orange ≤ 0.75, red > 0.75. Sequence mapping, alignments, editing and phylogenetic tree reconstructions were performed with Geneious software version 10.2 [31]. For phylogenetic tree reconstructions, protein sequences were first aligned using CLUSTALW, and trees were inferred using the Neighbor-Joining algorithm (within Geneious). Consensus trees were determined after 100 bootstrap repetitions. Protein structural modelling used the I-Tasser webserver [33]. The pac site was identified with the program PhageTerm [34] as implemented on the CPT Phage Galaxy [35]. The VIRFAM webserver [36], which uses proteins of the phage head-neck-tail module to cluster phages into related groups, was used to classify phiH1.
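The colour scheme used for the Circos plots amounts to a simple binning of the score/max ratio. The following sketch illustrates those thresholds only; it is not code from circoletto, and the function name is hypothetical.

```python
def score_ratio_colour(score: float, max_score: float) -> str:
    """Map a tBLASTx bitscore to the colour bin described in the text.

    The ratio score / max_score is binned as:
    blue <= 0.25, green <= 0.50, orange <= 0.75, red > 0.75.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    ratio = score / max_score
    if ratio <= 0.25:
        return "blue"
    if ratio <= 0.50:
        return "green"
    if ratio <= 0.75:
        return "orange"
    return "red"


# Example: a hit scoring 60% of the best score falls in the orange bin.
print(score_ratio_colour(60, 100))
```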

2.4. Data Availability

The phiH1 genome sequence has been deposited at GenBank under the accession MK002701. Raw reads were submitted to the SRA under accession SRP159490.

3. Results and Discussion

3.1. Sequence and Annotation of PhiH1

The previously sequenced regions of the phiH1 genome represented about 50% of the complete sequence (Figure 1, red lines). Using virus DNA as template, the gaps between these sequences were PCR amplified and Sanger sequenced. However, the reliability of the previously sequenced regions was uncertain. High-coverage Illumina sequencing (ca. 4200-fold) was therefore used to increase sequence confidence. Revisions were required only in previously deposited sequences, not in the Sanger sequencing results from the first stage of the project. While the virus DNA found in capsid particles is linear, the head-full packaging process produces a population of molecules that are terminally redundant and partially circularly permuted [1]. The complete genome sequence determined in the current study is represented as the provirus form: a circular sequence of 58,072 bp. This value is close to the published size of 57 kb, estimated from restriction fragment sizes [4,37]. The G+C content of the genome was 63.7%, almost identical to the published value of 64% [3] but slightly lower than that of the host chromosome (68.0%) [38].
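For readers who wish to recompute the G+C content from the deposited sequence, a minimal sketch is given below. It is an illustration rather than the method used in this study; the function name is hypothetical, and ambiguous bases (e.g., N) are assumed to be excluded from the calculation.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Toy example (not phiH1 sequence): 8 of 10 bases are G or C.
print(gc_content("GGCCGATCGC"))
```

Applied to the full genome sequence read from a FASTA or GenBank file, the same function would be expected to give a value close to the 63.7% reported above.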
The original restriction map of phiH1 DNA, as determined by [5], corresponded closely with the in silico map inferred from the phiH1 genome sequence (Figure S1). The pac site located at the left end of the restriction map closely matched the corresponding pac sequence of phiCh1. While the pac site of phiCh1 had been localized by restriction mapping [18], it had not been precisely mapped. For consistency, the start point of phiH1 was set to the corresponding start of phiCh1 even though this splits the terS gene. Using this numbering, the program PhageTerm [34] was used to analyse the mapping of Illumina reads to the phiH1 genome, and this located the pac site terminal base at nt 46, with high probability (p = 2.5 × 10^−238). This is within the terS coding sequence (CDS), close to the stop codon and within a GC-rich region that is strongly conserved between phiH1 and phiCh1.
Annotation of the phiH1 genome resulted in 97 CDS (Table 1), most of which were encoded on the plus strand (86/97, Figure 2 panel b). The CDS were frequently closely spaced, with 45 overlapping at start/stop codons and 23 separated by ≤8 nt. Many genes were in functional groupings typical of bacteriophages (Figure 2 panel b). The left end of the genome encodes DNA packaging proteins (e.g., terminase, portal protein), followed by virus assembly and structural proteins (e.g., major capsid protein, tape-measure protein, tail proteins). The three main proteins of purified virus were originally labelled by their estimated sizes on sodium dodecylsulfate (SDS)-polyacrylamide gels (22, 53 and 80 kDa) [1], which were later revised to 27, 46 and 80 kDa [3]. In 1994, Stolt et al. [39] determined their N-terminal amino acid sequences and used this information to map the proteins (HP20, HP32 and HP67) to their genes (hp20, hp32, hp67) and sequence them. The inferred molecular weights (MWs) of proteins HP20 and HP67 were noted by these authors to be much smaller than previous estimates. In the present study, the locations of these genes on the full genome sequence have been resolved, an error in the hp20 (accession X80161) coding sequence was corrected, and the MWs of the inferred proteins calculated (11.6, 35.4 and 45.5 kDa). For consistency we have retained the original gene names (Table 1).
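An in silico restriction map of a circular sequence can be derived by locating every recognition site, including any site that spans the chosen start point, and measuring the distances between successive sites. The sketch below is a simplified illustration and not the procedure used for Figure S1: it searches one strand only (adequate for palindromic sites), treats the first base of the site as the cut position, and uses GAATTC purely as an example site. The function names are hypothetical.

```python
def circular_site_positions(genome: str, site: str) -> list[int]:
    """Return 0-based start positions of `site` in a circular sequence."""
    genome, site = genome.upper(), site.upper()
    if not site or len(site) > len(genome):
        return []
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    extended = genome + genome[: len(site) - 1]
    positions = []
    start = extended.find(site)
    while start != -1:
        positions.append(start)
        start = extended.find(site, start + 1)
    return positions


def circular_fragment_lengths(genome: str, site: str) -> list[int]:
    """Return sorted fragment lengths after cutting a circle at every site."""
    n = len(genome)
    cuts = circular_site_positions(genome, site)
    if not cuts:
        return [n]  # an uncut circle remains a single molecule
    lengths = []
    for i, cut in enumerate(cuts):
        next_cut = cuts[(i + 1) % len(cuts)]
        # Distance to the next cut, wrapping around the origin; a single
        # cut linearises the circle into one full-length fragment.
        lengths.append((next_cut - cut) % n or n)
    return sorted(lengths)


# Toy circular sequence with two example sites.
print(circular_fragment_lengths("GAATTCAAAAGAATTCCC", "GAATTC"))
```

Because fragment lengths on a circle do not depend on where numbering starts, resetting the start point of the sequence (as done here to match phiCh1) leaves such a map unchanged.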
The next genomic region is a replication/regulatory module (the L-region) that encodes RepR (repressor), a ParA-family protein (partition) and RepH (replication). There is also a VapC-like protein that together with the small overlapping upstream CDS may form a toxin‒antitoxin pair that could be involved in plasmid maintenance [40]. The right end of the genome carries many genes with unknown function but includes genes specifying DNA methylases and cell lysis proteins. The taxonomic position of phiH1 was assessed using the VIRFAM webserver [36], which classifies bacteriophages and archaeal viruses based on the order and similarity of capsid assembly/structural proteins. Consistent with previous studies [6], phiH1 was classified by this system as a member of the Myoviridae (Type1, Cluster 4).
A GC-profile plot [41] of the phiH1 genome shows a major low point inflection within the L-region (Figure 2, panel a), indicating a potential replication origin. The L-region is ~12 kb in length, can replicate as a plasmid in Halobacterium [14], and carries genes encoding a replication protein (RepH), and a DNA-binding repressor (RepR). It can also provide cells with immunity to infection by phiH1 virus. The transcription program of phiH1 during lytic growth (panels c, d and e) has been summarized from previous studies, and shows temporal changes (early, middle and late transcripts). The broad directions of transcription reflect the closely spaced and similarly directed gene clusters as well as the correspondence with functional gene groupings (panel b). The lowest two panels (d, e) summarize the results of hybridizing labelled transcripts from infected cells to Southern blots of restricted phiH1 DNA [42], thereby mapping transcripts to fragments of the virus genome. Panel c shows a summary of the virus-specific transcripts that were sized by agarose gel electrophoresis and had 5′ start sites mapped. While transcription across the L-region has been examined in more detail than the rest of the genome, much remains incomplete or uncertain. For example, the 3′ end of the late transcript labelled TLL, which is depicted ending in a dotted line and question mark (at ~21 kb), has not been determined. This transcript could potentially extend for another 5.5 kb. Counter-transcripts are commonly produced by prokaryotes and their viruses and play important roles in gene regulation. Their presence and activity in phiH1 gene expression have been studied, and this work was one of the first reports of antisense RNA in Archaea [13]. However, this interesting topic remains to be fully explored.
Corrections to the previously sequenced regions resulted in significant changes to several coding sequences. For example, the tnpB gene of transposon ISH1.8 (nt 41,906–43,789) was thought to be inactive as it was split into three CDS by multiple mutations [43]. The high-quality Illumina sequence data show, however, that the gene is intact and that the previously reported transposon ISH1.8 (X00805) is actually an exact copy of transposon ISH12 from the host Hbt. salinarum strain R1 [44]. The element plays a key role in the mobilisation of the L-region of the genome to form the 12 kb plasmid, pΦHL [14,43]. Another case is the Dcm5 cytosine methylase, which was also reported as being split [39]. The revised sequence shows that the gene codes for a single, probably functional protein (PhiH1_405, nt 47,732–49,618) and not for the two parts (dcm5a, dcm5b) as previously reported. Although phiH1 carries three potentially active DNA methylase genes (dcm5, yhdJ and ycdA), modified bases were not detected in phiH1 DNA in the chromatographic (high-pressure liquid chromatography) profiles of deoxyribonucleosides released by enzymatic hydrolysis [45]. In that study, the genomes of phiH and another, unrelated virus (phiN) were analysed, and while unmodified dC was detected in phiH, the phiN genome contained only methylated dC (Figure 3 in [45]). The related halovirus phiCh1 carries homologs of two of the phiH1 methylases, and one of them, an N6-adenine methylase (ORF94/M·φCh1-I, corresponding to YcdA of phiH1), has been shown to methylate DNA at GATC motifs [46]. However, the proportion of sites modified by M·φCh1-I in virus DNA varies from 5% to 50%, depending upon the infection conditions. Modifying only some of the available sites is presumably advantageous to avoid host restriction, as distinct enzymes may target either unmethylated or methylated sites.

3.2. Matches to CRISPR Spacers

The phiH1 genome was used to search for matching CRISPR spacers among metagenomic datasets of hypersaline environments downloaded from the NCBI sequence read archive (SRA; see methods). Only four spacers showing close to moderate similarity to phiH1 were detected (Table 2). These spacers match to virus genes encoding structural and non-structural proteins, and the DRs of these spacers show that they are carried by haloarchaea. The datasets include metagenomes from the USA and Iran, as well as an isolate from the Andaman Islands, India. The results suggest that phiH1-like viruses are geographically widespread.

3.3. Relatives and Phylogeny of PhiH1

The only close matches to the phiH1 genome in the GenBank database were phiCh1 and the corresponding Nab. magadii plasmid pNMAG03 (BLASTn, accessed 20 July 2018). A dot-plot comparison of phiH1 with phiCh1 (Figure S2) revealed a largely colinear relationship (green line) and an overall nucleotide similarity of 63%. The plot also highlights several indels (line gaps) and two regions showing inversions (red lines). Inversion 1 (nt 24,227–27,767) corresponds closely in sequence and arrangement to the invertible region described in phiCh1 (ORF34-36), which has a central XerD type integrase/recombinase gene flanked by inverted repeats and facilitates switching between two related tail fibre genes, each containing numerous short repeats [18]. The phiH1 orthologous integrase is PhiH1_175. In the current sequence version, PhiH1_165 is active while PhiH1_185 is uncoupled from a start codon and thus is inactivated. Upon inversion of the genome segment, PhiH1_185 is activated while PhiH1_165 is inactivated. Overall, this results in tail fibre protein switching, which may affect receptor binding specificity and host range of phiH1. The similarity in the tail fibre protein repeats is high enough to be detectable at the DNA level, which results in the X-shaped pattern for this region in the dot-plot. Inversion 2 (nt 31,932–34,126) occurs within the phiH1 L-segment, encompasses four CDS including a ParA-domain protein, and lies near a different integrase/recombinase gene (Int2, PhiH1_240). Protein searches (BLASTp) of the phiH1 genome returned matches to phiCh1, a limited number of haloarchaeal genomes (often 5–10, which may flag proviral regions) and the haloarchaeal caudoviruses BJ1 [47] and CGphi46 (NC_021537), both of which infect Halorubrum spp. Pairwise alignments (BLASTp) between all phiH1 and matching phiCh1 proteins gave an average protein sequence identity of 70% (range 39–95%; with a few exceptions, see footnote 5, Table 1).
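The effect of inverting a genome segment, such as the regions flagged in the dot-plot, can be reproduced computationally by replacing the segment with its reverse complement. The sketch below is illustrative only; it assumes 1-based inclusive coordinates on a linear representation of the sequence, and the function names are hypothetical.

```python
# Translation table for complementing DNA bases (case is preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_segment(genome: str, start: int, end: int) -> str:
    """Invert the segment from `start` to `end` (1-based, inclusive)."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= length")
    left = genome[: start - 1]
    middle = genome[start - 1 : end]
    right = genome[end:]
    return left + reverse_complement(middle) + right


# Toy example: invert positions 4-6 of a 9-base sequence.
print(invert_segment("AAACCGTTT", 4, 6))
```

Applying the same inversion twice restores the original sequence, which mirrors the reversible switching between the two tail fibre gene arrangements described above.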
Figure 3 is a graphical comparison of phiH1 proteins (tBLASTx) with those of phiCh1, BJ1, CGphi46 and, as an outlier, HSTV-1. The Haloarcula caudovirus HSTV-1 [48] shows very low similarity to phiH1. The figure summarizes the close similarity of phiH1 and phiCh1 proteins. BJ1 and CGphi46 show far fewer matching regions, mainly to proteins encoded near the left end of the phiH1 genome, a region specifying portal and capsid proteins. The three significant matches to HSTV-1 were to a methyltransferase (HSTV1_52), a hypothetical protein (HSTV1_53), and a DNA polymerase sliding clamp protein (HSTV1_40).
While several caudovirus proteins have been used to infer virus phylogenies, the major capsid protein (MCP) is often used because of its functional constraints maintaining a conserved structure [49]. Figure 4 shows a tree reconstruction using an alignment of phiH1 MCP (HP32, 35.4 kDa) and related sequences. Haloarchaeal proteins are seen to branch together (pink shading) and within this cluster the phiH1 and phiCh1 MCPs form a distinct and closely branching clade. These two proteins share 82% amino acid identity. The MCPs of CGphi46 and BJ1 branch at distant locations from each other and from phiH1 MCP. Structures of close homologs of phiH1 HP32 have not yet been determined. However, the major capsid proteins of bacterial caudoviruses and eukaryotic herpesviruses share a common folding structure, the archetype of which is the phage HK97 MCP (gp5) [50]. Consistent with this, modelling of the phiH1 MCP (I-Tasser) returned bacteriophage HK97 gp5 (PDB 2fs3A) as the closest matching structure (Template Modeling (TM)-score = 0.848, Root-Mean-Square Deviation (RMSD) = 1.17). Based on structure prediction and homology modelling, the HK97-fold may also be present in the MCP of phiCh1 [49]. The structure of the MCP of the haloarchaeal podovirus, HSTV-1, has recently been shown to be of the HK97 type [48].
PhiH1 and phiCh1 display a close sequence similarity across most of their genomes yet infect physiologically and biochemically different haloarchaeal hosts. Hbt. salinarum is a widely distributed neutrophilic heterotroph with glycolipid-containing membranes and has often been isolated from spoiled salted products, while Nab. magadii is a haloalkaliphile (optimum pH 9.5) that lacks glycolipids [51] and is restricted in its distribution to highly alkaline salt lakes [52]. Looking more widely, the presence of phiH1 MCP homologs in diverse genera of haloarchaea and two haloviruses (Figure 4) indicates that the Myohalovirus genus and related viruses are a highly successful group, the reasons for which are worthy of more detailed study, particularly when large-scale cultivation of Halobacterium becomes more common [53]. PhiH1 has been well studied in the past, and the completion of its genome sequence now allows it to be included in many of the sequence-based studies used today, including comparative virology, detection of proviruses in archaeal genomes, virus evolution and the microbial ecology of hypersaline environments.

Supplementary Materials

The following are available online at https://www.mdpi.com/2073-4425/9/10/493/s1, Figure S1: Original phiH1 restriction map compared to in silico map from genome sequence. Figure S2: Dot plot sequence comparison of phiH1 and phiCh1 genomes.

Author Contributions

Conceptualization, visualization, investigation, M.D.-S.; data curation, Fr.P., M.D.-S.; funding acquisition, D.O., Fr.P.; project administration, Fr.P.; resources, A.W., Fe.P.; writing—original draft, Fr.P., M.D.-S.; writing—review & editing, A.W., D.O., Fe.P., Fr.P., M.D.-S.; validation, A.W.

Funding

This research was funded by the Max Planck Society, Germany, to Dieter Oesterhelt, emeritus Director of the Department of Membrane Biochemistry, Max Planck Institute, Martinsried, Germany.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Schnabel, H.; Zillig, W.; Pfaffle, M.; Schnabel, R.; Michel, H.; Delius, H. Halobacterium halobium phage ΦH. EMBO J. 1982, 1, 87–92.
  2. Zillig, W.; Gropp, F.; Henschen, A.; Neumann, H.; Palm, P.; Reiter, W.D.; Rettenberger, M.; Schnabel, H.; Yeats, S. Archaebacteria virus host systems. Syst. Appl. Microbiol. 1986, 7, 58–66.
  3. Zillig, W.; Reiter, W.-D.; Palm, P.; Gropp, F.; Neumann, H.; Rettenberger, M. Viruses of Archaebacteria. In The Bacteriophages; Calendar, R., Ed.; Plenum Publishing Corpn: New York, NY, USA, 1988.
  4. Schnabel, H.; Zillig, W. Circular structure of the genome of phage ΦH in a lysogenic Halobacterium halobium. Mol. Gen. Genet. 1984, 193, 422–426.
  5. Schnabel, H.; Schramm, E.; Schnabel, R.; Zillig, W. Structural variability in the genome of phage ΦH of Halobacterium halobium. Mol. Gen. Genet. 1982, 188, 370–377.
  6. ICTV Report, C. ICTV Online (10th) Report on Virus Taxonomy. Available online: https://talk.ictvonline.org/taxonomy/p/taxonomy-history?taxnode_id=20170459 (accessed on 19 March 2018).
  7. Krupovic, M.; Dutilh, B.E.; Adriaenssens, E.M.; Wittmann, J.; Vogensen, F.K.; Sullivan, M.B.; Rumnieks, J.; Prangishvili, D.; Lavigne, R.; Kropinski, A.M.; et al. Taxonomy of prokaryotic viruses: Update from the ICTV bacterial and archaeal viruses subcommittee. Arch. Virol. 2016, 161, 1095–1099.
  8. Cline, S.W.; Doolittle, W.F. Efficient transfection of the archaebacterium Halobacterium halobium. J. Bacteriol. 1987, 169, 1341–1344.
  9. Blaseio, U.; Pfeifer, F. Transformation of Halobacterium halobium: Development of vectors and investigation of gas vesicle synthesis. Proc. Natl. Acad. Sci. USA 1990, 87, 6772–6776.
  10. Stolt, P.; Zillig, W. In vivo studies on the effects of immunity genes on early lytic transcription in the Halobacterium salinarium phage ϕH. Mol. Gen. Genet. 1992, 235, 197–204.
  11. Ken, R.; Hackett, N.R. Halobacterium halobium strains lysogenic for phage phiH contain a protein resembling coliphage repressors. J. Bacteriol. 1991, 173, 955–960.
  12. Stolt, P.; Zillig, W. In vivo and in vitro analysis of transcription of the L region from the Halobacterium salinarium phage ϕH: Definition of a repressor-enhancing gene. Virology 1993, 195, 649–658.
  13. Stolt, P.; Zillig, W. Antisense RNA mediates transcriptional processing in an archaebacterium, indicating a novel kind of RNase activity. Mol. Microbiol. 1993, 7, 875–882.
  14. Schnabel, H. An immune strain of Halobacterium halobium carries the invertible L segment of phage ΦH as a plasmid. Proc. Natl. Acad. Sci. USA 1984, 81, 1017–1020.
  15. Stolt, P.; Zillig, W. Transcription of the halophage ΦH repressor gene is abolished by transcription from an inversely oriented lytic promoter. FEBS Lett. 1994, 344, 125–128.
  16. Stolt, P.; Zillig, W. Gene regulation in halophage ΦH; more than promoters. Syst. Appl. Microbiol. 1993, 16, 591–596.
  17. Witte, A.; Baranyi, U.; Klein, R.; Sulzner, M.; Luo, C.; Wanner, G.; Krüger, D.H.; Lubitz, W. Characterization of Natronobacterium magadii phage ϕCh1, a unique archaeal phage containing DNA and RNA. Mol. Microbiol. 1997, 23, 603–616.
  18. Klein, R.; Baranyi, U.; Rössler, N.; Greineder, B.; Scholz, H.; Witte, A. Natrialba magadii virus ϕCh1: First complete nucleotide sequence and functional organization of a virus infecting a haloalkaliphilic archaeon. Mol. Microbiol. 2002, 45, 851–863.
  19. Selb, R.; Derntl, C.; Klein, R.; Alte, B.; Hofbauer, C.; Kaufmann, M.; Beraha, J.; Schoner, L.; Witte, A. The viral gene ORF79 encodes a repressor regulating induction of the lytic life cycle in the haloalkaliphilic virus phiCh1. J. Virol. 2017, 91.
  20. Siddaramappa, S.; Challacombe, J.F.; Decastro, R.E.; Pfeiffer, F.; Sastre, D.E.; Gimenez, M.I.; Paggi, R.A.; Detter, J.C.; Davenport, K.W.; Goodwin, L.A.; et al. A comparative genomics perspective on the genetic content of the alkaliphilic haloarchaeon Natrialba magadii ATCC 43099T. BMC Genom. 2012, 13, 165.
  21. Zillig, W.; Palm, P.; Reiter, W.D.; Gropp, F.; Puhler, G.; Klenk, H.P. Comparative evaluation of gene expression in Archaebacteria. Eur. J. Biochem. 1988, 173, 473–482.
  22. National Center for Biotechnology Information. Available online: https://www.ncbi.nlm.nih.gov/ (accessed on 11 October 2018).
  23. Gordon, D. Viewing and editing assembled sequences using Consed. Curr. Protoc. Bioinform. 2003, 2.
  24. Skennerton, C.T.; Imelfort, M.; Tyson, G.W. Crass: Identification and reconstruction of CRISPR from unassembled metagenomic data. Nucleic Acids Res. 2013, 41, e105.
  25. Leinonen, R.; Sugawara, H.; Shumway, M. The Sequence Read Archive. Nucleic Acids Res. 2010, 39, D19–D21.
  26. Dyall-Smith, M.; Pfeiffer, F. The PL6-family plasmids of Haloquadratum are virus-related. Front. Microbiol. 2018, 9, 1070.
  27. CRISPRs Web Server. Available online: http://crispr.i2bc.paris-saclay.fr/ (accessed on 11 October 2018).
  28. Lomsadze, A.; Gemayel, K.; Tang, S.; Borodovsky, M. Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Res. 2018, 28, 1079–1089.
  29. National Center for Biotechnology Information BLAST. Available online: https://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed on 11 October 2018).
  30. YASS Genomic Similarity Search Tool. Available online: http://bioinfo.lifl.fr/yass/index.php (accessed on 11 October 2018).
  31. Geneious. Available online: https://www.geneious.com/geneious/ (accessed on 11 October 2018).
  32. Circoletto. Available online: http://tools.bat.infspire.org/circoletto/ (accessed on 11 October 2018).
  33. I-Tasser, Protein Structure & Function Predictions. Available online: https://zhanglab.ccmb.med.umich.edu/I-TASSER (accessed on 11 October 2018).
  34. Garneau, J.R.; Depardieu, F.; Fortier, L.C.; Bikard, D.; Monot, M. PhageTerm: A tool for fast and accurate determination of phage termini and packaging mechanism using next-generation sequencing data. Sci. Rep. 2017, 7, 8292.
  35. CPT Phage Galaxy. Available online: https://cpt.tamu.edu/galaxy-pub/ (accessed on 11 October 2018).
  36. VIRFAM, Remote Homology Detection of Viral Protein Families. Available online: http://biodev.cea.fr/virfam/ (accessed on 11 October 2018).
  37. Schnabel, H.; Schnabel, R.; Yeats, S.; Tu, J.; Gierl, A.; Neumann, H.; Zillig, W. Genome organization and transcription in Archaebacteria. Folia Biol. (Praha) 1984, 30, 2–6.
  38. Pfeiffer, F.; Schuster, S.C.; Broicher, A.; Falb, M.; Palm, P.; Rodewald, K.; Ruepp, A.; Soppa, J.; Tittor, J.; Oesterhelt, D. Evolution in the laboratory: The genome of Halobacterium salinarum strain R1 compared to that of strain NRC-1. Genomics 2008, 91, 335–346.
  39. Stolt, P.; Grampp, B.; Zillig, W. Genes for DNA cytosine methyltransferases and structural proteins, expressed during lytic growth by the phage ΦH of the archaebacterium Halobacterium salinarium. Biol. Chem. Hoppe Seyler 1994, 375, 747–757.
  40. Jin, G.; Pavelka, M.S., Jr.; Butler, J.S. Structure-function analysis of VapB4 antitoxin identifies critical features of a minimal VapC4 toxin-binding module. J. Bacteriol. 2015, 197, 1197–1207.
  41. Gao, F.; Zhang, C.T. GC-Profile: A web-based tool for visualizing and analyzing the variation of GC content in genomic sequences. Nucleic Acids Res. 2006, 34, W686–W691.
  42. Gropp, F. Genexpression im Archaebakterium Halobacterium halobium: Der Phage ΦH und die DNA-abhängige RNA-Polymerase. Ph.D. Thesis, Ludwig-Maximilians-Universitaet Muenchen, Munich, Germany, 26 July 1989.
  43. Gropp, F.; Grampp, B.; Stolt, P.; Palm, P.; Zillig, W. The immunity-conferring plasmid pϕHL from the Halobacterium salinarium phage ϕH: Nucleotide sequence and transcription. Virology 1992, 190, 45–54.
  44. ISfinder. Available online: https://isfinder.biotoul.fr/ (accessed on 11 October 2018).
  45. Vogelsang-Wenke, H.; Oesterhelt, D. Isolation of a halobacterial phage with a fully cytosine-methylated genome. MGG Mol. Gen. Genet. 1988, 211, 407–414.
  46. Baranyi, U.; Klein, R.; Lubitz, W.; Kruger, D.H.; Witte, A. The archaeal halophilic virus-encoded Dam-like methyltransferase M. ϕCh1-I methylates adenine residues and complements dam mutants in the low salt environment of Escherichia coli. Mol. Microbiol. 2000, 35, 1168–1179.
  47. Pagaling, E.; Haigh, R.D.; Grant, W.D.; Cowan, D.A.; Jones, B.E.; Ma, Y.; Ventosa, A.; Heaphy, S. Sequence analysis of an archaeal virus isolated from a hypersaline lake in Inner Mongolia, China. BMC Genom. 2007, 8, 410.
  48. Pietilä, M.K.; Laurinmäki, P.; Russell, D.A.; Ko, C.C.; Jacobs-Sera, D.; Hendrix, R.W.; Bamford, D.H.; Butcher, S.J. Structure of the archaeal head-tailed virus HSTV-1 completes the HK97 fold story. Proc. Natl. Acad. Sci. USA 2013, 110, 10604–10609.
  49. Krupovic, M.; Forterre, P.; Bamford, D.H. Comparative analysis of the mosaic genomes of tailed archaeal viruses and proviruses suggests common themes for virion architecture and assembly with tailed viruses of bacteria. J. Mol. Biol. 2010, 397, 144–160. [Google Scholar] [CrossRef] [PubMed]
  50. Baker, M.L.; Jiang, W.; Rixon, F.J.; Chiu, W. Common ancestry of herpesviruses and tailed DNA bacteriophages. J. Virol. 2005, 79, 14967–14970. [Google Scholar] [CrossRef] [PubMed]
  51. Kamekura, M.; Dyall-Smith, M. Taxonomy of the family Halobacteriaceae and the description of two new genera Halorubrobacterium and Natrialba. J. Gen. Appl. Microbiol. 1995, 41, 333–350. [Google Scholar] [CrossRef]
  52. Tindall, B.J.; Ross, H.N.M.; Grant, W.D. Natronobacterium gen. nov. and Natronococcus gen. nov. Two new genera of haloalkaliphilic archaebacteria. Syst. Appl. Microbiol. 1984, 5, 41–57. [Google Scholar] [CrossRef]
  53. Kalenov, S.V.; Baurina, M.M.; Skladnev, D.A.; Kuznetsov, A.Y. High-effective cultivation of Halobacterium salinarum providing with bacteriorhodopsin production under controlled stress. J. Biotechnol. 2016, 233, 211–218. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Diagram of the phiH1 genome with lines below showing regions previously sequenced (red) along with their database accessions. The blue lines (NEW) indicate regions sequenced in the present study by Sanger sequencing. Tick marks (dark green) below the blue lines show the positions of oligonucleotide primers used for PCR and primer-walking. Dots at the right and left contig ends indicate sequence continuity between them. Scale bar at top shows position in bp.
Figure 2. PhiH1 GC-profile, genetic map, and corresponding transcription program (adapted from [16,42]). (a) GC-profile of the phiH1 genome. (b) Genetic map of the phiH1 genome, showing coding sequences as red, blue or grey arrows. Dotted lines above indicate gene clusters involved in particular functions. Some CDS are labelled above the map, e.g., TerL, terminase large subunit; Portal, portal protein; Tape, tape-measure protein; RepH, replicase (label within CDS arrow); Mt, DNA methylases; TerS, terminase small subunit. Some genes are shown below the map, such as hp32, encoding the major capsid protein HP32. Panels c, d and e summarise transcription data from previously published studies, and above them is a colour key that indicates the time of appearance of early (0–1 h, blue), middle (1–2 h, green) and late (>2 h, pink) transcripts. (c) Precise mapping of viral transcripts, including start and termination sites [16,39]. (d) Summary transcription program of lytic infection based on hybridisation of labelled infected-cell transcripts to restriction fragments of virus DNA [42]. Thin coloured lines indicate whether continuing transcription persists over time. (e) The transcription map data of [42] are shown, projected onto the in silico restriction map of phiH1, as determined from the complete genome sequence (this study). Enzymes are indicated at the left. Numbers on the restriction map refer to those of the original publication of [42] (see also Figure S1). Coloured shading follows that of panels c and d. Dotted pattern shown beyond the right-hand pac site indicates terminal redundancy of virus DNA. Scale bars (in kb) are shown below panels a and e.
Figure 3. Circos plot of amino acid similarity (tBLASTx) between phiH1 and the haloviruses phiCh1, BJ1, CGphi46 and HSTV-1. The threshold for connecting lines was E-value ≤ 10⁻⁴⁰, with line colours reflecting the ratio of actual tBLASTx bitscore to the maximal score (using ‘score/max’ ratio colouring with blue ≤ 0.25, green ≤ 0.50, orange ≤ 0.75, red > 0.75). The outer histogram counts how many times each colour has hit the specific part of the sequence and uses an equivalent colouring scheme. The distance between successive tick marks shown along each virus genome represents 0.1 of the full genome length. Protein names shown along the phiH1 genome indicate the positions of the corresponding genes.
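The 'score/max' colouring described in the Figure 3 caption can be illustrated with a minimal Python sketch. The function name `score_ratio_colour` and its arguments are hypothetical and are not taken from Circoletto or any other tool; only the four thresholds come from the caption.

```python
def score_ratio_colour(bitscore: float, max_score: float) -> str:
    """Map a bitscore/max-score ratio to the colour bins named in the caption.

    Hypothetical helper for illustration; thresholds follow the caption:
    blue <= 0.25, green <= 0.50, orange <= 0.75, red > 0.75.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    ratio = bitscore / max_score
    if ratio <= 0.25:
        return "blue"
    if ratio <= 0.50:
        return "green"
    if ratio <= 0.75:
        return "orange"
    return "red"


if __name__ == "__main__":
    # Illustrative bitscores against an assumed maximal score of 100.
    for score in (20, 40, 70, 95):
        print(score, score_ratio_colour(score, 100))
```

The boundaries are inclusive on the lower bin, so a ratio of exactly 0.25 is coloured blue and exactly 0.75 is coloured orange.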
Figure 4. Phylogenetic tree reconstruction (NJ method) of major capsid proteins (MCP) of phiH1, other haloviruses and related proteins of haloarchaea. Names of haloarchaeal species are shown, with accession numbers given at the right. Bootstrap confidence values (100 repetitions) are shown at branch points. The pink shading highlights taxa belonging to the class Halobacteria. The scale bar (expected changes per site) is shown at the top. The outgroup (not shown) consisted of distantly related MCP sequences of Bacillus spp. (WP_001060157.1, WP_098773561.1, WP_001064748.1 and WP_000178926.1).
Table 1. Annotated coding sequences (CDS) of halovirus phiH1.
Start (nt) | Stop (nt) | Locus_tag | Length (bp) | Direction | Gene | Product | Homologs ¹ (phiCh1, ORF; pNMAG03; [Other])
115 | 717 | PhiH1_005 | 603 | + | - | uncharacterized protein | PhiCh1p02, ORF1; Nmag_4251
710 | 2371 | PhiH1_010 | 1662 | + | terL | terminase large subunit TerL | PhiCh1p03, ORF2; Nmag_4252
2377 | 2505 | PhiH1_015 | 129 | + | - | uncharacterized protein | Nmag_4253
2498 | 2689 | PhiH1_020 | 192 | + | - | uncharacterized protein | PhiCh1p05, ORF4; Nmag_4255
2686 | 4242 | PhiH1_025 | 1557 | + | por | portal protein Por | PhiCh1p07, ORF6; Nmag_4257
4246 | 5187 | PhiH1_030 | 942 | + | - | head morphogenesis protein | PhiCh1p08, ORF7; Nmag_4258
5261 | 5587 | PhiH1_035 | 327 | + | hp20 | capsid protein HP20 | [AJF28118.1]
5667 | 7466 | PhiH1_040 | 1800 | + | - | prohead protease | ⁴PhiCh1p09, ORF8; ⁴PhiCh1p10, ORF9; Nmag_4259
7506 | 8468 | PhiH1_045 | 963 | + | hp32 | major capsid protein HP32 | PhiCh1p12, ORF11; Nmag_4260
8481 | 8933 | PhiH1_050 | 453 | + | - | uncharacterized protein | PhiCh1p13, ORF12; Nmag_4261
8940 | 9542 | PhiH1_055 | 603 | + | ada | head-tail adaptor protein Ada | PhiCh1p14, ORF13; Nmag_4262
9539 | 9919 | PhiH1_060 | 381 | + | hco | head closure protein type 1 Hco | PhiCh1p15, ORF14; Nmag_4263
9921 | 10,202 | PhiH1_065 | 282 | + | - | uncharacterized protein | PhiCh1p16, ORF15; Nmag_4264
10,202 | 10,636 | PhiH1_070 | 435 | + | nep | probable neck protein type 1 Nep | PhiCh1p17, ORF16; Nmag_4265
10,643 | 11,239 | PhiH1_075 | 597 | + | tco | tail completion protein type 1 Tco | PhiCh1p18, ORF17; Nmag_4266
11,259 | 12,557 | PhiH1_080 | 1299 | + | hp67 | tail sheath protein HP67 | PhiCh1p19, ORF18; Nmag_4267
12,607 | 13,002 | PhiH1_085 | 396 | + | - | probable structural protein | PhiCh1p20, ORF19; Nmag_4268
13,006 | 13,407 | PhiH1_090 | 402 | + | - | uncharacterized protein | PhiCh1p21, ORF20; Nmag_4269
13,572 | 13,745 | PhiH1_095 | 174 |  | - | DUF4177 domain protein | [SEH60446.1]
13,792 | 16,581 | PhiH1_100 | 2790 | + | tpm | tape-measure tail protein Tpm | ⁴PhiCh1p23, ORF22; ⁴PhiCh1p24, ORF23; Nmag_4272
16,583 | 17,104 | PhiH1_105 | 522 | + | - | uncharacterized protein | PhiCh1p25, ORF24; Nmag_4273
17,108 | 17,446 | PhiH1_110 | 339 | + | - | uncharacterized protein | PhiCh1p26, ORF25; Nmag_4274
17,450 | 18,298 | PhiH1_115 | 849 | + | - | uncharacterized protein | PhiCh1p27, ORF26; Nmag_4275
18,306 | 18,446 | PhiH1_120 | 141 | + | - | CxxC motif protein | [SEH61109.1]
18,443 | 18,988 | PhiH1_125 | 546 | + | - | uncharacterized protein | PhiCh1p29, ORF28; Nmag_4276
18,988 | 19,146 | PhiH1_130 | 159 | + | - | uncharacterized protein | -
19,143 | 19,508 | PhiH1_135 | 366 | + | - | virus-related protein | [AGM10900.1]
19,505 | 19,867 | PhiH1_140 | 363 | + | - | uncharacterized protein | PhiCh1p30, ORF29; Nmag_4277
19,874 | 21,148 | PhiH1_145 | 1275 | + | bpj | baseplate J family protein Bpj | PhiCh1p31, ORF30; Nmag_4278
21,135 | 22,277 | PhiH1_150 | 1143 | + | - | uncharacterized protein | PhiCh1p32, ORF31; Nmag_4279
22,295 | 22,678 | PhiH1_155 | 384 | + | - | virus-related protein | [AFH21897.1]
22,683 | 23,249 | PhiH1_160 | 567 | + | - | virus-related protein | [AFH21653.1]
23,252 | 25,504 | PhiH1_165 | 2253 | + | - | repeat-containing tail fibre protein | PhiCh1p37, ORF36; Nmag_4282; PhiCh1p35, ORF34; Nmag_4286
25,506 | 25,787 | PhiH1_170 | 282 | + | - | uncharacterized protein | Nmag_4285
25,825 | 26,499 | PhiH1_175 | 675 | + | int1 | tyrosine integrase/recombinase Int1 | PhiCh1p36, ORF35; Nmag_4284
26,490 | 26,792 | PhiH1_180 | 303 |  | - | uncharacterized protein | Nmag_4283
26,798 | 27,766 | PhiH1_185 | 969 |  | - | repeat-containing tail fibre protein ² | PhiCh1p37, ORF36; Nmag_4282; PhiCh1p35, ORF34; Nmag_4286
27,803 | 28,150 | PhiH1_190 | 348 | + | - | YncB-like endonuclease | [AGM11801.1]
28,153 | 28,386 | PhiH1_195 | 234 | + | - | virus-related protein | [AGC34510.1]
28,379 | 28,675 | PhiH1_200 | 297 | + | - | uncharacterized protein | [EMA49173.1]
28,682 | 28,783 | PhiH1_205 | 102 | + | - | uncharacterized protein | -
28,788 | 29,357 | PhiH1_210 | 570 | + | - | transmembrane domain protein | -
29,394 | 29,642 | PhiH1_215 | 249 |  | - | uncharacterized protein | -
29,651 | 29,941 | PhiH1_220 | 291 |  | - | uncharacterized protein | PhiCh1p40, ORF39; Nmag_4289
30,104 | 30,244 | PhiH1_225 | 144 | + | - | uncharacterized protein | -
30,250 | 30,414 | PhiH1_230 | 165 | + | - | uncharacterized protein | PhiCh1p44, ORF43; Nmag_4292
30,411 | 30,806 | PhiH1_235 | 396 | + | - | VapC family toxin | PhiCh1p45, ORF44; Nmag_4293
30,803 | 31,465 | PhiH1_240 | 663 |  | int2 | tyrosine integrase/recombinase Int2 | PhiCh1p46, ORF45; Nmag_4294
31,680 | 31,934 | PhiH1_245 | 255 | + | - | uncharacterized protein | -
31,939 | 32,271 | PhiH1_250 | 333 | + | - | uncharacterized protein | Nmag_4297
32,420 | 32,857 | PhiH1_255 | 438 |  | - | HNH-type endonuclease | PhiCh1p48, ORF47; Nmag_4296
32,854 | 33,255 | PhiH1_260 | 402 |  | - | uncharacterized protein | [ELY96531.1]
33,248 | 34,024 | PhiH1_265 | 777 |  | - | parA domain protein | PhiCh1p47, ORF46; Nmag_4295
34,161 | 34,430 | PhiH1_270 | 270 |  | repR | repressor protein RepR | ⁵PhiCh1p49, ORF48; ⁵Nmag_4298; [ELZ06324.1]
34,730 | 35,071 | PhiH1_275 | 342 | + | - | uncharacterized protein | -
35,068 | 35,424 | PhiH1_280 | 357 | + | - | uncharacterized protein | PhiCh1p50, ORF49
35,381 | 38,167 | PhiH1_285 | 2787 | + | repH | plasmid replication protein RepH | ⁴PhiCh1p54, ORF53; ⁴PhiCh1p55, ORF54; Nmag_4299
38,262 | 38,489 | PhiH1_290 | 228 |  | imm | probable immunity protein Imm | PhiCh1p56, ORF55; Nmag_4300
38,733 | 39,263 | PhiH1_295 | 531 | + | - | transcriptional regulator, PadR-like family | PhiCh1p57, ORF56; Nmag_4301
39,260 | 39,385 | PhiH1_300 | 126 | + | - | CxxC motif protein | -
39,382 | 39,978 | PhiH1_305 | 597 | + | - | uncharacterized protein | PhiCh1p59, ORF58; Nmag_4303
39,975 | 40,133 | PhiH1_310 | 159 | + | - | uncharacterized protein | -
40,153 | 40,902 | PhiH1_315 | 750 | + | pcnA | DNA polymerase sliding clamp PcnA | PhiCh1p60, ORF59; Nmag_4211
40,908 | 41,339 | PhiH1_320 | 432 | + | - | uncharacterized protein | PhiCh1p61, ORF60; Nmag_4212
41,339 | 41,554 | PhiH1_325 | 216 | + | - | uncharacterized protein | PhiCh1p62, ORF61; Nmag_4213
41,547 | 42,041 | PhiH1_330 | 495 | + | - | uncharacterized protein | -
42,098 | 42,490 | PhiH1_335 | 393 | + | tnpA | IS200-type transposase TnpA | [CAP12925.1]
42,492 | 43,748 | PhiH1_340 | 1257 | + | tnpB | IS1341-type transposase TnpB | [CAP12926.1]
43,808 | 44,014 | PhiH1_345 | 207 | + | - | uncharacterized protein | -
44,007 | 44,234 | PhiH1_350 | 228 | + | - | uncharacterized protein | PhiCh1p66, ORF65; Nmag_4217
44,231 | 44,656 | PhiH1_355 | 426 | + | - | CxxC motif protein | PhiCh1p68, ORF67; Nmag_4219
44,646 | 45,026 | PhiH1_360 | 381 | + | - | uncharacterized protein | PhiCh1p69, ORF68; Nmag_4220
45,023 | 45,646 | PhiH1_365 | 624 | + | - | HNH-type endonuclease | [KYG11427.1]
45,639 | 45,926 | PhiH1_370 | 288 | + | - | uncharacterized protein | PhiCh1p71, ORF70; Nmag_4222
45,919 | 46,350 | PhiH1_375 | 432 | + | - | DUF4326 domain protein | PhiCh1p72, ORF71; Nmag_4223
46,343 | 46,441 | PhiH1_380 | 99 | + | - | uncharacterized protein | -
46,438 | 46,884 | PhiH1_385 | 447 | + | - | CxxC motif protein | PhiCh1p74, ORF73; Nmag_4225
46,865 | 47,038 | PhiH1_390 | 174 | + | - | uncharacterized protein | ⁵PhiCh1p73, ORF72; ⁵Nmag_4224
47,031 | 47,447 | PhiH1_395 | 417 | + | - | uncharacterized protein | -
47,440 | 47,739 | PhiH1_400 | 300 | + | - | NTPase protein | [PLX87675.1]
47,732 | 49,618 | PhiH1_405 | 1887 | + | dcm5 | C-5 cytosine-specific DNA methylase Dcm5 | ⁵PhiCh1p81, ORF80; [PCR88664.1]
49,611 | 49,931 | PhiH1_410 | 321 | + | - | uncharacterized protein | PhiCh1p82, ORF81; Nmag_4234
49,918 | 50,037 | PhiH1_415 | 120 | + | - | CxxC motif protein | -
50,091 | 51,452 | PhiH1_420 | 1362 | + | yhdJ | DNA methylase N-4/N-6 domain protein YhdJ | PhiCh1p83, ORF82; Nmag_4235
51,449 | 52,024 | PhiH1_425 | 576 | + | - | uncharacterized protein | PhiCh1p84, ORF83; Nmag_4236
52,021 | 52,791 | PhiH1_430 | 771 | + | - | uncharacterized protein | PhiCh1p85, ORF84; Nmag_4237
52,784 | 53,152 | PhiH1_435 | 369 | + | - | uncharacterized protein | PhiCh1p88, ORF87; Nmag_4240
53,145 | 53,504 | PhiH1_440 | 360 | + | - | uncharacterized protein | PhiCh1p89, ORF88; Nmag_4241
53,788 | 54,369 | PhiH1_445 | 582 | + | - | CxxC motif protein | PhiCh1p90, ORF89; Nmag_4242
54,403 | 54,771 | PhiH1_450 | 369 | + | - | uncharacterized protein | PhiCh1p91, ORF90; Nmag_4243
54,794 | 55,147 | PhiH1_455 | 354 | + | - | uncharacterized protein | -
55,144 | 55,401 | PhiH1_460 | 258 | + | - | transmembrane domain protein | PhiCh1p93, ORF92; Nmag_4244
55,394 | 55,729 | PhiH1_465 | 336 | + | - | transmembrane domain protein ³ | PhiCh1p94, ORF93; Nmag_4245
55,794 | 57,053 | PhiH1_470 | 1260 | + | ycdA | DNA methylase N-4/N-6 domain protein YcdA | PhiCh1p95, ORF94; Nmag_4246
57,046 | 57,564 | PhiH1_475 | 519 | + | - | uncharacterized protein | PhiCh1p96, ORF95; Nmag_4247
57,621 | 57,830 | PhiH1_480 | 210 | + | - | CxxC motif protein | PhiCh1p98, ORF97; Nmag_4249
57,827 | ><63 | PhiH1_485 | 309 | + | terS | terminase small subunit TerS | PhiCh1p01, ORF98; Nmag_4250
¹ PhiCh1/pNMAG03 homologs of phiH1 proteins show BLASTp E-values < 10⁻²⁰. For phiCh1 proteins, both the PhiCh1p and originally assigned ORF codes (ORF for open reading frame) are shown (e.g., PhiCh1p02, ORF1). Codes starting with ORF represent the original annotation of the phiCh1 genome [17] (GB accession AF440695.1); and codes starting with PhiCh1p represent the RefSeq version of the annotation of the same genome sequence (GB accession NC_004084). The number shift is due to the terS gene, the N-terminal part being encoded at the end of the genome, and the C-terminal part at its beginning. This ORF is complete in the provirus state due to circularization and in the linear virus state due to terminal redundancy. This gene is ORF98 in the original annotation and PhiCh1p01 in the RefSeq annotation. Codes starting with Nmag_ represent the annotation of the Natrialba magadii plasmid pNMAG03 [20] (accession CP001935.1). The point of ring opening in pNMAG03 was set between Nmag_4303 and Nmag_4211. Codes in square brackets represent NCBI accessions referring to homologous proteins (BLASTp E-values ≤ 10⁻¹¹), which are from other sources.
² Gene PhiH1_185 is encoded on an invertible segment. In the current sequence version, it is inactivated because it is uncoupled from a start codon. By genome inversion, it becomes activated while its partner gene PhiH1_165 becomes inactivated. Overall, this results in tail fibre protein switching.
³ This protein (PhiH1_465) has three predicted transmembrane domains and has been suspected to function as a holin [18].
⁴ In these cases, the phiCh1 gene is split into two CDS but is continuous in phiH1.
⁵ These proteins are more distantly related (show less than 39% sequence identity or fall above BLASTp E-values of 10⁻²⁰). In these cases, a similar genetic context supports their stated relationship.
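The coordinate arithmetic behind the Length column of Table 1, including the terS gene that runs across the genome ends, can be checked with a short Python sketch. The helper `cds_length` is hypothetical and written only for illustration; the genome length (58,072 bp) is the value stated in the abstract, and the coordinates are those listed for PhiH1_005, PhiH1_010 and PhiH1_485. It assumes 1-based inclusive coordinates and a circular (or terminally redundant) genome.

```python
GENOME_LENGTH = 58_072  # bp, as stated in the abstract


def cds_length(start: int, stop: int, genome_length: int = GENOME_LENGTH) -> int:
    """Length of a CDS given 1-based inclusive coordinates.

    Hypothetical helper: if stop < start, the CDS is assumed to run past the
    end of the sequence and continue at position 1.
    """
    if stop >= start:
        return stop - start + 1
    # Part up to the genome end, plus the part from position 1 to stop.
    return (genome_length - start + 1) + stop


if __name__ == "__main__":
    print(cds_length(115, 717))     # PhiH1_005
    print(cds_length(710, 2371))    # PhiH1_010 (terL)
    print(cds_length(57_827, 63))   # PhiH1_485 (terS), wraps around the end
```

For terS, the 246 bp from position 57,827 to the genome end plus the first 63 bp give the 309 bp listed in Table 1, which is a multiple of three as expected for a complete reading frame.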
Table 2. CRISPR spacers matching phiH1.
No. | CRISPR Spacer Matches to phiH1 ¹ | Translation ²
(Table body is supplied as an image in the source: Genes 09 00493 i001.)
¹ The matching spacer sequences were found in the following NCBI BioProjects using the crass program: PRJNA337743 (SRA SRR4030040; Alviso Ponds, San Francisco, CA, USA; metagenome); PRJNA245787 (Halostagnicola sp. A56 26 genome; Andaman Islands, India); PRJEB18068 (Lake Meyghan, Iran; metagenome). Aligned sequences show nt positions for phiH1, and asterisks indicate identical bases. DR: direct repeat (with the haloarchaea containing the most closely matching DR shown in brackets).
² Symbols under the alignment (*:.) indicate identical, similar and weakly similar residues, respectively (based on the Gonnet PAM 250 matrix).

Share and Cite

MDPI and ACS Style

Dyall-Smith, M.; Pfeifer, F.; Witte, A.; Oesterhelt, D.; Pfeiffer, F. Complete Genome Sequence of the Model Halovirus PhiH1 (ΦH1). Genes 2018, 9, 493. https://doi.org/10.3390/genes9100493