Elusive Recurrent Bacterial Contamination in a Diatom Culture: A Case Study
Round 1
Reviewer 1 Report
Generally, information about failures is rarely shared because of the difficulties involved, and scientists hesitate to publish such results. In the present study, the authors examine a series of NGS datasets from diatom strains cultured continuously in their laboratory and find a large number of contaminating bacterial reads in the prepared libraries. As the authors note, this information may provide guidelines for future work on axenic diatom cultures and for managing contamination. I agree that sharing failures with the scientific community benefits scientists who lack experience with specific experiments. However, I have several concerns about this manuscript, especially regarding data presentation.
- This manuscript contains two main figures and one table. However, they are poorly organized and do not clearly convey the authors' message. Fig. 1 shows the number of bacterial reads in the transcriptome data of F. radians. Why do the authors show only these NGS data? Table 1 lists more than ten NGS runs. What about the other runs?
- The interpretation of Fig. 2 is insufficient. Why does only G9_illumina show a different pattern, and on what basis do the authors conclude that it was contaminated? The figure is hard to read and difficult to understand.
- The authors performed 16S rDNA amplicon sequencing, but the data are not presented in the manuscript. Which bacteria are present other than Sphingomonas? If the topic of this manuscript is contamination management, the abundance and identity of the contaminants should be presented and discussed.
Minor points
- Table 1 lacks information on the type of sequencing analysis (DNA resequencing or transcriptome analysis).
- "104 cells/mL" on p. 2 should read 10⁴ cells/mL (the 4 should be a superscript).
- NGS data should be deposited in the GenBank/EMBL/DDBJ databases.
- Emmelien Vancaester's data should be described in more detail (p. 3).
- "16 mmol/m2" on p. 3 should read 16 mmol/m² (the 2 should be a superscript).
- The content of ref. 16 should be briefly summarized in the text (p. 4).
Author Response
(Note to editors: the reviewer's comments did not include a point 4. We have preserved the original numbering, which is why the response to point 5 follows directly after the response to point 3.)
1. In our opinion, the interpretation of transcriptomic read mapping is fundamentally different from that of genomic reads. By their nature, transcriptomic reads should predominantly map to exons, while genomic ones should map more or less evenly all over the genome, regardless of coding status of any particular region. Thus, it makes sense to measure transcriptomic mapping in terms of reads per gene, while genomic mapping is measured in reads per contig, but not vice versa. With these two mappings being essentially measured in different units, we have decided to show them separately: transcriptome in Figure 1 and genomic libraries in Table 1.
Further, Fig. 1 and Table 1 make different statements about the underlying biological reality: the former shows that the genes in question, unlike the majority of other genes present in the assembly, are not a part of F. radians active gene repertoire (at least as observed in 2019-2020 under conditions used in our transcriptomic experiment). By itself, this fact doesn’t prove that they are not part of the genome, because they could’ve been pseudogenized or expressed only under some exotic conditions. This is unlikely in light of other evidence, but not disproven by transcriptome alone.
The mapping of genomic reads, on the other hand, directly shows that the scaffolds in question are not evenly covered by all sequencing runs, even within a single strain. Unlike the transcriptomic results, these results have no plausible explanation that does not involve contamination.
Thus, the difference between genomic and transcriptomic data warrants separate representation and separate discussions. The corresponding paragraphs were rewritten to emphasize this difference.
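The per-run coverage comparison described in this response can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes mapped read counts per scaffold per run have already been obtained, normalizes them RPKM-style (reads per kilobase of scaffold per million library reads), and flags scaffolds whose coverage differs between runs by more than a hypothetical fold threshold. The function names, the threshold, and the toy data are all assumptions.

```python
def normalized_coverage(mapped_reads, scaffold_len_bp, total_reads):
    """Reads per kilobase of scaffold per million library reads (RPKM-style)."""
    return mapped_reads / (scaffold_len_bp / 1e3) / (total_reads / 1e6)


def flag_uneven_scaffolds(counts, lengths, totals, fold=10.0):
    """Flag scaffolds whose normalized coverage varies more than `fold`-fold across runs.

    counts:  {scaffold: {run: mapped_reads}}  (a missing run means 0 reads)
    lengths: {scaffold: length_bp}
    totals:  {run: total reads in that library}
    fold:    hypothetical tolerance; genomic runs of one strain should agree closely
    """
    flagged = []
    for scaffold, per_run in counts.items():
        cov = [
            normalized_coverage(per_run.get(run, 0), lengths[scaffold], total)
            for run, total in totals.items()
        ]
        low, high = min(cov), max(cov)
        # A scaffold absent from some runs but present in others is maximally uneven.
        if high > 0 and (low == 0 or high / low > fold):
            flagged.append(scaffold)
    return flagged


# Toy example: one scaffold covered evenly by runs A-C,
# and one scaffold that appears only in run A.
counts = {"diatom_1": {"A": 1000, "B": 2000, "C": 1500}, "sphingo_7": {"A": 800}}
lengths = {"diatom_1": 50_000, "sphingo_7": 40_000}
totals = {"A": 1_000_000, "B": 2_000_000, "C": 1_500_000}
print(flag_uneven_scaffolds(counts, lengths, totals))  # ['sphingo_7']
```

Normalizing by both scaffold length and library size is what makes runs of different depth comparable; a real analysis would also need mapping-quality filtering, which this sketch omits.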
2. Fig. 2 shows the read coverage of the 16S alignment. For most libraries, this coverage forms similar, roughly U-shaped curves that correspond to random placement of reads. The curve is not flat because gaps are not uniformly distributed along the alignment: the seemingly low-coverage central region corresponds to the more variable V2-V6 regions, which contain numerous gaps (and which matter most in practice to database creators because of their importance for metabarcoding). For library GKAKADZ02, on the other hand, the curve is different: it contains the same two peaks at the edges (they are lower, but this is just an artifact of coverage normalization), but there is also a large, sharply defined plateau in the middle. This means that the library contains many reads that start and end at specific, precisely defined positions, instead of being distributed more or less randomly. A normal shotgun library should not produce such a plateau, but an SSU amplicon should. Taken together with elevated bacterial diversity and the absence of non-16S bacterial DNA, this result suggests contamination with such an amplicon.
The text was rewritten to explain the idea behind this analysis more clearly.
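The plateau argument can be illustrated with a short sketch. It assumes, hypothetically, that each read aligned to the 16S reference has been reduced to a half-open (start, end) interval in alignment coordinates; the helper names and toy reads are illustrative and not part of the authors' workflow. Randomly placed shotgun reads rarely share endpoints, whereas amplicon reads pile up on the same (start, end) pair and raise depth uniformly across the amplified region.

```python
from collections import Counter


def coverage_profile(reads, length):
    """Per-position depth from half-open (start, end) read intervals."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def shared_endpoint_fraction(reads):
    """Fraction of reads sharing the single most common (start, end) pair."""
    if not reads:
        return 0.0
    _, n = Counter(reads).most_common(1)[0]
    return n / len(reads)


# Toy data: 50 evenly staggered "shotgun" reads, then the same reads
# plus 50 identical "amplicon" reads spanning positions 300-800.
shotgun = [(i * 10, i * 10 + 100) for i in range(50)]
mixed = shotgun + [(300, 800)] * 50

print(shared_endpoint_fraction(shotgun))  # 0.02
print(shared_endpoint_fraction(mixed))    # 0.5
print(coverage_profile(shotgun, 1000)[500], coverage_profile(mixed, 1000)[500])  # 9 59
```

The jump in depth between positions 300 and 800 in the mixed library is the toy analogue of the plateau: it comes entirely from reads with identical endpoints.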
3. Complete taxonomies for all libraries are available in Supplementary Table 1. They include unclassified and chloroplast-derived reads for the samples from 2020 and 2021, as well as Sphingomonadales in the sample from 2017. In our opinion, the remaining phylotypes can be safely ignored: they consist of very few reads (the largest, cand. Udaeobacter in the 2021 sample, has 7 reads out of the library's 27.4 thousand, and the vast majority are singletons), and together they account for under 0.5% of each library. Such small phylotypes are often excluded from downstream metabarcoding analyses because they are unreliable. We decided that it would be more prudent to present their composition in the Supplement, if only as an example of false-positive observations. However, we do not believe that they merit more than a brief mention in the main text.
Excluding these spurious phylotypes and unclassified reads, amplicons from 2020 and 2021 are purely organellar, and the sample from 2017 also includes Sphingomonas sp. (but no other bacteria). The text was edited to state this more explicitly.
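The exclusion of rare phylotypes can be expressed as a simple relative-abundance filter. In this sketch the 0.1% cutoff, the function name, and the toy counts are assumptions chosen for illustration, not values or methods taken from the manuscript; only the library size of about 27.4 thousand reads and the 7-read phylotype echo the response above.

```python
def split_phylotypes(counts, min_fraction=0.001):
    """Split taxa into kept and dropped by relative abundance.

    counts: {taxon: read count}; min_fraction is a hypothetical cutoff.
    Returns (kept, dropped, fraction of library reads that were dropped).
    """
    total = sum(counts.values())
    kept = {t: n for t, n in counts.items() if n / total >= min_fraction}
    dropped = {t: n for t, n in counts.items() if n / total < min_fraction}
    return kept, dropped, sum(dropped.values()) / total


# Toy library of 27,400 reads: one dominant organellar phylotype,
# one 7-read phylotype, and 13 singletons (all names are placeholders).
counts = {"organelle": 27_380, "rare_7reads": 7}
counts.update({f"singleton_{i}": 1 for i in range(13)})

kept, dropped, frac = split_phylotypes(counts)
print(list(kept), len(dropped), round(frac * 100, 3))  # ['organelle'] 14 0.073
```

Reporting the dropped fraction alongside the filtered table makes it easy to check a claim such as "together they account for under 0.5% of each library".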
5. As stated in its description, Table 1 only includes genomic libraries. Transcriptomic data are described separately in previous paragraphs and Fig. 1 (see our response to question 1). For this reason, we do not believe that it is necessary to separately note the genomic nature of each library.
6. The typo was corrected.
7. As stated in the “Data availability” section, all sequencing data were deposited in NCBI SRA under the following BioProject IDs:
PRJNA762154: 16S amplicons produced in this work
PRJNA484600: transcriptomic libraries for F. radians strains Ax BK280 and A6 (the latter was not used in this paper)
We have also added the reference for the original genomic libraries:
PRJNA764820: genomic reads for F. radians strains G9 and Ax BK280 (not released publicly, pending the release of this paper)
8. The data provided by Dr Emmelien Vancaester consisted solely of a list of genes/scaffolds that they had identified as Sphingomonas. We have added the total numbers of Sphingomonas-derived genes and scaffolds to the text. A more detailed description of their research is also available in the Introduction and Results.
9. The typo was corrected.
10. The paragraph was expanded to include more details of ref. 16.
Reviewer 2 Report
The presented work addresses a relevant topic: monitoring axenic cultures of organisms for hidden (invisible) bacterial contamination by analyzing whole-genome sequencing data. While analyzing whole-genome data from an axenic culture of the Baikal diatom Fragilaria radians, the authors identified genomic scaffolds derived from bacteria of the genus Sphingomonas. The authors' analysis scheme can be used in other studies on the control of axenic cultures.
A number of important comments can be made on the work.
- The authors should provide DAPI-stained micrographs of the F. radians starting material used to isolate cells before the culture was established, as well as DAPI-stained micrographs of the culture during cultivation, to demonstrate the absence of obvious microbial contamination.
- The authors do not provide any information on the degree of homology between genes on the putative Sphingomonas scaffolds from their genomic assemblies and reference genomes of the genus Sphingomonas in NCBI RefSeq or other databases. Do the Sphingomonas scaffolds from the axenic F. radians culture contain full-length 16S rRNA genes, and how closely related are these genes to other full-length 16S rRNA genes in the databases? All these data must be added to the work and, if possible, visualized as phylogenetic reconstructions.
- It is not clear from the work whether the Sphingomonas scaffolds from different assemblies (from different sequencing rounds, platforms, and cultures) were compared for similarity. Did these scaffolds differ in GC content, k-mer frequencies, or sequence similarity of shared genes (if the same genes were found in scaffolds of different assemblies)? Is it possible to say that the same strain (or species) of Sphingomonas was encountered in different cultures and in different rounds of sequencing? The authors need to answer these questions in their work.
- Is it possible, based on a functional analysis of the genes on the Sphingomonas scaffolds, to suggest whether this microorganism is an intracellular parasite or symbiont of F. radians? Such parasitic microorganisms are often found in nature. If Sphingomonas is an intracellular parasite of F. radians, it would be understandable why it could not be eliminated during culture preparation or detected by microscopy.
Once these comments have been addressed and the work has been re-reviewed, it can be published in the journal.
Author Response
1. Microphotographs of cells used in production of contaminated libraries (both in 2011-2012 and 2017) were added as Fig. 3.
2. Full-length Sphingomonas 16S rRNA is present in the genomic assembly. Although it has multiple ≥99% identity hits in both SILVA reference alignments and NCBI nr, all these hits come from uncultured bacteria and do not have a precise taxonomy. Thus, we make no claims regarding the exact species of the contaminant.
The text was edited to include this observation. We have also performed phylogenetic analyses of 16S sequences, described in our response to question 3.
3. The entire analysis is centered on a single assembly, namely the one published in 2015 by Galachyants et al. [2]. Further, all Sphingomonas-containing libraries were found to come from a single sequencing platform (454 Roche), a single sequencing facility (the one in Limnological Institute SB RAS), and a relatively brief period of time (February 2011 to October 2012). Since none of the contaminated libraries alone has enough Sphingomonas material to assemble a complete genome, no genome-scale comparisons could be performed. The only other observation of Sphingomonas sp. in a supposedly axenic culture was made by 16S sequencing of 2017 biomass. There is also an earlier description of Sphingomonas (or possibly Novosphingobium) co-isolated with F. radians before axenization (Zakharova et al. 2010, added to the paper as [25]).
To clarify the relationships between these three strains, phylogenetic trees were built using assembly-derived 16S from 2011-2012, OTU consensus sequences from 2017, and Sanger 16S sequence from 2010. The trees show that contamination in 2011-2012 and 2017 was definitely caused by two different strains. There is some uncertainty as to whether the strain from 2011-2012 is sister to that observed in 2010, but we believe that it is more likely to be unrelated.
These results were added to the paper. Both phylogenetic trees were added as Suppl. 2, and a simplified version of Suppl. 2A was added to the main text as Fig. 4.
4. We are certain that Sphingomonas’ hypothetical intracellular lifestyle wouldn’t be able to prevent it from being detected. Both diatom strains were used in numerous cytological studies involving epifluorescent microscopy and transmission electron microscopy (e.g. https://doi.org/10.3390/d13100469), in which it would be impossible to miss an intracellular parasite. DAPI also stains the DNA within a diatom cell (for example, its own genome is visible in most photographs on Fig. 3, in the middle of the cell), so it is equally useful in detecting extra- and intracellular bacteria.
As for genome-based approaches, we see no reliable way of proving (or disproving) a parasitic lifestyle using the available data. Parasitism is usually associated with genome reduction and loss of metabolic pathways, but according to BUSCO analyses, the incomplete Sphingomonas genome recovered from the F. radians assembly lacks approx. 15% of the expected genes. It would be very difficult to distinguish true gene losses (which should be relatively few, since most members of this genus are free-living and any transition to parasitism would therefore have to be very recent) from apparent absences caused by insufficient data.
Round 2
Reviewer 2 Report
I carefully read the new version of the manuscript and the authors' responses to my comments. The authors made all the necessary corrections and additions in their work. The manuscript can be accepted for publication.