2.1. Sampling in the Solfatara Hot Springs, mDNA Extraction, and Sequencing
The aim of this study is to explore the microbial communities populating three sites of the Pisciarelli hot springs (40°49′45.1″ N; 14°8′49.4″ E), named Site A, Site B, and Site C (
Figure 1A) and investigate their differences in terms biodiversity and potential source of enzymes.
Site A (94.1 °C; pH 5.2) is the largest pool of the area, mainly made by water and mud. Recently, the microbial community populating Site A (then called Pool 1) has been characterized through a metagenomic approach showing the dominance of Archaea belonging to the genera
Acidianus (40%) and
Pyrobaculum (25%), as well as the presence of sequences related to new phyla ascribable to the Sulfolobaceae family. A dramatic morphological change in the area of interest, which occurred at the end of July 2019, led to the expansion of the area previously known as Pool 1, which merged to the proximal Pool 2 pool (92 °C, pH 1.5), generating a completely new site (Site A). This new pool, if compared to Pool 1, changed both in terms of extension of the pool boundaries and in terms of temperature, showing, in particular, an increase of about 9 °C compared to the conditions previously observed [
20].
Site B (47.7 °C; pH 5.8) is mainly composed of mud and is physically adjacent to Site A from which it receives part of the liquid fraction. Nonetheless, Site B has its emission of gas, which can be observed through the formation of bubbles that contribute to the mixing of the liquid on the surface.
Site C (73.0 °C; pH 2.5) differently, is physically distant from Site A and Site B and is a shallow pool of water located near to the rocky wall of the area (
Figure 1A) and characterized by intense steam jets.
Temperature and pH were monitored in-situ at the three sites and the samples, composed of water and sediment, were collected and taken to the laboratory where, by centrifugation, sediments of 17 g, 50 g, and 40 g were obtained from Sites A, B, and C, respectively.
The sediment obtained from each sample was treated for the extraction of the whole mDNA obtaining 27, 60, and 164 ng/g of sediment from Site A, Site B, and Site C, respectively (
Figure 1B,C).
The mDNA was then sequenced in outsourcing by Novogene-Europe (Cambridge, UK) through Illumina MiSeq (150 PE), obtaining 23,830,104 clean reads from Site A, 22,933,864 from Site B, and 23,961,446 from Site C.
2.2. Microbial Communities
To evaluate the composition of the microbial communities populating the three sites, the obtained reads were analyzed by blastn against the NCBI NT nucleotide database.
The analysis revealed that all three sites are dominated by Archaea, and Site A had the highest number of reads (84%) assigned to this kingdom, followed by Site B (67%) and Site C (53%) (
Figure 2). It is worth noting that each site showed a high number of reads that had no match in the NT database (unassigned). Notably, unassigned reads represented 15, 27, and 41% of the whole reads of Site A, B, and C, respectively. Furthermore, Site B had the highest number of reads assigned to the kingdom of Bacteria (5%) compared to the Sites A and C, where less than 1% of sequences could be assigned to this kingdom. Site C, on the contrary, showed a percentage of viral sequences (4.6%) higher than Sites A and B (0.3 and 0.8%, respectively).
The detailed investigation of the taxonomically assigned reads, at the level of genus and species (
Figure 3 and
Figure 4), revealed that Site A is mainly dominated by the genus
Acidianus (47%), in particular the species
A. ambivalens (30%) and
A. hospitalis (16%), followed by the genus
Pyrobaculum (35%) mainly attributable to the species
P. arsenaticum (31%). This result differs from what was previously observed on the Pool 1 site in which we observed the dominance of
A. hospitalis (40%) followed by
P. arsenaticum (20%) [
20].
A notable difference between the result observed in Site A and the previous study consists in the percentage of reads of unknown origin, which decreased from 32% in Pool 1 to 15.4% in Site A. This variation indicates a change in biodiversity of the microbial community populating the pool as an effect of the geothermal events that occurred between March 2012 and July 2019 (Pool 1 Biosample ID: SAMN09692669).
Site B, alike Site A, also showed
Acidianus (40%) as the dominant genus, followed by
Pyrobaculum (25%) (
Figure 3) with a relative abundance of species comparable to what was observed in Site A (
Figure 4). This parallelism can be explained considering that the two sites are not physically distant to each other and that the liquid fraction present in Site B was partially provided by the proximal Site A (
Figure 1A). Indeed, the main difference between Site A and Site B is that in the latter, 5% of the reads were assigned to bacterial origins and the number of sequences not assigned to known microorganisms was 1.7-fold higher than those present in Site A.
Reads assigned to the genus
Acidianus were also present in the sample from Site C. These, however, represented only 18% of the total reads, while the dominant genus was represented by
Metallosphaera (33%) (
Figure 3), mainly attributable to the species
M. prunae, which is known to grow at temperatures between 55 and 80 °C and pH range 1.0–4.5 [
25]. This result is in line with what was previously observed in another extremely acidic site of the Pisciarelli hot spring, named Pool 2 (92 °C, pH 1.5) (Pool 2 Biosample ID: SAMN09692670), where
Metallosphaera sedula was the dominant species [
20]. Nonetheless, the most abundant component (41.5%) of Site C consisted of reads that did not match any known sequence.
Another remarkable difference between Site C and the other two sites was the number of viral reads identified in this sample. As mentioned above, 4.6% of Site C reads were taxonomically assigned to viruses. Among these, viruses mainly belonging to Bicaudaviridae (different variants of Acidianus two-tailed virus and disparate types of Sulfolobus monocaudavirus), Ampullaviridae (Acidianus bottle-shaped viruses), Fuselloviridae (Sulfolobus spindle-shaped viruses), and Ligamenvirales (Sulfolobales rod-shaped viruses) were identified.
It is known that viruses play a key role in horizontal gene transfer (HGT) in prokaryotes [
26]. Transfer of DNA has been shown to be involved in genome evolution and in adaptation to high temperatures [
27]. In particular, it has been proven that spindle-shaped fuselloviruses that infect
Sulfolobus and
Acidianus species can promote the virus-mediated HGT between different hosts [
28], contributing significantly to the dynamic of the prokaryotic genomes. Thus, the presence in high percentages of viral sequences in Site C might be attributed as a survival mechanism against rapid environmental changing of this extreme site. Therefore, the peculiar microbial composition of Site C is presumably related to the considerably more acidic pH value if compared to the other two mud pools.
2.3. Analysis of Bacteria Communities
To evaluate the bacterial communities present in Sites A, B, and C, the reads of the three samples were analyzed in detail in the NT database (
Table 1).
In Site A, 22,276 were assigned to bacteria (
Figure 5A) whose most abundant genera were represented by the mesophilic/moderately thermophilic bacterium
Acidithiobacillus (15%) and the hyperthermophilic
Hydrogenobacter (8%), while the remaining 78% belonged to different genera whose relative abundance was less than 7% (
Figure 5B). Regarding the genus
Acidithiobacillus it is important to highlight that, although this is mainly represented by mesophilic microorganisms, it also groups the moderately thermophilic
Acidithiobacillus caldus with an optimal growth pH between 2.0–2.5 and with an optimal temperature of 45 °C [
32].
As previously indicated, among the three sites, Site B showed by far the highest percentage of bacterial reads (1,188,674). The analysis of the relative abundances of bacteria present in the microbial community of Site B (
Figure 5B) allowed us to identify the hyperthermophiles
Thermoanaerobacter (26%) and
Caldanaerobacter (9%), the thermophile
Thermoanaerobacterium (10%), and
Acidithiobacillus, which, unlike in Site A, was much less abundant here (4%). It is important to note that one of the most abundant bacterial genus present in Site B was
Thiomonas (14%), which, although generally grouping mesophilic species, also includes moderately thermophilic species identified in geothermal sources at ~45 °C and able to grow at temperatures up to 50 °C and in the pH range 4.0–7.0 [
29,
30,
31].
In Site C, 141,784 reads were assigned to the kingdom of Bacteria. Of these, 15 and 52% were assigned to the genera
Shigella and
Escherichia, respectively (
Figure 5A). Since both are mesophilic gammaproteobacteria whose natural habitat is the human and animal gut [
33], these reads were considered as environmental contaminations and not taken into account for the purpose of evaluating the bacterial population.
Among the remaining 46,141 reads of Site C, the most abundant genera were
Clostridium (10%) and
Aeromonas (8%). The
Clostridium genus includes obligate anaerobic bacteria and Gram-positive bacteria and are capable of forming spores in adverse environmental conditions, which populate soil, sand, rivers, swimming pools, river bank mud, and marine sediments [
34]. Unfortunately, the low number of reads from Site C, associated with this genus, has not allowed for a more detailed taxonomic annotation, and it is therefore currently impossible to trace the species present in the sample. However, recently, two studies on microbial communities populating Malaysian hot springs (temperatures range 50–110 °C) [
35] and five hot springs in Eritrea (temperatures between 45 and 100 °C) [
36] revealed the presence of various human pathogens, including
Clostridium spp. and
Aeromonas. In addition, a novel thermophilic
Clostridium species (
C. thermarum) from a thermal spring in China has recently been identified and characterized [
37].
2.4. Assembly, Clustering, and Taxonomic Analysis of Unassigned Reads
To identify possible chunks of individual genomes present in the three samples, all the reads were separately assembled by MEGAHIT [
38] obtaining 6296, 38,136, and 16,854 contigs in Sites A, B, and C, respectively (
Table 2).
Contigs with a length ≥1000 bp were analyzed by MyCC [
39], thus allowing the identification of 25 clusters in Site A, 21 clusters in Site B, and 16 clusters in Site C (
Figure S1).
The clusters obtained were then analyzed by CheckM [
40], which allowed us to validate those with completeness values ≥20%, obtaining five clusters in Site A (7, 12, 17, 22, and 23), fifteen clusters in Site B (2, 3, 4, 6, 7, 8, 9, 11, 13, 14, 15, 16, 18, 19, and 20), and five clusters in Site C (1, 4, 5, 8, and 10) (
Tables S1 and S2).
To obtain a taxonomic assignment, the validated clusters were analyzed by Diamond (in blastx mode) using the NCBI Refseq Protein database [
41] (
Table 3). The result of this analysis made it possible to note the Site A clusters as belonging to the
Crenarchaeota phylum, in particular related to the genera
Acidianus,
Pyrobaculum, and
Desulfurococcus.
Site C, on the contrary, was characterized by clusters entirely related to Acidianus spp. and by two clusters (4 and 5) with a highly heterogeneous assignment that prevented the identification of a dominant species.
Differently, the clusters identified in Site B were mainly assigned to bacteria belonging to the phyla Firmicutes and Proteobacteria, confirming what was already observed from the taxonomic analysis of the reads. Three of the clusters of Site B were assigned to the Crenarchaeota phylum, related to Desulforococcus spp. and P. arsenaticum.
To identify the origin of the reads that had no match against the NCBI NT nucleotide database, these reads were aligned using Bowite2 [
42] (
Table 4). Regarding the unassigned Site A reads, most of them were aligned with clusters 9, 22, and 23. The first two clusters were assigned to the genus
Acidianus (
Table S1) and presumably represented the result of HGT, as previously reported [
20].
Instead, cluster 23 was assigned to the genus Hydrogenobacter, suggesting the possible presence of microorganisms not yet identified belonging to the Aquificaceae family.
As regards the unassigned Site B reads, these were aligned mainly against clusters 7 and 13. While cluster 13 was assigned to Pyrobaculum arsenaticum, indicating also in this case a probable HGT event, cluster 7 was instead taxonomically identified only at the family level as Sulfolobaceae.
The unassigned reads of Site C represented a completely special case. Indeed, these mostly aligned (>85%) to cluster 10, whose taxonomic analysis was classified as related to the genus
Acidianus. Observing the contamination value of cluster 10 (45%) (
Table S1), with which CheckM indicated the percentage of the expected number of duplicate single-copy markers, it was legitimate to assume that this cluster had grouped contigs belonging to species of
Acidianus not yet identified.
2.6. Functional Annotation and CAZome Analysis
To assess the metabolic potential of the microbial consortia populating the three sites, the contigs obtained by the assembly were analyzed by Prodigal [
44] identifying 14,933 ORFs on Site A, 81,938 ORFs on Site B, and 31,179 ORFs on Site C. Then, the amino acid sequences of the identified ORFs were functionally classified using the COG and SEED databases (
Figure 6 and
Figure 7).
The analysis of the three samples showed an average comparable distribution of the functional categories reported in both databases. However, by observing in more detail the classification according to the COG database (
Figure 6), it is possible to observe a marked difference in relation to the sequences assigned to the functional category “Signal transduction mechanisms” (Category T), where the percentage of ORFs of Site B was two-fold greater than those of Sites A and C. A more in-depth analysis of the ORFs assigned to this category revealed that this difference was mainly due to the high number of ORF annotated histidine kinases of bacterial origin.
In addition, the annotation using the SEED database (
Figure 7) showed an average homogeneous distribution between the functional classes, but there were clear differences in relation to the category “Protein Metabolism”, more abundant in Site A; “Carbohydrates”, more abundant in Site C, where it was also the most represented category; as well as in the categories “Mobility and Chemotaxis” and “Dormancy and Sporulation”, in which Site B clearly dominated the other two. In particular, regarding the “Dormancy and Sporulation” category, this was mainly composed of ORFs annotated as Stages 0, I, II, III, IV, and V sporulation proteins, confirming the presence of different sporogenic bacteria in this site.
As for the “Mobility and Chemotaxis” category, the main differences were related to the presence of ORFs of bacterial origin involved in the structure and mobility of the flagellum, while in Site C, the ORFs annotated as “Archaeal Flagellum” and “Bacterial Chemotaxis” (dipeptide-binding ABC transporter) were most abundant.
In all three samples, the largest number of ORFs functionally annotated belonged to the categories “Carbohydrate Transport and Metabolism” and “Carbohydrate” of COG and SEED, respectively. A similar result was previously observed in Pool 1 and Pool 2 and related to the abundant vegetation around the Pisciarelli thermal spring, rich in starch, hemicellulose, and pectins, which could represent an available carbon source for the microbial communities populating these geothermal sites [
20].
To map the difference of the enzymatic activities involved in the synthesis, degradation, and modification of carbohydrates (CAZymes) in the Sites A, B, and C, the ORFs of the three samples were analyzed by dbCAN2 [
45]. The assessment of the taxonomic origin of the identified CAZymes revealed that in Sites A and C, the highest number of CAZymes belonged to the phylum of the Crenarchaeota (76 and 93% respectively), while 71% of the CAZymes identified in Site B belonged to the phylum of the Firmicutes (
Table 6).
The genus analysis (
Figure 8) indicated a higher number of
Acidianus-related CAZymes for Sites A and C. In addition, while Site A had numerous activities related to
Pyrobaculum and
Desulfurococcus and to several (hyper)thermophilic bacteria of the phylum Aquificae (
Hydrogenobacter,
Thermocrinis, and
Aquifex), the CAZymes of Site C were mainly assigned to the genera
Metallosphaera, Saccharolobus, and
Sulfolobus. Differently, the greater number of CAZymes identified in Site B belonged to the thermophilic bacteria of the genus
Thermoanaerobacterium,
Thermoanaerobacter,
Caldanaerobius,
Caldanaerobacter, and
Desulfotomaculum (
Figure 8).
However, although it was possible to annotate the identified CAZymes at the genus level, less than 50% of these had an identity ≥95% compared to sequences already present in the Refseq Protein Database, indicating the presence of new sequences related to carbohydrate active enzymes (
Figure 9,
Tables S4–S6).
Glycosidases (GHs) represented 28%, 43%, and 27% of the CAZymes identified in Site A, Site B, and Site C, respectively. Among the families of GHs common in all three sites (
Figure 10,
Table 7), GH13, GH15, GH31, GH57, GH122, and GH133 were identified, which groups enzymes mainly active on α-glycosidic bonds, indicating the presence of pathways of degradation of starch and amylopectins used as energy reserves in plants around the area (mainly ferns, dicots, and grass) [
20].
Among the identified GHs, 48 of these were present only in Site B (
Table 7). The analysis of aminoacidic sequences showed that these were associated exclusively with (hyper)thermophilic bacteria with identities between 33% and 100% compared to the sequences present in the NCBI NR database (
Table S5).
A large fraction of the identified CAZymes was annotated as glycosyltransferase (GTs). In particular, in Sites A and C they represented the most abundant activity class (62% and 58% of the CAZymes, respectively), while in Site B GTs were only 38%.
Eight hypothetical families of GTs were identified exclusively in Site B (
Figure 11A), namely GT1, GT13, GT27, GT41, GT76, GT81, and GT104, which are involved in glycosylation mechanisms of proteins and peptides, and GT47, which groups heparan β-glucuronyltransferase, xyloglucan β-galactosyltransferase, heparan synthase, and arabinan α-L-arabinosyltransferase. The aminoacidic sequence of the ORFs classified in these families revealed identity percentages between 48% and 99% compared to the sequences present in NR database, and a bacterial origin related to the genera
Acidithiobacillus,
Anaeromusa,
Caldanaerobacter,
Desulfofarcimen,
Desulfotomaculum,
Desulfurella,
Planifilum,
Syntrophorhabdus,
Thermoanaerobacter,
Thermoanaerobacterium, and
Thiomonas.
In addition, with regard to carbohydrate esterases (CEs) (
Figure 11B), Site B showed unique families (CE7, CE12, CE15, and CE16) which group putative acetyl xylan esterases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases, 4-O-methyl-glucuronoyl methylesterases, and acetyl-mannan esterases with identity percentages between 58 and 100% with CEs identified in members of the genera
Thermoanaerobacter and
Desulforella (
Table S5), and which might be involved in the metabolism of hemicellulose polysaccharides.
Site B has also been shown to be particularly rich in hypothetical carbohydrate-binding modules (CBMs), in auxiliary activities (AAs) and in polysaccharide lyases (PLs) (
Figure 11C,D). The hypothetical CBMs identified exclusively in Site B are mainly involved in the degradation of starch and amylopectin (CBM20, CBM25, CBM41) and of cellulosic and hemicellulosic polysaccharides (CBM6, CBM22, CBM23, CBM32, CBM54, CBM59) with an identity percentage between 43% and 99% with bacterial sequences, mostly associated with the
Thermoanaerobacter and
Caldanaerobacter genera (
Table S5). In contrast, the only exclusive CMB family identified in Site A was CBM4, which groups specific modules for xylan, β-1,3-glucan, β-1,3-1,4-glucan, β-1,6-glucan, and amorphous cellulose but not crystalline cellulose. The single sequence in Site A, annotated as CBM4, shows 97% identity with the cellulase C (GH16) from
Cellvibrio mixus containing a CBM4 [
46].
The hypothetical AAs present in all three sites belong to the families AA6 (1,4-benzoquinone reductases) and AA7 glucooligosaccharide oxidases and chitooligosaccharide oxidases, showing >80% identity in Sites A and C with sequences mainly associated with the phylum of the Crenarchaeota. In Site B, the sequences of which more than 50% of identity have mainly been attributed to the bacterial phyla of Firmicutes and Proteobacteria as well as for the families AA1, AA2, and AA3, were identified exclusively in this sample.
Finally, only in Site B ORFs annotated as PLs were found. Among these, in particular, there was a sequence assigned to the PL10 family, which includes pectate lyase, and three sequences assigned to the PL15 family, which includes alginate lyase, oligoalginate lyase/exo-alginate lyase, heparin lyase I, and heparin lyase III. While the only identified PL10 showed a low identity (47%) with the pectate lyase from
Pelosinus sp.
UFO1, the three ORFs classified as PL15 showed identities between 90 and 99% with hypothetical proteins associated with the thermophilic genera
Thermoanaerobacter and
Caldanaerobacter (
Table S5).
The remarkable number of sequences encoding putative CAZymes makes the Pisciarelli microbial population an attractive source of novel thermophilic biocatalysts for industrial applications. The functional annotation here described represents a preliminary survey, but already promising a relevant biochemical potential of the microbial consortia of the different geothermal sites in the Pisciarelli area. Indeed, future research will carry out more detailed studies on these extremophilic communities and their CAZomes in order to deepen their knowledge and exploit their biodiversity in biotechnological processes.