2.1. Enzyme Characterization of S. enterica CpdB Protein
This CpdB protein, devoid of its signal sequence, was overexpressed in BL21 cells from plasmid pGEX-6P-3-S.enter_CpdB which encodes a fusion protein GST-CpdB. The recombinant enzyme present in cell lysate supernatants was purified by affinity to GSH-Sepharose and recovered free of the GST tag by specific proteolysis with PreScission protease. An extensive enzymatic characterization was performed using 3′-nucleotides, 2′,3′-cNMP, linear dinucleotides, cyclic di-, tetra- and hexanucleotides, among other substrates. The cyclic oligoadenylates c-tetra-AMP and c-hexa-AMP are second messengers produced by type III CRISPR-Cas systems [
43], and there is little data about phosphodiesterases hydrolyzing them other than the so-called ring nucleases [
44,
45,
46]. So far, c-tetra-AMP and c-hexa-AMP had not been tested as substrates of CpdB-like enzymes.
With all the substrates of CpdB, except 2′,3′-cGAMP, saturation kinetics were studied by assaying initial rates of phosphohydrolysis at different substrate concentrations. Estimates of kcat, KM and catalytic efficiency (kcat/KM) were obtained by non-linear regression, fitting the Michaelis–Menten equation to the experimental data. In the case of 2′,3′-cGAMP, the catalytic efficiency was estimated from initial rates that were directly proportional to substrate concentration. The results are shown in Table 1 in order of decreasing catalytic efficiency.
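The fitting step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in this study: the data, the enzyme concentration and the function name fit_michaelis_menten are hypothetical, and the optimizer (a golden-section search over KM, with Vmax solved by linear least squares at each trial KM) is only one of several ways to carry out the non-linear regression.

```python
import math


def fit_michaelis_menten(s, v, km_lo=1e-3, km_hi=1e4, iters=200):
    """Least-squares fit of v = Vmax * S / (KM + S); returns (KM, Vmax).

    For a fixed KM the model is linear in Vmax, so Vmax has a closed-form
    optimum; KM is then found by a 1-D golden-section search on log(KM).
    """
    def best_vmax(km):
        f = [si / (km + si) for si in s]
        return sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)

    def sse(log_km):
        km = math.exp(log_km)
        vmax = best_vmax(km)
        return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))

    a, b = math.log(km_lo), math.log(km_hi)
    g = (math.sqrt(5) - 1) / 2  # golden ratio conjugate
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = math.exp((a + b) / 2)
    return km, best_vmax(km)


# Hypothetical, noise-free data (NOT taken from Table 1):
# true KM = 50 µM, true Vmax = 2.0 µM/s.
substrate_uM = [5, 10, 25, 50, 100, 250, 500]
rates_uM_per_s = [2.0 * x / (50 + x) for x in substrate_uM]

km_fit, vmax_fit = fit_michaelis_menten(substrate_uM, rates_uM_per_s)

enzyme_uM = 0.001                      # hypothetical enzyme concentration
kcat = vmax_fit / enzyme_uM            # s^-1
efficiency = kcat / (km_fit * 1e-6)    # M^-1 s^-1 (KM converted from µM to M)
```

When the substrate concentration is far below KM, the Michaelis–Menten equation reduces to v ≈ (kcat/KM)[E][S], which is why a rate that is directly proportional to substrate concentration yields kcat/KM without separate estimates of kcat and KM.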
The catalytic efficiencies for the substrates tested ranged from very high (>10⁷ M⁻¹s⁻¹; near the diffusion rate limit) for 3′-nucleotides, 2′,3′-cNMP and the linear dinucleotide pApA, to low values (<10³ M⁻¹s⁻¹) for c-tetra-AMP, c-hexa-AMP, 3′,5′-cAMP, NDP-hexoses, 2′,3′-cGAMP, 5′-AMP and 2′-AMP. Actually, with the two latter compounds no activity was detected, highlighting the strict specificity of the enzyme for 3′-nucleotides. Between the two extremes of catalytic efficiency, there are twelve intermediate substrates with catalytic efficiencies in the range 10⁶–10⁴ M⁻¹s⁻¹. Among them, pGpG, 3′,3′-cGAMP, c-di-AMP and c-di-GMP are relevant, together with pApA, to the role of cyclic dinucleotides as possible intermediates in the interferon response of the infected host. Although they are clearly worse substrates than 3′-nucleotides and 2′,3′-cNMP, they cannot be disregarded because catalytic efficiencies of 10⁶–10⁴ M⁻¹s⁻¹ are around the average value of catalytic efficiencies in the enzyme universe (≈10⁵ M⁻¹s⁻¹) [47].
Taking into account the periplasmic location of CpdB, one would expect that it targets extracytoplasmic c-di-GMP. In this context, the hydrolysis of c-di-GMP by the periplasmic CpdB of
S. enterica, followed by the degradation of the pGpG product by the same enzyme, could explain the pro-virulence of the
cpdB gene [
12]. Removal of the secreted dinucleotide would hinder host immune response, a defense strategy similar to that followed by streptococci through the hydrolysis of bacterial secreted c-di-AMP to diminish the innate response of the infected host cells [
9,
10].
Another interesting aspect of CpdB is related to its high activity towards 2′,3′-cNMP. These compounds are formed by RNase I, an enzyme present in bacterial cytosol and periplasmic space [
48,
49,
50,
51]. Therefore, at least in the latter case, 2′,3′-cNMP formed by RNase I would be hydrolyzed by periplasmic CpdB. Recently, 2′,3′-cNMP have been proposed as a novel class of bacterial signals [
50,
51,
52,
53]. In
E. coli, they have clear physiological effects on gene expression, flagellar motility, biofilm formation and acid tolerance. In
S. enterica, despite the evolutionary closeness with
E. coli, the response to 2′,3′-cNMP is quite different. To begin with, out of the many genes that are dysregulated upon 2′,3′-cNMP depletion, only two of them show consistent changes in both species. In general, it can be said that there is little overlap in the respective cellular responses [
51]. In any case, the possible physiological impacts of extracytoplasmic 2′,3′-cNMP, and of their hydrolysis by periplasmic CpdB, are unknown.
2.3. Structural Comparison of CpdB-like Proteins of Different Bacteria
The specificity differences among the three CpdB-like enzymes studied should be the consequence of sequence and structural differences among the proteins. The protein alignment of
Figure 3 displays separately the differences between
S. suis SntA and
S. enterica CpdB (above the alignment), and those between
E. coli CpdB and
S. enterica CpdB (below the alignment). There are many differences between the sequences of
S. suis SntA and
S. enterica CpdB. The former is 813 amino acids long, and in the alignment only 283 of them are identical. Within the parts that align with
S. enterica CpdB, there are several gaps either in SntA or CpdB. The amino acid sequences of the CpdB proteins from
S. enterica and
E. coli are 90.3% identical. Both proteins are 647 amino acids long, and 584 of them are identical in the alignment. They align without any gap. The differences (
Figure 3) should be responsible for the different specificity of SntA versus CpdB (
Figure 1c), and for the higher efficiency of
S. enterica CpdB compared to
E. coli CpdB (
Figure 2c).
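As a quick check of the identity figures quoted above, percent identity can be computed from a pairwise alignment as sketched below. The function name percent_identity and the toy sequences are hypothetical, and the denominator chosen here (aligned columns with no gap in either sequence) is an assumption; other conventions divide by the full alignment length or by the length of one of the sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Toy alignment (hypothetical sequences): 5 identical residues in 7 gap-free columns.
toy_identity = percent_identity("MKT-AILV", "MKSQAILI")

# Figures from the text: 584 identical positions out of 647, aligned without gaps.
cpdb_identity = round(100 * 584 / 647, 1)  # 90.3
```

For the gap-free S. enterica versus E. coli alignment all conventions coincide, whereas for SntA versus CpdB, where gaps occur in both sequences, the value depends on the denominator chosen.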
Currently, there are no crystal structures available for any of the three proteins considered, and within the AlphaFold Protein Structure Database [
54,
55] there is a model only for
S. enterica CpdB (UniProt ID P26265; AF-P26265-F1-model_v4.pdb). So, to evaluate possible structural differences among the three proteins, we used homology models of
E. coli CpdB and
S. suis SntA prepared using the AlphaFold structure of
S. enterica CpdB as the template. The homology models were obtained in the Phyre2 server [
56].
To analyze how the differences among the three proteins can have some bearing on their specificity and catalytic efficiency, it is necessary to consider the dynamic events occurring during the catalytic cycle of the metallophosphatases that contain a 5_nucleotid_C domain (
Scheme 2). This is inferred from detailed studies of the 5′-nucleotidase UshA [
57,
58,
59,
60,
61,
62] and recently it has been extrapolated to CpdB [
8]. The 5_Nucleotid_C domain contains the substrate-binding pocket, which in the “open” conformation faces the medium. After substrate binding, this domain undergoes a 96° rotation towards the “closed” conformation, bringing the scissile linkage of the substrate to the catalytic dimetallic site of the metallophos domain where phosphohydrolysis takes place. This is the conformation shown in the models of
Figure 4.
The differences of sequence between
S. suis SntA and
S. enterica CpdB are too many to warrant a systematic analysis of all of them (there are 364 different amino acids within the aligned regions in
Figure 3). Therefore, attention was centered on the gaps arising in the alignment: 17 gaps in the SntA sequence and 6 gaps in the CpdB one. They are marked by upper red lines in the SntA sequence (
Figure 3) and colored in red in the 3D model (
Figure 4a; s1–s10). Related to these gaps, the SntA model presents structural variations with respect to CpdB, as can be confirmed by careful comparison of
Figure 4a with
Figure 4b. This is underscored in
Figure 4a by representing colored in orange the parts of CpdB that do not overlap with SntA.
Most of the structural differences between SntA and the CpdB proteins are located in the 5_Nucleotid_C domain (s4–s10;
Figure 4a), which is responsible for substrate binding in the open conformation (not shown), and undergoes the large rotation needed to bring the substrate to the catalytic site (
Scheme 2). Several of the structural differences occur in regions near the active site in the closed conformation (s2, s5, s6 and s7), or near the region where twisting occurs during the rotation (s3, s5, s10). The most conspicuous difference is the one marked as s3, which affects amino acids 419–424 of
S. suis SntA, which in
S. enterica and
E. coli CpdB proteins are substituted by amino acids 322–332 which include two lysine residues (Lys
327 and Lys
328) absent in SntA. In CpdB proteins, this structural variation is associated with the presence of two aspartates (Asp
634 and Asp
636) which are also different in SntA (Ala
714 and Ser
715). As can be seen in
Figure 4b, Lys
327 (in the metallophos domain) and the two aspartates (in the 5_Nucleotid_C domain) may establish an electrostatic interaction during rotation of the latter domain. This could retard the full closing of the active site of the CpdB proteins and at least partly explain some kinetic differences of efficiency (
Figure 1). This analysis is complicated by the variety of substrates hydrolyzed by the enzymes, and by the possibility that the “closed” conformation is not the same with substrates of different sizes, e.g., 3′-nucleotides and cyclic dinucleotides.
Despite the 63 non-identical amino acids in the sequences of
S. enterica and
E. coli proteins, their 3D structures were practically indistinguishable in the overlapped models (not shown, but compare
Figure 4b with
Figure 4c). Therefore, we centered our analysis on the differential sequences, which are highlighted in red both in the
E. coli CpdB sequence (
Figure 3) and the 3D model (
Figure 4c). None of these variations appears close enough to the active site in the “closed” conformation to explain the higher efficiency shown by
S. enterica CpdB (
Figure 2c). However, one of the sequence differences (marked as Q
350 in
Figure 4c) is located in the region of the interdomain linker that twists during the large rotation undergone by the 5_Nucleotid_C domain to bring the substrate towards the catalytic site in the metallophos domain (see
Scheme 2). The difference is a substitution of Gln
350 in
E. coli CpdB by Glu
350 in
S. enterica CpdB. It is conceivable that the negative charge favors the rotation and makes it occur more quickly. This would account for the larger
kcat values observed with many substrates, but it explains neither why this does not occur with all substrates nor the differences in
KM (
Figure 2a,b). Similar reasoning can be applied to other differences near the Q
350 mark in
Figure 4c: I
324, N
326, E
339, T
340, Y
421, R
428 and S
569, since they are located near the region twisted during the rotation of the 5_Nucleotid_C domain; also of interest is G
186, not far from the space occupied by substrates in the closed conformation. In
E. coli CpdB, they represent significant substitutions with respect to
S. enterica CpdB: Gly186Ile, Ile324Ala, Asn326Ala, Glu339Gly, Thr340Ile, Tyr421Phe, Arg428Gln and Ser569Ala. All of these substitutions imply differences of charge, polarity, hydrophobicity and/or size in the side chain at those positions.
Altogether, the structural dataset provided by this study paves the way for future studies of mutagenesis to elucidate the molecular basis of the differential specificity and catalytic efficiency of the three CpdB-like proteins compared.
2.4. Genomic Distribution of cpdB-like Genes in Eubacteria
To perform a systematic study of this distribution, the strategy explained in Materials and Methods
Section 3.4 was applied to the
Bacteria taxa of the NCBI Taxonomy browser [
63] at different levels (
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6). TblastN analyses [
64,
65] were run using
S. enterica CpdB (accession number P26265) as the query, with the score and query coverage limits indicated. A score limit of 150 was chosen taking into account the occurrence of CpdB-like homologs named 5′-nucleotidase/UDP-sugar hydrolase (UshA) [
66,
67], with a two-domain structure similar to CpdB. In BlastP comparisons, most UshA proteins align with P26265 with scores < 130 (as compared to scores > 1000 for alignments between CpdB proteins from different bacteria). Nevertheless, the limit of score 150 is somewhat arbitrary, as one cannot totally rule out that some true
cpdB relatives align with P26265 with lower scores, while choosing a lower limit to avoid this would count some
ushA genes as
cpdB-like. The borderline hits in every
Bacteria phylum (
Table 2) were tested by BlastP, and each showed a (much) better alignment score with CpdB than with UshA. In the few cases where this was not so, the affected hits were removed (see footnotes 4 and 3 of
Table 2 and
Table 3, respectively). The limit of 70% query coverage was chosen to ensure that the two domains typical of CpdB are covered by the alignment. In principle, the search was performed among genome “sequences from type material” [
68], but in some cases this restriction was removed (see below).
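The two acceptance limits described above (score > 150 and query coverage > 70%) can be illustrated with the short sketch below. The hit records, the Hit class and the function names are hypothetical and do not reproduce any NCBI tool or output format. Coverage is computed here from a single aligned segment of the query, a simplification of the merged coverage that BLAST reports, and the query length of 647 is the CpdB length quoted in Section 2.3.

```python
from dataclasses import dataclass

QUERY_LENGTH = 647  # S. enterica CpdB length as quoted in the text


@dataclass
class Hit:
    subject: str    # hypothetical genome label
    score: float    # TblastN alignment score
    q_start: int    # first aligned query position (1-based)
    q_end: int      # last aligned query position (inclusive)


def query_coverage(hit, query_length=QUERY_LENGTH):
    """Percent of the query spanned by the aligned segment (single-segment simplification)."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / query_length


def is_cpdb_like(hit, min_score=150, min_coverage=70):
    """Apply the two limits described in the text: score > 150 and coverage > 70%."""
    return hit.score > min_score and query_coverage(hit) > min_coverage


# Hypothetical hits, for illustration only.
hits = [
    Hit("genome_A", 1210, 1, 647),   # high score, full coverage: kept
    Hit("genome_B", 128, 20, 600),   # UshA-like score below the limit: rejected
    Hit("genome_C", 310, 300, 640),  # covers only one domain: rejected
]
kept = [h.subject for h in hits if is_cpdb_like(h)]
```

The third hypothetical hit shows the purpose of the coverage limit: a good score over only part of the query would not guarantee that both domains typical of CpdB are present.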
Another point to be aware of is that some organisms, rather than or in addition to having separate CpdB and UshA proteins, may express a natural fusion of both as the result of a two-gene fusion [
69]. Such a protein was experimentally observed and characterized in
Bacillus subtilis [
70], and it is detected mainly in sequenced genomes of phylum
Firmicutes (classes
Bacilli and
Clostridia). Of course, the fused genes were counted as
cpdB-like, since P26265 aligns well with their
cpdB moiety, and no attempt was made to correct this, among other reasons because the CpdB-UshA natural fusions may be enzymatically active [
70].
Following the described search strategy and limits, out of 83,531 sequences of complete genomes of
Bacteria (NCBI:txid2), 1772 gave significant TblastN alignments with
S. enterica CpdB, and 984 aligned with score > 150 and query coverage > 70%. In contrast, the superkingdom
Archaea (NCBI:txid2157) gave no significant alignments with the same limits. In
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6, the nearly one thousand
cpdB-like genes found in
Bacteria are shown distributed among taxonomical groups according to different levels of classification. Results obtained at the level of phylum or groups of phyla are shown in
Table 2, where all the well-established phyla are included except those for which, at the time of running the final search (15 December 2022), sequenced genomes of type material were not available. Further exploration was run at the level of class, only within the phyla
Proteobacteria and
Firmicutes (
Table 3). Thereafter, results at the level of order were obtained only for those belonging to classes
Gammaproteobacteria and
Bacilli (
Table 4), and results at the level of family only for those belonging to orders
Enterobacterales and
Lactobacillales (
Table 5). Finally, an extensive selection of specific examples of pathogens of clinical interest is included in
Table 6. Interestingly, the genomic distribution of
cpdB-like genes among the genomes of
Bacteria was not homogeneous, as indicated in
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6 by qualification keys “N” (Negative), “Nm” (Negative, mainly), “P” (Partial), “Wm” (Widespread, mainly) and “W” (Widespread).
Let us consider first the results obtained at the level of phyla (
Table 2). The presence of
cpdB-like genes was clear in
Proteobacteria,
Firmicutes,
Deinococcus-Thermus,
Spirochaetes,
Thermotogae, Actinobacteria and the
FCB group of phyla. In none of them was the presence widespread; it was only partial, meaning that, out of the tens to hundreds of sequenced genomes of type material for each of those phyla, between 11% and 51% gave hits indicative of
cpdB-like genes. A wide range of scores was obtained, from near the limit of 150 to high values, which were higher in
Proteobacteria than in the other cases (an expected result as the query is a protein from a
Proteobacteria species; see below). In addition, the phyla
Coprothermobacterota and
Calditrichaeota, with a single sequenced genome each, contained a low-score hit. The rest of the phyla were either mainly negative, giving 1–2 hits with low scores in 5–221 sequenced genomes, or fully negative in 1–51 sequenced genomes. For all the phyla that gave only 0–2 hits in the available sequenced genomes of type material, the search was extended to additional genomes sequenced by removing the limit to type material (data also shown in
Table 2). This revealed a small number of additional hits that did not modify the qualification key of the genomic distribution for any phylum.
In summary, out of the 27 phyla or groups of phyla with complete genomes available, cpdB-like genes are absent or nearly absent in 18, and present in the other 9 phyla. In the latter, the distribution is partial, not homogeneous, with some genomes containing a cpdB-like gene and others not, except for two phyla with only one genome sequenced.
Further exploration of
cpdB-like genes at levels lower than phylum was centered on
Proteobacteria and
Firmicutes, where there are many type-material genomes sequenced that gave hits in 30% and 39% of cases, respectively, with many high scores. There were 139 hits with score > 1000 in
Proteobacteria, and 72 hits with scores > 500 in
Firmicutes. The difference reflects the sequence differences between genes coding for either periplasmic enzymes (such as
S. enterica CpdB) or cell wall-bound ones (such as
S. suis SntA). When the TblastN was repeated using SntA sequence (accession AYV64543) as the query, the scores were higher for
Firmicutes than for
Proteobacteria. It may be remarked that the CpdB-like enzymes compared in
Sections 2.1 and 2.2 come either from
Proteobacteria Enterobacteriaceae fam. (
S. enterica and
E. coli), or from
Firmicutes Streptococcaceae fam. (
S. suis).
In
Table 3, phyla
Proteobacteria and
Firmicutes are subdivided into classes that also showed a non-ubiquitous and non-homogeneous distribution of
cpdB-like genes. In
Proteobacteria, 5–42% of the type-material genomes of
Gammaproteobacteria,
Betaproteobacteria,
Alphaproteobacteria and
Delta/epsilon subdivisions gave hits with high scores. In
Firmicutes,
Bacilli, and
Clostridia gave hits in 22% and 51% of the genomes, respectively. The other
Proteobacteria and
Firmicutes classes were mainly negative or just negative, except
Erysipelotrichia, with a partial distribution of
cpdB-like genes with very low scores, and
Limnochordia, with a moderately high score in a single genome sequenced.
In
Table 4, the orders pertaining to classes
Gammaproteobacteria and
Bacilli were analyzed. Here, for the first time in the course of the TblastN analysis, a taxonomical level appeared in which 100% of the type-material genomes gave hits (apart from some taxonomical levels with a single genome sequenced), namely the order
Pasteurellales. In this case, repetition of the TblastN without the limit “sequences from type material” gave 86% of the total genomes with hits (460/532). In addition, the order
Enterobacterales gave hits in 92% of the type-material genomes, while
Moraxellales,
Vibrionales,
Alteromonadales,
Oceanospirillales,
Cellvibrionales,
Aeromonadales,
Xanthomonadales,
Orbales,
Bacillales and
Lactobacillales gave hits in 9% to 67% of the type-material genomes. The rest of the orders were mainly negative or just negative, including
Pseudomonadales and
Legionellales.
In
Table 5, the families pertaining to orders
Enterobacterales and
Lactobacillales were analyzed. Among
Enterobacterales, the families
Morganellaceae,
Enterobacteriaceae,
Yersiniaceae and
Pectobacteriaceae showed a nearly widespread distribution of
cpdB-like genes, whereas
Erwiniaceae,
Hafniaceae and
Budviciaceae displayed a partial distribution.
Bruguierivoracaceae fam. was the only one with clearly negative results. Among
Lactobacillales, all the families exhibited a partial distribution.
In
Table 6, selected examples are shown, at the level of species, groups of species, or genus, of clinically relevant bacteria that either contain or do not contain a
cpdB-like gene. In this case, TblastN analyses were always run without the limit “sequences from type material”; therefore, the results include all the available complete genomes for each species. At this level, 34 species or groups showed a completely or mainly widespread distribution of
cpdB-like genes, i.e., they were present in 100% or near 100% of the genomes; 10 species showed a partial distribution, with some genomes containing and others not containing
cpdB-like genes in the same species; and 28 species were negative or mainly negative as they were devoid of
cpdB-like genes in 100% or near 100% of the genomes.
Let us now discuss the possible repercussions of the three kinds of gene distribution found, taking into account that the cpdB-like genes from
E. coli,
S. enterica,
S. agalactiae and
S. suis are provirulent in different organisms [
9,
10,
11,
14]. Both the presence and the absence of
cpdB-like genes in the genome can be relevant (although not exclusively, of course) for the degree of virulence of the pathogen.
First, for species that did not contain
cpdB-like genes (i.e., those that in
Table 6 are indicated with the N key), it can be safely concluded that these organisms cannot exploit the CpdB-like protein-dependent strategy of degrading extracellular cyclic dinucleotides recognized as PAMPs by the infected host [
9,
10], or of interfering with the complement system [
14]. Of course, it is possible that other proteins replace CpdB-like ones. For instance, this occurs in
Mycobacterium tuberculosis, which is negative for
cpdB-like genes (
Table 6), but expresses a pro-virulent cyclic nucleotide phosphodiesterase, encoded by the
Rv2837c or
cnpB gene, which inhibits innate immune cytosolic surveillance [
19,
71]. Incidentally, this
M. tuberculosis protein has also been named CdnP [
71], like the CpdB-like protein of
S. agalactiae [
9], but its encoding gene was not a hit in the TblastN search run with
S. enterica CpdB (
Table 6), as they are very different proteins encoded by different genes.
Second, concerning species in which
cpdB-like genes were widespread (i.e., those that in
Table 6 are indicated with the W key), they constitute a field where the possible role of these genes in virulence can be explored by constructing gene mutants, and testing them in suitable infection systems in comparison with wildtype bacteria, or by expressing the encoded proteins and studying their enzyme specificity. By extension of what is known about the provirulent role of
cpdB-like genes, and of CpdB-like enzyme activities, in
E. coli,
S. enterica,
S. agalactiae and
S. suis [
7,
8,
9,
10,
11,
14], this strategy could be fruitful if applied to other species. For instance, it will be worth exploring genera such as
Bacillus,
Enterobacter,
Haemophilus,
Klebsiella,
Morganella,
Pasteurella,
Proteus,
Providencia,
Serratia,
Shigella and
Yersinia, among others, which contain
cpdB-like genes aligning with high TblastN scores with
S. enterica CpdB (
Table 6).
Third, particularly interesting are species with a partial distribution of
cpdB-like genes, indicating that different strains or isolates differ in this respect. This occurred very markedly in pathogens like
S. enterica subsp. enterica ser. Typhimurium,
Streptococcus dysgalactiae and
Vibrio cholerae, to mention those that gave higher TblastN scores for alignment with
S. enterica CpdB (
Table 6). In this case, one should consider whether the presence or absence of a
cpdB-like gene could modulate the virulence of pathogen strains or isolates.
Another interesting observation from
Table 6 is that species of the same genus may differ drastically in the content of
cpdB-like genes. This was the case for genus
Streptococcus, since all the genomes of
S. agalactiae,
Streptococcus sanguinis,
S. suis (with one exception) and
S. thermophilus contained a
cpdB-like gene, but those of
Streptococcus mitis,
Streptococcus mutans,
Streptococcus pneumoniae and
Streptococcus pyogenes did not, and those of
S. dysgalactiae and
Streptococcus parasuis showed a partial distribution. This was confirmed by repeating the TblastN searches using
S. suis SntA as the query: scores higher than those shown in
Table 6 (
S. enterica CpdB as the query) were obtained, but the distribution of
sntA-like genes was the same as in
Table 6 for every
Streptococcus species. Another example worthy of comment is provided by the TblastN results for genus
Salmonella, which was much more homogeneous in its content of
cpdB-like genes, which were widespread in
S. bongori and in
S. enterica subspecies
arizonae,
diarizonae,
houtenae, salamae VII, and
enterica serovar Typhi, while the distribution was markedly partial in serovar Typhimurium. Concerning genus
Escherichia, the presence of
cpdB-like genes was almost constant, and only a very minor proportion of
E. coli genomes (0.3%) lack them.
2.5. Anecdotal Findings of cpdB-like Genes Outside Eubacteria Chromosomal Genomes
Incidentally, besides the findings summarized in
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6 for chromosomal genomes, we also observed the presence of
cpdB-like genes in some unexpected genomic locations, including sequences from: plasmids,
Viruses (NCBI:txid10239) and
Eukaryota (NCBI:txid2759). To analyze this as deeply as possible, different ad hoc TblastN searches were run in the NCBI Nucleotide (nr/nt) database with
S. enterica CpdB as the query, as described below.
Within the superkingdom Archaea, the TblastN run without any limits, other than the taxonomical one, gave no hits with score > 150.
Within the superkingdom
Bacteria, applying the Entrez query “plasmid[Title]”, 41 plasmid sequences containing
cpdB-like genes with scores ranging 1303–169 were recovered (
Table S1). Seven of these hits showed TblastN scores > 1000, with 100% query coverage and >88% identity, and pertained to bacterial species
Salmonella sp.,
Klebsiella pneumoniae, and
E. coli. Hits with lower scores corresponded to many different bacteria. The finding of
cpdB-like genes in plasmids is theoretically in agreement with the protective character of CpdB-like enzymes against innate immune responses of the host [
72].
Within the superkingdom
Viruses, the TblastN run without any limits, other than the taxonomical one, gave two hits (
Table S2), one of them with a score of 1062 corresponding to a
cpdB-like gene of an unclassified bacteriophage of family
Myoviridae [
73,
74].
Within the superkingdom
Eukaryota, the TblastN run without any limits, other than the taxonomical one, gave 4 hits (
Table S3). Three of them, with score 1185, corresponded to the genome of
Digitaria exilis, a nutritious cereal known as white fonio that constitutes a vital crop of West Africa [
75]. The fourth one, with score 457, corresponded to the genome of
Leishmania major, a protozoan parasite with the ability to invade macrophages and that causes cutaneous leishmaniasis [
76,
77].
The presence of cpdB-like genes in plasmids, viruses and, particularly, in a higher plant or in a parasitic protozoan is intriguing. One wonders, for instance, whether CpdB-like proteins could have in Leishmania the same protective effect versus the immune system of the infected host as they display in Bacteria.