3.1. The Hypoxylon sp. CO27-5 Genome Contains More Than 100 [D1,2] Stwintrons
In a previous communication, we identified a set of repetitive or multiple occurring (MO) intervening sequences, mostly [D1,2] stwintrons, in eight genomes of species of Hypoxylaceae and suggested how the most symmetrical canonical introns within this set in
Hypoxylon sp. CO27-5 could be generated from an ancestor stwintron during microhomology-mediated end-joining (MMEJ) repair of a double-stranded DNA break [
25]. These sequence-similar intervening sequences could be easily identified by BLASTN screening of the Whole Genome Shotgun contigs database (NCBI) using one of the intronic sequences of this group (HCOc017A) as the query sequence. During comparative studies, it became evident that genomes of the Hypoxylaceae species also included [D1,2] stwintrons seemingly unrelated to this set of 36 MO [D1,2] stwintrons, named sister stwintrons to reflect their sequence similarity and probable common ancestry. Two of the genes that contained a sequence-similar intervening sequence (i.e., HCOc091A and HCOc121A), incidentally, also carry a uniquely occurring (UO) [D1,2] stwintron that was not detected by the BLASTN screen (for cDNA sequences, see GenBank MW530491 and MW490721, respectively). Moreover, the gene encoding a DEXH-box DNA/RNA helicase in
Hypoxylon sp. CO27-5, a fungal orthologue of the helicase domain of human polymerase theta (POLQ), also harbours a UO [D1,2] stwintron (for cDNA sequences, see GenBank MW530496).
To identify additional [D1,2] stwintrons in
Hypoxylon sp. CO27-5, we analysed its genome sequences [
26] with a sequence motif search algorithm that allowed for screening of the 580 contigs for putative stwintrons that are not sequence-similar to sister stwintrons (see
Figure 2 and
Section 2 (Materials and Methods) for more detail). The primary [D1,2] search model A identified 282 candidate sequences, which upon further analysis (see below) included 90 authentic [D1,2] stwintrons (>30% success rate), 59 (65%) of which were not identified by BLASTN screening with HCOc017A as the query sister stwintron. The 59 newly identified UO [D1,2] stwintrons were numbered according to their match numbers in the output of the [D1,2] search model A screening. Search model A picked up 21 of the 23 full-length sister stwintrons. HCOc224-179 could never be identified by a motif search approach, as it is split over two sequence contigs (see GenBank MW477887 for the whole sequence). The second sister stwintron missed by the primary [D1,2] search model A, HCOc021A, is a phase zero [D1,2] stwintron in a recently pseudogenised gene for a short-chain dehydrogenase/reductase that features a non-canonical 5′-donor at position 5 (5′-GUACC
5UAUGU) for its internal intron in
Hypoxylon sp. CO27-5 due to the presence of four extra nts near the 5′ end of the internal intron. These four nts are absent in strain EC38, where this 5′-donor is fully canonical (5′-GUAUGU), and the “orthologue” stwintron was detected by our primary [D1,2] search protocol when the EC38 genome was screened for [D1,2] stwintrons (n.b., the orthologous EC38 gene is also a pseudogene).
In addition, three of the 13 sheared sister stwintrons (cf., [
25]) were not recognised by the primary [D1,2] search model A screen. One of these is HCOc103A, for which the donor–BP distance in the internal intron was 131 nt, much longer than the maximum setting in search model A (80 nt). In HCOc091A, the donor–BP distance in the internal intron was 82 nt, two nucleotides above the set limit. We adapted the search tool to screen for [D1,2] stwintrons in which the donor–BP distance in the internal intron would be between 81 and 110 nt (
Figure 2, [D1,2] search model B) and found seven additional UO stwintrons, numbered no-315–321. Further increasing the donor–BP distance within the internal intron did not detect stwintrons other than HCOc103A.
By contrast, in HCOc263A, the donor–BP distance of the external intron was 126 nt, longer than the upper limit set in the primary [D1,2] search model A. The tool did identify the 5′ end of HCOc016B, but the canonical BP (motif 4) of the external intron of this stwintron associated with the functional 3′-acceptor was 8 nt above the set length range for the donor–BP distance; upon experimental verification (described below), the full-length stwintron was found to be 58 nt longer at the 3′ end than that proposed by the search tool using model A. These observations led us to change the distance range between the donor and the BP elements of the external intron from 25–110 to 111–200 nt in the [D1,2] stwintron search model (
Figure 2, [D1,2] search model C). This resulted in the identification of another 13 UO [D1,2] stwintrons, numbered no-302–314. The largest stwintron found was 313 nt long (number no-313) while the shortest was 114 nt long (number no-061). Further increasing the distance range between the BP and its associated 3′-acceptor (for either U2 intron) from 4–20 to 21–40 nt did not lead to the identification of additional [D1,2] stwintrons.
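The distance windows described above can be summarised in a short Python sketch. It encodes only the donor–BP ranges quoted in the text (internal intron: up to 80 nt in model A and 81–110 nt in model B; external intron: 25–110 nt in model A and 111–200 nt in model C). The dictionary and function names are hypothetical, the lower bound of the internal window in model A is not stated in the text and is left open here, and the actual search tool (Figure 2; Section 2) also evaluates sequence motifs, which this sketch does not attempt.

```python
# Donor-BP distance windows (nt) as quoted in the text.
# None means "no bound applied in this sketch" (the lower bound of the
# internal window in model A is not given in the text: assumption).
INTERNAL_WINDOWS = {"A": (None, 80), "B": (81, 110)}
EXTERNAL_WINDOWS = {"A": (25, 110), "C": (111, 200)}


def matching_model(distance, windows):
    """Return the name of the first window containing `distance`, else None."""
    for name, (low, high) in windows.items():
        if (low is None or distance >= low) and distance <= high:
            return name
    return None


if __name__ == "__main__":
    # Distances taken from the text.
    print("HCOc091A internal 82 nt ->", matching_model(82, INTERNAL_WINDOWS))
    print("HCOc103A internal 131 nt ->", matching_model(131, INTERNAL_WINDOWS))
    print("HCOc263A external 126 nt ->", matching_model(126, EXTERNAL_WINDOWS))
```

Under these windows, the internal donor–BP distance of HCOc091A (82 nt) falls in model B, that of HCOc103A (131 nt) falls outside both internal windows, and the external donor–BP distance of HCOc263A (126 nt) falls in model C.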
Moreover, preliminary comparative analysis of two of the newly found [D1,2] stwintrons, numbers no-007 and no-061, showed that they are extant at the same position in two paralogue genes encoding short-chain dehydrogenases/reductases of 243 and 250 amino acids (AAs), respectively, ~54% identical at the AA level. In addition, we found a third paralogue gene encoding a protein of 246 AAs, ~56% and ~59% identical to the two aforementioned paralogues, with a [D1,2] stwintron at the very same position. This latter stwintron (named number no-301) was not detected with the [D1,2] search model A, as the BP sequence of its external intron was the less conventional 5′-C
1CTGAC, i.e., incompatible with the set motif for BP elements, 5′-D
1YTRAY. We modified the stwintron search model in order to detect possible additional [D1,2] stwintrons with a less canonical BP sequence (5′-C
1CTRAY) (
Figure 2, [D1,2] search model D) and found one additional [D1,2] stwintron, number no-300, in the gene encoding an integral membrane protein of 318 AAs, with its external intron’s BP sequence 5′-C
1CTAAC, canonical but for the nt at the first position.
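The effect of relaxing the BP motif can be illustrated with a minimal Python sketch that expands IUPAC ambiguity codes into a regular expression. The motifs (DYTRAY and CCTRAY) and the BP sequences (CCTGAC and CCTAAC) are those given in the text; the helper names are hypothetical, and the sketch tests isolated six-nucleotide elements only, not their position within an intron.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "D": "[AGT]", "H": "[ACT]",
    "B": "[CGT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif such as 'DYTRAY' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def matches_motif(motif, sequence):
    """True if the whole sequence (DNA or RNA) fits the IUPAC motif."""
    dna = sequence.upper().replace("U", "T")
    return iupac_to_regex(motif).fullmatch(dna) is not None


if __name__ == "__main__":
    for bp in ("CCTGAC", "CCTAAC"):
        print(bp, "DYTRAY:", matches_motif("DYTRAY", bp),
              "CCTRAY:", matches_motif("CCTRAY", bp))
```

Both BP sequences fail the default motif only because D excludes C at the first position, and both fit the relaxed motif used in search model D.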
3.2. Evidence for the Existence of 81 Predicted Uniquely Occurring [D1,2] Stwintrons
Next, we assessed all the sequence matches gathered with the subsequent [D1,2] stwintron model searches after subtracting the 31 matches corresponding to sister stwintrons, already experimentally validated in our previous paper [
25]. We obtained evidence for the existence of an additional 81 UO [D1,2] stwintrons in
Hypoxylon sp. CO27-5 using three approaches. The first approach involved a BLASTN search for short RNA sequence reads deposited in the Sequence Read Archives (SRA) database at NCBI (accessions: SRX875229–34; [
26]), which confirmed the proposed exon fusion after the consecutive excision of the internal and external introns for each candidate [D1,2] stwintron, as well as for reads that confirmed the existence of the typical stwintron splicing intermediate (abbreviated “splinter” below). The splinter is the RNA species from which only the predicted internal intron has been removed; reads derived from it cover the temporary junction of the upstream exon and the now functional external intron, before the excision of the latter and, ultimately, exon fusion. In
Supplementary Materials Table S1 (sheet CO27-5), the SRA evidence for each of the 117 [D1,2] stwintrons identified in
Hypoxylon sp. CO27-5 is represented by one of the covering SRAs.
Hypoxylon sp. CO27-5 and EC38 are closely related [
26]. When SRA reads were not found in CO27-5, we searched the accessible EC38 RNA SRA databases (SRX872662–67; [
26]) for reads confirming complete removal of the “orthologue” [D1,2] stwintron as well as reads validating the predicted splinter.
Secondly, we assessed the relevant sequences of the fully spliced RNA species (mRNA) as well as the predicted splinter of selected [D1,2] stwintrons by targeted RT-PCR and cDNA sequencing, providing more sensitive means when the expression levels of the stwintron-containing gene are limiting. For the experimental details, see
Section 2. The generated sequences were submitted to GenBank, and the accession numbers assigned are listed in
Supplementary Materials Table S1 (sheet CO27-5).
Finally, we identified and collected the orthologues of the stwintron-containing CO27-5 gene in the genomes of 19 other species/strains of the Hypoxylaceae family accessible at NCBI (see
Table 1) by targeted tblastn screening of the Whole Genome Shotgun (WGS) contig database (n.b., gDNA data). The comparative genomics approach enabled the confirmation of potential [D1,2] stwintrons, where expression data are insufficient or when gene expression cannot be detected by RT-PCR with pairs of gene-specific oligonucleotide primers functionally verified on genomic DNA as amplification templates. The intron–exon structures of these orthologue genes were manually predicted from the alignment of the coding sequences (exons) alternating with intronic sequences including the [D1,2] stwintron(s). Conservation of intron position(s) is considered a hallmark of orthologue genes (“intron positional conservation”; [
44]) and, in some cases, of structurally related paralogue genes as well. By these means, we detected the occasional use of rare intronic splicing elements in [D1,2] stwintrons such as the occurrence of a C at the second position of 5′-donor elements (5′-GC
2 instead of 5′-GU
2) for either the internal or external intron, for instance, in the orthologue of stwintron number no-071 in
Entonaema liquescens for the split external donor sequence (5′-G|C
2AAGU).
In
Supplementary Materials Table S1 (sheet CO27-5), we indicated the presence or absence of the “orthologue” intervening sequences for all 117 [D1,2] stwintrons now identified in
Hypoxylon sp. CO27-5 or the absence of the orthologue gene in 16 columns near the right side of the Excel table. In the column at the right of
Table S1, we indicated for most stwintron-containing genes whether structurally related paralogous genes exist in CO27-5; this was the case for more than half of the stwintron-harbouring genes (61) identified. In three cases (numbers no-007/no-061/no-301, numbers no-302/no-303 and sheared sister stwintrons HCOc020A/HCOc263A), these were paralogous genes with a position-conserved “paralogue” [D1,2] stwintron. In some other paralogous genes, [D1,2] stwintrons were identified in each, albeit not extant at the same intron position, e.g., for numbers no-158/no-189.
3.3. Occurrence of Occupied [D1,2] Stwintron Positions across Species of Hypoxylaceae
In our current work, we showed the existence of more than 1500 [D1,2] stwintrons in species of the Hypoxylaceae, a family of endophytic fungi in the Xylariales order of the Sordariomycetes class (Ascomycota phylum). The identification of [D1,2] stwintrons in Hypoxylon sp. CO27-5, experimentally or by comparative genomics, allowed, for the first time, for the statistical analysis of the sequences of more than one hundred different stwintrons of the same type present in one organism. Below, we compare a sizable group of sequence-related and terminally symmetrical stwintrons from Hypoxylon sp. CO27-5 and EC38, the 38 so-called sister stwintrons, with the 81 seemingly unrelated stwintrons identified with our [D1,2] stwintron search tool (see above).
The comparative approach immediately showed a substantial distinction between these two groups (
Figure 3). While 24 of the 25 sister stwintrons only occurred in the close relatives of
Hypoxylon sp. CO27-5 and/or EC38 (n.b., four of them are unique to either CO27-5 or EC38), the large majority of the UO [D1,2] stwintrons (68 out of 81: ~84%) were present in orthologue genes in all or most of the 17 assessed species/taxa of the Hypoxylaceae family. Amongst the 13 “atypical” UO stwintrons, five were located in genes that did not have orthologues in most of the other assessed species/taxa. These large differences in the occurrence of orthologue stwintrons (i.e., at conserved gene positions) strongly suggest that the large majority of the group of 81 UO stwintrons are much older than the highly symmetrical sister stwintrons unique to the CO27-5/EC38 taxon, as the former were likely generated in a common ancestor before divergence of all 20 taxa. The group of the 13 sheared sister stwintrons appears to be composed of mixed constituents: six of them were restricted to a narrow taxonomical clade that consists of or includes both
Hypoxylon sp. CO27-5/EC38 and
H. pulicicidum/
Hypoxylon sp. E7406B.
Three other sheared sister stwintrons occurred only in CO27-5/EC38, because there are no orthologue genes in the other taxa (not including
H. pulicicidum/E7406B). On the other hand, three sheared sister stwintrons identified by BLASTN screening with HCOc017A—HCOc016B, HCOc024B and HCOc091A (cf., [
25])—were present at a conserved position across the whole family and would appear to be misclassified as sheared sister stwintrons. In our previous work, we noted that there are also ten genuine sister stwintrons of the same pedigree/origin in the
H. pulicicidum/E7406B taxon that are absent from the CO27-5/EC38 taxon [
25], which supports the idea that the MO and terminal symmetrical sister stwintrons were recently generated.
A peculiar situation was observed for the phase-two stwintron number no-274 in a gene encoding a well-conserved plasma membrane protein of the monovalent cation:proton antiporter-1 family, suggesting that extant [D] stwintrons may change stwintron type by local mutation. In three species—
H. rickii,
H. fragiforme and
H. lienhwacheense—the original [D1,2] stwintron found in the other Hypoxylaceae species and also present in other Xylariales species presumably morphed into a [D5,6] stwintron. Insertion of four nt 5′-GTAA or 5′-GTGA directly upstream of the G
1 of the external intron of the original [D1,2] stwintron created (split) external intron donors 5′-GTAAG
5|T
6 or 5′-GTGAG
5|T
6 (
Supplementary Materials Figure S1). The internal intron and the bordering exons were unaffected, as the stwintron phase did not change. In
H. fragiforme, the loss of the [D1,2] splice option was the result of a 20 nt deletion in the 5′-sequence of the external intron directly downstream of the junction with the internal intron (5′-TAG|T
2).
3.4. No Insertion Site Bias Observed for Exonic Sequences Directly Neighbouring [D1,2] Stwintrons
We analysed the insertion sites of the 81 UO [D1,2] stwintrons after stacking the five concatenated conserved sequence motifs at the 5′- and 3′-splice sites (donor; BP; acceptor; including the two hybrid motifs typical for the [D1,2] type), terminally extended with the first 15 nt of their neighbouring exons, and compared the local exon environments with those obtained by the same Logo analysis for the “control” group of 38 sister stwintrons (
Figure 4).
The insertion sites of stwintrons of both groups are seemingly not sequence biased; there were no conserved sequence patterns in the exons directly next to the stwintron–exon junctions. In line with the above, there was no prevalence for alternatively spliced [D1,2]/[A2,3] stwintrons (i.e., a G directly downstream of the [D1,2] stwintron) (
Supplementary Materials Figure S2). The combination of [AG] directly upstream of the stwintron and [GT] directly downstream of it—comparable with double AG|GT fusion sites for certain canonical introns (e.g., [
45])—was not found in our set of 117 stwintrons; on probability grounds, one would expect this to occur once in every 256 cases. Furthermore, there was no evidence for tandem site duplication involving one of the splice sites (cf., [
46]) in the
Hypoxylon sp. CO27-5 stwintrons. There were two instances of [TAG] directly upstream of a stwintron terminating with the 3′-acceptor TAG, but [CAG] did not occur directly upstream of any of the
Hypoxylon sp. CO27-5 stwintrons.
Meanwhile, [GT] directly downstream of the 3′-acceptor of the stwintron occurred in eight of the 117 stwintrons, while [GTA] was present in two of them (i.e., numbers no-279 and no-308).
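The "once in every 256 cases" expectation for [AG] directly upstream combined with [GT] directly downstream can be reproduced in a few lines of Python. The sketch assumes, purely for illustration, that the four flanking exonic positions are independent and that all four nucleotides are equally likely; real exon composition will deviate from this. The variable names are hypothetical.

```python
from fractions import Fraction

N_STWINTRONS = 117          # stwintrons in the set, from the text
p_base = Fraction(1, 4)     # assumption: equiprobable, independent bases

# Two fixed bases upstream (AG) and two fixed bases downstream (GT).
p_ag_gt = p_base ** 4

# Expected number of AG|GT-flanked stwintrons in the set.
expected = float(N_STWINTRONS * p_ag_gt)

# Probability of observing no such case among 117 stwintrons.
p_none = float((1 - p_ag_gt) ** N_STWINTRONS)

if __name__ == "__main__":
    print("P(AG upstream and GT downstream) =", p_ag_gt)
    print("expected occurrences in 117 stwintrons = %.2f" % expected)
    print("P(no occurrence in 117 stwintrons) = %.2f" % p_none)
```

Under these assumptions, fewer than one occurrence (about 0.46) is expected in a set of 117, and the chance of finding none is roughly 63%, so the absence of such a case is not in itself unexpected.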
All 81 UO stwintrons were seamlessly integrated in continuous exonic sequences, without loss or gain of exonic sequences, very much as it was observed for all but one of the sister stwintrons (exception described in [
25]). Their exact position and phase were conserved amongst orthologous stwintrons in other Hypoxylaceae species. There were considerably more phase one stwintrons (~45%) than there were phase two or phase zero stwintrons, but also in this respect, the profile was very similar for both groups of [D1,2] stwintrons (
Supplementary Materials Figure S2).
3.5. The Distance between the Lariat Branch Point Sequence and the Acceptor of the External Intron Was Twice as Long in the Sister Stwintrons as in the Uniquely Occurring, Evolutionary Older Stwintrons
Collemare and co-workers [
47] formulated a hypothesis by which most conventional U2 introns (“regular spliceosomal introns” or RSIs) currently present in filamentous fungal genomes derive from ancestor introner-like elements (i.e., highly sequence-similar introns with stable secondary structures) by propagation events that have taken place so long ago that the duplicated intronic sequences have diverged beyond recognition. Degeneration of introner-like elements would not only lead to rapid sequence divergence but also to a consistent shortening of U2 introns towards the mean length of regular spliceosomal introns (cf., [
47]).
When the length distribution of the CO27-5 stwintrons was assessed, together with the lengths of the constituent canonical introns (
Supplementary Materials Figure S3), we found that the range of intron sizes for the group of the older UO stwintrons was much broader than that for the sister stwintrons. Such a result was not unexpected, as sister stwintrons share considerable sequence similarity and appear to have been generated recently after the divergence of
Hypoxylon sp. CO27-5/EC38 and
Hypoxylon sp. E7406B/
H. pulicicidum (cf., [
25]). Their relatedness is also reflected by the peaks in lengths for the sister stwintrons and their respective internal and external introns in the histograms’ moving averages (red line in
Supplementary Materials Figure S3a). Amongst the miscellaneous group of 81 UO stwintrons, there are 13 stwintrons that were, indeed, shorter than any sister stwintron, but ten others were longer than 37 of the 38 sister stwintrons. The shorter UO stwintrons seemed to coincide with “shrunken” internal introns, while in the longer UO stwintrons, it was the external intron that increased in size in most cases.
In general, the differences in stwintron size were confined to the core of the constituent intron between the 5′-donor and the lariat branch point sequence (BP). By contrast, we observed a proclivity to conserve a relatively short distance between the canonical branch point sequence (BP) and (associated) 3′-acceptor for both the internal and the external intron in the older UO stwintrons. The distance between the canonical lariat branch point sequence (BP) and 3′-acceptor of the internal intron was seven nucleotides in 34 of the 38 sister stwintrons (89%) as well as in 49 of the 81 UO stwintrons (60%) along with 11 other stwintrons with this distance being six nucleotides and five other stwintrons with this distance being eight nucleotides (
Figure 4). For the external intron, the distance between the canonical BP element and the 3′-acceptor was six nucleotides in 50 of the 81 UO stwintrons (~62%), while in a further six of these, the distance was seven nucleotides, and in three others, the distance was five nucleotides. However, in the sister stwintrons, the distance between the canonical BP sequence and the 3′-acceptor of the external intron is typically 19 nt, found in 23 of the 25 full sister stwintrons (92%) and 5 of the 13 sheared sister stwintrons, while 3 other sheared sister stwintrons had that distance at 18 nt. On the other hand, there were two sheared sister stwintrons (HCOc016B and HCOc024B) in which the distance between the canonical BP element and acceptor was six nucleotides, perhaps not coincidentally by far the most frequently occurring distance in the miscellaneous group of UO stwintrons.
In animals, the U2 snRNP auxiliary factor (U2AF protein) recognises the pyrimidine (CU) tract and the downstream 3′-splice site, and then interacts with the U2 snRNP upon which the latter is recruited to face the functional lariat branch point sequence in the assembling U2 spliceosome (reviews: [
8,
9]). Genes for both subunits U2AF65 [contig MDCL01000013, coordinates c90609–88714] and U2AF35 [contig MDCL01000186, coordinates 104888–105517] are present in
Hypoxylon sp. CO27-5 (not shown). In line with what has been observed in model ascomycete fungi [
27], where pyrimidine tracts occur, these are usually situated 5′ to the canonical BP sequence, often closer to the 5′-donor than to the BP element. Such a location is logical because the distances between the canonical BP elements and the functional 3′-acceptors are shorter than 8 nt in the large majority of the UO (older) stwintrons. Remarkably, we observe that in the external intron of the 25 full-length sister stwintrons in our set, uninterrupted pyrimidine (CT/CU) tracts longer than 6 nt occur sporadically (only in HCOc021A and HCOc047A). By contrast, half of the sheared sister stwintrons do have sizable pyrimidine tracts between the 5′-donor and the canonical BP element of the external intron, while two of them (HCOc046A and HCOc263A) have a pyrimidine tract (>7 nt) downstream of the canonical BP element of the external intron. In the set of miscellaneous UO stwintrons, most external introns feature clear pyrimidine tracts between donor and canonical BP sequence. In stwintrons numbers no-071 and no-274, a pyrimidine tract (>7 nt) is identifiable between the canonical BP sequence and the downstream acceptor of the external intron; these are two of the older UO stwintrons in which the distance between these consensus intron sequence elements is considerably longer (17 nt and 15 nt, respectively) than the most common distance (six nt, see above).
3.7. All 117 [D1,2] Stwintrons Show Underlying Symmetry
In our previous communication, we noticed the extraordinary symmetry of some canonical U2 introns in
Hypoxylon sp. CO27-5 with high sequence similarity to sister stwintrons, hinging in the centre of a 10-nt palindrome 5′-TTTCTAGAAA ([
25]: cf.,
Figure 6a therein) and predicted to fold into one hairpin secondary structure leaving the unpaired 5′-G
1 of the donor and the 3′-G
3 of the acceptor in very close proximity. These so-called type-2 cropped sister introns most likely evolved recently from an ancestor sister stwintron with two internal palindromic sequences 5′-WTTCTAGAAA separated by approximately 100 nt, when a double-stranded DNA break between the two palindromes would have been repaired by microhomology-mediated end-joining. Some 40% of the 23 sister stwintrons of high sequence similarity indeed exhibited terminal inverted repeats 45–55 nt in length with 65–78% identity among them, when the stwintron RNA sequence was aligned with its own reverse complement sequence under stringent conditions considering the introduction of gaps as means to increase the identity score.
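The notion of symmetry used here, a sequence compared with its own reverse complement, can be made concrete with a small Python sketch. It checks that the 10-nt element 5′-TTTCTAGAAA is a perfect reverse-complement palindrome and computes a simple position-by-position identity. This ungapped comparison is a deliberate simplification and an assumption of the sketch: the alignments reported in the text allow gaps, so the values are not comparable with the identities quoted above. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Upper-case the sequence and write U as T."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement, returned in the DNA alphabet."""
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def ungapped_self_identity(seq):
    """Fraction of positions at which seq equals its own reverse complement."""
    dna = to_dna(seq)
    rc = reverse_complement(dna)
    matches = sum(a == b for a, b in zip(dna, rc))
    return matches / len(dna)


if __name__ == "__main__":
    for element in ("TTTCTAGAAA", "ATTCTAGAAA"):
        print(element, reverse_complement(element),
              ungapped_self_identity(element))
```

The element TTTCTAGAAA is identical to its reverse complement, whereas the variant with A at the first position (one of the two sequences covered by 5′-WTTCTAGAAA) matches at eight of ten positions.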
We studied internal symmetry with the same simple means (see above) in the 81 UO stwintrons newly identified and found that without exception, some level of sequence symmetry could be detected (
Figure 6: panel a, the “control” group of 38 sister stwintrons; panel b, the 81 UO stwintrons). Using different alignment programs, often two different centres of symmetry were apparent in the same stwintron. This could be a reflection of the dynamics of RNA secondary structure, locating different or alternative internal stem-loop structures, either mutually exclusive or concurrent, in the core of the intervening sequence.
Almost all of the symmetries shown in
Figure 6 result from MAFFT alignments employing the E-INS-i module while using either the 200 PAM, 20 PAM or 1 PAM scoring matrix to minimise the number of gaps introduced and to reduce the extent of those introduced gaps in the alignment of the stwintron RNA with its own reverse complement sequence. For most of the full-length sister stwintrons (19 of 25), internal symmetry is particularly dense near the termini of the stwintron; this is also the case in two sheared sister stwintrons, HCOc004B and HCOc046A. In the miscellaneous group of older UO stwintrons, MAFFT often produced alignments with a long overhang at either the 5′-terminus or the 3′-terminus, and also alignments with huge gaps, suggestive of big deletion or insertion events at the DNA level, unmatched at the opposite half of the stwintron. Interestingly, in two stwintrons (numbers no-215 and no-319) the terminal overhang is so long that the MAFFT alignment in effect suggests the existence of a symmetrical element that essentially corresponds exclusively to the external intron of the stwintron. Nevertheless, there are at least 30 UO stwintrons that produce continuous or near-continuous sequence alignments of the stwintron RNA with its own reverse complement sequence, with no or few small gaps near the termini and modest terminal overhangs (ten examples in
Supplementary Materials Figure S4). This underlying symmetry may no longer be involved in RNA structure dynamics but could be a vestige of more extended or stronger ancestral terminal inverted repeats. Our observations of symmetry in all 117
Hypoxylon sp. CO27-5 [D1,2] stwintrons may be taken as an indication in support of the thesis of Collemare and co-workers [
47] that many “regular spliceosomal introns” (RSI) present today in filamentous fungal genomes actually descend from ancient introner-like elements, repetitive intronic sequences that were predicted to fold in a characteristic hairpin secondary structure, and capable of intron proliferation. But the question remains to what extent the underlying internal symmetry of the complex intervening sequences in
Hypoxylon sp. CO27-5 is inconsequential (all single-stranded RNA molecules show secondary structure dynamics) and at what level those alternating secondary structures could influence the splicing process, to shift between the alternative splicing options and therewith participate in (stw)intron propagation.
3.8. Secondary Structure Predictions of Hypoxylon sp. CO27-5 [D1,2] Stwintrons
A new class of spliceosomal introns was reported in zebrafish (
Danio rerio) several years ago, where splicing depends on the intron’s RNA secondary structure [
48,
49,
50]. Splice-site pairing concurs with the presence of repeats of complementary dimers AC near the 5′-donor and matching GU couples near the 3′-acceptor, allowing the single-stranded intron RNA to fold across the intron bringing its splice sites in close proximity. Using RNAfold software [
39,
40], the optimal predicted secondary structures of the single-stranded RNA of these so-called (AC)m-(GU)n introns reportedly have minimum free energies (ΔG) up to two-fold lower than those calculated for conventional U2 introns of the same length in zebrafish, over a broad range of intron lengths. Remarkably, (AC)m-(GU)n introns are spliced out without the involvement of the U2 snRNP auxiliary factor (U2AF protein: 2 subunits) necessary to recruit the U2 snRNP to the functional branch point sequence of conventional U2 introns. The RNAfold-predicted secondary structure of an exemplary (AC)m-(GU)n intron (i.e., intron 5 of the
D. rerio cep97 gene for a centrosomal protein—accession number: NC_007112, coordinates 13333–13503) (cf., [
48]) can be described as one extended albeit imperfect hairpin leaving the terminal G’s unpaired but in close proximity to each other. Conserved features, such as the branch point element, are masked in double-stranded sections in the imperfect hairpin structure.
The optimal RNA structure of this zebrafish intron bears a striking resemblance to the RNAfold-predicted structures of half of our type-2 cropped sister introns in
Hypoxylon sp. CO27-5 as well as to those of some of the 25 full-length sister stwintrons (
Supplementary Materials Figure S5, bottom row), in particular with respect to what we previously defined as terminal inverted repeats in the sister stwintrons and the type-2 cropped sister introns (cf., [
25]). The canonical BP elements of the external introns of many sister stwintrons were partially or completely masked from the spliceosome in one of the double-stranded sections of the terminal stem structure. However, stretches of complementary dinucleotide repeats (AC–GU) near the opposite termini did not appear in any of the 38
Hypoxylon sp. CO27-5 sister stwintrons or in any of the ten type-2 cropped sister introns (nor in the group of the 81 UO stwintrons).
We assessed the minimum free energies (ΔG) for the optimal secondary structures in our set of 117 stwintrons according to the RNAfold predictor to estimate whether the ΔG provides another distinction between the sister stwintrons (the “younger” stwintrons) and the miscellaneous group of 81 UO stwintrons (the “older” stwintrons) (
Figure S6). When the predicted ΔG values (listed in
Supplementary Materials Table S1) were plotted against the fraction of stwintrons ordered by increasing size, the data points were widely scattered. We resorted to linear regression of each of the two data sets in the size range defined by 37 sister stwintrons, from 160 to 226 nt in length, to identify trends for each group of stwintrons. The two fitted lines suggested that the minimum free energies were, on average, some 18–20% lower for sister stwintrons than for the UO stwintrons in the same size range, with many individual exceptions in both sets of stwintrons. These differences in ΔG are thus less pronounced than those reported for (AC)m-(GU)n introns versus conventional U2 introns in zebrafish (50–100% lower) [
48].
To improve the reliability of the secondary structure prediction, the 14 most similar sister stwintrons were aligned before folding was predicted for a “consensus” sister stwintron deduced from the alignment.
Figure 7 shows the secondary structure prediction of a parsimonious consensus sister stwintron (length 205 nt) using the RNAalifold program from the online ViennaRNA suite [
43]. A stem-loop structure was predicted, where the stem is formed by the terminal inverted repeat (cf., [
25]); this “consensus stem” is 47 nt long from which 8 nt are not base paired, without bulges but including two noncanonical GU base pairs. The canonical BP sequence of the external intron was masked by a double-stranded section of the consensus stem. In the centre of the consensus structure two small hairpins were predicted, neighbouring each other in this most parsimonious structure. Interestingly, in the most 5′ of these predicted small hairpins, the nine-nucleotide-long stem comprised the six-nucleotide canonical BP element of the internal intron (5′-GCUAAC) base paired with the six-nucleotide donor element of the external intron (5′-GUAAGU) (one mismatch), with the branch point A
5 and the donor G
1 (functionally for normal stwintron excision, the 3′-G
3 of the internal acceptor) locked in neighbouring base pairs. It would thus appear that in the predicted consensus sister stwintron structure (RNAalifold), the internal splice elements mask each other from the spliceosome RNP. The consensus sister stwintron did not contain sequences of more than five consecutive pyrimidines, but an interrupted tract (5′-UUCUUAUUUC) was (exactly) locked in the stem of the 3′ of the small hairpins near the centre of the consensus structure (
Figure 7).
The zebrafish (AC)m-(GU)n introns are spliceosomal introns with secondary structures that appear to be under positive selective pressure to guarantee their proper excision [
48]. However, the existence of strong hairpin-like or stem-loop structures in
Hypoxylon [D1,2] sister stwintrons potentially driving the pairing of distal splice sites and subsequent excision of (almost) the whole complex intervening sequence (minus G
1) in one reaction presents a contradiction with the typical two-step mode by which stwintrons are excised and which we showed to overwhelmingly occur in the set of 117 [D1,2] stwintrons (
Supplementary Materials Table S1). By definition for any stwintron, exact removal of the whole complex intervening sequence is only possible with two consecutive standard U2 splicing reactions, where the excision of the internal canonical U2 intron necessarily precedes the excision of the external canonical U2 intron (cf., [
18]). Immediate utilisation of the most distal splice sites of a [D1,2] stwintron by the spliceosome, in effect an alternative excision by one standard splicing reaction, leads to a +1 frameshift at the exon–exon junction and, in most cases, to C-terminally truncated protein products. The RNA species that forms after such a mis-splicing event is usually turned over rapidly by means of nonsense-mediated mRNA decay (for reviews on nonsense-mediated mRNA decay, see, for example [
51,
52,
53,
54]). Nevertheless, we were able to find SRA reads that proved that this alternative splicing of [D1,2] stwintrons in one standard splicing reaction does occasionally occur in
Hypoxylon sp. CO27-5, notably for both sister stwintrons and UO stwintrons (
Supplementary Materials Table S3).
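The difference between the regular two-step excision and the one-step alternative can be followed on a toy [D1,2] stwintron in Python. All sequences below are invented for illustration and are not taken from any Hypoxylon gene; only the architecture follows the text (the internal intron sits between G1 and the second nucleotide of the external donor). The function names are hypothetical.

```python
def build_pre_mrna(exon_up, internal, external_rest, exon_down):
    """Upstream exon + G1 of external donor + internal intron
    + remainder of external intron + downstream exon."""
    return exon_up + "G" + internal + external_rest + exon_down


def two_step(exon_up, internal, external_rest, exon_down):
    """Excise the internal intron first, then the reconstituted external one."""
    pre = build_pre_mrna(exon_up, internal, external_rest, exon_down)
    start_int = len(exon_up) + 1                      # just after G1
    splinter = pre[:start_int] + pre[start_int + len(internal):]
    ext_len = 1 + len(external_rest)                  # G1 + remainder
    mrna = splinter[:len(exon_up)] + splinter[len(exon_up) + ext_len:]
    return splinter, mrna


def one_step(exon_up, internal, external_rest, exon_down):
    """Use the most distal sites: internal donor to external acceptor."""
    pre = build_pre_mrna(exon_up, internal, external_rest, exon_down)
    start = len(exon_up) + 1                          # G1 stays in the RNA
    end = len(pre) - len(exon_down)
    return pre[:start] + pre[end:]


# Toy sequences (hypothetical): donors GTAAGT, BPs GCTAAC/ACTAAC,
# acceptors TAG/CAG.
EXON_UP, EXON_DOWN = "ATGGCC", "AAATGA"
INTERNAL = "GTAAGTCCGCTAACTTTTTAG"
EXTERNAL_REST = "TAAGTCCACTAACTTCCAG"

if __name__ == "__main__":
    splinter, mrna = two_step(EXON_UP, INTERNAL, EXTERNAL_REST, EXON_DOWN)
    alt = one_step(EXON_UP, INTERNAL, EXTERNAL_REST, EXON_DOWN)
    print("splinter:", splinter)
    print("two-step mRNA:", mrna)
    print("one-step RNA :", alt, "(+%d nt)" % (len(alt) - len(mrna)))
```

In this toy case the two-step product restores the continuous exon sequence, while the one-step product retains G1 and is one nucleotide longer, which shifts the downstream reading frame by +1.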
It would therefore appear that splice-site pairing of the internal intron of the
Hypoxylon sister stwintrons by intron definition strongly prevails over splice-site pairing mediated by the secondary structure of the complete intervening sequence. The exact removal of the stwintron sequence by two consecutive conventional splicing reactions would occur despite the remarkable observation that the internal splice sites of sister stwintrons could mask each other from recognition by the assembling spliceosome. This prevalence would imply that in evolutionary older UO stwintrons that occur in most sequenced species of the Hypoxylaceae family (i.e., the group of 81), the secondary structure of the whole intervening sequence is not conserved between species, because the selective pressure bears solely on the correct excision by two consecutive standard U2 splicing reactions. We tested this in nine [D1,2] stwintrons that were position-conserved in most species of the family (HCOc024B and numbers no-053, no-054, no-082, no-143, no-208, no-223, no-239 and no-301; their predicted optimal secondary structures in
Hypoxylon sp. CO27-5 are shown in
Supplementary Materials Figure S5) and found no evidence for the conservation of the optimal secondary structure as predicted by RNAfold, not even in
Hypoxylon sp. CO27-5 and the closely related species
Hypoxylon pulicicidum.