Next Article in Journal
Neurologic Biomarkers, Neuroimaging, and Cognitive Function in Persistent Atrial Fibrillation: A Cross-Sectional Study
Next Article in Special Issue
Insight into the Organization of the B10v3 Cucumber Genome by Integration of Biological and Bioinformatic Data
Previous Article in Journal
Efficacy and Mechanisms of Action of Essential Oils’ Vapours against Blue Mould on Apples Caused by Penicillium expansum
Previous Article in Special Issue
Genome-Wide Association Studies of Seven Root Traits in Soybean (Glycine max L.) Landraces
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

De Novo Assembly and Annotation of 11 Diverse Shrub Willow (Salix) Genomes Reveals Novel Gene Organization in Sex-Linked Regions

1
Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
2
Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Mol. Sci. 2023, 24(3), 2904; https://doi.org/10.3390/ijms24032904
Submission received: 8 December 2022 / Revised: 13 January 2023 / Accepted: 31 January 2023 / Published: 2 February 2023
(This article belongs to the Special Issue Comparative Genomics and Functional Genomics Analysis in Plants)

Abstract

:
Poplar and willow species in the Salicaceae are dioecious, yet have been shown to use different sex determination systems located on different chromosomes. Willows in the subgenus Vetrix are interesting for comparative studies of sex determination systems, yet genomic resources for these species are still quite limited. Only a few annotated reference genome assemblies are available, despite many species in use in breeding programs. Here we present de novo assemblies and annotations of 11 shrub willow genomes from six species. Copy number variation of candidate sex determination genes within each genome was characterized and revealed remarkable differences in putative master regulator gene duplication and deletion. We also analyzed copy number and expression of candidate genes involved in floral secondary metabolism, and identified substantial variation across genotypes, which can be used for parental selection in breeding programs. Lastly, we report on a genotype that produces only female descendants and identified gene presence/absence variation in the mitochondrial genome that may be responsible for this unusual inheritance.

1. Introduction

The genus Salix and the Salicaceae family are of growing scientific interest for their use as model systems to understand sex determination and sex chromosome dynamics. The Salicaceae family is almost entirely dioecious, and contains approximately 30 species of Populus and over 300 species of Salix [1], yet both the location of the sex determination system (SDR) as well as the sex inheritance mechanism (ZW vs. XY) differ across clades within this family. In the subgenus Vetrix, the sex determination region has been localized to Chr15, with a ZW system of inheritance [2,3], while Chr15XY has been identified in S. triandra from the subgenus Salix [4,5] and Chr07 XY in subgenus Protitea (S. nigra [6], S. chaenomeloides [7], and S. dunnii [8]) and subgenus Pleuradenia (S. arbutifolia) [4,9]. In Populus, Chr19 XY, Chr19 ZW, and Chr14 XY sex determination systems have all been reported [10]. The precise genes responsible for sex determination in Salicaceae are still being studied but are thought to involve a presence/absence of expression of the type-C cytokinin response regulator ARR17 in Populus [11] and Salix subgenus Protitea [7], in which ARR17 acts as a single gene controlling sex, simultaneously promoting female floral development and suppressing male floral development. Willows in the subgenus Vetrix, on the other hand, may possess a two-gene system of sex determination where two genes on the W chromosome are necessary to determine sex: one to promote female floral development and another to suppress male development. Dosage levels of ARR17 in combination with GATA15 have been suggested as a possible two-gene mechanism of sex determination in the shrub willow S. purpurea, based on expression and resequencing evidence from a set of monoecious families in this species [12]. AGO4, DRB1, and three hypothetical proteins have also been proposed as potential master regulators of sex in S. purpurea [13].
Shrub willows in the subgenus Vetrix are a dioecious crop grown widely across the northern hemisphere for a variety of horticultural uses, including for bioenergy, as ornamentals, and for ecological restoration purposes [1]. Commonly cultivated shrub willow species include European natives S. purpurea and S. viminalis, the Chinese species S. suchowensis, Japanese natives S. integra and S. udensis, and S. koriyanagi from Korea [1,14,15]. Together, these six aforementioned species represent a broad range of genetic diversity across Vetrix [2,14]. Due to both the dynamic nature of the SDR within this genus and the unique mechanism of sex determination in S. purpurea [12,13], there is an interest in comparing the gene content of the sex determination regions across species in Vetrix, in order confirm the two-gene model from S. purpurea and to identify any additional shifts in sex determination genes during the evolution of this clade.
Salix are both wind and insect pollinated and as such, catkins produce a suite of secondary metabolites to attract pollinators [16,17,18]. Previous studies that characterized terpenoid and flavonoid profiles in Salix catkins have shown substantial differential expression of these compounds based on sex, which influences pollinator attraction [16,19]. Secondary metabolites also play a known role in defense against herbivory across plant species [20,21]. QTL mapping of floral terpenoid, flavonoid, and phenolic glucoside production and identification of candidate genes have been conducted in S. purpurea, and candidate genes for many specific compounds have been identified [19]. However, as of yet, there has been little effort to compare these candidate genes between related species. Characterizing the presence, copy number, and expression of secondary metabolite genes across Salix species is therefore useful for understanding biological differences in floral secondary metabolite production, and their effects on pollinator attraction and herbivory.
Genomic resources for the genus Salix are still under development, with the shrub willows being the most well-studied group with several assembled genomes and recent advances in QTL mapping of various traits, including yield, insect resistance, and rust susceptibility [22,23]. Within the subgenus Vetrix, reference genomes are currently available for a female S. viminalis [24], a female S. suchowensis [25], a male S. purpurea (‘Fish Creek’), a female S. purpurea (94006) [26], and a monoecious S. purpurea [12], the latter two of which have fully assembled Chr15Z and Chr15W sex chromosomes. Here we present de novo assembly and annotation of 11 Salix genomes across six shrub willow species, including three newly sequenced and assembled species. Among these 11 genomes is a reassembly of 94006, a S. purpurea female that was used for the Phytozome v5.1 reference genome (https://phytozome-next.jgi.doe.gov/info/Spurpurea_v5_1, last accessed 2 December 2022) and is the mother of the male ‘Fish Creek’ used for the Phytozome v3.1 reference (https://phytozome-next.jgi.doe.gov/info/SpurpureaFishCreek_v3_1, last accessed 2 December 2022), both produced by the US Department of Energy Joint Genome Institute (JGI) [26,27]. A male S. purpurea, (94001, the father of ‘Fish Creek’), two female (P294, P295) and one male (P63) S. suchowensis, one female S. integra (P336), one male (04-FF-016) and one female (SH3) S. koriyanagi, one female (07-MBG-5027) and one male (‘Jorr’) S. viminalis, and a male S. udensis (04-BN-051) were also sequenced. These particular genotypes were utilized since previous research has reported Chr15 ZW sex inheritance for each of these genotypes and their SDR boundaries have been delineated [2]. Furthermore, F1 crosses to S. purpurea 94001 and 94006 have been previously generated for each genotype, with their genetic linkage maps, phenotypic analysis, and QTL mapping results published [2].
For each assembly and annotation, gene content across the Chr15W SDR regions was characterized. Notably, nearly all previously identified candidate sex determination genes are missing from S. koriyanagi, S. viminalis, and S. udensis, which suggests a unique sex determination mechanism in these species that may not involve ARR17, as shown in Populus and S. purpurea [12,28]. We report the expression and copy number variation of multiple secondary metabolite genes, including previously identified candidates for sex-dimorphic floral volatile and phenolic glycoside compound production [19]. Finally, we present data that support an all-female inheritance in the Salix integra P336 descendants and identify a missing mitochondrial RPL10 gene as a candidate mechanism for this inheritance.

2. Results

2.1. Assembly and Annotation

Oxford Nanopore read length and quality distributions for each assembly are shown in Figure S1A,B, respectively. Mean genome coverage ranged from 45× to 103×. Contig N50 values ranged from 300.36 Kb in 04-FF-016, to 804.25 kb in P336. Assembly lengths were relatively consistent within each species. S. suchowensis had the largest genome size, with a mean of 375 Mb, while the mean size of the S. viminalis genome was only 288 Mb. All assemblies had a Eudicot core gene BUSCO score above 95%. Assembly statistics are shown in Table 1.
Annotation BUSCO scores ranged from 77.9% in P336 (S. integra) to 92.9% in ‘Jorr’ (S. viminalis). The mean number of annotated genes across all genomes was 32,166, while the mean number of annotated transcripts was 40,679. The estimated number of missing genes, relative to 94006 v5.1, ranged from 3706 in the 94006 reassembly (S. purpurea) to 4973 in SH3 (S. koriyanagi). The total gene number in genome-specific orthogroups ranged from 331 in ‘Jorr’ (S. viminalis) to 1026 in 94006 (S. purpurea). Annotation statistics are shown in Table 2.
Comparative genomics analysis with Orthofinder assigned 391,057 out of 407,955 transcripts across all 11 annotations to 49,209 orthogroups (95.2% of transcripts). A total of 2769 orthogroups were genome-specific and accounted for 1.7% of all transcripts. Orthogroup assignment had a G50 of 11 (i.e., 50% of genes were assigned to an orthogroup of 11 or larger) and an O50 of 11,958 among assigned genes (i.e., 11,958 orthogroups accounted for 50% of genes). A total of 10,799 orthogroups had genes from all 11 genomes represented, 3401 of which were single-copy orthogroups. Phylogenetic analysis of the annotated gene sets using Orthofinder consistently grouped genomes of the same species together (Figure S2).

2.2. Sex Determination Gene Analysis

Copy numbers of each candidate sex determination gene in the Chr15W locus varied among genotypes, as did the Chr15Z exon 1 and Chr19 full-length copies of ARR17 (Table 3). Candidate genes were present in expected numbers in S. purpurea 94006, consistent with the JGI S. purpurea 94006 v5.1 reference genome, with the exception of one fewer ARR17 copy as well as GATA15 initially assembled on Chr17 instead of Chr15W, which are likely due to errors during assembly or scaffolding. S. suchowensis P294, S. suchowensis P295, and S. integra P336 have two Chr15 ARR17 copies and one Chr15 AGO4 copy. Most candidate sex determination genes, including ARR17, AGO4, GATA15, were missing from S. koriyanagi SH3 and S. viminalis 07-MBG-5027 (Table 3). All Chr15 gene copies were located within the boundaries of the sex determination regions of each genotype, as previously described by Wilkerson et al. [2].

2.3. Secondary Metabolism Gene Analysis

BLASTN analysis of secondary metabolism genes revealed variation in copy number between genomes for most genes (Supplementary Tables S1 and S2). The total combined expression of secondary metabolism genes across all eight tissue types also showed substantial variation between genotypes (Supplementary Table S3).

2.4. P336 Crosses and Progeny

All progeny in the eight families generated with P336 (S. integra) as the female parent, including F1 progeny and second-generation progeny, were female, with over 75% of plants flowering in each family at the time of data collection (Table 4).

3. Discussion

3.1. Assemblies and Annotations

The high quality of the assemblies (BUSCO > 95%) as well as the large number of genes in the annotations, represents an advancement in Salix genomic resources, including comprehensive comparative genome analysis across the shrub willows. Across all 11 annotations, there are several thousand gene models from S. purpurea 94006 v5.1 that are missing, which is reflected by relatively low BUSCO scores of less than 90% in most annotations (Table 2). The RNA-Seq data used to perform the annotations did not contain any floral tissue, nor any tissue from drought, disease, or insect stressed plants, which can explain the missing gene models, as genes from these biological conditions were not expressed in our dataset and therefore were not annotated.

3.2. Sex Determination Genes and SDR Assembly

The reported assemblies each include one haplotype of Chr15 per genome: Chr15Z in the male assemblies and Chr15W in the females. Together, these include separate fully assembled 15Z and 15W chromosomes for S. purpurea, S. suchowensis, S. koriyanagi, and S. viminalis, Chr15W for S. integra, and Chr15Z for S. udensis. This is the first report of a fully assembled Chr15Z for both S. suchowensis and S. viminalis [24,25]. The Chr15 assemblies across the 11 genomes show substantial differences in structural arrangement (Figure S3A–K). These structural differences may be due in part to errors in assembly rather than true structural variations between genotypes, particularly since the order of sequences in the reassembly of S. purpurea 94006 Chr15W differs from the JGI S. purpurea 94006 v5.1 assembly. Sex determination regions are notoriously difficult to assemble due to highly repetitive regions resulting from a lack of recombination, and such differences in arrangement of contigs into the final scaffolded sex chromosomes are not unexpected [29]. Nevertheless, despite structural variation, the Chr15 appears to be fully intact across every assembly.
BLASTN results for candidate sex determination genes reveal substantial variation in gene content within the Chr15W SDR between genomes (Table 3). In S. purpurea 94006, the sex determination gene content closely matches the JGI S. purpurea 94006 v5.1 reference genome. Only three copies of ARR17 were identified on Chr15 instead of four, five copies of DRB1 instead of two, and GATA15 was initially located on Chr17 instead of Chr15; however, these differences in gene copy number between assemblies could be the result of errors in assembly within the Chr15 in either reference. In the case of the missing fourth ARR17, this gene is located within a series of four palindromic repeats, and, due to their repetitive nature, the fourth arm could have been lost during haplotig purging. When searching the purged contigs, an additional ARR17 was identified, which is likely this fourth Chr15 copy. In the case of GATA15, the Chr15W copy was in a 482 kb region that was originally assembled on Chr17. This was determined to be an error in the assembly, as no Chr17 GATA15 was present in any other S. purpurea genome assembly, including the JGI 94006 v1.0 and v5.1 assemblies, the JGI ‘Fish Creek’ v3.1 assembly [26], the 94003 assembly [12], or our 94001 assembly. A dotplot alignment of HiC_scaffold_7 (Chr17) from our reassembly of 94006 against the JGI 94006 v5.1 reference shows this 482 kb region on HiC_scaffold_7 aligns to Chr15W (Figure S4). Linkage map markers for the S. purpurea 94006 genotype were obtained from Wilkerson et al. (2022) and include one marker, S15_7998352, which is located in the misassembled region [2]. In a BLASTN analysis, the flanking regions of this marker align to HiC_scaffold_7 (Chr17), while the nearest markers on the 94006 linkage map, which are tightly linked, align to HiC_scaffold_3 (Chr15), confirming that this region, including GATA15, is indeed a Chr15W region misassembled onto Chr17. In light of this finding, the 482 kb misassembled region on HiC_scaffold_7 was manually moved to HiC_scaffold_3.
The ARR17 and GATA15 genes were absent from Chr15 in the males and present in the females of S. purpurea and S. suchowensis, consistent with the two-gene sex determination mechanism proposed by Hyden et al. [12,13] and suggesting a common sex determination mechanism between these two species. ARR17 and AGO4 were located in a series of four inverted palindromic repeats on Chr15W in S. purpurea [26]. In the S. suchowensis and S. integra female genomes there were only two ARR17 copies on Chr15 instead of four, and only one AGO4 copy instead of three. This indicates that there are only two arms of these palindromic repeats in S. suchowensis and S. integra instead of the four observed in S. purpurea [26]. These palindromic repeats appeared to be absent altogether in the S. koriyanagi and S. viminalis female genomes, which suggests that the palindromic repeats may have been deleted independently in S. koriyanagi and S. viminalis. Partial copies of the ARR17 exon 1 are thought to have a key role in sex determination in both Populus and Salix subgenus Protitea [7,11] by silencing ARR17 expression in males. BLAST results revealed ARR17 exon 1 copies present on Chr15 in all S. purpurea, S. suchowensis, S. integra, and S. koriyanagi regardless of sex, while they were absent entirely from S. viminalis and S. udensis, suggesting that these partial repeats were likely lost in a common ancestor of these two species (Table 3, Figure S6). The copy number variation of DRB1 and the three hypothetical proteins across the genomes is inconsistent with the current model of sex determination and does not support a role of these genes in sex determination, as previously proposed for S. purpurea [13]. Of particular interest is the lack of ARR17 or GATA15 homologs on Chr15 in the S. koriyanagi and S. viminalis female genomes. The missing ARR17 in S. viminalis is inconsistent with earlier studies on S. viminalis by Hallingback et al. [30] and Almeida et al. [24], which both identified one copy of ARR17 on the S. viminalis Chr15W. Taken together, the differing number of candidate sex determination genes between species, particularly ARR17 and GATA15, indicates that the mechanism of sex determination may be quite labile within the Vetrix lineage of willows, despite its apparent conservation between other willow subgenera and the poplars.

3.3. Secondary Metabolism Genes

Across most genomes, the copy number of annotated secondary metabolism genes shows little variation, with a few notable exceptions. S. suchowensis P294 exhibited an exceptionally high copy number of several gene families, including flavonol synthase 1, terpene synthase 21 (involved in sesquiterpene synthesis), coniferyl aldehyde 5-hydroxylase (associated with kaempferol-3-O-glucoside and prunin variation [19]), and UDP-glucose flavonoid 3-O-glucosyltransferase (Table S2). This abundance of gene annotations in P294 warrants further investigation into this particular genotype and its progeny for secondary metabolite abundance and its relationship to pollinator and pest attraction. Some other notable copy number variations between genomes included nine chalcone synthase genes in S. viminalis 07-MBG-5027, two copies of phytoene desaturase 1 in all three S. suchowensis, 20 copies of squalene monooxygenase in S. suchowensis P63, and 31 copies of UDP-glucose flavonoid 3-O-glucosyltransferase in S. purpurea 94001 [19] (Table S2).
FPKM normalized expression results from all eight tissue types mapped to the S. purpurea 94006 v5.1 reference showed substantial variation in expression for secondary metabolite gene families (Table S3). Sapur.019G055800, a 4-coumarate:CoA ligase, has been associated with phenolic glucoside production in S. purpurea [19]. However, both S. purpurea genomes had the lowest relative expression of this gene, while expression was nearly five-fold greater in both S. koriyanagi genotypes. S. koriyanagi 04-FF-016 also showed exceptionally high expression of the arogenate/prephenate dehydratase gene family, which has been associated with prunin and isosalicin production in S. purpurea [19]. Salix suchowensis P63 exhibited the greatest expression of terpene synthase 03 family genes, which are associated with numerous terpenoids including beta-ocimene, beta-pinene, farnesene, and isoprene, while S. suchowensis P294 exhibited the greatest expression of coniferyl aldehyde 5-hydroxylase genes associated with prunin and kaempferol-3-O-glucoside [19]. These findings suggest that further research is warranted into these genotypes to understand differences in secondary metabolite concentrations and the effects they may have on pollinator and pest attraction.

3.4. P336 Crosses and Progeny

Across all eight crosses generated with S. integra P336 as a parent or grandparent, 100% of the progeny were female. Notably, when a (S. integra P336 × S. suchowensis P63) F1 female was both backcrossed to S. suchowensis P63 and out-crossed with S. purpurea 94001, all of the progeny were again female. This is interesting as it suggests that all-female inheritance persists across multiple generations, despite independent assortment and recombination of autosomes. The most likely cause of such a sex bias persisting after more than one generation is the cytoplasmic inheritance of a “male killer” allele on either the chloroplast or mitochondrial genome from S. integra P336, such that only female gametes survive. Alternatively, ZZ progeny may survive, but a cytoplasmic factor may result in a female phenotype regardless of the state of the sex chromosomes. One likely candidate for such a factor for either of these two mechanisms is the RPL10 gene, which was identified in every mitochondrial genome except S. integra P336 and S. viminalis ‘Jorr’ [31]. The absence of this gene is particularly striking, as its presence in the mitochondrial genome is broadly conserved across plant taxa, including gymnosperms and non-flowering plants [32]. RPL10 encodes a protein that is a component of the 80S ribosome and plays a role in plant development and protein translation under UV-B stress, as well as antiviral signaling [33,34]. In Arabidopsis, RPL10C has also been found to be expressed exclusively in pollen grains, and RPL10A has impaired transmission in male gametophytes when either RPL10B or RPL10C are mutated [35]. The absence of RPL10 from the S. integra P336 mitochondria and, therefore, all of its descendants, as well as this gene’s known role in plant and male gametophyte development, presents a compelling case for the absence of RPL10 as the most likely explanation for the all-female bias observed in the progeny of S. integra P336.

4. Materials and Methods

4.1. DNA Sequencing

Fresh young leaf tissue (approximately 100 mg) for all 11 Salix genotypes was collected and ground in liquid nitrogen using the Qiagen TissueLyser II with one 5 mm stainless steel bead. DNA extraction was performed using a modified CTAB-based protocol [36]. Briefly, the organic and aqueous phases were extracted using chloroform:isoamyl alcohol 24:1. After separation, a SPRI bead solution was used to select for reads greater than 1 kb [37]. For long read sequencing, 1 μg of DNA was used as input to Oxford Nanopore’s genomic DNA by ligation sequencing kit (SQK-LSK109) and the subsequent library was sequenced on an R.9.4.1 flow cell. Short-read sequencing of the same samples was performed on the Illumina HiSeq X Ten platform.

4.2. RNA Sequencing

RNA was extracted from eight tissues (root, xylem, internode, node, young leaf, mature leaf, petiole, and young stem) for all 11 genotypes, as well as fasciated shoot tissue from 04-BN-051, following the protocol described in Zhang et al. [38]. Strand-specific RNA-Seq libraries were prepared by BGI and sequenced on the DNB-Seq platform, which generated paired-end 150 bp reads. The same RNA preps from mature leaves and roots were also sequenced on the Oxford Nanopore MinION platform, with the exception of S. viminalis ‘Jorr’, which failed quality control. The SQK-PCB109 PCR-based cDNA library kit was used to generate sequencing libraries for leaf and root tissue for all 11 genotypes and were sequenced on R.9.4.1 flow cells.

4.3. Hi-C Library Preparation

Hi-C libraries were prepared with the Phase Genomics Proximo Plant Hi-C kit (Phase Genomics, Seattle, WA, USA). Hi-C libraries were sequenced on the Illumina NovaSeq 6000 instrument, which generated paired-end 150 bp reads. The sequencing data of each Hi-C library underwent quality control with the phase genomics hic_qc.py script (https://github.com/phasegenomics/hic_qc; last accessed 15 November 2021) to ensure a sufficient number of informative Hi-C reads were present in each library. Hi-C heatmaps are shown in Figure S5.

4.4. Genome Assembly

Assembly was performed with Oxford Nanopore reads using Flye 2.8.3 [39]. Illumina short reads were mapped to the assembled contigs with BWA-MEM [40]. Pilon and a custom python script were used to generate the corrected draft assembly with the Illumina data (Figure S6) [41]. Assembled contigs were scaffolded using Hi-C reads with Falcon [42] and Juicer Hi-C [43] to generate phased genome assemblies. A BUSCO search of the Eudicot core genes was performed against each assembly to assess the quality and completeness of each genome [44]. One assembly, 04-FF-016, produced two chimeric contigs, HiC scaffold_5 and HiC_scaffold_6, each spanning the entire length of several chromosomes. BLASTN analysis at the default parameter settings [45] was used to determine alignment to specific chromosomes, and each chimeric contig was manually cut at the approximate site where mapping behavior became abnormal. Resulting scaffolds were appended with a letter (e.g., a, b, c, etc.) to denote their origin from the original chimeric scaffold.

4.5. Annotation

Genome annotation was performed with the LoReAn v2.5 pipeline [46], which utilized both Oxford Nanopore and Illumina RNA-Seq, along with protein models from the JGI Populus trichocarpa v4.1, Populus deltoides v2.1, and Populus nigra × P. maximowiczii v1.1 reference genome annotations obtained from Phytozome (https://phytozome-next.jgi.doe.gov; last accessed 21 March 2022) [27,47], followed by Augustus ab initio gene prediction [48]. BLASTN analysis, with the maximum target sequences set to one, was performed for each annotated transcript from every genome against the S. purpurea 94006 v5.1 annotation on Phytozome (https://phytozome-next.jgi.doe.gov, last accessed 6 July 2022) to identify homologous gene models [26,45]. Functional prediction of mRNAs in each annotation was performed using interproscan 5.52–86.0 [49]. The estimated number of missing genes from each annotation was determined by performing a BLASTN analysis (default parameter settings) of all S. purpurea 94006 v5.1 CDS sequences against all annotated genes for each genome and identifying those S. purpurea 94006 v5.1 genes without a match in each genome. Orthofinder was used to identify unique and shared genes for each assembly, and to generate a phylogeny tree from the annotated genes [50]. A BUSCO search of the Eudicot core genes was performed against the annotated mRNA sequences to estimate the completeness of each annotation [44].

4.6. Sex Determination Candidate Gene Analysis

BLASTN analysis of candidate sex determination genes was performed using the S. purpurea 94006 v5.1 [26] and P. trichocarpa v4.1 [47] CDS sequences of the candidate sex determination genes identified in Hyden et al. (2021) as the query, with each assembly as the target, using the default parameter settings [12]. Analyzed candidate sex determination genes included homologs of a type C cytokinin response regulator ARR17, a GATA15 transcription factor, a truncated Argonaute 4 AGO4, a double stranded RNA-binding protein DRB1, and three hypothetical proteins [13].

4.7. Secondary Metabolism and Rust Gene Analysis

Analysis of candidate secondary metabolism genes was performed by creating a customized list of S. purpurea 94006 v5.1 gene models, which included candidate genes identified by Keefover-Ring et al. (2022) located in flavonoid, phenolic glucoside, and terpenoid QTL [19]. Genes with annotations in flavonoid and chalcone synthesis, terpene, sesquiterpene, squalene, and phytoene synthesis, and UDP-glucose flavonoid glucosyltransferase were also included, all of which have likely roles in terpenoid, flavonoid, and phenolic glucoside production. Results from the BLASTN analysis of annotated transcripts against the S. purpurea 94006 v5.1 reference were used to find the total matches in each respective genome for genes on the customized list of S. purpurea secondary metabolism genes.
To analyze and compare expression of candidate genes, Illumina RNA-Seq data for each genome were mapped to the S. purpurea 94006 v5.1 reference using STAR 2.7.0 [51], read counts were determined using featureCounts [52], and FPKM calculated using EdgeR [53]. The sum of normalized FPKM values was calculated across all tissue types sequenced within each genotype and across all genes within each gene family.

4.8. P336 Crosses and Progeny

To quantify female bias in progeny from the S. integra P336 genotype, F1 crosses and a select set of backcrosses were attempted with clones from each male genome in this study using the crossing method described by Kopp et al. [54]. In 2013, the 13X-426 cross was generated between P336 and 94001. In 2014, 05X-278-071, a female from a P336 × P63 cross, was crossed with 94001 and P63 to generate the 14X-454 and 14X-456 families, respectively. In 2020, P336 was crossed with S. purpurea ‘Fish Creek’ (94006 × 94001), a monoecious S. purpurea 94003 [12], P63, 04-FF-016, and 04-BN-051 to generate the 20X-565, 20X-564, 20X-278, 20X-567, and 20X-566 families, respectively. A cross with S. viminalis ‘Jorr’ was also attempted, but failed to produce viable seed, possibly as a result of the species being too divergent. Scoring for sex among the progeny was performed in April 2021.

5. Conclusions

We present 11 new Salix genome assemblies and annotations as a novel resource for shrub willow breeding, genetics, and genomics that will enable more accurate genetics studies of these species in the future. This is the most comprehensive genome assembly and annotation effort to date in the genus Salix and represents closely related diploid species that can be compared to understand the evolution of sex determination mechanisms. We used these genomes to characterize copy number variation of interesting genes relating to sex determination and secondary metabolism, which could be drivers of dioecy through emergence of a new sex determination system or sexual antagonistic effects, respectively. We found that key sex determination genes are missing in S. viminalis and S. koriyanagi and hypothesize that a unique sex determination system exists in these species that differs from Populus and other Salix species, which further supports the dynamic nature of sex chromosome evolution in Salicaceae. We also characterized copy number variation and expression of sexually dimorphic secondary metabolite genes. Lastly, we demonstrated that S. integra P336 produces only female descendants and propose a missing RPL10 gene from the mitochondrial genome as a candidate for this unusual inheritance.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms24032904/s1.

Author Contributions

Conceptualization, L.B.S., W.M., K.F., T.B.Y. and B.H.; methodology: B.H., K.F., T.B.Y., S.J. and C.C.; formal analysis, B.H., K.F. and T.B.Y.; investigation, B.H., K.F. and T.B.Y.; resources, L.B.S. and W.M.; data curation, B.H. and T.B.Y.; writing—original draft preparation, B.H.; writing—review and editing, B.H., K.F., T.B.Y., L.B.S. and W.M.; visualization, B.H., K.F. and T.B.Y.; supervision, L.B.S. and W.M.; project administration, L.B.S. and W.M.; funding acquisition, B.H., L.B.S. and W.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the U.S. Department of Energy (DOE) Office of Science Early Career Research Program under the Biological and Environmental Research office. BH was supported by a fellowship from the DOE Office of Science Graduate Student Research (SCGSR) Program and by a Pre-doctoral Research Fellowship from the United States Department of Agriculture National Institute for Food and Agriculture (award #2021-67034-35116). Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract Number DE-AC05-00OR22725. Part of this work was performed at the Oak Ridge Leadership Computing Facility (OLCF) including resources of the Compute and Data Environment for Science (CADES) at Oak Ridge National Laboratory. Partial funding for this research was also provided by grants from the National Science Foundation (DEB-1542486) and from the USDA National Institute for Food and Agriculture (2015-67009-23957).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All raw sequencing data have been deposited at the NCBI SRA (https://www.ncbi.nlm.nih.gov/sra, accessed on 12 October 2022). Raw Illumina and nanopore DNA sequencing data can be accessed with the BioProject ID PRJNA827350. The raw Illumina RNA-Seq data can be accessed with the BioProject ID PRJNA827350. Nanopore RNA-Seq data can be accessed with the BioProject ID PRJNA888070. Genome assemblies and annotations have been deposited at the NCBI Genome Portal (https://www.ncbi.nlm.nih.gov/genome, accessed on 12 October 2022) with the BioProject IDs PRJNA890276, PRJNA892589, PRJNA892593, PRJNA892594, PRJNA892596, PRJNA892597, PRJNA892598, PRJNA892599, PRJNA892600, PRJNA892601, and PRJNA892602. Protein FASTA and information files with interproscan and S. purpurea 94006 v5.1 BLAST results for each annotation are available on the Willowpedia github (https://github.com/Willowpedia, accessed on 10 October 2022). Genome assemblies, annotations, and annotation information files are available on Dryad at https://doi.org/10.5061/dryad.5hqbzkh9f (Released 6 December 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kuzovkina, Y.A.; Weih, M.; Romero, M.A.; Charles, J.; Hust, S.; McIvor, I.; Karp, A.; Trybush, S.; Labrecque, M.; Teodorescu, T.I. Salix: Botany and Global Horticulture. Hortic. Rev. 2007, 34, 447–489. [Google Scholar] [CrossRef]
  2. Wilkerson, D.G.; Taskiran, B.; Carlson, C.H.; Smart, L.B. Mapping the sex determination region in the Salix F1 hybrid common parent population confirms a ZW system in six diverse species. G3 2022, 12, jkac071. [Google Scholar] [CrossRef]
  3. Zhou, R.; Macaya-Sanz, D.; Rodgers-Melnick, E.; Carlson, C.H.; Gouker, F.E.; Evans, L.M.; Schmutz, J.; Jenkins, J.W.; Yan, J.; Tuskan, G.A.; et al. Characterization of a large sex determination region in Salix purpurea L. (Salicaceae). Mol. Genet. Genom. 2018, 293, 1437–1452. [Google Scholar] [CrossRef]
  4. Wang, Y.; Cai, X.; Zhang, Y.; Hörandl, E.; Zhang, Z.; He, L. The male-heterogametic sex determination system on chromosome 15 of Salix triandra and Salix arbutifolia reveals ancestral male heterogamety and subsequent turnover events in the genus Salix. Heredity 2023, 1–13. [Google Scholar] [CrossRef]
  5. Gulyaev, S.; Cai, X.; Guo, F.; Kikuchi, S.; Applequist, W.L.; Zhang, Z.; Hörandl, E.; He, L. The phylogeny of Salix revealed by whole genome re-sequencing suggests different sex-determination systems in major groups of the genus. Ann. Bot. 2022, 129, 485–498. [Google Scholar] [CrossRef]
  6. Sanderson, B.J.; Feng, G.; Hu, N.; Carlson, C.H.; Smart, L.B.; Keefover-Ring, K.; Yin, T.; Ma, T.; Liu, J.; DiFazio, S.P. Sex determination through X–Y heterogamety in Salix nigra. Heredity 2021, 126, 630–639. [Google Scholar] [CrossRef]
  7. Wang, D.; Li, Y.; Li, M.; Yang, W.; Ma, X.; Zhang, L.; Wang, Y.; Feng, Y.; Zhang, Y.; Zhou, R.; et al. Repeated turnovers keep sex chromosomes young in willows. Genome Biol. 2022, 23, 200. [Google Scholar] [CrossRef]
  8. He, L.; Jia, K.H.; Zhang, R.G.; Wang, Y.; Shi, T.L.; Li, Z.C.; Zeng, S.W.; Cai, X.J.; Wagner, N.D.; Hörandl, E.; et al. Chromosome-scale assembly of the genome of Salix dunnii reveals a male-heterogametic sex determination system on chromosome 7. Mol. Ecol. Resour. 2021, 21, 1966–1982. [Google Scholar] [CrossRef] [PubMed]
  9. Wu, J.; Nyman, T.; Wang, D.-C.; Argus, G.W.; Yang, Y.-P.; Chen, J.-H. Phylogeny of Salix subgenus Salix s.l. (Salicaceae): Delimitation, biogeography, and reticulate evolution. BMC Evol. Biol. 2015, 15, 31. [Google Scholar] [CrossRef]
  10. Yang, W.; Wang, D.; Li, Y.; Zhang, Z.; Tong, S.; Li, M.; Zhang, X.; Zhang, L.; Ren, L.; Ma, X.; et al. A general model to explain repeated turnovers of sex determination in the Salicaceae. Mol. Biol. Evol. 2020, 38, 968–980. [Google Scholar] [CrossRef] [PubMed]
  11. Cronk, Q.; Müller, N.A. Default sex and single gene sex determination in dioecious plants. Front. Plant Sci. 2020, 11, 1162. [Google Scholar] [CrossRef]
  12. Hyden, B.; Zou, J.; Wilkerson, D.G.; Carlson, C.H.; Rivera Robles, A.; DiFazio, S.; Smart, L.B. Structural variation of a sex-linked region confers monoecy and implicates GATA15 as a master regulator of sex in Salix purpurea. submitted, in review.
  13. Hyden, B.; Carlson, C.H.; Gouker, F.E.; Schmutz, J.; Barry, K.; Lipzen, A.; Sharma, A.; Sandor, L.; Tuskan, G.A.; Feng, G.; et al. Integrative genomics reveals paths to sex dimorphism in Salix purpurea L. Hortic. Res. 2021, 8, 170. [Google Scholar] [CrossRef] [PubMed]
  14. Argus, G.W. Infrageneric classification of Salix (Salicaceae) in the New World. Syst. Bot. Monogr. 1997, 52, 1–121. [Google Scholar] [CrossRef]
  15. Newsholme, C. Willows: The Genus Salix; Timber Press, Inc.: Portland, OR, USA, 1992. [Google Scholar]
  16. Fussel, U.; Dotterl, S.; Jurgens, A.; Aas, G. Inter- and intraspecific variation in floral scent in the genus Salix and its implication for pollination. J. Chem. Ecol. 2007, 33, 749–765. [Google Scholar] [CrossRef] [PubMed]
  17. Mosseler, A.; Major, J.; Ostaff, D.; Ascher, J. Bee foraging preferences on three willow (Salix) species: Effects of species, plant sex, sampling day and time of day. Ann. Appl. Biol. 2020, 177, 333–345. [Google Scholar] [CrossRef]
  18. Zeng, S.; Liang, T.; Li, L.; Xing, X.; Chen, H.; He, L. Pollination system and reproductive allocation strategies of dioecious tree Salix dunnii. J. Trop. Subtrop. Bot. 2022, 30, 357–366. [Google Scholar]
  19. Keefover-Ring, K.; Carlson, C.H.; Hyden, B.; Azeem, M.; Smart, L.B. Genetic mapping of sexually dimorphic volatile and non-volatile floral secondary chemistry of a dioecious willow. J. Exp. Bot. 2022, 73, 6352–6366. [Google Scholar] [CrossRef]
  20. Foley, W.J.; Moore, B.D. Plant secondary metabolites and vertebrate herbivores–from physiological regulation to ecosystem function. Curr. Opin. Plant Biol. 2005, 8, 430–435. [Google Scholar] [CrossRef] [PubMed]
  21. Wink, M. Plant breeding: Importance of plant secondary metabolites for protection against pathogens and herbivores. Theor. Appl. Genet. 1988, 75, 225–233. [Google Scholar] [CrossRef]
  22. Carlson, C.H.; Gouker, F.E.; Crowell, C.R.; Evans, L.; DiFazio, S.P.; Smart, C.D.; Smart, L.B. Joint linkage and association mapping of complex traits in shrub willow (Salix purpurea L.). Ann. Bot. 2019, 124, 701–716. [Google Scholar] [CrossRef]
  23. Wilkerson, D.G.; Crowell, C.R.; Carlson, C.H.; McMullen, P.W.; Smart, C.D.; Smart, L.B. Comparative transcriptomics and eQTL mapping of response to Melampsora americana in selected Salix purpurea F2 progeny. BMC Genom. 2022, 23, 71. [Google Scholar] [CrossRef] [PubMed]
  24. Almeida, P.; Proux-Wera, E.; Churcher, A.; Soler, L.; Dainat, J.; Pucholt, P.; Nordlund, J.; Martin, T.; Rönnberg-Wästljung, A.-C.; Nystedt, B. Genome assembly of the basket willow, Salix viminalis, reveals earliest stages of sex chromosome expansion. BMC Biol. 2020, 18, 78. [Google Scholar] [CrossRef]
  25. Wei, S.; Yang, Y.; Yin, T. The chromosome-scale assembly of the willow genome provides insight into Salicaceae genome evolution. Hortic. Res. 2020, 7, 45. [Google Scholar] [CrossRef]
  26. Zhou, R.; Macaya-Sanz, D.; Carlson, C.H.; Schmutz, J.; Jenkins, J.W.; Kudrna, D.; Sharma, A.; Sandor, L.; Shu, S.; Barry, K.; et al. A willow sex chromosome reveals convergent evolution of complex palindromic repeats. Genome Biol. 2020, 21, 38. [Google Scholar] [CrossRef]
  27. Goodstein, D.M.; Shu, S.; Howson, R.; Neupane, R.; Hayes, R.D.; Fazo, J.; Mitros, T.; Dirks, W.; Hellsten, U.; Putnam, N.; et al. Phytozome: A comparative platform for green plant genomics. Nucleic Acids Res. 2011, 40, D1178–D1186. [Google Scholar] [CrossRef]
  28. Müller, N.A.; Kersten, B.; Leite Montalvao, A.P.; Mahler, N.; Bernhardsson, C.; Brautigam, K.; Carracedo Lorenzo, Z.; Hoenicka, H.; Kumar, V.; Mader, M.; et al. A single gene underlies the dynamic evolution of poplar sex determination. Nat. Plants 2020, 6, 630–637. [Google Scholar] [CrossRef]
  29. Webster, T.H.; Couse, M.; Grande, B.M.; Karlins, E.; Phung, T.N.; Richmond, P.A.; Whitford, W.; Wilson, M.A. Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data. Gigascience 2019, 8, giz074. [Google Scholar] [CrossRef] [PubMed]
  30. Hallingbäck, H.R.; Pucholt, P.; Ingvarsson, P.K.; Rönnberg-Wästljung, A.C.; Berlin, S. Genome-wide association mapping uncovers sex-associated copy number variation markers and female hemizygous regions on the W chromosome in Salix viminalis. BMC Genom. 2021, 22, 710. [Google Scholar] [CrossRef]
  31. Yates, T. Genome Evolution in the Salicaceae: Genetic Novelty, Horizontal Gene Transfer, and Comparative Genomics. Ph.D. Thesis, University of Tennessee, Knoxville, TN, USA, 2022. [Google Scholar]
  32. Mower, J.P.; Bonen, L. Ribosomal protein L10 is encoded in the mitochondrial genome of many land plants and green algae. BMC Evol. Biol. 2009, 9, 265. [Google Scholar] [CrossRef] [PubMed]
  33. Ferreyra, M.L.F.; Pezza, A.; Biarc, J.; Burlingame, A.L.; Casati, P. Plant L10 ribosomal proteins have different roles during development and translation under ultraviolet-B stress. Plant Physiol. 2010, 153, 1878–1894. [Google Scholar] [CrossRef]
  34. Rocha, C.S.; Santos, A.A.; Machado, J.P.B.; Fontes, E.P. The ribosomal protein L10/QM-like protein is a component of the NIK-mediated antiviral signaling. Virology 2008, 380, 165–169. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Falcone Ferreyra, M.L.; Casadevall, R.; Luciani, M.D.; Pezza, A.; Casati, P. New evidence for differential roles of L10 ribosomal proteins from Arabidopsis. Plant Physiol. 2013, 163, 378–391. [Google Scholar] [CrossRef]
  36. Doyle, J.J.; Doyle, J.L. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 1987, 19, 11–15. [Google Scholar]
  37. Mayjonade, B.; Gouzy, J.; Donnadieu, C.; Pouilly, N.; Marande, W.; Callot, C.; Langlade, N.; Muños, S. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. Biotechniques 2016, 61, 203–205. [Google Scholar] [CrossRef]
  38. Zhang, J.; Yang, Y.; Zheng, K.; Xie, M.; Feng, K.; Jawdy, S.S.; Gunter, L.E.; Ranjan, P.; Singan, V.R.; Engle, N.; et al. Genome-wide association studies and expression-based quantitative trait loci analyses reveal roles of HCT2 in caffeoylquinic acid biosynthesis and its regulation by defense-responsive transcription factors in Populus. New Phytol. 2018, 220, 502–516. [Google Scholar] [CrossRef]
  39. Kolmogorov, M.; Yuan, J.; Lin, Y.; Pevzner, P.A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019, 37, 540–546. [Google Scholar] [CrossRef]
  40. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 2013, arXiv:1303.3997. [Google Scholar]
  41. Walker, B.J.; Abeel, T.; Shea, T.; Priest, M.; Abouelliel, A.; Sakthikumar, S.; Cuomo, C.A.; Zeng, Q.; Wortman, J.; Young, S.K. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 2014, 9, e112963. [Google Scholar] [CrossRef]
  42. Kronenberg, Z.N.; Rhie, A.; Koren, S.; Concepcion, G.T.; Peluso, P.; Munson, K.M.; Porubsky, D.; Kuhn, K.; Mueller, K.A.; Low, W.Y. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat. Commun. 2021, 12, 1935. [Google Scholar] [CrossRef]
  43. Durand, N.C.; Shamim, M.S.; Machol, I.; Rao, S.S.; Huntley, M.H.; Lander, E.S.; Aiden, E.L. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016, 3, 95–98. [Google Scholar] [CrossRef] [PubMed]
  44. Simão, F.A.; Waterhouse, R.M.; Ioannidis, P.; Kriventseva, E.V.; Zdobnov, E.M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015, 31, 3210–3212. [Google Scholar] [CrossRef] [PubMed]
  45. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [Google Scholar] [CrossRef]
  46. Cook, D.E.; Valle-Inclan, J.E.; Pajoro, A.; Rovenich, H.; Thomma, B.P.H.J.; Faino, L. Long-read annotation: Automated eukaryotic genome annotation based on long-read cDNA sequencing. Plant Physiol. 2018, 179, 38–54. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  47. Tuskan, G.A.; DiFazio, S.; Jansson, S.; Bohlmann, J.; Grigoriev, I.; Hellsten, U.; Putnam, N.; Ralph, S.; Rombauts, S.; Salamov, A.; et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313, 1596–1604. [Google Scholar] [CrossRef] [PubMed]
  48. Stanke, M.; Keller, O.; Gunduz, I.; Hayes, A.; Waack, S.; Morgenstern, B. AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006, 34, W435–W439. [Google Scholar] [CrossRef] [PubMed]
  49. Blum, M.; Chang, H.-Y.; Chuguransky, S.; Grego, T.; Kandasaamy, S.; Mitchell, A.; Nuka, G.; Paysan-Lafosse, T.; Qureshi, M.; Raj, S.; et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021, 49, D344–D354. [Google Scholar] [CrossRef]
  50. Emms, D.M.; Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 2019, 20, 238. [Google Scholar] [CrossRef]
  51. Dobin, A.; Davis, C.A.; Schlesinger, F.; Drenkow, J.; Zaleski, C.; Jha, S.; Batut, P.; Chaisson, M.; Gingeras, T.R. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 2013, 29, 15–21. [Google Scholar] [CrossRef]
  52. Liao, Y.; Smyth, G.K.; Shi, W. featureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 2014, 30, 923–930. [Google Scholar] [CrossRef]
  53. Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26, 139–140. [Google Scholar] [CrossRef]
  54. Kopp, R.F.; Maynard, C.A.; Rocha de Niella, P.; Smart, L.B.; Abrahamson, L.P. Collection and storage of pollen from Salix (Salicaceae). Am. J. Bot. 2002, 89, 248–252. [Google Scholar] [CrossRef] [PubMed]
Table 1. Assembly statistics of 11 genomes, with S. purpurea 94006 v5.1 and ‘Fish Creek’ v3.1 assemblies for comparison. * scaffold number of 04-FF-016 prior to manual cutting of chimeric scaffolds.
Table 1. Assembly statistics of 11 genomes, with S. purpurea 94006 v5.1 and ‘Fish Creek’ v3.1 assemblies for comparison. * scaffold number of 04-FF-016 prior to manual cutting of chimeric scaffolds.
GenomeSpeciesSexTotal Assembly LengthNumber of ScaffoldsNumber of ContigsContig N50 (KB)Largest Contig (MB)Mean CoverageAssembly BUSCO Score
JGI v5.1 94006S. pupureaF328,137,719348NANANANA97.0%
JGI v3.1
‘Fish Creek’
S. purpureaM312,123,941274NANANANA97.2%
94006S. purpureaF338,238,4211792675319.304.757295.8%
94001S. purpureaM332,407,3181362696232.303.675595.8%
P63S. suchowensisM369,253,8411352243383.133.785896.2%
P294S. suchowensisF375,803,6501732589325.522.465795.8%
P295S. suchowensisF382,054,2631351982435.712.166296.3%
P336S. integraF312,752,8201111246804.255.996096.7%
SH3S. koriyanagiF339,158,2211472922335.522.194595.5%
04-FF-016S. koriyanagiM349,107,755152 *2983300.362.277595.1%
07-MBG-5027S. viminalisF293,303,5391711716532.844.1610395.7%
‘Jorr’S. viminalisM282,587,1861972136442.893.815196.1%
04-BN-051S. udensisM315,877,0651402087396.094.455195.5%
Table 2. Summary statistics from 11 genome annotations, with S. purpurea 94006 v5.1 and ‘Fish Creek’ v3.1 assemblies for comparison.
Table 2. Summary statistics from 11 genome annotations, with S. purpurea 94006 v5.1 and ‘Fish Creek’ v3.1 assemblies for comparison.
GenomeSpeciesAnnotation BUSCO ScoreGenesTranscriptsGenes MissingGenome-Specific OrthogroupsGenes in Specific Orthogroups
JGI v5.1 94006S. pupurea97.0%35,12557,462NANANA
JGI v3.1 ‘Fish Creek’S. purpurea97.2%34,46446,943NANANA
94006S. purpurea82.2%31,93836,19937063791026
94001S. purpurea91.1%31,47039,1964164336770
P63S. suchowensis84.9%30,53037,3104663229534
P294S. suchowensis89.7%34,68138,7884002298730
P295S. suchowensis87.2%30,71936,5074532217574
P336S. integra77.9%29,90734,3274733225574
SH3S. koriyanagi86.1%30,53936,4364973181442
04-FF-016S. koriyanagi87.0%30,47836,2264856229543
07-MBG-5027S. viminalis89.0%31,70837,9913732267706
‘Jorr’S. viminalis92.9%30,52434,1124420138331
04-BN-051S. udensis86.5%30,38236,4834902270609
Table 3. Copy number of candidate sex determination genes across the 11 annotated genomes with S. purpurea 94006 v5.1 and ‘Fish Creek’ v3.1 assemblies for comparison. Homologs of Sapur.019G053300 were all located on Chr19 in each assembly, and homologs of all other genes were located on Chr15 in each assembly, unless otherwise noted. * GATA15 was originally assembled to Chr17, but this was identified as an assembly error and the region manually moved to Chr15. ** A fourth ARR17 was identified on a purged haplotig.
Table 3. Copy number of candidate sex determination genes across the 11 annotated genomes with S. purpurea 94006 v5.1 and ‘Fish Creek’ v3.1 assemblies for comparison. Homologs of Sapur.019G053300 were all located on Chr19 in each assembly, and homologs of all other genes were located on Chr15 in each assembly, unless otherwise noted. * GATA15 was originally assembled to Chr17, but this was identified as an assembly error and the region manually moved to Chr15. ** A fourth ARR17 was identified on a purged haplotig.
Gene IDFunction94006 JGI (F)‘Fish Creek’ JGI (M)94006 (F)94001 (M)P295 (F)P294 (F)P63 (M)P336 (F)SH3 (F)04-FF-016 (M)07-MBG-5027 (F)‘Jorr’ (M)04-BN-051 (M)
Sapur.15WG073500ARR17403 **0220200000
Sapur.019G053300ARR172222111111111
ARR17 15Z exon 1ARR171112111111000
Sapur.15WG062800GATA15101 *0110000000
Sapur.15WG074400AGO43030110100000
Sapur.15WG074300DRB12051121152212
Sapur.15WG074900hypothetical1010110000000
Sapur.15WG075300hypothetical1000220200000
Sapur.15WG075700hypothetical2031000109000
Table 4. Summary of families generated with S. integra P336 as the mother and maternal grandmother and resulting scores of sex on the progeny.
Table 4. Summary of families generated with S. integra P336 as the mother and maternal grandmother and resulting scores of sex on the progeny.
Family IDMotherMaternal SpeciesFatherPaternal
Species
ProgenyPercent FloweringPercent Female
13X-426P336S. integra94001S. purpurea28498%100%
20X-565P336S. integraFish CreekS. purpurea21075%100%
20X-564P336S. integra94003S. purpurea25277%100%
20X-278P336S. integraP63S. suchowensis21298%100%
20X-567P336S. integra04-FF-016S. koriyanagi20897%100%
20X-566P336S. integra04-BN-051S. udensis20476%100%
14X-45405X-278-071S. integra × S. suchowensis94001S. purpurea9488%100%
14X-45605X-278-071S. integra × S. suchowensisP63S. suchowensis16690%100%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hyden, B.; Feng, K.; Yates, T.B.; Jawdy, S.; Cereghino, C.; Smart, L.B.; Muchero, W. De Novo Assembly and Annotation of 11 Diverse Shrub Willow (Salix) Genomes Reveals Novel Gene Organization in Sex-Linked Regions. Int. J. Mol. Sci. 2023, 24, 2904. https://doi.org/10.3390/ijms24032904

AMA Style

Hyden B, Feng K, Yates TB, Jawdy S, Cereghino C, Smart LB, Muchero W. De Novo Assembly and Annotation of 11 Diverse Shrub Willow (Salix) Genomes Reveals Novel Gene Organization in Sex-Linked Regions. International Journal of Molecular Sciences. 2023; 24(3):2904. https://doi.org/10.3390/ijms24032904

Chicago/Turabian Style

Hyden, Brennan, Kai Feng, Timothy B. Yates, Sara Jawdy, Chelsea Cereghino, Lawrence B. Smart, and Wellington Muchero. 2023. "De Novo Assembly and Annotation of 11 Diverse Shrub Willow (Salix) Genomes Reveals Novel Gene Organization in Sex-Linked Regions" International Journal of Molecular Sciences 24, no. 3: 2904. https://doi.org/10.3390/ijms24032904

APA Style

Hyden, B., Feng, K., Yates, T. B., Jawdy, S., Cereghino, C., Smart, L. B., & Muchero, W. (2023). De Novo Assembly and Annotation of 11 Diverse Shrub Willow (Salix) Genomes Reveals Novel Gene Organization in Sex-Linked Regions. International Journal of Molecular Sciences, 24(3), 2904. https://doi.org/10.3390/ijms24032904

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop