Selection of Catechin Biosynthesis-Related Genes and Functional Analysis from Chromosome-Level Genome Assembly in C. sinensis L. Variety ‘Sangmok’
Abstract
1. Introduction
2. Results
2.1. Chromosome-Level Genome Assembly
2.2. Repetitive Element Analysis
2.3. Gene Annotation
2.4. Comparison of Gene Families
2.5. Phylogenetic Diversity
2.6. Gene Ontology and ‘Sangmok’-Specific Gene Annotation
2.7. Expression Analysis
3. Discussion
4. Materials and Methods
4.1. Plant Material
4.2. Genome Sequencing
4.3. Transcriptome Sequencing
4.4. Genome Assembly and Hi-C Scaffolding
4.5. Repeat Analysis
4.6. Gene Prediction and Annotation
4.7. Gene Family and Phylogenetic Analysis
4.8. qRT-PCR Analysis
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Metric | Final Assembly |
---|---|
Number of scaffolds | 7171 |
Total bases of scaffolds (bp) | 2,679,620,961 |
Number of scaffolds > 1 M nt | 38 |
Number of scaffolds > 10 M nt | 15 |
N50 scaffold length (bp) | 146,057,547 |
L50 scaffold count | 8 |
GC content (%) | 38 |
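The N50 and L50 figures above follow the usual assembly-statistics definitions: with scaffolds sorted from longest to shortest, N50 is the length of the scaffold at which the running total first reaches half of the assembly size, and L50 is the number of scaffolds needed to get there. A minimal Python sketch of that calculation, using a hypothetical list of toy lengths rather than the 'Sangmok' scaffolds:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half = sum(ordered) / 2                  # half of the total assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' scaffolds were needed, so it is the L50
            return length, count
    raise ValueError("lengths must be non-empty")


# Hypothetical toy lengths for illustration only (not from the assembly).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

Run on the actual scaffold lengths of an assembly, the same function should reproduce the N50 and L50 that an assembly-statistics tool reports, provided the tool uses these definitions.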
Category | BUSCOs | BUSCOs (%) |
---|---|---|
Complete BUSCOs | 1521 | 94.3 |
Complete and single-copy BUSCOs | 1310 | 81.2 |
Complete and duplicate BUSCOs | 211 | 13.1 |
Fragmented BUSCOs | 44 | 2.7 |
Missing BUSCOs | 49 | 3.0 |
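The percentages in the BUSCO table can be recomputed from the counts. The sketch below assumes that the total number of BUSCO groups searched is the sum of the complete, fragmented, and missing counts (1614); the table does not state this total, so it is an inference. The dictionary keys are illustrative names.

```python
# Counts taken from the BUSCO table above.
counts = {
    "complete": 1521,
    "single_copy": 1310,
    "duplicated": 211,
    "fragmented": 44,
    "missing": 49,
}

# Assumption: total groups searched = complete + fragmented + missing.
total = counts["complete"] + counts["fragmented"] + counts["missing"]


def pct(n, total):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * n / total, 1)


percentages = {name: pct(n, total) for name, n in counts.items()}

# Complete percentage obtained by summing the two rounded sub-categories.
complete_from_parts = round(
    percentages["single_copy"] + percentages["duplicated"], 1
)

print(total, percentages, complete_from_parts)
```

Under this assumption, direct division gives 94.2% for complete BUSCOs. The tabulated 94.3% equals the sum of the rounded single-copy and duplicate percentages (81.2 + 13.1), so the difference looks like a rounding effect rather than a disagreement in the counts.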
Feature | Number of Features | Total Length (bp) | Avg. Length (bp) | Density (per Mbp) |
---|---|---|---|---|
Gene | 68,151 | 236,339,063 | 3467.87 | 25.4331 |
CDS | 83,264 | 85,302,684 | 1024.48 | 31.0731 |
Exon | 416,542 | 133,000,454 | 319.297 | 155.448 |
Intron | 333,278 | 302,686,548 | 908.21 | 124.375 |
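The derived columns of the feature table can be checked against its raw columns. The sketch below assumes that the average length is total length divided by feature count, and that density is the number of features per Mbp of assembled sequence, using the 2,679,620,961 bp scaffold total from the assembly table. The density reading is an interpretation that matches the tabulated numbers; it is not a definition given in the source.

```python
ASSEMBLY_BP = 2_679_620_961  # total scaffold bases from the assembly table

# (number of features, total length in bp) from the feature table
features = {
    "Gene": (68_151, 236_339_063),
    "CDS": (83_264, 85_302_684),
    "Exon": (416_542, 133_000_454),
    "Intron": (333_278, 302_686_548),
}


def summarize(count, total_bp, assembly_bp=ASSEMBLY_BP):
    """Return (average length in bp, features per Mbp of assembly)."""
    avg_len = total_bp / count
    density = count / (assembly_bp / 1e6)  # assumed meaning of "density"
    return avg_len, density


for name, (count, total_bp) in features.items():
    avg_len, density = summarize(count, total_bp)
    print(f"{name}: avg {avg_len:.2f} bp, {density:.4f} per Mbp")
```

For the gene row this gives an average length of about 3467.87 bp and a density of about 25.433 per Mbp, in line with the table.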
Species | Genes | Orthogroup Genes (%) | Unassigned Genes (%) | Species-Specific Orthogroup Genes (%) |
---|---|---|---|---|
C. sinensis (Sangmok) | 68,151 | 60,983 (89.5) | 7156 (10.5) | 10,561 (15.5) |
C. sinensis (DASZ) | 33,021 | 32,945 (99.8) | 76 (0.2) | 38 (0.1) |
C. sinensis (G240) | 32,356 | 32,328 (99.9) | 28 (0.1) | 11 (0.0) |
C. sinensis (Shuchazao) | 50,822 | 49,150 (96.7) | 1672 (3.3) | 2382 (4.7) |
A. chinensis | 33,044 | 31,933 (96.6) | 1111 (3.4) | 404 (1.2) |
A. thaliana | 27,543 | 25,319 (91.9) | 2224 (8.1) | 3236 (11.7) |
C. arabica | 44,747 | 43,739 (97.7) | 1008 (2.3) | 2583 (5.8) |
C. canephora | 25,574 | 23,753 (92.9) | 1821 (7.1) | 635 (2.5) |
Ci. sinensis (Orange) | 24,525 | 24,045 (98.0) | 480 (2.0) | 1241 (5.1) |
G. max | 47,013 | 45,609 (97.0) | 1404 (3.0) | 2832 (6.0) |
M. truncatula | 31,917 | 31,077 (97.4) | 840 (2.6) | 2555 (8.0) |
O. sativa | 28,708 | 27,272 (95.0) | 1436 (5.0) | 1564 (5.4) |
P. trichocarpa | 31,618 | 30,966 (97.9) | 652 (2.1) | 1401 (4.4) |
S. lycopersicum | 25,557 | 24,836 (97.2) | 721 (2.8) | 1087 (4.3) |
T. aestivum | 103,785 | 98,250 (94.7) | 5535 (5.3) | 24,544 (23.6) |
T. cacao | 21,507 | 21,174 (98.5) | 333 (1.5) | 627 (2.9) |
V. vinifera | 25,787 | 25,166 (97.6) | 621 (2.4) | 1362 (5.3) |
Z. mays | 34,221 | 32,196 (94.1) | 2025 (5.9) | 3056 (8.9) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, D.-J.; Kim, J.-H.; Lee, T.-H.; Park, M.-E.; Ahn, B.-O.; Lee, S.-J.; Cho, J.-Y.; Kim, C.-K. Selection of Catechin Biosynthesis-Related Genes and Functional Analysis from Chromosome-Level Genome Assembly in C. sinensis L. Variety ‘Sangmok’. Int. J. Mol. Sci. 2024, 25, 3634. https://doi.org/10.3390/ijms25073634