The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Population and Phenotypic Growth Measurements
2.2. Genomic Data Filtering and Mapping
Variant Calling
2.3. Model Construction, Testing, and Prediction
2.3.1. Feature Selection Methods
2.3.2. Classification Algorithms
2.3.3. Model Building and Testing
2.4. Result Annotation
3. Results
3.1. Genetic Variant Catalogue
3.2. Model Building and Testing
3.3. Annotation of Key Results
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Size class | Year 1 Fish: Training Dataset | Year 1 Fish: Test Dataset | Year 3 Fish: Training Dataset | Year 3 Fish: Test Dataset |
---|---|---|---|---|
Small | 10 | 1 | 9 | 0 |
Medium | 4 | 6 | 4 | 2 |
Large | 10 | 1 | 10 | 6 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ruigrok, M.; Xue, B.; Catanach, A.; Zhang, M.; Jesson, L.; Davy, M.; Wellenreuther, M. The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus. Genes 2022, 13, 1129. https://doi.org/10.3390/genes13071129