PlantMine: A Machine-Learning Framework to Detect Core SNPs in Rice Genomics
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. PlantMine Framework
2.3. Feature Selection Method
2.4. Genomic Prediction Method
2.4.1. XGBoost Algorithm
2.4.2. SVM Algorithm
2.4.3. KNN Algorithm
2.4.4. RF Algorithm
2.5. Evaluation Metrics
2.6. Analysis of the Differences in SNPs Selected by Various Methods
3. Results
3.1. Suitable for Screening for Quantitative Traits
3.2. Suitable for Screening for Quality Traits
3.3. The Differences in SNPs Selected by Various Methods
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Feature Selection Methods | Machine Learning Algorithms | Optimal Features | Accuracy |
---|---|---|---|
ANOVA | KNN | 1170 | 0.737001 |
ANOVA | SVM | 790 | 0.636433 |
ANOVA | XGBoost | 510 | 0.607868 |
ANOVA | RF | 880 | 0.730588 |
F-SCORE | KNN | 170 | 0.691149 |
F-SCORE | SVM | 800 | 0.6801 |
F-SCORE | XGBoost | 790 | 0.698819 |
F-SCORE | RF | 950 | 0.701213 |
MIC | KNN | 1540 | 0.70826 |
MIC | SVM | 740 | 0.655904 |
MIC | XGBoost | 2070 | 0.698285 |
MIC | RF | 2290 | 0.694212 |

Feature Selection Methods | Machine Learning Algorithms | Optimal Features | Accuracy |
---|---|---|---|
ANOVA | KNN | 10 | 0.466071 |
ANOVA | SVM | 940 | 0.647621 |
ANOVA | XGBoost | 510 | 0.607868 |
ANOVA | RF | 5210 | 0.642267 |
F-SCORE | KNN | 50 | 0.546494 |
F-SCORE | SVM | 1660 | 0.654765 |
F-SCORE | XGBoost | 500 | 0.616071 |
F-SCORE | RF | 1170 | 0.65253 |
MIC | KNN | 110 | 0.342857 |
MIC | SVM | 430 | 0.652978 |
MIC | XGBoost | 500 | 0.65 |
MIC | RF | 1790 | 0.650299 |
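
The two tables can be hard to compare by eye, so the following minimal sketch shows one way to pick out the best-scoring combination overall and per feature-selection method. The values are transcribed from the tables. The names `FIRST_TABLE`, `SECOND_TABLE`, `best_row`, and `best_per_method` are hypothetical and introduced only for illustration; they are not part of PlantMine.

```python
# Rows transcribed from the two tables above, in order:
# (feature-selection method, algorithm, optimal feature count, accuracy).
# The variable names are illustrative assumptions, not PlantMine identifiers.
FIRST_TABLE = [
    ("ANOVA", "KNN", 1170, 0.737001),
    ("ANOVA", "SVM", 790, 0.636433),
    ("ANOVA", "XGBoost", 510, 0.607868),
    ("ANOVA", "RF", 880, 0.730588),
    ("F-SCORE", "KNN", 170, 0.691149),
    ("F-SCORE", "SVM", 800, 0.6801),
    ("F-SCORE", "XGBoost", 790, 0.698819),
    ("F-SCORE", "RF", 950, 0.701213),
    ("MIC", "KNN", 1540, 0.70826),
    ("MIC", "SVM", 740, 0.655904),
    ("MIC", "XGBoost", 2070, 0.698285),
    ("MIC", "RF", 2290, 0.694212),
]

SECOND_TABLE = [
    ("ANOVA", "KNN", 10, 0.466071),
    ("ANOVA", "SVM", 940, 0.647621),
    ("ANOVA", "XGBoost", 510, 0.607868),
    ("ANOVA", "RF", 5210, 0.642267),
    ("F-SCORE", "KNN", 50, 0.546494),
    ("F-SCORE", "SVM", 1660, 0.654765),
    ("F-SCORE", "XGBoost", 500, 0.616071),
    ("F-SCORE", "RF", 1170, 0.65253),
    ("MIC", "KNN", 110, 0.342857),
    ("MIC", "SVM", 430, 0.652978),
    ("MIC", "XGBoost", 500, 0.65),
    ("MIC", "RF", 1790, 0.650299),
]


def best_row(rows):
    """Return the row with the highest accuracy."""
    return max(rows, key=lambda row: row[3])


def best_per_method(rows):
    """Map each feature-selection method to its highest-accuracy row."""
    best = {}
    for row in rows:
        method = row[0]
        if method not in best or row[3] > best[method][3]:
            best[method] = row
    return best


if __name__ == "__main__":
    for name, table in (("first", FIRST_TABLE), ("second", SECOND_TABLE)):
        method, algorithm, n_features, accuracy = best_row(table)
        print(f"{name} table: {method} + {algorithm}, "
              f"{n_features} features, accuracy {accuracy}")
        for row in best_per_method(table).values():
            print("   best for", row[0], "->", row[1], row[2], row[3])
```

On these numbers, ANOVA with KNN scores highest in the first table (0.737001 with 1170 features), while in the second table SVM is the best algorithm for every feature-selection method, led by F-SCORE with SVM (0.654765 with 1660 features).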
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tong, K.; Chen, X.; Yan, S.; Dai, L.; Liao, Y.; Li, Z.; Wang, T. PlantMine: A Machine-Learning Framework to Detect Core SNPs in Rice Genomics. Genes 2024, 15, 603. https://doi.org/10.3390/genes15050603