Identification of Regulatory SNPs Associated with Vicine and Convicine Content of Vicia faba Based on Genotyping by Sequencing Data Using Deep Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Plant Material and Sequencing
2.2. Assembly of a Partial Vicia faba Genome
2.3. Variant Calling and Association Testing
2.4. Data Sets for Training the Neural Network
2.5. Sequence Features Used to Predict Promoters
2.6. Convolutional Neural Networks
2.7. Identification of Putative Regulatory SNPs
3. Results and Discussion
3.1. Processing the GBS Data
3.2. Prediction of Promoter Sequences
3.2.1. Intra- and Inter-Species Promoter Prediction
3.2.2. Prediction of Vicia faba Promoters
3.2.3. Systematic Identification of Regulatory SNPs Associated with the V+C Content of Vicia faba
3.3. Functional Analysis of the Candidate Gene and Transcription Factors
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
SNP | Single Nucleotide Polymorphism |
rSNP | Regulatory Single Nucleotide Polymorphism |
CNN | Convolutional Neural Network |
TF | Transcription Factor |
TFBS | Transcription Factor Binding Site |
V+C | Vicine and Convicine |
FDR | False Discovery Rate |
HMI | Horizontal Mutual Information |
GTE | Generalized Topological Entropy |
CPE | Core Promoter Element |
PWM | Position Weight Matrix |
MSS | Matrix Similarity Score |
Species | # Promoter Sequences | # Non-Promoter Sequences |
---|---|---|
Arabidopsis thaliana | 23,315 | 23,315 |
Glycine max | 46,199 | 46,199 |
Lupinus angustifolius | 23,463 | 23,463 |
Medicago truncatula | 32,158 | 32,158 |
Phaseolus vulgaris | 22,750 | 22,750 |
Trifolium pratense | 14,749 | 14,749 |
Vigna angularis | 19,584 | 19,584 |
Vigna radiata | 15,495 | 15,495 |
Medicago truncatula (Genome-wide) | - | ∼12,000 |
Vicia faba (Transcriptome) | - | ∼60,000 |
Trained (rows) / Evaluated (columns) | Arabidopsis thaliana | Glycine max | Lupinus angustifolius | Medicago truncatula | Phaseolus vulgaris | Trifolium pratense | Vigna angularis | Vigna radiata |
---|---|---|---|---|---|---|---|---|
Arabidopsis thaliana | 0.901 | 0.767 | 0.690 | 0.746 | 0.797 | 0.765 | 0.633 | 0.733 |
Glycine max | 0.837 | 0.864 | 0.915 | 0.847 | 0.863 | 0.724 | 0.914 | 0.856 |
Lupinus angustifolius | 0.545 | 0.611 | 0.981 | 0.720 | 0.586 | 0.493 | 0.974 | 0.709 |
Medicago truncatula | 0.755 | 0.797 | 0.959 | 0.876 | 0.789 | 0.715 | 0.951 | 0.841 |
Phaseolus vulgaris | 0.845 | 0.842 | 0.888 | 0.834 | 0.898 | 0.748 | 0.880 | 0.853 |
Trifolium pratense | 0.822 | 0.764 | 0.696 | 0.751 | 0.794 | 0.840 | 0.689 | 0.736 |
Vigna angularis | 0.510 | 0.607 | 0.971 | 0.715 | 0.583 | 0.494 | 0.977 | 0.712 |
Vigna radiata | 0.741 | 0.812 | 0.937 | 0.827 | 0.825 | 0.675 | 0.928 | 0.904 |
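
One way to read the trained/evaluated matrix above is to average each training species over the species it was not trained on. The following is only a sketch of such a summary, not an analysis taken from the article: the names `SPECIES`, `SCORES`, and `mean_cross_species` are hypothetical, and the values are copied from the table without assuming which performance metric they represent.

```python
# Rows: species the model was trained on; columns: species it was evaluated on.
# Values are copied from the table above; all names here are illustrative.
SPECIES = [
    "Arabidopsis thaliana", "Glycine max", "Lupinus angustifolius",
    "Medicago truncatula", "Phaseolus vulgaris", "Trifolium pratense",
    "Vigna angularis", "Vigna radiata",
]
SCORES = [
    [0.901, 0.767, 0.690, 0.746, 0.797, 0.765, 0.633, 0.733],
    [0.837, 0.864, 0.915, 0.847, 0.863, 0.724, 0.914, 0.856],
    [0.545, 0.611, 0.981, 0.720, 0.586, 0.493, 0.974, 0.709],
    [0.755, 0.797, 0.959, 0.876, 0.789, 0.715, 0.951, 0.841],
    [0.845, 0.842, 0.888, 0.834, 0.898, 0.748, 0.880, 0.853],
    [0.822, 0.764, 0.696, 0.751, 0.794, 0.840, 0.689, 0.736],
    [0.510, 0.607, 0.971, 0.715, 0.583, 0.494, 0.977, 0.712],
    [0.741, 0.812, 0.937, 0.827, 0.825, 0.675, 0.928, 0.904],
]


def mean_cross_species(scores):
    """Average each row over its off-diagonal entries (other species only)."""
    n = len(scores)
    return [
        sum(value for j, value in enumerate(row) if j != i) / (n - 1)
        for i, row in enumerate(scores)
    ]


cross_means = dict(zip(SPECIES, mean_cross_species(SCORES)))
best_trainer = max(cross_means, key=cross_means.get)
print(best_trainer, round(cross_means[best_trainer], 3))
```

Under this particular summary, the model trained on Glycine max has the highest mean score on the other seven species (about 0.851). Other summaries, such as the minimum off-diagonal value, could rank the species differently.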
Features | Accuracy | Sensitivity | Specificity | MCC |
---|---|---|---|---|
DNA sequence | 0.876 | 0.897 | 0.855 | 0.750 |
DNA sequence + 2-mer | 0.874 | 0.880 | 0.867 | 0.747 |
DNA sequence + 2-mer + frequency of CA motif | 0.862 | 0.828 | 0.897 | 0.726 |
DNA sequence + 2-mer + frequency of CG motif | 0.875 | 0.875 | 0.876 | 0.751 |
DNA sequence + 2-mer + HMI | 0.874 | 0.882 | 0.865 | 0.747 |
DNA sequence + 2-mer + frequency of TATAA motif | 0.876 | 0.875 | 0.878 | 0.752 |
DNA sequence + 2-mer + CG skew | 0.876 | 0.889 | 0.863 | 0.753 |
DNA sequence + topological entropy | 0.874 | 0.886 | 0.861 | 0.747 |
DNA sequence + 2-mer + topological entropy | 0.871 | 0.852 | 0.890 | 0.743 |
DNA sequence + 2-mer + HMI + frequency of TATAA motif | 0.871 | 0.869 | 0.874 | 0.743 |
DNA sequence + 2-mer + HMI + frequency of CA motif + frequency of CG motif + frequency of TATAA motif + CG skew | 0.873 | 0.859 | 0.888 | 0.747 |
DNA sequence + HMI + frequency of CA motif + frequency of CG motif + frequency of TATAA motif + CG skew | 0.875 | 0.889 | 0.860 | 0.749 |
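
The four columns of the feature table are related: given sensitivity, specificity, and the class sizes, accuracy and the Matthews correlation coefficient (MCC) follow from the confusion matrix. The sketch below illustrates that relationship using the standard definitions. It assumes equally sized promoter and non-promoter classes, as in the balanced training sets listed earlier; the class balance of the actual evaluation set is not stated here, and the function name `metrics_from_rates` is hypothetical.

```python
import math


def metrics_from_rates(sensitivity, specificity, n_pos=1.0, n_neg=1.0):
    """Derive accuracy and MCC from sensitivity and specificity.

    n_pos and n_neg are the numbers of positive and negative examples;
    the defaults assume balanced classes (an assumption of this sketch).
    """
    tp = sensitivity * n_pos   # true positives
    fn = n_pos - tp            # false negatives
    tn = specificity * n_neg   # true negatives
    fp = n_neg - tn            # false positives
    accuracy = (tp + tn) / (n_pos + n_neg)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


# First row of the table: DNA sequence only.
accuracy, mcc = metrics_from_rates(0.897, 0.855)
print(round(accuracy, 3), round(mcc, 3))
```

For the first row this gives an accuracy of 0.876, which matches the table, and an MCC of about 0.753, close to the tabulated 0.750. The small difference may come from rounding of the reported rates or from an evaluation set that is not exactly balanced.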
SNP_ID | Genotype | FDR | Position | Medicago Gene |
---|---|---|---|---|
SNP_131938_118 | C/T | 0.234 | 1,385,390 | MTR_2g008290 |
SNP_302904_183 | G/A | 0.179 | 1,385,444 | |
SNP_341016_236 | C/T | | 1,554,857 | MTR_2g008620 |
SNP_341016_239 | G/A | | 1,554,860 | |
SNP_356745_200 | A/G | 0.730 | 1,707,078 | MTR_2g008960 |
SNP_280549_41 | C/T | 0.234 | 1,707,183 | |
SNP_350273_103 | G/T | 0.234 | 1,707,199 | |
SNP_350273_90 | A/C | 0.234 | 1,707,212 | |
SNP_350273_61 | G/A | 0.234 | 1,912,704 | MTR_2g009430 |
SNP_29452_204 | G/A | 0.730 | 1,912,812 | |
SNP_29452_206 | G/A | 0.496 | 1,912,814 | |
SNP_118828_190 | C/T | 0.234 | 2,030,017 | MTR_2g009690 |
SNP_80231_27 | C/T | 0.234 | 2,163,048 | MTR_2g009940 |
SNP_364434_97 | A/T | 0.359 | 2,163,084 |
SNP_ID | Allele | TFBS | MSS | Consequence |
---|---|---|---|---|
SNP_341016_236 | Ref | P$MYB4_01 | 0.945 | Loss of TFBS |
SNP_341016_236 | Ref | P$MYB61_01 | 0.880 | Loss of TFBS |
SNP_341016_236 | Alt | P$SQUA_01 | 0.870 | Gain of TFBS |
SNP_341016_239 | Ref | P$MYB61_01 | 0.880 | Score change |
SNP_341016_239 | Alt | P$MYB61_01 | 0.881 | Score change |
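
The Consequence column above follows a simple pattern: a binding site found only with the reference allele is lost, one found only with the alternative allele is gained, and one found with both alleles merely changes its score. The sketch below reproduces that pattern from the listed hits. It is an illustration rather than the authors' pipeline, it assumes a hit appears in the list only if it passed the score cutoff of the scanning tool, and the name `classify_consequences` is hypothetical.

```python
# Hits copied from the table above: (SNP_ID, allele, TFBS matrix, MSS).
HITS = [
    ("SNP_341016_236", "Ref", "P$MYB4_01", 0.945),
    ("SNP_341016_236", "Ref", "P$MYB61_01", 0.880),
    ("SNP_341016_236", "Alt", "P$SQUA_01", 0.870),
    ("SNP_341016_239", "Ref", "P$MYB61_01", 0.880),
    ("SNP_341016_239", "Alt", "P$MYB61_01", 0.881),
]


def classify_consequences(hits):
    """Map each (SNP, TFBS) pair to the effect of the alternative allele."""
    scores_by_site = {}
    for snp_id, allele, tfbs, mss in hits:
        scores_by_site.setdefault((snp_id, tfbs), {})[allele] = mss

    consequences = {}
    for site, scores in scores_by_site.items():
        if "Ref" in scores and "Alt" in scores:
            consequences[site] = "Score change"   # site present with both alleles
        elif "Ref" in scores:
            consequences[site] = "Loss of TFBS"   # site only with reference allele
        else:
            consequences[site] = "Gain of TFBS"   # site only with alternative allele
    return consequences


consequences = classify_consequences(HITS)
for (snp_id, tfbs), effect in consequences.items():
    print(snp_id, tfbs, effect)
```

Grouping by SNP and matrix identifier keeps the comparison per binding site, so a single SNP can lose one site and gain another at the same time, as SNP_341016_236 does in the table.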
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Heinrich, F.; Wutke, M.; Das, P.P.; Kamp, M.; Gültas, M.; Link, W.; Schmitt, A.O. Identification of Regulatory SNPs Associated with Vicine and Convicine Content of Vicia faba Based on Genotyping by Sequencing Data Using Deep Learning. Genes 2020, 11, 614. https://doi.org/10.3390/genes11060614