Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Collection
2.2. Feature Selection
2.3. Training of Classification Algorithms
3. Results
3.1. Comparison of Feature-Selection Method and Classification Algorithm Combinations
3.2. Model Evaluation Using the Test Set
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
GEO Accession | Toxic Organic Compounds |
---|---|
GSE43564 | Atrazine, Acrylonitrile, Beta-benzene hexachloride, Bifenthrin, Bis(2-ethylhexyl) phthalate, Chlorpyrifos, Chloroform, Diazinon, Dichlorobenzene, Lambda-cyhalothrin, Parathion, Phenol, Permethrin, Toluene, Trichloroethylene, 2-Chloroethyl vinyl ether |
GSE55132 | Tris(2-butoxyethyl) phosphate (TBEP) |
GSE43960 | 2,4,6-Trinitrotoluene (TNT) |
GSE45053 | Acetone, Fluvoxamine, Fluoxetine, Nonylphenol (from adults) |
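The table above pools four GEO series into a single multi-class problem. A minimal sketch of holding it as a lookup, assuming (as an illustration only) that each listed compound is one class label; the dictionary name `GEO_COMPOUNDS` and the helper `label_space` are hypothetical. The helper records which series each label came from and refuses a compound listed under two series, which would otherwise silently merge profiles from different experiments.

```python
# Hypothetical in-memory form of the GEO accession table: series -> compounds.
GEO_COMPOUNDS = {
    "GSE43564": [
        "Atrazine", "Acrylonitrile", "Beta-benzene hexachloride", "Bifenthrin",
        "Bis(2-ethylhexyl) phthalate", "Chlorpyrifos", "Chloroform", "Diazinon",
        "Dichlorobenzene", "Lambda-cyhalothrin", "Parathion", "Phenol",
        "Permethrin", "Toluene", "Trichloroethylene", "2-Chloroethyl vinyl ether",
    ],
    "GSE55132": ["Tris(2-butoxyethyl) phosphate (TBEP)"],
    "GSE43960": ["2,4,6-Trinitrotoluene (TNT)"],
    "GSE45053": ["Acetone", "Fluvoxamine", "Fluoxetine", "Nonylphenol"],
}


def label_space(mapping):
    """Map each compound (assumed class label) to the GEO series it comes from."""
    source = {}
    for accession, compounds in mapping.items():
        for compound in compounds:
            if compound in source:
                # The same label in two series would mix unrelated experiments.
                raise ValueError(f"{compound} listed in {source[compound]} and {accession}")
            source[compound] = accession
    return source


labels = label_space(GEO_COMPOUNDS)
print(len(labels))  # 22 compound entries across the four series
```

The check matters because profiles from different series were generated in separate experiments, so a shared label would blend experimental batches into one class.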

Algorithm Type | Feature-Ranking Algorithm | Abbreviation |
---|---|---|
Artificial neural network | Learning Vector Quantization | LVQ |
Ensemble | Random Forest | RF |
Nonlinear | Support Vector Machines with a Linear Kernel | SVML |
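Of the three feature-ranking algorithms above, LVQ is the least familiar. The sketch below shows only the LVQ1 training rule (the winning prototype is pulled toward a sample of its own class and pushed away from a sample of another class). It assumes one prototype per class initialised at that class's first sample, a linearly decaying learning rate, and hypothetical toy data; it does not reproduce how a gene ranking was derived from the fitted model in this study.

```python
import math
import random


def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    best, best_d = 0, math.inf
    for i, (w, _) in enumerate(prototypes):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = i, d
    return best


def train_lvq1(X, y, epochs=20, lr=0.1, seed=0):
    """LVQ1 with one prototype per class (assumed initialisation: first sample of each class)."""
    prototypes, seen = [], set()
    for xi, yi in zip(X, y):
        if yi not in seen:
            prototypes.append((list(xi), yi))  # copy so training data are not mutated
            seen.add(yi)
    rng = random.Random(seed)
    order = list(range(len(X)))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            w, label = prototypes[nearest(prototypes, X[i])]
            # Attract the winner if its class matches the sample, repel it otherwise.
            sign = 1.0 if label == y[i] else -1.0
            for k in range(len(w)):
                w[k] += sign * rate * (X[i][k] - w[k])
    return prototypes


def predict(prototypes, x):
    """Class label of the nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]


# Hypothetical 3-gene expression profiles for two classes.
X = [[0.1, 0.2, 1.0], [0.0, 0.1, 0.9], [0.2, 0.0, 1.1],
     [1.0, 0.9, 0.1], [0.9, 1.1, 0.0], [1.1, 1.0, 0.2]]
y = ["control"] * 3 + ["exposed"] * 3
prototypes = train_lvq1(X, y)
print(predict(prototypes, [0.05, 0.1, 1.0]))  # control
print(predict(prototypes, [1.0, 1.0, 0.1]))   # exposed
```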

Algorithm Type | Classification Algorithm | Abbreviation |
---|---|---|
Linear | Linear Discriminant Analysis | LDA |
Nonlinear | Classification and Regression Trees | CART |
Nonlinear | K-Nearest Neighbors | KNN |
Nonlinear | Support Vector Machines with a Linear Kernel | SVML |
Ensemble | Random Forest | RF |
Ensemble | Boosted C5.0 | C5.0 |
Ensemble | Gradient Boosting Machine | GBM |
Ensemble | eXtreme Gradient Boosting with tree booster | xgbTree |
Ensemble | eXtreme Gradient Boosting with DART booster | xgbDART |
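Pairing a feature-ranking method with a classifier amounts to training the classifier on the top-ranked genes only. A minimal sketch of that two-step pattern with the K-nearest neighbors classifier from the table above: keep the highest-ranked columns, then take a majority vote among the closest training profiles. The gene ranking, expression values, and labels (`compound_A`, `compound_B`) are hypothetical, and plain Euclidean distance with an unweighted vote is an assumption.

```python
import math
from collections import Counter


def select_top_features(rows, ranking, n_features):
    """Keep only the columns listed first in `ranking` (most important first)."""
    keep = ranking[:n_features]
    return [[row[j] for j in keep] for row in rows]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training profiles closest to x (Euclidean distance)."""
    neighbours = sorted(zip((math.dist(row, x) for row in X_train), y_train))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical profiles of 4 genes; a hypothetical ranking puts genes 2 and 0 first.
X_train = [[5.0, 0.3, 9.0, 0.1], [5.2, 0.9, 8.8, 0.7], [4.9, 0.1, 9.1, 0.5],
           [1.0, 0.4, 2.0, 0.2], [1.2, 0.8, 2.2, 0.6], [0.9, 0.2, 1.9, 0.4]]
y_train = ["compound_A"] * 3 + ["compound_B"] * 3
ranking = [2, 0, 3, 1]

X_sel = select_top_features(X_train, ranking, n_features=2)
query = select_top_features([[5.1, 0.5, 9.0, 0.3]], ranking, n_features=2)[0]
print(knn_predict(X_sel, y_train, query))  # compound_A
```

Because the same reduced columns must be applied to any new profile before classification, the query is passed through `select_top_features` with the ranking learned on the training data rather than re-ranked on its own.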
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Choi, T.-J.; An, H.-E.; Kim, C.-B. Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles. Life 2022, 12, 1443. https://doi.org/10.3390/life12091443
Choi T-J, An H-E, Kim C-B. Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles. Life. 2022; 12(9):1443. https://doi.org/10.3390/life12091443
Chicago/Turabian Style: Choi, Tae-June, Hyung-Eun An, and Chang-Bae Kim. 2022. "Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles" Life 12, no. 9: 1443. https://doi.org/10.3390/life12091443
APA Style: Choi, T.-J., An, H.-E., & Kim, C.-B. (2022). Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles. Life, 12(9), 1443. https://doi.org/10.3390/life12091443