A Machine Learning Approach to Simulate Gene Expression and Infer Gene Regulatory Networks †
Abstract
1. Introduction
1.1. Related Work
1.2. Contributions
1.3. Outline of the Present Work
2. Background
2.1. Gene Regulatory Network
2.2. Inferring a Gene Regulatory Network
- Observation: The first step is to observe how the gene expression of a group of genes responds to external perturbations in a real organism. This can be done using various strategies, such as microarray technology [29]. The expression level of each gene is recorded over time to create a time-series dataset for the genes under observation. Typically, such a dataset is represented as an N × M matrix, where N is the number of genes and M is the number of observations for each gene over time.
- Modeling: The observations are then used to build a computational model that captures how the expression of each gene depends on the expression of the others over time.
- Inference: The model created in the previous phase is used to make predictions about the relationships between genes in order to discover regulatory genes. This information can then be used to draw a complex network, i.e., a gene regulatory network, showing these relationships.
- Validation: Finally, to validate the accuracy of a predicted gene regulatory network, it is essential to compare it with the target network. However, this comparison can only be performed on artificial datasets, where the gene regulatory network is known beforehand. For a real organism, the true gene regulatory network is not available, and the predicted network must therefore be validated empirically, in the field.
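For artificial datasets, where the target network is known, the validation phase can be sketched as below. The precision/recall scoring over directed links is an illustrative assumption, not necessarily the evaluation scheme used later in Section 4.2.4; the gene names are placeholders borrowed from the SOS dataset.

```python
# Minimal sketch: comparing a predicted gene regulatory network with a
# known target network, both represented as sets of directed links.

def grn_precision_recall(predicted_edges, target_edges):
    """Each edge is a (regulator, regulated gene) pair."""
    predicted, target = set(predicted_edges), set(target_edges)
    tp = len(predicted & target)                    # correctly predicted links
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(target) if target else 0.0
    return precision, recall

# Toy example: the true network has two links; one is recovered.
target = [("lexA", "uvrD"), ("recA", "lexA")]
predicted = [("lexA", "uvrD"), ("lexA", "recA")]
precision, recall = grn_precision_recall(predicted, target)
```

Note that edge direction matters here: ("lexA", "recA") does not match ("recA", "lexA"), which is why only one of the two predicted links counts as correct.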
2.3. Datasets
3. Modeling
3.1. Agent
3.2. Artificial Environmental Setting
3.3. Simulation
4. Methodology
4.1. Model Estimation
4.1.1. Data Preprocessing
4.1.2. Learning
4.1.3. Evaluation
4.1.4. Optimization Algorithm
Algorithm 1: Pseudo-code to compute the fitness of a possible configuration of an environment.
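The details of Algorithm 1 are not reproduced here. The sketch below shows one plausible form of such a fitness function, under the assumption that fitness is the mean-squared error between the simulated and the observed expression matrices; `toy_simulate` and its decay dynamics are purely hypothetical stand-ins for running the agents forward.

```python
import numpy as np

def fitness(simulate, observed):
    """observed: N x M matrix (N genes, M observations); lower is better."""
    simulated = simulate(observed[:, 0], observed.shape[1])
    return float(np.mean((simulated - observed) ** 2))

def toy_simulate(initial_state, n_steps):
    states = [np.asarray(initial_state, dtype=float)]
    for _ in range(n_steps - 1):
        states.append(states[-1] * 0.9)   # each gene decays by 10% per step
    return np.stack(states, axis=1)       # N x M

observed = toy_simulate(np.array([1.0, 0.5]), 5)
score = fitness(toy_simulate, observed)   # a perfect simulation scores 0
```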
4.2. Inferring a Gene Regulatory Network
4.2.1. Perturb the State Matrix
4.2.2. Regulatory Value and Regulatory Matrix
- The simulation starts from the initial conditions and runs until all the variables reach a steady state within a certain range. We denote the time point at the end of this step as t_0;
- The simulation continues for m more time steps to confirm the stability of the environment, ending at time point t_1 = t_0 + m;
- A perturbation function is applied to the i-th gene for d time steps, ending at time point t_2 = t_1 + d. The perturbation duration d is a hyperparameter, which we set to m;
- The simulation runs for m more time steps until the state matrix stabilizes after the perturbation, ending at time point t_3 = t_2 + m + λ, where λ represents the instability interval caused by the perturbation;
- The changes in the gene expression levels of all genes between t_1 and t_3 are recorded in a matrix;
- For each gene, the regulatory value is computed as the slope, in radians, of a linear regression model fit to the values between t_1 and t_3.
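The steps above can be sketched as follows. The one-step simulator `toy_step`, the knock-out-style perturbation (clamping the perturbed gene to zero), and the recorded window are illustrative assumptions, not the exact protocol of Section 4.2.

```python
import numpy as np

def regulatory_values(step, state, gene, m):
    for _ in range(m):                     # reach/confirm a steady state
        state = step(state)
    history = []
    for _ in range(m):                     # perturb one gene for m steps
        state = step(state)
        state[gene] = 0.0                  # knock-out-style perturbation
        history.append(state.copy())
    for _ in range(m):                     # let the environment re-stabilize
        state = step(state)
        history.append(state.copy())
    trajectory = np.stack(history)         # time x genes
    t = np.arange(trajectory.shape[0])
    slopes = np.polyfit(t, trajectory, 1)[0]   # per-gene linear-fit slope
    return np.arctan(slopes)                   # slope expressed in radians

# Toy two-gene system in which gene 0 activates gene 1.
def toy_step(s):
    s = s.copy()
    s[1] = 0.9 * s[1] + 0.1 * s[0]
    return s

rv = regulatory_values(toy_step, np.array([1.0, 1.0]), gene=0, m=5)
# Knocking out gene 0 makes gene 1 fall, so rv[1] is negative: the sign
# of the regulatory value indicates the direction of the regulation.
```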
4.2.3. Interaction Probability Matrix
4.2.4. Evaluation of a Gene Regulatory Network
5. Results
- We start by showing that our method of creating models capable of predicting gene expression levels over time works well on all datasets, both real and artificial (Section 5.1).
- Then, we show that our strategy of using metaheuristics to select an appropriate neural network architecture for each agent is the key to accurate simulation (Section 5.2).
- Subsequently, we verify that our AES responds to perturbations of specific genes and regulates gene expression in accordance with the behavior observed in the real world (Section 5.3).
- Finally, we present and discuss the results obtained in inferring a gene regulatory network, comparing them with state-of-the-art methodologies (Section 5.4).
5.1. Environment Reliability Analysis
5.2. Optimization Analysis
5.3. Inferring Method Validation
5.4. Gene Regulatory Networks
6. Conclusions and Future Work
6.1. Gene Expression Forecasting
6.2. Gene Regulation Validation
6.3. Artificial vs. Real Datasets
6.4. Computational Cost
6.5. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AES | Artificial Environmental Settings
GRN | Gene Regulatory Network |
ODE | Ordinary Differential Equation |
MLP | Multi-Layer Perceptron |
RNN | Recurrent Neural Network |
CNN | Convolutional Neural Network |
MSE | Mean-Squared Error |
MAE | Mean Absolute Error |
n | Number of genes |
b | Number of observations |
m | Number of steps considered in the state matrix |
k | Number of experiments |
| A generic agent related to the i-th gene
| Agent function
| State of the environment
| State matrix of the environment
| Environment reliability based on Pearson correlation
| Environment reliability based on cosine similarity
| Environment error based on MSE
| Environment error based on MAE
| Perturbation function
| Duration of a perturbation
| Perturbation instability interval
| Maximum gene expression level observed for the i-th gene in the dataset
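The four environment-quality measures listed above (reliability via Pearson correlation and cosine similarity, error via MSE and MAE) can be sketched for a single pair of simulated and observed expression series; how these are aggregated across genes and experiments is not shown here.

```python
import numpy as np

def environment_scores(simulated, observed):
    s = np.asarray(simulated, dtype=float)
    o = np.asarray(observed, dtype=float)
    pearson = float(np.corrcoef(s, o)[0, 1])                       # reliability
    cosine = float(s @ o / (np.linalg.norm(s) * np.linalg.norm(o)))
    mse = float(np.mean((s - o) ** 2))                             # errors
    mae = float(np.mean(np.abs(s - o)))
    return pearson, cosine, mse, mae

# A nearly perfect simulation: high reliability, low error.
pearson, cosine, mse, mae = environment_scores([1.0, 2.0, 3.0],
                                               [1.1, 2.0, 2.9])
```

Reliability measures are scale-insensitive (a simulation with the right shape but wrong magnitude can still score well), whereas MSE and MAE penalize magnitude errors directly, which is why both families of measures appear in the notation.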
Appendix A. Neural Network Hyperparameter Optimization
(a) Hyperparameters of Neural Network Layers | ||
---|---|---|
Layer Type | Hyperparameter | Possible Values |
Fully Connected Layer | Output Size | From 50 to 1000 |
Dropout Layer | Dropout Rate | From 0 to 1 |
Activation Function | Non-Linear Function Name | relu, linear, elu, tanh, gelu, selu
LSTM Layer | Number of LSTM Hidden Units | From 2 to 1024 |
(b) Training Settings | ||
Parameter Name | Description | Possible Values |
InitialLearnRate | It determines the initial learning rate of the neural network during training. | From to |
MaxEpochs | It determines the maximum number of epochs (iterations) that the neural network will be trained for. | From 50 to 500 |
Shuffle | It determines whether the training data are shuffled before each epoch. | Enable, disable |
Optimizer | It determines the optimization algorithm used to update the weights of the neural network during training. | Stochastic Gradient Descent (SGD), Adam
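One configuration from the training search space in part (b) of the table can be drawn as in the sketch below. The learning-rate bounds (1e-4 to 1e-1) are an assumption, since the original range was lost from the extracted text; the other ranges follow the table.

```python
import random

def sample_training_settings(rng):
    return {
        "InitialLearnRate": 10 ** rng.uniform(-4, -1),   # assumed bounds
        "MaxEpochs": rng.randint(50, 500),
        "Shuffle": rng.choice(["enable", "disable"]),
        "Optimizer": rng.choice(["sgd", "adam"]),
    }

settings = sample_training_settings(random.Random(0))
```

Sampling the learning rate on a log scale (uniform in the exponent) is a common choice for this kind of search, since learning rates spanning several orders of magnitude are rarely explored well by a linear-scale draw.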
References
- Gout, J.F.; Kahn, D.; Duret, L.; Paramecium Post-Genomics Consortium. The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution. PLoS Genet. 2010, 6, e1000944.
- Karlebach, G.; Shamir, R. Modeling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 2008, 9, 770–780.
- Shu, H.; Zhou, J.; Lian, Q.; Li, H.; Zhao, D.; Zeng, J.; Ma, J. Modeling gene regulatory networks using neural network architectures. Nat. Comput. Sci. 2021, 1, 491–501.
- Aubin-Frankowski, P.C.; Vert, J.P. Gene regulation inference from single-cell RNA-seq data with linear differential equations and velocity inference. Bioinformatics 2020, 36, 4774–4780.
- Pratapa, A.; Jalihal, A.P.; Law, J.N.; Bharadwaj, A.; Murali, T.M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 2020, 17, 147–154.
- Zito, F.; Cutello, V.; Pavone, M. A Novel Reverse Engineering Approach for Gene Regulatory Networks. In Complex Networks and Their Applications XI; Cherifi, H., Mantegna, R.N., Rocha, L.M., Cherifi, C., Miccichè, S., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 310–321.
- Schaffter, T.; Marbach, D.; Floreano, D. GeneNetWeaver: In silico benchmark generation and performance profiling of network inference methods. Bioinformatics 2011, 27, 2263–2270.
- Raza, K.; Alam, M. Recurrent neural network based hybrid model for reconstructing gene regulatory network. Comput. Biol. Chem. 2016, 64, 322–334.
- Schwab, J.D.; Kühlwein, S.D.; Ikonomi, N.; Kühl, M.; Kestler, H.A. Concepts in Boolean network modeling: What do they all mean? Comput. Struct. Biotechnol. J. 2020, 18, 571–582.
- Delgado, F.M.; Gómez-Vela, F. Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif. Intell. Med. 2019, 95, 133–145.
- Zhao, M.; He, W.; Tang, J.; Zou, Q.; Guo, F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Briefings Bioinform. 2021, 22, bbab009.
- Pirooznia, M.; Yang, J.Y.; Yang, M.Q.; Deng, Y. A comparative study of different machine learning methods on microarray gene expression data. BMC Genom. 2008, 9 (Suppl. S1), S13.
- Cao, J.; Qi, X.; Zhao, H. Modeling Gene Regulation Networks Using Ordinary Differential Equations. In Next Generation Microarray Bioinformatics: Methods and Protocols; Wang, J., Tan, A.C., Tian, T., Eds.; Humana Press: Totowa, NJ, USA, 2012; pp. 185–197.
- Agostini, D.; Costanza, J.; Cutello, V.; Zammataro, L.; Krasnogor, N.; Pavone, M.; Nicosia, G. Effective calibration of artificial gene regulatory networks. In Proceedings of the 2011 11th European Conference on Artificial Life (ECAL), Paris, France, 8–12 August 2011; p. 11.
- Yang, B.; Bao, W.; Zhang, W.; Wang, H.; Song, C.; Chen, Y.; Jiang, X. Reverse engineering gene regulatory network based on complex-valued ordinary differential equation model. BMC Bioinform. 2021, 22, 448.
- Huynh-Thu, V.A.; Irrthum, A.; Wehenkel, L.; Geurts, P. Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLoS ONE 2010, 5, e12776.
- Huynh-Thu, V.A.; Geurts, P. dynGENIE3: Dynamical GENIE3 for the inference of gene networks from time-series expression data. Sci. Rep. 2018, 8, 3384.
- Åkesson, J.; Lubovac-Pilav, Z.; Magnusson, R.; Gustafsson, M. ComHub: Community predictions of hubs in gene regulatory networks. BMC Bioinform. 2021, 22, 58.
- Hartemink, A.J. Reverse engineering gene regulatory networks. Nat. Biotechnol. 2005, 23, 554–555.
- Emmert-Streib, F.; Dehmer, M.; Haibe-Kains, B. Gene regulatory networks and their applications: Understanding biological and medical problems in terms of networks. Front. Cell Dev. Biol. 2014, 2, 38.
- Emerson, J.J.; Li, W.H. The genetic basis of evolutionary change in gene expression levels. Philos. Trans. R. Soc. B Biol. Sci. 2010, 365, 2581–2590.
- Davidson, E.; Levin, M. Gene regulatory networks. Proc. Natl. Acad. Sci. USA 2005, 102, 4935.
- Glubb, D.M.; Innocenti, F. Mechanisms of genetic regulation in gene expression: Examples from drug metabolizing enzymes and transporters. WIREs Syst. Biol. Med. 2011, 3, 299–313.
- Huynh-Thu, V.A.; Sanguinetti, G. Gene Regulatory Network Inference: An Introductory Survey. In Gene Regulatory Networks: Methods and Protocols; Sanguinetti, G., Huynh-Thu, V.A., Eds.; Springer: New York, NY, USA, 2019; pp. 1–23.
- Zhang, Z.; Lei, A.; Xu, L.; Chen, L.; Chen, Y.; Zhang, X.; Gao, Y.; Yang, X.; Zhang, M.; Cao, Y. Similarity in gene-regulatory networks suggests that cancer cells share characteristics of embryonic neural cells. J. Biol. Chem. 2017, 292, 12842–12859.
- Vijesh, N.; Chakrabarti, S.K.; Sreekumar, J. Modeling of gene regulatory networks: A review. J. Biomed. Sci. Eng. 2013, 6, 9.
- Hecker, M.; Lambeck, S.; Toepfer, S.; van Someren, E.; Guthke, R. Gene regulatory network inference: Data integration in dynamic models—A review. Biosystems 2009, 96, 86–103.
- Wang, Y.R.; Huang, H. Review on statistical methods for gene network reconstruction using expression data. J. Theor. Biol. 2014, 362, 53–61.
- Müller, U.R.; Nicolau, D.V. Microarray Technology and Its Applications; Springer: Berlin/Heidelberg, Germany, 2005.
- Gebert, J.; Radde, N.; Weber, G.W. Modeling gene regulatory networks with piecewise linear differential equations. Eur. J. Oper. Res. 2007, 181, 1148–1165.
- Al-Ghamdi, A.B.; Kamel, S.; Khayyat, M. Evaluation of Artificial Neural Networks Performance Using Various Normalization Methods for Water Demand Forecasting. In Proceedings of the 2021 National Computing Colleges Conference (NCCC), Taif, Saudi Arabia, 27–28 March 2021; pp. 1–6.
- Zito, F.; Cutello, V.; Pavone, M. Optimizing Multi-Variable Time Series Forecasting using Metaheuristics. In Proceedings of the 2022 14th Metaheuristics International Conference (MIC), Syracuse, Italy, 11–14 July 2022; Di Gaspero, L., Festa, P., Nakib, A., Pavone, M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 13838.
- Zito, F.; Cutello, V.; Pavone, M. Deep Learning and Metaheuristic for Multivariate Time-Series Forecasting. In Proceedings of the 2023 18th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO), Salamanca, Spain, 5–7 September 2023; Bringas, P.G., Pérez García, H., Martínez de Pisón, F.J., Martínez Álvarez, F., Troncoso Lora, A., Herrero, Á., Calvo Rolle, J.L., Quintián, H., Corchado, E., Eds.; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2023; Volume 749.
- Lee, S.; Kim, J.; Kang, H.; Kang, D.Y.; Park, J. Genetic Algorithm Based Deep Learning Neural Network Structure and Hyperparameter Optimization. Appl. Sci. 2021, 11, 744.
- Thompson, P.A. An MSE statistic for comparing forecast accuracy across series. Int. J. Forecast. 1990, 6, 219–227.
- Schober, P.; Boer, C.; Schwarte, L.A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analg. 2018, 126, 1763–1768.
- Deep, K.; Singh, K.P.; Kansal, M.; Mohan, C. A real coded genetic algorithm for solving integer and mixed integer optimization problems. Appl. Math. Comput. 2009, 212, 505–518.
- Cutello, V.; Pavone, M.; Zito, F. Inferring a Gene Regulatory Network from Gene Expression Data. An Overview of Best Methods and a Reverse Engineering Approach. In Computational Logic to Computational Biology; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 14070.
- Ronen, M.; Rosenberg, R.; Shraiman, B.I.; Alon, U. Assigning numbers to the arrows: Parameterizing a gene regulation network by using accurate expression kinetics. Proc. Natl. Acad. Sci. USA 2002, 99, 10555–10560.
- Kamenšek, S.; Podlesek, Z.; Gillor, O.; Žgur-Bertok, D. Genes regulated by the Escherichia coli SOS repressor LexA exhibit heterogenous expression. BMC Microbiol. 2010, 10, 283.
- Podlesek, Z.; Bertok, D.Ž. The DNA Damage Inducible SOS Response Is a Key Player in the Generation of Bacterial Persister Cells and Population Wide Tolerance. Front. Microbiol. 2020, 11, 1785.
Name | Description | Source |
---|---|---|
DREAM3 | Artificial dataset generated by GeneNetWeaver. It contains experiments with 10 and 50 genes. Each experiment has 21 observations. | [7] |
DREAM4 | Artificial dataset generated by GeneNetWeaver. It contains experiments with 10 and 100 genes. Each experiment has 21 observations. | [11] |
SOS DNA Repair | Real dataset obtained by observing how the gene expression of 8 genes evolves over time. | [8] |
Neural Networks | Layers |
---|---|
Fully Connected Neural Network (FCNN) | • Input Layer (Input Size ) • Fully Connected Layer * • Activation Function * • Dropout Layer * • Fully Connected Layer * • Activation Function * • Fully Connected Layer * • Activation Function * • Fully Connected Layer (Output Size 1) • Output Layer |
Convolutional Neural Network (CNN) | • Input Layer (Input Size ) • Convolutional Layer 1D (Filter 64, Kernel 2) • Activation Function * • Max Pooling 1D (Pool Size 2) • Fully Connected Layer * • Activation Function * • Dropout Layer * • Fully Connected Layer * • Activation Function * • Fully Connected Layer (Output Size 1) • Output Layer |
Recurrent Neural Network (RNN) | • Input Layer (Input Size ) • Long-Short-Term-Memory Units * • Activation Function * • Fully Connected Layer * • Activation Function * • Dropout Layer * • Fully Connected Layer * • Activation Function * • Fully Connected Layer (Output Size 1) • Output Layer |
ID | Name | Genes | (%) | (%) | ||
---|---|---|---|---|---|---|
1 | DREAM3_Ecoli_size10_1 | 10 | ||||
2 | DREAM3_Ecoli_size10_2 | 10 | ||||
3 | DREAM3_Ecoli_size10_3 | 10 | ||||
4 | DREAM3_Ecoli_size50_1 | 50 | ||||
5 | DREAM3_Ecoli_size50_2 | 50 | ||||
6 | DREAM3_Ecoli_size50_3 | 50 | ||||
7 | DREAM4_insilico_size10_1 | 10 | ||||
8 | DREAM4_insilico_size10_2 | 10 | ||||
9 | DREAM4_insilico_size10_3 | 10 | ||||
10 | DREAM4_insilico_size10_4 | 10 | ||||
11 | DREAM4_insilico_size10_5 | 10 | ||||
12 | DREAM4_insilico_size100_1 | 100 | ||||
13 | DREAM4_insilico_size100_2 | 100 | ||||
14 | DREAM4_insilico_size100_3 | 100 | ||||
15 | DREAM4_insilico_size100_4 | 100 | ||||
16 | DREAM4_insilico_size100_5 | 100 | ||||
17 | SOS DNA Repair | 8 |
Agent Type | Hyperparameter Optimization | (%) | (%) | ||
---|---|---|---|---|---|
FCNN | Yes | ||||
FCNN | No | ||||
CNN | Yes | ||||
CNN | No | ||||
RNN | Yes | ||||
RNN | No | ||||
Mixed | Yes |
Gene Names | uvrD | lexA | umuDC | recA | uvrA | uvrY | ruvA | polB |
---|---|---|---|---|---|---|---|---|
Agent Types | RNN | RNN | CNN | RNN | CNN | CNN | RNN | RNN |
uvrD | lexA | umuDC | recA | uvrA | polB | |
---|---|---|---|---|---|---|
uvrD | ||||||
lexA | ||||||
umuDC | ||||||
recA | ||||||
uvrA | ||||||
polB |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zito, F.; Cutello, V.; Pavone, M. A Machine Learning Approach to Simulate Gene Expression and Infer Gene Regulatory Networks. Entropy 2023, 25, 1214. https://doi.org/10.3390/e25081214