Role of Optimization in RNA–Protein-Binding Prediction
Abstract
1. Introduction
2. Materials and Methods
2.1. Model Architecture
2.2. Preprocessing
3. Model Optimization
3.1. Grid Search
3.2. Random Search
- Define a hyperparameter search space;
- Specify the number of samples;
- Randomly select a combination of hyperparameters from the predefined search space;
- Train and evaluate the model;
- Select the best configuration.
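The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the search space, its parameter names and ranges, and the `train_and_evaluate` function are all hypothetical stand-ins, and the toy scoring function merely replaces the training and validation of the actual model.

```python
import random

# Step 1: hypothetical search space (names and values are illustrative assumptions).
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "num_filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7, 9],
    "dropout": [0.1, 0.25, 0.5],
}

def train_and_evaluate(config):
    """Toy stand-in for training a model and returning a validation score."""
    score = 1.0
    score -= abs(config["dropout"] - 0.25)
    score -= abs(config["kernel_size"] - 7) * 0.02
    score -= abs(config["num_filters"] - 64) / 1000
    score -= abs(config["learning_rate"] - 1e-3) * 10
    return score

def random_search(space, n_samples, evaluate, seed=0):
    """Sample n_samples random configurations and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_config, best_score = None, float("-inf")
    for _ in range(n_samples):  # Step 2: the sampling budget
        # Step 3: draw one value per hyperparameter, independently and uniformly.
        config = {name: rng.choice(values) for name, values in space.items()}
        # Step 4: train and evaluate the model for this configuration.
        score = evaluate(config)
        # Step 5: remember the best configuration seen so far.
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(SEARCH_SPACE, 20, train_and_evaluate)
print(best_config, round(best_score, 4))
```

Unlike grid search, the cost here is set by the number of samples rather than by the size of the search space, so the budget can be chosen independently of how many hyperparameters are tuned.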
3.3. Bayesian Optimization
- Build a surrogate probability model of the objective function, often a Gaussian process (GP);
- Find the hyperparameters that perform best on the surrogate;
- Apply these hyperparameters to the true objective function (an acquisition function determines which point to evaluate next);
- Update the surrogate model with the new result once the objective function has been evaluated;
- Repeat the previous three steps until the maximum number of iterations or the time limit is reached.
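The loop above can be illustrated with a small, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a single hyperparameter (the base-10 logarithm of a learning rate), a toy objective in place of model training, an RBF kernel with a fixed length scale, and an upper-confidence-bound (UCB) acquisition function. A production setup would normally rely on an established optimization library rather than a hand-written GP.

```python
import math

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, jitter=1e-4):
    """Build (or rebuild) the GP surrogate from all observations so far."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [y - y_mean for y in ys]))
    return L, alpha, y_mean

def predict(x_new, xs, gp):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    L, alpha, y_mean = gp
    k_star = [rbf(x_new, a) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 0.0)  # prior variance of the RBF kernel is 1
    return mean, math.sqrt(var)

def bayesian_optimization(objective, candidates, init_points, n_iter, kappa=2.0):
    xs = list(init_points)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        gp = fit_gp(xs, ys)  # build/update the surrogate

        def ucb(c):  # acquisition: favor high predicted score or high uncertainty
            mean, std = predict(c, xs, gp)
            return mean + kappa * std

        remaining = [c for c in candidates if c not in xs]
        x_next = max(remaining, key=ucb)  # best point according to the surrogate
        xs.append(x_next)
        ys.append(objective(x_next))  # evaluate the true objective there
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

def toy_objective(log_lr):
    """Hypothetical validation score that peaks at log10(learning rate) = -3."""
    return 1.0 - 0.25 * (log_lr + 3.0) ** 2

candidates = [-5.0 + 0.1 * i for i in range(41)]  # log10(lr) from -5 to -1
init_points = [candidates[5], candidates[30]]
best_x, best_y, evaluated = bayesian_optimization(
    toy_objective, candidates, init_points, n_iter=12)
print(f"best log10(learning rate) = {best_x:.1f}, score = {best_y:.3f}")
```

The `kappa` parameter controls the trade-off between exploiting regions the surrogate already rates highly and exploring regions where it is still uncertain; other acquisition functions, such as expected improvement, could be substituted without changing the structure of the loop.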
4. Optimized RNA–Protein-Binding CNN Prediction Model
5. Results
5.1. Experimental Setup
5.2. Empirical Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
# | RBP | Optimized CNN Model |
---|---|---|
1 | ALKBH5 | 66.8 |
2 | C17ORF85 | 75.2 |
3 | C22ORF28 | 79.8 |
4 | CAPRIN1 | 76.8 |
5 | Ago2 | 77.6 |
6 | ELAVL1H | 91.26 |
7 | SFRS1 | 88.42 |
8 | HNRNPC | 92.68 |
9 | TDP43 | 90.25 |
10 | TIA1 | 84.89 |
11 | TIAL1 | 84.83 |
12 | Ago1-4 | 85.56 |
13 | ELAVL1B | 93.78 |
14 | ELAVL1A | 93.23 |
15 | EWSR1 | 88.4 |
16 | FUS | 93.2 |
17 | ELAVL1C | 94.42 |
18 | IGF2BP1-3 | 78.24 |
19 | MOV10 | 82.83 |
20 | PUM2 | 88.32 |
21 | QKI | 83.82 |
22 | TAF15 | 88.40 |
23 | PTB | 89.76 |
24 | ZC3H7B | 78.92 |
Mean | | 85.30 |
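As a quick arithmetic check, the mean can be recomputed from the 24 values listed in the table above. The snippet below only re-uses those tabulated numbers; the variable names are illustrative. The result comes out near 85.31, which agrees with the reported mean of 85.30 up to rounding or truncation.

```python
# Scores copied from the table above, in row order (rows 1 to 24).
scores = [
    66.8, 75.2, 79.8, 76.8, 77.6, 91.26, 88.42, 92.68,
    90.25, 84.89, 84.83, 85.56, 93.78, 93.23, 88.4, 93.2,
    94.42, 78.24, 82.83, 88.32, 83.82, 88.40, 89.76, 78.92,
]

mean_score = sum(scores) / len(scores)
print(f"{len(scores)} RBPs, mean = {mean_score:.2f}")
```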
Dataset | CNN Model (No Optimizer) | CNN + Grid Search | CNN + Bayesian Optimizer | CNN + Random Optimization |
---|---|---|---|---|
HNRNPC | 84.9 | 68.4 | 90.28 | 92.68 |
C22ORF28 | 76.16 | 69.5 | 77.57 | 79.8 |
ELAVL1A | 88.931 | 71.1 | 88.97 | 93.23 |
Ago2 | 54.92 | 71.2 | 70.14 | 77.62 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alsenan, S.; Al-Turaiki, I.; Aldayel, M.; Tounsi, M. Role of Optimization in RNA–Protein-Binding Prediction. Curr. Issues Mol. Biol. 2024, 46, 1360–1373. https://doi.org/10.3390/cimb46020087