MPPIF-Net: Identification of Plasmodium Falciparum Parasite Mitochondrial Proteins Using Deep Features with Multilayer Bi-directional LSTM
Abstract
1. Introduction
1.1. Machine Learning Approach towards Mitochondria Proteins Identification
1.2. Deep Learning Approach towards Mitochondria Proteins Identification
- Given the lack of an effective vaccine, the rise of drug-resistant Plasmodium parasites, and the lethality of malaria, we propose MPPIF-Net, a novel sequence-based framework that discriminates Plasmodium mitochondrial proteins from non-mitochondrial proteins. The proposed model may support the development of vaccines against malaria parasites.
- As sequencing technology advances, the number of proteins deposited in protein databanks grows rapidly. The methods reviewed above rely on conventional machine learning and computational techniques, which capture the contextual features of biological sequence patterns poorly and therefore yield non-representative classifiers. In this study, we pursue a deep learning approach that extracts contextual features with a CNN and applies a sequence learning mechanism, a multilayer bi-directional LSTM (MBD-LSTM), to classify proteins.
- Because no large benchmark dataset of Plasmodium mitochondrial proteins is available, we prepared a new dataset from UniProt that contains both mitochondrial and non-mitochondrial proteins. The sequences are passed through the CD-HIT software to detect and remove similar and short sequences, yielding a preprocessed dataset suitable for training.
- To validate the adaptability of the proposed model, we also ran extensive experiments on a benchmark dataset built from mitochondrial proteins of another organism. The model achieved convincing accuracy on this dataset, indicating that it applies not only to mitochondrial proteins of Plasmodium but also to mitochondrial proteins of other species.
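The preprocessing idea in the contributions above (dropping short sequences and removing highly similar ones) can be illustrated with a toy sketch. This is not CD-HIT and does not reproduce its clustering algorithm; it is a simple greedy filter built on Python's `difflib`, and the function name `filter_sequences` and the thresholds `min_len` and `max_identity` are hypothetical values chosen for illustration, not settings taken from the paper.

```python
from difflib import SequenceMatcher


def filter_sequences(seqs, min_len=50, max_identity=0.9):
    """Drop short sequences, then greedily drop near-duplicates.

    A sequence is kept only if its similarity ratio to every
    previously kept sequence is below max_identity.
    NOTE: toy stand-in for redundancy removal, not CD-HIT itself.
    """
    kept = []
    # Longest first, so each group of similar sequences is
    # represented by its longest member.
    for seq in sorted(seqs, key=len, reverse=True):
        if len(seq) < min_len:
            continue  # too short to keep
        is_novel = all(
            SequenceMatcher(None, seq, k, autojunk=False).ratio() < max_identity
            for k in kept
        )
        if is_novel:
            kept.append(seq)
    return kept
```

The pairwise comparison is quadratic in the number of sequences, so this sketch only suits small examples; dedicated tools such as CD-HIT exist precisely because real datasets need faster clustering.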
2. Proposed Methodology
2.1. Raw Data Acquisition and Preprocessing
2.2. Encoding Protein Sequences
2.3. Embedding Layer
2.4. Convolution Layer
2.5. MBD-LSTM Layer
3. Results and Discussion
3.1. Datasets
3.2. Experimental Setup
3.3. Evaluation Metrics
3.4. Ablation Study on PF2095
3.5. Experimental Evaluation on PF175
3.6. Experimental Evaluation on MPD
3.7. Comparative Analysis of the MPPIF-Net with Other Models on PF175
3.8. Comparative Analysis of the MPPIF-Net with Other Models on MPD
4. Conclusions and Future Directions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Amino Acid | Letter | Code |
---|---|---|
Alanine | A | 1 |
Cysteine | C | 2 |
Aspartic acid | D | 3 |
Glutamic acid | E | 4 |
Phenylalanine | F | 5 |
Glycine | G | 6 |
Histidine | H | 7 |
Isoleucine | I | 8 |
Lysine | K | 9 |
Leucine | L | 10 |
Methionine | M | 11 |
Asparagine | N | 12 |
Proline | P | 13 |
Glutamine | Q | 14 |
Arginine | R | 15 |
Serine | S | 16 |
Threonine | T | 17 |
Valine | V | 18 |
Tryptophan | W | 19 |
Tyrosine | Y | 20 |
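The integer encoding in this table can be sketched as a small lookup, using the standard one-letter codes (glycine is G) in alphabetical order so that A maps to 1 and Y maps to 20. The padding scheme is an assumption: the table does not say how unknown residues or variable sequence lengths are handled, so this sketch reserves 0 for both, and the names `encode_sequence`, `PAD_CODE`, and `max_len` are hypothetical.

```python
# The 20 standard amino acids in the alphabetical one-letter order
# used by the encoding table (A=1 ... Y=20).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS, start=1)}

# Assumption: 0 is reserved for padding and for unknown residues.
PAD_CODE = 0


def encode_sequence(seq, max_len):
    """Map a protein string to a fixed-length list of integer codes."""
    codes = [AA_TO_CODE.get(aa, PAD_CODE) for aa in seq.upper()]
    codes = codes[:max_len]  # truncate sequences longer than max_len
    # Right-pad shorter sequences with PAD_CODE.
    return codes + [PAD_CODE] * (max_len - len(codes))
```

For example, `encode_sequence("ACG", 5)` yields `[1, 2, 6, 0, 0]`. A fixed-length integer vector of this kind is the usual input to an embedding layer, which is presumably why the table assigns each residue a distinct index.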
Dataset | Positive Sample | Negative Sample |
---|---|---|
PF175 | 40 | 135 |
PF2095 | 890 | 1205 |
MPD | 499 | 250 |
Dataset | Model | Training Accuracy (%) | Testing Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|
PF2095 | CNN-GRU | 89.7 | 88.0 | 90.4 | 88.9 |
PF2095 | CNN-LSTM | 93.5 | 91.2 | 90.6 | 91.7 |
PF2095 | CNN-MBD-LSTM (Proposed) | 98.2 | 97.6 | 98.1 | 97.2 |
PF175 | CNN-MBD-LSTM (Proposed) | 100 | 97.1 | 100 | 96.2 |
MPD | CNN-MBD-LSTM (Proposed) | 99.7 | 99.5 | 99.3 | 100 |
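For readers unfamiliar with the columns of this table, the sketch below shows the standard definitions of accuracy, sensitivity, and specificity for a binary classifier, computed from confusion-matrix counts. The function name `binary_metrics` is hypothetical and the counts in the usage note are made up for illustration; they are not the confusion matrices behind the reported results. The sketch assumes each class has at least one sample, otherwise a division by zero occurs.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) as percentages.

    tp/fn count positive (mitochondrial) samples predicted
    correctly/incorrectly; tn/fp do the same for negatives.
    """
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total    # share of all correct predictions
    sensitivity = 100.0 * tp / (tp + fn)    # true positive rate
    specificity = 100.0 * tn / (tn + fp)    # true negative rate
    return accuracy, sensitivity, specificity
```

With illustrative counts `tp=45, fn=5, tn=40, fp=10`, the function returns an accuracy of 85.0, a sensitivity of 90.0, and a specificity of 80.0.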
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Khan, S.U.; Baik, R. MPPIF-Net: Identification of Plasmodium Falciparum Parasite Mitochondrial Proteins Using Deep Features with Multilayer Bi-directional LSTM. Processes 2020, 8, 725. https://doi.org/10.3390/pr8060725