A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data
Abstract
1. Introduction
2. Recent Reviews of Artificial Intelligence Application in Healthcare
3. Application of Artificial Intelligence in Modern Healthcare
4. Cancer Classification with Machine Learning Method
4.1. Supervised Learning (SL)
4.2. Hybrid of Supervised and Unsupervised Learning (UL)
5. Recent Deep Learning Methods in Cancer Research
6. Healthcare Dataset for Cancer Classification
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Deepashri, K.S.; Kamath, A. Survey on Techniques of Data Mining and its Applications. Int. J. Emerg. Res. Manag. Technol. 2017, 6, 198–201.
2. Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17.
3. Singh, A.; Thakur, N.; Sharma, A. A review of supervised machine learning algorithms. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 16–18 March 2016; pp. 1310–1315.
4. Jiang, F.; Jiang, Y.; Zhi, H.; Dong, Y.; Li, H.; Ma, S.; Wang, Y.; Dong, Q.; Shen, H.; Wang, Y. Artificial intelligence in healthcare: Past, present and future. Stroke Vasc. Neurol. 2017, 2, 230–243.
5. Alloghani, M.; Al-Jumeily, D.; Aljaaf, A.J.; Khalaf, M.; Mustafina, J.; Tan, S.Y. The Application of Artificial Intelligence Technology in Healthcare: A Systematic Review. In International Conference on Applied Computing to Support Industry: Innovation and Technology; Springer: Cham, Switzerland, 2020.
6. Murali, N.; Sivakumaran, N. Review Article Artificial Intelligence in Healthcare—A Review. Int. J. Modern Comput. Inf. Commun. Technol. 2018, 1, 103–110.
7. Libbrecht, M.W.; Noble, W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 2015, 16, 321–332.
8. Chicco, D. Ten quick tips for machine learning in computational biology. BioData Min. 2017, 10, 1–17.
9. Goldenberg, S.L.; Nir, G.; Salcudean, S.E. A new era: Artificial intelligence and machine learning in prostate cancer. Nat. Rev. Urol. 2019, 16, 391–403.
10. Petegrosso, R.; Li, Z.; Kuang, R. Machine learning and statistical methods for clustering single-cell RNA-sequencing data. Brief. Bioinform. 2019, 21, 1209–1223.
11. Qi, R.; Ma, A.; Ma, Q.; Zou, Q. Clustering and classification methods for single-cell RNA-sequencing data. Brief. Bioinform. 2019, 21, 1196–1208.
12. Arora, I.; Tollefsbol, T.O. Computational methods and next-generation sequencing approaches to analyze epigenetics data: Profiling of methods and applications. Methods 2021, 187, 92–103.
13. Zielinski, J.M.; Luke, J.J.; Guglietta, S.; Krieg, C. High Throughput Multi-Omics Approaches for Clinical Trial Evaluation and Drug Discovery. Front. Immunol. 2021, 12, 1–10.
14. Koteluk, O.; Wartecki, A.; Mazurek, S.; Kołodziejczak, I.; Mackiewicz, A. How do machines learn? Artificial intelligence as a new era in medicine. J. Pers. Med. 2021, 11, 32.
15. Avanzo, M.; Trianni, A.; Botta, F.; Talamonti, C.; Stasi, M.; Iori, M. Artificial intelligence and the medical physicist: Welcome to the machine. Appl. Sci. 2021, 11, 1691.
16. Yousef, M.; Kumar, A.; Bakir-Gungor, B. Application of biological domain knowledge based feature selection on gene expression data. Entropy 2021, 23, 2.
17. Hamzeh, O.; Alkhateeb, A.; Zheng, J.; Kandalam, S.; Rueda, L. Prediction of tumor location in prostate cancer tissue using a machine learning system on gene expression data. BMC Bioinform. 2020, 21, 1–10.
18. Pabby, G.; Kumar, N. A Review on Artificial Intelligence, Challenges Involved & Its Applications. Int. J. Adv. Res. Comput. Eng. Technol. 2017, 6, 1569–1573.
19. Furey, T.S.; Cristianini, N.; Duffy, N.; Bednarski, D.W.; Schummer, M.; Haussler, D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000, 16, 906–914.
20. Inza, I.; Larrañaga, P.; Blanco, R.; Cerrolaza, A.J. Filter versus wrapper gene selection approaches in DNA microarray domains. Artif. Intell. Med. 2004, 31, 91–103.
21. Saeys, Y.; Inza, I.; Larrañaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007, 23, 2507–2517.
22. Dey, A. Machine Learning Algorithms: A Review. Int. J. Comput. Sci. Inf. Technol. 2016, 7, 1174–1179.
23. Bhola, A.; Tiwari, A.K. Machine Learning Based Approaches for Cancer Classification Using Gene Expression Data. Mach. Learn. Appl. An Int. J. 2015, 2, 1–12.
24. Ray, R.; Abdullah, A.A.; Mallick, D.K. Classification of Benign and Malignant Breast Cancer using Supervised Machine Learning Algorithms Based on Image and Numeric Datasets. Int. Conf. Biomed. Eng. 2019.
25. Huo, Y.; Xin, L.; Kang, C.; Wang, M.; Ma, Q.; Yu, B. SGL-SVM: A novel method for tumor classification via support vector machine with sparse group Lasso. J. Theor. Biol. 2020, 486.
26. Remli, M.A.; Daud, K.M.; Nies, H.W.; Mohamad, M.S.; Deris, S.; Omatu, S.; Kasim, S.; Sulong, G. K-means clustering with infinite feature selection for classification tasks in gene expression data. In International Conference on Practical Applications of Computational Biology & Bioinformatics; Springer: Cham, Switzerland, 2017; Volume 616, pp. 50–57.
27. Sinaga, K.P.; Yang, M.-S. Unsupervised K-Means Clustering Algorithm. IEEE Access 2020, 8, 80716–80727.
28. Kang, C.; Huo, Y.; Xin, L.; Tian, B.; Yu, B. Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine. J. Theor. Biol. 2019, 463, 77–91.
29. Statnikov, A.; Aliferis, C.F.; Tsamardinos, I.; Hardin, D.; Levy, S. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 2005, 21, 631–643.
30. Ayyad, S.M.; Saleh, A.I.; Labib, L.M. Gene expression cancer classification using modified K-Nearest Neighbors technique. BioSystems 2019, 176, 41–51.
31. Thamilselvan, P.; Sathiaseelan, J.G.R. An enhanced k nearest neighbor method to detecting and classifying MRI lung cancer images for large amount data. Int. J. Appl. Eng. Res. 2016, 11, 4223–4229.
32. Kamel, H.; Abdulah, D.; Al-Tuwaijari, J.M. Cancer Classification Using Gaussian Naive Bayes Algorithm. In Proceedings of the 2019 International Engineering Conference (IEC), Erbil, Iraq, 23–25 June 2019; pp. 165–170.
33. Salmi, N.; Rustam, Z. Naïve Bayes Classifier Models for Predicting the Colon Cancer. IOP Conf. Ser. Mater. Sci. Eng. 2019, 546.
34. Nandhini, S.; Sofiyan, M.A.; Kumar, S.; Afridi, A. Skin Cancer Classification using Random Forest. Int. J. Manag. Humanit. 2019, 4, 39–42.
35. Aydadenta, H.; Adiwijaya, A. A clustering approach for feature selection in microarray data classification using random forest. J. Inf. Process. Syst. 2018, 14, 1167–1175.
36. Mohd, A.; Ram, G.K.; Shafeeq, A. Skin cancer classification using K-means clustering. Int. J. Tech. Res. Appl. 2017, 5, 62–65.
37. Nurfalah, A.; Adiwijaya; Suryani, A.A. Cancer detection based on microarray data classification using PCA and modified back propagation. Far East J. Electron. Commun. 2016, 16, 269–281.
38. Kavitha, K.R.; Ram, A.V.; Anandu, S.; Karthik, S.; Kailas, S.; Arjun, N.M. PCA-based gene selection for cancer classification. In Proceedings of the 2018 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Madurai, India, 13–15 December 2018.
39. Mert, A.; Kiliç, N.; Bilgili, E.; Akan, A. Breast cancer detection with reduced feature set. Comput. Math. Methods Med. 2015, 2015.
40. Sandhya, G.; Giri, K.; Savitri, S. A novel approach for the detection of tumor in MR images of the brain and its classification via independent component analysis and kernel support vector machine. Imaging Med. 2017, 9, 33–44.
41. Sharma, S.; Rattan, M. An Improved Segmentation and Classifier Approach Based on HMM for Brain Cancer Detection. Open Biomed. Eng. J. 2019.
42. Mirzaei, F.; Parishan, M.R.; Faridafshin, M.; Faghihi, R.; Sina, S. Automated Brain Tumor Segmentation in Mr Images Using a Hidden Markov Classifier Framework Trained by Svd-Derived Features. ICTACT J. Image Video Process. 2018, 9, 1844–1848.
43. Nasteski, V. An overview of the supervised machine learning methods. Horizons B 2017, 4, 51–62.
44. Octaviani, T.L.; Rustam, Z. Random forest for breast cancer prediction. AIP Conf. Proc. 2019, 2168.
45. Liu, Y.; Bai, F.; Tang, Z.; Liu, N.; Liu, Q. Integrative transcriptomic, proteomic, and machine learning approach to identifying feature genes of atrial fibrillation using atrial samples from patients with valvular heart disease. BMC Cardiovasc. Disord. 2021, 21, 1–10.
46. Hases, L.; Ibrahim, A.; Chen, X.; Liu, Y.; Hartman, J.; Williams, C. The importance of sex in the discovery of colorectal cancer prognostic biomarkers. Int. J. Mol. Sci. 2021, 22, 1354.
47. Mitrofanov, A.; Alkhnbashi, O.S.; Shmakov, S.A.; Makarova, K.S.; Koonin, E.V.; Backofen, R. CRISPRidentify: Identification of CRISPR arrays using machine learning approach. Nucleic Acids Res. 2021, 49.
48. Zhao, S.; Bao, Z.; Zhao, X.; Xu, M.; Li, M.D.; Yang, Z. Identification of Diagnostic Markers for Major Depressive Disorder Using Machine Learning Methods. Front. Neurosci. 2021, 15, 1–11.
49. Shuwen, H.; Xi, Y.; Qing, Z.; Jing, Z.; Wei, W. Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models. Cancer Med. 2020, 9, 6667–6678.
50. Kim, B.H.; Yu, K.; Lee, P.C.W. Cancer classification of single-cell gene expression data by neural network. Bioinformatics 2020, 36, 1360–1366.
51. Jin, T.; Nguyen, N.D.; Talos, F.; Wang, D. ECMarker: Interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages. Bioinformatics 2021, 37, 1115–1124.
52. Auwul, R. A Robust Procedure for Machine Learning Algorithms Using Gene Expression Data. Biointerface Res. Appl. Chem. 2021, 12, 2422–2439.
53. Mu, Q.; Wang, J. CNAPE: A Machine Learning Method for Copy Number Alteration Prediction from Gene Expression. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 306–311.
54. Huang, S.; Yang, J.; Fong, S.; Zhao, Q. Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges. Cancer Lett. 2020, 471, 61–71.
55. Koumakis, L. Deep learning models in genomics; are we there yet? Comput. Struct. Biotechnol. J. 2020, 18, 1466–1473.
56. Avanzo, M.; Porzio, M.; Lorenzon, L.; Milan, L.; Sghedoni, R.; Russo, G.; Massafra, R.; Fanizzi, A.; Barucci, A.; Ardu, V.; et al. Artificial intelligence applications in medical imaging: A review of the medical physics research in Italy. Phys. Med. 2021, 83, 221–241.
57. Tabares-Soto, R.; Orozco-Arias, S.; Romero-Cano, V.; Bucheli, V.S.; Rodríguez-Sotelo, J.L.; Jiménez-Varón, C.F. A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data. PeerJ Comput. Sci. 2020, 2020, 1–22.
58. Zhu, W.; Xie, L.; Han, J.; Guo, X. The application of deep learning in cancer prognosis prediction. Cancers 2020, 12, 603.
59. Karim, M.R.; Beyan, O.; Zappa, A.; Costa, I.G.; Rebholz-Schuhmann, D.; Cochez, M.; Decker, S. Deep learning-based clustering approaches for bioinformatics. Brief. Bioinform. 2021, 22, 393–415.
60. Kumar, A.; Singh, S.K.; Saxena, S.; Lakshmanan, K.; Sangaiah, A.K.; Chauhan, H.; Shrivastava, S.; Singh, R.K. Deep feature learning for histopathological image classification of canine mammary tumors and human breast cancer. Inf. Sci. 2020, 508, 405–421.
61. El Kaitouni, S.E.; Abbad, A.; Tairi, H. A breast tumors segmentation and elimination of pectoral muscle based on hidden markov and region growing. Multimed. Tools Appl. 2018, 77, 31347–31362.
Classification Method | Dataset | Description of Ref. | Ref. | Advantages | Limitations |
---|---|---|---|---|---|
Support vector machine | ALLAML, DLBCL, Prostate Lung, Lymphoma, MLL, SRBCT, Stjude | Tumor classification using support vector machine and sparse group lasso | [25] | | |
Support vector machine | MLL, Lymphoma, Brain, TOX_171, CNS, DLBCL, Lung | Feature selection and tumor classification for microarray data using the relaxed lasso and generalized multi-class support vector machine | [28] | | |
Support vector machine | cDNAs | Classification and validation of cancer tissue samples using microarray expression data | [19] | | |
Support vector machine | Tumors, Brain Tumors, Leukemia, Lung, SRBCT, DLBCL | Using multicategory support vector machines (MC-SVMs) to diagnose cancer from gene expression data | [29] | | |
K-Nearest Neighbors | Colon, Leukemia, Lung, Lymphoma-DLBCL, Ovarian, Prostate | Cancer classification using K-Nearest Neighbors for gene expression data | [30] | | |
K-Nearest Neighbors | Lung cancer MRI | Cancer detection and classification using an enhanced K-Nearest Neighbors for MRI lung cancer images | [31] | | |
Naïve Bayes | Wisconsin Breast Cancer dataset (WBCD), lung cancer | Cancer classification using Gaussian naïve Bayes | [32] | | |
Naïve Bayes | Colon cancer | Cancer prediction using naïve Bayes model | [33] | | |
Random Forest | Wisconsin diagnostic breast cancer (WDBC) | Cancer prediction using Random Forest | [33] | | |
Random Forest | Dermatoscopic images | Skin cancer classification using Random Forest | [34] | | |
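
To make one of the supervised methods in the table concrete, the following is a minimal sketch of plain K-Nearest Neighbors classification on expression profiles. It is not the modified or enhanced K-NN variant of any cited reference; the function names (`euclidean`, `knn_predict`), the choice of Euclidean distance, and the toy expression values are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training profiles."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], sample))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 6 samples x 4 genes (values are invented, not real measurements).
train_X = [
    [8.1, 7.9, 1.2, 0.9],
    [7.8, 8.3, 1.0, 1.1],
    [8.4, 7.6, 0.8, 1.3],
    [1.1, 0.9, 7.7, 8.2],
    [0.8, 1.2, 8.0, 7.9],
    [1.3, 1.0, 8.3, 7.6],
]
train_y = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

prediction = knn_predict(train_X, train_y, [7.9, 8.0, 1.1, 1.0], k=3)
print(prediction)
```

Real microarray data typically has thousands of genes and few samples, so in practice a gene selection or dimension reduction step usually precedes a distance-based classifier such as this one.
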
Classification Method | Dimension Reduction Method | Dataset | Description of Ref. | Ref. | Advantages | Limitations |
---|---|---|---|---|---|---|
Random Forest | K-means algorithm | Colon cancer, Lung cancer, Prostate tumor | A method of combining feature selection algorithm and classification algorithm using K-means and Random Forest | [35] | | |
Nearest Neighborhood, Support Vector Classifier, Nearest Mean Classifier | K-means algorithm | Skin cancer images | Skin cancer classification using K-means algorithm | [36] | | |
Modified Back Propagation (MBP) | Principal Component Analysis (PCA) | Ovarian, Colon, Leukemia | Cancer detection based on microarray data using PCA and MBP | [37] | | |
Support vector machine-Recursive Feature Elimination (SVM-RFE) | Principal Component Analysis (PCA) | Leukemia, Colon, Breast Cancer | Gene selection using PCA for cancer classification | [38] | | |
k-nearest neighbor (k-NN), artificial neural network (ANN), radial basis function neural network (RBFNN), SVM | Independent Component Analysis (ICA) | Wisconsin diagnostic breast cancer (WDBC) | Feature reduction using ICA for breast cancer detection | [39] | | |
Kernel SVM (KSVM) | Independent Component Analysis (ICA) | Brain tumor MRI | Brain tumor detection and classification in MR images using ICA and KSVM | [40] | | |
Hidden Markov model (HMM) | Scale Invariant Feature Transform | Brain tumor MRI | Brain tumor segmentation and classification based on HMM | [41] | | |
Hidden Markov model (HMM) | Singular Value Decomposition (SVD) | Brain tumor MRI | Brain tumor segmentation using HMRF | [42] | | |
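
The hybrid pattern in the table (an unsupervised reduction step followed by a supervised classifier) can be sketched as below. This is only an illustration under stated assumptions, not the procedure of any cited reference: K-means groups similar genes, one representative gene per cluster is kept, and a nearest mean classifier is used as a simple stand-in for the Random Forest listed in the table. All function names, the deterministic centroid initialisation, and the toy data are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points, k, iterations=20):
    """Plain K-means with a deterministic start (evenly spaced points)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels, centroids


def select_genes(X, k):
    """Cluster genes (columns of X) and keep the gene nearest each centroid."""
    genes = [list(col) for col in zip(*X)]
    labels, centroids = kmeans(genes, k)
    selected = []
    for c in range(k):
        members = [g for g, lab in enumerate(labels) if lab == c]
        if members:
            selected.append(min(members,
                                key=lambda g: dist(genes[g], centroids[c])))
    return sorted(selected)


def fit_class_means(X, y):
    """Nearest mean classifier: store one mean profile per class."""
    return {lab: mean_vector([x for x, t in zip(X, y) if t == lab])
            for lab in set(y)}


def predict(class_means, sample):
    """Assign the class whose mean profile is closest to the sample."""
    return min(class_means, key=lambda lab: dist(class_means[lab], sample))


# Hypothetical toy data: 6 samples x 6 genes (invented values).
X = [
    [8.0, 7.5, 8.4, 1.0, 1.2, 0.8],
    [7.6, 8.1, 7.9, 0.9, 1.1, 1.3],
    [8.3, 7.8, 8.0, 1.2, 0.8, 1.0],
    [1.1, 0.9, 1.3, 7.9, 8.2, 7.7],
    [0.8, 1.2, 1.0, 8.1, 7.6, 8.3],
    [1.2, 1.0, 0.9, 7.8, 8.0, 8.1],
]
y = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

selected = select_genes(X, k=2)                      # indices of kept genes
X_reduced = [[row[g] for g in selected] for row in X]
class_means = fit_class_means(X_reduced, y)

new_sample = [7.9, 8.0, 8.2, 1.1, 0.9, 1.0]
predicted = predict(class_means, [new_sample[g] for g in selected])
print(selected, predicted)
```

Keeping one gene per cluster is only one way to use K-means for reduction; the same skeleton applies if the reduction step is swapped for PCA or ICA and the classifier for any of the methods in the table.
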
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mazlan, A.U.; Sahabudin, N.A.; Remli, M.A.; Ismail, N.S.N.; Mohamad, M.S.; Nies, H.W.; Abd Warif, N.B. A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data. Processes 2021, 9, 1466. https://doi.org/10.3390/pr9081466
Chicago/Turabian Style: Mazlan, Aina Umairah, Noor Azida Sahabudin, Muhammad Akmal Remli, Nor Syahidatul Nadiah Ismail, Mohd Saberi Mohamad, Hui Wen Nies, and Nor Bakiah Abd Warif. 2021. "A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data" Processes 9, no. 8: 1466. https://doi.org/10.3390/pr9081466