RNA Sequences-Based Diagnosis of Parkinson’s Disease Using Various Feature Selection Methods and Machine Learning
Abstract
1. Introduction
- To verify the effectiveness of feature selection when classifying disease status from RNA-seq data using machine learning.
- To investigate how different feature selection methods affect machine learning performance.
- To compare machine learning performance across feature selection methods (genetic algorithm, information gain, and wolf search algorithm).
2. Background
2.1. RNA Sequencing (RNA-seq)
2.2. Dataset Used for This Study
2.3. Feature Selection
2.4. Genetic Algorithm (GA)
2.5. Information Gain (IG)
2.6. Wolf Search Algorithm (WSA)
- Each wolf has a fixed visual radius, which defines a circular coverage area in 2D. A wolf can detect any companion appearing within this coverage, and the distance it can move in a single step is smaller than the radius.
- The quality of a wolf's position is measured by the objective function. Wolves try to move to better terrain; if one or more better positions are visible, the wolf chooses the best of them to inhabit.
- Wolves can sense enemies and escape from threats to a random position beyond their visual radius. (A minimal code sketch of these movement rules follows this list.)
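For illustration, the following is a minimal sketch of these movement rules on a continuous search space; it is not the paper's implementation. The visual radius `r`, step size, and escape probability `pa` are generic WSA parameters, and the sphere objective is a placeholder.

```python
import numpy as np

def wolf_search(objective, dim=2, n_wolves=20, r=1.0, step=0.5,
                pa=0.25, iters=100, seed=0):
    """Minimal Wolf Search Algorithm sketch (minimization)."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(-5, 5, size=(n_wolves, dim))  # initial positions
    fitness = np.array([objective(w) for w in wolves])

    for _ in range(iters):
        for i in range(n_wolves):
            # Companions visible within the fixed visual radius r
            dist = np.linalg.norm(wolves - wolves[i], axis=1)
            visible = (dist > 0) & (dist <= r)
            better = visible & (fitness < fitness[i])
            if better.any():
                # Move toward the best visible companion (better terrain)
                target = wolves[np.argmin(np.where(better, fitness, np.inf))]
                cand = wolves[i] + step * (target - wolves[i]) * rng.random()
            else:
                # Passive local search: random step smaller than the radius
                cand = wolves[i] + step * rng.uniform(-1, 1, dim)
            if objective(cand) < fitness[i]:
                wolves[i], fitness[i] = cand, objective(cand)
            # Threat sensed with probability pa: escape to a random
            # position beyond the visual radius
            if rng.random() < pa:
                direction = rng.uniform(-1, 1, dim)
                direction /= np.linalg.norm(direction)
                wolves[i] = wolves[i] + direction * (r + rng.random() * step)
                fitness[i] = objective(wolves[i])

    best = np.argmin(fitness)
    return wolves[best], fitness[best]

# Example: minimize a simple sphere function
best_x, best_f = wolf_search(lambda x: float(np.sum(x**2)))
print(best_x, best_f)
```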
2.7. Machine Learning Algorithms for Classification
3. Method
3.1. Feature Selection Using Genetic Algorithms
3.2. Feature Selection Using Information Gain
3.3. Feature Selection Using Wolf Search Algorithm
3.4. Machine Learning Model Implementation
3.5. Model Evaluation
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
GA1 | Genetic algorithm using Equation (5)
GA2 | Genetic algorithm using Equation (6)
IG | Information gain
WSA | Wolf search algorithm
XGBoost | Extreme gradient boosting
DNN | Deep neural network
SVM | Support vector machine
DT | Decision tree
Appendix A. Comparison of the Performances of GA1 and GA2 with Various Numbers of Features
Method | Number of Features | Model | Accuracy (%) | Std. | Precision | Recall | F1-Score | Time Cost (s)
---|---|---|---|---|---|---|---|---
GA1 | 25 | XGBoost | 62.73 | 5.30 | 64.87 | 61.31 | 60.13 | 1.13
GA1 | 25 | DNN | 67.27 | 5.30 | 68.68 | 66.00 | 65.64 | 4.72
GA1 | 25 | SVM | 70.91 | 4.64 | 71.56 | 69.47 | 69.26 | 0.66
GA1 | 25 | DT | 56.36 | 8.43 | 52.84 | 54.02 | 51.16 | 0.06
GA1 | 50 | XGBoost | 68.18 | 5.75 | 67.95 | 67.66 | 67.42 | 0.94
GA1 | 50 | DNN | 78.18 | 5.30 | 81.81 | 77.50 | 77.20 | 5.04
GA1 | 50 | SVM | 66.36 | 8.43 | 66.36 | 66.49 | 65.96 | 0.06
GA1 | 50 | DT | 60.91 | 6.17 | 62.82 | 60.84 | 59.85 | 0.07
GA1 | 75 | XGBoost | 79.09 | 9.36 | 82.27 | 77.37 | 77.74 | 0.90
GA1 | 75 | DNN | 75.45 | 6.17 | 78.06 | 74.17 | 73.95 | 4.65
GA1 | 75 | SVM | 79.09 | 10.60 | 79.69 | 78.92 | 78.64 | 0.07
GA1 | 75 | DT | 69.09 | 7.82 | 68.68 | 68.12 | 67.77 | 0.09
GA1 | 100 | XGBoost | 80.91 | 8.81 | 80.94 | 80.53 | 81.36 | 1.02
GA1 | 100 | DNN | 85.45 | 5.30 | 86.28 | 85.00 | 85.24 | 10.84
GA1 | 100 | SVM | 71.82 | 6.68 | 72.39 | 71.37 | 71.05 | 0.08
GA1 | 100 | DT | 62.73 | 7.82 | 62.74 | 62.14 | 61.83 | 0.12
GA2 | 25 | XGBoost | 70.91 | 3.64 | 72.23 | 70.06 | 69.66 | 1.03
GA2 | 25 | DNN | 61.82 | 3.64 | 79.46 | 58.00 | 50.46 | 4.45
GA2 | 25 | SVM | 67.27 | 5.30 | 68.68 | 67.29 | 65.90 | 1.00
GA2 | 25 | DT | 60.00 | 6.03 | 60.15 | 59.75 | 58.84 | 0.05
GA2 | 50 | XGBoost | 74.55 | 6.17 | 76.18 | 74.56 | 73.73 | 1.12
GA2 | 50 | DNN | 71.82 | 7.82 | 72.61 | 72.17 | 71.60 | 4.38
GA2 | 50 | SVM | 73.64 | 12.33 | 74.67 | 73.68 | 72.63 | 0.35
GA2 | 50 | DT | 68.18 | 10.76 | 69.33 | 68.70 | 67.66 | 0.08
GA2 | 75 | XGBoost | 69.09 | 9.27 | 73.79 | 68.91 | 66.56 | 0.99
GA2 | 75 | DNN | 84.55 | 7.39 | 86.69 | 83.67 | 83.72 | 4.86
GA2 | 75 | SVM | 70.91 | 5.45 | 72.00 | 70.68 | 70.18 | 0.07
GA2 | 75 | DT | 59.09 | 6.43 | 60.22 | 60.08 | 56.84 | 0.09
GA2 | 100 | XGBoost | 72.73 | 4.07 | 73.94 | 72.93 | 72.26 | 0.97
GA2 | 100 | DNN | 82.73 | 5.30 | 84.33 | 82.17 | 82.30 | 5.68
GA2 | 100 | SVM | 69.09 | 4.45 | 69.17 | 69.20 | 68.60 | 0.09
GA2 | 100 | DT | 63.64 | 4.98 | 61.93 | 62.05 | 60.89 | 0.10
Analysis | Description | Control | PD | t-Test p
---|---|---|---|---
RNA-seq | Number of samples | 44 | 29 | -
RNA-seq | Age at death, years (range) | 70.00 (46–97) | 77.55 (64–95) | 4.6 × 10⁻³
RNA-seq | PMI (post-mortem interval), hours (range) | 14.36 (2–32) | 11.14 (1–31) | 1.7 × 10⁻¹
RNA-seq | RIN (RNA integrity number), range | 7.85 (6.0–9.1) | 7.07 (5.8–8.5) | 5.9 × 10⁻⁵
Model | Parameter | Explanation | Value
---|---|---|---
XGBoost | n_estimators | Number of gradient-boosted trees | 100
XGBoost | learning_rate | Step-size shrinkage applied at each boosting iteration | 0.1
XGBoost | gamma | Minimum loss reduction required to make a further partition on a leaf node of the tree | 0
XGBoost | max_depth | Maximum tree depth for base learners | 6
XGBoost | objective | Learning objective (binary classification) | binary
SVM | kernel | Kernel type used by the classifier | linear
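As a minimal sketch, these settings map directly onto the Python `xgboost` and scikit-learn APIs. The paper does not state its exact implementation, and spelling out the `binary` objective as xgboost's `binary:logistic` is an assumption.

```python
from sklearn.svm import SVC
from xgboost import XGBClassifier

# XGBoost with the hyperparameters listed above; "binary" is assumed
# to mean xgboost's built-in binary:logistic objective.
xgb_model = XGBClassifier(
    n_estimators=100,   # number of gradient-boosted trees
    learning_rate=0.1,  # step-size shrinkage per boosting iteration
    gamma=0,            # minimum loss reduction to split a leaf further
    max_depth=6,        # maximum depth of each base learner
    objective="binary:logistic",
)

# Linear-kernel SVM as listed above; other SVC settings left at defaults.
svm_model = SVC(kernel="linear")
```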
Layer | Output Shape | Param #
---|---|---
Input | 51, 100 | -
Linear-1 | 51, 128 | 12,928
Dropout-2 | 51, 128 | 0
Linear-3 | 51, 128 | 16,512
Linear-4 | 51, 64 | 8,256
Linear-5 | 51, 64 | 4,160
Linear-6 | 51, 32 | 2,080
Linear-7 | 51, 32 | 1,056
Linear-8 | 51, 16 | 528
Linear-9 | 51, 1 | 17
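The listing above corresponds to a stack of fully connected layers whose parameter counts follow from in_features × out_features + out_features (e.g., 100 × 128 + 128 = 12,928; the Linear-6 entry is corrected to 2,080 = 64 × 32 + 32 accordingly). Below is a hedged PyTorch reconstruction: only the Linear/Dropout modules are documented, so the ReLU placement and the dropout rate are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F

class PDClassifier(nn.Module):
    """Hedged reconstruction of the layer table; ReLU placement and the
    dropout rate are assumptions (only Linear/Dropout appear in the
    table, consistent with activations applied functionally)."""

    def __init__(self, n_features: int = 100, p_drop: float = 0.5):
        super().__init__()
        self.fc1 = nn.Linear(n_features, 128)  # Linear-1: 100*128 + 128 = 12,928
        self.drop = nn.Dropout(p_drop)         # Dropout-2: 0 params
        self.fc2 = nn.Linear(128, 128)         # Linear-3: 16,512
        self.fc3 = nn.Linear(128, 64)          # Linear-4: 8,256
        self.fc4 = nn.Linear(64, 64)           # Linear-5: 4,160
        self.fc5 = nn.Linear(64, 32)           # Linear-6: 2,080
        self.fc6 = nn.Linear(32, 32)           # Linear-7: 1,056
        self.fc7 = nn.Linear(32, 16)           # Linear-8: 528
        self.out = nn.Linear(16, 1)            # Linear-9: 17

    def forward(self, x):
        x = self.drop(F.relu(self.fc1(x)))
        for fc in (self.fc2, self.fc3, self.fc4, self.fc5, self.fc6, self.fc7):
            x = F.relu(fc(x))
        return self.out(x)  # logits; apply sigmoid for PD probability

x = torch.randn(51, 100)                  # 51 samples, 100 selected features
probs = torch.sigmoid(PDClassifier()(x))  # shape (51, 1)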
Feature Selection Method | Number of Features | Model | Fold | Accuracy (%) | Precision | Recall | F1-Score | p-Value (vs. Non-Feature Selection)
---|---|---|---|---|---|---|---|---
IG | 50 | XGBoost | 1 | 86.36 | 89.28 | 86.36 | 86.10 | 2.57 × 10⁻⁴
IG | 50 | XGBoost | 2 | 90.90 | 90.83 | 90.83 | 90.83 |
IG | 50 | XGBoost | 3 | 95.45 | 95.45 | 95.83 | 95.44 |
IG | 50 | XGBoost | 4 | 81.81 | 83.03 | 80.83 | 81.19 |
IG | 50 | XGBoost | 5 | 81.81 | 80.35 | 80.35 | 80.35 |
GA1 | 100 | DNN | 1 | 90.90 | 92.85 | 90.00 | 90.59 | 3.14 × 10⁻¹³
GA1 | 100 | DNN | 2 | 81.81 | 81.66 | 81.66 | 81.66 |
GA1 | 100 | DNN | 3 | 86.36 | 86.75 | 85.83 | 86.10 |
GA1 | 100 | DNN | 4 | 90.90 | 92.85 | 90.00 | 90.59 |
GA1 | 100 | DNN | 5 | 77.27 | 77.27 | 77.50 | 77.22 |
IG | 50 | SVM | 1 | 72.72 | 79.46 | 77.27 | 76.84 | 4.38 × 10⁻¹
IG | 50 | SVM | 2 | 90.90 | 90.83 | 90.83 | 90.83 |
IG | 50 | SVM | 3 | 72.72 | 73.33 | 73.33 | 72.72 |
IG | 50 | SVM | 4 | 72.72 | 79.52 | 75.83 | 76.03 |
IG | 50 | SVM | 5 | 90.90 | 90.17 | 90.17 | 90.17 |
WSA | 73 | DT | 1 | 68.18 | 68.33 | 68.18 | 68.11 | 2.41 × 10⁻⁴
WSA | 73 | DT | 2 | 95.45 | 95.45 | 95.83 | 95.44 |
WSA | 73 | DT | 3 | 77.27 | 77.27 | 77.50 | 77.22 |
WSA | 73 | DT | 4 | 81.81 | 83.03 | 80.83 | 81.19 |
WSA | 73 | DT | 5 | 63.63 | 62.50 | 63.39 | 62.39 |
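The table above implies a 5-fold cross-validation protocol with per-fold accuracy, precision, recall, and F1, plus a t-test comparing scores with and without feature selection. The sketch below shows one way to compute such numbers; the stratified splitting, macro averaging, paired test variant, and the names `X_selected`/`X_full` are assumptions, not the paper's confirmed implementation.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold

def cross_validate(model, X, y, n_splits=5, seed=0):
    """Per-fold accuracy/precision/recall/F1 (macro averaging assumed)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append([
            accuracy_score(y[test_idx], pred),
            precision_score(y[test_idx], pred, average="macro"),
            recall_score(y[test_idx], pred, average="macro"),
            f1_score(y[test_idx], pred, average="macro"),
        ])
    return np.array(scores)  # shape (n_splits, 4)

# Paired t-test on per-fold accuracies, with vs. without feature selection.
# X_selected (columns chosen by GA/IG/WSA) and X_full are hypothetical names.
# acc_fs = cross_validate(model, X_selected, y)[:, 0]
# acc_all = cross_validate(model, X_full, y)[:, 0]
# _, p_value = ttest_rel(acc_fs, acc_all)
```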
Method | Number of Features | Model | Accuracy (%) | Precision | Recall | F1-Score | Time Cost (s)
---|---|---|---|---|---|---|---
Non-feature selection | 17,850 | XGBoost | 77.27 | 76.60 | 76.44 | 77.03 | 26.45
Non-feature selection | 17,850 | DNN | 47.27 | 50.00 | 32.06 | 23.64 | 16.74
Non-feature selection | 17,850 | SVM | 80.91 | 80.20 | 80.43 | 82.52 | 38.28
Non-feature selection | 17,850 | DT | 70.00 | 68.62 | 68.59 | 70.51 | 11.44
GA1 | 100 | XGBoost | 80.91 | 80.94 | 80.53 | 81.36 | 1.02
GA1 | 100 | DNN | 85.45 | 86.28 | 85.00 | 85.24 | 10.84
GA1 | 75 | SVM | 79.09 | 79.69 | 78.92 | 78.64 | 0.07
GA1 | 75 | DT | 69.09 | 68.68 | 68.12 | 67.77 | 0.09
GA2 | 50 | XGBoost | 74.55 | 76.18 | 74.56 | 73.73 | 1.12
GA2 | 75 | DNN | 84.55 | 86.69 | 83.67 | 83.72 | 4.86
GA2 | 50 | SVM | 73.64 | 74.67 | 73.68 | 72.63 | 0.35
GA2 | 50 | DT | 68.18 | 69.33 | 68.70 | 67.66 | 0.08
IG | 50 | XGBoost | 87.27 | 87.79 | 86.84 | 86.79 | 0.98
IG | 100 | DNN | 81.82 | 83.96 | 80.83 | 81.16 | 6.56
IG | 50 | SVM | 81.82 | 82.67 | 81.49 | 81.32 | 0.06
IG | 50 | DT | 77.27 | 78.90 | 77.37 | 76.43 | 0.08
WSA | 73 | XGBoost | 86.36 | 89.04 | 84.95 | 85.46 | 0.77
WSA | 73 | DNN | 85.45 | 87.18 | 84.50 | 84.89 | 4.61
WSA | 8,358 | SVM | 80.00 | 81.55 | 79.10 | 79.33 | 5.85
WSA | 73 | DT | 77.27 | 77.32 | 77.15 | 76.88 | 0.09
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).