Cervical Cancer Diagnosis Using Stacked Ensemble Model and Optimized Feature Selection: An Explainable Artificial Intelligence Approach
Abstract
1. Introduction
- Examining the impact of combining feature selection techniques with machine learning models for cervical cancer prediction.
- Proposing a stacking ensemble learning model that combines different ML models with a meta-learner, together with the SMOTETomek method, to predict cervical cancer.
- Assessing the effectiveness of the proposed model and evaluating it against different ensemble models and conventional ML models using several evaluation metrics.
- When compared with the other models, the proposed model performed best with RFE.
- Enhancing the proposed approach with explainable AI techniques to deliver comprehensible explanations of the predictions.
2. Related Work
3. Methodology
3.1. Data Description
3.2. Data Processing
- Filling missing values in the cervical cancer dataset is crucial for the accuracy and reliability of ML models [32]. Methods include mode, mean, and median imputation. Mean imputation substitutes the mean value of the corresponding feature for missing data, and median imputation substitutes missing values with the median value. We remove features with 70% missing values, reducing the number of features from 36 to 32, and remove rows with more than 50% null values, reducing the number of rows from 858 to 763. We then apply median imputation to fill missing values for numerical features and mode imputation for categorical features.
- Dataset encoding converts non-numerical or categorical data into a numerical format. Label encoding assigns a numeric value to each category of categorical data.
- We used min-max normalization, which scales the features to a fixed range (a minimal preprocessing sketch follows this list).
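A minimal sketch of the preprocessing steps above, assuming pandas and scikit-learn; the file name, the choice of Biopsy as target column, and the listed feature subsets are illustrative assumptions rather than the exact configuration used by the authors:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Load the UCI risk-factors dataset; '?' marks missing entries (file name assumed)
df = pd.read_csv("risk_factors_cervical_cancer.csv", na_values="?")

# Drop features with 70% or more missing values (36 -> 32 features in the paper),
# then drop rows with more than 50% missing values (858 -> 763 rows in the paper)
df = df.loc[:, df.isna().mean() < 0.70]
df = df.loc[df.isna().mean(axis=1) <= 0.50]

# Median imputation for numerical features, mode imputation for categorical (binary) features
numerical = ["Age", "Number of sexual partners", "First sexual intercourse"]  # illustrative subset
categorical = ["Smokes", "Hormonal Contraceptives", "IUD", "STDs"]            # illustrative subset
df[numerical] = df[numerical].fillna(df[numerical].median())
df[categorical] = df[categorical].fillna(df[categorical].mode().iloc[0])
df = df.fillna(df.median(numeric_only=True))   # catch-all for any remaining gaps

# Label-encode categorical columns and scale all features to the [0, 1] range (min-max)
for col in categorical:
    df[col] = LabelEncoder().fit_transform(df[col])
X = MinMaxScaler().fit_transform(df.drop(columns=["Biopsy"]))
y = df["Biopsy"].values
```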
3.3. Optimization Models
3.4. Class-Imbalanced Resampling Techniques
3.5. Feature Selection Methods
3.5.1. Filter Approach
3.5.2. Wrapper Approach
1. Initially, the ML model is trained using the complete set of features available in the dataset.
2. Importance scores are calculated for each feature based on the feature importance attribute. These scores quantify the relative significance of each feature with respect to the model's predictive performance.
3. The feature with the lowest importance score is eliminated from the feature set.
4. A new model is then trained using the remaining features.
5. Steps 2–4 are iteratively repeated until the optimal number of features is selected.
- RFE can be used with any ML model that provides feature importance scores, such as DT, SVM, and LR (a minimal RFE sketch follows below).
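A minimal sketch of this wrapper procedure using scikit-learn's RFE implementation; the decision-tree base estimator and the number of retained features are illustrative assumptions:

```python
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# RFE repeatedly fits the estimator, ranks features by importance,
# and removes the least important feature until the requested number remains.
selector = RFE(
    estimator=DecisionTreeClassifier(random_state=42),
    n_features_to_select=20,   # assumed target size, for illustration only
    step=1,                    # eliminate one feature per iteration
)
selector.fit(X, y)

selected_mask = selector.support_    # boolean mask of retained features
ranking = selector.ranking_          # 1 = retained; larger values were eliminated earlier
X_rfe = selector.transform(X)        # dataset reduced to the selected features
```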
3.5.3. Embedded Approach
3.6. Class-Imbalanced Resampling Techniques
3.7. Machine Learning Models
- Logistic Regression (LR) is a statistical technique that estimates outcomes from a collection of input features. It is a generalized linear model that employs a logistic (sigmoid) function to model the relationship between the input features and the probability of the outcome [46].
- Support Vector Machines (SVMs) operate by locating the hyperplane that best separates the data into two classes. The hyperplane is selected to maximize the margin, i.e., the distance between the closest points of each class and the hyperplane. The decision boundary is the set of points that lie on the hyperplane. The data is transformed into a higher-dimensional space using kernel functions [47].
- Decision Tree (DT) works by dividing the data into groups based on the feature values. The objective is to build a tree that effectively separates the data into classes or predicts the target variable. Every internal node of the tree corresponds to a feature, the branches represent the possible values of that feature, and the leaves represent the class labels or predicted values [48].
- Random Forest (RF) is an ensemble technique that employs many decision trees to improve the model's accuracy and robustness. It builds a forest of decision trees, each trained on a randomly selected subset of the features and data, and combines the predictions of all trees to make the final prediction [49] (see the sketch below for how these base models can be instantiated).
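For illustration, the four base classifiers above can be instantiated with scikit-learn as follows; the hyperparameter values and the 80/20 split are assumptions for this sketch, not the tuned settings produced by the optimization step in Section 3.3:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hold out a test set (80/20 stratified split, assumed for illustration)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base classifiers with default-style hyperparameters (illustrative only)
models = {
    "LR":  LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True),   # kernel maps data to a higher-dimensional space
    "DT":  DecisionTreeClassifier(random_state=42),
    "RF":  RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.4f}")
```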
3.8. Ensemble Learning Models
- Bagging trains multiple models in parallel on different bootstrap subsets of the training data and then combines their predictions to generate the final prediction [9]. By averaging predictions from numerous models trained on different data subsets, bagging minimizes prediction variance.
- Boosting is a sequential training technique in which each model is trained to rectify the errors made by the preceding models [50]. The core idea of boosting is to combine weak models into a robust and accurate ensemble. Its primary advantage is its ability to mitigate prediction bias: by iteratively adjusting the weights assigned to misclassified instances, boosting emphasizes challenging or misclassified samples during subsequent rounds of model training. This adaptive approach enables the ensemble model to enhance performance and progressively achieve more precise predictions [50] (a minimal sketch of both techniques follows this list).
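A minimal sketch contrasting the two ensemble strategies with scikit-learn; the estimator counts are illustrative, and BaggingClassifier defaults to decision trees as its base learners:

```python
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

# Bagging: models trained in parallel on bootstrap subsets; averaging reduces variance
bagging = BaggingClassifier(n_estimators=50, random_state=42)

# Boosting: models trained sequentially; misclassified samples are re-weighted, reducing bias
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)

for model in (bagging, boosting):
    model.fit(X_train, y_train)
    print(type(model).__name__, "test accuracy:", model.score(X_test, y_test))
```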
3.9. Proposed Stacking Ensemble Model
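As a minimal illustration of the stacking idea stated in the contributions (base ML models combined through a meta-learner, trained on data resampled with SMOTETomek), the sketch below uses scikit-learn and imbalanced-learn. The particular base learners, the logistic-regression meta-learner, and the cross-validation setting are assumptions for illustration, not necessarily the exact configuration of the proposed model:

```python
from imblearn.combine import SMOTETomek
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Balance the training data with combined over-sampling (SMOTE) and cleaning (Tomek links)
X_res, y_res = SMOTETomek(random_state=42).fit_resample(X_train, y_train)

# Level-0 base learners; their out-of-fold predictions become inputs to the level-1 meta-learner
base_learners = [
    ("rf",  RandomForestClassifier(n_estimators=100, random_state=42)),
    ("dt",  DecisionTreeClassifier(random_state=42)),
    ("svm", SVC(probability=True, random_state=42)),
]
stacking = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),   # meta-learner (assumed)
    cv=5,
)
stacking.fit(X_res, y_res)
print("Stacking test accuracy:", stacking.score(X_test, y_test))
```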
3.10. Explainable Artificial Intelligence (XAI)
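As an illustration of how explanations can be produced for the fitted ensemble, the following sketch applies SHAP to the stacking model from the previous section; the model-agnostic KernelExplainer, the background-sample size, and the `feature_names` list are assumptions, and the actual explainer configuration used in the paper may differ:

```python
import shap

# Explain the predicted probability of the positive (cancer) class for the stacking model
predict_positive = lambda data: stacking.predict_proba(data)[:, 1]
background = shap.sample(X_res, 100)              # small background set keeps KernelExplainer tractable
explainer = shap.KernelExplainer(predict_positive, background)
shap_values = explainer.shap_values(X_test[:20])  # local explanations for 20 test samples

# Global view: features ranked by mean |SHAP value|; feature_names is an assumed list of column names
shap.summary_plot(shap_values, X_test[:20], feature_names=feature_names)
```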
3.11. Evaluating Models
- Accuracy, precision, recall, and F1-score are among the most frequently employed metrics for measuring classification performance. Their calculation is given by Equations (1) to (4); the standard definitions are also reproduced after this list for reference. A true negative (TN) indicates that the individual is healthy and the test is negative, whereas a true positive (TP) indicates that the individual is sick and the test is positive. A false positive (FP) occurs when the test yields a positive result although the subject is healthy, and a false negative (FN) occurs when the test yields a negative result although the person is sick.
- A receiver operating characteristic (ROC) curve is a graph that depicts the efficacy of a classification model across all classification thresholds. It plots the true-positive rate against the false-positive rate [54]. The true-positive rate (TPR) is a synonym for recall and is defined as TPR = TP/(TP + FN), while the false-positive rate (FPR) is defined as FPR = FP/(FP + TN). An ROC curve is generated by plotting TPR against FPR at various classification thresholds. Lowering the classification threshold classifies more items as positive, which raises the number of both false positives and true positives. A typical ROC curve is depicted in the accompanying figure [54]. The acronym AUC stands for "area under the ROC curve"; it measures the complete two-dimensional area below the entire ROC curve, from (0,0) to (1,1), and quantifies how well the model distinguishes between the classes. Models with a higher AUC are better at separating the classes [54].
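For reference, the quantities referred to above (Equations (1)–(4), TPR, and FPR) follow the standard definitions; they are reproduced here in the usual textbook form rather than copied from the original equations:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} = TPR &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\
FPR &= \frac{FP}{FP + TN}
\end{align}
```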
4. Experimental Results
4.1. Experimental Setup
4.2. Feature Selection Results
4.2.1. Feature Scores Based on Chi2
4.2.2. Importance of Selected Features Based on the Tree-Based Method
4.2.3. The Ranking of Selected Features According to RFE
4.3. ML Results
Results of Chi2 for Models with Selected Features
4.4. Results of RFE for Models with Selected Features
Results of Tree-Based Models with Selected Features
5. Discussion
5.1. The Best Models
5.2. Explainable Artificial Intelligence (XAI)
5.3. Model Results Comparison with the Literature
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- World Health Organization. Cervical-Cancer. 2023. Available online: https://www.who.int/news-room/fact-sheets/detail/cervical-cancer (accessed on 5 August 2023).
- Tanimu, J.J.; Hamada, M.; Hassan, M.; Kakudi, H.; Abiodun, J.O. A machine learning method for classification of cervical cancer. Electronics 2022, 11, 463. [Google Scholar] [CrossRef]
- Venkatesh, B.; Anuradha, J. A review of feature selection and its methods. Cybern. Inf. Technol. 2019, 19, 3–26. [Google Scholar] [CrossRef]
- Gu, Q.; Li, Z.; Han, J. Generalized fisher score for feature selection. arXiv 2012, arXiv:1202.3725. [Google Scholar]
- Lin, X.; Li, C.; Zhang, Y.; Su, B.; Fan, M.; Wei, H. Selecting feature subsets based on SVM-RFE and the overlapping ratio with applications in bioinformatics. Molecules 2017, 23, 52. [Google Scholar] [CrossRef] [PubMed]
- He, Y.; Yu, H.; Yu, R.; Song, J.; Lian, H.; He, J.; Yuan, J. A correlation-based feature selection algorithm for operating data of nuclear power plants. Sci. Technol. Nucl. Install. 2021, 2021, 9994340. [Google Scholar] [CrossRef]
- Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422. [Google Scholar] [CrossRef]
- Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39. [Google Scholar] [CrossRef]
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
- Schapire, R.E. A brief introduction to boosting. Ijcai 1999, 99, 1401–1406. [Google Scholar]
- Saleh, H.; Mostafa, S.; Alharbi, A.; El-Sappagh, S.; Alkhalifah, T. Heterogeneous ensemble deep learning model for enhanced Arabic sentiment analysis. Sensors 2022, 22, 3707. [Google Scholar] [CrossRef]
- Rajagopal, S.; Kundapur, P.P.; Hareesha, K.S. A stacking ensemble for network intrusion detection using heterogeneous datasets. Secur. Commun. Netw. 2020, 2020, 4586875. [Google Scholar] [CrossRef]
- Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
- Lee, H.; Yune, S.; Mansouri, M.; Kim, M.; Tajmir, S.H.; Guerrier, C.E.; Ebert, S.A.; Pomerantz, S.R.; Romero, J.M.; Kamalian, S.; et al. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat. Biomed. Eng. 2019, 3, 173–182. [Google Scholar] [CrossRef] [PubMed]
- Al Mudawi, N.; Alazeb, A. A model for predicting cervical cancer using machine learning algorithms. Sensors 2022, 22, 4132. [Google Scholar] [CrossRef] [PubMed]
- Fatlawi, H.K. Enhanced classification model for cervical cancer dataset based on cost sensitive classifier. Int. J. Comput. Tech. 2017, 4, 115–120. [Google Scholar]
- Choudhury, A.; Wesabi, Y.; Won, D. Classification of cervical cancer dataset. arXiv 2018, arXiv:1812.10383. [Google Scholar]
- Razali, N.; Mostafa, S.A.; Mustapha, A.; Abd Wahab, M.H.; Ibrahim, N.A. Risk factors of cervical cancer using classification in data mining. J. Physics Conf. Ser. 2020, 1529, 022102. [Google Scholar] [CrossRef]
- Ali, M.M.; Ahmed, K.; Bui, F.M.; Paul, B.K.; Ibrahim, S.M.; Quinn, J.M.; Moni, M.A. Machine learning-based statistical analysis for early stage detection of cervical cancer. Comput. Biol. Med. 2021, 139, 104985. [Google Scholar] [CrossRef]
- Adem, K.; Kiliçarslan, S.; Cömert, O. Classification and diagnosis of cervical cancer with stacked autoencoder and softmax classification. Expert Syst. Appl. 2019, 115, 557–564. [Google Scholar] [CrossRef]
- Alsmariy, R.; Healy, G.; Abdelhafez, H. Predicting cervical cancer using machine learning methods. Int. J. Adv. Comput. Sci. Appl. 2020, 11. [Google Scholar] [CrossRef]
- Abdoh, S.F.; Rizka, M.A.; Maghraby, F.A. Cervical cancer diagnosis using random forest classifier with SMOTE and feature reduction techniques. IEEE Access 2018, 6, 59475–59485. [Google Scholar] [CrossRef]
- Asadi, F.; Salehnasab, C.; Ajori, L. Supervised algorithms of machine learning for the prediction of cervical cancer. J. Biomed. Phys. Eng. 2020, 10, 513. [Google Scholar]
- Wang, S.; Dai, Y.; Shen, J.; Xuan, J. Research on expansion and classification of imbalanced data based on SMOTE algorithm. Sci. Rep. 2021, 11, 24039. [Google Scholar] [CrossRef] [PubMed]
- Le, T.T.H.; Oktian, Y.E.; Kim, H. XGBoost for imbalanced multiclass classification-based industrial internet of things intrusion detection systems. Sustainability 2022, 14, 8707. [Google Scholar] [CrossRef]
- Yu, S.; Guo, J.; Zhang, R.; Fan, Y.; Wang, Z.; Cheng, X. A re-balancing strategy for class-imbalanced classification based on instance difficulty. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 70–79. [Google Scholar]
- Jiang, P.; Suzuki, H.; Obi, T. XAI-based cross-ensemble feature ranking methodology for machine learning models. Int. J. Inf. Technol. 2023, 15, 1759–1768. [Google Scholar] [CrossRef]
- Le, T.T.H.; Kim, H.; Kang, H.; Kim, H. Classification and explanation for intrusion detection system based on ensemble trees and SHAP method. Sensors 2022, 22, 1154. [Google Scholar] [CrossRef]
- Chakir, O.; Rehaimi, A.; Sadqi, Y.; Alaoui, E.A.A.; Krichen, M.; Gaba, G.S.; Gurtov, A. An empirical assessment of ensemble methods and traditional machine learning techniques for web-based attack detection in industry 5.0. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 103–119. [Google Scholar] [CrossRef]
- Fernandes, K.C.J.; Fernandes, J. Cervical Cancer (Risk Factors). UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/dataset/383/cervical+cancer+risk+factors (accessed on 5 August 2023).
- Huang, J.; Li, Y.F.; Xie, M. An empirical analysis of data preprocessing for machine learning-based software cost estimation. Inf. Softw. Technol. 2015, 67, 108–127. [Google Scholar] [CrossRef]
- Hartini, E. Classification of missing values handling method during data mining. Sigma Epsil.-Bul. Ilm. Teknol. Keselam. Reakt. Nukl. 2018, 21. [Google Scholar]
- Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
- Snoek, J.; Larochelle, H.; Adams, R.P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 2012, 25. [Google Scholar]
- Brochu, E.; Cora, V.M.; De Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv 2010, arXiv:1012.2599. [Google Scholar]
- Lemaître, G.; Nogueira, F.; Aridas, C.K. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 2017, 18, 559–563. [Google Scholar]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Zeng, M.; Zou, B.; Wei, F.; Liu, X.; Wang, L. Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data. In Proceedings of the 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS), Chongqing, China, 28–29 May 2016; pp. 225–228. [Google Scholar]
- Khleel, N.A.A.; Nehéz, K. A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method. J. Intell. Inf. Syst. 2023, 60, 673–707. [Google Scholar] [CrossRef]
- SMOTETomek. 2023. Available online: https://imbalanced-learn.org/stable/references/generated/imblearn.combine.SMOTETomek.html (accessed on 5 August 2023).
- McHugh, M.L. The chi-square test of independence. Biochem. Medica 2013, 23, 143–149. [Google Scholar] [CrossRef]
- Germano, M. Turbulence: The filtering approach. J. Fluid Mech. 1992, 238, 325–336. [Google Scholar] [CrossRef]
- Jović, A.; Brkić, K.; Bogunović, N. A review of feature selection methods with applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015; pp. 1200–1205. [Google Scholar]
- Stańczyk, U. Feature evaluation by filter, wrapper, and embedded approaches. Feature Sel. Data Pattern Recognit. 2015, 29–44. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- LaValley, M.P. Logistic regression. Circulation 2008, 117, 2395–2399. [Google Scholar] [CrossRef]
- Suthaharan, S.; Suthaharan, S. Support vector machine. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning; Springer: Berlin, Germany, 2016; pp. 207–235. [Google Scholar]
- Quinlan, J.R. Learning decision tree classifiers. ACM Comput. Surv. (CSUR) 1996, 28, 71–72. [Google Scholar] [CrossRef]
- Rigatti, S.J. Random forest. J. Insur. Med. 2017, 47, 31–39. [Google Scholar] [CrossRef] [PubMed]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do we need to build explainable AI systems for the medical domain? arXiv 2017, arXiv:1712.09923. [Google Scholar]
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Albini, E.; Long, J.; Dervovic, D.; Magazzeni, D. Counterfactual shapley additive explanations. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, 21–24 June 2022; pp. 1054–1070. [Google Scholar]
- Narkhede, S. Understanding auc-roc curve. Towards Data Sci. 2018, 26, 220–227. [Google Scholar]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- SHAP Explainers. 2023. Available online: https://shap.readthedocs.io/en/latest/ (accessed on 5 August 2023).
- Matplotlib.pyplot. 2023. Available online: https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html (accessed on 5 August 2023).
Feature Name | Abbreviation |
---|---|
Smokes | Smokes |
Hormonal Contraceptives | HC |
IUD | IUD |
STDs | STDs |
STDs:condylomatosis | CON |
STDs:vaginal condylomatosis | VC |
STDs:vulvo-perineal condylomatosis | VPC |
STDs:syphilis | SYP |
STDs:pelvic inflammatory disease | PID |
STDs:genital herpes | GH |
STDs:molluscum contagiosum | MC |
STDs:HIV | HIV |
STDs:Hepatitis B | HB |
STDs:HPV | STDs:HPV |
Dx:Cancer | Dx:Cancer |
Dx:CIN | Dx:CIN |
Dx:HPV | Dx:HPV |
Dx | Dx |
Hinselmann | Hinselmann |
Schiller | Schiller |
Citology | Citology |
Biopsy | Biopsy |
Age | Age |
Number of sexual partners | NSP |
First sexual intercourse | FSI |
Num of pregnancies | NPRE |
Smokes (years) | SY |
Smokes (packs/year) | SmokesPY |
Hormonal Contraceptives (years) | HC |
IUD (years) | IUD (years) |
STDs (number) | STDs (number) |
STDs: Number of diagnosis | ND |
Approaches | Models | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---
ML models | RF | 92.66 | 92.85 | 92.66 | 92.65
 | LR | 90.11 | 90.40 | 90.11 | 90.10
 | DT | 91.81 | 91.92 | 91.81 | 91.80
 | SVM | 90.96 | 90.98 | 90.96 | 90.96
 | NB | 90.11 | 90.40 | 90.11 | 90.10
Ensemble models | AdaBoost | 90.68 | 90.84 | 90.68 | 90.67
 | Bagging | 94.07 | 94.14 | 94.07 | 94.07
 | Voting | 92.09 | 92.23 | 92.09 | 92.08
 | Stacking | 96.05 | 96.07 | 96.05 | 96.04
Approaches | Models | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---
ML models | RF | 98.87 | 98.87 | 98.87 | 98.87
 | LR | 89.83 | 89.08 | 89.83 | 89.81
 | DT | 94.92 | 94.92 | 94.92 | 94.92
 | SVM | 92.09 | 92.18 | 92.09 | 92.09
 | NB | 90.11 | 90.40 | 90.11 | 90.10
Ensemble models | AdaBoost | 94.63 | 94.65 | 94.63 | 94.63
 | Bagging | 97.46 | 97.53 | 97.46 | 97.46
 | Voting | 94.92 | 94.92 | 94.92 | 94.92
 | Stacking | 99.44 | 99.44 | 99.44 | 99.44
Approaches | Models | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---
ML models | RF | 98.87 | 98.87 | 98.87 | 98.87
 | LR | 89.83 | 89.08 | 89.83 | 89.81
 | DT | 94.92 | 94.92 | 94.92 | 94.92
 | SVM | 92.09 | 92.18 | 92.09 | 92.09
 | NB | 90.11 | 90.40 | 90.11 | 90.10
Ensemble models | AdaBoost | 94.63 | 94.65 | 94.63 | 94.63
 | Bagging | 97.46 | 97.53 | 97.46 | 97.46
 | Voting | 94.92 | 94.92 | 94.92 | 94.92
 | Stacking | 99.44 | 99.44 | 99.44 | 99.44
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).