An Efficient Strategy for Blood Diseases Detection Based on Grey Wolf Optimization as Feature Selection and Machine Learning Techniques
Abstract
1. Introduction
1.1. Related Studies
1.2. Motivations and Contributions
- (1) Accurate and novel datasets with combined features for ALL classification.
- (2) Several classification algorithms have been applied and compared to determine the quality of the classification systems.
- (3) Several preprocessing methods, such as resizing, stretchlim, and adaptive thresholding, have been applied to the dataset images to show the abnormalities clearly.
- (4) A simpler classifier architecture than other techniques that have been used in ALL classification.
- (5) The proposed model achieves 99.69% accuracy.
- (6) Less time-consuming diagnostic tests owing to feature reduction.
- (7) The number of features has been reduced by using the grey wolf optimization algorithm in the feature selection stage.
2. Materials
3. Methods
3.1. Image Preprocessing Module (IPM)
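
The preprocessing steps named in this paper are resizing, stretchlim, and adaptive thresholding. The following is a minimal sketch of the last two in plain Python, assuming a greyscale image stored as a list of rows. The function names, the 1% saturation fractions, and the local-mean rule are illustrative assumptions, not the authors' implementation, and resizing is omitted.

```python
def stretch_limits(image, low_frac=0.01, high_frac=0.99):
    """Return (low, high) grey levels that saturate the darkest and brightest pixels.

    Follows the idea of a stretchlim-style limit search; it is not a port of
    any particular toolbox function.
    """
    pixels = sorted(value for row in image for value in row)
    last = len(pixels) - 1
    return pixels[int(low_frac * last)], pixels[int(high_frac * last)]


def stretch_contrast(image, low, high):
    """Linearly map [low, high] to [0, 255], clipping values outside that range."""
    if high <= low:
        return [list(row) for row in image]
    scale = 255.0 / (high - low)
    return [[round(min(max(v - low, 0), high - low) * scale) for v in row]
            for row in image]


def adaptive_threshold(image, radius=1, offset=0):
    """Set a pixel to 1 when it exceeds its local neighbourhood mean minus offset."""
    height, width = len(image), len(image[0])
    mask = []
    for r in range(height):
        mask_row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            mask_row.append(1 if image[r][c] > local_mean - offset else 0)
        mask.append(mask_row)
    return mask


# Hypothetical 4 x 4 image with a bright 2 x 2 blob on a dark background.
image = [
    [50, 52, 51, 53],
    [52, 120, 118, 51],
    [51, 119, 121, 52],
    [53, 51, 52, 50],
]
low, high = stretch_limits(image)
stretched = stretch_contrast(image, low, high)
mask = adaptive_threshold(stretched, radius=1)
```

Because the threshold is computed per neighbourhood rather than once for the whole image, uneven illumination across a smear has less influence on which pixels are kept.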
3.2. Feature Extraction Module (FEM)
- (1) Energy: a measure of textural uniformity, computed as the sum of squared GLCM elements.
- (2) Contrast: a difference moment of the co-occurrence matrix that measures the amount of local intensity variation in an image.
- (3) Correlation: a measure of the linear dependency between the grey levels of neighbouring pixels.
- (4) Homogeneity: a value indicating how close the distribution of elements in the GLCM is to the GLCM diagonal.
- (5) Entropy: a measure of randomness or disorder in the image.
- (6) Mean: a measure of the image's brightness, calculated as the average pixel value inside the region of interest.
- (7) Standard deviation: a measure of how far pixel values spread around the mean, used for deciding what is normal, extra-large, or extra-small.
- (8) RMS: the root mean square of the pixel values.
- (9) Smoothness: a measure of grey-level contrast that can be used to derive descriptors of relative smoothness.
- (10) Kurtosis: a measure of how peaked the distribution of intensity values is around the mean.
- (11) Skewness: a measure of the absence of symmetry. A value of zero indicates that the intensity distribution is symmetric about the mean.
- (12) Variance: the average of the squared differences from the mean.
- (13) IDM (inverse difference moment): a measure of local homogeneity in a grey-level image; it is high when neighbouring grey levels are similar.
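
The descriptors above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the formulas are the common textbook definitions (the paper's exact formulas are not reproduced in this extract), the GLCM uses a single horizontal offset, and the function names and the 4-level test image are hypothetical.

```python
import math


def first_order_features(pixels):
    """Statistical descriptors of a flat list of grey levels from a region of interest."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "rms": math.sqrt(sum(p * p for p in pixels) / n),
        # Standardized third and fourth moments; set to 0.0 for a constant region.
        "skewness": sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std else 0.0,
        "kurtosis": sum((p - mean) ** 4 for p in pixels) / (n * std ** 4) if std else 0.0,
        # Relative smoothness R = 1 - 1 / (1 + variance); 0 for a constant region.
        "smoothness": 1.0 - 1.0 / (1.0 + variance),
        # Shannon entropy of the grey-level histogram, in bits.
        "entropy": -sum((c / n) * math.log2(c / n) for c in counts.values()),
    }


def glcm(image, levels, dr=0, dc=1):
    """Normalized, symmetric grey-level co-occurrence matrix for the offset (dr, dc)."""
    matrix = [[0.0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    total = 0
    for r in range(height):
        for c in range(width):
            rr, cc = r + dr, c + dc
            if 0 <= rr < height and 0 <= cc < width:
                i, j = image[r][c], image[rr][cc]
                matrix[i][j] += 1
                matrix[j][i] += 1
                total += 2
    return [[v / total for v in row] for row in matrix]


def glcm_features(p):
    """Texture descriptors computed from a normalized GLCM p."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Undefined when a marginal variance is zero; reported as 0.0 in that case.
        "correlation": covariance / math.sqrt(var_i * var_j) if var_i and var_j else 0.0,
    }


# Hypothetical 4-level image used only to exercise the functions.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
stats = first_order_features([v for row in image for v in row])
texture = glcm_features(glcm(image, levels=4))
```

In practice the image would first be quantized to a small number of grey levels, and the GLCM features are often averaged over several offsets; both choices are left out here for brevity.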
3.3. Feature Selection Module (FSM)
3.3.1. Grey Wolf Optimization Algorithm
- (1) Alpha wolves (α) are the leaders of the pack and make the hunting decisions. They are the pack's most dominant wolves because the rest of the pack must follow their decisions. The alphas do not have to be the strongest members of the pack, but they must be the best at managing it.
- (2) Beta wolves (β) occupy the second level of the hierarchy. A beta wolf advises the alpha and assists it in making decisions. If the alpha wolf dies or becomes too old, the beta wolf takes its place. Betas reinforce the alpha's commands and maintain discipline among the lower levels of the hierarchy.
- (3) Omega wolves (ω) occupy the lowest level of the hierarchy and serve as scapegoats. They must submit to all the more dominant wolves, and they are the last to eat.
- (4) Delta wolves (δ) are those that are neither alpha, beta, nor omega; they are known as subordinate wolves. Delta wolves report to alphas and betas, but they have authority over omega wolves [32].
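
In the optimizer, this hierarchy is usually mapped onto candidate solutions by fitness: the best solution is treated as alpha, the second as beta, the third as delta, and the rest as omegas. A minimal sketch, assuming lower fitness is better; `rank_pack`, `sphere`, and the example positions are hypothetical names and values chosen for illustration.

```python
def rank_pack(positions, fitness):
    """Split candidate solutions into alpha, beta, delta, and omegas by fitness."""
    ordered = sorted(positions, key=fitness)
    alpha, beta, delta = ordered[0], ordered[1], ordered[2]
    omegas = ordered[3:]
    return alpha, beta, delta, omegas


def sphere(x):
    """Toy objective: squared distance from the origin (smaller is better)."""
    return sum(v * v for v in x)


pack = [[3.0, 1.0], [0.5, -0.5], [2.0, 2.0], [-1.0, 0.0], [4.0, -3.0]]
alpha, beta, delta, omegas = rank_pack(pack, sphere)
```

The three leaders then guide how every wolf in the pack moves in the next iteration.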
3.3.2. Grey Wolf Optimization Mathematical Modeling
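
For reference, the update rules of the standard grey wolf optimizer, as commonly stated in the GWO literature, are given below; the paper's own equations are not reproduced in this extract, so the notation here is an assumption. $\vec{X}_p$ is the prey position, $\vec{X}$ a wolf position, $t$ the iteration, $\vec{r}_1$ and $\vec{r}_2$ random vectors in $[0, 1]$, and the components of $\vec{a}$ decrease linearly from 2 to 0 over the iterations.

```latex
\begin{align}
\vec{D} &= \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right| \\
\vec{X}(t+1) &= \vec{X}_p(t) - \vec{A} \cdot \vec{D} \\
\vec{A} &= 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \qquad \vec{C} = 2\vec{r}_2 \\
\vec{D}_\alpha &= \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|, \quad
\vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|, \quad
\vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right| \\
\vec{X}_1 &= \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad
\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad
\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \\
\vec{X}(t+1) &= \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}
\end{align}
```

Since the true optimum (the prey) is unknown, the alpha, beta, and delta positions stand in for it, and each wolf moves to the average of the three estimates. Values of $|\vec{A}| > 1$ favour exploration, and $|\vec{A}| < 1$ favour convergence on the leaders.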
3.4. Classification Module (CM)
3.4.1. Support Vector Machine (SVM)
3.4.2. Random Forest (RF)
3.4.3. K-Nearest Neighbor (KNN)
3.4.4. Naïve Bayes (NB)
4. Results
4.1. The Demographic Characteristics
- (A) The images were enhanced using stretchlim and adaptive thresholding.
- (B) Features were extracted using the feature extraction methods.
- (C) The number of features was reduced using the grey wolf optimization technique.
- (D) Acute lymphoblastic leukemia images were classified as benign or malignant using RF, SVM, KNN, and NB classifiers.
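
Step C can be sketched as a wrapper-style binary grey wolf optimizer that searches over 0/1 feature masks. This is a minimal sketch under loud assumptions: it uses one common binary variant (a sigmoid transfer applied to the averaged leader estimates), a nearest-centroid training error plus a small size penalty as the fitness, and synthetic data. None of these choices is confirmed as the authors' configuration, and all names are hypothetical.

```python
import math
import random


def binary_gwo(fitness, n_features, n_wolves=8, n_iters=30, seed=0):
    """Minimize fitness(mask) over 0/1 masks; returns (best_score, best_mask)."""
    rng = random.Random(seed)
    # One wolf keeps every feature, so the result is never worse than no selection.
    pack = [[1] * n_features]
    pack += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_wolves - 1)]
    leaders = []  # best three (score, mask) pairs seen so far: alpha, beta, delta
    for t in range(n_iters):
        scored = [(fitness(wolf), wolf[:]) for wolf in pack]
        leaders = sorted(leaders + scored, key=lambda pair: pair[0])[:3]
        a = 2.0 - 2.0 * t / n_iters  # decreases linearly from 2 towards 0
        for wolf in pack:
            for d in range(n_features):
                estimate = 0.0
                for _, leader in leaders:
                    big_a = 2.0 * a * rng.random() - a
                    big_c = 2.0 * rng.random()
                    distance = abs(big_c * leader[d] - wolf[d])
                    estimate += leader[d] - big_a * distance
                estimate /= len(leaders)
                # Sigmoid transfer turns the continuous estimate into a bit probability.
                probability = 1.0 / (1.0 + math.exp(-10.0 * (estimate - 0.5)))
                wolf[d] = 1 if rng.random() < probability else 0
    scored = [(fitness(wolf), wolf[:]) for wolf in pack]
    return min(leaders + scored, key=lambda pair: pair[0])


def make_fitness(samples, labels, penalty=0.01):
    """Assumed fitness: nearest-centroid training error plus a penalty on mask size."""
    def fitness(mask):
        kept = [d for d, bit in enumerate(mask) if bit]
        if not kept:
            return float("inf")  # an empty feature set is never acceptable
        centroids = {}
        for cls in set(labels):
            rows = [s for s, y in zip(samples, labels) if y == cls]
            centroids[cls] = [sum(r[d] for r in rows) / len(rows) for d in kept]
        errors = 0
        for s, y in zip(samples, labels):
            guess = min(
                centroids,
                key=lambda c: sum((s[d] - m) ** 2 for d, m in zip(kept, centroids[c])),
            )
            errors += guess != y
        return errors / len(samples) + penalty * len(kept) / len(mask)
    return fitness


# Synthetic data: feature 0 carries the class signal, features 1 to 4 are noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]
samples = [[2.0 * y + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(4)]
           for y in labels]
fitness = make_fitness(samples, labels)
best_score, best_mask = binary_gwo(fitness, n_features=5)
```

Using the classifier error as the fitness ties the selected subset to the downstream classifier, at the cost of one model evaluation per wolf per iteration; a held-out or cross-validated error would be the more careful choice than the training error used in this toy.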
4.2. Performance Measures
- “TP” (true positives): the number of infected cases predicted to be infected.
- “TN” (true negatives): the number of uninfected cases predicted to be uninfected.
- “FP” (false positives): the number of uninfected cases predicted to be infected.
- “FN” (false negatives): the number of infected cases predicted to be uninfected.
- Note: In the proposed methodology, the diseased (malignant) case represents the positive class (0 is benign, 1 is malignant).
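
These four counts can be tallied from true and predicted labels as in the following minimal sketch, which follows the note above in treating label 1 (malignant) as the positive class. The function name and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


# Hypothetical labels: 1 is malignant (positive), 0 is benign (negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
```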
5. Comparative Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Rezayi, S.; Mohammadzadeh, N.; Bouraghi, H.; Saeedi, S.; Mohammadpour, A. Timely Diagnosis of Acute Lymphoblastic Leukemia Using Artificial Intelligence-Oriented Deep Learning Methods. Comput. Intell. Neurosci. 2021, 2021, 5478157.
- [2] Kashef, A.; Khatibi, T.; Mehrvar, A. Treatment outcome classification of pediatric Acute Lymphoblastic Leukemia patients with clinical and medical data using machine learning: A case study at MAHAK hospital. Inform. Med. Unlocked 2020, 20, 100399.
- [3] Mondal, C.; Hasan, M.; Jawad, M.; Dutta, A.; Islam, M.; Awal, M.; Ahmad, M. Acute Lymphoblastic Leukemia Detection from Microscopic Images Using Weighted Ensemble of Convolutional Neural Networks. arXiv 2021, arXiv:2105.03995.
- [4] Shafique, S.; Tehsin, S.; Anas, S.; Masud, F. Computer-assisted Acute Lymphoblastic Leukemia detection and diagnosis. In Proceedings of the 2019 2nd International Conference on Communication, Computing and Digital systems (C-CODE), Islamabad, Pakistan, 6–7 March 2019; pp. 184–189.
- [5] Ghaderzadeh, M.; Aria, M.; Hosseini, A.; Asadi, F.; Bashash, D.; Abolghasemi, H. A fast and efficient CNN model for B-ALL diagnosis and its subtypes classification using peripheral blood smear images. Int. J. Intell. Syst. 2021, 37, 5113–5133.
- [6] Aftab, M.O.; Awan, M.J.; Khalid, S.; Javed, R.; Shabir, H. Executing Spark BigDL for Leukemia Detection from Microscopic Images using Transfer Learning. In Proceedings of the 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, 6–7 April 2021; pp. 216–220.
- [7] Kasani, P.H.; Park, S.-W.; Jang, J.-W. An Aggregated-Based Deep Learning Method for Leukemic B-lymphoblast Classification. Diagnostics 2020, 10, 1064.
- [8] Sahlol, A.T.; Kollmannsberger, P.; Ewees, A.A. Efficient Classification of White Blood Cell Leukemia with Improved Swarm Optimization of Deep Features. Sci. Rep. 2020, 10, 2536.
- [9] Kassani, S.H.; Kassani, P.H.; Wesolowski, M.J.; Schneider, K.A.; Deters, R. A hybrid deep learning architecture for leukemic B-lymphoblast classification. In Proceedings of the 2019 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Korea, 16–18 October 2019; pp. 271–276.
- [10] Ahmed, N.; Yigit, A.; Isik, Z.; Alpkocak, A. Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network. Diagnostics 2019, 9, 104.
- [11] Alagu, S. Automatic Detection of Acute Lymphoblastic Leukemia Using UNET Based Segmentation and Statistical Analysis of Fused Deep Features. Appl. Artif. Intell. 2021, 35, 1952–1969.
- [12] Kumar, D.; Jain, N.; Khurana, A.; Mittal, S.; Satapathy, S.C.; Senkerik, R.; Hemanth, J.D. Automatic Detection of White Blood Cancer from Bone Marrow Microscopic Images Using Convolutional Neural Networks. IEEE Access 2020, 8, 142521–142531.
- [13] Rawat, J.; Singh, A.; Bhadauria, H.S.; Virmani, J.; Devgun, J.S. Classification of acute lymphoblastic leukaemia using hybrid hierarchical classifiers. Multimedia Tools Appl. 2017, 76, 19057–19085.
- [14] Sarki, R.; Ahmed, K.; Wang, H.; Zhang, Y.; Ma, J.; Wang, K. Image Preprocessing in Classification and Identification of Diabetic Eye Diseases. Data Sci. Eng. 2021, 6, 455–471.
- [15] Shereena, V.B.; David, J.M. Content Based Image Retrieval: A Review. Comput. Sci. Inf. Technol. 2014, 65–77.
- [16] Ojala, T.; Rautiainen, M.; Matinmikko, E.; Aittola, M. Semantic image retrieval with HSV correlograms. In Proceedings of the Scandinavian conference on Image Analysis, Bergen, Norway, 11–14 June 2001; pp. 621–627.
- [17] Tigistu, T.; Abebe, G. Classification of rose flowers based on Fourier descriptors and color moments. Multimedia Tools Appl. 2021, 80, 36143–36157.
- [18] Damayanti, F.; Muntasa, A.; Herawati, S.; Yusuf, M.; Rachmad, A. Identification of Madura Tobacco Leaf Disease Using Gray-Level Co-Occurrence Matrix, Color Moments and Naïve Bayes. J. Phys. Conf. Ser. 2020, 1477, 052054.
- [19] Singh, R.; Goel, A.; Raghuvanshi, D.K. Computer-aided diagnostic network for brain tumor classification employing modulated Gabor filter banks. Vis. Comput. 2021, 37, 2157–2171.
- [20] Sultan, S.; Ghanim, M.F. Human Retina Based Identification System Using Gabor Filters and GDA Technique. J. Commun. Softw. Syst. 2020, 16, 243–253.
- [21] Starosolski, R. Hybrid Adaptive Lossless Image Compression Based on Discrete Wavelet Transform. Entropy 2020, 22, 751.
- [22] Osadchiy, A.; Kamenev, A.; Saharov, V.; Chernyi, S. Signal Processing Algorithm Based on Discrete Wavelet Transform. Designs 2021, 5, 41.
- [23] Iqbal, N.; Mumtaz, R.; Shafi, U.; Zaidi, S.M.H. Gray level co-occurrence matrix (GLCM) texture based crop classification using low altitude remote sensing platforms. PeerJ Comput. Sci. 2021, 7, e536.
- [24] Albregtsen, F. Statistical Texture Measures Computed from Gray Level Coocurrence Matrices; Image Processing Laboratory, Department of Informatics, University of Oslo: Oslo, Norway, 2008; Volume 5.
- [25] Hariprasath, S.; Dharani, T.; Santhi, M. Detection of acute lymphocytic leukemia using statistical features. In Proceedings of the 4th International Conference on Current Research in Engineering Science and Technology, Tamil Nadu, India, 8 March 2019.
- [26] Mutlag, W.K.; Ali, S.K.; Aydam, Z.M.; Taher, B.H. Feature Extraction Methods: A Review. J. Phys. Conf. Ser. 2020, 1591, 012028.
- [27] Al-Tashi, Q.; Rais, H.; Abdulkadir, S.J.; Mirjalili, S.; Alhussian, H. A Review of Grey Wolf Optimizer-Based Feature Selection Methods for Classification. In Evolutionary Machine Learning Techniques. Algorithms for Intelligent Systems; Mirjalili, S., Faris, H., Aljarah, I., Eds.; Springer: Singapore, 2020; pp. 273–286.
- [28] Abdollahzadeh, B.; Gharehchopogh, F.S. A multi-objective optimization algorithm for feature selection problems. Eng. Comput. 2021, 38, 1845–1863.
- [29] Hu, P.; Pan, J.-S.; Chu, S.-C. Improved Binary Grey Wolf Optimizer and Its application for feature selection. Knowl.-Based Syst. 2020, 195, 105746.
- [30] Hu, P.; Pan, J.-S.; Chu, S.-C.; Chai, Q.-W.; Liu, T.; Li, Z.-C. New Hybrid Algorithms for Prediction of Daily Load of Power Network. Appl. Sci. 2019, 9, 4514.
- [31] Raj, S.; Bhattacharyya, B. Optimal placement of TCSC and SVC for reactive power planning using Whale optimization algorithm. Swarm Evol. Comput. 2018, 40, 131–143.
- [32] Kumar, S.; Singh, M. Breast Cancer Detection Based on Feature Selection Using Enhanced Grey Wolf Optimizer and Support Vector Machine Algorithms. Viet. J. Comput. Sci. 2021, 8, 177–197.
- [33] Almazini, H.; Ku-Mahamud, K.R. Grey Wolf Optimization Parameter Control for Feature Selection in Anomaly Detection. Int. J. Intell. Eng. Syst. 2021, 14, 474–483.
- [34] Chawla, V.; Chanda, A.K.; Angra, S. The scheduling of automatic guided vehicles for the workload balancing and travel time minimization in the flexible manufacturing system by the nature-inspired algorithm. J. Proj. Manag. 2019, 4, 19–30.
- [35] Kitonyi, P.M.; Segera, D.R. Hybrid Gradient Descent Grey Wolf Optimizer for Optimal Feature Selection. BioMed Res. Int. 2021, 2021, 2555622.
- [36] Shiva, C.K.; Gudadappanavar, S.S.; Vedik, B.; Babu, R.; Raj, S.; Bhattacharyya, B. Fuzzy-Based Shunt VAR Source Placement and Sizing by Oppositional Crow Search Algorithm. J. Control. Autom. Electr. Syst. 2022, 33, 1576–1591.
- [37] Shekarappa, G.S.; Mahapatra, S.; Raj, S. Voltage constrained reactive power planning problem for reactive loading variation using hybrid harris hawk particle swarm optimizer. Electr. Power Compon. Syst. 2021, 49, 421–435.
- [38] Balaraman, S. Comparison of Classification Models for Breast Cancer Identification using Google Colab. Preprints 2020, 2020050328.
- [39] Sanlı, T.; Sıcakyüz, Ç.; Yüregir, O.H. Comparison of the accuracy of classification algorithms on three data-sets in data mining: Example of 20 classes. Int. J. Eng. Sci. Technol. 2020, 12, 81–89.
- [40] Amancio, D.R.; Comin, C.; Casanova, D.; Travieso, G.; Bruno, O.; Rodrigues, F.; Costa, L.D.F. A Systematic Comparison of Supervised Classifiers. PLoS ONE 2014, 9, e94137.
- [41] Bafjaish, S.S. Comparative Analysis of Naive Bayesian Techniques in Health-Related for Classification Task. J. Soft Comput. Data Min. 2020, 1, 1–10.
- [42] Bibi, N.; Sikandar, M.; Ud Din, I.; Almogren, A.; Ali, S. IoMT-based automated detection and classification of leukemia using deep learning. J. Healthc. Eng. 2020, 2020, 6648574.
- [43] Jiang, Z.; Dong, Z.; Wang, L.; Jiang, W. Method for Diagnosis of Acute Lymphoblastic Leukemia Based on ViT-CNN Ensemble Model. Comput. Intell. Neurosci. 2021, 2021, 7529893.

Class | No. of Samples | No. of Patients | Image Size |
---|---|---|---|
Benign | 504 | 25 | 224 × 224 |
Malignant early Pre-B | 918 | 20 | 224 × 224 |
Malignant Pre-B | 963 | 21 | 224 × 224 |
Malignant Pro-B | 804 | 23 | 224 × 224 |
Total | 3189 | 89 | |

Type | No. of Samples | No. of Patients | No. of Training Samples | No. of Testing Samples |
---|---|---|---|---|
Benign | 504 | 25 | 403 | 101 |
Malignant | 2685 | 64 | 2148 | 537 |
Total number | 3189 | 89 | 2551 | 638 |
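
The training and testing counts in this table are consistent with an 80/20 split applied separately to each class. The sketch below reproduces those counts with a stratified split. The 80/20 ratio is inferred from the table rather than stated in this extract, whether the split was made per image or per patient is likewise unknown, and the function name and seed are hypothetical.

```python
import random


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split sample indices into train and test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(train_frac * len(indices))  # rounds down within each class
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Class sizes taken from the table: 504 benign and 2685 malignant samples.
labels = ["benign"] * 504 + ["malignant"] * 2685
train_idx, test_idx = stratified_split(labels)
benign_train = sum(1 for i in train_idx if labels[i] == "benign")
```

Splitting per class keeps the benign-to-malignant ratio the same in both subsets, which matters here because the malignant class is more than five times larger.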

Metric | Equation | Description |
---|---|---|
Precision | TP/(TP + FP) | The proportion of cases predicted positive that are actually positive. |
Sensitivity (recall) | TP/(TP + FN) | The proportion of actual positive cases that are correctly predicted positive. |
Accuracy | (TP + TN)/(TP + TN + FP + FN) | The proportion of all cases whose class label is predicted correctly. |
Error (ER) | 1 − Accuracy | The proportion of incorrect classifications. |
F1-score | 2 × (Precision × Recall)/(Precision + Recall) | The harmonic mean of precision and recall, combining both properties into a single score. |
Specificity | TN/(TN + FP) | The proportion of people without illness who are correctly identified. |
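
The formulas in this table translate directly into code. A minimal sketch, assuming all denominators are non-zero; the function name and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the tabulated metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts used only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=10, fn=20)
```

With these counts, precision is 90/100, sensitivity is 90/110, and specificity is 80/90, which shows how the three metrics can diverge on the same predictions.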

Metric | RF | | SVM | | KNN | | NB | |
---|---|---|---|---|---|---|---|---|
Class | Benign | Malignant | Benign | Malignant | Benign | Malignant | Benign | Malignant |
Precision | 99% | 99% | 93% | 96% | 88% | 96% | 74% | 91% |
AVG of precision | 99% | | 94.5% | | 92% | | 82.5% | |
Recall | 92% | 100% | 80% | 99% | 79% | 98% | 49% | 97% |
AVG of recall | 96% | | 89.5% | | 88.5% | | 73% | |
F1-score | 95% | 99% | 86% | 98% | 83% | 97% | 59% | 94% |
AVG of F1-score | 97% | | 92% | | 90% | | 76.5% | |
Accuracy | 99% | | 96% | | 95% | | 89% | |

Metric | RF | | SVM | | KNN | | NB | |
---|---|---|---|---|---|---|---|---|
Class | Benign | Malignant | Benign | Malignant | Benign | Malignant | Benign | Malignant |
Precision | 99% | 100% | 96% | 99% | 96% | 99% | 80% | 96% |
AVG of precision | 99.5% | | 97.5% | | 97.5% | | 88% | |
Recall | 99% | 100% | 93% | 99% | 95% | 99% | 76% | 96% |
AVG of recall | 99.5% | | 96% | | 97% | | 86% | |
F1-score | 99% | 100% | 94% | 99% | 96% | 99% | 78% | 96% |
AVG of F1-score | 99.5% | | 96.5% | | 97.5% | | 87% | |
Accuracy | 99.69% | | 98.75% | | 98.59% | | 92.79% | |
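
The AVG rows in this table match the unweighted (macro) mean of the benign and malignant values, not a mean weighted by class size. A small sketch of that arithmetic, using two rows from the table; the function name is hypothetical.

```python
def macro_average(per_class_values):
    """Unweighted mean of per-class scores, each class counting equally."""
    return sum(per_class_values) / len(per_class_values)


rf_precision_avg = macro_average([99, 100])  # RF precision: benign, malignant
nb_recall_avg = macro_average([76, 96])      # NB recall: benign, malignant
```

Because the malignant class is much larger than the benign class, a weighted average would sit closer to the malignant value; the macro mean gives the smaller benign class equal influence.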

Author | Dataset | Algorithm | Accuracy | Precision | Sensitivity | F1-Score | Specificity |
---|---|---|---|---|---|---|---|
Sorayya et al. [1] | 12,528 images | VGG-16 | 84.6% | 85% | 83.5% | 84% | |
Sorayya et al. [1] | 12,528 images | ResNet-50 | 81.6% | 84% | 82.5% | 83.5% | |
Sorayya et al. [1] | 12,528 images | KNN | 77.9% | 75% | 75% | 75% | |
Sorayya et al. [1] | 12,528 images | Proposed CNN | 82.1% | 83.5% | 81% | 82% | |
Amirarash et al. [2] | ALL, 241 patients | SVM | 94.9% | 90.2% | 86.2% | 88.2% | |
Amirarash et al. [2] | ALL, 241 patients | RF | 90.9% | 79.6% | 92.2% | 85.4% | |
Sarmad et al. [4] | ALL-IDB 1, 108 images | SVM | 93.7% | | | | |
Payam et al. [7] | ISBT2019 | DCNN | 96.6% | 96.9% | 91.8% | 94.7% | |
Sara et al. [9] | ISBI | DCNN | 96.17% | 95.2% | 98.6% | | |
Nizar et al. [10] | ALL-DB | CNN | 88.3% | | | | |
Nizar et al. [10] | ALL-DB | SVM | 50.1% | | | | |
Nizar et al. [10] | ALL-DB | NB | 69.7% | | | | |
S Alagu et al. [11] | ALL-IDB2 | AlexNet+GoogleNet+SqueezeNet+SVM | 98.2% | 99.2% | 98.1% | 96.3% | 99.1% |
The proposed methodology | 3189 images overall | RF | 99.7% | 99.5% | 99.5% | 99.5% | 99% |
The proposed methodology | 3189 images overall | SVM | 98.8% | 97.5% | 96% | 96.5% | 93.1% |
The proposed methodology | 3189 images overall | KNN | 98.6% | 97.5% | 97% | 97.5% | 95.1% |
The proposed methodology | 3189 images overall | NB | 92.8% | 88% | 86% | 87% | 76.2% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sallam, N.M.; Saleh, A.I.; Arafat Ali, H.; Abdelsalam, M.M. An Efficient Strategy for Blood Diseases Detection Based on Grey Wolf Optimization as Feature Selection and Machine Learning Techniques. Appl. Sci. 2022, 12, 10760. https://doi.org/10.3390/app122110760