Analysis of Colorectal and Gastric Cancer Classification: A Mathematical Insight Utilizing Traditional Machine Learning Classifiers
Abstract
1. Introduction
Literature Review
- Mathematical Formulations to Augment Cognizance: Presenting the mathematical formulations of the most frequently used preprocessing techniques, features, machine learning classifiers, and assessment metrics.
- Mathematical Deconstruction of ML Classifiers: Examining the mathematics underlying the machine learning classifiers commonly used in cancer detection.
- Colorectal and Gastric Cancer Detection: Focusing the analysis on colorectal and gastric cancer detection, with a detailed examination of the methodologies and techniques used to diagnose and localize these cancer types.
- Preprocessing Techniques and Their Formulation: Examining preprocessing techniques and their role in improving the quality and accuracy of the models used in cancer detection.
- Feature Extraction Strategies and Informative Features: Reviewing feature extraction techniques and counting the number of features used in the research articles.
- A Multidimensional Metrics Analysis: Examining a range of performance evaluation metrics: accuracy, sensitivity, specificity, precision, negative predictive value, F-measure (F1), area under the curve, and the Matthews correlation coefficient (MCC).
- Evaluation Parameters for Research Articles: Systematically analyzing parameters including publication year, preprocessing techniques, features, classification techniques, image count, imaging modality, dataset details, and reported metrics (%).
- Prominent Techniques and Their Effectiveness: Identifying the techniques most often used by researchers in cancer detection and determining which of them are most effective.
- Key Insights and Ongoing Challenges: Highlighting key insights from the reviewed research papers, including advances, notable findings, and challenges in cancer detection using traditional machine learning techniques.
- Architectural Design of Proposed Methodology: Presenting an architectural blueprint derived from the reviewed literature, intended as a guide for improving cancer detection models.
- Recognizing Opportunities for Improvement: Comparing metrics across studies by examining their highest and lowest values and the gap between them, in order to identify areas with potential for improvement in cancer detection.
2. Materials and Methods
2.1. Literature Selection Process
2.1.1. Inclusion Criteria
2.1.2. Exclusion Criteria
2.2. Medical Imaging Datasets
2.3. Preprocessing
2.4. Feature Engineering
2.4.1. Histogram-Based First-Order Features (FOFs)
2.4.2. Gray-Level Co-Occurrence Matrix (GLCM) Features
2.4.3. Gray-Level Run Length Matrix (GLRLM)
2.4.4. Neighborhood Gray-Tone Difference Matrix (NGTDM)
2.5. Traditional Machine Learning Classifiers
2.5.1. K-Nearest Neighbors (KNN)
2.5.2. Multilayered Perceptron (MLP)
2.5.3. Support Vector Machine (SVM)
2.5.4. Bayes and Naive Bayes (NB) Classifier
2.5.5. Logistic Regression (LR)
2.5.6. Decision Tree (DT)
2.5.7. Ensemble Classifier (EC)
Algorithm 1: Pseudo-code for AnyBoost |
Input: l, α, {(xi, yi)}, A H0 = 0 for t = 0: T − 1 do if then else return Ht (Negative gradient orthogonal to descent direction.) end end return HT |
Algorithm 2: Pseudo-code for GBRT |
Input: l, α, {(xi, yi)}, A H = 0 for t = 1: T do end return H |
Algorithm 3: Pseudo-code for AdaBoost |
Input: l, α, {(xi, yi)}, A H = 0 for t = 1: T do if then else return (Ht) end return H end |
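As a rough illustration of the boosting loop outlined in Algorithm 3, the following Python sketch implements the standard discrete AdaBoost update with decision stumps standing in for the weak learner A. The stump learner, the ±1 label encoding, the closed-form step size, and all function names are assumptions made for illustration; they are not taken from the reviewed studies.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Return +1 or -1 for one sample x (a sequence of feature values)."""
    return 1 if polarity * (x[feature] - threshold) >= 0 else -1

def fit_stump(X, y, w):
    """Weak learner (assumed): exhaustive search for the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, T=10):
    """Fit up to T weighted stumps; labels in y must be +1 or -1."""
    n = len(y)
    w = [1.0 / n] * n                      # uniform initial sample weights
    ensemble = []
    for _ in range(T):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        if err >= 0.5:                     # weak learner no better than chance: stop
            break
        err = max(err, 1e-12)              # guard against log(1/0) on a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(x, feature, threshold, polarity))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data (hypothetical) that no single stump can separate.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, T=3)
print([predict(ensemble, x) for x in X])
```

The early exit when the weighted error reaches 0.5 corresponds to the `if ... else return` branch of the pseudo-code: once the weak learner is no better than chance, adding it cannot reduce the loss.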
2.6. Assessment Metrics
3. Review Analysis
3.1. Analysis of Colorectal Cancer Prediction
Year | References | Preprocessing | Features | Techniques | Dataset | Data Samples | Train Data | Test Data | Modality | Metrics (%) |
---|---|---|---|---|---|---|---|---|---|---|
2017 | [53] | Endocytoscopy | Texture, nuclei | SVM | Private | 5843 | 5643 | 200 | ENI | Acc 94.1 Sen 89.4 Spe 98.9 Pre 98.8 NPV 90.1 |
2019 | [54] | IPP | CSQ, Color histogram | WSVMCS | Private | 180 | 108 | 72 | H&E | Acc 96.0 |
2019 | [55] | Cropping | Biophysical characteristic, WLD | NB, MLP | OMIS data | 316 | 237 | 79 | OMIS | Acc 92.6 Sen 96.3 Spe 88.9 |
2021 | [56] | Filtering | HOS, FOS, GLCM, Gabor, WPT, LBP | ANN, RSVM | KCRC-16 | 5000 | 4550 | 450 | H&E | Acc 95.3 |
2021 | [57] | IPP, Augmentation | VGG-16 | MLP | KCRC-16 | 5000 | 4825 | 175 | H&E | Acc 99.0 Sen 96.0 Spe 99.0 Pre 96.0 NPV 99.0 F1 96.0 |
2021 | [50] | --- | AlexNet | EC, SVM, AlexNet | LC25000 | 10,000 | 4-fold cross-validation | | H&E | Acc 99.4 |
2021 | [58] | THN, DRR | BmzP | NN | MALDI MSI | 559 | Leave-One-Out cross-validation | | H&E | Acc 98.0 Sen 98.2 Spe 98.6 |
2021 | [52] | Filtering | Filters, Texture, GLHS, Shape | RF | Private | 287 | 169 | 77 | CT | Acc 84.7 * Sen 82.0 Spe 85.0 AUC 91.0 |
2021 | [49] | --- | GFD, NSCT, Shape | MLP, LSSVM | Private | 734 | 5-fold cross-validation | | NBI, WLI | Acc 95.7 Sen 95.3 Spe 95.0 Pre 93.2 F1 90.5 |
2021 | [48] | Normalization, smoothing | Spatial Information | MLP, SVM, RF | Private | 54 | Leave-One-Out cross-validation | | HSI | Acc 94.0 Sen 86.0 Spe 95.0 |
2022 | [59] | VTI | Haralick, VTF | RF | Private | 63 | Cross-validation | | CT | Acc 92.2 Sen 88.4 Spe 96.0 AUC 96.2 |
2022 | [60] | RGBG | GLCM | ANN, RF, KNN | KCRC-16 | 5000 | 4500 | 500 | H&E | Acc 98.7 Sen 98.6 Spe 99.0 Pre 98.9 |
2022 | [45] | Resize, BGR2RGB, Normalization | Deep Features | EC, Hybrid, LR, LGB, MLP, RF, SVM, XGB, Voting | LC25000 | 2800 | 10-fold cross-validation | | H&E | Acc 100.0 |
2022 | [46] | ROI | FOS, GLCM, GLDM, GLRLM, GLSZM, LoG, NGTDM, Shape, WT | MLR | Private | 276 | 194 | 82 | CECT | Acc 76.0 Sen 65.0 Spe 80.0 Pre 54.0 NPV 86.0 |
2022 | [61] | UM-SN | HIM, GLCM, Statistical | LDA, MLP, RF, SVM, XGB, LGB | LC25000 | 1000 | 900 | 100 | H&E | Acc 99.3 Sen 99.5 Pre 99.5 F1 99.5 |
2022 | [26] | --- | Color Spaces, Haralick | ANN, DT, KNN, QDA, SVM | KCRC-16 | 5000 | 3504 | 1496 | H&E | Acc 97.3 Sen 97.3 Spe 99.6 Pre 97.4 |
2023 | [62] | Filtering, linear transformation, normalization | Color characteristic, DBCM, SMOTE | CatBoost, DT, GNB, KNN, RF | NCT-CRCHE-7K | 12,042 | 8429 | 3613 | H&E | Acc 90.7 Sen 97.6 Spe 97.4 Pre 90.6 Rec 90.5 F1 90.5 |
2023 | [51] | --- | Clinical, FEViT | SEKNN | Private | 1729 | 10-fold cross-validation | | ENI | Acc 94.0 Sen 74.0 Spe 98.0 AUC 93.0 |
2023 | [47] | Lightness space, RGB to HSV | dResNet | DSVM | KCRC-16 | 5000 | 4000 | 1000 | H&E | Acc 98.8 |
2023 | [47] | Lightness space, RGB to HSV | dResNet | DSVM | NCT-CRC-HE-100K | 100,000 | 80,003 | 19,997 | H&E | Acc 99.8 |
2023 | [63] | HOG, RGBG, Resizing | Morphological | SVM | Private | 540 | 420 | 120 | ENI | Acc 97.5 |
3.2. Analysis of Gastric Cancer Prediction
Year | References | Preprocessing | Features | Techniques | Dataset | Data Samples | Train Data | Test Data | Modality | Metrics (%) |
---|---|---|---|---|---|---|---|---|---|---|
2018 | [71] | Fourier transform | BRISK, SURF, MSER | DT, DA | Private | 180 | 90 | 90 | H&E | Acc 86.7 |
2018 | [72] | Resizing | LBP, HOG | ANN, RF | Private | 180 | 90 | 90 | H&E | Acc 100.0 |
2018 | [68] | --- | SURF, DFT | NB | Private | 180 | 90 | 90 | H&E | Acc 87.8 |
2018 | [68] | --- | SURF, DFT | NB | Private | 720 | 360 | 360 | H&E | Acc 90.3 |
2018 | [73] | CEI, filtering, resizing | GLCM | SVM | Private | 207 | 126 | 81 | NBI | Acc 96.3 Sen 96.7 Spe 95.0 Pre 98.3 |
2019 | [74] | Resizing, cropping | GLCM, Shape, FOF, GLSZM | SVM | Private | 490 | 326 | 164 | CT | Acc 71.3 Sen 72.6 Spe 68.1 Pre 82.0 NPV 50.0 |
2021 | [67] | DIFQ | SMI | FCM, KMC | Private | 30 | --- | --- | MRI | Acc 85.0 |
2021 | [75] | Resizing | Extract HOG | RF, MLP | Private | 180 | 90 | 90 | H&E | Acc 98.1 |
2021 | [76] | Resizing | TSS | BP, BPSVM, SVM | Private | 78 | --- | --- | MRI | Acc 94.6 |
2021 | [69] | --- | Deep Features | CSVM, Bagged Trees, KNNs, SVMs | Private | 4000 | 2800 | 1200 | WCE | Acc 99.8 Sen 99.0 Pre 99.3 F1 99.1 AUC 100 |
2021 | [65] | Filtering, ROI | LoG, WT, GLDM, GLRLM | GBM, DT, RF, LR, SVM | Private | 159 | Leave-One-Out cross-validation | | CT | Acc 71.2 Sen 43.1 Spe 87.1 Pre 65.8 |
2022 | [77] | Augmentation, resizing, filtering | InceptionNet, VGGNet | SVM, RF, KNN | HKD | 10,662 (47,398 augmented) | 37,788 | 9610 | Endoscopy | Acc 98.0 Sen 100 Pre 100 F1 100 MCC 97.8 |
2022 | [70] | --- | GLCM, LBP, HOG, histogram, luminance, Color histogram | NSVM, LSVM, LR, NB, RF, ANN, KNN | GasHisSDB | 245,196 | 196,157 | 49,039 | H&E | Acc 85.2 Sen 84.9 # Pre 84.6 # Spe 84.9 # F1 84.8 # |
2022 | [64] | Binarization, CEI, filtering, resizing | VGG19, AlexNet | Bagged Tree, Coarse Tree, CSVM, CKNN, DT, Fine Tree, KNN, NB | Private | 2590 | 10-fold cross-validation | | EUS | Acc 99.8 Sen 99.8 Pre 99.8 F1 99.8 AUC 100 |
2022 | [66] | Cropping, disruption, filtering, ROI, Rotation | Color histogram, GLCM, LBP | LSVM, RF | GasHisSDB | 245,196 | 196,157 | 49,039 | H&E | Acc 85.9 Sen 86.2 # Spe 86.2 # Pre 85.7 # F1 85.9 # |
2023 | [78] | Augmentation, CEI | MobileNet-V2 | Bayesian, CSVM, LSVM, QSVM, Softmax | KV2D | 4854 | 10-fold cross-validation | | Endoscopy | Acc 96.4 Pre 97.6 Sen 93.0 F1 95.2 |
2023 | [79] | RSA | RSF | PLS-DA, LOO, SVM | Private | 450 | Leave-One-Out cross-validation | | H&E | Acc 94.8 Sen 91.0 Spe 100 AUC 95.8 |
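The spread of reported accuracies in the two tables above can be checked with a few lines of Python. The values below are transcribed from the Acc entries of the Metrics (%) column, one per table row; the function name `accuracy_range` is a hypothetical helper, and the sketch only summarizes the reported figures without saying anything about how comparable the underlying experiments are.

```python
# Accuracy values (%) transcribed from the colorectal and gastric tables above.
colorectal_acc = [94.1, 96.0, 92.6, 95.3, 99.0, 99.4, 98.0, 84.7, 95.7, 94.0, 92.2,
                  98.7, 100.0, 76.0, 99.3, 97.3, 90.7, 94.0, 98.8, 99.8, 97.5]
gastric_acc = [86.7, 100.0, 87.8, 90.3, 96.3, 71.3, 85.0, 98.1, 94.6, 99.8, 71.2,
               98.0, 85.2, 99.8, 85.9, 96.4, 94.8]

def accuracy_range(values):
    """Return (lowest, highest, gap) for a list of accuracy percentages."""
    lowest, highest = min(values), max(values)
    return lowest, highest, round(highest - lowest, 1)

for name, values in (("colorectal", colorectal_acc), ("gastric", gastric_acc)):
    lowest, highest, gap = accuracy_range(values)
    print(f"{name}: min {lowest}%, max {highest}%, gap {gap} points")
```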
4. Proposed Methodology
4.1. Detection of Colorectal Cancer
4.2. Detection of Gastric Cancer
4.3. Key Observations
- Dataset Diversity: The evaluation covers colorectal and gastric cancer datasets ranging from 30 to 245,196 samples. Across these varied dataset sizes, machine learning classifiers remain effective when appropriately tuned.
- Exceptional Model Performances: The best-performing models report 100% accuracy for both colorectal and gastric cancer, and individual studies report perfect scores in other key metrics such as sensitivity, specificity, precision, and F1-score, showing the potential of traditional ML classifiers with well-chosen parameters.
- Preprocessing Techniques: Researchers employ various preprocessing techniques, including image filtering, denoising, wavelet transforms, RGB-to-gray conversion, normalization, cropping (ROI), sampling, and binarization, to optimize model performance and minimize biases during data manipulation.
- Literature Review Significance: This analysis spans 36 literature sources related to colorectal and gastric cancer, underscoring the significant interest in cancer detection through traditional ML classifiers. Researchers have explored an extensive range of cancer types, diverse evaluation metrics, and datasets, collectively advancing the field.
- Dominant Traditional ML Techniques: SVM is a commonly used traditional ML classifier in cancer detection tasks, emphasizing the need to understand each classifier’s strengths and limitations for optimal selection.
- Insightful Dataset and Feature Analysis: Reviewed studies predominantly utilized benchmark medical image datasets, with researchers employing feature extraction techniques like GLCM for informative feature extraction in cancer detection.
- Prudent Model Architecture Design: Optimal results in cancer detection require thoughtful and optimized model architectures, which can enhance accuracy, generalizability, and interpretability, addressing challenges in medical image analysis.
4.4. Key Challenges and Future Scope
- Variability in Accuracy: Traditional ML classifiers exhibit variable accuracy across the reviewed studies, ranging from 76.0% to 100% for colorectal cancer and from 71.2% to 100% for gastric cancer. Reducing this variation is a challenge that calls for better models. Future research should prioritize refining models for consistent and accurate performance across diverse cancer types.
- Metric Disparities: Metric variations, especially in sensitivity (43.1% to 100%) for gastric cancer, suggest potential data imbalance challenges. Addressing these issues is crucial for accurate model assessments. Future research should focus on developing strategies to handle imbalanced data and improve model robustness.
- Preprocessing Challenges: Balancing raw and preprocessed data is crucial to ensure input data quality and reliability, contributing to robust cancer detection model performance. Future research should explore advanced preprocessing techniques and optimization methods to further enhance model robustness.
- Limited Use of Evaluation Metrics: Limited use of metrics like NPV, AUC, and MCC in the reviewed literature highlights the challenge of comprehensive model assessment. Addressing this limitation and exploring a broader range of metrics is crucial for future research to enhance understanding and effectiveness in cancer detection tasks.
- Generalizing to Novel Cancer Types: The literature primarily focuses on colorectal and gastric cancers, posing a challenge for extending traditional ML classifiers to less-explored cancer types. Future research should aim to develop versatile ML models with robust feature extraction techniques to adapt to diverse cancer types and domains.
- Addressing Overfitting and Model Selection: The diversity in ML classifiers poses challenges in model selection for specific cancers, emphasizing the need for careful evaluation to avoid overfitting. Future research should focus on refining model selection strategies to enhance the robustness of cancer detection techniques and improve diagnostic accuracy.
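To make the points on metric disparities and limited metric reporting concrete, the sketch below computes the metrics named in this review from the four cells of a binary confusion matrix, using their standard definitions. The function names and the example counts are hypothetical and do not come from any reviewed study; the example is chosen only to show how accuracy can stay high on imbalanced data while sensitivity and MCC remain low.

```python
import math

def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from true/false positive/negative counts."""
    sensitivity = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical imbalanced test set: 10 positive and 90 negative samples.
metrics = classification_metrics(tp=5, fp=2, tn=88, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case the accuracy is 0.93 while the sensitivity is only 0.50, which is why reporting NPV, F1, and MCC alongside accuracy gives a more complete picture.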
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Faguet, G.B. A brief history of cancer: Age-old milestones underlying our current knowledge database. Int. J. Cancer 2014, 136, 2022–2036.
- Afrash, M.R.; Shafiee, M.; Kazemi-Arpanahi, H. Establishing machine learning models to predict the early risk of gastric cancer based on lifestyle factors. BMC Gastroenterol. 2023, 23, 6.
- Kumar, Y.; Gupta, S.; Singla, R.; Hu, Y.-C. A systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Arch. Comput. Methods Eng. 2021, 29, 2043–2070.
- Nguon, L.S.; Seo, K.; Lim, J.-H.; Song, T.-J.; Cho, S.-H.; Park, J.-S.; Park, S. Deep learning-based differentiation between mucinous cystic neoplasm and serous cystic neoplasm in the pancreas using endoscopic ultrasonography. Diagnostics 2021, 11, 1052.
- Kim, S.H.; Hong, S.J. Current status of image-enhanced endoscopy for early identification of esophageal neoplasms. Clin. Endosc. 2021, 54, 464–476.
- NCI. What Is Cancer?—NCI. National Cancer Institute. Available online: https://www.cancer.gov/about-cancer/understanding/what-is-cancer (accessed on 9 June 2023).
- Zhi, J.; Sun, J.; Wang, Z.; Ding, W. Support vector machine classifier for prediction of the metastasis of colorectal cancer. Int. J. Mol. Med. 2018, 41, 1419–1426.
- Zhou, H.; Dong, D.; Chen, B.; Fang, M.; Cheng, Y.; Gan, Y.; Zhang, R.; Zhang, L.; Zang, Y.; Liu, Z.; et al. Diagnosis of Distant Metastasis of Lung Cancer: Based on Clinical and Radiomic Features. Transl. Oncol. 2017, 11, 31–36.
- Levine, A.B.; Schlosser, C.; Grewal, J.; Coope, R.; Jones, S.J.; Yip, S. Rise of the Machines: Advances in Deep Learning for Cancer Diagnosis. Trends Cancer 2019, 5, 157–169.
- Huang, S.; Yang, J.; Fong, S.; Zhao, Q. Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges. Cancer Lett. 2019, 471, 61–71.
- Saba, T. Recent advancement in cancer detection using machine learning: Systematic survey of decades, comparisons and challenges. J. Infect. Public Health 2020, 13, 1274–1289.
- Shah, B.; Alsadoon, A.; Prasad, P.; Al-Naymat, G.; Beg, A. DPV: A taxonomy for utilizing deep learning as a prediction technique for various types of cancers detection. Multimed. Tools Appl. 2021, 80, 21339–21361.
- Majumder, A.; Sen, D. Artificial intelligence in cancer diagnostics and therapy: Current perspectives. Indian J. Cancer 2021, 58, 481–492.
- Bin Tufail, A.; Ma, Y.-K.; Kaabar, M.K.A.; Martínez, F.; Junejo, A.R.; Ullah, I.; Khan, R. Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions. Comput. Math. Methods Med. 2021, 2021, 9025470.
- Kumar, G.; Alqahtani, H. Deep Learning-Based Cancer Detection-Recent Developments, Trend and Challenges. Comput. Model. Eng. Sci. 2022, 130, 1271–1307.
- Painuli, D.; Bhardwaj, S.; Köse, U. Recent advancement in cancer diagnosis using machine learning and deep learning techniques: A comprehensive review. Comput. Biol. Med. 2022, 146, 105580.
- Rai, H.M. Cancer detection and segmentation using machine learning and deep learning techniques: A review. Multimed. Tools Appl. 2023, 1–35.
- Maurya, S.; Tiwari, S.; Mothukuri, M.C.; Tangeda, C.M.; Nandigam, R.N.S.; Addagiri, D.C. A review on recent developments in cancer detection using Machine Learning and Deep Learning models. Biomed. Signal Process. Control. 2023, 80, 104398.
- Mokoatle, M.; Marivate, V.; Mapiye, D.; Bornman, R.; Hayes, V.M. A review and comparative study of cancer detection using machine learning: SBERT and SimCSE application. BMC Bioinform. 2023, 24, 112.
- Rai, H.M.; Yoo, J. A comprehensive analysis of recent advancements in cancer detection using machine learning and deep learning models for improved diagnostics. J. Cancer Res. Clin. Oncol. 2023, 149, 14365–14408.
- Ullah, A.; Chen, W.; Khan, M.A. A new variational approach for restoring images with multiplicative noise. Comput. Math. Appl. 2016, 71, 2034–2050.
- Azmi, K.Z.M.; Ghani, A.S.A.; Yusof, Z.M.; Ibrahim, Z. Natural-based underwater image color enhancement through fusion of swarm-intelligence algorithm. Appl. Soft Comput. 2019, 85, 105810.
- Alruwaili, M.; Gupta, L. A statistical adaptive algorithm for dust image enhancement and restoration. In Proceedings of the 2015 IEEE International Conference on Electro/Information Technology (EIT), Dekalb, IL, USA, 21–23 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 286–289.
- Cai, J.-H.; He, Y.; Zhong, X.-L.; Lei, H.; Wang, F.; Luo, G.-H.; Zhao, H.; Liu, J.-C. Magnetic Resonance Texture Analysis in Alzheimer’s disease. Acad. Radiol. 2020, 27, 1774–1783.
- Chandrasekhara, S.P.R.; Kabadi, M.G.; Srivinay, S. Wearable IoT based diagnosis of prostate cancer using GLCM-multiclass SVM and SIFT-multiclass SVM feature extraction strategies. Int. J. Pervasive Comput. Commun. 2021. ahead-of-print.
- Alqudah, A.M.; Alqudah, A. Improving machine learning recognition of colorectal cancer using 3D GLCM applied to different color spaces. Multimed. Tools Appl. 2022, 81, 10839–10860.
- Vallabhaneni, R.B.; Rajesh, V. Brain tumour detection using mean shift clustering and GLCM features with edge adaptive total variation denoising technique. Alex. Eng. J. 2018, 57, 2387–2392.
- Rego, C.H.Q.; França-Silva, F.; Gomes-Junior, F.G.; de Moraes, M.H.D.; de Medeiros, A.D.; da Silva, C.B. Using Multispectral Imaging for Detecting Seed-Borne Fungi in Cowpea. Agriculture 2020, 10, 361.
- Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
- Callen, J.L.; Segal, D. An Analytical and Empirical Measure of the Degree of Conditional Conservatism. J. Account. Audit. Financ. 2013, 28, 215–242.
- Weinberger, K. Lecture 2: K-Nearest Neighbors. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote02_kNN.html (accessed on 12 November 2023).
- Weinberger, K. Lecture 3: The Perceptron. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote03.html (accessed on 12 November 2023).
- Watt, J.; Borhani, R.; Katsaggelos, A.K. Machine Learning Refined; Cambridge University Press (CUP): Cambridge, UK, 2020; ISBN 9781107123526.
- Watt, R.B.J. 13.1 Multi-Layer Perceptrons (MLPs). Available online: https://kenndanielso.github.io/mlrefined/blog_posts/13_Multilayer_perceptrons/13_1_Multi_layer_perceptrons.html (accessed on 12 November 2023).
- Weinberger, K. Lecture 9: SVM. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote09.html (accessed on 13 November 2023).
- Balas, V.E.; Mastorakis, N.E.; Popescu, M.-C.; Balas, V.E. Multilayer Perceptron and Neural Networks. 2009. Available online: https://www.researchgate.net/publication/228340819 (accessed on 18 September 2023).
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
- Islam, U.; Al-Atawi, A.; Alwageed, H.S.; Ahsan, M.; Awwad, F.A.; Abonazel, M.R. Real-Time Detection Schemes for Memory DoS (M-DoS) Attacks on Cloud Computing Applications. IEEE Access 2023, 11, 74641–74656.
- Houshmand, M.; Hosseini-Khayat, S.; Wilde, M.M. Minimal-Memory, Noncatastrophic, Polynomial-Depth Quantum Convolutional Encoders. IEEE Trans. Inf. Theory 2012, 59, 1198–1210.
- Bagging. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote18.html (accessed on 13 November 2023).
- Boosting. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote19.html (accessed on 13 November 2023).
- Dewangan, S.; Rao, R.S.; Mishra, A.; Gupta, M. Code Smell Detection Using Ensemble Machine Learning Algorithms. Appl. Sci. 2022, 12, 10321.
- Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2018, 17, 168–192.
- Leem, S.; Oh, J.; So, D.; Moon, J. Towards Data-Driven Decision-Making in the Korean Film Industry: An XAI Model for Box Office Analysis Using Dimension Reduction, Clustering, and Classification. Entropy 2023, 25, 571.
- Talukder, A.; Islam, M.; Uddin, A.; Akhter, A.; Hasan, K.F.; Moni, M.A. Machine learning-based lung and colon cancer detection using deep feature extraction and ensemble learning. Expert Syst. Appl. 2022, 205, 117695.
- Ying, M.; Pan, J.; Lu, G.; Zhou, S.; Fu, J.; Wang, Q.; Wang, L.; Hu, B.; Wei, Y.; Shen, J. Development and validation of a radiomics-based nomogram for the preoperative prediction of microsatellite instability in colorectal cancer. BMC Cancer 2022, 22, 524.
- Fadafen, M.K.; Rezaee, K. Ensemble-based multi-tissue classification approach of colorectal cancer histology images using a novel hybrid deep learning framework. Sci. Rep. 2023, 13, 8823.
- Jansen-Winkeln, B.; Barberio, M.; Chalopin, C.; Schierle, K.; Diana, M.; Köhler, H.; Gockel, I.; Maktabi, M. Feedforward artificial neural network-based colorectal cancer detection using hyperspectral imaging: A step towards automatic optical biopsy. Cancers 2021, 13, 967.
- Bora, K.; Bhuyan, M.K.; Kasugai, K.; Mallik, S.; Zhao, Z. Computational learning of features for automated colonic polyp classification. Sci. Rep. 2021, 11, 4347.
- Fan, J.; Lee, J.; Lee, Y. A Transfer learning architecture based on a support vector machine for histopathology image classification. Appl. Sci. 2021, 11, 6380.
- Lo, C.-M.; Yang, Y.-W.; Lin, J.-K.; Lin, T.-C.; Chen, W.-S.; Yang, S.-H.; Chang, S.-C.; Wang, H.-S.; Lan, Y.-T.; Lin, H.-H.; et al. Modeling the survival of colorectal cancer patients based on colonoscopic features in a feature ensemble vision transformer. Comput. Med. Imaging Graph. 2023, 107, 102242.
- Grosu, S.; Wesp, P.; Graser, A.; Maurus, S.; Schulz, C.; Knösel, T.; Cyran, C.C.; Ricke, J.; Ingrisch, M.; Kazmierczak, P.M. Machine learning–based differentiation of benign and premalignant colorectal polyps detected with CT colonography in an asymptomatic screening population: A proof-of-concept study. Radiology 2021, 299, 326–335.
- Takeda, K.; Kudo, S.-E.; Mori, Y.; Misawa, M.; Kudo, T.; Wakamura, K.; Katagiri, A.; Baba, T.; Hidaka, E.; Ishida, F.; et al. Accuracy of diagnosing invasive colorectal cancer using computer-aided endocytoscopy. Endoscopy 2017, 49, 798–802.
- Yang, K.; Zhou, B.; Yi, F.; Chen, Y.; Chen, Y. Colorectal Cancer Diagnostic Algorithm Based on Sub-Patch Weight Color Histogram in Combination of Improved Least Squares Support Vector Machine for Pathological Image. J. Med. Syst. 2019, 43, 306.
- Dragicevic, A.; Matija, L.; Krivokapic, Z.; Dimitrijevic, I.; Baros, M.; Koruga, D. Classification of Healthy and Cancer States of Colon Epithelial Tissues Using Opto-magnetic Imaging Spectroscopy. J. Med. Biol. Eng. 2018, 39, 367–380.
- Trivizakis, E.; Ioannidis, G.S.; Souglakos, I.; Karantanas, A.H.; Tzardi, M.; Marias, K. A neural pathomics framework for classifying colorectal cancer histopathology images based on wavelet multi-scale texture analysis. Sci. Rep. 2021, 11, 15546.
- Damkliang, K.; Wongsirichot, T.; Thongsuksai, P. Tissue classification for colorectal cancer utilizing techniques of deep learning and machine learning. Biomed. Eng. Appl. Basis Commun. 2021, 33, 2150022.
- Mittal, P.; Condina, M.R.; Klingler-Hoffmann, M.; Kaur, G.; Oehler, M.K.; Sieber, O.M.; Palmieri, M.; Kommoss, S.; Brucker, S.; McDonnell, M.D.; et al. Cancer tissue classification using supervised machine learning applied to MALDI mass spectrometry imaging. Cancers 2021, 13, 5388.
- Cao, W.; Pomeroy, M.J.; Liang, Z.; Abbasi, A.F.; Pickhardt, P.J.; Lu, H. Vector textures derived from higher order derivative domains for classification of colorectal polyps. Vis. Comput. Ind. Biomed. Art 2022, 5, 16.
- Deif, M.A.; Attar, H.; Amer, A.; Issa, H.; Khosravi, M.R.; Solyman, A.A.A. A New Feature Selection Method Based on Hybrid Approach for Colorectal Cancer Histology Classification. Wirel. Commun. Mob. Comput. 2022, 2022, 7614264.
- Chehade, A.H.; Abdallah, N.; Marion, J.-M.; Oueidat, M.; Chauvet, P. Lung and colon cancer classification using medical imaging: A feature engineering approach. Phys. Eng. Sci. Med. 2022, 45, 729–746.
- Tripathi, A.; Misra, A.; Kumar, K.; Chaurasia, B.K. Optimized Machine Learning for Classifying Colorectal Tissues. SN Comput. Sci. 2023, 4, 461.
- Kara, O.C.; Venkatayogi, N.; Ikoma, N.; Alambeigi, F. A Reliable and Sensitive Framework for Simultaneous Type and Stage Detection of Colorectal Cancer Polyps. Ann. Biomed. Eng. 2023, 51, 1499–1512.
- Ayyaz, M.S.; Lali, M.I.U.; Hussain, M.; Rauf, H.T.; Alouffi, B.; Alyami, H.; Wasti, S. Hybrid deep learning model for endoscopic lesion detection and classification using endoscopy videos. Diagnostics 2021, 12, 43.
- Mirniaharikandehei, S.; Heidari, M.; Danala, G.; Lakshmivarahan, S.; Zheng, B. Applying a random projection algorithm to optimize machine learning model for predicting peritoneal metastasis in gastric cancer patients using CT images. Comput. Methods Programs Biomed. 2021, 200, 105937.
- Hu, W.; Li, C.; Li, X.; Rahaman, M.; Ma, J.; Zhang, Y.; Chen, H.; Liu, W.; Sun, C.; Yao, Y.; et al. GasHisSDB: A new gastric histopathology image dataset for computer aided diagnosis of gastric cancer. Comput. Biol. Med. 2022, 142, 105207.
- Naser, E.F.; Zeki, S.M. Using Fuzzy Clustering to Detect the Tumor Area in Stomach Medical Images. Baghdad Sci. J. 2021, 18, 1294.
- Korkmaz, S.A.; Esmeray, F. A New Application Based on GPLVM, LMNN, and NCA for Early Detection of the Stomach Cancer. Appl. Artif. Intell. 2018, 32, 541–557.
- Nayyar, Z.; Khan, M.A.; Alhussein, M.; Nazir, M.; Aurangzeb, K.; Nam, Y.; Kadry, S.; Haider, S.I. Gastric tract disease recognition using optimized deep learning features. Comput. Mater. Contin. 2021, 68, 2041–2056.
- Hu, W.; Chen, H.; Liu, W.; Li, X.; Sun, H.; Huang, X.; Grzegorzek, M.; Li, C. A comparative study of gastric histopathology sub-size image classification: From linear regression to visual transformer. Front. Med. 2022, 9, 1072109.
- Korkmaz, S.A. Recognition of the Gastric Molecular Image Based on Decision Tree and Discriminant Analysis Classifiers by using Discrete Fourier Transform and Features. Appl. Artif. Intell. 2018, 32, 629–643.
- Korkmaz, S.A.; Binol, H. Classification of molecular structure images by using ANN, RF, LBP, HOG, and size reduction methods for early stomach cancer detection. J. Mol. Struct. 2018, 1156, 255–263.
- Kanesaka, T.; Lee, T.-C.; Uedo, N.; Lin, K.-P.; Chen, H.-Z.; Lee, J.-Y.; Wang, H.-P.; Chang, H.-T. Computer-aided diagnosis for identifying and delineating early gastric cancers in magnifying narrow-band imaging. Gastrointest. Endosc. 2018, 87, 1339–1344.
- Feng, Q.-X.; Liu, C.; Qi, L.; Sun, S.-W.; Song, Y.; Yang, G.; Zhang, Y.-D.; Liu, X.-S. An Intelligent Clinical Decision Support System for Preoperative Prediction of Lymph Node Metastasis in Gastric Cancer. J. Am. Coll. Radiol. 2019, 16, 952–960.
- Korkmaz, S.A. Classification of histopathological gastric images using a new method. Neural Comput. Appl. 2021, 33, 12007–12022.
- Dai, H.; Bian, Y.; Wang, L.; Yang, J. Support Vector Machine-Based Backprojection Algorithm for Detection of Gastric Cancer Lesions with Abdominal Endoscope Using Magnetic Resonance Imaging Images. Sci. Program. 2021, 2021, 9964203.
- Haile, M.B.; Salau, A.; Enyew, B.; Belay, A.J. Detection and classification of gastrointestinal disease using convolutional neural network and SVM. Cogent Eng. 2022, 9, 2084878.
- Noor, M.N.; Nazir, M.; Khan, S.A.; Song, O.-Y.; Ashraf, I. Efficient Gastrointestinal Disease Classification Using Pretrained Deep Convolutional Neural Network. Electronics 2023, 12, 1557.
- Yin, F.; Zhang, X.; Fan, A.; Liu, X.; Xu, J.; Ma, X.; Yang, L.; Su, H.; Xie, H.; Wang, X.; et al. A novel detection technology for early gastric cancer based on Raman spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2023, 292, 122422.
Dataset | Cancer Category | Modality | Downloadable Link | No. of Data Samples | Pixel Size |
---|---|---|---|---|---|
NCT-CRC-HE-100K | Colorectal | H&E | https://zenodo.org/record/1214456 (accessed on 15 September 2023) | 100,000 | 224 × 224 |
Lung and colon histopathological images (LC25000) | Colorectal | H&E | https://academictorrents.com/details/7a638ed187a6180fd6e464b3666a6ea0499af4af (accessed on 15 September 2023) | 10,000 | 768 × 768 |
CRC-VAL-HE-7K | Colorectal | H&E | https://zenodo.org/record/1214456 (accessed on 15 September 2023) | 7180 | 224 × 224 |
Kather-CRC-2016 (KCRC-16) | Colorectal | H&E | https://zenodo.org/record/53169#.W6HwwP4zbOQ (accessed on 15 September 2023) | 5000; 10 | 150 × 150; 5000 × 5000 |
Kvasir V-2 dataset (KV2D) | Stomach (Gastric) | Endoscopy | https://dl.acm.org/do/10.1145/3193289/full/ (accessed on 15 September 2023) | 4000 | 720 × 576 to 1920 × 1072 |
HyperKvasir dataset (HKD) | Stomach (Gastric) | Endoscopy | https://osf.io/mh9sj/ (accessed on 15 September 2023) | 110,079 images and 374 videos | --- |
Gastric histopathology sub-size image database (GasHisSDB) | Stomach (Gastric) | H&E | https://gitee.com/neuhwm/GasHisSDB | 245,196 | 160 × 160, 120 × 120, 80 × 80 |
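Preprocessing Technique | Formula | Description |
---|---|---|
Image Filtering | $I_{f}(x, y) = \sum_{i=-k}^{k} \sum_{j=-k}^{k} I(x+i, y+j)\, K(i, j)$ | $I_{f}(x, y)$ is the filtered image pixel at location $(x, y)$. $I(x+i, y+j)$ is the pixel value at location $(x+i, y+j)$ in the original image. $K(i, j)$ is the value of the convolution kernel at location $(i, j)$. The summation is performed over a window of size $(2k+1) \times (2k+1)$ centered at $(x, y)$. |
Image Denoising | $\hat{u} = \arg\min_{u} \left[ D(u, f) + R(u) \right]$ | $\hat{u}$ represents the denoised image. $D(u, f)$ is the data fidelity term, which measures how well the denoised image $u$ matches the noisy input image $f$. $R(u)$ is the regularization term, which imposes a prior on the structure of the denoised image [21]. |
Gaussian Filtering | $G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)$ | $G(x, y)$ represents the resulting value after applying Gaussian filtering. $x$ and $y$ are the spatial coordinates. $\sigma$ is the standard deviation, controlling the amount of smoothing or blurring. |
Contrast Enhancement of Images (CEI) | $I_{out} = \frac{I_{in} - I_{min}}{I_{max} - I_{min}} \left(O_{max} - O_{min}\right) + O_{min}$ | $I_{out}$ is the enhanced pixel value, derived from $I_{in}$ in the input image. $I_{min}$ and $I_{max}$ are the minimum and maximum pixel values in the input image. $O_{min}$ and $O_{max}$ represent the desired minimum and maximum pixel values in the output image [22]. |
Linear Transformation | $T(\mathbf{x}) = A\mathbf{x}$ | $T$ is the transformation operator, $\mathbf{x}$ is the input vector, and $A$ is a matrix defining the transformation. |
Contrast Limited Adaptive Histogram Equalization (CLAHE) | $O(x, y) = T\left(I(x, y)\right)$ | $O(x, y)$ is the enhanced output pixel at $(x, y)$, obtained by applying the contrast-enhancing transformation function $T$, which is built from the cumulative distribution function (CDF), to the pixel intensity $I(x, y)$. |
Discrete Cosine Transform (DCT) | $X(k) = \sum_{n=0}^{N-1} x(n) \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]$ | $X(k)$ represents the DCT coefficient at frequency index $k$. $x(n)$ is the input signal. $N$ is the number of samples in the signal. The summation is performed over all samples in the signal. |
Wavelet Transform (WT) | $W(a, b) = \sum_{x} \sum_{y} I(x, y)\, \psi_{a,b}(x, y)$ | $W(a, b)$ is the DWT coefficient, $I(x, y)$ is the pixel value at $(x, y)$, and $\psi_{a,b}(x, y)$ is the 2D wavelet function. |
RGB to Gray Conversion (RGBG) | $Gray = 0.2989\,R + 0.5870\,G + 0.1140\,B$ | $Gray$ is the converted gray value from the RGB channels ($R$, $G$, $B$). Coefficients 0.2989, 0.5870, and 0.1140 are the weights assigned to the R, G, and B channels, respectively [23]. |
Cropping (ROI) | $I_{crop} = I[\,y : y + h,\; x : x + w\,]$ | The cropped image $I_{crop}$ is obtained by cropping the input image $I$ at coordinates $(x, y)$ with width $w$ and height $h$. |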
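
Several of the preprocessing steps listed in the table can be illustrated with a short sketch. The code below is a minimal example in plain Python (standard library only, with images as nested lists), not the implementation used in any of the reviewed studies. All function names are hypothetical. Replicating border pixels in the filter and normalizing the Gaussian kernel so that its weights sum to 1 are assumptions made here for illustration.

```python
import math


def rgb_to_gray(r, g, b):
    """Weighted sum of the R, G, B channels using the weights from the table."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b


def contrast_stretch(image, out_min=0.0, out_max=255.0):
    """Linearly map the input range [min, max] onto [out_min, out_max]."""
    flat = [p for row in image for p in row]
    in_min, in_max = min(flat), max(flat)
    if in_max == in_min:  # constant image: nothing to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (in_max - in_min)
    return [[(p - in_min) * scale + out_min for p in row] for row in image]


def gaussian_kernel(k, sigma):
    """(2k+1) x (2k+1) Gaussian kernel.

    Assumption: the weights are normalized to sum to 1 instead of using
    the continuous 1 / (2 * pi * sigma^2) factor.
    """
    raw = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            for j in range(-k, k + 1)]
           for i in range(-k, k + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def filter_image(image, kernel):
    """Weighted sum over a (2k+1) x (2k+1) window centered at each pixel.

    Assumption: pixels outside the image are replaced by the nearest
    border pixel (replicate padding).
    """
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    yy = min(max(y + i, 0), h - 1)
                    xx = min(max(x + j, 0), w - 1)
                    acc += image[yy][xx] * kernel[i + k][j + k]
            out[y][x] = acc
    return out


def crop(image, x, y, width, height):
    """Region of interest starting at column x, row y."""
    return [row[x:x + width] for row in image[y:y + height]]


def dct(signal):
    """Unnormalized DCT-II of a 1D signal."""
    n = len(signal)
    return [sum(signal[m] * math.cos(math.pi / n * (m + 0.5) * k)
                for m in range(n))
            for k in range(n)]


if __name__ == "__main__":
    image = [[50, 100], [150, 200]]
    print(contrast_stretch(image))
    print(filter_image(image, gaussian_kernel(1, 1.0)))
    print(crop([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 1, 0, 2, 2))
```

In practice, array libraries such as NumPy or OpenCV would replace the nested loops. The explicit loops are kept here only so that each sum in the corresponding formula stays visible.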
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rai, H.M.; Yoo, J. Analysis of Colorectal and Gastric Cancer Classification: A Mathematical Insight Utilizing Traditional Machine Learning Classifiers. Mathematics 2023, 11, 4937. https://doi.org/10.3390/math11244937