OUCH: Oversampling and Undersampling Cannot Help Improve Accuracy in Our Bayesian Classifiers That Predict Preeclampsia
Abstract
1. Introduction
- The present study addresses the influence of oversampling and undersampling methods on Bayesian classifiers that predict preeclampsia.
- Contrary to what is widely reported in the literature, our results demonstrate that an improvement in model accuracy is not always guaranteed.
- Despite this, a balanced performance was obtained in the classification of positive and negative cases of preeclampsia.
- This study illustrates the following conclusion: while oversampling and undersampling techniques applied to imbalanced training datasets can help balance the detection of positive and negative cases in binary classification problems such as the one studied here, an improvement in model accuracy is not always guaranteed (see the sketch below).
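To make the accuracy pitfall concrete, the following is a minimal sketch in R (the language of the resampling packages used in Section 3.3) of a trivial classifier that always predicts the majority class. Using counts rounded from the baseline proportions reported in Section 4.2 (24.15% positive, 75.85% negative, 1027 EMRs), it attains roughly 76% accuracy while detecting no positive case at all; the numbers are illustrative only.

```r
n_pos <- 248; n_neg <- 779    # approx. counts out of 1027 training EMRs
tp <- 0; fn <- n_pos          # the "always negative" rule misses every positive
tn <- n_neg; fp <- 0          # ...and trivially matches every negative
accuracy    <- (tp + tn) / (n_pos + n_neg)  # ~0.759
sensitivity <- tp / (tp + fn)               # 0
specificity <- tn / (tn + fp)               # 1
round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 4)
```

This is why the rest of the paper reports sensitivity, specificity, and F1-score alongside accuracy.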
2. Related Work
2.1. Class Balancing in Clinical Datasets
- In [16], undersampling and oversampling methods such as SMOTE were used for class balancing in two lung cancer datasets. According to their results, undersampling techniques produced the highest standard deviation (SD) and oversampling techniques the lowest. The authors conclude that oversampling is a stable class-balancing method because the lowest SD was achieved when training classification models on the balanced training sets.
- In [17], the authors presented an empirical performance evaluation of classification models on five imbalanced clinical datasets: Breast Cancer Disease, Coronary Heart Disease, Indian Liver Patient, Pima Indians Diabetes Database, and Chronic Kidney Disease. They evaluated classification models built on training sets balanced with the following techniques: undersampling, random oversampling, SMOTE, ADASYN, SVM-SMOTE, SMOTE-ENN, and SMOTE-Tomek. According to the results, SMOTE-ENN performed better than the other six data balancing techniques on all five clinical datasets.
- In [18], classification algorithms, combined with resampling strategies and dimensionality reduction methods, were investigated to find a prediction model that correctly discriminates between growth-hormone-treated and untreated animals. According to their results, SMOTE helped to improve the performance of their classification models.
2.2. Class Imbalance Problem of Our Previous Work
2.3. Class Balancing Techniques That Handle Nominal and Continuous Variables
3. Methodology
3.1. Preprocessed Dataset
3.2. Dataset Splitting: Training and Testing Datasets
3.3. Oversampling and Undersampling Techniques
3.4. Training Models
3.5. Evaluating Models
4. Results and Analysis
4.1. Feature Selection
4.2. Oversampling and Undersampling on the Training Data
4.3. Honest Estimation of the Accuracy of Models
4.4. Performance of Models on the Testing Dataset
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Bravo, F.P.; García, A.A.D.B.; Veiga, A.B.G.; De La Sacristana, M.M.G.; Piñero, M.R.; Peral, A.G.; Džeroski, S.; Ayala, J.L. SMURF: Systematic Methodology for Unveiling Relevant Factors in retrospective data on chronic disease treatments. IEEE Access 2019, 7, 92598–92614. [Google Scholar] [CrossRef]
- Bravo, F.P.; García, A.A.; Russo, L.; Ayala, J.L. SOFIA: Selection of Medical Features by Induced Alterations in Numeric Labels. Electronics 2020, 9, 1492. [Google Scholar] [CrossRef]
- Parrales-Bravo, F.; Caicedo-Quiroz, R.; Rodríguez-Larraburu, E.; Barzola-Monteses, J. ACME: A Classification Model for Explaining the Risk of Preeclampsia Based on Bayesian Network Classifiers and a Non-Redundant Feature Selection Approach. Informatics 2024, 11, 31. [Google Scholar] [CrossRef]
- Parrales-Bravo, F.; Torres-Urresto, J.; Avila-Maldonado, D.; Barzola-Monteses, J. Relevant and Non-Redundant Feature Subset Selection Applied to the Detection of Malware in a Network. In Proceedings of the 2021 IEEE Fifth Ecuador Technical Chapters Meeting (ETCM), Cuenca, Ecuador, 12–15 October 2021; pp. 1–6. [Google Scholar]
- Ministerio de Salud Pública del Ecuador. Gaceta de Muerte Materna SE14. 2020. Available online: https://bit.ly/3Poz79o (accessed on 28 March 2022).
- Parrales-Bravo, F.; Saltos-Cedeño, J.; Tomalá-Esparza, J.; Barzola-Monteses, J. Clustering-based Approach for Characterization of Patients with Preeclampsia using a Non-Redundant Feature Selection. In Proceedings of the 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Tenerife, Spain, 19–21 July 2023; pp. 1–6. [Google Scholar]
- De Kat, A.C.; Hirst, J.; Woodward, M.; Kennedy, S.; Peters, S.A. Prediction models for preeclampsia: A systematic review. Pregnancy Hypertens 2019, 16, 48–66. [Google Scholar] [CrossRef] [PubMed]
- Parrales-Bravo, F.; Caicedo-Quiroz, R.; Barzola-Monteses, J.; Guillén-Mirabá, J.; Guzmán-Bedor, O. CSM: A Chatbot Solution to Manage Student Questions About Payments and Enrollment in University. IEEE Access 2024, 12, 74669–74680. [Google Scholar] [CrossRef]
- Barzola-Monteses, J.; Guerrero, M.; Parrales-Bravo, F.; Espinoza-Andaluz, M. Forecasting energy consumption in residential department using convolutional neural networks. In Proceedings of the Conference on Information and Communication Technologies of Ecuador, Guayaquil, Ecuador, 24–26 November 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 18–30. [Google Scholar]
- Liu, Y.; Li, B.; Yang, S.; Li, Z. Handling missing values and imbalanced classes in machine learning to predict consumer preference: Demonstrations and comparisons to prominent methods. Expert Syst. Appl. 2024, 237, 121694. [Google Scholar] [CrossRef]
- Roy, D.; Roy, A.; Roy, U. Learning from Imbalanced Data in Healthcare: State-of-the-Art and Research Challenges. In Computational Intelligence in Healthcare Informatics; Springer: Berlin/Heidelberg, Germany, 2024; pp. 19–32. [Google Scholar]
- Demir, S.; Şahin, E.K. Evaluation of oversampling methods (OVER, SMOTE, and ROSE) in classifying soil liquefaction dataset based on SVM, RF, and Naïve Bayes. Avrupa Bilim ve Teknoloji Dergisi 2022, 34, 142–147. [Google Scholar] [CrossRef]
- Moreno-Barea, F.J.; Jerez, J.M.; Franco, L. Improving classification accuracy using data augmentation on small data sets. Expert Syst. Appl. 2020, 161, 113696. [Google Scholar] [CrossRef]
- El Gannour, O.; Hamida, S.; Lamalem, Y.; Mahjoubi, M.A.; Cherradi, B.; Raihani, A. Improving skin diseases prediction through data balancing via classes weighting and transfer learning. Bull. Electr. Eng. Inform. 2024, 13, 628–637. [Google Scholar] [CrossRef]
- Eid, A.M.; Soudan, B.; Nassif, A.B.; Injadat, M. Comparative study of ML models for IIoT intrusion detection: Impact of data preprocessing and balancing. Neural Comput. Appl. 2024, 36, 6955–6972. [Google Scholar] [CrossRef]
- Khushi, M.; Shaukat, K.; Alam, T.M.; Hameed, I.A.; Uddin, S.; Luo, S.; Yang, X.; Reyes, M.C. A comparative performance analysis of data resampling methods on imbalance medical data. IEEE Access 2021, 9, 109960–109975. [Google Scholar] [CrossRef]
- Kumar, V.; Lalotra, G.S.; Sasikala, P.; Rajput, D.S.; Kaluri, R.; Lakshmanna, K.; Shorfuzzaman, M.; Alsufyani, A.; Uddin, M. Addressing binary classification over class imbalanced clinical datasets using computationally intelligent techniques. Healthcare 2022, 10, 1293. [Google Scholar] [CrossRef] [PubMed]
- Mooijman, P.; Catal, C.; Tekinerdogan, B.; Lommen, A.; Blokland, M. The effects of data balancing approaches: A case study. Appl. Soft Comput. 2023, 132, 109853. [Google Scholar] [CrossRef]
- Mohammed, R.; Rawashdeh, J.; Abdullah, M. Machine learning with oversampling and undersampling techniques: Overview study and experimental results. In Proceedings of the 2020 11th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 7–9 April 2020; pp. 243–248. [Google Scholar]
- Kubus, M. Evaluation of resampling methods in the class unbalance problem. Econometrics 2020, 24, 39–50. [Google Scholar] [CrossRef]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Menardi, G.; Torelli, N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 2014, 28, 92–122. [Google Scholar] [CrossRef]
- Tsou, C.S.; Liou, C.; Cheng, L.; Zhou, H. Quality prediction through machine learning for the inspection and manufacturing process of blood glucose test strips. Cogent Eng. 2022, 9, 2083475. [Google Scholar] [CrossRef]
- Mukherjee, M.; Khushi, M. SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features. Appl. Syst. Innov. 2021, 4, 18. [Google Scholar] [CrossRef]
- Hall, M.A. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, The University of Waikato, Hamilton, New Zealand, 1999. [Google Scholar]
- Akande, K.O.; Owolabi, T.O.; Olatunji, S.O. Investigating the effect of correlation-based feature selection on the performance of neural network in reservoir characterization. J. Nat. Gas Sci. Eng. 2015, 27, 98–108. [Google Scholar]
- Bravo, F.P.; García, A.A.D.B.; Gallego, M.M.; Veiga, A.B.G.; Ruiz, M.; Peral, A.G.; Ayala, J.L. Prediction of patient’s response to OnabotulinumtoxinA treatment for migraine. Heliyon 2019, 5, e01043. [Google Scholar] [CrossRef]
- Mukherjee, M.; Khushi, M. SMOTE-ENC Code: A New SMOTE Method for Datasets with Continuous and Multi-Level Categorical Features. 2021. Available online: https://github.com/Mimimkh/SMOTE-ENC-code (accessed on 20 September 2024).
- Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163. [Google Scholar] [CrossRef]
- Pazzani, M.J. Constructive induction of Cartesian product attributes. In Feature Extraction, Construction and Selection: A Data Mining Perspective; Springer: Berlin/Heidelberg, Germany, 1998; pp. 341–354. [Google Scholar]
- Keogh, E.J.; Pazzani, M.J. Learning the structure of augmented Bayesian classifiers. Int. J. Artif. Intell. Tools 2002, 11, 587–601. [Google Scholar] [CrossRef]
- Webb, G.I.; Boughton, J.R.; Wang, Z. Averaged One-Dependence Estimators: Preliminary Results. In Proceedings of the AusDM, Canberra, Australia, 2 December 2002; pp. 65–74. [Google Scholar]
- Webb, G.I.; Boughton, J.R.; Wang, Z. Not so naive Bayes: Aggregating one-dependence estimators. Mach. Learn. 2005, 58, 5–24. [Google Scholar] [CrossRef]
- Dash, D.; Cooper, G.F. Exact model averaging with naive Bayesian classifiers. In Proceedings of the ICML, San Francisco, CA, USA, 8–12 July 2002; pp. 91–98. [Google Scholar]
- Hall, M. A decision tree-based attribute weighting filter for naive Bayes. In Proceedings of the International Conference on Innovative Techniques and Applications of Artificial Intelligence, Cambridge, UK, 11–13 December 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 59–70. [Google Scholar]
- Sahami, M. Learning Limited Dependence Bayesian Classifiers. In Proceedings of the KDD, Portland, OR, USA, 2–4 August 1996; pp. 335–338. [Google Scholar]
- Jiang, L.; Zhang, L.; Li, C.; Wu, J. A correlation-based feature weighting filter for naive Bayes. IEEE Trans. Knowl. Data Eng. 2018, 31, 201–213. [Google Scholar] [CrossRef]
- Kirsten, L. CBFW_Naive_Bayes: Python Implementation of “A Correlation-Based Feature Weighting Filter for Naive Bayes”. 2022. Available online: https://github.com/LucasKirsten/CBFW_Naive_Bayes/tree/master (accessed on 10 September 2024).
- CrossValidated. ROSE and SMOTE Oversampling Methods. 2019. Available online: https://shorturl.at/dlEQW (accessed on 8 May 2024).
Case | Number of Records |
---|---|
Positive | 351 |
Negative | 1116 |
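Section 3.2 splits these 1467 EMRs into training and testing sets; the 1027 training records reported in Section 4.2 correspond to roughly a 70/30 split. A hedged sketch of such a stratified split with the caret package is shown below; the outcome column name `Case`, the seed, and the exact proportion are illustrative assumptions, not the study's documented settings.

```r
library(caret)

set.seed(123)  # hypothetical seed, for reproducibility of the sketch
# Stratified 70/30 split; `dataset` holds the 1467 EMRs and `Case` is a
# placeholder name for the positive/negative preeclampsia label.
train_idx <- createDataPartition(dataset$Case, p = 0.70, list = FALSE)
train <- dataset[train_idx, ]
test  <- dataset[-train_idx, ]
prop.table(table(train$Case))  # stratification preserves the ~24/76 imbalance
```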
Technique | Package | Version | Function |
---|---|---|---|
UNDER | caret | 6.0-94 | downSample |
OVER | caret | 6.0-94 | upSample |
ROSE | ROSE | 0.0-4 | ROSE |
SMOTE-NC | RSBID | 0.0.2.0000 | SMOTE_NC |
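The table above maps each balancing technique to an R function. A minimal sketch of applying them to a training set follows; `train` and its outcome column `Case` are placeholder names, and the SMOTE_NC call is left commented because RSBID's exact argument names should be checked against its documentation.

```r
library(caret)  # downSample / upSample
library(ROSE)   # ROSE

predictors <- train[, setdiff(names(train), "Case")]

under <- downSample(x = predictors, y = train$Case, yname = "Case")  # UNDER
over  <- upSample(x = predictors, y = train$Case, yname = "Case")    # OVER
rosed <- ROSE(Case ~ ., data = train, seed = 1)$data                 # ROSE
# SMOTE-NC for mixed nominal/continuous features (RSBID, installed from
# GitHub); the argument names here are assumptions:
# smotenc <- RSBID::SMOTE_NC(train, "Case")

# Balanced class counts, cf. the distributions reported in Section 4.2:
lapply(list(UNDER = under, OVER = over, ROSE = rosed), function(d) table(d$Case))
```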
Feature | Description | Labels |
---|---|---|
“Hypertensionpersonalhistory” | Hypertension personal history | yes/no |
“Parity” | The number of times the fetus has reached a viable gestational age | 1/2/3/4/5/6/7 or more |
“Gravidity” | The number of times the woman has been pregnant | 1/2/3/4/5/6/7 or more |
“Fetalstatus” | Previous fetal status at birth | born alive/stillborn/NA |
“Tobaccouse” | Tobacco use | yes/no |
“Diabetesfamilyhistory” | Existence of relatives with diabetes | yes/no |
“Nupucells1” | Patient vaginal infection | mild/moderate/severe |
“Maternalage-categorized” | Maternal age by ranges | State0: <35 / State1: ≥35 |
“Education_Level” | Education level | primary/secondary/tertiary |
“Specificplacearealivedincountyof” | Area where the patient resides | urban/rural |
Technique | EMRs | Positive Cases | Negative Cases |
---|---|---|---|
Baseline | 1027 | 24.15% | 75.85% |
UNDER | 498 | 50% | 50% |
OVER | 1564 | 50% | 50% |
ROSE | 1027 | 49.18% | 50.82% |
SMOTE-NC | 1564 | 50% | 50% |
SMOTE-ENC | 1564 | 50% | 50% |
Technique | Model | Mean | Std.Dev. | Max. | Min. | Range |
---|---|---|---|---|---|---|
NB | 86.42% | 4.21% | 93.60% | 81.83% | 11.77% | |
TANcl | 85.57% | 4.67% | 95.00% | 77.66% | 17.33% | |
TANhcsp | 86.42% | 4.21% | 93.60% | 81.83% | 11.77% | |
FSSJ | 88.92% | 4.90% | 97.95% | 83.00% | 14.95% | |
Baseline | kDB | 86.02% | 4.73% | 93.60% | 79.00% | 14.60% |
AODE | 85.15% | 4.41% | 91.00% | 77.66% | 13.33% | |
AWNB | 86.44% | 4.53% | 93.60% | 81.83% | 11.77% | |
MANB | 86.85% | 2.55% | 90.16% | 83.00% | 7.16% | |
CBFW | 86.44% | 4.53% | 93.60% | 81.83% | 11.77% | |
NB | 60.16% | 17.45% | 86.00% | 27.66% | 58.33% | |
TANcl | 60.01% | 18.39% | 94.33% | 36.00% | 58.33% | |
TANhcsp | 61.83% | 17.45% | 86.00% | 27.66% | 58.33% | |
FSSJ | 57.59% | 18.90% | 86.00% | 27.66% | 58.33% | |
UNDER | kDB | 54.03% | 15.98% | 77.66% | 27.66% | 50.00% |
AODE | 61.83% | 16.54% | 86.00% | 27.66% | 58.33% | |
AWNB | 58.57% | 11.02% | 69.33% | 36.00% | 33.33% | |
MANB | 56.53% | 13.67% | 77.66% | 38.27% | 39.39% | |
CBFW | 58.42% | 16.11% | 77.66% | 27.66% | 50.00% | |
NB | 72.95% | 8.76% | 86.00% | 58.36% | 27.63% | |
TANcl | 78.04% | 5.79% | 86.67% | 67.75% | 18.91% | |
TANhcsp | 76.91% | 6.12% | 84.68% | 67.75% | 16.92% | |
FSSJ | 76.67% | 5.17% | 83.97% | 67.75% | 16.21% | |
OVER | kDB | 76.37% | 6.86% | 83.97% | 62.35% | 21.62% |
AODE | 82.08% | 8.93% | 97.11% | 65.05% | 32.05% | |
AWNB | 72.69% | 9.34% | 86.00% | 61.00% | 25.00% | |
MANB | 67.78% | 7.49% | 80.44% | 56.94% | 23.49% | |
CBFW | 67.78% | 7.49% | 80.44% | 56.94% | 23.49% | |
NB | 73.78% | 8.74% | 90.16% | 59.00% | 31.16% | |
TANcl | 71.97% | 14.04% | 91.55% | 52.66% | 38.88% | |
TANhcsp | 76.38% | 8.97% | 91.55% | 63.00% | 26.55% | |
FSSJ | 67.96% | 8.57% | 83.00% | 56.83% | 26.16% | |
ROSE | kDB | 72.96% | 10.43% | 90.16% | 59.00% | 31.16% |
AODE | 74.85% | 10.04% | 95.00% | 56.83% | 38.16% | |
AWNB | 61.80% | 1.03% | 63.00% | 61.00% | 2.00% | |
MANB | 66.80% | 8.15% | 77.66% | 51.00% | 26.66% | |
CBFW | 74.85% | 10.04% | 95.00% | 56.83% | 38.16% | |
NB | 80.48% | 6.11% | 88.77% | 70.45% | 18.31% | |
TANcl | 79.97% | 6.29% | 89.37% | 70.45% | 18.91% | |
TANhcsp | 81.02% | 5.32% | 88.77% | 74.15% | 14.61% | |
FSSJ | 80.70% | 7.62% | 92.08% | 68.89% | 23.18% |
SMOTE-NC | kDB | 81.81% | 5.68% | 89.37% | 72.11% | 17.26% |
AODE | 79.93% | 6.39% | 86.67% | 67.75% | 18.91% | |
AWNB | 80.22% | 6.57% | 91.55% | 70.45% | 21.09% | |
MANB | 76.41% | 8.48% | 94.33% | 66.55% | 27.77% | |
CBFW | 79.69% | 7.47% | 91.55% | 70.45% | 21.09% |
NB | 78.59% | 6.91% | 89.37% | 67.75% | 21.62% | |
TANcl | 81.05% | 8.47% | 94.78% | 67.75% | 27.02% | |
TANhcsp | 79.97% | 6.96% | 91.55% | 71.52% | 20.02% | |
FSSJ | 83.73% | 8.15% | 97.48% | 68.89% | 28.59% | |
SMOTE-ENC | kDB | 79.41% | 6.42% | 91.55% | 71.52% | 20.02% |
AODE | 83.50% | 7.01% | 94.33% | 68.89% | 25.43% | |
AWNB | 77.52% | 6.79% | 84.68% | 66.26% | 18.42% | |
MANB | 68.83% | 6.40% | 79.42% | 58.36% | 21.05% | |
CBFW | 79.94% | 4.61% | 86.67% | 70.45% | 16.21% |
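The learner names in this table (NB, TANcl, TANhcsp, FSSJ, kDB, AODE, AWNB, MANB) coincide with the Bayesian classifiers implemented in the R package bnclassify. Assuming that package, a sketch of the 10-fold cross-validated accuracy estimation summarized above could look like the following; `balanced_train` (a resampled training set from Section 3.3) and the class column `Case` are placeholder names.

```r
library(bnclassify)

# Learn two of the compared structures on a balanced training set.
nb_model  <- nb("Case", balanced_train)                     # naive Bayes (NB)
tan_model <- tan_cl("Case", balanced_train, score = "aic")  # TAN via Chow-Liu

# Mean 10-fold cross-validated accuracy; repeating over the folds also yields
# the standard deviation, maximum, minimum, and range reported above.
cv(nb_model,  balanced_train, k = 10)
cv(tan_model, balanced_train, k = 10)
```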
Technique | Model | Accuracy | Sensitivity | Specificity | Precision | Recall | F1-Score | AUC |
---|---|---|---|---|---|---|---|---|
NB | 86.69% | 39.16% | 98.32% | 60.00% | 39.16% | 47.39% | 0.419884 | |
TANcl | 81.84% | 35.00% | 96.05% | 45.29% | 35.00% | 39.48% | 0.429377 | |
TANhcsp | 86.69% | 39.16% | 98.32% | 60.00% | 39.16% | 47.39% | 0.419884 | |
FSSJ | 88.64% | 35.00% | 99.71% | 70.00% | 35.00% | 46.66% | 0.498206 | |
Baseline | kDB | 86.69% | 39.16% | 98.32% | 60.00% | 39.16% | 47.39% | 0.419884 |
AODE | 84.75% | 35.00% | 99.87% | 52.85% | 35.00% | 42.11% | 0.433069 | |
AWNB | 85.72% | 18.33% | 99.92% | 50.00% | 18.33% | 26.82% | 0.428850 | |
MANB | 88.64% | 35.00% | 99.71% | 70.00% | 35.00% | 46.66% | 0.434651 | |
CBFW | 86.69% | 39.16% | 98.32% | 60.00% | 39.16% | 47.39% | 0.419884 | |
NB | 77.96% | 64.16% | 82.15% | 47.14% | 64.16% | 54.35% | 0.573787 | |
TANcl | 64.36% | 72.50% | 61.89% | 38.30% | 72.50% | 50.12% | 0.559547 | |
TANhcsp | 67.28% | 60.00% | 69.49% | 37.27% | 60.00% | 45.98% | 0.516403 | |
FSSJ | 63.39% | 72.50% | 60.63% | 37.77% | 72.50% | 49.67% | 0.496888 | |
UNDER | kDB | 71.16% | 64.16% | 73.29% | 40.95% | 64.16% | 49.99% | 0.577479 |
AODE | 68.25% | 68.33% | 68.22% | 39.78% | 68.33% | 50.29% | 0.433597 | |
AWNB | 76.01% | 51.66% | 83.41% | 42.25% | 51.66% | 46.49% | 0.446782 | |
MANB | 88.64% | 35.16% | 91.42% | 68.33% | 39.16% | 49.79% | 0.425158 | |
CBFW | 68.25% | 68.33% | 68.22% | 39.78% | 68.33% | 50.29% | 0.433597 | |
NB | 72.13% | 64.16% | 74.55% | 41.70% | 64.16% | 50.55% | 0.588028 | |
TANcl | 73.10% | 60.00% | 77.08% | 41.57% | 60.00% | 49.11% | 0.540032 | |
TANhcsp | 76.01% | 76.66% | 75.82% | 47.20% | 76.66% | 58.43% | 0.585918 | |
FSSJ | 68.25% | 68.33% | 68.22% | 39.78% | 68.33% | 50.29% | 0.517879 | |
OVER | kDB | 74.07% | 76.66% | 73.29% | 45.55% | 76.66% | 57.15% | 0.581699 |
AODE | 72.13% | 68.33% | 73.29% | 42.55% | 68.33% | 52.45% | 0.576425 | |
AWNB | 73.10% | 76.66% | 72.02% | 44.78% | 76.66% | 56.53% | 0.600686 | |
MANB | 88.64% | 35.16% | 91.42% | 70.00% | 35.00% | 46.66% | 0.547943 | |
CBFW | 73.10% | 60.00% | 77.08% | 41.57% | 60.00% | 49.11% | 0.540032 | |
NB | 55.63% | 56.83% | 51.66% | 29.23% | 51.66% | 37.33% | 0.443090 | |
TANcl | 52.71% | 55.83% | 51.77% | 29.29% | 51.77% | 38.43% | 0.481065 | |
TANhcsp | 51.74% | 51.66% | 51.77% | 27.85% | 51.66% | 36.19% | 0.512711 | |
FSSJ | 57.57% | 43.33% | 61.89% | 27.39% | 43.33% | 33.56% | 0.488976 | |
ROSE | kDB | 56.60% | 55.83% | 56.83% | 30.75% | 55.83% | 39.66% | 0.491613 |
AODE | 51.74% | 55.83% | 50.50% | 28.96% | 55.83% | 38.14% | 0.437816 | |
AWNB | 56.60% | 51.66% | 58.10% | 29.60% | 51.66% | 37.64% | 0.442035 | |
MANB | 31.35% | 85.00% | 15.06% | 29.35% | 85.00% | 43.63% | 0.426213 | |
CBFW | 63.25% | 59.16% | 64.49% | 33.88% | 59.16% | 43.09% | 0.565243 | |
NB | 70.19% | 47.50% | 77.08% | 35.71% | 47.50% | 40.77% | 0.558597 | |
TANcl | 69.22% | 47.50% | 75.82% | 35.00% | 47.50% | 40.30% | 0.550158 | |
TANhcsp | 68.25% | 51.66% | 73.29% | 35.64% | 51.66% | 42.18% | 0.531170 | |
FSSJ | 59.51% | 64.16% | 58.10% | 34.07% | 64.16% | 44.51% | 0.569936 | |
SMOTE-NC | kDB | 69.75% | 55.83% | 73.29% | 37.50% | 55.83% | 44.86% | 0.559651 |
AODE | 69.75% | 55.83% | 73.29% | 37.50% | 55.83% | 44.86% | 0.534862 | |
AWNB | 56.60% | 60.00% | 55.56% | 31.81% | 60.00% | 41.58% | 0.561761 | |
MANB | 50.77% | 64.16% | 40.37% | 34.65% | 85.00% | 49.23% | 0.541719 | |
CBFW | 63.39% | 72.50% | 60.63% | 37.77% | 72.50% | 49.67% | 0.496888 | |
NB | 79.45% | 84.59% | 74.32% | 77.64% | 84.59% | 80.97% | 0.674054 | |
TANcl | 83.10% | 91.48% | 74.72% | 79.07% | 91.48% | 84.82% | 0.811358 | |
TANhcsp | 84.72% | 91.48% | 77.97% | 81.19% | 91.48% | 86.03% | 0.824361 | |
FSSJ | 84.72% | 91.48% | 77.97% | 81.19% | 91.48% | 86.03% | 0.824361 | |
SMOTE-ENC | kDB | 79.59% | 85.54% | 73.64% | 76.98% | 85.54% | 81.03% | 0.772367 |
AODE | 63.25% | 59.16% | 64.49% | 33.88% | 59.16% | 43.09% | 0.565243 | |
AWNB | 74.72% | 92.56% | 56.89% | 69.54% | 92.56% | 79.41% | 0.723558 | |
MANB | 65.00% | 97.31% | 28.78% | 60.79% | 97.31% | 74.83% | 0.649694 | |
CBFW | 64.36% | 72.50% | 61.89% | 38.30% | 72.50% | 50.12% | 0.559547 |
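For reference, the columns of this table follow directly from the test-set confusion matrix. The sketch below shows one way to compute them in R; `pred` (class predictions), `prob_pos` (predicted probabilities of the positive class), and `test$Case` are placeholder names, and the use of the caret and pROC packages here is an assumption rather than the study's documented tooling.

```r
library(caret)
library(pROC)

# Accuracy, sensitivity, specificity, precision, recall, and F1-score from
# the confusion matrix; "positive" marks the preeclampsia class.
cm <- confusionMatrix(pred, test$Case, positive = "positive", mode = "everything")
cm$overall["Accuracy"]
cm$byClass[c("Sensitivity", "Specificity", "Precision", "Recall", "F1")]

# AUC from the predicted positive-class probabilities.
auc(roc(response = test$Case, predictor = prob_pos))
```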
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).