Author Contributions
Conceptualization, Z.S. and W.Z.; methodology, W.A. and I.U.H.; software, Z.S.; validation, M.H.A.-A., Y.Y.G. and A.A.; formal analysis, G.H.; investigation, Z.S., M.H.A.-A. and A.A.; resources, Y.Y.G. and G.H.; data curation, W.Z., W.A. and I.U.H.; writing—original draft preparation, Z.S., W.Z., W.A. and I.U.H.; writing—review and editing, G.H., M.H.A.-A., Y.Y.G. and A.A.; visualization, Z.S. and G.H.; supervision, G.H.; project administration, M.H.A.-A., Y.Y.G. and A.A.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Accuracy and inconsistency rate.
Figure 2.
MCC, ROC, and PRC by WEKA.
Figure 3.
MCC, ROC, and PRC by Python.
Figure 4.
Performance metrics through WEKA.
Figure 5.
Performance metrics through Python.
Figure 6.
ROC curve using Decision Tree.
Figure 7.
ROC curve using KNN.
Figure 8.
ROC curve using Logistic Regression.
Figure 9.
ROC curve using Naïve Bayes.
Figure 10.
ROC curve using Random Forest.
Figure 11.
ROC curve using SVM.
Table 1.
Comparison to past literature.
| Author | Dataset | Technique | Tools | Accuracy |
|---|---|---|---|---|
| Deepti Sisodia et al. [5] | PIDD | SVM, Naïve Bayes, Decision Tree | WEKA | 76.30% |
| Steffi Dr. R. Balsubram et al. [6] | PIDD | SVM, Decision Tree, Decision Table | MATLAB | 74.9% |
| R. Madhusmita, Amandeep K. et al. [8] | PIDD | LR, SVM, KNN, NB, DT | - | 82.35% |
| Varma, K. M., and Panda, B. S. et al. [9] | PIMA | Naïve Bayes, SVM, Logistic Regression, Decision Tree | R-tool | 74.67% |
| Wu, H., Yang, S., Huang, Z., He, J., and Wang, X. et al. [10] | PIMA | K-means, Logistic Regression | WEKA | Applicable |
| O. Dr. O., S. Dr. K., and B. Ramudu et al. (2020) [11] | PIMA | Clustering regression, SVM, KNN, Neural Network | R-Studio | 78% |
| Talha Mahboob Alam, Muhammad Atif Iqbal et al. [15] | UCI ML Repository | ANN, K-means, Random Forest | - | 75.7% |
| Mukesh Kumari, Rajan Vohra, Anshul Arora et al. [16] | PIMA | Bayesian Network Classifier | WEKA | 70.60% |
| Mir, A., and Dhage, S. N. et al. [17] | PIMA | Naïve Bayes, SVM, CART, Random Forest | WEKA | 79.13% |
| Huma Naz and Sachin Ahuja et al. [18] | PIMA | ANN, DT, DL, NB | - | 90–98% |
| A. Iyer, J. S., and R. Sumbaly et al. [19] | PIMA | DT, NB | WEKA | 79% |
| Abdulhadi, N., and Al-mousa, A. et al. [13] | PIMA | RF, SVM, LDA, VC, SCV | Python | 82% |
| Chou, C.-Y., Hsu, D.-Y., and Chou, C.-H. et al. [14] | PIMA with patient IDs | LR, Bi-NN, RF, DT | Python | 95% |
Table 2.
Attribute descriptions.
| Attribute Name | Attribute Description | Min | Max | Mean | Std. Dev. |
|---|---|---|---|---|---|
| Pregnancies | Number of times pregnant | 0.00 | 17.00 | 3.845 | 3.37 |
| Glucose | 2-h plasma glucose concentration in an oral glucose tolerance test | 0.00 | 199.00 | 120.895 | 31.973 |
| Blood Pressure | Diastolic blood pressure (mm Hg) | 0.00 | 122.00 | 69.105 | 19.356 |
| Skin Thickness | Triceps skin fold thickness (mm) | 0.00 | 99.00 | 20.536 | 15.244 |
| Insulin | 2-h serum insulin (μU/mL) | 0.00 | 846.00 | 79.799 | 115.244 |
| BMI | Body mass index (weight in kg/(height in m)²) | 0.00 | 67.10 | 31.993 | 7.884 |
| Diabetes Pedigree Function | Diabetes pedigree function | 0.00 | 2.24 | 0.472 | 0.331 |
| Age | Age (years) | 0.00 | 81.00 | 33.241 | 11.78 |
| Class | Class variable (1 for positive, 0 for negative) | 0.00 | 1.00 | - | - |
Table 3.
Confusion matrix through WEKA using the decision tree algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 134 (TP) | 24 (FN) |
| Actual Negative | 24 (FP) | 48 (TN) |
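Reading the confusion matrix with rows as actual classes and columns as predictions gives TP = 134, FN = 24, FP = 24, TN = 48, from which the headline accuracy and error rate follow directly. A minimal Python sketch (variable names are illustrative):

```python
# Confusion-matrix counts for the WEKA decision tree (Table 3)
tp, fn, fp, tn = 134, 24, 24, 48

total = tp + fn + fp + tn        # 230 test instances
accuracy = (tp + tn) / total     # share of correctly classified instances
error_rate = (fp + fn) / total   # complementary misclassification share

print(f"accuracy   = {accuracy:.4f}")    # 0.7913
print(f"error rate = {error_rate:.4f}")  # 0.2087
```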
Table 4.
Actual class and ICI details achieved via Decision Tree through WEKA.
| Decision Tree | No. of Instances |
|---|---|
| Actual Class | 182 |
| Incorrectly Classified Instances | 48 |
| Total Number of Instances | 230 |
Table 5.
Decision Tree details of accuracy through WEKA.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.848 | 0.333 | 0.848 | 0.848 | 0.848 |
| 1 | 0.667 | 0.152 | 0.667 | 0.667 | 0.667 |
| Weighted Average | 0.791 | 0.277 | 0.791 | 0.791 | 0.791 |
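The weighted-average row is the class-size-weighted mean of the per-class rows; the supports 158 (class 0) and 72 (class 1) are the row sums of Table 3's confusion matrix. A quick check for the F-measure column:

```python
# Per-class F-measures from Table 5, class supports from Table 3
f_class0, f_class1 = 0.848, 0.667
n0, n1 = 158, 72  # test instances of class 0 and class 1

# Support-weighted mean of the per-class F-measures
weighted_f = (n0 * f_class0 + n1 * f_class1) / (n0 + n1)
print(round(weighted_f, 3))  # 0.791, matching the Weighted Average row
```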
Table 6.
Confusion matrix through Python using decision tree algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 120 (TP) | 38 (FN) |
| Actual Negative | 27 (FP) | 45 (TN) |
Table 7.
Actual class and ICI details achieved via Decision Tree through Python.
| Decision Tree | No. of Instances |
|---|---|
| Actual Class | 165 |
| Incorrectly Classified Instances | 65 |
| Total Number of Instances | 230 |
Table 8.
Decision Tree details of accuracy through Python.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.625 | 0.1257 | 0.82 | 0.76 | 0.79 |
| 1 | 0.8742 | 0.375 | 0.54 | 0.62 | 0.58 |
| Weighted Average | 0.7496 | 0.2503 | 0.73 | 0.72 | 0.72 |
Table 9.
Confusion matrix through WEKA using logistic regression algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 142 (TP) | 16 (FN) |
| Actual Negative | 29 (FP) | 43 (TN) |
Table 10.
Actual class and ICI details achieved via Logistic Regression through WEKA.
| Logistic Regression | No. of Instances |
|---|---|
| Actual Class | 185 |
| Incorrectly Classified Instances | 45 |
| Total Number of Instances | 230 |
Table 12.
Actual class and ICI details achieved via Logistic Regression through Python.
| Logistic Regression | No. of Instances |
|---|---|
| Actual Class | 183 |
| Incorrectly Classified Instances | 47 |
| Total Number of Instances | 230 |
Table 13.
Confusion matrix through WEKA using random forest algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 138 (TP) | 20 (FN) |
| Actual Negative | 34 (FP) | 38 (TN) |
Table 14.
Random Forest details of accuracy through WEKA.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.873 | 0.472 | 0.802 | 0.873 | 0.836 |
| 1 | 0.528 | 0.127 | 0.655 | 0.528 | 0.585 |
| Weighted Average | 0.765 | 0.364 | 0.756 | 0.765 | 0.758 |
Table 15.
Confusion matrix through Python using random forest algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 137 (TP) | 21 (FN) |
| Actual Negative | 28 (FP) | 45 (TN) |
Table 16.
Random Forest details of accuracy through Python.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.6111 | 0.1320 | 0.83 | 0.87 | 0.85 |
| 1 | 0.8679 | 0.3888 | 0.68 | 0.61 | 0.64 |
| Weighted Average | 0.7395 | 0.2604 | 0.78 | 0.79 | 0.78 |
Table 17.
Confusion matrix through WEKA using the KNN algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 126 (TP) | 32 (FN) |
| Actual Negative | 29 (FP) | 43 (TN) |
Table 18.
Actual class and ICI details achieved via KNN through WEKA.
| KNN | No. of Instances |
|---|---|
| Actual Class | 169 |
| Incorrectly Classified Instances | 61 |
| Total Number of Instances | 230 |
Table 19.
KNN details of accuracy through WEKA.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.797 | 0.403 | 0.813 | 0.797 | 0.805 |
| 1 | 0.597 | 0.203 | 0.573 | 0.597 | 0.585 |
| Weighted Average | 0.735 | 0.340 | 0.738 | 0.735 | 0.736 |
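Each F-measure entry is the harmonic mean of the precision and recall in the same row; for instance, the class-0 row of the WEKA KNN results can be checked as follows (a sanity-check sketch, not part of the original analysis):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)

# Class-0 row of Table 19: Prec = 0.813, Rec = 0.797
print(round(f_measure(0.813, 0.797), 3))  # 0.805, matching the F-M column
```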
Table 20.
Confusion matrix through Python using the KNN algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 124 (TP) | 34 (FN) |
| Actual Negative | 30 (FP) | 42 (TN) |
Table 21.
Actual class and ICI details achieved via KNN through Python.
| KNN | No. of Instances |
|---|---|
| Actual Class | 166 |
| Incorrectly Classified Instances | 64 |
| Total Number of Instances | 230 |
Table 22.
KNN details of accuracy through Python.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.5833 | 0.2138 | 0.81 | 0.79 | 0.80 |
| 1 | 0.7861 | 0.4166 | 0.55 | 0.58 | 0.57 |
| Weighted Average | 0.6847 | 0.3152 | 0.73 | 0.72 | 0.72 |
Table 23.
Confusion matrix through WEKA using Naïve Bayes Algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 133 (TP) | 25 (FN) |
| Actual Negative | 28 (FP) | 42 (TN) |
Table 24.
Actual class and ICI details achieved via Naïve Bayes through WEKA.
| Naïve Bayes | No. of Instances |
|---|---|
| Actual Class | 177 |
| Incorrectly Classified Instances | 53 |
| Total Number of Instances | 230 |
Table 25.
Naïve Bayes details of accuracy through WEKA.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.842 | 0.389 | 0.826 | 0.842 | 0.834 |
| 1 | 0.611 | 0.158 | 0.638 | 0.611 | 0.624 |
| Weighted Average | 0.770 | 0.317 | 0.767 | 0.770 | 0.768 |
Table 26.
Confusion matrix through Python using Naïve Bayes algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 132 (TP) | 26 (FN) |
| Actual Negative | 28 (FP) | 44 (TN) |
Table 27.
Actual class and ICI details achieved via Naïve Bayes through Python.
| Naïve Bayes | No. of Instances |
|---|---|
| Actual Class | 176 |
| Incorrectly Classified Instances | 54 |
| Total Number of Instances | 230 |
Table 28.
Naïve Bayes details of accuracy through Python.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.6111 | 0.1635 | 0.83 | 0.84 | 0.83 |
| 1 | 0.8364 | 0.3888 | 0.63 | 0.61 | 0.62 |
| Weighted Average | 0.7237 | 0.2762 | 0.76 | 0.77 | 0.77 |
Table 29.
Confusion matrix through WEKA using the SVM algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 143 (TP) | 15 (FN) |
| Actual Negative | 33 (FP) | 39 (TN) |
Table 30.
Actual class and ICI details achieved via SVM through WEKA.
| SVM | No. of Instances |
|---|---|
| Actual Class | 182 |
| Incorrectly Classified Instances | 48 |
| Total Number of Instances | 230 |
Table 31.
SVM details of accuracy through WEKA.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.905 | 0.458 | 0.813 | 0.905 | 0.856 |
| 1 | 0.542 | 0.095 | 0.722 | 0.542 | 0.619 |
| Weighted Average | 0.791 | 0.345 | 0.784 | 0.791 | 0.782 |
Table 32.
Confusion matrix through Python using Support Vector Machine algorithm.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 138 (TP) | 20 (FN) |
| Actual Negative | 28 (FP) | 44 (TN) |
Table 33.
Actual class and ICI details achieved via SVM through Python.
| SVM | No. of Instances |
|---|---|
| Actual Class | 182 |
| Incorrectly Classified Instances | 48 |
| Total Number of Instances | 230 |
Table 34.
SVM details of accuracy through Python.
| Class | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| 0 | 0.5604 | 0.1079 | 0.83 | 0.87 | 0.85 |
| 1 | 0.8930 | 0.4305 | 0.69 | 0.61 | 0.65 |
| Weighted Average | 0.7312 | 0.2687 | 0.79 | 0.79 | 0.79 |
Table 35.
Accuracy and inconsistency rate.
| Classification Algorithm | WEKA Accuracy | WEKA Inconsistency Rate | Python Accuracy | Python Inconsistency Rate |
|---|---|---|---|---|
| Decision Tree | 80.10% | 19.90% | 71.86% | 28.14% |
| Logistic Regression | 80.43% | 19.57% | 79.66% | 20.34% |
| Random Forest | 79.87% | 20.13% | 78.79% | 21.21% |
| Naïve Bayes | 76.56% | 23.44% | 76.62% | 23.38% |
| KNN | 73.38% | 26.62% | 72.30% | 27.70% |
| SVM | 80.00% | 20.00% | 79.22% | 20.78% |
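Each accuracy entry is the number of correctly classified instances divided by the 230 test instances, and the inconsistency rate is its complement. For example, Logistic Regression in WEKA classifies 185 of 230 instances correctly (Table 10):

```python
correct, total = 185, 230  # Logistic Regression via WEKA (Table 10)

accuracy = correct / total
inconsistency = 1 - accuracy
print(f"{accuracy:.2%}")       # 80.43%
print(f"{inconsistency:.2%}")  # 19.57%
```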
Table 36.
MCC, ROC area, and PRC area.
| Machine Learning Algorithm | WEKA MCC | WEKA ROC | WEKA PRC | Python MCC | Python ROC | Python PRC |
|---|---|---|---|---|---|---|
| Decision Tree | 0.515 | 0.839 | 0.837 | 0.372 | 0.693 | 0.642 |
| Logistic Regression | 0.527 | 0.848 | 0.859 | 0.514 | 0.842 | 0.694 |
| Random Forest | 0.428 | 0.832 | 0.845 | 0.493 | 0.827 | 0.707 |
| Naïve Bayes | 0.458 | 0.845 | 0.862 | 0.451 | 0.820 | 0.661 |
| KNN | 0.39 | 0.697 | 0.688 | 0.364 | 0.754 | 0.566 |
| SVM | 0.489 | 0.723 | 0.717 | 0.064 | 0.832 | 0.714 |
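The MCC column can be reproduced from the confusion-matrix counts. For the WEKA decision tree (Table 3, read as TP = 134, FN = 24, FP = 24, TN = 48), the Matthews correlation coefficient evaluates to 0.515, matching the first row above; a short sketch:

```python
import math

def mcc(tp: int, fn: int, fp: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

print(round(mcc(134, 24, 24, 48), 3))  # 0.515 (Decision Tree, WEKA)
```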
Table 37.
TPR, FPR, precision, recall, and F-measure through WEKA.
| Algorithm | TPR | FPR | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| DT | 0.791 | 0.277 | 0.791 | 0.791 | 0.791 |
| LR | 0.804 | 0.308 | 0.799 | 0.804 | 0.799 |
| RF | 0.765 | 0.364 | 0.756 | 0.765 | 0.758 |
| NB | 0.770 | 0.317 | 0.767 | 0.770 | 0.768 |
| KNN | 0.735 | 0.340 | 0.738 | 0.735 | 0.736 |
| SVM | 0.791 | 0.345 | 0.784 | 0.791 | 0.782 |
Table 38.
TPR, FPR, precision, recall, and F-measure through Python.
| Algorithm | TPR (Python) | FPR (Python) | Precision (Python) | Recall (Python) | F-Measure (Python) |
|---|---|---|---|---|---|
| DT | 0.7496 | 0.2503 | 0.73 | 0.72 | 0.72 |
| LR | 0.7496 | 0.2503 | 0.79 | 0.80 | 0.79 |
| RF | 0.7395 | 0.2604 | 0.78 | 0.79 | 0.78 |
| NB | 0.7237 | 0.2762 | 0.76 | 0.77 | 0.77 |
| KNN | 0.6847 | 0.3152 | 0.73 | 0.72 | 0.72 |
| SVM | 0.7312 | 0.2687 | 0.79 | 0.79 | 0.79 |