A New Under-Sampling Method to Face Class Overlap and Imbalance
Abstract
1. Introduction
- Algorithmic-level methods. These internally bias the learning process so as to compensate for the class imbalance.
- Data-level methods. These perform some sort of data preprocessing with the aim of reducing the imbalance ratio.
- Cost-sensitive methods. These incorporate distinct misclassification costs into the classification process and assign higher misclassification costs to the errors on the minority class.
- Ensemble-based techniques. These combine an ensemble algorithm with either a data-level or a cost-sensitive approach. In the first case, data preprocessing is performed before training the classifier, whereas in the second, the misclassification costs guide the training of the ensemble.
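The simplest data-level method in the taxonomy above is random under-sampling: delete majority-class instances at random until the classes are balanced. A minimal sketch on toy data (illustrative only, not tied to any particular library):

```python
import random

def random_under_sample(X, y, majority_label=0, seed=42):
    """Randomly drop majority-class instances until both classes
    contain the same number of instances (imbalance ratio of 1)."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority_label]
    mino = [i for i, label in enumerate(y) if label != majority_label]
    keep = sorted(set(rng.sample(maj, len(mino))) | set(mino))
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: 8 negative (majority) and 2 positive (minority) instances.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
Xb, yb = random_under_sample(X, y)  # 2 negatives kept, 2 positives kept
```

The price of this simplicity is the risk named throughout the paper: useful majority-class information may be discarded blindly, which is what guided under-sampling methods try to avoid.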
2. Resampling Algorithms to Face Class Imbalance
2.1. Neighborhood-Based Algorithms
2.2. Clustering-Based Algorithms
3. The DBMIST-US Algorithm
- Cleaning stage (lines 3–7). The DBSCAN clustering method [30] is applied to the set of negative instances to produce a noise-free subset.
- Core stage (line 8). An MST is built to get a core representation from the majority class. The result is a subset of the majority class with less dispersion than the set obtained in the previous stage.
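The cleaning stage can be approximated with a minimal neighbor-count filter in the spirit of DBSCAN's noise labelling (a simplified sketch, not the exact DBSCAN procedure of DBMIST-US; the `eps` and `min_pts` values are illustrative):

```python
import math

def density_clean(points, eps=1.5, min_pts=3):
    """Keep only points that have at least `min_pts` neighbors
    (themselves included) within radius `eps`; isolated points are
    treated as noise and removed, mimicking DBSCAN noise labelling."""
    kept = []
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_pts:
            kept.append(p)
    return kept

# A tight cluster of majority instances plus one far-away outlier.
cluster = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.8), (0.9, 0.4)]
noise = [(10.0, 10.0)]
clean = density_clean(cluster + noise)  # the outlier is dropped
```

Full DBSCAN additionally expands clusters from core points; this filter only reproduces the noise-removal effect that matters for the cleaning stage.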
Algorithm 1: DBMIST-US (pseudocode omitted). Output: DS′.

Algorithm 2: MSTGraph (pseudocode omitted).
3.1. Core Stage
- V, a set of vertices.
- E, a set of edges.
- A weight function w({v, u}) = d(v, u), where d(v, u) is the Euclidean distance between v and u.
- Choose an initial vertex v at random.
- Select the edge e = {v, u} with the lowest weight incident to v and mark v as visited. The next vertex to be analyzed is u.
- Repeat Step 2, now taking u as the initial vertex of the edge e = {u, z}, while there exists a vertex z that has not been visited yet.
- The MST is completed by backtracking over the already marked vertices until every vertex has been marked.
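The steps above can be sketched with a naive Prim's implementation over Euclidean edge weights (illustrative only; the paper's MSTGraph routine may differ in its bookkeeping):

```python
import math

def prim_mst(points, start=0):
    """Build a minimum spanning tree over `points` with Prim's
    algorithm: start from an arbitrary vertex and repeatedly add
    the cheapest edge reaching a not-yet-visited vertex."""
    n = len(points)
    visited = {start}
    edges = []  # (u, v) pairs forming the spanning tree
    while len(visited) < n:
        best = None  # (weight, u, v) of the cheapest crossing edge
        for u in visited:
            for v in range(n):
                if v in visited:
                    continue
                w = math.dist(points[u], points[v])
                if best is None or w < best[0]:
                    best = (w, u, v)
        _, u, v = best
        visited.add(v)
        edges.append((u, v))
    return edges

pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (1.0, 1.0)]
tree = prim_mst(pts)  # n - 1 = 3 edges
```

Scanning every visited-to-unvisited pair on each iteration keeps the code close to the verbal description; a production version would maintain per-vertex minimum distances instead.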
3.2. Time Complexity
- The computation of the incidence matrix from the n majority-class instances takes O(n²) steps.
- The computation of an MST based on Prim's algorithm takes O(n²) steps.
3.3. Differences between DBMIST-US and Related Works
4. Experimental Set-Up
- Q1. What is the classification performance of DBMIST-US in comparison to several state-of-the-art under-sampling algorithms?
- Q2. How robust is DBMIST-US across different classification models?
- Q3. What is the impact of each under-sampling algorithm on the imbalance ratio?
4.1. Data Sets
4.2. Reference Under-Sampling Algorithms and Classifiers
4.3. Evaluation Metrics
- True-positive rate (or sensitivity). It measures the proportion of positive instances correctly classified, TPR = TP/(TP + FN).
- True-negative rate (or specificity). It measures the proportion of negative instances correctly classified, TNR = TN/(TN + FP).
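Both rates follow directly from the confusion-matrix counts; a small helper (illustrative, including the geometric mean commonly used to summarize the two in imbalanced-data studies) might look like:

```python
def rates(tp, fn, fp, tn):
    """Sensitivity (TPR), specificity (TNR) and their geometric
    mean, computed straight from the confusion-matrix counts."""
    tpr = tp / (tp + fn)  # proportion of positives correctly classified
    tnr = tn / (tn + fp)  # proportion of negatives correctly classified
    return tpr, tnr, (tpr * tnr) ** 0.5

# Example counts: 40 of 50 positives and 80 of 100 negatives correct.
tpr, tnr, gmean = rates(tp=40, fn=10, fp=20, tn=80)
```

The geometric mean penalizes a classifier that sacrifices one class for the other, which plain accuracy hides under heavy imbalance.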
5. Results
5.1. Classification Performance Comparison with State-of-the-Art Methods
5.2. Statistical Significance Analysis
5.3. Evaluation of the Impact on the Imbalance Ratio
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
DBSCAN | Density-Based Spatial Clustering of Applications with Noise
MST | Minimum Spanning Tree |
DBMIST-US | DBSCAN and MST - Under-Sampling |
IR | Imbalance ratio |
kNN | k Nearest Neighbor |
SVM | Support Vector Machine |
SMOTE | Synthetic Minority Over-sampling Technique |
Appendix A. Classification Results
Data Set | Original | RUS | CNN | NCL | TL | ENN | OSS | EUS | EE | BC | RUSBOOST | SBC | ClusterOSS | DBMIST-US
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
yeast-0-5-6-7-9_vs_4 | 61.2 | 73.5 | 50.7 | 83.0 | 69.0 | 78.9 | 72.2 | 78.4 | 73.2 | 73.5 | 76.7 | 77.1 | 58.9 | 75.4 |
glass-0-1-6_vs_2 | 47.0 | 74.4 | 51.0 | 66.9 | 52.6 | 67.9 | 52.9 | 68.3 | 76.2 | 73.5 | 64.1 | 53.1 | 33.0 | 76.3 |
glass2 | 40.7 | 76.2 | 51.6 | 53.1 | 41.0 | 67.1 | 47.4 | 64.7 | 60.0 | 67.6 | 68.7 | 53.1 | 66.9 | 76.9 |
shuttle-c0-vs-c4 | 99.6 | 99.6 | 100.0 | 99.6 | 99.6 | 99.6 | 100.0 | 99.6 | 99.6 | 99.6 | 99.6 | 99.6 | 98.5 | 99.6
yeast-1_vs_7 | 62.3 | 74.8 | 46.2 | 76.8 | 67.6 | 70.4 | 69.9 | 66.6 | 59.6 | 58.1 | 65.9 | 67.2 | 62.4 | 69.2 |
glass4 | 86.6 | 79.9 | 65.3 | 91.3 | 86.6 | 87.7 | 83.6 | 80.7 | 84.3 | 84.3 | 87.4 | 86.6 | 56.9 | 92.1 |
ecoli4 | 86.3 | 95.0 | 81.4 | 89.4 | 86.3 | 92.2 | 86.2 | 92.5 | 100.0 | 100.0 | 92.2 | 91.9 | 98.9 | 100.0 |
page-blocks-1-3_vs_4 | 98.2 | 98.2 | 94.5 | 98.2 | 98.2 | 98.2 | 98.2 | 94.5 | 94.6 | 96.4 | 96.6 | 99.9 | 92.2 | 98.2 |
glass-0-1-6_vs_5 | 80.9 | 77.0 | 65.1 | 100.0 | 80.9 | 100.0 | 79.3 | 94.3 | 94.3 | 94.3 | 91.0 | 94.0 | 78.1 | 95.4 |
yeast-1-4-5-8_vs_7 | 44.1 | 64.5 | 39.7 | 62.7 | 47.9 | 48.0 | 54.2 | 69.7 | 69.3 | 60.0 | 65.1 | 57.0 | 64.9 | 61.3 |
glass5 | 81.1 | 94.3 | 75.2 | 93.8 | 81.2 | 94.0 | 79.8 | 88.2 | 72.0 | 72.0 | 91.2 | 93.3 | 44.3 | 92.7 |
yeast-2_vs_8 | 77.0 | 59.8 | 58.3 | 80.6 | 77.3 | 80.5 | 77.4 | 54.8 | 77.5 | 72.5 | 75.1 | 80.2 | 71.5 | 74.5 |
flare-F | 30.3 | 81.4 | 30.6 | 65.9 | 33.7 | 72.6 | 26.2 | 78.9 | 76.6 | 68.5 | 77.3 | 26.1 | 0.0 | 87.3 |
yeast4 | 58.8 | 83.3 | 43.7 | 80.1 | 62.2 | 71.3 | 58.9 | 84.3 | 77.4 | 82.4 | 77.1 | 78.6 | 75.4 | 79.8 |
yeast-1-2-8-9_vs_7 | 47.6 | 56.6 | 42.5 | 65.4 | 54.2 | 60.4 | 60.0 | 64.5 | 63.2 | 69.9 | 65.0 | 47.6 | 69.2 | 66.1 |
yeast5 | 82.1 | 100.0 | 65.5 | 96.3 | 87.6 | 94.1 | 88.7 | 93.2 | 92.0 | 96.5 | 94.6 | 82.1 | 92.5 | 97.1 |
ecoli-0-1-3-7_vs_2-6 | 83.7 | 78.2 | 48.8 | 84.4 | 84.0 | 84.4 | 84.2 | 92.6 | 71.4 | 71.4 | 81.9 | 84.2 | 83.7 | 87.3 |
abalone-17_vs_7-8-9-10 | 50.6 | 77.6 | 44.8 | 66.7 | 45.3 | 61.4 | 50.7 | 73.9 | 78.2 | 73.2 | 78.6 | 58.3 | 44.8 | 78.9 |
yeast6 | 71.1 | 69.9 | 54.8 | 86.1 | 79.0 | 82.7 | 80.6 | 81.0 | 75.1 | 81.4 | 83.7 | 71.2 | 88.0 | 81.1 |
poker-8-9_vs_5 | 19.9 | 53.1 | 14.8 | 20.0 | 20.0 | 20.0 | 20.0 | 61.2 | 45.3 | 63.5 | 65.1 | 34.5 | 64.0 | 80.0 |
subcl-0 | 86.2 | 89.9 | 66.1 | 91.1 | 90.4 | 95.1 | 87.8 | 94.5 | 90.4 | 92.5 | 89.5 | 91.7 | 88.6 | 98.1 |
subcl-30 | 72.0 | 72.5 | 58.0 | 78.9 | 75.3 | 83.4 | 75.5 | 74.5 | 83.5 | 77.0 | 77.3 | 83.3 | 87.2 | 90.6 |
subcl-50 | 60.5 | 69.0 | 46.0 | 73.6 | 68.2 | 81.9 | 68.3 | 74.0 | 71.5 | 81.0 | 74.5 | 80.6 | 86.6 | 89.3 |
subcl-70 | 52.5 | 70.5 | 35.9 | 72.3 | 62.7 | 82.4 | 62.0 | 79.8 | 72.5 | 73.5 | 71.6 | 74.0 | 85.5 | 93.0 |
clover-0 | 89.1 | 90.4 | 59.5 | 98.7 | 94.8 | 96.8 | 95.1 | 89.9 | 89.4 | 93.4 | 91.6 | 98.1 | 87.6 | 100.0 |
clover-30 | 80.4 | 78.0 | 50.3 | 86.8 | 81.6 | 88.0 | 81.9 | 78.5 | 80.9 | 76.9 | 82.6 | 87.6 | 80.4 | 96.3 |
clover-50 | 70.0 | 74.5 | 48.6 | 83.1 | 75.9 | 87.5 | 75.6 | 75.8 | 75.4 | 76.9 | 76.3 | 85.9 | 81.6 | 98.5 |
clover-70 | 62.5 | 73.9 | 41.1 | 80.4 | 71.9 | 87.4 | 75.3 | 77.5 | 74.9 | 77.0 | 73.4 | 78.7 | 85.0 | 99.1 |
paw-0 | 93.5 | 94.9 | 60.3 | 98.4 | 96.7 | 98.5 | 95.6 | 97.0 | 93.0 | 92.4 | 95.1 | 98.7 | 71.9 | 97.3 |
paw-30 | 78.3 | 77.5 | 51.9 | 86.7 | 83.5 | 91.8 | 83.7 | 83.0 | 83.5 | 77.9 | 83.0 | 87.5 | 79.4 | 94.0 |
paw-50 | 74.1 | 81.5 | 46.3 | 86.0 | 81.8 | 89.4 | 83.1 | 82.8 | 83.0 | 87.0 | 79.9 | 86.1 | 45.9 | 98.1 |
paw-70 | 64.0 | 77.0 | 45.0 | 78.3 | 74.7 | 89.8 | 75.8 | 74.6 | 79.0 | 77.5 | 76.8 | 82.1 | 86.2 | 98.5 |
Data Set | Original | RUS | CNN | NCL | TL | ENN | OSS | EUS | EE | BC | RUSBOOST | SBC | ClusterOSS | DBMIST-US
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
yeast-0-5-6-7-9_vs_4 | 65.1 | 70.6 | 60.2 | 70.6 | 65.8 | 74.5 | 64.3 | 73.2 | 70.5 | 68.6 | 77.0 | 74.3 | 71.3 | 76.7 |
glass-0-1-6_vs_2 | 52.3 | 60.0 | 43.8 | 66.7 | 53.1 | 53.3 | 58.3 | 46.7 | 57.6 | 45.6 | 63.5 | 52.0 | 4.6 | 64.8 |
glass2 | 53.1 | 90.7 | 22.4 | 57.6 | 33.2 | 58.0 | 52.4 | 55.2 | 73.0 | 61.1 | 65.2 | 58.2 | 65.0 | 72.5 |
shuttle-c0-vs-c4 | 100.0 | 100.0 | 99.6 | 99.9 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.9 | 99.9 | 97.0 | 100.0
yeast-1_vs_7 | 54.3 | 64.5 | 57.2 | 57.2 | 65.2 | 62.6 | 54.2 | 71.6 | 58.3 | 81.5 | 67.4 | 62.8 | 68.4 | 69.7 |
glass4 | 82.2 | 73.0 | 73.0 | 82.1 | 82.2 | 86.6 | 79.3 | 88.4 | 92.3 | 92.3 | 89.1 | 72.4 | 9.0 | 92.3 |
ecoli4 | 82.9 | 87.5 | 92.3 | 80.1 | 82.9 | 80.1 | 86.0 | 76.5 | 87.5 | 92.5 | 86.3 | 77.1 | 93.5 | 82.8 |
page-blocks-1-3_vs_4 | 96.1 | 100.0 | 94.5 | 99.8 | 99.8 | 96.1 | 97.4 | 98.2 | 96.4 | 94.5 | 97.5 | 97.1 | 93.8 | 98.2 |
glass-0-1-6_vs_5 | 99.4 | 88.2 | 95.3 | 99.7 | 93.7 | 99.7 | 97.1 | 100.0 | 94.3 | 94.3 | 92.5 | 99.7 | 79.0 | 92.3 |
yeast-1-4-5-8_vs_7 | 0.0 | 59.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 66.6 | 60.6 | 44.7 | 53.1 | 25.8 | 64.3 | 55.0 |
glass5 | 98.8 | 100.0 | 85.3 | 99.7 | 99.5 | 99.7 | 98.5 | 94.3 | 94.3 | 94.3 | 93.9 | 100.0 | 49.5 | 95.8 |
yeast-2_vs_8 | 22.4 | 61.2 | 69.2 | 22.3 | 0.0 | 31.6 | 22.4 | 74.8 | 71.4 | 76.5 | 72.1 | 31.6 | 67.1 | 73.0 |
flare-F | 15.2 | 85.9 | 32.7 | 62.4 | 0.0 | 56.8 | 0.0 | 84.8 | 86.0 | 79.5 | 84.2 | 0.0 | 0.0 | 90.5 |
yeast4 | 53.9 | 78.3 | 0.0 | 60.7 | 57.4 | 65.4 | 53.9 | 83.3 | 80.3 | 84.1 | 78.2 | 68.4 | 80.9 | 79.3 |
yeast-1-2-8-9_vs_7 | 48.2 | 53.7 | 47.9 | 44.6 | 44.6 | 44.6 | 44.6 | 59.9 | 54.2 | 69.7 | 67.3 | 40.8 | 68.4 | 62.8 |
yeast5 | 86.3 | 96.6 | 77.8 | 96.4 | 89.0 | 93.9 | 88.9 | 87.4 | 94.1 | 89.8 | 94.2 | 88.9 | 90.2 | 96.6 |
ecoli-0-1-3-7_vs_2-6 | 84.4 | 71.4 | 70.4 | 84.5 | 84.4 | 84.5 | 83.5 | 70.0 | 78.2 | 78.2 | 74.5 | 84.4 | 72.4 | 94.3 |
abalone-17_vs_7-8-9-10 | 34.7 | 75.0 | 0.0 | 43.3 | 34.6 | 39.3 | 32.1 | 78.8 | 75.0 | 74.0 | 75.5 | 57.0 | 45.9 | 76.0 |
yeast6 | 73.3 | 72.8 | 70.2 | 77.2 | 73.3 | 73.5 | 67.3 | 82.6 | 71.4 | 88.6 | 81.9 | 71.4 | 80.4 | 84.6 |
poker-8-9_vs_5 | 0.0 | 47.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 44.0 | 40.0 | 42.3 | 44.6 | 0.0 | 52.5 | 59.6 |
subcl-0 | 97.8 | 92.3 | 77.9 | 97.8 | 97.8 | 98.3 | 97.2 | 95.5 | 93.9 | 88.6 | 92.3 | 96.6 | 87.3 | 94.4 |
subcl-30 | 66.7 | 75.9 | 74.0 | 71.4 | 70.2 | 87.5 | 68.5 | 75.3 | 83.2 | 79.7 | 83.5 | 87.3 | 89.4 | 85.2 |
subcl-50 | 26.4 | 78.0 | 31.3 | 74.8 | 53.4 | 84.8 | 68.2 | 74.7 | 80.8 | 82.0 | 80.9 | 83.2 | 83.7 | 82.6 |
subcl-70 | 0.0 | 76.5 | 0.0 | 32.6 | 24.4 | 72.6 | 14.1 | 77.8 | 80.7 | 73.0 | 79.9 | 80.1 | 83.5 | 86.7 |
clover-0 | 63.3 | 85.6 | 11.6 | 65.1 | 66.7 | 66.9 | 66.6 | 83.4 | 77.5 | 78.2 | 81.5 | 92.7 | 82.1 | 83.9 |
clover-30 | 26.4 | 74.6 | 59.2 | 42.9 | 45.1 | 71.5 | 28.2 | 68.9 | 79.1 | 79.6 | 81.7 | 84.0 | 52.7 | 97.4 |
clover-50 | 0.0 | 71.8 | 0.0 | 59.1 | 47.9 | 64.3 | 10.0 | 77.7 | 82.1 | 75.5 | 75.8 | 84.2 | 78.4 | 94.7 |
clover-70 | 0.0 | 68.2 | 27.2 | 22.3 | 0.0 | 67.7 | 0.0 | 71.4 | 80.1 | 62.6 | 75.3 | 82.2 | 80.4 | 97.4 |
paw-0 | 80.4 | 93.9 | 39.0 | 65.9 | 65.1 | 71.6 | 80.1 | 91.2 | 89.0 | 90.3 | 92.2 | 97.2 | 35.1 | 94.6 |
paw-30 | 19.9 | 83.4 | 60.1 | 79.7 | 72.5 | 87.5 | 60.7 | 79.2 | 83.2 | 82.8 | 82.3 | 87.6 | 76.0 | 91.1 |
paw-50 | 28.2 | 80.6 | 52.0 | 82.5 | 51.8 | 87.5 | 51.5 | 86.7 | 86.9 | 85.6 | 82.0 | 89.0 | 30.0 | 94.4 |
paw-70 | 0.0 | 78.4 | 0.0 | 63.9 | 32.9 | 88.4 | 34.2 | 77.3 | 82.1 | 78.2 | 82.0 | 90.8 | 76.8 | 92.8 |
Data Set | Original | RUS | CNN | NCL | TL | ENN | OSS | EUS | EE | BC | RUSBOOST | SBC | ClusterOSS | DBMIST-US
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
yeast-0-5-6-7-9_vs_4 | 0.0 | 78.3 | 55.0 | 42.0 | 0.0 | 37.0 | 0.0 | 78.4 | 75.3 | 74.5 | 77.7 | 44.2 | 30.1 | 79.5 |
glass-0-1-6_vs_2 | 0.0 | 55.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 45.6 | 47.8 | 44.0 | 32.1 | 0.0 | 0.0 | 57.4 |
glass2 | 0.0 | 68.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 47.1 | 59.4 | 54.2 | 21.8 | 0.0 | 29.5 | 71.7 |
shuttle-c0-vs-c4 | 99.6 | 100.0 | 99.6 | 100.0 | 99.6 | 99.6 | 99.6 | 100.0 | 100.0 | 100.0 | 100.0 | 99.6 | 99.9 | 100.0
yeast-1_vs_7 | 0.0 | 81.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 83.3 | 74.1 | 74.8 | 64.5 | 0.0 | 12.7 | 76.6 |
glass4 | 39.2 | 68.8 | 40.7 | 39.2 | 39.2 | 39.2 | 39.2 | 84.3 | 84.3 | 84.3 | 87.6 | 39.2 | 0.0 | 92.0 |
ecoli4 | 63.2 | 92.5 | 92.3 | 74.2 | 63.2 | 74.2 | 70.7 | 97.5 | 97.5 | 100.0 | 94.9 | 74.2 | 99.0 | 100.0 |
page-blocks-1-3_vs_4 | 65.4 | 71.9 | 74.5 | 65.4 | 65.4 | 65.4 | 65.5 | 79.4 | 79.9 | 75.1 | 73.1 | 65.4 | 25.6 | 84.5 |
glass-0-1-6_vs_5 | 0.0 | 81.6 | 73.9 | 0.0 | 0.0 | 0.0 | 0.0 | 94.3 | 81.6 | 81.6 | 86.4 | 0.0 | 42.3 | 90.2 |
yeast-1-4-5-8_vs_7 | 0.0 | 60.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 58.9 | 67.8 | 49.4 | 47.3 | 0.0 | 60.5 | 62.0 |
glass5 | 0.0 | 88.2 | 75.2 | 0.0 | 0.0 | 0.0 | 0.0 | 88.2 | 81.6 | 81.6 | 81.8 | 0.0 | 43.1 | 91.2 |
yeast-2_vs_8 | 74.1 | 72.3 | 73.4 | 74.2 | 74.2 | 74.2 | 74.2 | 74.2 | 74.2 | 77.5 | 73.3 | 74.2 | 70.6 | 73.0 |
flare-F | 0.0 | 78.2 | 36.2 | 42.9 | 15.2 | 50.4 | 0.0 | 76.8 | 78.5 | 76.2 | 81.7 | 0.0 | 0.0 | 88.7 |
yeast4 | 0.0 | 82.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 85.2 | 82.3 | 84.3 | 80.9 | 0.0 | 24.6 | 82.3 |
yeast-1-2-8-9_vs_7 | 0.0 | 52.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 69.7 | 74.5 | 74.8 | 56.4 | 0.0 | 43.1 | 70.9 |
yeast5 | 0.0 | 98.9 | 70.9 | 33.7 | 0.0 | 39.9 | 30.2 | 92.9 | 94.1 | 94.1 | 95.8 | 0.0 | 92.6 | 96.3 |
ecoli-0-1-3-7_vs_2-6 | 84.1 | 78.2 | 78.7 | 84.2 | 84.0 | 84.2 | 83.5 | 92.6 | 84.5 | 84.5 | 84.1 | 84.0 | 82.8 | 92.6 |
abalone-17_vs_7-8-9-10 | 0.0 | 74.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 69.8 | 75.0 | 69.8 | 69.6 | 0.0 | 0.0 | 72.1 |
yeast6 | 0.0 | 78.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 88.4 | 81.3 | 87.1 | 87.7 | 0.0 | 86.1 | 86.3 |
poker-8-9_vs_5 | 0.0 | 51.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 44.9 | 37.9 | 35.8 | 30.1 | 0.0 | 2.5 | 62.1 |
subcl-0 | 0.0 | 51.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 54.5 | 47.0 | 56.8 | 0.0 | 0.0 | 0.0 | 67.5 |
subcl-30 | 0.0 | 41.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 54.0 | 41.9 | 50.4 | 0.0 | 0.0 | 76.3 | 59.4 |
subcl-50 | 0.0 | 47.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 50.3 | 48.5 | 52.6 | 0.0 | 0.0 | 77.3 | 62.3 |
subcl-70 | 0.0 | 52.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 53.2 | 43.0 | 54.8 | 0.0 | 0.0 | 67.6 | 50.3 |
clover-0 | 0.0 | 54.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 47.1 | 52.0 | 48.4 | 0.0 | 0.0 | 29.5 | 93.5 |
clover-30 | 0.0 | 55.8 | 27.2 | 0.0 | 0.0 | 0.0 | 0.0 | 50.5 | 58.9 | 58.6 | 0.0 | 0.0 | 2.2 | 54.0 |
clover-50 | 0.0 | 51.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 58.6 | 51.4 | 56.8 | 0.0 | 0.0 | 74.5 | 53.3 |
clover-70 | 0.0 | 57.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 57.4 | 57.4 | 41.4 | 0.0 | 0.0 | 80.0 | 61.2 |
paw-0 | 0.0 | 64.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 47.4 | 39.4 | 57.6 | 0.0 | 0.0 | 0.0 | 85.8 |
paw-30 | 0.0 | 57.5 | 49.6 | 0.0 | 0.0 | 0.0 | 0.0 | 59.8 | 43.0 | 45.0 | 0.0 | 0.0 | 32.4 | 50.6 |
paw-50 | 0.0 | 61.4 | 51.3 | 0.0 | 0.0 | 0.0 | 0.0 | 62.0 | 54.1 | 56.7 | 0.0 | 0.0 | 0.0 | 61.9 |
paw-70 | 0.0 | 49.4 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 55.9 | 55.3 | 49.6 | 0.0 | 0.0 | 77.0 | 27.1 |
References
- Sun, Y.; Kamel, M.S.; Wong, A.K.; Wang, Y. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit. 2007, 40, 3358–3378. [Google Scholar] [CrossRef]
- Codetta-Raiteri, D.; Portinale, L. Dynamic Bayesian networks for fault detection, identification, and recovery in autonomous spacecraft. IEEE Trans. Syst. Man Cybern. Syst. 2015, 45, 13–24. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhou, Z.H. Cost-sensitive face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1758–1769. [Google Scholar] [CrossRef] [PubMed]
- Liu, C.L.; Hsaio, W.H.; Lee, C.H.; Tao, C.; Kuo, T.H. Semi-supervised text classification with universum learning. IEEE Trans. Cybern. 2015, 46, 462–473. [Google Scholar] [CrossRef]
- Gopalakrishnan, V.; Ramaswamy, C. Sentiment learning from imbalanced dataset: An ensemble based method. Int. J. Artif. Intell. 2014, 12, 75–87. [Google Scholar]
- García, V.; Marqués, A.I.; Sánchez, J.S. Improving risk predictions by preprocessing imbalanced credit data. In Proceedings of the 19th International Conference on Neural Information Processing, Doha, Qatar, 12–15 November 2012; pp. 68–75. [Google Scholar]
- Fernández, A.; García, S.; Galar, M.; Prati, R.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Cham, Switzerland, 2018; Volume 1. [Google Scholar]
- Sáez, J.A.; Luengo, J.; Stefanowski, J.; Herrera, F. SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inf. Sci. 2015, 291, 184–203. [Google Scholar] [CrossRef]
- García, V.; Alejo, R.; Sánchez, J.S.; Sotoca, J.M.; Mollineda, R.A. Combined effects of class imbalance and class overlap on instance-based classification. In Proceedings of the 6th International Conference on Intelligent Data Engineering and Automated Learning, Burgos, Spain, 20–23 September 2006; pp. 371–378. [Google Scholar]
- Gupta, S.; Gupta, A. Handling class overlapping to detect noisy instances in classification. Knowl. Eng. Rev. 2018, 33, e8. [Google Scholar] [CrossRef]
- Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. Supervised neural network modeling: An empirical investigation into learning from imbalanced data with labeling errors. IEEE Trans. Neural Netw. 2010, 21, 813–830. [Google Scholar] [CrossRef]
- Van Hulse, J.; Khoshgoftaar, T.M.; Napolitano, A. A novel noise filtering algorithm for imbalanced data. In Proceedings of the 9th International Conference on Machine Learning and Applications, Washington, DC, USA, 12–14 December 2010; pp. 9–14. [Google Scholar]
- Muhlenbach, F.; Lallich, S.; Zighed, D.A. Identifying and handling mislabelled instances. J. Intell. Inf. Syst. 2004, 22, 89–109. [Google Scholar] [CrossRef]
- Dong, X.; He, H.; Li, C.; Liu, Y.; Xiong, H. Scene-based big data quality management framework. In Data Science; Springer: Singapore, 2018; pp. 122–139. [Google Scholar]
- Barandela, R.; Sánchez, J.; García, V.; Rangel, E. Strategies for learning in class imbalance problems. Pattern Recognit. 2003, 36, 849–851. [Google Scholar] [CrossRef]
- Ofek, N.; Rokach, L.; Stern, R.; Shabtai, A. Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem. Neurocomputing 2017, 243, 88–102. [Google Scholar] [CrossRef]
- Yen, S.J.; Lee, Y.S. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst. Appl. 2009, 36, 5718–5727. [Google Scholar] [CrossRef]
- Napierała, K.; Stefanowski, J.; Wilk, S. Learning from imbalanced data in the presence of noisy and borderline examples. In Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing, Warsaw, Poland, 28–30 June 2010; pp. 158–167. [Google Scholar]
- Alejo, R.; Valdovinos, R.M.; García, V.; Pacheco-Sanchez, J.H. A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios. Pattern Recognit. Lett. 2013, 34, 380–388. [Google Scholar] [CrossRef] [Green Version]
- García, V.; Sánchez, J.S.; Mollineda, R.A. An empirical study of the behavior of classifiers on imbalanced and overlapped data sets. In Proceedings of the 5th Iberoamerican Congress on Pattern Recognition, Valparaiso, Chile, 13–16 November 2007; pp. 397–406. [Google Scholar]
- Van-Hulse, J.; Khoshgoftaar, T. Knowledge discovery from imbalanced and noisy data. Data Knowl. Eng. 2009, 68, 1513–1542. [Google Scholar] [CrossRef]
- Hart, P. The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 1968, 14, 515–516. [Google Scholar] [CrossRef]
- Tomek, I. Two modifications of CNN. IEEE Trans. Syst. Man Cybern. 1976, SMC-6, 769–772. [Google Scholar] [CrossRef] [Green Version]
- Laurikkala, J. Improving identification of difficult small classes by balancing class distribution. In Proceedings of the 8th Conference on Artificial Intelligence in Medicine, Cascais, Portugal, 1–4 July 2001; pp. 63–66. [Google Scholar]
- Branco, P.; Torgo, L.; Ribeiro, R.P. A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. 2016, 49, 1–50. [Google Scholar] [CrossRef]
- Lin, W.C.; Tsai, C.F.; Hu, Y.H.; Jhang, J.S. Clustering-based undersampling in class-imbalanced data. Inf.Sci. 2017, 409–410, 17–26. [Google Scholar] [CrossRef]
- Drummond, C.; Holte, R.C. C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In Proceedings of the Workshop on Learning from Imbalanced Datasets II, Washington, DC, USA, 21 August 2003; Volume 11. [Google Scholar]
- García, V.; Sánchez, J.S.; Marqués, A.I.; Florencia, R.; Rivera, G. Understanding the apparent superiority of over-sampling through an analysis of local information for class-imbalanced data. Expert Syst. Appl. 2019, 1–19. [Google Scholar] [CrossRef]
- Basgall, M.J.; Hasperué, W.; Naiouf, M.; Fernández, A.; Herrera, F. An analysis of local and global solutions to address big data imbalanced classification: A case study with SMOTE Preprocessing. In Cloud Computing and Big Data; Springer International Publishing: Cham, Switzerland, 2019; pp. 75–85. [Google Scholar]
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; AAAI Press: Portland, OR, USA, 1996; pp. 226–231. [Google Scholar]
- Ijaz, M.F.; Attique, M.; Son, Y. Data-driven cervical cancer prediction model with outlier detection and over-sampling methods. Sensors 2020, 20, 2809. [Google Scholar] [CrossRef]
- García, S.; Derrac, J.; Cano, J.; Herrera, F. Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 417–435. [Google Scholar] [CrossRef] [PubMed]
- Wilson, D.L. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 1972, SMC-2, 408–421. [Google Scholar] [CrossRef] [Green Version]
- Tomek, I. An experiment with the edited nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. 1976, SMC-6, 448–452. [Google Scholar] [CrossRef]
- Kubat, M.; Matwin, S. Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA, 8–12 July 1997; pp. 179–186. [Google Scholar]
- Longadge, R.; Dongre, S.S.; Malik, L. Multi-cluster based approach for skewed data in data mining. IOSR J. Comput. Eng. 2013, 12, 66–73. [Google Scholar] [CrossRef]
- Barella, V.H.; Costa, E.P.; Carvalho, A.C.P.L.F. ClusterOSS: A new undersampling method for imbalanced learning. In Proceedings of the 3rd Brazilian Conference on Intelligent Systems, São Carlos, Brazil, 18–23 October 2014; pp. 453–458. [Google Scholar]
- Sowah, R.A.; Agebure, M.A.; Mills, G.A.; Koumadi, K.M.; Fiawoo, S.Y. New cluster undersampling technique for class imbalance learning. Int. J. Mach. Learn. Comput. 2016, 6, 205. [Google Scholar] [CrossRef] [Green Version]
- Das, B.; Krishnan, N.C.; Cook, D.J. Handling imbalanced and overlapping classes in smart environments prompting dataset. In Data Mining for Service; Springer: Berlin/Heidelberg, Germany, 2014; pp. 199–219. [Google Scholar]
- Tsai, C.F.; Lin, W.C.; Hu, Y.H.; Yao, G.T. Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Inf. Sci. 2019, 477, 47–54. [Google Scholar] [CrossRef]
- Ng, W.W.Y.; Hu, J.; Yeung, D.S.; Yin, S.; Roli, F. Diversified sensitivity-based undersampling for imbalance classification problems. IEEE Trans. Cybern. 2015, 45, 2402–2412. [Google Scholar] [CrossRef] [PubMed]
- Sun, Z.; Song, Q.; Zhu, X.; Sun, H.; Xu, B.; Zhou, Y. A novel ensemble method for classifying imbalanced data. Pattern Recognit. 2015, 48, 1623–1637. [Google Scholar] [CrossRef]
- Kim, H.J.; Jo, N.O.; Shin, K.S. Optimization of cluster-based evolutionary undersampling for the artificial neural networks in corporate bankruptcy prediction. Expert Syst. Appl. 2016, 59, 226–234. [Google Scholar] [CrossRef]
- Smiti, A.; Elouedi, Z. DBSCAN-GM: An improved clustering method based on Gaussian means and DBSCAN techniques. In Proceedings of the IEEE 16th International Conference on Intelligent Engineering Systems, Lisbon, Portugal, 13–15 June 2012; pp. 573–578. [Google Scholar]
- Prim, R.C. Shortest connection networks and some generalizations. Bell Syst. Tech. J. 1957, 36, 1389–1401. [Google Scholar] [CrossRef]
- Torres, M.; Paz, K.; Salazar, F. Tamaño de una muestra para una investigación de mercado. Boletín Electrónico 2006, 2, 1–13. [Google Scholar]
- Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
- Suthar, N.; Indr, P.; Vinit, P. A technical survey on DBSCAN clustering algorithm. Int. J. Sci. Eng. Res. 2013, 4, 1775–1781. [Google Scholar]
- Chen, L.; Fang, B.; Shang, Z.; Tang, Y. Tackling class overlap and imbalance problems in software defect prediction. Software Qual. J. 2018, 26, 97–125. [Google Scholar] [CrossRef]
- Koziarski, M.; Wożniak, M. CCR: A combined cleaning and resampling algorithm for imbalanced data classification. Int. J. Appl. Math. Comput. Sci. 2017, 27, 727–736. [Google Scholar] [CrossRef] [Green Version]
- Xiao, L.; Gao, M.; Su, X. An under-sampling ensemble classification algorithm based on fuzzy C-means clustering for imbalanced data. Data Anal. Knowl. Discov. 2019, 3, 90–96. [Google Scholar] [CrossRef]
- Liang, J.; Bai, L.; Dang, C.; Cao, F. The K-means-type algorithms versus imbalanced data distributions. IEEE Trans. Fuzzy Syst. 2012, 20, 728–745. [Google Scholar] [CrossRef]
- García, V.; Sánchez, J.S.; Martín-Félez, R.; Mollineda, R.A. Surrounding neighborhood-based SMOTE for learning from imbalanced data sets. Progr. Artif. Intell. 2012, 1, 347–362. [Google Scholar] [CrossRef] [Green Version]
- Sanguanmak, Y.; Hanskunatai, A. DBSM: The combination of DBSCAN and SMOTE for imbalanced data classification. In Proceedings of the 13th International Joint Conference on Computer Science and Software Engineering, Khon Kaen, Thailand, 13–15 July 2016; pp. 1–5. [Google Scholar]
- Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. DBSMOTE: Density-based synthetic minority over-sampling technique. Appl. Intell. 2012, 36, 664–684. [Google Scholar] [CrossRef]
- Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. MUTE: Majority under-sampling technique. In Proceedings of the 8th International Conference on Information, Communications & Signal Processing, Singapore, 13–16 December 2011; pp. 1–4. [Google Scholar]
- Bunkhumpornpat, C.; Sinapiromsaran, K. DBMUTE: Density-based majority under-sampling technique. Knowl. Inf. Syst. 2017, 50, 827–850. [Google Scholar] [CrossRef]
- García, S.; Herrera, F. Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evol. Comput. 2009, 17, 275–306. [Google Scholar] [CrossRef]
- Liu, X.; Wu, J.; Zhou, Z. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2009, 39, 539–550. [Google Scholar] [CrossRef]
- Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2010, 40, 185–197. [Google Scholar] [CrossRef]
- Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Cambridge, MA, USA, 2017. [Google Scholar]
- García, V.; Marqués, A.I.; Sánchez, J.S. Exploring the synergetic effects of sample types on the performance of ensembles for credit risk and corporate bankruptcy prediction. Inf. Fusion 2019, 47, 88–101. [Google Scholar] [CrossRef]
- Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2012, 42, 463–484. [Google Scholar] [CrossRef]
- García, V.; Mollineda, R.A.; Sánchez, J.S. A bias correction function for classification performance assessment in two-class imbalanced problems. Knowl. Based Syst. 2014, 59, 66–74. [Google Scholar] [CrossRef]
Data Set | Class Distribution | #Instances | IR | |
---|---|---|---|---|
1 | yeast-0-5-6-7-9_vs_4 | 51–477 | 528 | 9.35 |
2 | glass-0-1-6_vs_2 | 17–175 | 192 | 10.29 |
3 | glass2 | 17–197 | 214 | 11.59 |
4 | shuttle-c0-vs-c4 | 123–1706 | 1829 | 13.87 |
5 | yeast-1_vs_7 | 30–429 | 459 | 14.30 |
6 | glass4 | 13–201 | 214 | 15.47 |
7 | ecoli4 | 20–316 | 336 | 15.80 |
8 | page-blocks-1-3_vs_4 | 28–444 | 472 | 15.86 |
9 | glass-0-1-6_vs_5 | 9–175 | 184 | 19.44 |
10 | yeast-1-4-5-8_vs_7 | 30–663 | 693 | 22.10 |
11 | glass5 | 9–205 | 214 | 22.78 |
12 | yeast-2_vs_8 | 20–462 | 482 | 23.10 |
13 | flare-F | 43–1023 | 1066 | 23.79 |
14 | yeast4 | 51–1433 | 1484 | 28.10 |
15 | yeast-1-2-8-9_vs_7 | 30–917 | 947 | 30.57 |
16 | yeast5 | 44–1440 | 1484 | 32.73 |
17 | ecoli-0-1-3-7_vs_2-6 | 7–274 | 281 | 39.14 |
18 | abalone-17_vs_7-8-9-10 | 58–2280 | 2338 | 39.31 |
19 | yeast6 | 35–1449 | 1484 | 41.40 |
20 | poker-8-9_vs_5 | 25–2050 | 2075 | 82.00 |
Method | Description |
---|---|
Random under-sampling (RUS) | It balances the data set through the random elimination of instances that belong to the over-sized class. |
Evolutionary under-sampling (EUS) | It consists of removing instances in the majority class by using the guide of a genetic-based algorithm [58]. |
Easy ensemble (EE) | The data set is divided into several subsets by random resampling with replacement, and each subset is then used to train a base classifier of the ensemble with AdaBoost [59]. |
Balance cascade (BC) | It performs bagging and removes negative instances that can be classified correctly with high confidence from future selections [59]. |
Random under-sampling boosting (RUSBOOST) | It combines RUS with the AdaBoost algorithm [60]. |
 | Predicted as Positive | Predicted as Negative
---|---|---
Actually positive | TP | FN |
Actually negative | FP | TN |
Data Set | Original | RUS | CNN | NCL | TL | ENN | OSS | EUS | EE | BC | RUSBOOST | SBC | ClusterOSS | DBMIST-US
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
yeast-0-5-6-7-9_vs_4 | 9.4 | 1.0 | 1.7 | 8.0 | 9.1 | 8.2 | 7.3 | 1.0 | 1.0 | 1.0 | 1.6 | 7.5 | 0.4 | 1.0 |
glass-0-1-6_vs_2 | 19.3 | 1.0 | 2.2 | 8.7 | 10.1 | 8.4 | 9.6 | 1.0 | 1.0 | 1.0 | 1.0 | 8.9 | 4.2 | 1.1 |
glass2 | 11.6 | 1.0 | 2.0 | 10.1 | 11.4 | 9.8 | 10.5 | 1.0 | 1.0 | 1.0 | 1.5 | 9.9 | 1.8 | 1.1 |
shuttle-c0-vs-c4 | 13.9 | 1.0 | 0.1 | 13.8 | 13.9 | 13.8 | 0.3 | 1.0 | 1.0 | 1.0 | 1.5 | 13.8 | 0.4 | 1.2
yeast-1_vs_7 | 14.3 | 1.0 | 2.8 | 12.1 | 14.0 | 12.5 | 12.1 | 1.0 | 1.0 | 1.0 | 1.5 | 11.4 | 2.0 | 1.0 |
glass4 | 15.5 | 1.0 | 1.0 | 14.5 | 15.5 | 14.9 | 4.2 | 1.0 | 1.0 | 1.0 | 1.5 | 15.6 | 5.0 | 1.1 |
ecoli4 | 15.8 | 1.0 | 1.0 | 15.2 | 15.8 | 15.4 | 10.3 | 1.0 | 1.0 | 1.0 | 1.5 | 14.9 | 0.9 | 1.1 |
page-blocks-1-3_vs_4 | 15.9 | 1.0 | 1.0 | 15.3 | 15.8 | 15.2 | 4.5 | 1.0 | 1.0 | 1.0 | 1.5 | 14.7 | 3.0 | 1.0 |
glass-0-1-6_vs_5 | 19.4 | 1.0 | 1.2 | 18.3 | 19.3 | 18.7 | 5.9 | 1.0 | 1.0 | 1.0 | 1.4 | 18.7 | 2.0 | 1.1 |
yeast-1-4-5-8_vs_7 | 22.1 | 1.0 | 4.1 | 19.6 | 21.8 | 19.8 | 21.8 | 1.0 | 1.0 | 1.0 | 1.5 | 16.9 | 2.3 | 1.0 |
glass5 | 22.8 | 1.0 | 1.2 | 21.6 | 22.7 | 22.0 | 7.6 | 1.0 | 1.0 | 1.0 | 1.4 | 22.0 | 3.8 | 1.1 |
yeast-2_vs_8 | 23.1 | 1.0 | 2.4 | 21.9 | 22.9 | 22.0 | 21.2 | 1.0 | 1.0 | 1.0 | 1.5 | 21.0 | 0.8 | 1.0 |
flare-F | 23.8 | 1.0 | 3.1 | 22.0 | 23.7 | 21.8 | 23.7 | 1.0 | 1.0 | 1.0 | 1.5 | 23.2 | 0.1 | 1.1 |
yeast4 | 28.1 | 1.0 | 2.5 | 26.4 | 27.8 | 26.6 | 24.7 | 1.0 | 1.0 | 1.0 | 1.5 | 22.1 | 0.3 | 1.1 |
yeast-1-2-8-9_vs_7 | 30.6 | 1.0 | 4.3 | 28.3 | 30.2 | 28.4 | 30.1 | 1.0 | 1.0 | 1.0 | 1.5 | 29.6 | 0.4 | 1.0 |
yeast5 | 32.7 | 1.0 | 1.1 | 32.0 | 32.6 | 32.0 | 25.0 | 1.0 | 1.0 | 1.0 | 1.5 | 32.0 | 0.3 | 1.1 |
ecoli-0-1-3-7_vs_2-6 | 39.1 | 1.0 | 2.1 | 38.1 | 38.9 | 38.0 | 17.3 | 1.0 | 1.0 | 1.0 | 1.4 | 37.6 | 2.2 | 1.3 |
abalone-17_vs_7-8-9-10 | 39.3 | 1.0 | 2.5 | 37.5 | 39.1 | 38.2 | 37.3 | 1.1 | 1.0 | 1.0 | 1.5 | 31.0 | 0.3 | 1.1 |
yeast6 | 41.4 | 1.0 | 2.9 | 40.0 | 41.1 | 39.9 | 35.5 | 1.0 | 1.0 | 1.0 | 1.5 | 40.2 | 0.4 | 1.1 |
poker-8-9_vs_5 | 82.0 | 1.0 | 5.1 | 79.4 | 81.7 | 80.5 | 80.6 | 1.0 | 1.0 | 1.0 | 1.5 | 69.2 | 0.7 | 1.1 |
subcl-0 | 7.0 | 1.0 | 1.3 | 6.6 | 6.9 | 6.5 | 6.3 | 1.0 | 1.0 | 1.0 | 1.5 | 5.7 | 0.5 | 1.0 |
subcl-30 | 7.0 | 1.0 | 1.3 | 6.4 | 6.8 | 5.9 | 6.7 | 1.0 | 1.0 | 1.0 | 1.5 | 4.7 | 0.6 | 0.9 |
subcl-50 | 7.0 | 1.0 | 1.4 | 6.3 | 6.6 | 5.7 | 6.5 | 1.0 | 1.0 | 1.0 | 1.5 | 4.6 | 1.5 | 1.0 |
subcl-70 | 7.0 | 1.0 | 1.4 | 6.2 | 6.6 | 5.6 | 6.6 | 1.0 | 1.0 | 1.0 | 1.5 | 4.6 | 0.5 | 1.1 |
clover-0 | 7.0 | 1.0 | 1.4 | 6.6 | 6.9 | 6.6 | 6.5 | 1.0 | 1.0 | 1.0 | 1.5 | 5.5 | 1.4 | 0.9 |
clover-30 | 7.0 | 1.0 | 1.1 | 6.5 | 6.8 | 6.2 | 6.5 | 1.0 | 1.0 | 1.0 | 1.5 | 5.1 | 0.7 | 1.2 |
clover-50 | 7.0 | 1.0 | 1.3 | 6.3 | 6.7 | 5.8 | 6.2 | 1.0 | 1.0 | 1.0 | 1.5 | 5.1 | 0.4 | 1.0 |
clover-70 | 7.0 | 1.0 | 1.3 | 6.2 | 6.6 | 5.6 | 6.0 | 1.0 | 1.0 | 1.0 | 1.5 | 4.7 | 1.2 | 1.1 |
paw-0 | 7.0 | 1.0 | 2.2 | 6.8 | 7.0 | 6.7 | 7.0 | 1.0 | 1.0 | 1.0 | 1.5 | 6.3 | 0.1 | 0.9 |
paw-30 | 7.0 | 1.0 | 1.1 | 6.6 | 6.8 | 6.2 | 6.3 | 1.0 | 1.0 | 1.0 | 1.5 | 5.2 | 0.7 | 0.8 |
paw-50 | 7.0 | 1.0 | 1.0 | 6.5 | 6.7 | 6.0 | 6.1 | 1.0 | 1.0 | 1.0 | 1.5 | 5.4 | 0.2 | 1.0 |
paw-70 | 7.0 | 1.0 | 1.1 | 6.3 | 6.6 | 5.8 | 5.8 | 1.0 | 1.0 | 1.0 | 1.5 | 5.2 | 1.2 | 1.1 |
% Avg. reduction | — | 91.55 | 86.98 | 8.55 | 3.28 | 10.18 | 23.24 | 91.51 | 91.55 | 91.55 | 87.34 | 17.33 | 90.26 | 91.29
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Guzmán-Ponce, A.; Valdovinos, R.M.; Sánchez, J.S.; Marcial-Romero, J. A New Under-Sampling Method to Face Class Overlap and Imbalance. Appl. Sci. 2020, 10, 5164. https://doi.org/10.3390/app10155164