Evolutionary Mahalanobis Distance-Based Oversampling for Multi-Class Imbalanced Data Classification
Abstract
1. Introduction
- 1) A novel oversampling approach called EMDO is proposed for multi-class imbalanced data problems. Unlike the MDO and AMDO approaches, which use only one ellipsoid, EMDO learns multiple ellipsoids in parallel to approximate the decision region of the target minority class samples.
- 2) Multi-objective particle swarm optimization (MOPSO) is used along with the Gustafson-Kessel algorithm (GKA) in EMDO to optimize the parameters of the ellipsoids, including their centers, orientations, and sizes, so that the ellipsoids approximate the decision regions of the target class with reasonable accuracy.
- 3) Synthetic minority samples are generated based on the Mahalanobis distance within every ellipsoid learned by EMDO. A novel adaptive approach is proposed to determine the number of synthetic minority samples to generate, based on the density of minority samples in every ellipsoid.
- 4) EMDO was evaluated and found to perform better than other widely used oversampling schemes.
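The third contribution relies on generating samples by Mahalanobis distance inside an ellipsoid. As a minimal sketch of that idea for a single ellipsoid, the code below draws a synthetic point that lies at the same Mahalanobis distance from the ellipsoid center as a chosen seed sample. The function names, the toy center and covariance values, and the choice to preserve the seed's distance exactly are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def mahalanobis(x, center, cov):
    """Mahalanobis distance of x from center under the matrix cov."""
    diff = x - center
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def sample_same_distance(seed, center, cov, rng):
    """Draw a point whose Mahalanobis distance from center equals the seed's.

    Assumption: cov is symmetric positive definite, so it has a
    Cholesky factor L with cov = L @ L.T.
    """
    d = mahalanobis(seed, center, cov)
    L = np.linalg.cholesky(cov)
    u = rng.standard_normal(center.shape[0])
    u /= np.linalg.norm(u)          # random unit direction in whitened space
    return center + d * (L @ u)     # map back onto the ellipsoid contour

rng = np.random.default_rng(0)
center = np.array([1.0, 2.0])                 # hypothetical ellipsoid center
cov = np.array([[2.0, 0.6], [0.6, 1.0]])      # hypothetical shape matrix
seed = np.array([2.0, 2.5])                   # hypothetical minority sample

synthetic = sample_same_distance(seed, center, cov, rng)
print(mahalanobis(seed, center, cov), mahalanobis(synthetic, center, cov))
```

Because the direction is drawn in the whitened space and mapped back through the Cholesky factor, the synthetic point stays on the seed's elliptical contour, so the two printed distances agree up to floating-point error.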
2. Problem Statement and GKA
3. Multi-Objective Optimization in EMDO
4. Determining Number of Ellipsoids
5. Generating Synthetic Samples
6. Simulation
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Data Set | Size | Attributes | Classes | Class Distribution | IRmax | No. of Ellipsoids |
---|---|---|---|---|---|---|
Balance | 625 | 4 | 3 | 288/49/288 | 5.88 | NA/4/NA |
Hayes-Roth | 132 | 4 | 3 | 51/51/30 | 1.7 | NA/NA/3 |
New-Thyroid | 215 | 5 | 3 | 150/35/30 | 5 | NA/3/3 |
Page-Blocks | 5472 | 10 | 5 | 4913/329/28/87/115 | 175.46 | NA/9/4/6/9 |
Dermatology | 358 | 34 | 6 | 111/60/71/48/48/20 | 5.55 | NA/4/4/3/3/3 |
Breast-Tissue | 106 | 9 | 6 | 21/15/18/16/14/22 | 1.57 | NA/NA/NA/NA/4/NA |
User-Knowledge-Modelling (UKM) | 403 | 5 | 4 | 50/102/129/122 | 2.58 | 4/NA/NA/NA |
Vertebral-Column | 310 | 6 | 3 | 60/150/100 | 2.5 | 6/NA/NA |
Ecoli | 327 | 7 | 5 | 143/77/52/35/20 | 7.15 | NA/6/4/3/3 |
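The IRmax values in the table above are consistent with the ratio of the largest class size to the smallest class size. A minimal sketch, assuming that definition (the function name `ir_max` is hypothetical), checked against two rows of the table:

```python
def ir_max(class_counts):
    """Maximum imbalance ratio: largest class size over smallest class size."""
    return max(class_counts) / min(class_counts)

balance = [288, 49, 288]                  # class distribution of Balance
page_blocks = [4913, 329, 28, 87, 115]    # class distribution of Page-Blocks

print(round(ir_max(balance), 2))      # 5.88
print(round(ir_max(page_blocks), 2))  # 175.46
```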
Data Set | Baseline | SSMOTE | GCS | ABNC | OSMOTE | MDO | MDO+ | AMDO | EMDO |
---|---|---|---|---|---|---|---|---|---|
Balance | 0.00 ± 0.00 | 8.44 ± 9.18 | 6.44 ± 9.83 | 2.22 ± 4.97 | 12.44 ± 13.41 | 2.00 ± 4.47 | 0.00 ± 0.00 | 10.22 ± 0.50 | 20.41 ± 10.95 |
Hayes-Roth | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 |
New-Thyroid | 83.33 ± 11.79 | 90.00 ± 14.91 | 93.33 ± 9.13 | 93.33 ± 9.13 | 90.00 ± 9.13 | 83.33 ± 16.67 | 96.67 ± 7.45 | 100.00 ± 0.00 | 100.00 ± 0.00 |
Page-Blocks | 82.67 ± 11.88 | 78.67 ± 6.91 | 93.33 ± 14.91 | 72.67 ± 24.99 | 93.33 ± 9.13 | 75.33 ± 16.26 | 93.33 ± 9.13 | 96.67 ± 7.45 | 96.67 ± 5.33 |
Dermatology | 95.00 ± 11.18 | 95.00 ± 11.18 | 90.00 ± 13.69 | 100.00 ± 0.00 | 90.00 ± 13.69 | 95.00 ± 11.18 | 95.00 ± 11.18 | 100.00 ± 0.00 | 100.00 ± 0.00 |
Breast-Tissue | 60.00 ± 27.89 | 40.00 ± 36.51 | 46.67 ± 29.81 | 60.00 ± 27.89 | 53.33 ± 29.81 | 60.00 ± 27.89 | 53.33 ± 29.81 | 60.00 ± 27.89 | 73.36 ± 13.32 |
UKM | 88.00 ± 13.04 | 92.00 ± 13.04 | 90.00 ± 10.00 | 94.00 ± 8.94 | 88.00 ± 16.43 | 86.00 ± 13.42 | 86.00 ± 13.42 | 94.00 ± 8.94 | 96.00 ± 5.76 |
Vertebral-Column | 65.00 ± 16.03 | 65.00 ± 19.00 | 60.00 ± 19.90 | 61.67 ± 18.26 | 66.67 ± 5.89 | 68.33 ± 22.36 | 66.67 ± 28.87 | 86.67 ± 9.50 | 90.67 ± 6.02 |
Ecoli | 65.00 ± 37.91 | 70.00 ± 27.39 | 55.00 ± 32.60 | 70.00 ± 27.39 | 55.00 ± 32.60 | 75.00 ± 30.62 | 90.00 ± 13.69 | 90.00 ± 13.69 | 90.00 ± 12.40 |
Average | 71.00 | 71.01 | 70.53 | 72.65 | 72.09 | 71.67 | 75.67 | 81.95 | 85.23 |
Rank Avg. | 6.33 | 5.89 | 6.39 | 5.11 | 5.78 | 5.89 | 5.28 | 2.56 | 1.78 |
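The "Rank Avg." row above appears to be the mean, over the nine data sets, of each method's rank on that data set, with tied methods sharing the average of the ranks they span. A minimal sketch under that assumption, using the leading (mean) value of each cell in the table above; the helper `average_ranks` is a hypothetical name:

```python
def average_ranks(scores):
    """Rank scores in descending order (1 = best); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

methods = ["Baseline", "SSMOTE", "GCS", "ABNC", "OSMOTE",
           "MDO", "MDO+", "AMDO", "EMDO"]
means = [  # one row per data set, same column order as `methods`
    [0.00, 8.44, 6.44, 2.22, 12.44, 2.00, 0.00, 10.22, 20.41],
    [100.00] * 9,
    [83.33, 90.00, 93.33, 93.33, 90.00, 83.33, 96.67, 100.00, 100.00],
    [82.67, 78.67, 93.33, 72.67, 93.33, 75.33, 93.33, 96.67, 96.67],
    [95.00, 95.00, 90.00, 100.00, 90.00, 95.00, 95.00, 100.00, 100.00],
    [60.00, 40.00, 46.67, 60.00, 53.33, 60.00, 53.33, 60.00, 73.36],
    [88.00, 92.00, 90.00, 94.00, 88.00, 86.00, 86.00, 94.00, 96.00],
    [65.00, 65.00, 60.00, 61.67, 66.67, 68.33, 66.67, 86.67, 90.67],
    [65.00, 70.00, 55.00, 70.00, 55.00, 75.00, 90.00, 90.00, 90.00],
]

per_dataset = [average_ranks(row) for row in means]
rank_avg = {
    m: round(sum(r[i] for r in per_dataset) / len(per_dataset), 2)
    for i, m in enumerate(methods)
}
print(rank_avg["Baseline"], rank_avg["AMDO"], rank_avg["EMDO"])
```

With tie averaging, this gives 6.33 for Baseline, 2.56 for AMDO, and 1.78 for EMDO, matching the corresponding entries of the row.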
Data Set | Baseline | SSMOTE | GCS | ABNC | OSMOTE | MDO | MDO+ | AMDO | EMDO |
---|---|---|---|---|---|---|---|---|---|
Balance | 56.38 ± 1.75 | 58.38 ± 2.94 | 56.65 ± 4.03 | 62.33 ± 3.29 | 57.86 ± 2.78 | 57.28 ± 2.92 | 55.45 ± 1.66 | 60.37 ± 2.06 | 64.38 ± 3.59 |
Hayes-Roth | 84.91 ± 6.71 | 84.67 ± 5.06 | 85.33 ± 5.58 | 83.52 ± 7.08 | 84.97 ± 6.74 | 84.91 ± 6.71 | 84.97 ± 6.74 | 84.97 ± 6.74 | 85.61 ± 2.39 |
New-Thyroid | 88.86 ± 6.02 | 91.08 ± 3.02 | 93.14 ± 4.55 | 93.59 ± 1.45 | 92.54 ± 6.61 | 89.81 ± 6.84 | 94.98 ± 3.59 | 96.54 ± 2.99 | 96.60 ± 2.05 |
Page-Blocks | 84.30 ± 2.14 | 84.57 ± 1.88 | 88.77 ± 4.49 | 79.70 ± 5.57 | 89.61 ± 2.98 | 81.24 ± 1.79 | 86.13 ± 2.48 | 88.77 ± 1.92 | 90.15 ± 1.87 |
Dermatology | 95.67 ± 2.05 | 95.73 ± 1.83 | 93.50 ± 2.71 | 97.10 ± 0.75 | 95.31 ± 2.45 | 95.67 ± 2.05 | 96.06 ± 1.62 | 96.88 ± 0.26 | 97.13 ± 0.20 |
Breast-Tissue | 63.22 ± 3.74 | 60.89 ± 3.94 | 68.78 ± 5.77 | 66.00 ± 4.88 | 65.83 ± 7.05 | 63.22 ± 3.74 | 66.56 ± 2.17 | 63.22 ± 3.74 | 70.60 ± 6.28 |
UKM | 92.18 ± 2.02 | 92.57 ± 4.85 | 91.03 ± 2.00 | 94.49 ± 2.45 | 91.78 ± 2.32 | 91.45 ± 2.48 | 91.92 ± 2.50 | 94.23 ± 2.14 | 95.24 ± 1.37 |
Vertebral-Column | 76.44 ± 2.65 | 77.22 ± 4.66 | 75.67 ± 5.86 | 76.67 ± 3.12 | 77.00 ± 4.71 | 78.22 ± 5.55 | 76.56 ± 7.41 | 81.89 ± 2.38 | 85.77 ± 1.53 |
Ecoli | 74.64 ± 7.88 | 72.81 ± 12.01 | 72.73 ± 11.35 | 76.23 ± 7.14 | 73.12 ± 8.72 | 77.24 ± 7.03 | 82.30 ± 5.21 | 82.44 ± 5.08 | 85.31 ± 1.62 |
Average | 79.61 | 79.77 | 80.62 | 81.07 | 80.89 | 79.89 | 81.66 | 83.26 | 85.65 |
Rank Avg. | 7.00 | 6.11 | 6.17 | 4.78 | 5.44 | 6.33 | 4.89 | 3.28 | 1.00 |
Data Set | Baseline | SSMOTE | GCS | ABNC | OSMOTE | MDO | MDO+ | AMDO | EMDO |
---|---|---|---|---|---|---|---|---|---|
Balance | 56.95 ± 1.91 | 58.12 ± 3.27 | 57.10 ± 3.43 | 60.52 ± 2.89 | 58.58 ± 3.18 | 57.40 ± 2.78 | 56.60 ± 0.88 | 60.61 ± 1.24 | 65.61 ± 4.30 |
Hayes-Roth | 94.34 ± 2.52 | 94.25 ± 1.90 | 94.50 ± 2.09 | 93.82 ± 2.65 | 94.36 ± 2.53 | 94.34 ± 2.52 | 94.36 ± 2.53 | 94.36 ± 2.53 | 94.67 ± 2.81 |
New-Thyroid | 91.40 ± 5.33 | 93.90 ± 4.45 | 95.43 ± 3.76 | 95.76 ± 2.37 | 94.04 ± 5.11 | 91.85 ± 6.74 | 97.04 ± 2.94 | 98.37 ± 1.41 | 98.55 ± 1.23 |
Page-Blocks | 90.75 ± 3.57 | 89.84 ± 2.16 | 94.81 ± 4.66 | 86.82 ± 7.87 | 94.96 ± 3.16 | 87.90 ± 4.66 | 93.97 ± 2.91 | 95.51 ± 2.07 | 96.07 ± 2.24 |
Dermatology | 97.32 ± 3.46 | 97.39 ± 3.27 | 95.55 ± 4.17 | 99.05 ± 0.28 | 96.09 ± 4.09 | 97.32 ± 3.46 | 97.48 ± 3.23 | 98.98 ± 0.26 | 99.06 ± 0.22 |
Breast-Tissue | 76.80 ± 7.31 | 72.18 ± 10.10 | 75.80 ± 9.99 | 78.30 ± 8.92 | 76.92 ± 9.75 | 76.80 ± 7.31 | 77.30 ± 8.09 | 76.80 ± 7.31 | 82.37 ± 3.77 |
UKM | 94.14 ± 3.65 | 94.94 ± 5.27 | 94.19 ± 3.04 | 96.41 ± 2.61 | 94.01 ± 4.64 | 93.33 ± 3.88 | 93.55 ± 3.92 | 96.45 ± 3.02 | 97.26 ± 2.49 |
Vertebral-Column | 79.25 ± 3.17 | 79.96 ± 5.56 | 78.38 ± 6.36 | 79.42 ± 5.04 | 80.04 ± 2.48 | 81.13 ± 6.76 | 79.54 ± 8.76 | 86.04 ± 1.72 | 89.07 ± 2.00 |
Ecoli | 83.07 ± 11.54 | 83.43 ± 10.88 | 79.78 ± 11.44 | 84.78 ± 9.21 | 79.74 ± 11.00 | 86.38 ± 9.56 | 91.97 ± 4.61 | 91.59 ± 4.19 | 92.91 ± 2.92 |
Average | 84.89 | 84.89 | 85.06 | 86.10 | 85.42 | 85.16 | 86.87 | 88.75 | 90.62 |
Rank Avg. | 7.00 | 6.22 | 6.33 | 4.89 | 5.44 | 6.33 | 4.89 | 2.89 | 1.00 |
Data Set | Baseline | SSMOTE | GCS | ABNC | OSMOTE | MDO | MDO+ | AMDO | EMDO |
---|---|---|---|---|---|---|---|---|---|
Balance | 67.29 ± 1.31 | 68.79 ± 2.20 | 67.49 ± 3.02 | 71.75 ± 2.47 | 68.39 ± 2.09 | 67.96 ± 2.19 | 66.59 ± 1.25 | 70.27 ± 1.54 | 73.13 ± 2.77 |
Hayes-Roth | 88.68 ± 5.03 | 88.50 ± 3.79 | 89.00 ± 4.18 | 87.64 ± 5.31 | 88.73 ± 5.06 | 88.68 ± 5.03 | 88.73 ± 5.06 | 88.73 ± 5.06 | 89.15 ± 4.86 |
New-Thyroid | 91.64 ± 4.52 | 93.31 ± 2.27 | 94.86 ± 3.41 | 95.19 ± 1.09 | 94.40 ± 4.96 | 92.36 ± 5.13 | 96.24 ± 2.69 | 97.40 ± 2.24 | 97.76 ± 2.14 |
Page-Blocks | 90.19 ± 1.34 | 90.35 ± 1.18 | 92.98 ± 2.80 | 87.31 ± 3.48 | 93.51 ± 1.86 | 88.27 ± 1.12 | 91.33 ± 1.55 | 92.98 ± 1.20 | 93.85 ± 1.17 |
Dermatology | 97.40 ± 1.23 | 97.44 ± 1.10 | 96.10 ± 1.62 | 98.26 ± 0.45 | 97.19 ± 1.47 | 97.40 ± 1.23 | 97.63 ± 0.97 | 98.13 ± 0.16 | 98.26 ± 0.16 |
Breast-Tissue | 77.93 ± 2.24 | 76.53 ± 2.36 | 81.27 ± 3.46 | 79.60 ± 2.93 | 79.50 ± 4.23 | 77.93 ± 2.24 | 79.93 ± 1.30 | 77.93 ± 2.24 | 84.07 ± 5.16 |
UKM | 94.79 ± 1.34 | 95.05 ± 3.23 | 94.02 ± 1.33 | 96.32 ± 1.64 | 94.52 ± 1.54 | 94.30 ± 1.65 | 94.61 ± 1.67 | 96.15 ± 1.42 | 96.91 ± 1.05 |
Vertebral-Column | 82.33 ± 1.99 | 82.92 ± 3.50 | 81.75 ± 4.39 | 82.50 ± 2.34 | 82.75 ± 3.53 | 83.67 ± 4.16 | 82.42 ± 5.55 | 86.42 ± 1.78 | 89.05 ± 1.33 |
Ecoli | 84.15 ± 4.93 | 83.01 ± 7.50 | 82.96 ± 7.10 | 85.15 ± 4.46 | 83.20 ± 5.45 | 85.77 ± 4.39 | 88.93 ± 3.26 | 89.03 ± 3.18 | 90.82 ± 1.01 |
Average | 86.04 | 86.21 | 86.71 | 87.08 | 86.91 | 86.26 | 87.38 | 88.56 | 90.33 |
Rank Avg. | 7.00 | 6.11 | 6.17 | 4.72 | 5.44 | 6.33 | 4.89 | 3.28 | 1.06 |
Data Set | Size | Attributes | Classes | Class Distribution | IRmax | No. of Ellipsoids |
---|---|---|---|---|---|---|
Statlog (Shuttle) | 58000 | 9 | 7 | 45586/50/171/8903/3267/10/13 | 5684.67 | NA/3/4/5/5/3/1 |
Mafalda | 23762 | 14 | 3 | 17757/2990/3015 | 5.94 | NA/11/11 |
Data Set | | Pmin | Pavg | AUCm | MAUC |
---|---|---|---|---|---|
Statlog (Shuttle) | w/ EMDO | 89.33 ± 13.73 | 96.60 ± 1.95 | 99.66 ± 0.47 | 99.32 ± 0.94 |
Statlog (Shuttle) | w/o EMDO | 60 ± 48.99 | 93.21 ± 7.42 | 77.65 ± 6.17 | 92.80 ± 1.66 |
Mafalda | w/ EMDO | 57.38 ± 18.33 | 72.34 ± 9.33 | 76.39 ± 9.31 | 78.19 ± 8.22 |
Mafalda | w/o EMDO | 33.17 ± 12.13 | 60.77 ± 10.53 | 65.25 ± 7.88 | 67.80 ± 6.99 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yao, L.; Lin, T.-B. Evolutionary Mahalanobis Distance-Based Oversampling for Multi-Class Imbalanced Data Classification. Sensors 2021, 21, 6616. https://doi.org/10.3390/s21196616