A Broad TSK Fuzzy Classifier with a Simplified Set of Fuzzy Rules for Class-Imbalanced Learning
Abstract
1. Introduction
- (1)
- Sampling methods, including oversampling and under-sampling techniques [2,6]. These methods adjust the distribution of the original data instances toward a balanced state, thereby improving the model’s predictive ability for the minority classes. Random Over-Sampling (ROS) randomly duplicates minority class instances [6], while Random Under-Sampling (RUS) randomly removes majority class instances [2]. Although these strategies show some effectiveness, their reliance on simple replication or deletion of original instances can lead to overfitting or loss of information. In response, an oversampling method named the Synthetic Minority Over-sampling Technique (SMOTE) was proposed [7]; it mitigates the risk of overfitting by interpolating between neighboring minority class instances, thereby strengthening the capability to deal with class imbalance. In the literature [8], combining a Fuzzy Support Vector Machine (FSVM) [9] with instance relative density information provides a more effective approach for classification tasks with complex class imbalance.
- (2)
- Cost-sensitive learning methods. These methods construct a cost-weight matrix by analyzing factors such as the misclassification costs of the minority and majority classes, training costs, and instance counts, and use it to counteract class imbalance. They focus on the differing impact that misclassifying minority and majority instances has on the loss function: with this weight matrix, the methods protect the distribution region of the minority classes instead of merely pursuing high overall accuracy. In cost-sensitive learning, classes that are prone to misclassification, typically the minority classes, are assigned larger weights via a specific cost matrix [10]; conversely, since majority classes are seldom misclassified, they are assigned smaller weights, which improves the model’s classification performance on the minority classes. For example, by incorporating cost matrix weighting into Extreme Learning Machines (ELM) [11], researchers proposed the high-performing and computationally efficient Weighted Extreme Learning Machine (WELM) [12]. Combining cost-sensitive thinking with ensemble learning, the literature [13] introduces a cost-sensitive decision tree ensemble method. In particular, AdaCost [14], a cost-sensitive boosting method built on the strengths of AdaBoost, greatly improves prediction accuracy for minority classes through an optimized weight update strategy. Support Vector Machine (SVM) methods have also performed consistently well in classification: the method proposed in the literature [15] combines a Fuzzy Support Vector Machine (FSVM) [9] with cost sensitivity, assigning larger weights to minority class instances to address class imbalance. A novel approach in the literature [16] combines cost sensitivity with a Broad Learning System (BLS), using weighted penalty factors to constrain each instance’s contribution in different classes and allocating higher weights to instances of smaller classes. Reference [17] presents a cost-sensitive variable selection method for Bayesian network classifiers, which improves multi-class classification performance on class-imbalanced practical applications. In cost-sensitive methods, how to determine the weights remains an open research question [18].
- (3)
- Hybrid methods for class imbalance problems. These methods combine the above two strategies or integrate them with advanced techniques, e.g., ensemble learning, cluster learning, and deep learning, thereby enhancing the capacity to handle class imbalance problems. They usually apply cost-sensitive learning within an ensemble framework after oversampling the minority classes and/or under-sampling the majority classes. In the data preprocessing stage, sampling methods such as SMOTE are used to balance the distribution of data instances [7], and then classic methods such as KNN and C4.5 decision trees learn from the more balanced data; this has proven to be an effective hybrid strategy (a minimal sketch of such a pipeline is given after this list). The advantages of ensemble learning in enhancing generalization performance and reducing overfitting have been demonstrated in the literature [19]. Leveraging these strengths, several robust and generalizable methods such as SMOTEBagging [19], SMOTEBoost [20], UnderBagging [21], RUSBoost [22], and OverBoost [23] have been proposed; they incorporate advanced sampling techniques into ensemble frameworks, including Bagging and Boosting, forming class-imbalanced ensemble frameworks. In class-imbalanced learning, ensemble methods have shown higher robustness than single classifiers; hence, the method proposed in this paper also exploits the advantages of ensemble techniques for class-imbalanced learning.
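For illustration, the following is a minimal sketch of such a hybrid pipeline, pairing SMOTE oversampling with a cost-sensitive (class-weighted) base learner; the synthetic dataset, the decision tree base learner, and all parameter values are illustrative assumptions rather than the configuration studied in this paper.

```python
# Minimal sketch: SMOTE oversampling combined with a cost-sensitive base learner.
# The synthetic data and every parameter value below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Build a roughly 19:1 imbalanced binary dataset.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = Pipeline(steps=[
    ("smote", SMOTE(k_neighbors=5, random_state=0)),                            # rebalance the training folds only
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=0)),  # cost-sensitive class weights
])
clf.fit(X_tr, y_tr)
print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))
```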
- (1)
- Although the random generation of fuzzy rules with equal partitions along each feature has been widely adopted by current TSK fuzzy classifier methods [25,27,36], the exclusion of ineffective fuzzy rules has received little attention, which may inflate the number of fuzzy rules and inevitably damage the interpretability of the model. Based on how well the antecedent parts and consequent parts of fuzzy rules adapt to different complex data environments, we propose a fuzzy rule simplification strategy that effectively reduces the number of fuzzy rules, enhances the interpretability of the TSK fuzzy classifier, and improves its classification performance.
- (2)
- Different from current methods, in which all fuzzy rules are treated indiscriminately when classifying class-imbalanced data, the fuzzy rules of the TSK fuzzy sub-classifiers in B-TSK-FC may play significantly different roles in the classification task. For class-imbalanced learning, we recognize that fuzzy rules encode knowledge of different data distributions. By generating a weight matrix that leverages information about the class sizes in the data, we propose a concise and easy-to-implement fuzzy rule weighting scheme that adapts the fuzzy system to class-imbalanced scenarios. This weighting scheme is consistent with the way human thinking applies different pieces of knowledge with different strengths.
- (3)
- In a class-imbalanced data environment, guided by the objective of improving the classification accuracy of each class, we propose a dynamic weighted ensemble strategy that effectively enhances the prediction accuracy of each class. By assembling a series of zero-order TSK fuzzy sub-classifiers in a broad manner, we significantly improve the generalization performance of the system and effectively reduce the risk of overfitting while maintaining interpretability.
- (4)
- Comparative experimental results on fifteen benchmark datasets against state-of-the-art methods demonstrate the effectiveness of the proposed B-TSK-FC fuzzy classifier in class-imbalanced scenarios, in terms of both linguistic interpretability and classification performance.
2. Classical Zero-Order TSK Fuzzy Classifier
3. The Proposed Method
- (1)
- In order to solve the fuzzy rule explosion problem, which is encountered by current fuzzy systems in complex and variable data environments [34], we adopt a strategy for simplifying fuzzy rules and improving the quality of fuzzy rules. While the random selection of the centers of fuzzy rule antecedent parts offers interpretability, some initially generated fuzzy rules may not align well with data characteristics, indicating low fuzzy rule quality. Since the adaptability of a fuzzy rule to specific data scenarios is primarily reflected in its antecedent parts and consequent parts, we simplify the fuzzy rules according to the antecedent parts and consequent parts to improve the quality of fuzzy rules.
- (2)
- Although current class-imbalanced learning techniques have achieved significant progress in classification performance, they are generally not interpretable. We therefore choose the zero-order TSK fuzzy classifier, known for its excellent interpretability, incorporate cost-sensitive reasoning, and propose a simple yet effective fuzzy rule weighting method, allowing the TSK fuzzy classifier to handle class-imbalanced data more efficiently.
- (3)
- After the above improvements, the TSK fuzzy classifier handles class-imbalanced classification well. However, the generated TSK fuzzy sub-classifiers are similar to one another, and conventional simple voting for the ensemble would limit further performance gains. Hence, inspired by the class-imbalanced G-mean metric, we optimize per-class classification accuracy to weight the individual fuzzy sub-classifiers within the ensemble in a principled way. This not only enhances the generalization performance of the ensemble classifier but also reduces the risk of overfitting.
3.1. Structure of B-TSK-FC
Algorithm 1 Training process of the tth fuzzy sub-classifier

Input: Training dataset $D_{tr}=\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^{N}$, consisting of the instances $\mathbf{x}_i=(x_{i1},\dots,x_{id})$ and the corresponding class labels $\mathbf{y}_i$. Here, $N$ is the number of training instances, and $d$ is the total dimension of the instance. For binary classification, $y_i\in\{0,1\}$. For multi-class classification, $\mathbf{y}_i$ is transformed into a one-hot encoded binary vector, as outlined in [37]. The method requires a pre-set initial fuzzy rule count $K$, an optimized fuzzy rule count $K'$ with $K'<K$, a regularization constant parameter $\lambda$, and the width $\sigma$ of the Gaussian membership functions, with rule index $k=1,\dots,K$ and feature index $j=1,\dots,d$.
Output: The consequent part parameters $\mathbf{P}'$ of the learned fuzzy rules in the tth zero-order TSK fuzzy sub-classifier and the antecedent part matrix $\mathbf{\Phi}'$ after fuzzy rule improvement, where each consequent value serves as the weight of the corresponding fuzzy rule.
Procedure:
Step 1: Using the distribution information across classes, construct a diagonal weight matrix $\mathbf{W}$.
Let the number of instances of class $c$ in the training dataset be denoted as $N_c$, where $c$ is the class label of the instances. The total number of training instances is $N=\sum_c N_c$, and the weight diagonal matrix is defined as $\mathbf{W}=\mathrm{diag}(w_1,\dots,w_N)$, where the weight $w_i$ is derived from the class counts so that instances of smaller classes receive larger weights. Here, “diag” denotes a diagonal matrix whose diagonal elements are the provided values and whose off-diagonal elements are zero. In this context, $\mathbf{W}$ is an $N\times N$ diagonal matrix.
Step 2: Compute the Gaussian membership function for each feature of the instance, defined as follows for the kth fuzzy rule and jth input feature:
$$\mu_j^k(x_{ij})=\exp\!\left(-\frac{(x_{ij}-c_j^k)^2}{2\sigma^2}\right),$$
where $\sigma$ is the pre-set width and $c_j^k$ denotes the center of the kth fuzzy rule along the jth feature. Here, $t$ indexes the tth fuzzy sub-classifier, and $c_j^k$ is determined either manually or using the method described in [36].
Then, compute the normalized membership value of the instance $\mathbf{x}_i$ under the kth fuzzy rule:
$$\tilde{\mu}^k(\mathbf{x}_i)=\frac{\mu^k(\mathbf{x}_i)}{\sum_{k'=1}^{K}\mu^{k'}(\mathbf{x}_i)},\quad\text{where}\quad \mu^k(\mathbf{x}_i)=\prod_{j=1}^{d}\mu_j^k(x_{ij}).$$
Step 3: Compute the consequent parts of the fuzzy rules.
Initially, the number of fuzzy rules is set to $K$. The consequent part parameter matrix of the fuzzy rules is defined as $\mathbf{P}=[\mathbf{p}^1,\dots,\mathbf{p}^K]^{\mathrm{T}}$. Subsequently, based on [35,36], the zero-order TSK output can be transformed into the linear equation form $\mathbf{Y}=\mathbf{\Phi}\mathbf{P}$, where $\mathbf{\Phi}\in\mathbb{R}^{N\times K}$ collects the normalized membership values $\tilde{\mu}^k(\mathbf{x}_i)$.
By introducing the identity matrix $\mathbf{I}$ and using the LLM [38,39,40], the consequent part parameters of the fuzzy rules can be determined as
$$\mathbf{P}=\left(\mathbf{\Phi}^{\mathrm{T}}\mathbf{\Phi}+\lambda\mathbf{I}\right)^{-1}\mathbf{\Phi}^{\mathrm{T}}\mathbf{Y}.$$
Step 4: Calculate the matrix formed from the antecedent parts and consequent parts of the fuzzy rules, with one column per fuzzy rule.
Step 5: Select the $K'$ fuzzy rules corresponding to the columns of this matrix that have the largest average values and construct the reduced antecedent matrix $\mathbf{\Phi}'\in\mathbb{R}^{N\times K'}$.
Step 6: Let the consequent parts of the optimized fuzzy rules be denoted as $\mathbf{P}'$. Using the weight matrix $\mathbf{W}$, recalculate the consequent parts of the fuzzy rules, again transforming them into a linear equation form as suggested by [35,36].
Introducing the identity matrix $\mathbf{I}$ and using the LLM [38,39,40], the parameters of the improved consequent parts are
$$\mathbf{P}'=\left(\mathbf{\Phi}'^{\mathrm{T}}\mathbf{W}\mathbf{\Phi}'+\lambda\mathbf{I}\right)^{-1}\mathbf{\Phi}'^{\mathrm{T}}\mathbf{W}\mathbf{Y}.$$
Step 7: Return $\mathbf{\Phi}'$, $\mathbf{P}'$.
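For concreteness, the following is a minimal sketch of the training procedure in Algorithm 1. It assumes a shared Gaussian width, a ridge-style least-squares solution in place of the LLM solver, a simple class-count-based instance weighting ($w_i = N/N_{y_i}$), and a rule score based on the average magnitude of each rule’s contribution; the exact weighting formula, rule-scoring matrix, and solver used by B-TSK-FC may differ.

```python
import numpy as np

def train_tsk_subclassifier(X, Y, centers, sigma=0.5, K_opt=50, lam=1.0):
    """Hedged sketch of Algorithm 1: one zero-order TSK fuzzy sub-classifier.

    X: (N, d) training features scaled to [0, 1];  Y: (N, C) one-hot labels.
    centers: (K, d) randomly chosen antecedent centers (e.g., from {0, 0.25, 0.5, 0.75, 1}).
    Returns the indices of the kept rules and their consequent parameters.
    """
    N, d = X.shape
    K = centers.shape[0]

    # Step 1: class-count-based instance weights (illustrative choice: w_i = N / N_{y_i}).
    class_counts = Y.sum(axis=0)                       # instances per class
    w = N / class_counts[Y.argmax(axis=1)]             # larger weight for smaller classes
    W = np.diag(w)

    # Step 2: Gaussian memberships and normalized firing strengths, shape (N, K).
    diff = X[:, None, :] - centers[None, :, :]         # (N, K, d)
    mu = np.exp(-(diff ** 2) / (2 * sigma ** 2)).prod(axis=2)
    phi = mu / (mu.sum(axis=1, keepdims=True) + 1e-12)

    # Step 3: unweighted ridge (LLM-style) solution for the consequent parameters.
    P = np.linalg.solve(phi.T @ phi + lam * np.eye(K), phi.T @ Y)   # (K, C)

    # Steps 4-5: score each rule by the average magnitude of its contribution, keep K_opt rules.
    contribution = phi * np.abs(P).sum(axis=1)          # (N, K)
    keep = np.argsort(contribution.mean(axis=0))[-K_opt:]
    phi_s = phi[:, keep]

    # Step 6: weighted ridge re-fit of the kept rules using the class-based weight matrix W.
    P_s = np.linalg.solve(phi_s.T @ W @ phi_s + lam * np.eye(K_opt), phi_s.T @ W @ Y)

    # Step 7: return the kept rule indices and the improved consequent parameters.
    return keep, P_s
```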
Algorithm 2 Training process of B-TSK-FC

Input: Training set $D_{tr}=\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^{N}$, where $N$ denotes the number of instances in the training set and $d$ is the total dimension of the instance. Validation set $D_{val}=\{(\mathbf{x}_m,\mathbf{y}_m)\}_{m=1}^{M}$, where $M$ is the number of instances in the validation set. For binary classification, $y\in\{0,1\}$. For multi-class classification, $\mathbf{y}$ is encoded using one-hot encoding into a binary vector following [37]. Number of sub-classifiers, $T$.
Output: The fuzzy rule-improved fuzzy sub-classifiers $f_t$, $t=1,\dots,T$, and the B-TSK-FC broad ensemble classifier
$$F(\mathbf{x})=\sum_{t=1}^{T}\alpha_t f_t(\mathbf{x}),$$
where $\alpha_t$ is the ensemble weight of the tth TSK fuzzy sub-classifier, $t=1,\dots,T$.
Procedure:
Step 1: Using $T$ and $D_{tr}$, invoke Algorithm 1 to generate $T$ sub-classifiers after fuzzy rule improvement. Let $f_t$ denote the tth zero-order TSK fuzzy sub-classifier, which has finished fuzzy rule selection and weight optimization, where $t=1,\dots,T$. Denoting Algorithm 1 as $\mathcal{A}_1$, execute the following iterative procedure:
for $t=1,\dots,T$: $f_t\leftarrow\mathcal{A}_1(D_{tr})$;
Step 2: Use the validation set to generate the training set for learning the ensemble weights. Initially set $\mathbf{F}=\varnothing$. Subsequently, execute the following iterative procedure:
for $t=1,\dots,T$: append the outputs of $f_t$ on $D_{val}$ as the tth column of $\mathbf{F}\in\mathbb{R}^{M\times T}$;
Step 3: Use the gradient descent method on $\mathbf{F}$ to compute the ensemble weighting matrix $\boldsymbol{\alpha}=[\alpha_1,\dots,\alpha_T]^{\mathrm{T}}$, where each weight $\alpha_t$ is the ensemble weight of the tth sub-classifier used to facilitate the weighted ensemble of all sub-classifiers, with $t=1,\dots,T$.
Step 3.1: Define the loss function.
Drawing inspiration from the G-mean metric, design a loss function based on the mean squared error of each instance class, where $P$ and $Q$, respectively, denote the counts of positive and negative instances in the validation set, and the mth row of $\mathbf{F}$ is the predictive output of all sub-classifiers for the mth validation instance, serving as the training features for the ensemble weights. To minimize this loss, compute the gradient of the loss function with respect to $\boldsymbol{\alpha}$, where $\mathbf{F}_{+}$ and $\mathbf{F}_{-}$, respectively, denote the predicted outputs of all sub-classifiers for the positive and negative instances, and $\mathbf{y}_{+}$ and $\mathbf{y}_{-}$ correspondingly represent the actual classes of the positive and negative instances.
Step 3.2: Set
$$\boldsymbol{\alpha}\leftarrow\boldsymbol{\alpha}-\eta\,\nabla_{\boldsymbol{\alpha}}L,$$
where $\eta$ is the learning rate, thus deriving the optimal solution matrix for the ensemble weights on $\mathbf{F}$ using the gradient descent method.
Step 3.3: Use the obtained weight matrix to ensemble all TSK fuzzy sub-classifiers, obtaining the total output of the ensemble classifier as
$$F(\mathbf{x})=\sum_{t=1}^{T}\alpha_t f_t(\mathbf{x}).$$
Step 4: Output $f_1,\dots,f_T$ and $F(\mathbf{x})$.
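The following sketch illustrates Step 3 of Algorithm 2: learning one ensemble weight per sub-classifier by gradient descent on a loss that, in the spirit of the G-mean, balances the squared errors of the positive and negative validation instances. The product-of-MSEs loss, the learning rate, and the iteration count are illustrative assumptions rather than the exact loss used in this paper.

```python
import numpy as np

def learn_ensemble_weights(F, y, lr=0.05, n_iter=2000):
    """Hedged sketch: F is (M, T) with F[m, t] the tth sub-classifier's output for
    the mth validation instance; y is (M,) with labels in {0, 1}."""
    pos, neg = (y == 1), (y == 0)
    P, Q = pos.sum(), neg.sum()
    alpha = np.full(F.shape[1], 1.0 / F.shape[1])      # start from uniform weights

    for _ in range(n_iter):
        yhat = F @ alpha
        err_p = yhat[pos] - y[pos]
        err_n = yhat[neg] - y[neg]
        mse_p = (err_p ** 2).mean()
        mse_n = (err_n ** 2).mean()
        # G-mean-inspired loss: product of the per-class mean squared errors
        # (an assumed form); its gradient couples the two class-wise terms.
        grad = (2.0 / P) * (F[pos].T @ err_p) * mse_n + (2.0 / Q) * (F[neg].T @ err_n) * mse_p
        alpha -= lr * grad
    return alpha

# Usage: alpha = learn_ensemble_weights(F_val, y_val); ensemble output = F_test @ alpha.
```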
3.2. Theoretical Analysis and Proof of the Principle behind B-TSK-FC
- (1)
- In the selection phase, the quality of fuzzy rules varies depending on the chosen centers of the antecedent parts: some fuzzy rules align exceptionally well with particular data, while others do not. Notably, fuzzy rules with larger antecedent part values typically produce higher membership function values, suggesting that their antecedent centers were chosen appropriately, so the constructed fuzzy rules follow the original data distribution more closely. The consequent part of a fuzzy rule acts as its weight; the larger its value, the closer the rule’s decision boundary is to the real data boundary. Since the output is a linear combination of antecedent parts and consequent parts (Equation (8)), the selection method proposed in this study optimizes the overall quality of the fuzzy rules, reduces their number, enhances interpretability, and significantly boosts classification performance. This approach also tackles the fuzzy rule explosion issue induced by increasing data complexity and dynamically adapts to complex and variable data environments.
- (2)
- In the weighting phase, traditional zero-order TSK fuzzy classifiers underperform when dealing with imbalanced data. Hence, leveraging class-specific information, we generate a concise weight matrix that assigns higher weights to fuzzy rules encompassing minority class membership function knowledge, enhancing training efficacy. In contrast, fuzzy rules with majority class membership function information are assigned lower weights, which improves the capability to handle class imbalances.
- (3)
- At the ensemble stage, we define a new loss function inspired by the G-mean metric from imbalanced learning to compute rational weights for each fuzzy sub-classifier in imbalanced scenarios. This G-mean weighted ensemble scheme effectively improves prediction performance, prevents the minority classes from being neglected, and mitigates the risk of overfitting.
- (1)
- Assuming the label of the ith instance is $y_i=1$ (i.e., the instance belongs to the positive class), the cross-entropy loss function becomes $L_i=-\log\hat{y}_i$, where $\hat{y}_i$ is the predicted probability of the positive class.
- (2)
- Assuming the label of the ith instance is $y_i=0$ (i.e., the instance belongs to the negative class), the cross-entropy loss function is expressed as $L_i=-\log(1-\hat{y}_i)$; the two cases combine over the whole training set as illustrated below.
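As a brief illustration (assuming labels $y_i\in\{0,1\}$ and a predicted positive-class probability $\hat{y}_i$, consistent with the two cases above), summing the per-instance losses over the whole training set shows why an unweighted loss is dominated by the majority class:

$$
L=-\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\log\hat{y}_i+(1-y_i)\log(1-\hat{y}_i)\Big]
 =-\frac{1}{N}\Big[\sum_{i:\,y_i=1}\log\hat{y}_i+\sum_{i:\,y_i=0}\log(1-\hat{y}_i)\Big],
$$

where the first sum runs over the positive instances and the second over the typically far more numerous negative instances; when the negatives dominate, so does their contribution to the loss, which motivates the class-count-based re-weighting of instances and fuzzy rules used in B-TSK-FC.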
3.3. Complexity Analysis
4. Experimental Results
4.1. Datasets
4.2. Comparative Methods
4.3. Parameter Settings and Evaluation Metrics
4.4. Comparative Experimental Study
- (1)
- On the majority of the datasets, the proposed B-TSK-FC method exhibits superior generalization performance. Particularly on datasets such as PEN, MAR, MUS, VOW, CAG, P96, P86, LET, and PAB, B-TSK-FC significantly outperforms the comparative methods, indicating its exceptional effectiveness in handling class-imbalanced scenarios.
- (2)
- For certain datasets, including TUR, DNA, and THY, the performance of B-TSK-FC is comparable to some of the comparative methods yet remains competitive. This suggests that the B-TSK-FC method maintains consistent generalization performance across various datasets.
- (3)
- On some datasets, especially those with higher imbalance ratios, B-TSK-FC achieves the best results. This demonstrates that in specific settings, such as highly imbalanced complex datasets, B-TSK-FC can provide superior classification outcomes. This also indicates that our designed mechanism for the selection of fuzzy rules is not overly dependent on data distribution but can obtain higher quality fuzzy rules accordingly based on the specific data distribution, thereby efficiently tackling diverse and complex datasets.
- (4)
- Please note that some comparative methods achieve much lower testing accuracies than training accuracies on some datasets, e.g., SMOTEBagging on the datasets MAR, TUR, DNA, USP, P96, P86, LET, and PAB, and SMOTEBoost on the datasets USP, CAG, P96, P86, and PAB. The large gap between testing and training accuracies indicates that the method has seriously overfitted the training instances and lost generalization capability on the testing instances. In contrast, the testing and training accuracies of B-TSK-FC are much closer than those of the comparative methods, which demonstrates the advantage of B-TSK-FC in generalization capability.
4.5. Statistical Test
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Abbreviation | Full Form |
---|---|
TSK | Takagi–Sugeno–Kang |
IR | Imbalanced Ratio |
diag | Diagonal Matrix |
G-mean | Geometric Mean |
ROS | Random Over-Sampling |
RUS | Random Under-Sampling |
SMOTE | Synthetic Minority Over-sampling Technique |
LLM | Least Learning Machine |
KEEL | Knowledge Extraction based on Evolutionary Learning |
UCI | University of California, Irvine |
W-TSK | Loss-Weighted TSK |
KNN | K-Nearest Neighbors |
RUSBoost | Random Under-Sampling Boosting |
OverBoost | Over-Sampling Boosting |
SMOTEBagging | Synthetic Minority Over-Sampling Technique Bootstrap Aggregating |
SMOTEBoost | Synthetic Minority Over-Sampling Technique Boosting |
References
- Chawla, N.V.; Japkowicz, N.; Kołcz, A. Editorial: Special issue on learning from imbalanced datasets. ACM SIGKDD Explor. Newsl. 2004, 6, 1–6. [Google Scholar] [CrossRef]
- He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar]
- Xu, L.; Chow, M.; Taylor, L. Power distribution fault cause identification with imbalanced data using the data mining-based fuzzy classification E-algorithm. IEEE Trans. Power Syst. 2007, 22, 164–171. [Google Scholar] [CrossRef]
- Verbraken, T.; Verbeke, W.; Baesens, B. A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Trans. Knowl. Data Eng. 2013, 25, 961–973. [Google Scholar] [CrossRef]
- Pozzolo, A.D.; Boracchi, G.; Caelen, O.; Alippi, C.; Bontempi, G. Credit card fraud detection: A realistic modeling and a novel learning strategy. IEEE Trans. Neural Netw. Learn. Syst. 2018, 28, 3784–3797. [Google Scholar]
- Cao, H.; Li, X.L.; Woon, D.Y.K.; Ng, S.K. Integrated oversampling for imbalanced time series classification. IEEE Trans. Knowl. Data Eng. 2013, 25, 2809–2822. [Google Scholar] [CrossRef]
- Chawla, N.; Bowyer, K.; Hall, L.; Kegelmeyer, W. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Yu, H.; Sun, C.; Yang, X.; Zheng, S.; Zou, H. Fuzzy support vector machine with relative density information for classifying imbalanced data. IEEE Trans. Fuzzy Syst. 2019, 27, 2353–2367. [Google Scholar] [CrossRef]
- Lin, C.F.; Wang, S.D. Fuzzy support vector machines. IEEE Trans. Neural Netw. 2002, 13, 464–471. [Google Scholar]
- Sun, Y.; Kamel, M.S.; Wong, A.K.C.; Wang, Y. Cost-Sensitive Boosting for Classification of Imbalanced Data. Pattern Recognit. 2007, 40, 3358–3378. [Google Scholar] [CrossRef]
- Li, K.; Kong, X.; Lu, Z.; Wenyin, L.; Yin, J. Boosting weighted ELM for imbalanced learning. Neurocomputing 2014, 128, 15–21. [Google Scholar] [CrossRef]
- Zong, W.; Huang, G.B.; Chen, Y. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013, 101, 229–242. [Google Scholar] [CrossRef]
- Krawczyk, B.; Woźniak, M.; Schaefer, G. Cost-sensitive decision tree ensembles for effective imbalanced classification. Appl. Soft Comput. 2014, 14, 554–562. [Google Scholar] [CrossRef]
- Fan, W.; Stolfo, S.J.; Zhang, J.; Chan, P.K. Adacost: Misclassification Cost-Sensitive Boosting. In Proceedings of the International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; pp. 97–105. [Google Scholar]
- Batuwita, R.; Palade, V. FSVM-CIL: Fuzzy support vector machines for class imbalance learning. IEEE Trans. Fuzzy Syst. 2010, 18, 558–571. [Google Scholar] [CrossRef]
- Yao, L.; Wong, P.K.; Zhao, B.; Wang, Z.; Lei, L.; Wang, X.; Hu, Y. Cost-Sensitive Broad Learning System for Imbalanced Classification and Its Medical Application. Mathematics 2022, 10, 829. [Google Scholar] [CrossRef]
- Ramos-López, D.; Maldonado, A.D. Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks. Mathematics 2021, 9, 156. [Google Scholar] [CrossRef]
- Loyola-González, O.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; García-Borroto, M. Cost-Sensitive Pattern-Based Classification for Class Imbalance Problems. IEEE Access 2019, 7, 60411–60427. [Google Scholar] [CrossRef]
- Wang, S.; Yao, X. Diversity analysis on imbalanced data sets by using ensemble models. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; pp. 324–331. [Google Scholar]
- Chawla, N.V.; Lazarevic, A.; Hall, L.O.; Bowyer, K.W. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22–26 September 2003; pp. 107–119. [Google Scholar]
- Seiffert, C.; Khoshgoftaar, T.M.; Hulse, J.V.A.; Napolitano, A. RUSBoost: A Hybrid Approach to Alleviating Class Imbalance. IEEE Trans. Syst. Man Cybern.-Part A Syst. Hum. 2010, 40, 185–197. [Google Scholar] [CrossRef]
- Liu, X.; Wu, J.; Zhou, Z. Exploratory Undersampling for Class-Imbalance Learning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2009, 39, 539–550. [Google Scholar]
- Seiffert, C.; Khoshgoftaar, T.M.; Hulse, J.V.; Napolitano, A. Resampling or Reweighting: A Comparison of Boosting Implementations. In Proceedings of the 2008 20th IEEE International Conference on Tools with Artificial Intelligence, Dayton, OH, USA, 3–5 November 2008; pp. 445–451. [Google Scholar] [CrossRef]
- Zhang, X.; Nojima, Y.; Ishibuchi, H.; Hu, W.; Wang, S. Prediction by Fuzzy Clustering and KNN on Validation Data With Parallel Ensemble of Interpretable TSK Fuzzy Classifiers. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 400–414. [Google Scholar] [CrossRef]
- Qin, B.; Chung, F.-L.; Wang, S. Biologically Plausible Fuzzy-Knowledge-Out and Its Induced Wide Learning of Interpretable TSK Fuzzy Classifiers. IEEE Trans. Fuzzy Syst. 2020, 28, 1276–1290. [Google Scholar] [CrossRef]
- Zhou, W.; Li, H.; Bao, M. Stochastic Configuration Based Fuzzy Inference System with Interpretable Fuzzy Rules and Intelligence Search Process. Mathematics 2023, 11, 614. [Google Scholar] [CrossRef]
- Qin, B.; Chung, F.-L.; Wang, S. KAT: A Knowledge Adversarial Training Method for Zero-Order Takagi–Sugeno–Kang Fuzzy Classifiers. IEEE Trans. Cybern. 2021, 52, 6857–6871. [Google Scholar] [CrossRef]
- Qin, B.; Chung, F.-L.; Nojima, Y.; Ishibuchi, H.; Wang, S. Fuzzy rule dropout with dynamic compensation for wide learning algorithm of TSK fuzzy classifier. Appl. Soft Comput. 2022, 127, 109410. [Google Scholar] [CrossRef]
- Fernández, A.; García, S.; del Jesus, M.J.; Herrera, F. A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets. Fuzzy Sets Syst. 2008, 159, 2378–2398. [Google Scholar] [CrossRef]
- Cordón, O.; del Jesus, M.J.; Herrera, F. A proposal on reasoning methods in fuzzy rule-based classification systems. Int. J. Approx. Reason. 1999, 20, 21–45. [Google Scholar] [CrossRef]
- Soler, V.; Cerquides, J.; Sabria, J.; Roig, J.; Prim, M. Imbalanced datasets classification by fuzzy rule extraction and genetic methods. In Proceedings of the Sixth IEEE International Conference on Data Mining-Workshops (ICDMW′06), Hong Kong, China, 18–22 December 2006; pp. 330–336. [Google Scholar]
- Ishibuchi, H.; Yamamoto, T. Fuzzy rule selection by multi-objective genetic local search methods and rule evaluation measures in data mining. Fuzzy Sets Syst. 2004, 141, 59–88. [Google Scholar] [CrossRef]
- Ishibuchi, H.; Yamamoto, T. Rule weight specification in fuzzy rule-based classification systems. IEEE Trans. Fuzzy Syst. 2005, 13, 428–435. [Google Scholar] [CrossRef]
- Information Resources Management Association USA. Fuzzy Systems: Concepts, Methodologies, Tools, and Applications; Springer: Heidelberg, Germany, 2017. [Google Scholar]
- Qin, B.; Nojima, Y.; Ishibuchi, H.; Wang, S. Realizing Deep High-Order TSK Fuzzy Classifier by Ensembling Interpretable Zero-Order TSK Fuzzy Subclassifiers. IEEE Trans. Fuzzy Syst. 2021, 29, 3441–3455. [Google Scholar] [CrossRef]
- Sonbol, A.H.; Fadali, M.S.; Jafarzadeh, S. TSK fuzzy function approximators: Design and accuracy analysis. IEEE Trans. Syst. Man Cybern. B Cybern. 2012, 42, 702–712. [Google Scholar] [CrossRef] [PubMed]
- Ye, M.; Abbe, E. Communication-Computation Efficient Gradient Coding. In Proceedings of the International Conference on Machine Learning, PMLR, 2018; Volume 80, pp. 5610–5619. [Google Scholar]
- Wang, S.; Chung, K.F.-L. On least learning machine. J. Jiangnan Univ. (Natural Sci. Ed.) 2010, 9, 505–510. [Google Scholar]
- Wang, S.; Jiang, Y.; Chung, F.-L.; Qian, P. Feedforward kernel neural networks, generalized least learning machine, and its deep learning with application to image classification. Appl. Soft Comput. 2015, 37, 125–141. [Google Scholar] [CrossRef]
- Wang, S.; Chung, F.-L.; Wu, J.; Wang, J. Least learning machine and its experimental studies on regression capability. Appl. Soft Comput. 2014, 21, 677–684. [Google Scholar] [CrossRef]
- Zhou, T.; Ishibuchi, H.; Wang, S. Stacked Blockwise Combination of Interpretable TSK Fuzzy Classifiers by Negative Correlation Learning. IEEE Trans. Fuzzy Syst. 2018, 26, 3327–3341. [Google Scholar] [CrossRef]
- Alcalá-Fdez, J.; Fernández, A.; Luengo, J.; Derrac, J.; García, S. KEEL Data-Mining Software Tool: Dataset Repository, Integration of Methods and Experimental Analysis Framework. J. Mult.-Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
- Lichman, M. UCI Machine Learning Repository. 2013. Available online: http://archive.ics.uci.edu/ml (accessed on 15 March 2023).
- Zhang, Y.; Ishibuchi, H.; Wang, S. Deep Takagi-Sugeno-Kang fuzzy classifier with shared linguistic fuzzy rules. IEEE Trans. Fuzzy Syst. 2018, 26, 1535–1549. [Google Scholar] [CrossRef]
- Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937, 32, 675–701. [Google Scholar] [CrossRef]
# | Datasets | The Number of Fuzzy Rules in the Candidate Set for Each TSK Fuzzy Sub-Classifier |
---|---|---|
1 | penbased(PEN) | 255 |
2 | marketing(MAR) | 200 |
3 | turkiyestudentevaluationRSpecific(TUR) | 90 |
4 | DNA(DNA) | 400 |
5 | skin(SKI) | 220 |
6 | usps(USP) | 220 |
7 | musk(MUS) | 535 |
8 | vowel0(VOW) | 200 |
9 | car-good(CAG) | 90 |
10 | thyroid(THY) | 280 |
11 | poker-8-9_vs_6(P96) | 145 |
12 | shuttle-2_vs_5(SHU) | 180 |
13 | poker-8_vs_6(P86) | 135 |
14 | letterA(LET) | 220 |
15 | page_blocks(PAB) | 155 |
# | Datasets | IR | No. of Instances | No. of Features | No. of Classes |
---|---|---|---|---|---|
1 | penbased(PEN) | 1.95 | 1100 | 16 | 10 |
2 | marketing(MAR) | 2.49 | 6877 | 13 | 9 |
3 | turkiyestudentevaluationRSpecific(TUR) | 3.03 | 5820 | 33 | 5 |
4 | DNA(DNA) | 3.29 | 3186 | 180 | 2 |
5 | skin(SKI) | 3.82 | 245,057 | 3 | 2 |
6 | usps(USP) | 4.00 | 1500 | 241 | 2 |
7 | musk(MUS) | 5.49 | 6598 | 166 | 2 |
8 | vowel0(VOW) | 9.98 | 988 | 13 | 2 |
9 | car-good(CAG) | 24.04 | 1728 | 6 | 2 |
10 | thyroid(THY) | 40.16 | 7200 | 21 | 3 |
11 | poker-8-9_vs_6(P96) | 58.40 | 1485 | 10 | 2 |
12 | shuttle-2_vs_5(SHU) | 66.67 | 3316 | 9 | 2 |
13 | poker-8_vs_6(P86) | 85.88 | 1477 | 10 | 2 |
14 | letterA(LET) | 112.64 | 2000 | 21 | 2 |
15 | page_blocks(PAB) | 175.46 | 5473 | 10 | 5 |
Parameters | Ranges and Intervals |
---|---|
$c_j^k$: Center value of the Gaussian membership function | [0, 0.25, 0.5, 0.75, 1]
$K$: Number of fuzzy rules for the TSK fuzzy sub-classifier | 5:5:500
$T$: Number of sub-classifiers in the ensemble | 5:5:500
Approaches | Default Values of Parameters | Ranges and Intervals of Parameters |
---|---|---|
SMOTE+TSK | sampling_strategy = ‘auto’, random_state = None, k_neighbors = 5 | [0, 0.25, 0.5, 0.75, 1] K: 5:5:500 |
W-TSK | - | [0, 0.25, 0.5, 0.75, 1] K: 5:5:500 |
SMOTE+KNN | sampling_strategy = ‘auto’, random_state = None, k_neighbors = 5(KNN) | k_neighbors(KNN): 2:1:100 |
RUSBoost | learning_rate = 1.0, random_state = None | n_estimators: 5:5:500 |
OverBoost | random_state = None, k_neighbors = 5, early_termination = False | n_estimators: 5:5:500
SMOTEBagging | random_state = None, k_neighbors = 5, sampling_strategy = ‘auto’ | n_estimators: 5:5:500
SMOTEBoost | random_state = None, learning_rate = 1.0, k_neighbors = 5 | n_estimators: 5:5:500 |
True Condition | Predicted Positive | Predicted Negative
---|---|---
Positive | TP (True Positive) | FN (False Negative)
Negative | FP (False Positive) | TN (True Negative)
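For reference, the G-mean referred to throughout this paper is conventionally computed from this confusion matrix as the geometric mean of the per-class accuracies (a standard definition, stated here for completeness):

$$
\mathrm{TPR}=\frac{TP}{TP+FN},\qquad
\mathrm{TNR}=\frac{TN}{TN+FP},\qquad
\text{G-mean}=\sqrt{\mathrm{TPR}\times\mathrm{TNR}}.
$$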
Dataset | B-TSK-FC | SMOTE+TSK | W-TSK | SMOTE+KNN | RUSBoost | OverBoost | SMOTEBagging | SMOTEBoost | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training ± Std Testing ± Std | K | T | Training ± Std Testing ± Std | K | Training ± Std Testing ± Std | K | Training ± Std Testing ± Std | Training ± Std Testing ± Std | T | Training ± Std Testing ± Std | T | Training ± Std Testing ± Std | T | Training ± Std Testing ± Std | T | |
PEN | 0.9714 ± 0.0001 0.9811 ± 0.0001 | 230 | 10 | 0.9517 ± 0.0000 0.9434 ± 0.0003 | 200 | 0.9774 ± 0.0000 0.9672 ± 0.0001 | 300 | 0.9778 ± 0.0000 0.9612 ± 0.0001 | 0.7791 ± 0.0006 0.7209 ± 0.0015 | 350 | 0.8810 ± 0.0003 0.8804 ± 0.0005 | 55 | 1.0000 ± 0.0000 0.9783 ± 0.0000 | 500 | 0.1408 ± 0.0470 0.1364 ± 0.0443 | 100 |
MAR | 0.2455 ± 0.0001 0.2518 ± 0.0001 | 180 | 10 | 0.2731 ± 0.0001 0.2227 ± 0.0003 | 400 | 0.3143 ± 0.0000 0.2306 ± 0.0001 | 200 | 0.4155 ± 0.0000 0.2274 ± 0.0001 | 0.2205 ± 0.0002 0.1977 ± 0.0005 | 170 | 0.2523 ± 0.0000 0.2353 ± 0.0002 | 90 | 0.9133 ± 0.0000 0.2324 ± 0.0000 | 50 | 0.2436 ± 0.0086 0.2264 ± 0.0124 | 25 |
TUR | 0.8706 ± 0.0001 0.8297 ± 0.0002 | 80 | 50 | 0.8579 ± 0.0000 0.8170 ± 0.0001 | 250 | 0.8678 ± 0.0000 0.8178 ± 0.0001 | 275 | 0.8601 ± 0.0000 0.8272 ± 0.0001 | 0.8234 ± 0.0009 0.8241 ± 0.0012 | 300 | 0.8485 ± 0.0001 0.8294 ± 0.0002 | 250 | 0.9990 ± 0.0000 0.8448 ± 0.0001 | 25 | 0.8468 ± 0.0095 0.8432 ± 0.0114 | 30 |
DNA | 0.8206 ± 0.0002 0.7487 ± 0.0003 | 360 | 30 | 0.7614 ± 0.0000 0.6300 ± 0.0004 | 450 | 0.7793 ± 0.0001 0.6223 ± 0.0007 | 450 | 0.4978 ± 0.0000 0.3622 ± 0.0003 | 0.8409 ± 0.0000 0.7994 ± 0.0002 | 100 | 0.8526 ± 0.0000 0.8294 ± 0.0002 | 40 | 0.9740 ± 0.0000 0.7121 ± 0.0005 | 5 | 0.8303 ± 0.0092 0.8200 ± 0.0137 | 5 |
SKI | 0.9793 ± 0.0000 0.9814 ± 0.0000 | 200 | 10 | 0.9669 ± 0.0000 0.9665 ± 0.0000 | 10 | 0.9670 ± 0.0000 0.9672 ± 0.0000 | 19 | 0.9791 ± 0.0000 0.9782 ± 0.0000 | 0.9652 ± 0.0001 0.9629 ± 0.0001 | 24 | 0.9643 ± 0.0000 0.9614 ± 0.0001 | 18 | 0.9999 ± 0.0000 0.9993 ± 0.0000 | 20 | 0.9438 ± 0.0007 0.9434 ± 0.0014 | 10 |
USP | 0.9569 ± 0.0001 0.9444 ± 0.0003 | 200 | 10 | 0.9536 ± 0.0001 0.8938 ± 0.0006 | 240 | 0.9481 ± 0.0000 0.8710 ± 0.0014 | 325 | 0.9468 ± 0.0000 0.9467 ± 0.0004 | 0.8795 ± 0.0001 0.8293 ± 0.0005 | 20 | 0.9479 ± 0.0001 0.9371 ± 0.0002 | 10 | 1.0000 ± 0.0000 0.8550 ± 0.0004 | 500 | 0.9996 ± 0.0007 0.8251 ± 0.0008 | 100 |
MUS | 0.9696 ± 0.0000 0.9653 ± 0.0000 | 480 | 50 | 0.9582 ± 0.0000 0.9396 ± 0.0001 | 600 | 0.9579 ± 0.0000 0.9464 ± 0.0001 | 500 | 0.9687 ± 0.0000 0.9311 ± 0.0000 | 0.9649 ± 0.0000 0.9385 ± 0.0002 | 80 | 0.9559 ± 0.0000 0.9250 ± 0.0001 | 50 | 0.9995 ± 0.0000 0.9355 ± 0.0002 | 25 | 0.9999 ± 0.0000 0.9601 ± 0.0000 | 250 |
VOW | 0.9979 ± 0.0000 0.9969 ± 0.0000 | 180 | 300 | 0.9811 ± 0.0000 0.9794 ± 0.0001 | 90 | 0.9822 ± 0.0001 0.9795 ± 0.0000 | 85 | 0.9890 ± 0.0000 0.9868 ± 0.0000 | 0.9650 ± 0.0025 0.9386 ± 0.0025 | 29 | 0.9944 ± 0.0000 0.9529 ± 0.0006 | 10 | 0.9958 ± 0.0000 0.9702 ± 0.0008 | 5 | 1.0000 ± 0.0000 0.9789 ± 0.0168 | 80 |
CAG | 0.9832 ± 0.0000 0.9854 ± 0.0000 | 80 | 150 | 0.9247 ± 0.0000 0.9117 ± 0.0009 | 170 | 0.9210 ± 0.0000 0.9203 ± 0.0001 | 130 | 0.9589 ± 0.0000 0.9361 ± 0.0003 | 0.8671 ± 0.0093 0.8842 ± 0.0059 | 475 | 0.9625 ± 0.0000 0.9612 ± 0.0000 | 25 | 1.0000 ± 0.0000 0.8908 ± 0.0036 | 20 | 0.9572 ± 0.0153 0.9049 ± 0.0773 | 30 |
THY | 0.7511 ± 0.0006 0.7580 ± 0.0004 | 250 | 10 | 0.7450 ± 0.0000 0.7227 ± 0.0007 | 475 | 0.7571 ± 0.0000 0.7291 ± 0.0004 | 300 | 0.7263 ± 0.0000 0.7034 ± 0.0006 | 0.8442 ± 0.0366 0.8097 ± 0.0469 | 15 | 0.9910 ± 0.0000 0.9897 ± 0.0000 | 50 | 0.9978 ± 0.0000 0.9820 ± 0.0001 | 5 | 0.9925 ± 0.0020 0.9916 ± 0.0029 | 5 |
P96 | 0.9776 ± 0.0008 0.9573 ± 0.0031 | 130 | 10 | 0.9529 ± 0.0010 0.8928 ± 0.0083 | 475 | 0.9770 ± 0.0004 0.8875 ± 0.0061 | 180 | 0.9045 ± 0.0000 0.8960 ± 0.0002 | 0.6709 ± 0.0009 0.4454 ± 0.0312 | 17 | 0.9483 ± 0.0003 0.4573 ± 0.0279 | 375 | 0.9295 ± 0.0021 0.4812 ± 0.0468 | 5 | 0.6205 ± 0.0544 0.2446 ± 0.2092 | 20 |
SHU | 0.9991 ± 0.0000 0.9984 ± 0.0000 | 160 | 10 | 0.9693 ± 0.0002 0.9689 ± 0.0028 | 250 | 0.9622 ± 0.0000 0.9831 ± 0.0007 | 180 | 0.9980 ± 0.0000 0.9965 ± 0.0000 | 0.9982 ± 0.0000 0.9810 ± 0.0024 | 20 | 1.0000 ± 0.0000 0.9827 ± 0.0027 | 50 | 1.0000 ± 0.0000 1.0000 ± 0.0000 | 5 | 1.0000 ± 0.0000 1.0000 ± 0.0000 | 5 |
P86 | 0.9887 ± 0.0001 0.9595 ± 0.0029 | 120 | 10 | 0.9811 ± 0.0000 0.9373 ± 0.0034 | 200 | 0.9811 ± 0.0001 0.8519 ± 0.0135 | 170 | 0.9667 ± 0.0000 0.9045 ± 0.0085 | 0.6304 ± 0.0195 0.4575 ± 0.0772 | 22 | 0.9746 ± 0.0001 0.3647 ± 0.0416 | 100 | 0.8858 ± 0.0047 0.0947 ± 0.0360 | 5 | 0.6149 ± 0.0617 0.4122 ± 0.2328 | 5 |
LET | 0.9492 ± 0.0000 0.9519 ± 0.0000 | 200 | 10 | 0.9541 ± 0.0000 0.9437 ± 0.0001 | 250 | 0.9554 ± 0.0000 0.9387 ± 0.0002 | 275 | 0.9468 ± 0.0000 0.9055 ± 0.0003 | 0.8895 ± 0.0083 0.8813 ± 0.0062 | 15 | 0.9888 ± 0.0000 0.9218 ± 0.0007 | 50 | 0.9579 ± 0.0001 0.7320 ± 0.0013 | 5 | 0.9455 ± 0.0122 0.9229 ± 0.0197 | 10 |
PAB | 0.8498 ± 0.0010 0.8543 ± 0.0013 | 140 | 200 | 0.8263 ± 0.0002 0.8092 ± 0.0014 | 25 | 0.8369 ± 0.0002 0.7976 ± 0.0025 | 40 | 0.8150 ± 0.0001 0.8039 ± 0.0006 | 0.8206 ± 0.0051 0.7951 ± 0.0033 | 300 | 0.3754 ± 0.1055 0.3912 ± 0.1154 | 375 | 0.9942 ± 0.0000 0.8306 ± 0.0020 | 300 | 0.3818 ± 0.2045 0.2936 ± 0.2520 | 60 |
Method Type | Performance Improvement Percentage (%) |
---|---|
Single-Class-Imbalanced Method | 5.44 |
Class-Imbalanced Ensemble Method | 15.48 |
Method | Ranking |
---|---|
B-TSK-FC | 1.8667 |
SMOTE+TSK | 4.8667 |
W-TSK | 4.4667 |
SMOTE+KNN | 4.3333 |
RUSBoost | 6.2667 |
OverBoost | 4.8667 |
SMOTEBagging | 4.3 |
SMOTEBoost | 5.0333 |
i | Method | z | p-Value | Holm/Hommel
---|---|---|---|---
7 | RUSBoost | 4.91935 | 0.000001 | 0.007143 |
6 | SMOTEBoost | 3.540441 | 0.000399 | 0.008333 |
5 | SMOTE+TSK | 3.354102 | 0.000796 | 0.01 |
4 | OverBoost | 3.354102 | 0.000796 | 0.0125 |
3 | W-TSK | 2.906888 | 0.00365 | 0.016667 |
2 | SMOTE+KNN | 2.757817 | 0.005819 | 0.025 |
1 | SMOTEBagging | 2.720549 | 0.006517 | 0.05 |
i | Method | Unadjusted p | p Holm | p Hommel
---|---|---|---|---
1 | RUSBoost | 0.000001 | 0.000006 | 0.000006 |
2 | SMOTEBoost | 0.000399 | 0.002397 | 0.001991 |
3 | SMOTE+TSK | 0.000796 | 0.003981 | 0.003185 |
4 | OverBoost | 0.000796 | 0.003981 | 0.003185 |
5 | W-TSK | 0.00365 | 0.010951 | 0.006517 |
6 | SMOTE+KNN | 0.005819 | 0.011638 | 0.006517 |
7 | SMOTEBagging | 0.006517 | 0.011638 | 0.006517 |