Article

A Quadratic Surface Minimax Probability Machine for Imbalanced Classification

1 School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
2 School of Management, Shanghai University of International Business and Economics, Shanghai 201620, China
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(1), 230; https://doi.org/10.3390/sym15010230
Submission received: 21 December 2022 / Revised: 8 January 2023 / Accepted: 11 January 2023 / Published: 13 January 2023
(This article belongs to the Section Computer)

Abstract

In this paper, a kernel-free minimax probability machine model for imbalanced classification is proposed. In this model, a quadratic surface is adopted directly for separating the data points into two classes. By using two symmetric constraints to define the two worst-case classification accuracy rates, a model that maximizes both the F1 value of the minority class and the classification accuracy rate of all the data points is proposed. The proposed model corresponds to a fractional programming problem. Since the two worst-case classification accuracy rates appear symmetrically in the constraints, the model can be further simplified, and an alternating descent algorithm is then adopted to solve it efficiently. The proposed method reduces the computational cost by both using the kernel-free technique and adopting the efficient algorithm. Numerical tests on benchmark datasets are conducted to investigate the classification performance of the proposed method. The results demonstrate that the proposed method performs better than other state-of-the-art methods, especially for classifying imbalanced datasets. The better performance for imbalanced classification is also demonstrated on a Wholesale customers dataset. This method can provide methodological support for research in areas such as customer segmentation.

1. Introduction

Binary classification is an important branch in the field of machine learning. The minimax probability machine (MPM), first proposed in [1], is a competitive machine learning method for binary classification problems. It directly maximizes the worst-case probability of correct classification given only the mean and covariance matrix of each class [1]. In other words, MPM provides a clear lower bound on the prediction accuracy without making any assumptions about the data distribution. The robustness of MPM-based methods has been shown through successful applications in fields such as medical diagnosis [2], anomaly detection [3], and customer churn prediction [4].
Since the roles of the two classes are often symmetric, the same kind of modeling technique can be used for both. In the traditional MPM model, the lower bounds on the correct classification probability of the two classes are defined to be the same [1]. This may not be effective enough when a certain class of samples needs more attention. The minimum error minimax probability machine (MEMPM) [5] learns the classifier by maximizing the lower bound of the weighted correct classification probability of the two categories, which allows the two classes of samples to be treated differently. After setting the weight to be 1/2, a biased minimax probability machine is proposed in [6] to deal with the problem of imbalanced learning. Although the balance parameter in MEMPM can account for the different importance of the two classes to some extent, only the accuracy rate is adopted as the objective function. Luo et al. [7] developed a minimax probability machine for the $F_\beta$ measure (MPMF), which provides a new perspective for classification on imbalanced datasets: by using the lower bounds of the correct classification probability of the two classes, the $F_\beta$ measure is defined as the objective function. To solve the MPMF model effectively, an equivalent form, which can be solved by an alternating descent method to learn a linear classifier, is derived, and the kernel trick is employed to derive a nonlinear MPMF model that learns a nonlinear classifier. Although MPMF provides a good perspective for dealing with imbalanced datasets, its performance also depends on the choice of kernel functions and kernel parameters, a process that consumes a lot of time.
Some other MPM-based models have also been studied. Nath et al. proposed a maximum margin classifier [8] that considers the structural risk of the model. Maldonado et al. [9] proposed the regularized MPM and regularized MEMPM models, which minimize both the $l_2$ norm of the weight vector and the upper bound on the classification error rate. Drawing on the idea of twin support vector machines [10], Yang et al. [11] proposed the twin minimax probability machine, which constructs a pair of non-parallel decision hyperplanes such that each hyperplane separates one class of samples with a lower bound on the classification accuracy and is far from the other class of samples. Although these methods show good classification performance, kernel functions are needed to deal with nonlinear classification problems, which may lead to efficiency issues in choosing kernel functions and their corresponding parameters.
To address the issue of choosing a suitable kernel function and its optimal parameter, Dagher proposed a kernel-free quadratic surface support vector machine model [12] for binary classification. Luo et al. extended it to a soft-margin quadratic surface SVM (SQSSVM) model [13]. Furthermore, more kernel-free SVM models have been proposed in the literature. Bai et al. proposed a quadratic kernel-free least squares support vector machine for binary classification [14]. Yan et al. proposed a kernel-free semi-supervised quadratic surface support vector machine model [15]. Tian et al. proposed a new approach based on the kernel-free quadratic surface support vector machine to handle binary classification with mislabeled information [16]. Luo et al. proposed an unsupervised quadratic surface support vector machine and applied it to credit risk assessment [17]. Gao et al. proposed a kernel-free soft quartic surface SVM model by utilizing the double well potential function for highly nonlinear binary classification [18]. Yan et al. analyzed the equivalent form of the soft quadratic surface support vector machine and proposed a novel support vector machine model with feature mapping [19]. Based on the idea of the soft quadratic surface support vector machine [13], Wang et al. [20] proposed a kernel-free quadratic surface minimax probability machine (QSMPM), which attempts to find a quadratic surface to separate the two classes of samples. QSMPM does not require a nonlinear mapping for dealing with nonlinearly separable datasets, which saves a lot of time in tuning parameters. Nevertheless, the QSMPM model still aims only to maximize the lower bound of the correct classification probability of the two classes, which is not suitable enough for dealing with imbalanced classification.
The existing minimax probability machines focus on either kernel-free schemes or imbalanced classification, and the existing optimization goals consider only one classification performance measure. To deal with imbalanced classification problems efficiently, a new quadratic surface minimax probability machine model whose optimization goal considers two measures (QSMPMFA) is proposed. The optimization goal consists of both the $F_1$ value of the minority class and the weighted classification accuracy rate of the two classes. In this model, QSMPMFA tries to find a quadratic surface to separate the samples into two classes. Two symmetric constraints are used to define the lower bounds of the probability of the different classes of samples being correctly classified, i.e., the worst-case classification accuracy rates. Since the two worst-case classification accuracy rates appear symmetrically in the constraints, the proposed model can be further simplified. The original optimization problem of QSMPMFA can then be transformed into an equivalent form, which can be solved by an alternating descent method. The advantages can be summarized as follows.
First, QSMPMFA is optimized by using a combination of the $F_1$ value and the accuracy rate in the objective function, instead of considering only a single classification performance measure, which makes it better suited to handling imbalanced datasets.
Second, the proposed model can be further simplified by using the symmetric structure in the constraints, and an alternating descent algorithm is adopted to solve it efficiently.
Third, the numerical tests of QSMPMFA on several benchmark datasets show that QSMPMFA outperforms the other state-of-the-art methods in terms of both the F 1 value and accuracy rate, especially for the imbalanced datasets.
This paper is structured as follows. In Section 2, the related studies are briefly reviewed, including MPM, SQSSVM, and the formulation of the F 1 value. Section 3 provides the details of the proposed QSMPMFA model and the corresponding algorithm. Section 4 shows the numerical tests on different kinds of datasets and illustrates the comparison performance of the proposed method with other methods. Section 5 provides some conclusions and future work.

2. Related Work

This section provides a brief introduction to the minimax probability machine, soft quadratic surface support vector machine, and the performance measures.
Consider the dataset $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$, where $x_i \in \mathbb{R}^d$ refers to the feature vector of the $i$-th sample and $y_i \in \{+1, -1\}$ represents this sample's class label for $i = 1, 2, \ldots, l$.

2.1. Minimax Probability Machine (MPM)

MPM attempts to separate the two classes of data points with the goal of maximizing a lower bound on the classification accuracy rate given the mean and covariance matrix of each class of data.
Let $x \sim (\mu, \Sigma)$ denote a random vector $x$ with mean $\mu$ and covariance matrix $\Sigma$. Suppose the positive class samples satisfy $x_+ \sim (\mu_+, \Sigma_+)$ and the negative class samples satisfy $x_- \sim (\mu_-, \Sigma_-)$, where $\mu_+, \mu_- \in \mathbb{R}^d$ and $\Sigma_+, \Sigma_- \in S_+^d$. MPM tries to find the hyperplane:
$$g(x) = w^T x - b = 0, \qquad (1)$$
where $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The optimization problem of MPM can be formulated as follows:
$$\max_{w, b, \alpha} \ \alpha \quad \text{s.t.} \quad \inf_{x_+ \sim (\mu_+, \Sigma_+)} \Pr\{w^T x_+ \geq b\} \geq \alpha, \quad \inf_{x_- \sim (\mu_-, \Sigma_-)} \Pr\{w^T x_- \leq b\} \geq \alpha. \qquad (2)$$
In Problem (2), $\alpha \in (0, 1)$ is the lower bound on the probability of the two classes of samples being correctly classified, which can also be interpreted as the worst-case classification accuracy rate. The structures of the two constraints are symmetric. By maximizing $\alpha$ in the objective function, the hyperplane with the best lower bound can be chosen.
Problem (2) is an optimization problem with probability constraints, which makes it difficult to deal with directly. The following two lemmas are introduced for the transformation.
Lemma 1. 
Suppose $x$ is a random vector, $S$ is a given convex set, and $x \sim (\mu, \Sigma)$; then the supremum of the probability of $x \in S$ is:
$$\sup_{x \sim (\mu, \Sigma)} \Pr\{x \in S\} = \frac{1}{1 + d^2}, \quad d^2 = \inf_{x \in S} (x - \mu)^T \Sigma^{-1} (x - \mu). \qquad (3)$$
Lemma 2. 
Given $w \neq 0$, $b$, $\mu$, and $\Sigma$, the following condition:
$$\inf_{x \sim (\mu, \Sigma)} \Pr\{w^T x \leq b\} \geq \alpha \qquad (4)$$
holds if and only if:
$$b - w^T \mu \geq k(\alpha) \sqrt{w^T \Sigma w}, \qquad (5)$$
where $k(\alpha) = \sqrt{\frac{\alpha}{1 - \alpha}}$.
It is obvious that k ( α ) is a monotonically increasing function of α , so the original optimization problem of MPM can be transformed into the following formulation according to the above two lemmas.
$$\max_{w, b, \alpha} \ k(\alpha) \quad \text{s.t.} \quad -b + w^T \mu_+ \geq k(\alpha) \sqrt{w^T \Sigma_+ w}, \quad b - w^T \mu_- \geq k(\alpha) \sqrt{w^T \Sigma_- w}. \qquad (6)$$
Problem (6) can be formulated as the following second-order cone programming problem:
$$\min_{w} \ \|\Sigma_+^{1/2} w\|_2 + \|\Sigma_-^{1/2} w\|_2 \quad \text{s.t.} \quad w^T (\mu_+ - \mu_-) = 1, \qquad (7)$$
which can be solved by the interior point method. With the optimal solution $w^*$, the optimal value $b^*$ can be obtained by:
$$b^* = (w^*)^T \mu_- + \frac{\|\Sigma_-^{1/2} w^*\|_2}{\|\Sigma_-^{1/2} w^*\|_2 + \|\Sigma_+^{1/2} w^*\|_2} = (w^*)^T \mu_+ - \frac{\|\Sigma_+^{1/2} w^*\|_2}{\|\Sigma_+^{1/2} w^*\|_2 + \|\Sigma_-^{1/2} w^*\|_2}. \qquad (8)$$
A new data point $x \in \mathbb{R}^d$ can be assigned to the positive or negative class by calculating:
$$f(x) = \mathrm{sign}\left((w^*)^T x - b^*\right).$$
For more details, please refer to [21].
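As an illustration, the following is a minimal Python sketch (not part of the original paper) of solving Problem (7) with a generic convex solver and recovering $b^*$ via Equation (8); the variable names and the use of cvxpy/scipy are our own assumptions.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

def linear_mpm(mu_p, mu_n, Sigma_p, Sigma_n):
    """Sketch of the linear MPM: solve Problem (7), then recover b* from Equation (8)."""
    d = mu_p.shape[0]
    Sp_half = np.real(sqrtm(Sigma_p))   # matrix square root Sigma_+^{1/2}
    Sn_half = np.real(sqrtm(Sigma_n))   # matrix square root Sigma_-^{1/2}

    w = cp.Variable(d)
    objective = cp.Minimize(cp.norm(Sp_half @ w, 2) + cp.norm(Sn_half @ w, 2))
    constraints = [(mu_p - mu_n) @ w == 1]
    cp.Problem(objective, constraints).solve()

    w_star = w.value
    norm_p = np.linalg.norm(Sp_half @ w_star)
    norm_n = np.linalg.norm(Sn_half @ w_star)
    b_star = w_star @ mu_n + norm_n / (norm_n + norm_p)   # Equation (8)
    return w_star, b_star

def mpm_predict(w_star, b_star, X):
    # Decision rule f(x) = sign(w*^T x - b*), applied row-wise to X
    return np.sign(X @ w_star - b_star)
```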

2.2. Soft Quadratic Surface Support Vector Machine (SQSSVM)

The SQSSVM model intends to find the optimal parameter set ( W , b , c ) of a quadratic surface:
$$\frac{1}{2} x^T W x + b^T x + c = 0,$$
where $W = W^T \in \mathbb{R}^{d \times d}$, $b \in \mathbb{R}^d$, and $c \in \mathbb{R}$, which separates the training points into two classes by maximizing the distance from each data point to the separating surface. The SQSSVM model is as follows:
$$\min_{W, b, c, \xi} \ \sum_{i=1}^{l} \|W x_i + b\|_2^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i \left(\frac{1}{2} x_i^T W x_i + b^T x_i + c\right) \geq 1 - \xi_i, \ \xi_i \geq 0, \ i = 1, \ldots, l, \qquad (10)$$
where C > 0 is the penalty parameter.
The matrix $W = (\omega_{ij})$ can be vectorized as
$$\overline{W} = [\omega_{11}, \omega_{12}, \ldots, \omega_{1d}, \omega_{22}, \omega_{23}, \ldots, \omega_{2d}, \ldots, \omega_{dd}]^T \in \mathbb{R}^{\frac{d^2 + d}{2}}. \qquad (11)$$
Denote $z = (\overline{W}^T, b^T)^T$. Problem (10) can be reformulated as follows:
$$\min_{z, c, \xi} \ z^T \hat{G} z + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i (s_i^T z + c) \geq 1 - \xi_i, \ \xi_i \geq 0, \ i = 1, \ldots, l. \qquad (12)$$
In Problem (12), $\hat{G} = \sum_{i=1}^{l} ([M_i, I])^T [M_i, I] \in \mathbb{R}^{\frac{d^2 + 3d}{2} \times \frac{d^2 + 3d}{2}}$, where $I$ is the $d$-dimensional identity matrix. The matrix $M_i \in \mathbb{R}^{d \times \frac{d^2 + d}{2}}$ is constructed from the training point $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]^T \in \mathbb{R}^d$ as follows. For the $j$-th row of $M_i$, $j = 1, \ldots, d$, check the elements of $\overline{W}$ one by one: if the $p$-th element of $\overline{W}$ is $\omega_{jk}$ or $\omega_{kj}$ for some $k = 1, 2, \ldots, d$, then set the $p$-th element of the $j$-th row of $M_i$ to $x_{ik}$; otherwise, set it to 0. Besides,
$$s_i = \left[\frac{1}{2} x_{i1} x_{i1}, \ldots, x_{i1} x_{id}, \frac{1}{2} x_{i2} x_{i2}, \ldots, x_{i2} x_{id}, \ldots, \frac{1}{2} x_{i,d-1} x_{i,d-1}, x_{i,d-1} x_{id}, \frac{1}{2} x_{id} x_{id}, x_{i1}, x_{i2}, \ldots, x_{id}\right]^T \in \mathbb{R}^{\frac{d^2 + d}{2} + d}. \qquad (13)$$
Problem (12) is a convex quadratic programming problem. With the optimal solution $z^*$ and $c^*$, the parameter set $(W^*, b^*, c^*)$ can be obtained. A new data point $x \in \mathbb{R}^d$ can be assigned to the positive or negative class by calculating
$$f(x) = \mathrm{sign}\left(\frac{1}{2} x^T W^* x + (b^*)^T x + c^*\right). \qquad (14)$$
For more details, please refer to [13].
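For intuition, a small Python sketch of evaluating the quadratic decision rule in Equation (14) is given below; it is illustrative only, and the variable names are ours.

```python
import numpy as np

def quadratic_decision(X, W_star, b_star, c_star):
    """Evaluate f(x) = sign(0.5 x^T W* x + b*^T x + c*) for each row x of X."""
    quad = 0.5 * np.einsum('ij,jk,ik->i', X, W_star, X)   # row-wise x^T W* x
    return np.sign(quad + X @ b_star + c_star)
```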

2.3. F1 Value

In this subsection, some performance measures are introduced. The true positive rate ($TPR$) and true negative rate ($TNR$) are defined as follows:
$$TPR = 1 - FNR = P(\hat{y} = 1 \mid y = 1), \quad TNR = 1 - FPR = P(\hat{y} = -1 \mid y = -1),$$
where $y$ is the true label of a data point, $\hat{y}$ is the predicted label, $FNR$ is the false negative rate, and $FPR$ is the false positive rate. The precision rate can then be defined as:
$$Precision = P(y = 1 \mid \hat{y} = 1) = \frac{p \cdot TPR}{p \cdot TPR + (1 - p) \cdot FPR},$$
where p denotes the proportion of positive class samples.
$TPR$ is more concerned with how comprehensively the positive class samples are recognized, while $Precision$ is more concerned with how accurately they are recognized. Since $TPR$ and $Precision$ are two conflicting performance measures, some combinations of them have been proposed. The $F_1$ value is the harmonic mean of $TPR$ and $Precision$, i.e.:
$$F_1 = \frac{2 \cdot TPR \cdot Precision}{Precision + TPR}.$$
Since the classification of the positive class samples has a great influence on the $F_1$ value, the $F_1$ value is commonly used to evaluate the classification performance on imbalanced datasets. For more details, please refer to [7].
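As a concrete illustration, the $F_1$ value can be computed from a confusion matrix as in the following minimal Python sketch (the function and variable names are ours, not from the paper).

```python
def f1_value(tp, fp, fn):
    """Harmonic mean of precision and recall (TPR) for the positive class."""
    tpr = tp / (tp + fn)             # recall of the positive (minority) class
    precision = tp / (tp + fp)
    return 2 * tpr * precision / (tpr + precision)

# Example: 40 true positives, 10 false positives, 20 false negatives
# gives precision = 0.8, TPR = 2/3, and an F1 value of roughly 0.727.
print(f1_value(40, 10, 20))
```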

3. A Quadratic Surface Minimax Probability Machine with a Mixed Performance Measure

In this section, a quadratic surface minimax probability machine with a mixed performance measure (QSMPMFA) is proposed. The mixed performance measure considers both the $F_1$ value and the weighted accuracy rate: the $F_1$ measure is adopted to pay more attention to the class with fewer points, while the weighted accuracy rate is adopted to balance the importance of the two classes. The proposed QSMPMFA model is introduced first, and the alternating descent method for solving it is then constructed.

3.1. The Proposed Model

Suppose the positive class samples satisfy $x_+ \sim (\mu_+, \Sigma_+)$ and the negative class samples satisfy $x_- \sim (\mu_-, \Sigma_-)$, where $\mu_+, \mu_- \in \mathbb{R}^d$ and $\Sigma_+, \Sigma_- \in S_+^d$. Assume that the means and covariance matrices of the positive and negative samples are all reliable.
For obtaining a robust classifier for imbalanced datasets, the improved minimax probability machine based on the soft quadratic surface support vector machine is proposed first as follows.
$$\min_{W, b, c, \alpha_1, \alpha_2} \ \frac{1}{F_1} + C\left((1 - p)\alpha_1 + p\alpha_2\right) \quad \text{s.t.} \quad \inf_{x_+ \sim (\mu_+, \Sigma_+)} \Pr\left\{\tfrac{1}{2} x_+^T W x_+ + b^T x_+ + c \geq 0\right\} \geq 1 - \alpha_1, \quad \inf_{x_- \sim (\mu_-, \Sigma_-)} \Pr\left\{\tfrac{1}{2} x_-^T W x_- + b^T x_- + c \leq 0\right\} \geq 1 - \alpha_2, \qquad (19)$$
where $\alpha_1$ denotes the classification error rate of the positive class, $\alpha_2$ denotes the classification error rate of the negative class, $C$ is the penalty parameter, and $p$ denotes the proportion of positive class points. It can be seen that $\alpha_1$ is the false negative rate (FNR), while $\alpha_2$ is the false positive rate (FPR) mentioned in Section 2.3. The first constraint in Problem (19) guarantees that the infimum of the probability that a positive class sample $x_+$ lies above the separating quadratic surface is not less than $1 - \alpha_1$. The second constraint in Problem (19) guarantees that the infimum of the probability that a negative class sample $x_-$ lies below the separating quadratic surface is not less than $1 - \alpha_2$.
Since $\alpha_1 = FNR$ and $\alpha_2 = FPR$, the first term in the objective function can be represented as follows:
$$\frac{1}{F_1} = \frac{(1 - p)\alpha_2 + p}{2p(1 - \alpha_1)} + \frac{1}{2}. \qquad (20)$$
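This identity follows directly from the definitions in Section 2.3. A short derivation, using $TPR = 1 - \alpha_1$ and $FPR = \alpha_2$, is sketched below for completeness:
$$\frac{1}{F_1} = \frac{TPR + Precision}{2 \cdot TPR \cdot Precision} = \frac{1}{2\,Precision} + \frac{1}{2\,TPR} = \frac{p(1 - \alpha_1) + (1 - p)\alpha_2}{2p(1 - \alpha_1)} + \frac{1}{2(1 - \alpha_1)} = \frac{(1 - p)\alpha_2 + p}{2p(1 - \alpha_1)} + \frac{1}{2}.$$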
By omitting the constant, Problem (19) can be transformed to the following problem:
$$\min_{W, b, c, \alpha_1, \alpha_2} \ \frac{(1 - p)\alpha_2 + p}{2p(1 - \alpha_1)} + C\left((1 - p)\alpha_1 + p\alpha_2\right) \quad \text{s.t.} \quad \inf_{x_+ \sim (\mu_+, \Sigma_+)} \Pr\left\{\tfrac{1}{2} x_+^T W x_+ + b^T x_+ + c \geq 0\right\} \geq 1 - \alpha_1, \quad \inf_{x_- \sim (\mu_-, \Sigma_-)} \Pr\left\{\tfrac{1}{2} x_-^T W x_- + b^T x_- + c \leq 0\right\} \geq 1 - \alpha_2. \qquad (21)$$
Use the definition of $\overline{W}$ in formulation (11) and denote $z = (\overline{W}^T, b^T)^T$. Problem (21) can be reformulated as follows:
$$\min_{z, c, \alpha_1, \alpha_2} \ \frac{(1 - p)\alpha_2 + p}{2p(1 - \alpha_1)} + C\left((1 - p)\alpha_1 + p\alpha_2\right) \quad \text{s.t.} \quad \inf_{s_+ \sim (\mu_p, \Sigma_p)} \Pr\{z^T s_+ + c \geq 0\} \geq 1 - \alpha_1, \quad \inf_{s_- \sim (\mu_n, \Sigma_n)} \Pr\{z^T s_- + c \leq 0\} \geq 1 - \alpha_2, \qquad (22)$$
where
$$s_+ = \left[\frac{1}{2} x_{+1} x_{+1}, \ldots, x_{+1} x_{+d}, \frac{1}{2} x_{+2} x_{+2}, \ldots, x_{+2} x_{+d}, \ldots, \frac{1}{2} x_{+,d-1} x_{+,d-1}, x_{+,d-1} x_{+d}, \frac{1}{2} x_{+d} x_{+d}, x_{+1}, x_{+2}, \ldots, x_{+d}\right]^T \in \mathbb{R}^{\frac{d^2 + d}{2} + d}, \qquad (23)$$
and
$$s_- = \left[\frac{1}{2} x_{-1} x_{-1}, \ldots, x_{-1} x_{-d}, \frac{1}{2} x_{-2} x_{-2}, \ldots, x_{-2} x_{-d}, \ldots, \frac{1}{2} x_{-,d-1} x_{-,d-1}, x_{-,d-1} x_{-d}, \frac{1}{2} x_{-d} x_{-d}, x_{-1}, x_{-2}, \ldots, x_{-d}\right]^T \in \mathbb{R}^{\frac{d^2 + d}{2} + d}. \qquad (24)$$
Here, $\mu_p$ and $\Sigma_p$ represent the mean and covariance matrix of the mapped positive points $s_+$, and $\mu_n$ and $\Sigma_n$ represent the mean and covariance matrix of the mapped negative points $s_-$. The separation obtained in Problem (22) can be interpreted as follows. Each data point $x \in \mathbb{R}^d$ in the original dataset $T$ is first mapped into a new space, i.e., $s \in \mathbb{R}^{\frac{d^2 + d}{2} + d}$. Next, a linear classifier is adopted for the separation.
Since the true means and covariance matrices are often unknown, the estimates $\hat{\mu}$ and $\hat{\Sigma}$ are used in their place, i.e.:
$$\hat{\mu}_{p(n)} = \frac{1}{N_{+(-)}} \sum_{i=1}^{N_{+(-)}} s_i \in \mathbb{R}^{\frac{d^2 + 3d}{2}}, \quad \hat{\Sigma}_{p(n)} = \frac{1}{N_{+(-)}} \sum_{i=1}^{N_{+(-)}} (s_i - \hat{\mu}_{p(n)})(s_i - \hat{\mu}_{p(n)})^T \in \mathbb{R}^{\frac{d^2 + 3d}{2} \times \frac{d^2 + 3d}{2}}, \qquad (25)$$
where $N_+$ is the number of positive points in the dataset and $N_-$ is the number of negative points.
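A minimal Python sketch of the quadratic mapping in (23)-(24) and of the sample statistics in (25) is given below; the helper names are ours, and the loop-based construction favors clarity over speed.

```python
import numpy as np

def quad_map(x):
    """Map x in R^d to s in R^{d(d+1)/2 + d}: upper-triangular quadratic terms
    (with the squared terms halved) followed by the linear terms, as in (23)-(24)."""
    d = len(x)
    s = []
    for j in range(d):
        for k in range(j, d):
            s.append(0.5 * x[j] * x[k] if j == k else x[j] * x[k])
    return np.array(s + list(x))

def class_statistics(X):
    """Empirical mean and covariance of the mapped points of one class, as in (25)."""
    S = np.array([quad_map(x) for x in X])
    mu = S.mean(axis=0)
    Sigma = (S - mu).T @ (S - mu) / S.shape[0]   # 1/N (population) covariance
    return mu, Sigma
```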
By using Lemma 2, the probability inequalities in Problem (22) can be equivalently replaced by two inequalities. Then Problem (22) can be formulated as follows:
$$\min_{z, c, \alpha_1, \alpha_2} \ \frac{(1 - p)\alpha_2 + p}{2p(1 - \alpha_1)} + C\left((1 - p)\alpha_1 + p\alpha_2\right) \quad \text{s.t.} \quad z^T \mu_p + c \geq \pi(\alpha_1) \sqrt{z^T \Sigma_p z}, \quad -c - z^T \mu_n \geq \pi(\alpha_2) \sqrt{z^T \Sigma_n z}, \qquad (26)$$
where $\pi(\alpha_1) = \sqrt{\frac{1 - \alpha_1}{\alpha_1}}$ and $\pi(\alpha_2) = \sqrt{\frac{1 - \alpha_2}{\alpha_2}}$.
Since the objective function is increasing in both $\alpha_1$ and $\alpha_2$, Problem (26) can be further formulated as the following problem:
$$\min_{z, c, \alpha_1, \alpha_2} \ \frac{(1 - p)\alpha_2 + p}{2p(1 - \alpha_1)} + C\left((1 - p)\alpha_1 + p\alpha_2\right) \quad \text{s.t.} \quad z^T \mu_p + c = \pi(\alpha_1) \sqrt{z^T \Sigma_p z}, \quad -c - z^T \mu_n = \pi(\alpha_2) \sqrt{z^T \Sigma_n z}. \qquad (27)$$
The equivalence of Problem (26) and Problem (27) can be derived similarly to Lemma 4 in [7]. By adding the two constraints together, the variable $c$ can be eliminated, i.e.:
$$\min_{z, \alpha_1, \alpha_2} \ \frac{(1 - p)\alpha_2 + p}{2p(1 - \alpha_1)} + C\left((1 - p)\alpha_1 + p\alpha_2\right) \quad \text{s.t.} \quad \pi(\alpha_1) \sqrt{z^T \Sigma_p z} + \pi(\alpha_2) \sqrt{z^T \Sigma_n z} = z^T (\mu_p - \mu_n). \qquad (28)$$
Furthermore, as shown in (28), $\alpha_1$ and $\alpha_2$ play symmetric roles in the equality constraint. Then, $\alpha_2$ can be represented by $\alpha_1$ and $z$ as follows:
$$\alpha_2 = \frac{z^T \Sigma_n z}{z^T \Sigma_n z + \left(z^T (\mu_p - \mu_n) - \pi(\alpha_1) \sqrt{z^T \Sigma_p z}\right)^2}. \qquad (29)$$
Finally, by using this representation, Problem (28) can be transformed into the following problem with respect to $z$ and $\alpha_1$:
$$\min_{z, \alpha_1} \ M(z, \alpha_1) = \left(\frac{1 - p}{2p(1 - \alpha_1)} + Cp\right) \frac{z^T \Sigma_n z}{z^T \Sigma_n z + \left(z^T (\mu_p - \mu_n) - \pi(\alpha_1) \sqrt{z^T \Sigma_p z}\right)^2} + \frac{1}{2(1 - \alpha_1)} + C(1 - p)\alpha_1$$
$$\text{s.t.} \quad z^T (\mu_p - \mu_n) - \pi(\alpha_1) \sqrt{z^T \Sigma_p z} \geq 0, \quad \|z\| = 1. \qquad (30)$$
The constraint $\|z\| = 1$ is added to ensure the uniqueness of the solution. Problem (30) is a nonconvex optimization problem with a fractional objective function.

3.2. Algorithm for QSMPMFA

For efficiently solving this problem, an alternating descent algorithm [7] is adopted. The details of the algorithm are described as follows.
In the $t$-th iteration, suppose that $z_t$ satisfies the constraint $\|z_t\| = 1$; then $\alpha_1^t$ can be updated by solving the following problem:
$$\alpha_1^t = \arg\min_{\alpha_1} \ \left(\frac{1 - p}{2p(1 - \alpha_1)} + Cp\right) \frac{z_t^T \Sigma_n z_t}{z_t^T \Sigma_n z_t + \left(z_t^T (\mu_p - \mu_n) - \pi(\alpha_1) \sqrt{z_t^T \Sigma_p z_t}\right)^2} + \frac{1}{2(1 - \alpha_1)} + C(1 - p)\alpha_1$$
$$\text{s.t.} \quad \alpha_1 \geq \frac{z_t^T \Sigma_p z_t}{z_t^T \Sigma_p z_t + \|\mu_p - \mu_n\|^2}. \qquad (31)$$
Since this problem is a univariate optimization problem, it can be solved by the line search method.
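One possible way to carry out this line search in Python is sketched below; the uniform grid over $\alpha_1$, the helper names, and the use of the feasibility bound derived from the constraint of Problem (30) are our own choices rather than the authors' exact routine.

```python
import numpy as np

def update_alpha1(z, mu_p, mu_n, Sigma_p, Sigma_n, p, C, n_grid=1000):
    """Grid/line search for alpha_1 in the univariate subproblem (31)."""
    var_p = z @ Sigma_p @ z
    var_n = z @ Sigma_n @ z
    gap = z @ (mu_p - mu_n)
    # Smallest alpha_1 keeping the constraint of Problem (30) feasible:
    # pi(alpha_1) * sqrt(z' Sigma_p z) <= z' (mu_p - mu_n)
    lower = var_p / (var_p + gap ** 2)
    best_a, best_val = None, np.inf
    for a1 in np.linspace(lower, 1.0 - 1e-6, n_grid):
        pi1 = np.sqrt((1.0 - a1) / a1)
        resid = gap - pi1 * np.sqrt(var_p)
        a2 = var_n / (var_n + resid ** 2)                     # Equation (29)
        val = ((1 - p) / (2 * p * (1 - a1)) + C * p) * a2 \
              + 1.0 / (2 * (1 - a1)) + C * (1 - p) * a1       # M(z, alpha_1)
        if val < best_val:
            best_a, best_val = a1, val
    return best_a
```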
Now, $z$ is updated by solving Problem (30) with $\alpha_1 = \alpha_1^t$, i.e.:
$$\max_{z} \ \lambda_t(z) = \frac{z^T (\mu_p - \mu_n) - \pi(\alpha_1^t) \sqrt{z^T \Sigma_p z}}{\sqrt{z^T \Sigma_n z}} \quad \text{s.t.} \quad \|z\| = 1. \qquad (32)$$
To find a $z_{t+1}$ that satisfies $\lambda_t(z_{t+1}) > \lambda_t(z_t)$, define the following function:
$$f_t(z) = z^T (\mu_p - \mu_n) - \pi(\alpha_1^t) \sqrt{z^T \Sigma_p z} - \lambda_t(z_t) \sqrt{z^T \Sigma_n z}. \qquad (33)$$
Note that $f_t(z_t) = 0$, so a $z_{t+1}$ satisfying $f_t(z_{t+1}) > 0$ should be found. The following nonlinear programming problem can be solved to obtain an inexact solution of Problem (32):
$$\max_{z} \ f_t(z) \quad \text{s.t.} \quad \|z\| = 1. \qquad (34)$$
The gradient ascent method is used to solve Problem (34). Let $v_1 = z_t$ and take the step size $\gamma_k = 1/k$, $k = 1, 2, \ldots$; at the $k$-th step, $v_k$ is updated according to the following equation:
$$u_k = v_k + \gamma_k \left((\mu_p - \mu_n) - \pi(\alpha_1^t) \frac{\Sigma_p v_k}{\sqrt{v_k^T \Sigma_p v_k}} - \lambda_t(z_t) \frac{\Sigma_n v_k}{\sqrt{v_k^T \Sigma_n v_k}}\right), \quad v_{k+1} = \frac{u_k}{\|u_k\|}. \qquad (35)$$
The stopping condition of the above iterative process is set as:
$$f_t(v_{k+1}) \geq 0, \quad f_t(v_k) \geq 0, \quad \frac{f_t(v_{k+1}) - f_t(v_k)}{f_t(v_{k+1})} \leq \epsilon. \qquad (36)$$
Here, $\epsilon$ is the stopping error. The $v^*$ that satisfies condition (36) is taken as the updated $z_{t+1}$.
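A minimal Python sketch of the update (35), with the stopping rule (36) checked by the caller, could look as follows; again the names are illustrative rather than the authors' implementation.

```python
import numpy as np

def ascent_step(v, gamma, pi_a1, lam_t, mu_p, mu_n, Sigma_p, Sigma_n):
    """One projected gradient-ascent step for Problem (34), i.e., Equation (35)."""
    grad = (mu_p - mu_n) \
         - pi_a1 * (Sigma_p @ v) / np.sqrt(v @ Sigma_p @ v) \
         - lam_t * (Sigma_n @ v) / np.sqrt(v @ Sigma_n @ v)
    u = v + gamma * grad
    return u / np.linalg.norm(u)    # project back onto the unit sphere
```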
After $t_0$ rounds of the alternating descent algorithm, the values of $z_{t_0}$ and $\alpha_1^{t_0}$ are obtained. Correspondingly, the parameter $c^*$ of the quadratic surface is given by:
$$c^* = -z_{t_0}^T \mu_p + \pi(\alpha_1^{t_0}) \sqrt{z_{t_0}^T \Sigma_p z_{t_0}}. \qquad (37)$$
Furthermore, the parameters W * and b * can be determined. For a new sample x , the label can be further determined according to the following decision function:
$$f(x) = \mathrm{sign}\left(\frac{1}{2} x^T W^* x + (b^*)^T x + c^*\right), \qquad (38)$$
where $\mathrm{sign}(\cdot)$ is the sign function.
The whole procedure is summarized in Algorithm 1. The convergence of this algorithm can be shown by using the analysis in [7].
Algorithm 1: Training process of QSMPMFA
Input: Training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{m_+ + m_-}, y_{m_+ + m_-})\}$, penalty parameter $C$, $t = 0$.
Output: $W^*$, $b^*$, $c^*$.
1. Transform the training set $T$ to $\tilde{T} = \{(s_1, y_1), (s_2, y_2), \ldots, (s_{m_+ + m_-}, y_{m_+ + m_-})\}$ by using formulations (23) and (24).
2. Calculate the means and covariance matrices $(\hat{\mu}_p, \hat{\Sigma}_p)$ and $(\hat{\mu}_n, \hat{\Sigma}_n)$ according to formula (25).
3. Set $z_t = (\hat{\mu}_p - \hat{\mu}_n) / \|\hat{\mu}_p - \hat{\mu}_n\|$.
4. for $t = 1, 2, \ldots$ do
5.    Calculate $\alpha_1^t$ by solving Problem (31).
6.    Calculate the objective function of Problem (30), i.e., $M(z_t, \alpha_1^t)$.
7.    Let $v_k = z_t$, $k = 1$.
8.    While condition (36) is not met
9.         update $v_{k+1}$ by using formula (35).
10.    End
11.    Let $z_{t+1} = v_{k+1}$ and calculate the objective function of Problem (30), i.e., $M(z_{t+1}, \alpha_1^t)$.
12.    If  $(M(z_t, \alpha_1^t) - M(z_{t+1}, \alpha_1^t)) / M(z_t, \alpha_1^t) \leq \epsilon$
13.       terminate
14.    End
15. End
16. Calculate $W^*$, $b^*$, and $c^*$ according to Equations (11) and (37).

4. Numerical Tests

In this section, the classification performance of the QSMPMFA model is investigated. For comparison, a minimax probability machine with a linear kernel (MPM-l), polynomial kernel (MPM-p), and Gaussian kernel (MPM-g) [1]; a minimax probability machine for F measure with a linear kernel (MPMF-l), polynomial kernel (MPMF-p), and Gaussian kernel (MPMF-g) [7]; and a quadratic surface minimax probability machine (QSMPM) [22] are tested.
All the experiments are implemented in MATLAB R2018a on a PC with a 2.1 GHz CPU and 16 GB RAM. Both the $F_1$ value of the minority class and the accuracy rate are used to measure the classification performance. The $F_1$ value is defined in Section 2.3, and the accuracy rate is defined as the ratio of the number of correctly classified samples to the total number of samples. The penalty parameter $C$ is chosen from the range $\{2^{-10}, 2^{-5}, 2^{0}, 2^{5}\}$, and the kernel parameter of the kernel-based methods is chosen from the range $\{2^{-5}, 2^{-6}, \ldots, 2^{-8}\}$. All parameters are tuned by 5-fold cross-validation [23]. All the methods are tested ten times by randomly selecting points in the two classes to form training sets: for each dataset, a random selection of 70% of the samples is used for training and the remaining 30% are used for testing.
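For reference, the parameter selection described above could be organized roughly as in the following Python sketch; `train_qsmpmfa` and `evaluate_f1` are hypothetical placeholders for the training and evaluation routines, and the whole snippet illustrates the protocol rather than the authors' code.

```python
import numpy as np

C_GRID = [2.0 ** e for e in (-10, -5, 0, 5)]    # candidate penalty parameters

def select_C(X, y, n_folds=5, seed=0):
    """Choose C by 5-fold cross-validation, scored by the minority-class F1."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best_C, best_score = None, -np.inf
    for C in C_GRID:
        scores = []
        for k in range(n_folds):
            val = folds[k]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            model = train_qsmpmfa(X[trn], y[trn], C)          # hypothetical trainer
            scores.append(evaluate_f1(model, X[val], y[val])) # hypothetical scorer
        if np.mean(scores) > best_score:
            best_C, best_score = C, np.mean(scores)
    return best_C
```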
In order to illustrate the classification performance of the proposed QSMPMFA, comparative experiments on 12 benchmark datasets extracted from the UCI Machine Learning Repository [24] are conducted. The information on all the datasets is shown in Table 1, which lists the numbers of positive samples, negative samples, total samples, and features. The last column of Table 1 records the imbalance ratio, calculated as the ratio of the number of negative samples to the number of positive samples; the larger the imbalance ratio, the higher the degree of imbalance.
Table 2 shows the $F_1$ value of each method on all the datasets. The bold ones are the best results among all the methods, and "-" indicates that the memory limit was exceeded during training. From Table 2, it can be seen that QSMPMFA outperforms the comparison methods on most datasets, especially on datasets with higher imbalance ratios such as Segment, Dry Bean, and Skin. For the datasets on which QSMPMFA does not perform best, the best results are often obtained by methods with Gaussian kernels, probably because the Gaussian kernel is more suitable for these datasets. To show the results more intuitively, the boxplot of the $F_1$ value of the positive class for each method on all the datasets is illustrated in Figure 1.
For comparison, the classification accuracy rate of each method is also listed in Table 3. Except for the WDBC and Segment datasets, the relative performance of QSMPMFA is similar to that in Table 2, and for the WDBC dataset QSMPMFA performs almost the same as the best method. A possible explanation is that the $F_1$ value is adopted in the cross-validation for choosing parameters. It can also be seen that the accuracy rates are often larger than the $F_1$ values on most of the datasets. The boxplots of the accuracy rates of all the methods on each dataset are illustrated in Figure 2.
From Table 2 and Table 3, it can be seen that QSMPMFA performs better than QSMPM for classifying each dataset except for the Balance dataset. QSMPM only aims to maximize the lower bound of the correct classification probability of two classes. Since both QSMPM and QSMPMFA are kernel-free methods, the better performance of the proposed method demonstrates the effectiveness of setting both the F 1 value and accuracy rate as objective functions. MPMF with different kernel functions adopts F β as the optimization goal, which focuses on the classification of imbalanced datasets. When compared with MPMF, the better performance also illustrates the effectiveness of using two measures and the kernel-free setting.
To illustrate the performance of the proposed method on imbalanced datasets, further tests on the Wholesale customers dataset are conducted. This dataset contains information on clients of a wholesale distributor, including the annual spending in monetary units on diverse product categories. Positive samples are randomly subsampled at ratios of 1, 0.9, 0.6, and 0.2, and the chosen samples are combined with all the negative samples to construct four datasets. A training process similar to the one above is used for testing these datasets.
The comparison results of MPM-l, MPM-g, MPM-p, MPMF-l, MPMF-g, MPMF-p, QSMPM, and QSMPMFA on the four datasets are illustrated in Figure 3 and Figure 4. It can be seen that the proposed QSMPMFA performs best on all the datasets. As shown in Figure 3, the $F_1$ values of each method show an overall decreasing trend as the ratio of positive points decreases (i.e., as the degree of imbalance increases). The variation of the $F_1$ values of the proposed QSMPMFA is relatively stable, whereas the other methods, especially MPM-g, MPM-p, and QSMPM, decrease more significantly when the ratio is small. From Figure 4, it can be seen that the accuracy rate of each method increases overall as the ratio of positive points decreases (i.e., as the degree of imbalance increases), and the performance of QSMPMFA is again relatively stable. Although the accuracy rate is not a proper measure for analyzing imbalanced datasets, it still reflects the better performance of the proposed method. Compared with QSMPM, the larger improvement of QSMPMFA further demonstrates its superior performance for classifying imbalanced datasets.

5. Conclusions

In this paper, a new quadratic surface minimax probability machine model that takes both the $F_1$ value and the accuracy rate as the optimization goal is proposed. On the one hand, the proposed QSMPMFA combines the kernel-free technique with the MPM model, which makes it more efficient for dealing with large-scale datasets. On the other hand, by adopting two performance measures in the objective function, QSMPMFA is more flexible in dealing with imbalanced datasets. Several public benchmark datasets are adopted to validate the effectiveness of the proposed method. The numerical tests show that QSMPMFA performs best when compared with other state-of-the-art MPM methods, and that it is superior in dealing with imbalanced datasets.
In the proposed model, both the $F_1$ value of the minority class and the weighted classification accuracy rate of the two classes are considered as the optimization goal. In our setting, the parameter balancing the importance of the two terms in the objective function is tuned by cross-validation, and the parameter in the weighted classification accuracy rate is fixed as the ratio of the minority class. The choice of these parameters affects the classification performance and efficiency of the proposed model, so more research on how to choose proper parameters should be performed in the future. More applications of this method can also be explored in other areas [25]. In addition, methods incorporating prior information will be considered.

Author Contributions

Conceptualization, X.Y.; methodology, Z.X.; validation, X.Y.; writing—original draft preparation, Z.X.; writing—review and editing, X.Y., Z.M.; supervision, X.Y.; funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (No. 71901140) and the Humanities and Social Science Fund of Ministry of Education of China (No. 18YJC630220).

Data Availability Statement

All of the benchmark datasets used in our numerical experiments are from the UCI Machine Learning Repository, which are available at http://archive.ics.uci.edu/ml, accessed on 1 January 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lanckriet, G.R.G.; El Ghaoui, L.; Bhattacharyya, C.; Jordan, M.I. Minimax probability machine. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic; MIT Press: Cambridge, MA, USA, 2001; pp. 801–807.
  2. Huang, K.; Yang, H.; King, I.; Lyu, M.R.; Chan, L. Biased minimax probability machine for medical diagnosis. In Proceedings of the 8th International Symposium on Artificial Intelligence & Mathematics, Fort Lauderdale, FL, USA, 4–6 January 2004.
  3. Jiang, B.; Guo, Z.; Zhu, Q.; Huang, G. Dynamic minimax probability machine-based approach for fault diagnosis using pairwise discriminate analysis. IEEE Trans. Control Syst. Technol. 2017, 27, 806–813.
  4. Maldonado, S.; López, J.; Vairetti, C. Profit-based churn prediction based on minimax probability machines. Eur. J. Oper. Res. 2020, 284, 273–284.
  5. Huang, K.; Yang, H.; King, I.; Lyu, M.R.; Chan, L. The minimum error minimax probability machine. J. Mach. Learn. Res. 2004, 5, 1253–1286.
  6. Huang, K.; Yang, H.; King, I.; Lyu, M.R. Imbalanced learning with a biased minimax probability machine. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2006, 36, 913–923.
  7. Luo, J.; Qiao, H.; Zhang, B. A minimax probability machine for nondecomposable performance measures. IEEE Trans. Neural Netw. Learn. Syst. 2021.
  8. Nath, J.S.; Bhattacharyya, C. Maximum margin classifiers with specified false positive and false negative error rates. In Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, MN, USA, 26–28 April 2007; pp. 35–46.
  9. Maldonado, S.; Carrasco, M.; López, J. Regularized minimax probability machine. Knowl.-Based Syst. 2019, 177, 127–135.
  10. Jayadeva; Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905–910.
  11. Yang, L.; Wen, Y.; Zhang, M.; Wang, X. Twin minimax probability machine for pattern classification. Neural Netw. 2020, 131, 201–214.
  12. Dagher, I. Quadratic kernel-free non-linear support vector machine. J. Glob. Optim. 2008, 4, 15–30.
  13. Luo, J.; Fang, S.C.; Deng, Z.; Guo, X. Soft quadratic surface support vector machine for binary classification. Asia-Pac. J. Oper. Res. 2016, 33, 1650046.
  14. Bai, Y.; Han, X.; Chen, T.; Yu, H. Quadratic kernel-free least squares support vector machine for target diseases classification. J. Comb. Optim. 2015, 30, 850–870.
  15. Yan, X.; Bai, Y.; Fang, S.C.; Luo, J. A kernel-free quadratic surface support vector machine for semi-supervised learning. J. Oper. Res. Soc. 2016, 67, 1001–1011.
  16. Tian, Y.; Sun, M.; Deng, Z.; Luo, J.; Li, Y. A new fuzzy set and nonkernel SVM approach for mislabeled binary classification with applications. IEEE Trans. Fuzzy Syst. 2017, 25, 1536–1545.
  17. Luo, J.; Yan, X.; Tian, Y. Unsupervised quadratic surface support vector machine with application to credit risk assessment. Eur. J. Oper. Res. 2020, 280, 1008–1017.
  18. Gao, Z.; Fang, S.C.; Luo, J.; Medhin, N. A kernel-free double well potential support vector machine with applications. Eur. J. Oper. Res. 2021, 290, 248–262.
  19. Yan, X.; Zhu, H. A novel robust support vector machine classifier with feature mapping. Knowl.-Based Syst. 2022, 257, 109928.
  20. Wang, Y.; Yang, Z.; Yang, X. Kernel-free quadratic surface minimax probability machine for a binary classification problem. Symmetry 2021, 13, 1378.
  21. Lanckriet, G.R.; Ghaoui, L.E.; Bhattacharyya, C.; Jordan, M.I. A robust minimax approach to classification. J. Mach. Learn. Res. 2002, 3, 555–582.
  22. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  23. Zhou, Z. Machine Learning; Tsinghua University Press: Beijing, China, 2016.
  24. Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 1 January 2022).
  25. Chang, J.; He, L.; Chen, L.; Shen, Z.; Chuah, L.F.; Bokhari, A.; Klemeš, J.J.; Han, N. Numerical simulation of liquid jet atomization in subsonic crossflow. Energy 2022, 257, 124676.
Figure 1. Box plot of the F1 value of each method on all the datasets.
Figure 2. Box plot of the accuracy rate of each method on all the datasets.
Figure 3. Imbalanced tests for the Wholesale customers dataset (F1 value).
Figure 4. Imbalanced tests for the Wholesale customers dataset (accuracy rate).
Table 1. Information of the benchmark datasets.

Data Set | Positive # | Negative # | Total # | Feature # | Imbalanced Ratio
Iris | 50 | 50 | 100 | 4 | 1
Seeds | 70 | 70 | 140 | 7 | 1
Glass | 51 | 163 | 214 | 9 | 3.20
Wholesale customers | 142 | 298 | 440 | 7 | 2.10
WDBC | 212 | 357 | 569 | 30 | 1.68
Balance | 288 | 288 | 576 | 4 | 1
Breast cancer | 241 | 458 | 699 | 9 | 1.90
Banknote | 610 | 762 | 1372 | 4 | 1.25
Segment | 330 | 1980 | 2310 | 19 | 6
Rice | 1630 | 2180 | 3810 | 7 | 1.34
Dry Bean | 1322 | 12,289 | 13,611 | 16 | 9.30
Skin | 50,859 | 194,198 | 245,057 | 3 | 3.82
Table 2. F1 value for each method on each data set.

Data Set | MPM-l | MPM-g | MPM-p | MPMF-l | MPMF-g | MPMF-p | QSMPM | QSMPMFA
Iris | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 0.9968 ± 0.0102 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000
Seeds | 0.9530 ± 0.0309 | 0.9907 ± 0.0120 | 0.9887 ± 0.0130 | 0.9839 ± 0.0188 | 0.9643 ± 0.0454 | 0.9497 ± 0.0291 | 0.9699 ± 0.295 | 0.9862 ± 0.0160
Glass | 0.7728 ± 0.0702 | 0.8456 ± 0.0472 | 0.8722 ± 0.0558 | 0.8158 ± 0.0452 | 0.8795 ± 0.0478 | 0.8604 ± 0.0504 | 0.8259 ± 0.0463 | 0.8534 ± 0.0564
Wholesale customers | 0.8158 ± 0.0413 | 0.8452 ± 0.0421 | 0.8425 ± 0.0164 | 0.8017 ± 0.1277 | 0.8451 ± 0.0430 | 0.7256 ± 0.1361 | 0.8608 ± 0.0327 | 0.8660 ± 0.0379
WDBC | 0.9357 ± 0.0172 | 0.9495 ± 0.0192 | 0.9524 ± 0.0171 | 0.9495 ± 0.0195 | 0.9548 ± 0.0194 | 0.9519 ± 0.0108 | 0.9598 ± 0.0162 | 0.9603 ± 0.0116
Balance | 0.9448 ± 0.0147 | 0.9901 ± 0.0087 | 0.9849 ± 0.0126 | 0.9605 ± 0.0105 | 0.9885 ± 0.0066 | 0.9596 ± 0.0099 | 0.9878 ± 0.0080 | 0.9818 ± 0.0105
Breast cancer | 0.9427 ± 0.0158 | 0.9558 ± 0.0156 | 0.9503 ± 0.0210 | 0.9500 ± 0.0200 | 0.9556 ± 0.0142 | 0.9471 ± 0.0294 | 0.9527 ± 0.0158 | 0.9567 ± 0.0117
Banknote | 0.8619 ± 0.0253 | 0.9895 ± 0.0042 | 0.9820 ± 0.0050 | 0.9850 ± 0.0060 | 0.9908 ± 0.0071 | 0.9823 ± 0.0069 | 0.9773 ± 0.0070 | 0.9957 ± 0.0057
Segment | 0.5791 ± 0.0164 | 0.6265 ± 0.0714 | 0.7106 ± 0.0521 | 0.9660 ± 0.0120 | 0.9771 ± 0.0312 | 0.9110 ± 0.0359 | 0.9302 ± 0.0209 | 0.9814 ± 0.0153
Rice | 0.9047 ± 0.0094 | 0.9142 ± 0.0237 | 0.9073 ± 0.0313 | 0.9098 ± 0.0074 | 0.9063 ± 0.0117 | 0.9131 ± 0.0078 | 0.9135 ± 0.0074 | 0.9142 ± 0.0080
Dry Bean | 0.5263 ± 0.0121 | 0.8361 ± 0.0436 | 0.8117 ± 0.0393 | 0.8790 ± 0.0109 | 0.8639 ± 0.0078 | 0.8617 ± 0.0082 | 0.8353 ± 0.0135 | 0.8867 ± 0.0097
Skin | 0.8098 ± 0.0017 | - | - | 0.8793 ± 0.0013 | - | - | 0.8562 ± 0.0012 | 0.8824 ± 0.0178
Average rank | 7.08 | 3.83 | 4.79 | 4.79 | 3.96 | 5.46 | 4.08 | 2.00
Table 3. Accuracy rate for each method on each dataset.

Data Set | MPM-l | MPM-g | MPM-p | MPMF-l | MPMF-g | MPMF-p | QSMPM | QSMPMFA
Iris | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 0.9967 ± 0.0105 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000
Seeds | 0.9524 ± 0.0317 | 0.9905 ± 0.0123 | 0.9893 ± 0.0144 | 0.9833 ± 0.0196 | 0.9619 ± 0.0504 | 0.9476 ± 0.0310 | 0.9690 ± 0.0298 | 0.9857 ± 0.0166
Glass | 0.8938 ± 0.0328 | 0.9281 ± 0.0223 | 0.9406 ± 0.0280 | 0.9125 ± 0.0235 | 0.9406 ± 0.0242 | 0.9375 ± 0.0156 | 0.9141 ± 0.0247 | 0.9328 ± 0.0256
Wholesale customers | 0.8864 ± 0.0250 | 0.8992 ± 0.0305 | 0.8909 ± 0.0148 | 0.8341 ± 0.1572 | 0.8962 ± 0.0297 | 0.7561 ± 0.1686 | 0.9076 ± 0.0220 | 0.9099 ± 0.0221
WDBC | 0.9520 ± 0.0132 | 0.9626 ± 0.0144 | 0.9709 ± 0.0137 | 0.9632 ± 0.0138 | 0.9661 ± 0.0148 | 0.9637 ± 0.0087 | 0.9696 ± 0.0126 | 0.9708 ± 0.0087
Balance | 0.9453 ± 0.0143 | 0.9901 ± 0.0087 | 0.9849 ± 0.0127 | 0.9593 ± 0.0110 | 0.9884 ± 0.0067 | 0.9581 ± 0.0104 | 0.9878 ± 0.0080 | 0.9814 ± 0.0109
Breast cancer | 0.9603 ± 0.0108 | 0.9689 ± 0.0111 | 0.9656 ± 0.0141 | 0.9646 ± 0.0138 | 0.9689 ± 0.0099 | 0.9636 ± 0.0199 | 0.9670 ± 0.0109 | 0.9694 ± 0.0082
Banknote | 0.8789 ± 0.0214 | 0.9914 ± 0.0021 | 0.9872 ± 0.0042 | 0.9864 ± 0.0055 | 0.9917 ± 0.0064 | 0.9840 ± 0.0063 | 0.9794 ± 0.0065 | 0.9961 ± 0.0051
Segment | 0.7924 ± 0.0144 | 0.8658 ± 0.0347 | 0.8831 ± 0.0305 | 0.9902 ± 0.0036 | 0.9953 ± 0.0095 | 0.9729 ± 0.0116 | 0.9788 ± 0.0066 | 0.9947 ± 0.0044
Rice | 0.9182 ± 0.0082 | 0.9248 ± 0.0127 | 0.9094 ± 0.0216 | 0.9190 ± 0.0068 | 0.9158 ± 0.0113 | 0.9227 ± 0.0068 | 0.9241 ± 0.0067 | 0.9253 ± 0.0071
Dry Bean | 0.8412 ± 0.0065 | 0.9490 ± 0.0208 | 0.9571 ± 0.0202 | 0.9761 ± 0.0022 | 0.9771 ± 0.0032 | 0.9705 ± 0.0047 | 0.9634 ± 0.0034 | 0.9775 ± 0.0022
Skin | 0.9096 ± 0.0008 | - | - | 0.9430 ± 0.0007 | - | - | 0.9342 ± 0.0006 | 0.9448 ± 0.0089
Average rank | 6.92 | 4.08 | 4.46 | 4.83 | 3.83 | 5.63 | 4.08 | 2.17
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
