Article

Group Logistic Regression Models with lp,q Regularization

College of Science, Minzu University of China, 27 Zhongguancun South Street, Haidian District, Beijing 100081, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(13), 2227; https://doi.org/10.3390/math10132227
Submission received: 24 April 2022 / Revised: 23 June 2022 / Accepted: 23 June 2022 / Published: 25 June 2022
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)

Abstract
In this paper, we proposed a logistic regression model with $l_{p,q}$ regularization that can give a group sparse solution. The model can be applied to variable-selection problems with a sparse group structure. In the context of big data, the solutions to practical problems are often group sparse, so it is necessary to study this kind of model. We studied the model from three perspectives: theoretical, algorithmic and numerical. From the theoretical perspective, by introducing the notion of the group restricted eigenvalue condition, we gave the oracle inequality, which is an important property for variable-selection problems. The global recovery bound was also established for the logistic regression model with $l_{p,q}$ regularization. From the algorithmic perspective, we applied the well-known alternating direction method of multipliers (ADMM) algorithm to solve the model, and the subproblems of the ADMM algorithm were solved effectively. From the numerical perspective, we performed experiments on simulated data and on real data from multi-factor stock selection, using the ADMM algorithm presented in the paper to solve the model. The numerical results showed that the model is effective in terms of variable selection and prediction.

1. Introduction

For regression models, categorical response variables are important in applications, and the explanatory variables can often be thought of as grouped. Considering the interpretability and accuracy of the models, the group information should be taken into account in the modeling, especially in high-dimensional settings, where sparsity and variable selection play a very important role in estimation accuracy. Generally speaking, regression models with penalized regularization give good results for variable-selection problems, and there is a large body of literature on this kind of problem [1,2,3,4,5,6,7,8,9,10,11,12]. When the explanatory variables have a group structure, penalized regularization also plays an important role, for example in the group least absolute shrinkage and selection operator (LASSO) [13], the group smoothly clipped absolute deviation (SCAD) penalty [14] and the group minimax concave penalty (MCP) [15] models.
As is well known, the $l_p$ ($0 < p < 1$) norm is a good approximation of the $l_0$ norm, and it can recover a sparser solution than the $l_1$ norm [16]. For variables with a group structure, the $l_{p,q}$ norm plays an important role in inducing group sparsity. The $l_{p,q}$ norm for a group structure is defined as follows:
$$ \|\beta\|_{p,q} := \left( \sum_{i=1}^{r} \|\beta_{G_i}\|_p^q \right)^{\frac{1}{q}}, \tag{1} $$
where $\beta := (\beta_{G_1}^T, \ldots, \beta_{G_r}^T)^T$ and $\{\beta_{G_i} \in \mathbb{R}^{n_i} : i = 1, \ldots, r\}$ is the grouping of the variable $\beta$. Here $\sum_{i=1}^{r} n_i = d+1$, and $G_i$ denotes the index set corresponding to the $i$-th group. For index sets $S$ and $N$, $G_S$ denotes the index set $\{G_i : i \in S\}$.
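To make the grouped norm concrete, the following short Python sketch computes $\|\beta\|_{p,q}^q$ as in formulation (1); the toy coefficient vector and the group index sets are hypothetical and serve only as an illustration.

```python
import numpy as np

def lpq_norm_q(beta, groups, p=2.0, q=0.5):
    """Return ||beta||_{p,q}^q = sum_i ||beta_{G_i}||_p^q for a given grouping."""
    return sum(np.linalg.norm(beta[g], ord=p) ** q for g in groups)

# toy example: d + 1 = 6 coefficients split into r = 3 groups of size 2
beta = np.array([0.0, 0.0, 1.5, -0.5, 0.0, 2.0])
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
print(lpq_norm_q(beta, groups, p=2, q=0.5))  # only the nonzero groups contribute
```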
For the linear regression problem with $l_{p,q}$ regularization, the oracle inequality and the global recovery bound were established in [17], which clearly demonstrated the advantages of $l_{p,q}$ regularization. For the logistic regression model, we employ this penalized regularization for variable-selection problems.
We assume that $\beta = (\beta_0, \beta_1, \ldots, \beta_d)^T \in \mathbb{R}^{d+1}$ are the coefficients of the explanatory variables. The matrix $X$ collects the explanatory variables and is given as follows:
$$ X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1d} \\ 1 & X_{21} & \cdots & X_{2d} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{nd} \end{pmatrix} \in \mathbb{R}^{n \times (d+1)}. $$
$X_i$ denotes the $i$-th row of the matrix $X$, $y = (y_1, \ldots, y_n)^T$ are the categorical response variables and $y_i \in \{0,1\}$.
In this paper, we consider the logistic regression model with the $l_{p,q}$ norm, described as follows:
$$ \min_{\beta}\ f(\beta) + \lambda \|\beta\|_{p,q}^q, \tag{2} $$
where $f(\beta) := -\frac{1}{n} y^T X \beta + \frac{1}{n} \sum_{i=1}^{n} \ln(1 + \exp(X_i \beta))$ is the loss function and $\|\cdot\|_{p,q}$ is defined by formulation (1). Moreover, we know $f(\beta) > 0$ from the properties of the logistic regression model, $\|\beta\|_{p,q}^q = \sum_{i=1}^{r} \|\beta_{G_i}\|_p^q$ with $p \ge 1$ and $0 < q < 1$, and $\lambda$ is the penalized parameter.
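For reference, a minimal Python sketch of the loss $f(\beta)$ and of the penalized objective in (2) is given below; it assumes the helper lpq_norm_q from the previous sketch and a design matrix X whose first column is the intercept column of ones.

```python
import numpy as np

def logistic_loss(beta, X, y):
    """f(beta) = -(1/n) y^T X beta + (1/n) sum_i ln(1 + exp(X_i beta))."""
    n = X.shape[0]
    z = X @ beta
    # np.logaddexp(0, z) = ln(1 + exp(z)), computed in a numerically stable way
    return (-y @ z + np.logaddexp(0.0, z).sum()) / n

def penalized_objective(beta, X, y, groups, lam, p=2.0, q=0.5):
    """Objective of model (2): f(beta) + lambda * ||beta||_{p,q}^q."""
    return logistic_loss(beta, X, y) + lam * lpq_norm_q(beta, groups, p, q)
```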
The group LASSO for the logistic regression model [18] is able to perform variable selection on groups of variables, and the model has the following form:
$$ \min_{\beta}\ f(\beta) + \lambda \sum_{i=1}^{r} s(df_{G_i}) \|\beta_{G_i}\|_2, \tag{3} $$
where $\lambda \ge 0$ controls the amount of penalization and $s(df_{G_i})$ is used to rescale the penalty with respect to the dimensionality of the parameter vector $\beta_{G_i}$.
Moreover, a quite general composite absolute penalty for the group sparsity problem was considered in [19], and this model includes the group LASSO as a special case. The group LASSO is an important extension of the LASSO: it applies an $l_2$ penalty to each group and, as a result, induces sparsity in a group manner. This property can be seen in the numerical experiments.
Models (2) and (3) are logistic regression models with different penalized regularizations, and both aim to give a solution with group sparsity. The logistic regression model with $l_{p,q}$ regularization differs from the LASSO logistic regression in that it can give a sparser solution within a group or between groups by adjusting the values of p and q. This can also be seen in the numerical experiments.
To illustrate the advantages of model (2), we also introduce the logistic regression problem with the elastic net penalty, which can give a sparse solution and is described as follows:
$$ \min_{\beta}\ f(\beta) + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2, \tag{4} $$
where $\lambda_1 > 0$ and $\lambda_2 > 0$ are the penalized parameters. Model (4) is not good at recovering group sparsity, as we show in the numerical experiments.
This paper is organized as follows. In Section 2, we introduce the inequalities of $l_{p,q}$ regularization, the properties of the loss function of the logistic regression model and the $(p,q)$-group restricted eigenvalue condition relative to $(S,N)$ ($(p,q)$-GREC$(S,N)$), and we establish the oracle inequality and the global recovery bound for model (2). In Section 3, we apply the ADMM to solve model (2), and we show that the subproblems of the algorithm can be solved efficiently. In Section 4, we use two simulation experiments that are commonly used for variable-selection problems to show the advantages of model (2) and the ADMM algorithm. The results of the LASSO logistic regression model (3) and the logistic regression model with the elastic net penalty (4) are compared with those of model (2), and we determine the advantages of model (2). We also give the results of model (2) on real data in Section 5, which show the effectiveness of the model and the algorithm given in the paper. The last section draws conclusions and presents future work.
We introduce some notation that will be used in the following analysis. Let $S := \{ i \in \{1, \ldots, r\} : \beta_{G_i} \neq 0 \}$ be the index set of nonzero groups of $\beta$, $S^c := \{1, \ldots, r\} \setminus S$ be the complement of $S$, and $S := |S|$ be the group sparsity of $\beta$ (the same symbol is used for the index set and its cardinality). For a variable $\beta \in \mathbb{R}^{d+1}$ and $S \subseteq \{1, \ldots, d+1\}$, we use $\beta_S$ to denote the subvector of $\beta$ corresponding to $S$. For a group $\beta_{G_i}$, we write $\beta_{G_i} = 0$ to describe a zero group, meaning that $\beta_j = 0$ for all $j \in G_i$. Given $S \le N \le r$, $\beta \in \mathbb{R}^{d+1}$ and $J \subseteq \{1, \ldots, r\}$, we use $\mathrm{rank}_i(\beta)$ to denote the rank of $\|\beta_{G_i}\|_p$ among $\{ \|\beta_{G_j}\|_p : j \in J^c \}$ (in decreasing order). We use $J(\beta; N)$ to denote the index set of the first $N$ largest groups in the value of $\|\beta_{G_i}\|_p$ among $\{ \|\beta_{G_j}\|_p : j \in J^c \}$, that is,
$$ J(\beta; N) := \{ i \in J^c : \mathrm{rank}_i(\beta) \in \{1, \ldots, N\} \}. $$
Moreover, we let $R := \lceil (r - |J|)/N \rceil - 1$, and we denote
$$ J_k(\beta; N) := \begin{cases} \{ i \in J^c : \mathrm{rank}_i(\beta) \in \{kN+1, \ldots, (k+1)N\} \}, & k = 1, \ldots, R-1, \\ \{ i \in J^c : \mathrm{rank}_i(\beta) \in \{RN+1, \ldots, r - |J|\} \}, & k = R. \end{cases} $$

2. Theoretical Analysis

In this section, we analyze the oracle property and the global recovery bound of the penalized regression model (2). Firstly, we introduce the following inequalities for the $l_{p,q}$ norm and the properties of the loss function $f(\beta)$.
Lemma 1
([17], p. 8). Let $0 < q \le p \le 2$, $\beta \in \mathbb{R}^{d+1}$ and let $K$ be the smallest integer such that $2^{K-1} q \ge 1$. Then the following relation holds:
$$ \|\beta\|_{p,q}^q \le r^{1 - 2^{-K}} \|\beta\|_{p,2}^q. $$
Lemma 2
([17], p. 9). Let $0 < q \le 1 \le p$ and $\beta_1, \beta_2 \in \mathbb{R}^{d+1}$. Then we have
$$ \|\beta_1\|_{p,q}^q - \|\beta_2\|_{p,q}^q \le \|\beta_1 - \beta_2\|_{p,q}^q. $$
Lemma 3
([17], p. 13). Let $0 < q \le 1 \le p$, $\tau \ge 1$ and $\beta \in \mathbb{R}^{d+1}$, and let $N := J(\beta; N) \cup J$ and $J_k := J_k(\beta; N)$ for $k = 1, \ldots, R$. Then the following inequalities hold:
$$ \|\beta_{G_{N^c}}\|_{p,\tau} \le \sum_{k=1}^{R} \|\beta_{G_{J_k}}\|_{p,\tau} \le N^{\frac{1}{\tau} - \frac{1}{q}} \|\beta_{G_{J^c}}\|_{p,q}. $$
Moreover, the following propositions concern the Lipschitz continuity and the convexity of the loss function $f(\beta)$.
Proposition 1.
For $\beta_1, \beta_2 \in \mathbb{R}^{d+1}$, we have
$$ |f(\beta_1) - f(\beta_2)| \le \frac{2}{\sqrt{n}} \|X\|_2 \|\beta_1 - \beta_2\|_2. $$
Proof. 
For $\beta_1, \beta_2, \tilde{\beta} \in \mathbb{R}^{d+1}$, based on the mean value theorem and the properties of the norms, we can get
$$ \begin{aligned} |f(\beta_1) - f(\beta_2)| &= \left| -\frac{1}{n} y^T X (\beta_1 - \beta_2) + \frac{1}{n} \sum_{i=1}^{n} \left( \ln(1 + \exp(X_i \beta_1)) - \ln(1 + \exp(X_i \beta_2)) \right) \right| \\ &= \left| -\frac{1}{n} y^T X (\beta_1 - \beta_2) + \frac{1}{n} \sum_{i=1}^{n} \frac{\exp(X_i \tilde{\beta})}{1 + \exp(X_i \tilde{\beta})} X_i (\beta_1 - \beta_2) \right| \\ &\le \left| \frac{1}{n} y^T X (\beta_1 - \beta_2) \right| + \left| \frac{1}{n} \sum_{i=1}^{n} X_i (\beta_1 - \beta_2) \right| \\ &\le \frac{\sqrt{n} + \|y\|_2}{n} \|X\|_2 \|\beta_1 - \beta_2\|_2 \le \frac{2}{\sqrt{n}} \|X\|_2 \|\beta_1 - \beta_2\|_2. \end{aligned} $$
Hence, we obtain our desirable result.    □
Proposition 2.
For $\beta \in \mathbb{R}^{d+1}$, the function $f(\beta)$ is convex.
Proof. 
From the definition of the function $f(\beta)$, the Hessian matrix of $f(\beta)$ is
$$ H_f(\beta) = \frac{1}{n} \sum_{i=1}^{n} X_i^T X_i \frac{\exp(X_i \beta)}{(1 + \exp(X_i \beta))^2}. $$
Here, for each $i = 1, \ldots, n$, the matrix $X_i^T X_i \succeq 0$ and $\frac{\exp(X_i \beta)}{(1 + \exp(X_i \beta))^2} > 0$. Thus, the Hessian matrix $H_f(\beta)$ is positive semi-definite. Hence, the function $f(\beta)$ is convex.    □
The above lemmas and propositions state the inequalities for $l_{p,q}$ regularization, and they will help in the proofs of the oracle inequality and the global recovery bound. Oracle inequalities for the prediction error were discussed in [20,21], where they were derived for LASSO-type estimators without restricted eigenvalue conditions or sparsity. Moreover, for group LASSO problems, oracle inequalities were discussed in [22,23,24] under a restricted eigenvalue assumption. For linear regression with $l_{p,q}$ regularization, the oracle inequality was established in [17] with the help of the $(p,q)$-GREC$(S,N)$.
Moreover, the $(p,q)$-GREC$(S,N)$ is very important for the analysis of the oracle property and the global recovery bound for the $l_{p,q}$ norm. We define $\gamma$ to be the smallest non-zero eigenvalue of the Hessian matrix of the function $f(\beta)$. We introduce the condition in the following definition.
Definition 1.
Let $0 < q \le p \le 2$. The $(p,q)$-GREC$(S,N)$ is said to be satisfied if
$$ \phi_{p,q}(S,N) := \min\left\{ \frac{\gamma \|\beta\|_2}{\|\beta_{G_N}\|_{p,2}} : |J| \le S,\ \|\beta_{G_{J^c}}\|_{p,q} \le \|\beta_{G_J}\|_{p,q},\ N = J(\beta; N) \cup J \right\} > 0. $$
The oracle property is important for variable selection: it gives an upper bound on the squared error of the logistic regression problem and on the violation of the true nonzero groups for each point in the level set of the objective function of problem (2).
For $\bar{\beta} \in \mathbb{R}^{d+1}$, the level set is given as follows:
$$ \mathrm{Lev}_F(\bar{\beta}) := \{ \beta \in \mathbb{R}^{d+1} : f(\beta) + \lambda \|\beta\|_{p,q}^q \le \lambda \|\bar{\beta}\|_{p,q}^q \}. $$
From the definition of the level set in ([25], p. 8), many properties of the optimization problem (2) are related to the level set $\mathrm{Lev}_F(\bar{\beta})$.
Theorem 1.
Let $0 < q \le 1 \le p$, $S > 0$ and let the $(p,q)$-GREC$(S,S)$ hold. Let $\bar{\beta}$ be the unique solution of $\min_\beta f(\beta)$ at a group sparsity level $S$, and let $S$ be the index set of nonzero groups of $\bar{\beta}$. Let $K$ be the smallest integer such that $2^{K-1} q \ge 1$. Then, for any $\beta^* \in \mathrm{Lev}_F(\bar{\beta})$, which means that $f(\beta^*) + \lambda \|\beta^*\|_{p,q}^q \le \lambda \|\bar{\beta}\|_{p,q}^q$, the following oracle inequality holds:
$$ f(\beta^*) - f(\bar{\beta}) + \lambda \|\beta^*_{G_{S^c}}\|_{p,q}^q \le 2^{\frac{q}{2-q}} \lambda^{\frac{2}{2-q}} \gamma^{\frac{q}{2-q}} S^{\frac{2(1-2^{-K})}{2-q}} \big/ \phi_{p,q}^{\frac{2q}{2-q}}(S,S). \tag{8} $$
Moreover, letting $N^* := S \cup S(\beta^*; S)$, we have
$$ \|\beta^*_{G_{N^*}} - \bar{\beta}_{G_{N^*}}\|_{p,2}^2 \le 2^{\frac{2}{2-q}} \gamma^{\frac{2}{2-q}} \lambda^{\frac{2}{2-q}} S^{\frac{2(1-2^{-K})}{2-q}} \big/ \phi_{p,q}^{\frac{4}{2-q}}(S,S). $$
Proof. 
Let $\beta^* \in \mathrm{Lev}_F(\bar{\beta})$. By the definition of the level set $\mathrm{Lev}_F(\bar{\beta})$ we have
$$ f(\beta^*) + \lambda \|\beta^*\|_{p,q}^q \le \lambda \|\bar{\beta}\|_{p,q}^q. $$
Then, by Lemmas 1 and 2 and the fact $f(\bar{\beta}) > 0$, one has the following formulation:
$$ f(\beta^*) - f(\bar{\beta}) + \lambda \|\beta^*_{G_{S^c}}\|_{p,q}^q \le f(\beta^*) + \lambda \|\beta^*_{G_{S^c}}\|_{p,q}^q \le \lambda \left( \|\bar{\beta}_{G_S}\|_{p,q}^q - \|\beta^*_{G_S}\|_{p,q}^q \right) \le \lambda \|\bar{\beta}_{G_S} - \beta^*_{G_S}\|_{p,q}^q \le \lambda S^{1-2^{-K}} \|\bar{\beta}_{G_S} - \beta^*_{G_S}\|_{p,2}^q. \tag{9} $$
Moreover, we find
$$ \|\beta^*_{G_{S^c}} - \bar{\beta}_{G_{S^c}}\|_{p,q}^q - \|\beta^*_{G_S} - \bar{\beta}_{G_S}\|_{p,q}^q \le \|\beta^*_{G_{S^c}}\|_{p,q}^q - \left( \|\bar{\beta}_{G_S}\|_{p,q}^q - \|\beta^*_{G_S}\|_{p,q}^q \right) = \|\beta^*\|_{p,q}^q - \|\bar{\beta}\|_{p,q}^q \le 0. $$
Then, the $(p,q)$-GREC$(S,S)$ implies the following:
$$ \|\bar{\beta}_{G_S} - \beta^*_{G_S}\|_{p,2} \le \gamma \|\beta^* - \bar{\beta}\|_2 \big/ \phi_{p,q}(S,S). $$
From the Taylor expansion, we obtain the following relationship:
$$ f(\beta^*) - f(\bar{\beta}) = \nabla f(\bar{\beta})^T (\beta^* - \bar{\beta}) + \frac{1}{2} (\beta^* - \bar{\beta})^T \nabla^2 f(\tilde{\beta}) (\beta^* - \bar{\beta}) \ge \frac{1}{2} (\beta^* - \bar{\beta})^T \nabla^2 f(\tilde{\beta}) (\beta^* - \bar{\beta}) \ge \frac{1}{2} \gamma \|\beta^* - \bar{\beta}\|_2^2, \tag{10} $$
where $\tilde{\beta} \in \{ \tilde{\beta} : \|\tilde{\beta} - \bar{\beta}\|_2 \le \|\beta^* - \bar{\beta}\|_2 \}$. The first inequality of formulation (10) is based on the fact that $\bar{\beta}$ is the unique optimal solution of $\min_\beta f(\beta)$ at group sparsity level $S$. Moreover, because of the uniqueness of $\bar{\beta}$, the smallest eigenvalue of the Hessian matrix of the function $f(\beta)$ is positive, which gives the second inequality.
Then, we get
$$ \gamma \|\beta^* - \bar{\beta}\|_2^2 \le 2 \left( f(\beta^*) - f(\bar{\beta}) \right). \tag{11} $$
Combining this with formulation (9), we get
$$ f(\beta^*) - f(\bar{\beta}) + \lambda \|\beta^*_{G_{S^c}}\|_{p,q}^q \le 2^{\frac{q}{2}} \lambda \gamma^{\frac{q}{2}} S^{1-2^{-K}} \left( f(\beta^*) - f(\bar{\beta}) \right)^{\frac{q}{2}} \big/ \phi_{p,q}^{q}(S,S). \tag{12} $$
Thus, we have
$$ f(\beta^*) - f(\bar{\beta}) \le 2^{\frac{q}{2-q}} \gamma^{\frac{q}{2-q}} \lambda^{\frac{2}{2-q}} S^{\frac{2(1-2^{-K})}{2-q}} \big/ \phi_{p,q}^{\frac{2q}{2-q}}(S,S). \tag{13} $$
Hence, by formulations (12) and (13), we obtain the oracle inequality (8). Moreover, from the definition of $N^*$, the $(p,q)$-GREC$(S,S)$ implies that
$$ \|\beta^*_{G_{N^*}} - \bar{\beta}_{G_{N^*}}\|_{p,2}^2 \le 2 \gamma \left( f(\beta^*) - f(\bar{\beta}) \right) \big/ \phi_{p,q}^2(S,S) \le 2^{\frac{2}{2-q}} \gamma^{\frac{2}{2-q}} \lambda^{\frac{2}{2-q}} S^{\frac{2(1-2^{-K})}{2-q}} \big/ \phi_{p,q}^{\frac{4}{2-q}}(S,S). $$
Thus, the proof is complete.    □
In the following, we establish the global recovery bound for the $l_{p,q}$ regularization problem (2). The global recovery bound shows that the sparse solution $\bar{\beta}$ can be recovered by any point $\beta^*$ in the level set $\mathrm{Lev}_F(\bar{\beta})$; in particular, $\beta^*$ is a global optimal solution of problem (2) when the penalized parameter $\lambda$ is small enough. We show the global recovery bound for the $l_{p,q}$ regularization problem (2) in the next theorem.
Theorem 2.
Let $0 < q \le 1 \le p \le 2$, $S > 0$ and suppose the $(p,q)$-GREC$(S,S)$ holds. Let $\bar{\beta}$ be the unique solution of $\min_\beta f(\beta)$ at group sparsity level $S$, and let $S$ be the index set of nonzero groups of $\bar{\beta}$. Let $K$ be the smallest integer such that $2^{K-1} q \ge 1$. Then, for any $\beta^* \in \mathrm{Lev}_F(\bar{\beta})$, the following global recovery bound for problem (2) holds:
$$ \|\beta^* - \bar{\beta}\|_2^2 \le 2 \cdot 2^{\frac{2}{2-q}} \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S^{\frac{q-2}{q} + \frac{4(1-2^{-K})}{q(2-q)}} \big/ \phi_{p,q}^{\frac{4}{2-q}}(S,S). \tag{14} $$
More precisely,
$$ \|\beta^* - \bar{\beta}\|_2^2 \le \begin{cases} O\!\left( \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S \right), & 2^{K-1} q = 1, \\ O\!\left( \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S^{\frac{4-q}{2-q}} \right), & 2^{K-1} q > 1. \end{cases} \tag{15} $$
Proof. 
Let $N^* := S \cup S(\beta^*; S)$ be as defined in Theorem 1. Since $p \le 2$, from Lemma 3 and Theorem 1 we get
$$ \|\beta^*_{G_{N^{*c}}}\|_2^2 \le \|\beta^*_{G_{N^{*c}}}\|_{p,2}^2 \le S^{1-\frac{2}{q}} \|\beta^*_{G_{S^c}}\|_{p,q}^2 \le 2^{\frac{2}{2-q}} \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S^{\frac{q-2}{q} + \frac{4(1-2^{-K})}{q(2-q)}} \big/ \phi_{p,q}^{\frac{4}{2-q}}(S,S). $$
Furthermore, from Theorem 1 and the fact $2^{K-1} q \ge 1$, we get
$$ \begin{aligned} \|\beta^* - \bar{\beta}\|_2^2 &= \|\beta^*_{G_{N^*}} - \bar{\beta}_{G_{N^*}}\|_2^2 + \|\beta^*_{G_{N^{*c}}}\|_2^2 \\ &\le 2^{\frac{2}{2-q}} \gamma^{\frac{2}{2-q}} \lambda^{\frac{2}{2-q}} S^{\frac{2(1-2^{-K})}{2-q}} \big/ \phi_{p,q}^{\frac{4}{2-q}}(S,S) + 2^{\frac{2}{2-q}} \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S^{\frac{q-2}{q} + \frac{4(1-2^{-K})}{q(2-q)}} \big/ \phi_{p,q}^{\frac{4}{2-q}}(S,S) \\ &\le 2 \cdot 2^{\frac{2}{2-q}} \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S^{\frac{q-2}{q} + \frac{4(1-2^{-K})}{q(2-q)}} \big/ \phi_{p,q}^{\frac{4}{2-q}}(S,S). \end{aligned} $$
Hence, formulation (14) holds.
Moreover, if $2^{K-1} q = 1$, we have $\frac{q-2}{q} + \frac{4(1-2^{-K})}{q(2-q)} = 1$ and we get
$$ \|\beta^* - \bar{\beta}\|_2^2 \le O\!\left( \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S \right). $$
If $2^{K-1} q > 1$, we know $q > 2^{1-K}$. Hence $\frac{q-2}{q} + \frac{4(1-2^{-K})}{q(2-q)} \le \frac{4-q}{2-q}$, and we get
$$ \|\beta^* - \bar{\beta}\|_2^2 \le O\!\left( \lambda^{\frac{4}{q(2-q)}} \gamma^{\frac{2}{2-q}} S^{\frac{4-q}{2-q}} \right). $$
Thus, formulation (15) holds.    □
Remark 1.
From Proposition 2, we know that $f(\beta)$ is convex. The convexity of $f(\beta)$ makes the assumptions on $\bar{\beta}$ reasonable, and these conditions on $\bar{\beta}$ allow us to obtain the desired results in Theorems 1 and 2.

3. ADMM Algorithm

In this section, we give an algorithm based on the ADMM algorithm [26,27] for solving the logistic regression model with $l_{p,q}$ regularization (2). The ADMM algorithm performs very well for problems whose variables can be separated. Model (2) can be equivalently described as follows:
$$ \min_{\beta}\ f(\beta) + \lambda \|r\|_{p,q}^q, \quad \text{s.t.}\ \beta = r. $$
The augmented Lagrangian of the above model is as follows:
$$ L_\rho(\beta, r, u) = f(\beta) + \lambda \|r\|_{p,q}^q + u^T (\beta - r) + \frac{\rho}{2} \|\beta - r\|_2^2, $$
where $u \in \mathbb{R}^{d+1}$ is the dual variable and $\rho > 0$ is the augmented Lagrangian parameter.
Generally speaking, the structure of the ADMM algorithm is given as follows:
$$ \beta^{k+1} = \operatorname*{argmin}_{\beta}\ f(\beta) + (\beta - r^k)^T u^k + \frac{\rho}{2} \|\beta - r^k\|_2^2, \tag{19} $$
$$ r^{k+1} = \operatorname*{argmin}_{r}\ \lambda \|r\|_{p,q}^q + (\beta^{k+1} - r)^T u^k + \frac{\rho}{2} \|\beta^{k+1} - r\|_2^2, \tag{20} $$
$$ u^{k+1} = u^k + \rho (\beta^{k+1} - r^{k+1}). \tag{21} $$
Based on the above structure and Propositions 1 and 2, subproblem (19) is an unconstrained convex optimization problem whose loss term is Lipschitz continuous. Thanks to these good properties, subproblem (19) can be solved effectively by many optimization algorithms, such as trust region algorithms, sequential quadratic programming algorithms, gradient-based algorithms and so on. Moreover, the first-order optimality condition of problem (19) is given by the following formulation:
$$ \nabla_\beta \left( f(\beta) + (u^k)^T (\beta - r^k) + \frac{\rho}{2} \|\beta - r^k\|_2^2 \right) = 0_{d+1}. $$
Thus, we obtain the optimal solution of subproblem (19) by solving the following nonlinear equation:
$$ n\rho\beta - \sum_{i=1}^{n} \frac{X_i^T}{1 + \exp(X_i \beta)} = n\rho r^k - n u^k + X^T (y - e_n), \tag{22} $$
where $e_n = (1, 1, \ldots, 1)^T \in \mathbb{R}^n$.
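As an illustration, the β-subproblem (19) can be handled by passing the nonlinear Equation (22) to a standard root finder; the sketch below uses scipy.optimize.root and is only one of the possible choices mentioned above (gradient-based solvers would work equally well).

```python
import numpy as np
from scipy.optimize import root

def update_beta(X, y, r_k, u_k, rho, beta_init):
    """Solve n*rho*beta - sum_i X_i^T / (1 + exp(X_i beta)) = n*rho*r^k - n*u^k + X^T (y - e_n)."""
    n = X.shape[0]
    rhs = n * rho * r_k - n * u_k + X.T @ (y - np.ones(n))

    def residual(beta):
        w = 1.0 / (1.0 + np.exp(X @ beta))  # the terms 1 / (1 + exp(X_i beta))
        return n * rho * beta - X.T @ w - rhs

    return root(residual, beta_init, method="hybr").x
```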
Because the $l_{p,q}$ penalty and the quadratic term in (20) are separable across the groups, subproblem (20) can be solved by exploiting the group structure, which is also what makes the variable group sparse. We divide the variables $\beta^{k+1}$ and $u^k$ by the group structure: $\beta^{k+1}_{G_i}$ and $u^k_{G_i}$ denote the $G_i$-th group blocks of $\beta^{k+1}$ and $u^k$, respectively. Then, for $i = 1, \ldots, r$, $r^{k+1}_{G_i}$ is given by solving the following optimization problem:
$$ r^{k+1}_{G_i} = \operatorname*{argmin}_{r_{G_i}}\ \lambda \|r_{G_i}\|_{p,q}^q + (\beta^{k+1}_{G_i} - r_{G_i})^T u^k_{G_i} + \frac{\rho}{2} \|\beta^{k+1}_{G_i} - r_{G_i}\|_2^2. \tag{23} $$
We can then obtain the solution of subproblem (20) as follows:
$$ r^{k+1} = \left( (r^{k+1}_{G_1})^T, \ldots, (r^{k+1}_{G_r})^T \right)^T. \tag{24} $$
Moreover, problem (23) can be equivalently solved via the following one:
$$ r^{k+1}_{G_i} = \operatorname*{argmin}_{r_{G_i}}\ \left\| r_{G_i} - \left( \beta^{k+1}_{G_i} + \frac{u^k_{G_i}}{\rho} \right) \right\|_2^2 + \frac{2\lambda}{\rho} \|r_{G_i}\|_{p,q}^q. \tag{25} $$
The proximal gradient method given in [17] has proven very useful for solving (25). Based on the above analysis, the ADMM algorithm is effective for solving the logistic regression problem with $l_{p,q}$ regularization; its structure is summarized in Algorithm 1, and a Python sketch of the whole iteration is given after the algorithm.
Algorithm 1: ADMM algorithm for solving (2).
  • Step 1: Initialization: give $\beta^0$, $u^0$, $r^0$, $\rho > 0$, $\lambda > 0$, and set $k = 0$;
  • Step 2: for $k = 0, 1, \ldots$, if the stopping criterion is satisfied, stop; otherwise go to Step 3;
  • Step 3: Update $\beta^{k+1}$: $\beta^{k+1}$ is given by solving the nonlinear Equation (22);
  • Step 4: Update $r^{k+1}$: for $i = 1, \ldots, r$, use the proximal gradient method to solve the optimization problem (25) and obtain $r^{k+1}_{G_i}$. Then set
    $r^{k+1} = \left( (r^{k+1}_{G_1})^T, \ldots, (r^{k+1}_{G_r})^T \right)^T$.
  • Step 5: Update $u^{k+1}$: $u^{k+1} = u^k + \rho(\beta^{k+1} - r^{k+1})$.
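The following Python sketch puts the three updates together. It reuses update_beta from the previous sketch and, for simplicity, solves the small group-wise problem (25) numerically with scipy.optimize.minimize instead of the proximal gradient method of [17]; it should therefore be read as an illustration of Algorithm 1 under these assumptions, not as the exact implementation used in the experiments.

```python
import numpy as np
from scipy.optimize import minimize

def group_prox(v, lam_over_rho, p=2.0, q=0.5):
    """Approximately solve min_r ||r - v||_2^2 + 2*(lam/rho)*||r||_p^q, cf. (25)."""
    obj = lambda r: np.sum((r - v) ** 2) + 2.0 * lam_over_rho * np.linalg.norm(r, ord=p) ** q
    starts = (v, np.zeros_like(v))  # also try the all-zero start, i.e., killing the group
    best = min((minimize(obj, x0, method="Nelder-Mead") for x0 in starts),
               key=lambda res: res.fun)
    return best.x

def admm_lpq_logistic(X, y, groups, lam, rho=0.1, p=2.0, q=0.5, max_iter=200, tol=1e-6):
    d1 = X.shape[1]
    beta, r, u = np.zeros(d1), np.zeros(d1), np.zeros(d1)
    for _ in range(max_iter):
        beta = update_beta(X, y, r, u, rho, beta)      # Step 3: nonlinear Equation (22)
        r_new = r.copy()
        for g in groups:                               # Step 4: group-wise problem (25)
            r_new[g] = group_prox(beta[g] + u[g] / rho, lam / rho, p, q)
        u = u + rho * (beta - r_new)                   # Step 5: dual update
        if np.linalg.norm(r_new - r) <= tol * (1.0 + np.linalg.norm(r)):
            r = r_new
            break
        r = r_new
    return r  # the group sparse estimate of the coefficients
```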

4. Simulation Examples

In this section, we use simulated data to illustrate the efficiency of the logistic regression model with $l_{p,q}$ regularization and of the ADMM algorithm. $l_{1/2}$ regularization has been shown to give a sparser optimal solution than the $l_1$ norm [28]. Hence, we set $q = 1/2$ in our numerical experiments, and we used the ADMM algorithm designed in Section 3 to solve model (2). The environment for the simulations was Python 3.7.
In order to verify the effect of the prediction and the classification for penalized logistic regression model (2), we designed two simulation experiments with different data structures. At the same time, we employed the LASSO logistic regression model and the logistic regression model with the elastic net penalty to solve the numerical problems. The numerical results illustrated the advantages of the logistic regression model with l p , q regularization.
We mainly considered two aspects of the models: the ability of a model to select variables, and the classification and prediction performance of the penalized logistic regression models on test data. The evaluation indexes used in this section mainly included the following:
  • $P$: the number of non-zero coefficients in the estimate that the model gives.
  • $TP$: the number of coefficients predicted to be non-zero that are actually non-zero.
  • $TN$: the number of coefficients predicted to be zero that are actually zero.
  • $FP$: the number of coefficients predicted to be non-zero that are actually zero.
  • $FN$: the number of coefficients predicted to be zero that are actually non-zero.
  • $PSR$: the ratio of the number of correctly selected non-zero coefficients to the number of truly non-zero coefficients $p$, calculated as follows:
    $$ PSR = \frac{TP}{p}. $$
  • $Accuracy$: the accuracy of the prediction on the test data, calculated by the following formulation:
    $$ Accuracy = \frac{TP + TN}{TP + TN + FP + FN}. $$
  • $AUC$: the area under the ROC curve.
A value of $P$ close to $TP$ shows that the model selects few spurious variables. The greater the $Accuracy$ and $AUC$, the better the model. A $PSR$ close to 1 indicates good variable selection.
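As an illustration of these criteria, the following sketch computes P, TP, PSR, Accuracy and AUC with scikit-learn; the variable names (beta_hat, beta_true, X_test, y_test) are hypothetical placeholders for the estimated coefficients, the true coefficients and the test data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def selection_metrics(beta_hat, beta_true, tol=1e-8):
    """P, TP and PSR for an estimated coefficient vector (intercept excluded)."""
    sel, true = np.abs(beta_hat) > tol, np.abs(beta_true) > tol
    P, TP = int(sel.sum()), int((sel & true).sum())
    return {"P": P, "TP": TP, "PSR": TP / max(int(true.sum()), 1)}

def prediction_metrics(beta_hat, X_test, y_test):
    """Accuracy and AUC of the fitted logistic model on the test set."""
    prob = 1.0 / (1.0 + np.exp(-(X_test @ beta_hat)))
    return {"Accuracy": accuracy_score(y_test, (prob >= 0.5).astype(int)),
            "AUC": roc_auc_score(y_test, prob)}
```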
Moreover, the penalized parameter $\lambda$ was chosen by test-set verification. Firstly, we selected a value of $\lambda$ that made all coefficients equal to 0 and set it as $\lambda_{max}$. Secondly, we chose a number very close to 0, such as 0.0001, and set it as $\lambda_{min}$. We then ran the numerical experiments for $\lambda \in [\lambda_{min}, \lambda_{max}]$ and finally kept the $\lambda \in [\lambda_{min}, \lambda_{max}]$ that produced the maximum value of the AUC. The augmented Lagrangian parameter $\rho$ did not affect whether the algorithm converged, but a proper choice gives a faster convergence rate; in the experiments we simply chose a value that made the algorithm converge quickly.
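A sketch of this tuning procedure is given below; it assumes a fitting routine fit_model(X, y, lam) (for instance, the admm_lpq_logistic function sketched in Section 3) and simply scans a grid between $\lambda_{min}$ and $\lambda_{max}$, keeping the value with the largest AUC on the held-out set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def select_lambda(fit_model, X_train, y_train, X_val, y_val,
                  lam_min=1e-4, lam_max=1.0, n_grid=20):
    """Pick the lambda in [lam_min, lam_max] that maximizes the validation AUC."""
    best_lam, best_auc = lam_min, -np.inf
    for lam in np.geomspace(lam_min, lam_max, n_grid):
        beta = fit_model(X_train, y_train, lam)
        prob = 1.0 / (1.0 + np.exp(-(X_val @ beta)))
        auc = roc_auc_score(y_val, prob)
        if auc > best_auc:
            best_lam, best_auc = lam, auc
    return best_lam, best_auc
```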

4.1. Simulation Experiment with Non-Sparse Variables in the Group

Firstly, we constructed grouped features that were similar within a group and different between groups, and we generated the data as in [1]. The data were generated according to the following model:
$$ y_i = \frac{1}{1 + \exp(X_i \beta + \varepsilon)}. $$
Here, the explanatory variables of each group followed a multivariate normal distribution, i.e., $X_i \sim N(0, \Sigma_{\rho_s})$, and the error followed the standard normal distribution, i.e., $\varepsilon \sim N(0,1)$. The correlation coefficient between the variables $(X_{G_i})_i$ and $(X_{G_i})_j$ was $\rho_s^{|i-j|}$. We employed $\rho_s = 0.2$ or $\rho_s = 0.7$ to represent weak or strong correlation of the variables within a group, respectively.
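A minimal sketch of this data-generating process is given below for the setting described next (10 independent groups of five variables, three significant groups). The magnitudes of the nonzero coefficients, the omission of the intercept column and the sign convention inside the logistic function are assumptions made for illustration; the binary responses are drawn as Bernoulli variables with the model probability.

```python
import numpy as np

def simulate_group_data(n=500, n_groups=10, group_size=5, n_signif=3, rho_s=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # within-group covariance: corr((X_Gi)_i, (X_Gi)_j) = rho_s ** |i - j|
    idx = np.arange(group_size)
    cov = rho_s ** np.abs(idx[:, None] - idx[None, :])
    X = np.hstack([rng.multivariate_normal(np.zeros(group_size), cov, size=n)
                   for _ in range(n_groups)])
    beta = np.zeros(n_groups * group_size)
    beta[:n_signif * group_size] = rng.uniform(0.5, 2.0, n_signif * group_size)  # assumed signal
    eps = rng.standard_normal(n)                     # epsilon ~ N(0, 1)
    prob = 1.0 / (1.0 + np.exp(-(X @ beta + eps)))   # assumed sign convention
    y = rng.binomial(1, prob)                        # y_i ~ Bernoulli(prob_i)
    return X, y, beta
```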
For this simulation experiment, we generated data using 10 groups independently, and each group contained five variables. Hence, the total number of variables was 50. There were three groups that were significant, and the other seven groups were not significant. The correlation coefficients within the groups were 0.2 and 0.7, respectively. The sample size was 500. We selected 80 % of the data for the training set and the others were the test set. The experimental simulation was repeated 30 times. For this example, the penalized parameter was λ = 0.001 for the LASSO logistic regression model. For model (2) the penalized parameter was λ = 0.02 and the augmented Lagrange multiplier was chosen as ρ = 0.1 . For the logistic regression model with the elastic net penalty, the parameters were set as λ 1 = 0.0009 and λ 2 = 0.0001 .
The following table gives the numerical results for this example with different models and different correlation coefficients.
According to Table 1, the logistic regression model with the elastic net penalty, the LASSO logistic regression model and the logistic regression model with $l_{p,q}$ regularization could all perform variable selection. For both correlation levels, the criteria of the logistic regression model with $l_{p,q}$ regularization were better than those of the other two models. According to the indexes $P$ and $TP$, all variables with non-zero coefficients were screened out by the logistic regression model with $l_{2,1/2}$ regularization under both correlation levels, the value of $P$ was close to that of $TP$, and all the selected variables were variables with non-zero coefficients. When choosing different parameters, the logistic regression model with $l_{p,q}$ regularization selected more variables with truly non-zero coefficients than the other two models, and the number of selected variables was closer to the number of variables with truly non-zero coefficients. The $PSR$ was close to 1, so the model could select the significant variables. In terms of the prediction effect, the AUC and accuracy of the logistic regression model with $l_{p,q}$ regularization showed better prediction performance under both correlation coefficients.
Considering both the variable-selection effect and the prediction effect, the logistic regression model with $l_{2,1/2}$ regularization was better than the logistic regression model with the elastic net penalty and the LASSO logistic regression model. Since the data in this simulation experiment were designed so that the coefficients within one group were either all zero or all non-zero, the logistic regression model with $l_{2,1/2}$ regularization performed better than the logistic regression model with $l_{1,1/2}$ regularization. The logistic regression model with $l_{2,1/2}$ regularization selects or discards a whole group of variables, so it had an ideal group variable-selection effect. The logistic regression model with the elastic net penalty and the LASSO logistic regression model compressed the variables to achieve a sparse effect, but because they did not make full use of the group structure information, they selected too many variables with zero coefficients, so the performance of these two models was not as good.

4.2. Simulation Experiment with the Sparse Variables in the Group

The data were generated similarly to the above experiment; the differences between the two simulation experiments are as follows. A total of six groups of variables with intragroup correlation were simulated, and each group contained 10 variables. Twelve variables in two groups were significant: one group was completely significant, and the other group contained two significant variables. The sample size was 500. The ratio of the training set to the test set was 8:2. The correlation coefficients within a group were 0.2 and 0.7, respectively. For this example, the penalized parameter was $\lambda = 0.001$ for the LASSO logistic regression model, and for model (2) the penalized parameter was $\lambda = 0.032$ and the augmented Lagrangian parameter was 0.1. When $\rho_s = 0.2$, the parameters of the logistic regression model with the elastic net penalty were $\lambda_1 = 0.0018$ and $\lambda_2 = 0.0002$; when $\rho_s = 0.7$, they were $\lambda_1 = 0.0021$ and $\lambda_2 = 0.0009$.
The following table gives us the numerical results for this example with different models and different correlation coefficients.
According to Table 2, the logistic regression model with $l_{1,1/2}$ regularization performed very well. From the perspective of variable selection, the logistic regression model with the elastic net penalty, the LASSO logistic regression model and the logistic regression model with $l_{p,q}$ regularization could all screen out the variables with non-zero coefficients. However, the elastic net and LASSO logistic regression models also selected too many variables whose true coefficients were zero. The logistic regression model with $l_{p,q}$ regularization not only recovered all the variables with non-zero coefficients, but the values of $P$ and $TP$ were also very close, which meant that almost all of its selected variables were truly non-zero. For this kind of data, we found that the logistic regression model with $l_{2,1/2}$ regularization tended to compress a whole group of variables to be zero or non-zero at the same time. Therefore, the logistic regression model with $l_{2,1/2}$ regularization could identify the significant groups, but it did not filter out the important variables within a group. In terms of prediction, the logistic regression model with $l_{1,1/2}$ regularization performed well under both correlation coefficients, and its prediction ability also improved compared with the univariate selection models.
Combining the effects of variable selection and model prediction, the logistic regression model with $l_{p,q}$ regularization performed well when the variables were sparse within a group. Due to the "all in, all out" mechanism, the logistic regression model with $l_{2,1/2}$ regularization could only screen out important variable groups; the important variables within a group could not be screened out, so its variable-selection ability was limited. The logistic regression model with $l_{1,1/2}$ regularization could overcome this disadvantage.
From the above two simulation experiments, we saw the advantages of the logistic regression model with $l_{p,q}$ regularization for variable selection and prediction on data with a group structure. Moreover, for different data, the value of p can be adjusted to adapt to the problem.

5. Real-Data Experiment

Data description and preprocessing are described as follows. In this section, real data are considered. The data came from the uqer quantitative platform (https://uqer.datayes.com/, accessed on 31 October 2021). The data were the factor data and the yield data of the constituent stocks of the Shanghai and Shenzhen 300 index in China's stock market from 1 January 2010 to 31 December 2020. The advantages of using these data were good performance, large scale, high liquidity and active trading in the market. In order to ensure the accuracy and rationality of the analysis results, we needed to select and correct the range of samples. According to the development experience of China's stock market and previous research experience, the empirical part located the sample starting point in 2010. Secondly, because the capital of companies in the financial industry has the characteristics of high leverage and high debt, the construction of some financial indicators is quite different from that of other listed companies; therefore, we excluded some financial companies based on our experience. In addition, ST and PT stocks in the market have abnormal financial conditions and performance losses, so we also excluded such stocks with weak comparability. Among the 243 stock factors visible on the platform, 34 factors belonging to nine groups were selected to evaluate model (2). The data were daily data, but in practice, if daily transaction data were used for investment, the frequent transactions would lead to a significant increase in transaction costs, and the rise of transaction costs would affect the annualized rate of return. In order to reduce the impact of transaction costs, we used monthly transaction data for modeling. Hence, it was necessary to average each factor's data monthly, and we used monthly data for stock selection. In the division of the data set, the data from 1 January 2010 to 30 April 2018 were selected as the training set, and the data from 1 May 2018 to 31 December 2020 were used for back testing.
In order to ensure the quality of subsequent factor screening and the effect of stock rise and fall prediction, we needed to preprocess the data before the analysis. The methods for data preprocessing included noise cleaning, missing value processing, data standardization, lag processing and so on.
Using the above factors and data processing methods to generate the stock factor matrix, we employed the logistic regression model with $l_{p,q}$ regularization and the ADMM algorithm proposed in Section 3 to estimate the coefficients of the factors. We then calculated the predicted probability of each stock rising, sorted the probabilities from large to small, and bought the top ten stocks with equal weights.
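A sketch of this portfolio-construction step is given below; it assumes a pandas DataFrame factors of standardized monthly factor values indexed by stock code and a coefficient vector beta_hat (intercept first) estimated by the ADMM algorithm. The names are hypothetical.

```python
import numpy as np
import pandas as pd

def top_k_portfolio(factors: pd.DataFrame, beta_hat: np.ndarray, k: int = 10) -> pd.Series:
    """Rank stocks by predicted probability of rising and give equal weights to the top k."""
    score = factors.to_numpy() @ beta_hat[1:] + beta_hat[0]  # beta_hat[0] is the intercept
    prob = 1.0 / (1.0 + np.exp(-score))
    top = pd.Series(prob, index=factors.index).nlargest(k)
    return pd.Series(1.0 / k, index=top.index)               # equal-weight portfolio
```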
In the following, we introduce the evaluation indexes for the historical back test. When we performed the back-test analysis of the solutions, in order to evaluate them objectively and comprehensively, it was necessary to give evaluation indicators. We selected nine indicators: the return rate of the year, the return rate of the benchmark year, the Sharpe ratio, the volatility, the return unrelated to market fluctuations ($\alpha$), the sensitivity to market changes ($\beta$), the information ratio, the maximum drawdown and the turnover rate of the year.
Based on the above evaluation criteria, we used the selected data to back-test and verify the effectiveness of model (2), and we employed the above models to predict the stock trend. Moreover, we sorted the predicted probabilities and selected the ten stocks with the highest probability of rising as the stock portfolio for the month, rebalancing the portfolio with equal weights. We held the portfolio until the end of the month to calculate that month's income. The initial capital was set at 10 million yuan, the tax for buying was 0.003, the tax for selling was 0.0013, and the slippage was 0. Moreover, the parameter for model (2) was $\lambda = 0.004$, $\lambda = 0.003$ was chosen for the LASSO logistic regression model, and we adopted $\lambda_1 = \lambda_2 = 0.02$ for the logistic regression model with the elastic net penalty. We employed the ADMM algorithm given in Section 3 to solve this example, and the augmented Lagrangian parameter was chosen as $\rho = 0.1$ for all models. After the calculation, the transaction back-test results and the cumulative yield are listed in Table 3 and Figure 1, respectively.
From Table 3, the return rates of the year for the three models were all higher than the return rate of the benchmark year, 13.3%. Among the three models, the return rate of the year of the logistic regression model with $l_{1,1/2}$ regularization was the highest. Model (2) gave a good strategy from the perspective of the excess return $\alpha$ and the Sharpe ratio. Under the same risk coefficient, the investment strategy based on the logistic regression model with $l_{1,1/2}$ regularization could help investors make effective investment decisions and obtain higher yields. Moreover, the model gave a strategy whose maximum drawdown stayed within an acceptable range, so the strategy could also more effectively control the drawdown risk.
From the graph of the cumulative rate of return, the curve given by the logistic regression model with $l_{1,1/2}$ regularization was basically always above the benchmark cumulative return curve, which indicated that the return of the portfolio constructed with the group information was consistently stable. The portfolio constructed based on the logistic regression model with $l_{1,1/2}$ regularization could not only screen out the important factor types affecting stock returns, but could also screen out the important factor indicators within a group, so as to more accurately predict the probability of stock returns rising. The portfolio constructed based on this model therefore had certain advantages over the other two regression models.

6. Conclusions

Motivated by the requirements of grouped data, this paper proposed a logistic regression model with $l_{p,q}$ regularization. We showed the properties of the $l_{p,q}$ norm and of the loss function of the logistic regression problem. Moreover, the oracle inequality and the global recovery bound for the penalized regression model were established with the help of the $(p,q)$-group restricted eigenvalue condition. These properties are important for variable selection. In Section 3, we presented the framework of the ADMM algorithm for solving the penalized logistic regression model, together with methods for solving its subproblems, so as to reduce the difficulty and complexity of solving model (2).
In the numerical simulations, since the logistic regression model with $l_{p,q}$ regularization comprehensively considers the group structure information of the variables, it could eliminate more redundant variables than the univariate selection methods that consider only single variables, and thus screen out the characteristics that are more important for the dependent variable. It also had higher accuracy on the test set. Moreover, in the real-data experiment, the logistic regression model with $l_{p,q}$ regularization showed a more stable performance in variable selection and prediction.
In future work, we could extend the group logistic regression model with $l_{p,q}$ regularization. On the theoretical side, we could carry out further analysis, such as local recovery bounds, and design a more suitable algorithm with a convergence guarantee. Moreover, on the numerical side, we could consider more values of p and q to illustrate the advantages of model (2).

Author Contributions

Conceptualization and methodology, Y.Z. and C.W.; investigation and data curation, X.L.; writing—original draft preparation, writing—review and editing, Y.Z. and C.W. All authors have read and agreed to the published version of the manuscript.

Funding

The authors’ work was supported by the National Natural Science Foundation of China (No. 12171027), National Statistical Science Research Project (2019LZ40), State Key Laboratory of Scientific and Engineering Computing, Chinese Academy of Sciences, and the Youth Foundation of Minzu University of China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  2. Huang, J.; Ma, S.; Xie, H.; Zhang, C. A group bridge approach for variable selection. Biometrika 2009, 96, 339–355.
  3. Kim, S.; Koh, K.; Lustig, M.; Boyd, S.; Gorinevsky, D. An interior-point method for large-scale l1-regularized least squares. IEEE J. Sel. Top. Signal Process. 2007, 1, 606–617.
  4. Meinshausen, N. Relaxed lasso. Comput. Stat. Data Anal. 2006, 52, 374–393.
  5. Nikolova, M.; Ng, M.K.; Zhang, S.; Ching, W. Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 2008, 1, 2–25.
  6. Ong, C.S.; An, L.T.H. Learning sparse classifiers with difference of convex functions algorithms. Optim. Method Softw. 2013, 28, 830–854.
  7. Soubies, E.; Blanc-Féraud, L.; Aubert, G. A continuous exact l0 penalty (CEL0) for least squares regularized problem. SIAM J. Imaging Sci. 2015, 8, 1607–1639.
  8. Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective. J. R. Stat. Soc. B 2011, 73, 273–282.
  9. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
  10. Zhang, T. Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 2010, 11, 1081–1107.
  11. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 2005, 67, 301–320.
  12. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
  13. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 2006, 68, 49–67.
  14. Wang, L.; Li, H.; Huang, J.Z. Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 2008, 103, 1556–1569.
  15. Huang, J.; Breheny, P.; Ma, S. A selective review of group selection in high-dimensional models. Stat. Sci. 2012, 27, 481–499.
  16. Chartrand, R.; Staneva, V. Restricted isometry properties and nonconvex compressive sensing. Inverse Probl. 2008, 24, 035020.
  17. Hu, Y.; Li, C.; Meng, K.; Qin, J.; Yang, X. Group sparse optimization via lp,q regularization. J. Mach. Learn. Res. 2017, 18, 1–52.
  18. Meier, L.; van de Geer, S.; Bühlmann, P. The group lasso for logistic regression. J. R. Stat. Soc. B 2008, 70, 53–71.
  19. Zhao, P.; Rocha, G.; Yu, B. The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat. 2009, 37, 3468–3497.
  20. Bartlett, P.L.; Mendelson, S.; Neeman, J. l1-regularized linear regression: Persistence and oracle inequalities. Probab. Theory Relat. Fields 2012, 154, 193–224.
  21. Greenshtein, E.; Ritov, Y.A. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 2004, 10, 971–988.
  22. Blazère, M.; Loubes, J.M.; Gamboa, F. Oracle inequalities for a group lasso procedure applied to generalized linear models in high dimension. IEEE Trans. Inform. Theory 2014, 60, 2303–2318.
  23. Kwemou, M. Non-asymptotic oracle inequalities for the Lasso and group Lasso in high dimensional logistic model. ESAIM-Probab. Stat. 2016, 20, 309–331.
  24. Xiao, Y.; Yan, T.; Zhang, H.; Zhang, Y. Oracle inequalities for weighted group lasso in high-dimensional misspecified Cox models. J. Inequal. Appl. 2020, 1, 1–33.
  25. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis, 3rd ed.; Springer: New York, NY, USA, 2009.
  26. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
  27. Han, D. A survey on some recent developments of alternating direction method of multipliers. J. Oper. Res. Soc. China 2022, 10, 1–52.
  28. Xu, Z.; Chang, X.; Xu, F.; Zhang, H. L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1013–1027.
Figure 1. The cumulative yield figure.
Table 1. The results for the simulations in experiment one.

| The Correlation Coefficient | Models | P | TP | PSR | Accuracy | AUC |
|---|---|---|---|---|---|---|
| $\rho_s = 0.2$ | Elastic Net | 33.07 | 15.00 | 1.0000 | 0.9520 | 0.9503 |
| | LASSO | 32.17 | 15.00 | 1.0000 | 0.9503 | 0.9497 |
| | $p = 2$, $q = 1/2$ | 15.00 | 15.00 | 1.0000 | 0.9657 | 0.9658 |
| | $p = 1$, $q = 1/2$ | 15.20 | 14.93 | 0.9956 | 0.9580 | 0.9582 |
| $\rho_s = 0.7$ | Elastic Net | 31.33 | 15.00 | 1.0000 | 0.9577 | 0.9571 |
| | LASSO | 29.47 | 14.83 | 0.9889 | 0.9577 | 0.9571 |
| | $p = 2$, $q = 1/2$ | 15.00 | 15.00 | 1.0000 | 0.9710 | 0.9716 |
| | $p = 1$, $q = 1/2$ | 16.03 | 14.90 | 0.9933 | 0.9637 | 0.9634 |
Table 2. The results for the simulations in experiment two.

| The Correlation Coefficient | Models | P | TP | PSR | Accuracy | AUC |
|---|---|---|---|---|---|---|
| $\rho_s = 0.2$ | Elastic Net | 29.83 | 12.00 | 1.0000 | 0.9467 | 0.9467 |
| | LASSO | 36.00 | 12.00 | 1.0000 | 0.9480 | 0.9478 |
| | $p = 2$, $q = 1/2$ | 14.67 | 12.00 | 1.0000 | 0.9580 | 0.9650 |
| | $p = 1$, $q = 1/2$ | 12.37 | 12.00 | 1.0000 | 0.9651 | 0.9673 |
| $\rho_s = 0.7$ | Elastic Net | 24.36 | 12.00 | 1.0000 | 0.9570 | 0.9561 |
| | LASSO | 29.40 | 12.00 | 1.0000 | 0.9590 | 0.9589 |
| | $p = 2$, $q = 1/2$ | 17.33 | 12.00 | 1.0000 | 0.9640 | 0.9635 |
| | $p = 1$, $q = 1/2$ | 12.90 | 12.00 | 1.0000 | 0.9700 | 0.9704 |
Table 3. The back test results.

| The Logistic Regression with Different Penalties | The $l_{1,1/2}$ Norm | LASSO | Elastic Net |
|---|---|---|---|
| The return rate of the year | 22.7% | 20.5% | 18.8% |
| The return rate of the benchmark year | 13.3% | 13.3% | 13.3% |
| $\alpha$ | 10.5% | 6.2% | 8.0% |
| $\beta$ | 0.89 | 0.92 | 0.64 |
| The Sharpe ratio | 0.84 | 0.72 | 0.41 |
| The volatility | 22.8% | 23.5% | 23.7% |
| The information ratio | 0.68 | 0.54 | 0.41 |
| The maximum drawdown | 18.8% | 27.7% | 25.1% |
| The turnover rate of the year | 9.14 | 7.67 | 9.33 |