Article

A Linearly Involved Generalized Moreau Enhancement of ℓ2,1-Norm with Application to Weighted Group Sparse Classification

Department of Information and Communications Engineering, Tokyo Institute of Technology, 2-12-1 Okayama, Meguro-ku, Tokyo 152-8552, Japan
* Author to whom correspondence should be addressed.
Algorithms 2021, 14(11), 312; https://doi.org/10.3390/a14110312
Submission received: 10 September 2021 / Revised: 25 October 2021 / Accepted: 26 October 2021 / Published: 27 October 2021
(This article belongs to the Special Issue Recent Advances in Nonsmooth Optimization and Analysis)

Abstract

This paper proposes a new group-sparsity-inducing regularizer to approximate the $\ell_{2,0}$ pseudo-norm. The regularizer is nonconvex and can be seen as a linearly involved generalized Moreau enhancement of the $\ell_{2,1}$-norm. Moreover, the overall convexity of the corresponding group-sparsity-regularized least squares problem can be achieved. The model can handle general group configurations such as weighted group sparse problems and can be solved through a proximal splitting algorithm. Among the applications, considering that the bias of convex regularizers may lead to incorrect classification results, especially for unbalanced training sets, we apply the proposed model to the (weighted) group sparse classification problem. The proposed classifier can use the label, similarity, and locality information of samples. It also suppresses the bias of convex-regularizer-based classifiers. Experimental results demonstrate that the proposed classifier improves the performance of convex $\ell_{2,1}$-regularizer-based methods, especially when the training data set is unbalanced. This paper enhances the potential applicability and effectiveness of using nonconvex regularizers in the framework of convex optimization.

1. Introduction

In recent decades, sparse reconstruction has become an active topic in many areas, such as signal processing, statistics, and machine learning [1]. By reconstructing a sparse solution from a linear measurement, we can obtain a representation of high-dimensional data as a vector with only a small number of nonzero entries. In practical applications, the data of interest can often be assumed to have a special structure. For example, in microarray analysis of gene expression [2,3], hyperspectral image unmixing [4,5,6], force identification in industrial applications [7], classification problems [8,9,10,11,12,13], etc., the solution of interest often possesses a group-sparsity structure; namely, its coefficients have a natural grouping and nonzero entries occur in only a few groups.
This paper focuses on the estimation of a group sparse solution, which is related to the Group LASSO (least absolute shrinkage and selection operator) [14]. Suppose $x = [x_1, x_2, \ldots, x_g] \in \mathbb{R}^n$ is a group sparse signal, where $x_i \in \mathbb{R}^{n_i}$, $\sum_{i=1}^{g} n_i = n$, and $g$ is the number of groups. Just as the $\ell_0$ pseudo-norm evaluates sparsity, the group sparsity of $x$ can be evaluated with the $\ell_{2,0}$ pseudo-norm, i.e., $\|x\|_{2,0} = \big\| \big[\|x_1\|_2, \|x_2\|_2, \ldots, \|x_g\|_2\big] \big\|_0$, where $\|\cdot\|_2$ is the Euclidean norm and $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm, which counts the number of nonzero entries of a vector in $\mathbb{R}^g$.
The group sparse regularized least squares problem can be modeled as
$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\ \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_{2,0}, \qquad (1)$$
where $y \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times n}$ are known, and $\lambda > 0$ is the regularization parameter. However, the use of the $\ell_{2,0}$ pseudo-norm makes (1) NP-hard [15]. Most applications therefore replace the nonconvex regularizer $\ell_{2,0}$ with its tightest convex envelope $\ell_{2,1}$ [16] (or its weighted variants), which leads to the following regularized least squares problem, known as the Group LASSO [14]:
$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\ \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\sum_{i=1}^{g} w_i\|x_i\|_2, \qquad (2)$$
where $w_i > 0$ ($i = 1, \ldots, g$) in the regularization term
$$\|x\|_{w,2,1} := \sum_{i=1}^{g} w_i\|x_i\|_2 \qquad (3)$$
(this is the $\ell_{w,2,1}$-norm of $x$, i.e., a separable weighted version [17] of the $\ell_{2,1}$-norm $\|x\|_{2,1} = \sum_{i=1}^{g}\|x_i\|_2$) are used to adjust for group sizes, e.g., $w_i = \sqrt{n_i}$ in [14,18]. In Appendix A, we give a simple but clear illustration of the bias of the $\ell_{2,1}$-norm caused by group size in group sparse classification (GSC).
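To make the notation above concrete, the following minimal sketch (ours, not from the paper; function names such as `l20_pseudo_norm` are our own) evaluates the $\ell_{2,0}$ pseudo-norm and the weighted $\ell_{2,1}$-norm of a stacked group-structured vector in NumPy.

```python
import numpy as np

def split_groups(x, group_sizes):
    """Split a stacked vector x into its groups x_1, ..., x_g."""
    idx = np.cumsum(group_sizes)[:-1]
    return np.split(np.asarray(x, dtype=float), idx)

def l20_pseudo_norm(x, group_sizes, tol=1e-12):
    """||x||_{2,0}: number of groups with nonzero Euclidean norm."""
    return sum(np.linalg.norm(xi) > tol for xi in split_groups(x, group_sizes))

def weighted_l21_norm(x, group_sizes, w):
    """||x||_{w,2,1} = sum_i w_i ||x_i||_2 (w_i = 1 gives the plain l_{2,1}-norm)."""
    return sum(wi * np.linalg.norm(xi) for wi, xi in zip(w, split_groups(x, group_sizes)))

# Example: 3 groups, only the first group is active.
x = np.array([1.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
sizes = [2, 3, 2]
print(l20_pseudo_norm(x, sizes))                      # 1
print(weighted_l21_norm(x, sizes, w=np.sqrt(sizes)))  # sqrt(2) * sqrt(5) ~ 3.162
```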
Although the convex optimization problem (2) has been used as a standard model for group sparse estimation, the convex regularizer $\|\cdot\|_{w,2,1}$ does not necessarily promote group sparsity sufficiently, mainly because the $\ell_{w,2,1}$-norm is only an approximation of the $\ell_{2,0}$ pseudo-norm within the severe restriction of convexity. To promote group sparsity more effectively than convex regularizers, nonconvex regularizers such as group SCAD (smoothly clipped absolute deviation) [3], group MCP (minimax concave penalty) [18,19], $\ell_{p,q}$ regularization ($\|x\|_{p,q} := (\sum_{i=1}^{g}\|x_i\|_p^q)^{1/q}$, $0 < q < 1 \le p$) [20], iterative weighted group minimization [21], and $\ell_{2,0}$ [22] have been used for group sparse estimation problems. However, they lose the overall convexity of the optimization problems, so their algorithms have no guarantee of convergence to global minimizers of the overall cost functions. (In [23], a nonconvex regularizer that preserves the overall convexity was proposed, but the fidelity term of that model is $\tfrac{1}{2}\|y - x\|_2^2$, i.e., limited to the case $A = I_n$, where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix, and hence it cannot be applied to (1) for general $A \in \mathbb{R}^{m \times n}$.)
In this paper, we propose a generalized weighted group sparse estimation model based on the linearly involved generalized-Moreau-enhanced (LiGME) approach [24], which uses a nonconvex regularizer while maintaining the overall convexity of the optimization problem. Our contributions can be summarized as follows:
  • We show in Proposition 2 that the generalized Moreau enhancement (GME) of $\|\cdot\|_{w,2,1}$, i.e., $(\|\cdot\|_{w,2,1})_B$ (see (11)), can bridge the gap between $\ell_{w,2,1}$ and $\ell_{2,0}$. For the non-separable weighted $\ell_{2,1}$, i.e., $\|W\cdot\|_{2,1}$, its GME can be expressed as a LiGME of $\ell_{2,1}$ when the weight matrix $W$ has full row rank.
  • We present a convex regularized least squares model with a nonconvex group-sparsity-promoting regularizer based on LiGME. It can serve as a unified model of many types of group-sparsity-related applications.
  • We illustrate the unfairness of the $\ell_{2,1}$ regularizer in unbalanced classification and then apply the proposed model to reduce this unfairness in GSC and weighted GSC (WGSC) [11].
The remainder of this paper is organized as follows. In Section 2, we give a brief review of the LiGME model and the WGSC method. In Section 3, we present our group-sparsity-enhanced representation model and its mathematical properties. In Section 4, we apply the proposed model to group-sparsity-based classification problems. The conclusion is given in Section 5.
A preliminary short version of this paper was presented at a conference [25].

2. Preliminaries

2.1. Review of Linearly Involved Generalized-Moreau-Enhanced (LiGME) Model

We first give a brief review of linearly involved generalized-Moreau-enhanced (LiGME) models, which are closely related to our method. Although the convex $\ell_1$-norm (or nuclear norm) is the most frequently adopted regularizer for sparsity (or low-rank) pursuing problems, it tends to underestimate high-amplitude values (or large singular values) [26,27]. Convexity-preserving nonconvex regularizers have been widely explored in [24,28,29,30,31,32,33]; they promote sparsity (or low-rank) more effectively than convex regularizers without losing the overall convexity. Among them, the generalized minimax concave (GMC) function in [31] does not rely on strong assumptions on the least squares term and has great potential for dealing with nonconvex variations of $\|\cdot\|_1$. Motivated by the GMC function, the LiGME model [24] provides a general framework for constructing linearly involved nonconvex regularizers for sparsity (or low-rank) regularized linear least squares while maintaining the overall convexity of the cost function.
Let $(\mathcal{X}, \langle\cdot,\cdot\rangle_{\mathcal{X}}, \|\cdot\|_{\mathcal{X}})$, $(\mathcal{Y}, \langle\cdot,\cdot\rangle_{\mathcal{Y}}, \|\cdot\|_{\mathcal{Y}})$, $(\mathcal{Z}, \langle\cdot,\cdot\rangle_{\mathcal{Z}}, \|\cdot\|_{\mathcal{Z}})$, and $(\tilde{\mathcal{Z}}, \langle\cdot,\cdot\rangle_{\tilde{\mathcal{Z}}}, \|\cdot\|_{\tilde{\mathcal{Z}}})$ be finite-dimensional real Hilbert spaces. Let a function $\Psi \in \Gamma_0(\mathcal{Z})$ be coercive with $\operatorname{dom}\Psi = \mathcal{Z}$. Here $\Gamma_0(\mathcal{Z})$ is the set of proper (i.e., $\operatorname{dom}\Psi := \{z \in \mathcal{Z} \mid \Psi(z) < \infty\} \neq \emptyset$), lower semicontinuous (i.e., $\operatorname{lev}_{\le a}\Psi := \{z \in \mathcal{Z} \mid \Psi(z) \le a\}$ is closed for every $a \in \mathbb{R}$), convex (i.e., $\Psi(\theta z_1 + (1-\theta)z_2) \le \theta\Psi(z_1) + (1-\theta)\Psi(z_2)$ for $z_1, z_2 \in \operatorname{dom}\Psi$, $0 \le \theta \le 1$) functions from $\mathcal{Z}$ to $(-\infty, \infty]$; a function $\Psi \in \Gamma_0(\mathcal{Z})$ is called coercive if $\|z\|_{\mathcal{Z}} \to \infty \Rightarrow \Psi(z) \to \infty$. For $\Psi \in \Gamma_0(\mathcal{Z})$, the proximity operator of $\Psi$ is defined by $\operatorname{Prox}_{\Psi} : \mathcal{Z} \to \mathcal{Z} : z \mapsto \arg\min_{v \in \mathcal{Z}} \Psi(v) + \tfrac{1}{2}\|v - z\|_{\mathcal{Z}}^2$.
The generalized Moreau enhancement (GME) of $\Psi$ with $B \in \mathcal{B}(\mathcal{Z}, \tilde{\mathcal{Z}})$ is defined as
$$\Psi_B(\cdot) := \Psi(\cdot) - \min_{v \in \mathcal{Z}}\Big[\Psi(v) + \tfrac{1}{2}\|B(\cdot - v)\|_{\tilde{\mathcal{Z}}}^2\Big], \qquad (4)$$
where $B$ is a tuning operator for the enhancement. The LiGME model is then defined as the minimization of
$$J_{\Psi_B \circ L} : \mathcal{X} \to \mathbb{R} : x \mapsto \tfrac{1}{2}\|y - Ax\|_{\mathcal{Y}}^2 + \lambda\Psi_B(Lx), \qquad (5)$$
where $(A, L, \lambda) \in \mathcal{B}(\mathcal{X}, \mathcal{Y}) \times \mathcal{B}(\mathcal{X}, \mathcal{Z}) \times \mathbb{R}_{+}$.
Please note that GMC [31] can be seen as a special case of (5) with $\Psi = \|\cdot\|_1$ and $L = \operatorname{Id}$, where $\operatorname{Id}$ is the identity operator. Model (5) can also be seen as an extension of [32,33].
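As a quick sanity check of definition (4) in the simplest scalar setting $\Psi = |\cdot|$, $\mathcal{Z} = \tilde{\mathcal{Z}} = \mathbb{R}$, and $B = b \neq 0$: the inner minimum is the Huber function, so $\Psi_B$ reduces to a minimax-concave-type penalty as in [29,31]. The sketch below is ours (NumPy, closed form), not code from the paper.

```python
import numpy as np

def gme_abs(x, b):
    """Scalar GME (4) of Psi = |.| with B = b: Psi_B(x) = |x| - min_v [ |v| + (b^2/2)(x - v)^2 ].
    The inner minimum is the Huber function, so Psi_B saturates at 1/(2 b^2) for |x| >= 1/b^2."""
    huber = np.where(np.abs(x) <= 1.0 / b**2,
                     0.5 * b**2 * x**2,
                     np.abs(x) - 0.5 / b**2)
    return np.abs(x) - huber

x = np.linspace(-3, 3, 7)
print(gme_abs(x, b=1.0))   # [0.5 0.5 0.5 0.  0.5 0.5 0.5]: nonconvex, bounded enhancement of |.|
```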
Although the GME function $\Psi_B$ in (4) is not convex in general for $B \neq O \in \mathcal{B}(\mathcal{Z}, \tilde{\mathcal{Z}})$, where $O \in \mathcal{B}(\mathcal{Z}, \tilde{\mathcal{Z}})$ is the zero operator, the overall convexity of the cost function (5) can be achieved by designing $B$ to satisfy the following convexity condition.
Proposition 1
([24], Proposition 1). The cost function $J_{\Psi_B \circ L}$ in (5) belongs to $\Gamma_0(\mathcal{X})$ for any $y \in \mathcal{Y}$ if the GME regularizer $\Psi_B$ in (4) satisfies
$$A^*A - \lambda L^*B^*BL \succeq O_{\mathcal{X}}, \qquad (6)$$
where $A^*$ denotes the adjoint of $A$ and $O_{\mathcal{X}} \in \mathcal{B}(\mathcal{X}, \mathcal{X})$ is the zero operator. In particular, when $\Psi$ is a norm over the vector space $\mathcal{Z}$, $J_{\Psi_B \circ L} \in \Gamma_0(\mathcal{X})$ if and only if (6) is satisfied.
A method of designing $B$ satisfying (6) for $\mathcal{X} = \mathbb{R}^n$ is provided in [24]; see Proposition A1 in Appendix B. For any $\Psi \in \Gamma_0(\mathcal{Z})$ that is coercive, even symmetric, and prox-friendly (even symmetry means $\Psi \circ (-\operatorname{Id}) = \Psi$; prox-friendliness means $\operatorname{Prox}_{\gamma\Psi}$ is computable for $\gamma \in \mathbb{R}_{++}$) with $\operatorname{dom}\Psi = \mathcal{Z}$, [24] provides a proximal splitting algorithm (see Proposition A2 in Appendix B) with guaranteed convergence to a globally optimal solution of model (5) under the overall-convexity condition (6).

2.2. Basic Idea of Weighted Group Sparse Classification (WGSC)

As a relatively simple but typical scenario for the application of the idea proposed in this paper, we introduce the main idea of weighted group sparse classification (WGSC). Classification is one of the fundamental tasks in signal and image processing and pattern recognition. For a classification problem with $g$ classes of subjects, the training samples form a dictionary matrix $A = [A_1, A_2, \ldots, A_g] \in \mathbb{R}^{m \times n}$, where $A_i = [a_{i1}, a_{i2}, \ldots, a_{in_i}] \in \mathbb{R}^{m \times n_i}$ is the subset of training samples from subject $i$, $a_{ij}$ is the $j$-th training sample from the $i$-th class, $n_i$ is the number of training samples from class $i$, and $n = \sum_{i=1}^{g} n_i$ is the total number of training samples. The aim is to correctly determine the class to which the input test sample $y \in \mathbb{R}^m$ belongs. Although deep learning is very popular and powerful for classification tasks, it requires a very large-scale training set and substantial computational resources for training numerous parameters with complicated back-propagation.
Wright et al. proposed sparse representation-based classification (SRC) [34] for face recognition. Under the assumption that samples of a specific subject lie in a linear subspace, a valid test sample $y$ is expected to be approximated well by a linear combination of the training samples from the same class, which leads to a sparse representation coefficient over all training samples. Specifically, the test sample $y$ is approximated by a linear combination of the dictionary items, i.e., $y \approx Ax$, where $x$ is the coefficient vector. A simple minimization model with sparse representation is $\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_0$. In most SRC-based approaches, the $\ell_0$ regularizer is relaxed to $\ell_1$, and the model becomes the well-known LASSO model [35] in statistics.
The label information of the dictionary atoms is not used in the simple SRC model, so the regression is based solely on the structure of each sample. When the subspaces spanned by different classes are not independent, SRC may represent the test image by training samples from multiple different classes. Considering the ideal situation in which the test image should be approximated well only by the training samples from the correct class, the authors of [8,9,10] divided training samples into groups by prior label information and used group-sparsity regularizers. Naturally, the coefficient vector $x$ has the group structure $x = [x_1, x_2, \ldots, x_g] \in \mathbb{R}^n$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in_i}] \in \mathbb{R}^{n_i}$ ($i = 1, 2, \ldots, g$). This kind of group sparse classification (GSC) approach aims to represent the test image using the minimum number of groups, and thus an ideal model is (1), which is NP-hard. As stated in Section 1, the convex approximation of $\ell_{2,0}$, i.e., the $\ell_{2,1}$-norm, has been widely used as the best convex regularizer to incorporate the class labels.
More generally, the non-separable weighted $\ell_{2,1}$-norm, i.e., $\|W\cdot\|_{2,1}$, has also been used as the regularizer in GSC [11,36,37]. For example, Tang et al. [11] proposed a weighted GSC (WGSC) model as follows, by involving information on the similarity between the query sample and each class as well as the distance between the query sample and each training sample:
$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\ \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\sum_{i=1}^{g} w_i\|d_i \odot x_i\|_2, \qquad (7)$$
where $d_i = [d_{i1}, d_{i2}, \ldots, d_{in_i}] \in \mathbb{R}^{n_i}$ penalizes the distance between $y$ and each training sample of the $i$-th class, $w_i$ assesses the relative importance of the training samples from the $i$-th class for representing the test sample, and $\odot$ denotes element-wise multiplication. Specifically, the weights are computed by
$$d_{ij} = \exp\!\Big(\frac{\|y - a_{ij}\|_2}{\sigma_1}\Big) \quad \text{and} \quad w_i = \exp\!\Big(\frac{r_i - r_{\min}}{\sigma_2}\Big), \qquad (8)$$
where $\sigma_1$ and $\sigma_2$ are bandwidth parameters, $x_i^* = \arg\min_{x_i}\|y - A_i x_i\|_2^2$, $r_i = \|y - A_i x_i^*\|_2$ measures the distance from $y$ to the individual subspace generated by $A_i$, and $r_{\min}$ denotes the minimum of the reconstruction errors $\{r_i\}_{i=1}^{g}$. The regularizer in (7) can be written as a non-separable weighted $\ell_{2,1}$, i.e., $\|Wx\|_{2,1}$, where
$$W = \operatorname{BlockDiag}(W_1, W_2, \ldots, W_g) \quad \text{and} \quad W_i = \operatorname{Diag}(w_i d_{i1}, w_i d_{i2}, \ldots, w_i d_{in_i}). \qquad (9)$$
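For concreteness, the following sketch (ours; the function name `wgsc_weight_matrix` and variable names are our assumptions) computes the weights (8) and assembles the diagonal matrix $W$ in (9) from a training dictionary, its group sizes, and a test sample.

```python
import numpy as np

def wgsc_weight_matrix(A, group_sizes, y, sigma1, sigma2):
    """Build W = BlockDiag(W_1, ..., W_g) with W_i = Diag(w_i d_i1, ..., w_i d_in_i), per (8)-(9)."""
    starts = np.concatenate(([0], np.cumsum(group_sizes)))
    d_all, residuals = [], []
    for i in range(len(group_sizes)):
        A_i = A[:, starts[i]:starts[i + 1]]
        # locality weights d_ij = exp(||y - a_ij||_2 / sigma1)
        d_all.append(np.exp(np.linalg.norm(A_i - y[:, None], axis=0) / sigma1))
        # class-wise reconstruction error r_i = ||y - A_i x_i*||_2 from a least squares fit
        x_star, *_ = np.linalg.lstsq(A_i, y, rcond=None)
        residuals.append(np.linalg.norm(y - A_i @ x_star))
    r = np.array(residuals)
    w = np.exp((r - r.min()) / sigma2)          # similarity weights w_i of (8)
    diag = np.concatenate([w[i] * d_all[i] for i in range(len(group_sizes))])
    return np.diag(diag)

# Usage sketch: W = wgsc_weight_matrix(A, [n_1, ..., n_g], y, sigma1=4.0, sigma2=2.0)
```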
For the aforementioned methods, after obtaining the optimal solution (denoted by $\hat{x} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_g]$), they assign $y$ to the class that minimizes the class reconstruction residual defined by $\|y - A_i\hat{x}_i\|_2$.
Although the $\ell_{2,1}$ regularizer and its weighted variants are widely used in GSC and WGSC-based methods, they not only suppress the number of selected classes but also suppress significant nonzero coefficients within classes. The latter may lead to underestimation of high-amplitude elements and adversely affect performance. Nonconvex regularizers such as $\ell_{2,p}$ ($0 < p < 1$) [37] and group MCP [38] make the corresponding optimization problems nonconvex. Therefore, we seek a regularizer that reduces the bias and approximates $\ell_{2,0}$ better than $\ell_{2,1}$ while ensuring the overall convexity of the problem.

3. LiGME Model for Group Sparse Estimation

3.1. GME of the Weighted ℓ2,1-Norm and Its Properties

Although the $\ell_{2,1}$-norm (or its weighted variants) is the favored approach for approximating $\ell_{2,0}$ in the group sparse estimation literature, it has a large bias and does not promote group sparsity as effectively as $\ell_{2,0}$. Since GME better approximates direct discrete measures (e.g., $\ell_0$ for sparsity, matrix rank for low-rankness) than their convex envelopes, we propose to use it for designing group-sparsity-pursuing regularizers.
More generally, let us consider the GME of $\|\cdot\|_{w,2,1}$ in (3). Clearly, $\|\cdot\|_{w,2,1} \in \Gamma_0(\mathbb{R}^n)$ is coercive, even symmetric, and prox-friendly; its proximity operator can be computed by
$$\operatorname{Prox}_{\gamma\|\cdot\|_{w,2,1}} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \left[\left(1 - \frac{\gamma w_i}{\max\{\|x_i\|_2, \gamma w_i\}}\right)x_i\right]_{i=1}^{g}, \qquad (10)$$
where $x = [x_1, x_2, \ldots, x_g] \in \mathbb{R}^n$ is a signal with group structure, $x_i \in \mathbb{R}^{n_i}$ ($i = 1, 2, \ldots, g$), and $\sum_{i=1}^{g} n_i = n$.
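A direct NumPy sketch of the proximity operator (10) is given below (ours; with $w_i = 1$ this is the $\operatorname{Prox}_{\gamma\|\cdot\|_{2,1}}$ used later in Algorithm 1). It is group-wise soft thresholding.

```python
import numpy as np

def prox_weighted_l21(x, group_sizes, w, gamma):
    """Prox of gamma * ||.||_{w,2,1}: group-wise soft thresholding as in (10)."""
    idx = np.cumsum(group_sizes)[:-1]
    out = []
    for wi, xi in zip(w, np.split(np.asarray(x, dtype=float), idx)):
        out.append((1.0 - gamma * wi / max(np.linalg.norm(xi), gamma * wi)) * xi)
    return np.concatenate(out)

# Example: the second group is shrunk to zero because ||x_2||_2 <= gamma * w_2.
x = np.array([3.0, 4.0, 0.1, -0.1])
print(prox_weighted_l21(x, [2, 2], w=[1.0, 1.0], gamma=1.0))
# -> [2.4, 3.2, 0.0, -0.0]  (first group scaled by 1 - 1/5, second group thresholded)
```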
Actually, the GME of $\|\cdot\|_{w,2,1}$ with $B \in \mathbb{R}^{b \times n}$ (see (4)),
$$\big(\|\cdot\|_{w,2,1}\big)_B(x) = \sum_{i=1}^{g} w_i\|x_i\|_2 - \min_{v \in \mathbb{R}^n}\Big[\sum_{i=1}^{g} w_i\|v_i\|_2 + \tfrac{1}{2}\|B(x - v)\|_2^2\Big], \qquad (11)$$
where $v_i \in \mathbb{R}^{n_i}$ ($i = 1, 2, \ldots, g$) and $v = [v_1, v_2, \ldots, v_g] \in \mathbb{R}^n$, can serve as a parametric bridge between $\|\cdot\|_{2,0}$ and $\|\cdot\|_{w,2,1}$.
Proposition 2.
(GME of $\|\cdot\|_{w,2,1}$ can bridge the gap between $\|\cdot\|_{2,0}$ and $\|\cdot\|_{w,2,1}$.) Let $B_\gamma := \operatorname{BlockDiag}\big(\frac{w_1}{\sqrt{\gamma}}I_{n_1}, \frac{w_2}{\sqrt{\gamma}}I_{n_2}, \ldots, \frac{w_g}{\sqrt{\gamma}}I_{n_g}\big)$ for $\gamma > 0$, where $w_i > 0$ is the weight in (3) for $i = 1, \ldots, g$. Then, for any $x \in \mathbb{R}^n$,
$$\lim_{\gamma \to 0}\frac{2}{\gamma}\big(\|\cdot\|_{w,2,1}\big)_{B_\gamma}(x) = \|x\|_{2,0}.$$
Together with the fact that $(\|\cdot\|_{w,2,1})_{O_{n \times n}}(x) = \|x\|_{w,2,1}$, where $O_{n \times n} \in \mathbb{R}^{n \times n}$ is the zero matrix, the regularization term $\frac{2}{\gamma}(\|\cdot\|_{w,2,1})_{B_\gamma}(x)$ can serve as a parametric bridge between $\|\cdot\|_{2,0}$ and $\|\cdot\|_{w,2,1}$. As a special case, the GME of $\|\cdot\|_{2,1}$ can serve as a parametric bridge between $\|\cdot\|_{2,0}$ and $\|\cdot\|_{2,1}$.
Proof. 
The regularization term $\frac{2}{\gamma}(\|\cdot\|_{w,2,1})_{B_\gamma} : \mathbb{R}^n \to \mathbb{R} : [x_1, x_2, \ldots, x_g] \mapsto \sum_{i=1}^{g}\frac{2}{\gamma}\varphi_i(x_i)$, where
$$\varphi_i(x_i) := w_i\|x_i\|_2 - \min_{v_i \in \mathbb{R}^{n_i}}\Big[w_i\|v_i\|_2 + \frac{w_i^2}{2\gamma}\|x_i - v_i\|_2^2\Big]$$
for $i = 1, \ldots, g$. By ([39], Example 24.20), we obtain
$$\frac{2}{\gamma}\varphi_i(x_i) = \begin{cases} \dfrac{2w_i}{\gamma}\|x_i\|_2 - \dfrac{w_i^2}{\gamma^2}\|x_i\|_2^2, & \text{if } \|x_i\|_2 \le \dfrac{\gamma}{w_i}, \\[4pt] 1, & \text{otherwise}. \end{cases}$$
Then, we obtain
$$\lim_{\gamma \to 0}\frac{2}{\gamma}\varphi_i(x_i) = \begin{cases} 0, & \text{if } \|x_i\|_2 = 0, \\ 1, & \text{otherwise}, \end{cases}$$
and
$$\lim_{\gamma \to 0}\frac{2}{\gamma}\big(\|\cdot\|_{w,2,1}\big)_{B_\gamma}(x) = \lim_{\gamma \to 0}\sum_{i=1}^{g}\frac{2}{\gamma}\varphi_i(x_i) = \|x\|_{2,0}. \qquad \Box$$
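The following sketch (ours) evaluates $\frac{2}{\gamma}(\|\cdot\|_{w,2,1})_{B_\gamma}$ group-wise through the closed form of $\frac{2}{\gamma}\varphi_i$ derived in the proof above, and illustrates numerically that it approaches $\|x\|_{2,0}$ as $\gamma \to 0$.

```python
import numpy as np

def scaled_gme_w21(x, group_sizes, w, gamma):
    """(2/gamma) * (||.||_{w,2,1})_{B_gamma}(x), via the closed form of (2/gamma)*phi_i
    from the proof of Proposition 2."""
    idx = np.cumsum(group_sizes)[:-1]
    total = 0.0
    for wi, xi in zip(w, np.split(np.asarray(x, dtype=float), idx)):
        r = np.linalg.norm(xi)
        total += (2 * wi * r / gamma - (wi * r) ** 2 / gamma ** 2) if r <= gamma / wi else 1.0
    return total

x = np.array([1.0, -2.0, 0.0, 0.0, 0.3])   # group sizes 2, 2, 1  ->  ||x||_{2,0} = 2
for gamma in (10.0, 1.0, 0.1, 0.001):
    print(gamma, scaled_gme_w21(x, [2, 2, 1], w=[1.0, 1.0, 1.0], gamma=gamma))
# The value increases toward ||x||_{2,0} = 2 as gamma -> 0.
```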
Figure 1 illustrates simple examples of $\|x\|_{2,1}$ and $(\|\cdot\|_{2,1})_B(x)$ when $g = 1$, $n = 2$, and $B = I_2$. As we can see, $(\|\cdot\|_{2,1})_{I_2}(x)$ approximates $\|x\|_{2,0}$ better than $\|x\|_{2,1}$.
Of course, as reviewed in Section 2.1, we can minimize $J_{(\|\cdot\|_{w,2,1})_B \circ \operatorname{Id}}$ (see (5)) with the algorithm (A3) in Proposition A2, under the overall-convexity condition $A^\top A - \lambda B^\top B \succeq O_{n \times n}$.
In the following, we consider the GME of the non-separable weighted $\ell_{2,1}$-norm $\|W\cdot\|_{2,1}$, where $W \in \mathbb{R}^{l \times n}$ is not necessarily a diagonal matrix. This is because in some applications, such as the classification problems [11,36,37] described in Section 2.2 and heterogeneous feature selection [40], weights are also introduced inside groups (i.e., the weight of every entry can be different) to improve the estimation accuracy. The GME of $\|W\cdot\|_{2,1}$ with $\tilde{B} \in \mathbb{R}^{b \times n}$ is well defined (the possible lack of coercivity requires a slight modification from min to inf) as
$$\big(\|W\cdot\|_{2,1}\big)_{\tilde{B}}(x) = \|Wx\|_{2,1} - \inf_{v \in \mathbb{R}^n}\Big[\|Wv\|_{2,1} + \tfrac{1}{2}\|\tilde{B}(x - v)\|_2^2\Big],$$
and therefore we can formulate
$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\ J_{(\|W\cdot\|_{2,1})_{\tilde{B}} \circ \operatorname{Id}}(x) = \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\big(\|W\cdot\|_{2,1}\big)_{\tilde{B}}(x).$$
However, we should remark that $\|W\cdot\|_{2,1} \in \Gamma_0(\mathbb{R}^n)$ is even symmetric but not necessarily coercive or prox-friendly. (As found in ([39], Proposition 24.14), for $\Psi \in \Gamma_0(\mathcal{Z})$ and $L \in \mathcal{B}(\mathcal{X}, \mathcal{Z})$ satisfying $LL^* = \mu\operatorname{Id}$ for some $\mu \in \mathbb{R}_{++}$, we have $\operatorname{Prox}_{\Psi \circ L}(x) = x + \mu^{-1}L^*(\operatorname{Prox}_{\mu\Psi}(Lx) - Lx)$ for $x \in \mathcal{X}$; in such a special case, if $\Psi$ is prox-friendly, $\Psi \circ L$ is also prox-friendly. However, for general $L \in \mathcal{B}(\mathcal{X}, \mathcal{Z})$ not necessarily satisfying such standard conditions, the prox-friendliness of $\Psi \circ L$ has to be discussed case by case.) Fortunately, by Proposition 3 below, if $\operatorname{rank}(W) = l$ and $\tilde{B}$ can be expressed as $\tilde{B} = BW$ for some $B \in \mathbb{R}^{b \times l}$, we can show the useful relation
$$\big(\|W\cdot\|_{2,1}\big)_{\tilde{B}}(x) = \big(\|\cdot\|_{2,1}\big)_B(Wx),$$
which implies that the GME $(\|W\cdot\|_{2,1})_{\tilde{B}}$ of $\|W\cdot\|_{2,1}$ can be handled as the LiGME $(\|\cdot\|_{2,1})_B \circ W$ of $\|\cdot\|_{2,1}$.
Proposition 3.
For $\Psi \in \Gamma_0(\mathcal{Z})$ which is coercive and $B \in \mathcal{B}(\mathcal{Z}, \tilde{\mathcal{Z}})$, assume $L \in \mathcal{B}(\mathcal{X}, \mathcal{Z})$ has full row rank. Then, for any $x \in \mathcal{X}$,
$$(\Psi \circ L)_{B \circ L}(x) = \Psi_B(Lx),$$
where $(\Psi \circ L)_{B \circ L}(\cdot) := \Psi(L\cdot) - \inf_{v \in \mathcal{X}}\big[\Psi(Lv) + \tfrac{1}{2}\|BL(\cdot - v)\|_2^2\big]$ and $\Psi_B(\cdot) := \Psi(\cdot) - \min_{v \in \mathcal{Z}}\big[\Psi(v) + \tfrac{1}{2}\|B(\cdot - v)\|_2^2\big]$.
Proof. 
On one hand, by the definition of GME, we have
$$(\Psi \circ L)_{B \circ L}(x) = \Psi(Lx) - \inf_{v \in \mathcal{X}}\Big[\Psi(Lv) + \tfrac{1}{2}\|BL(x - v)\|_2^2\Big] = \Psi(Lx) - h(BLx),$$
where $h : \tilde{\mathcal{Z}} \to \mathbb{R}$ is given by
$$\begin{aligned}
h(z) &= \inf_{v \in \mathcal{X}}\Big[\Psi(Lv) + \tfrac{1}{2}\|z - BLv\|_2^2\Big] \\
&= \inf_{u \in (\operatorname{null}L)^{\perp}}\ \inf_{\hat{u} \in \operatorname{null}L}\Big[\Psi(L(u + \hat{u})) + \tfrac{1}{2}\|z - BL(u + \hat{u})\|_2^2\Big] \\
&= \inf_{u \in (\operatorname{null}L)^{\perp}}\Big[\Psi(Lu) + \tfrac{1}{2}\|z - BLu\|_2^2\Big] \\
&= \inf_{u \in \operatorname{range}L^*}\Big[\Psi(Lu) + \tfrac{1}{2}\|z - BLu\|_2^2\Big] \\
&= \inf_{v \in \mathcal{Z}}\Big[\Psi(LL^*v) + \tfrac{1}{2}\|z - BLL^*v\|_2^2\Big] \\
&= \inf_{v \in \mathcal{Z}}\Big[\Psi(LL^*(LL^*)^{-1}v) + \tfrac{1}{2}\|z - BLL^*(LL^*)^{-1}v\|_2^2\Big] \\
&= \inf_{v \in \mathcal{Z}}\Big[\Psi(v) + \tfrac{1}{2}\|z - Bv\|_2^2\Big].
\end{aligned}$$
Therefore, $(\Psi \circ L)_{B \circ L}(x) = \Psi(Lx) - \inf_{v \in \mathcal{Z}}\big[\Psi(v) + \tfrac{1}{2}\|BLx - Bv\|_2^2\big]$.
On the other hand, $\Psi_B(Lx) = \Psi(Lx) - \min_{v \in \mathcal{Z}}\big[\Psi(v) + \tfrac{1}{2}\|B(Lx - v)\|_2^2\big]$ by definition. Thus, we obtain the conclusion. □
In the rest of the paper, we focus on the LiGME model of the $\ell_{2,1}$-norm.

3.2. LiGME of the ℓ2,1-Norm

For simplicity as well as effectiveness in applications to GSC and WGSC, we focus on the LiGME model of $\|\cdot\|_{2,1}$ with an invertible linear operator $W$:
$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\ J_{(\|\cdot\|_{2,1})_B \circ W}(x) = \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\big(\|\cdot\|_{2,1}\big)_B(Wx). \qquad (20)$$
In this case, to achieve $J_{(\|\cdot\|_{2,1})_B \circ W} \in \Gamma_0(\mathbb{R}^n)$, we can simply design $B \in \mathbb{R}^{m \times n}$, in a way similar to ([31], (48)), as in the next proposition.
Proposition 4.
For an invertible $W \in \mathbb{R}^{n \times n}$, let
$$B = \sqrt{\theta/\lambda}\,A W^{-1}, \quad 0 \le \theta \le 1; \qquad (21)$$
then, for the LiGME model in (20), $J_{(\|\cdot\|_{2,1})_B \circ W} \in \Gamma_0(\mathbb{R}^n)$.
Proof. 
By $A^\top A - \lambda W^\top B^\top B W = A^\top A - \lambda W^\top\big(\sqrt{\theta/\lambda}\,AW^{-1}\big)^\top\big(\sqrt{\theta/\lambda}\,AW^{-1}\big)W = (1 - \theta)A^\top A \succeq O_{n \times n}$ and Proposition 1, $J_{(\|\cdot\|_{2,1})_B \circ W} \in \Gamma_0(\mathbb{R}^n)$ is ensured. □
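A small sketch (ours) of the design in Proposition 4: build $B = \sqrt{\theta/\lambda}\,AW^{-1}$ and verify numerically that the overall-convexity condition (6) with $L = W$ holds. The random data and the diagonal $W$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, theta = 20, 30, 0.2, 0.9

A = rng.standard_normal((m, n))
W = np.diag(rng.uniform(0.5, 2.0, size=n))        # any invertible weight matrix works
B = np.sqrt(theta / lam) * A @ np.linalg.inv(W)   # design of Proposition 4

# Overall-convexity condition (6) with L = W:  A^T A - lam * W^T B^T B W >= O
M = A.T @ A - lam * W.T @ B.T @ B @ W             # equals (1 - theta) * A^T A
print(np.allclose(M, (1 - theta) * A.T @ A))                # True
print(np.min(np.linalg.eigvalsh((M + M.T) / 2)) >= -1e-10)  # positive semidefinite
```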
Model (20) can be applied to many different applications that conform to group-sparsity structure.

4. Application to Classification Problems

4.1. Proposed Algorithm for Group-Sparsity Based Classification

Since the $\ell_{2,1}$ regularizer in GSC is unfair for classes of different sizes (see Appendix A) while the $\ell_{2,0}$ regularizer is not, our purpose is to use a better approximation of $\ell_{2,0}$ as the regularizer. Therefore, we apply model (20) to group-sparsity-based classification. Following GSC, we can set $W = I_n$ in (20).
Inspired by WGSC [11], which carefully designs weights to exploit the locality and similarity information of samples, we can also set the weight matrix $W$ according to (9). The classification algorithm is summarized in Algorithm 1.
The $\ell_{2,1}$-norm regularized least squares problem in WGSC can be solved by a proximal gradient method [41]. Compared with it, Step 2 in Algorithm 1 for solving (20) requires only one additional computation of $\operatorname{Prox}_{\gamma\|\cdot\|_{2,1}}$ at each update (see (10) with $w_i = 1$).

4.2. Experiments

First, by setting $W = I_n$, we conduct experiments on a relatively simple dataset to investigate how the bias of the $\ell_{2,1}$ regularizer influences the classification problem (especially when the training set is unbalanced) and to verify the performance improvement obtained by using $(\|\cdot\|_{2,1})_B$ as the regularizer. The USPS handwritten digit database [42] has 11,000 samples of digits “0” through “9” (1100 samples per class). Each sample is a $16 \times 16$ image; in our classification experiments, we vectorized them into 256-D vectors. The number of training samples per class is not necessarily equal and varies from 5 to 50 (the test set is fixed to 50 images per class).
We set $W = I_n$ (the computation of $W$ in Algorithm 1 is modified accordingly) for the proposed model (20) and compared it with GSC (with the $\ell_{2,1}$ regularizer) [10]. We set $B = \sqrt{\theta/\lambda}\,A$ and fixed $\theta = 0.9$ to achieve the overall convexity of the proposed method, and set $\kappa = 1.1$, $\iota = (\kappa/2)\|A^\top A + \lambda I_n\|_{\mathrm{spec}} + (\kappa - 1)$, $\tau = (\kappa/2 + 2/\kappa)\lambda\|B\|_{\mathrm{spec}}^2 + (\kappa - 1)$. The initial estimate is set to $(x^{(0)}, u^{(0)}, v^{(0)}) = (O_{n \times 1}, O_{n \times 1}, O_{n \times 1})$, and the stopping criterion is either $\|(x^{(k)}, u^{(k)}, v^{(k)}) - (x^{(k+1)}, u^{(k+1)}, v^{(k+1)})\|_2 < 10^{-4}$ or the number of iterations reaching 10,000.
Figure 2 shows an example with an unbalanced training set (digits “0” through “4” have 5 samples per class and “5” through “9” have 25 samples per class). The input (an image of digit “0”) was misclassified as digit “6” by GSC but classified correctly by the proposed method. The coefficient vectors obtained by GSC and the proposed method (both with $\lambda = 4$) are illustrated, and some samples corresponding to nonzero coefficients are also displayed in Figure 2. The samples from digit “6” made the greatest contribution to the representation in GSC, while samples from “5” and “0” also made small contributions. In our method, the samples from the correct class “0” made the largest contribution and led to the correct result. This is reasonable, because our method does not suppress the high-value coefficients as much as $\ell_{2,1}$ does. The strong suppression of $\ell_{2,1}$ prevents the coefficients of the correct class from being large enough, and thus easily leads to misclassification.
Algorithm 1: The proposed group-sparsity enhanced classification algorithm
Input: A matrix of training samples $A = [A_1, A_2, \ldots, A_g] \in \mathbb{R}^{m \times n}$ grouped by class information, a test sample vector $y \in \mathbb{R}^m$, parameters $\lambda$, $\sigma_1$ and $\sigma_2$.
 1. Initialization: Let $(x^{(0)}, u^{(0)}, v^{(0)}) \in \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n$.
   Compute the weight matrix $W$ by (9).
   Choose $B$ satisfying $A^\top A - \lambda W^\top B^\top B W \succeq O_{n \times n}$.
   Choose $(\iota, \tau, \kappa) \in \mathbb{R}_{++} \times \mathbb{R}_{++} \times (1, +\infty)$ satisfying
   $$\iota I_n - \frac{\kappa}{2}\big(A^\top A + \lambda W^\top W\big) \succeq O_{n \times n} \quad \text{and} \quad \tau \ge \Big(\frac{\kappa}{2} + \frac{2}{\kappa}\Big)\lambda\|B\|_{\mathrm{spec}}^2. \qquad (22)$$
 2. For $k = 0, 1, 2, \ldots$, compute
   $$\begin{aligned}
   x^{(k+1)} &= \Big[I_n - \tfrac{1}{\iota}\big(A^\top A - \lambda W^\top B^\top B W\big)\Big]x^{(k)} - \tfrac{\lambda}{\iota}W^\top B^\top B u^{(k)} - \tfrac{\lambda}{\iota}W^\top v^{(k)} + \tfrac{1}{\iota}A^\top y, \\
   u^{(k+1)} &= \operatorname{Prox}_{\frac{\lambda}{\tau}\|\cdot\|_{2,1}}\Big[\tfrac{2\lambda}{\tau}B^\top B W x^{(k+1)} - \tfrac{\lambda}{\tau}B^\top B W x^{(k)} + \Big(I_n - \tfrac{\lambda}{\tau}B^\top B\Big)u^{(k)}\Big], \\
   v^{(k+1)} &= 2W x^{(k+1)} - W x^{(k)} + v^{(k)} - \operatorname{Prox}_{\|\cdot\|_{2,1}}\big(2W x^{(k+1)} - W x^{(k)} + v^{(k)}\big),
   \end{aligned}$$
   until the stopping criterion is fulfilled.
 3. Compute the class label $i^\star$ of $y$ by
   $$i^\star = \arg\min_i \|y - A_i x_i^{(k+1)}\|_2.$$
Output: The class label $i^\star$ corresponding to $y$.
(For example, any $\kappa > 1$, $\iota = (\kappa/2)\|A^\top A + \lambda W^\top W\|_{\mathrm{spec}} + (\kappa - 1)$, and $\tau = (\kappa/2 + 2/\kappa)\lambda\|B\|_{\mathrm{spec}}^2 + (\kappa - 1)$ satisfy (22).)
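The sketch below (ours, in NumPy) implements the update rules of Step 2 for the special case treated in this paper ($\Psi = \|\cdot\|_{2,1}$, $L = W$) and the residual-based classification of Step 3; `prox_l21` is the group-wise soft thresholding of (10) with $w_i = 1$, and the $(\iota, \tau)$ choice follows the example noted after Algorithm 1. It is a simplified illustration, not the authors' reference code.

```python
import numpy as np

def prox_l21(x, group_sizes, gamma):
    """Prox of gamma * ||.||_{2,1}: group-wise soft thresholding ((10) with w_i = 1)."""
    idx = np.cumsum(group_sizes)[:-1]
    return np.concatenate([(1 - gamma / max(np.linalg.norm(xi), gamma)) * xi
                           for xi in np.split(x, idx)])

def ligme_l21(A, y, W, B, group_sizes, lam, kappa=1.1, max_iter=10000, tol=1e-4):
    """Step 2 of Algorithm 1: proximal splitting iteration for model (20)."""
    n = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    WtBtBW = W.T @ B.T @ B @ W
    BtBW, BtB = B.T @ B @ W, B.T @ B
    # Example parameter choice satisfying (22) (see the note below Algorithm 1).
    iota = kappa / 2 * np.linalg.norm(AtA + lam * W.T @ W, 2) + (kappa - 1)
    tau = (kappa / 2 + 2 / kappa) * lam * np.linalg.norm(B, 2) ** 2 + (kappa - 1)

    x = u = v = np.zeros(n)
    for _ in range(max_iter):
        x_new = (x - (AtA @ x - lam * WtBtBW @ x) / iota
                 - lam / iota * (W.T @ (BtB @ u)) - lam / iota * (W.T @ v) + Aty / iota)
        u_new = prox_l21(2 * lam / tau * BtBW @ x_new - lam / tau * BtBW @ x
                         + u - lam / tau * BtB @ u, group_sizes, lam / tau)
        t = 2 * W @ x_new - W @ x + v
        v_new = t - prox_l21(t, group_sizes, 1.0)
        converged = np.linalg.norm(np.concatenate([x_new - x, u_new - u, v_new - v])) < tol
        x, u, v = x_new, u_new, v_new
        if converged:
            break
    return x

def classify(A, y, x, group_sizes):
    """Step 3: assign y to the class with the smallest reconstruction residual."""
    starts = np.concatenate(([0], np.cumsum(group_sizes)))
    residuals = [np.linalg.norm(y - A[:, starts[i]:starts[i + 1]] @ x[starts[i]:starts[i + 1]])
                 for i in range(len(group_sizes))]
    return int(np.argmin(residuals))
```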
Table 1 summarizes the recognition accuracy of GSC and the proposed method with $W = I_n$. The training set includes $\beta$ samples per class for digits “0” through “4” and $\alpha$ samples per class for “5” through “9”. Through numerical experiments, we found that GSC with $\lambda = \lambda_{\mathrm{GSC}} = 1.5$ and the proposed method with $\lambda = \lambda_{\mathrm{prop}} = 3$ perform well on this dataset. We also tested the proposed method with $\lambda_{\mathrm{GSC}}$, which did not degrade much compared with using $\lambda_{\mathrm{prop}}$. We see that the GSC model degrades when the training set is unbalanced, and the proposed method outperforms GSC especially in such cases.
Next, we conduct experiments on a classic face dataset to verify the validity of the proposed linearly involved model with the weight matrix $W$ set according to (9). The ORL Database of Faces [43] contains 400 images from 40 distinct subjects (10 images per subject) with variations in lighting, facial expressions (open or closed eyes, smiling or not smiling), and facial details (glasses or no glasses). In our experiments, following [44], all images were downsampled from $112 \times 92$ to $16 \times 16$ and then vectorized into 256-D vectors. The number of training samples per class is not necessarily equal and varies from 4 to 8 (the test set is fixed to 2 images per class).
We compared the proposed model (20) (with $W = I_n$ and with $W$ by (9), respectively) with GSC [10] and WGSC [11]. To achieve the overall convexity, we set $B = \sqrt{\theta/\lambda}\,AW^{-1}$ ($0 \le \theta \le 1$) and fixed $\theta = 0.9$ for the proposed method. The settings of $(\iota, \tau, \kappa)$, the initial estimate, and the stopping criterion are the same as in the previous experiment. When the parameter $\lambda$ is too small, the obtained coefficient vector is not group sparse; when the parameter $\sigma_1$ or $\sigma_2$ is too small, the locality or similarity information plays a decisive role. We found that $\lambda = 0.05$ for the $\ell_{2,1}$-regularizer-based methods (i.e., GSC and WGSC), $\lambda = 0.2$ for the proposed method, and $\sigma_1 \in [2, 4]$, $\sigma_2 \in [0.5, 2]$ for the weight-involved methods (i.e., WGSC and the proposed method with $W$ by (9)) work well on this dataset.
Figure 3 shows a classification result of WGSC and the proposed method ($W$ by (9)), both with $\sigma_1 = 4$, $\sigma_2 = 2$, when the training set is unbalanced (20 subjects have 8 samples per class and the others have 6 samples per class). The input is an image of subject 10, which was misclassified as subject 8 by WGSC but classified correctly by the proposed method.
Table 2 summarizes the recognition accuracy of GSC, the proposed method with $W = I_n$, WGSC, and the proposed method with $W$ computed by (9). In the training set, 20 subjects have $\beta$ samples per class and the others have $\alpha$ samples per class. With the strategically designed weight matrix (9), WGSC achieves a significant improvement over GSC. The proposed method with $W$ computed by (9) further improves the performance, especially when the training set is unbalanced.

5. Conclusions

In this paper, we explored the potential applicability and effectiveness of using nonconvex regularizers in a convex optimization framework. We proposed a generalized Moreau enhancement (GME) of the weighted $\ell_{2,1}$ function and analyzed its relationship with the linearly involved GME of the $\ell_{2,1}$-norm. The proposed regularizer is nonconvex and promotes group sparsity more effectively than $\ell_{2,1}$ while maintaining the overall convexity of the regression model. The model can be used in many applications, and we applied it to classification problems. Our model makes use of the grouping structure given by class information and suppresses the tendency to underestimate high-amplitude coefficients. Experimental results showed that the proposed method is effective for image classification.

Author Contributions

Conceptualization, M.Y. and I.Y.; methodology, Y.C., M.Y. and I.Y.; software, Y.C.; writing-original draft, Y.C.; writing-review and editing, M.Y. and I.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by JSPS Grants-in-Aid grant number 18K19804 and by JST SICORP grant number JPMJSC20C6.

Data Availability Statement

Publicly available data sets were analyzed in this study. These data can be found here: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html##usps; https://cam-orl.co.uk/facedatabase.html; and http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html (accessed on 25 October 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LASSO   Least Absolute Shrinkage and Selection Operator
SCAD    Smoothly Clipped Absolute Deviation
MCP     Minimax Concave Penalty
GMC     Generalized Minimax Concave
GME     Generalized Moreau Enhancement
LiGME   Linearly involved Generalized-Moreau-Enhanced (or Enhancement)
SRC     Sparse Representation-based Classification
GSC     Group Sparse Classification
WGSC    Weighted Group Sparse Classification

Appendix A. The Bias of the ℓ2,1 Regularizer in Group Sparse Classification

Using the $\ell_{2,1}$ regularizer in classification problems not only minimizes the number of selected classes but also minimizes the $\ell_2$-norm of the coefficients within each class. The latter may adversely affect the classification result, since the optimal representation of a test sample by the training samples of the correct subject may contain large coefficients. Moreover, in many classification applications, the number of training samples differs across classes. We argue that the bias of the $\ell_{2,1}$ regularizer makes it unfair for classes of different sizes.
Example A1.
Suppose that a test sample $y \in \mathbb{R}^m$ can be represented by a combination of all $n_i$ samples from class $i$ without error, i.e., $y = A_i x_i$ and $\|x_i\|_1 = 1$, where $A_i \in \mathbb{R}^{m \times n_i}$ and $x_i \in \mathbb{R}^{n_i}$.
(a) If the number of samples in this class is doubled by duplication, the training set of class $i$ becomes $\tilde{A}_i = [A_i, A_i] \in \mathbb{R}^{m \times 2n_i}$. Obviously, $y$ can also be represented as $y = \tilde{A}_i\tilde{x}_i$, where $\tilde{x}_i = [\eta x_i, (1-\eta)x_i] \in \mathbb{R}^{2n_i}$ ($0 \le \eta \le 1$) and $\|\tilde{x}_i\|_1 = 1$. However, $\|x_i\|_2^2 - \|\tilde{x}_i\|_2^2 = 2\eta(1-\eta)\|x_i\|_2^2 \ge 0$. That is, the $\ell_{2,1}$ value of the first representation (before duplication) is greater than or equal to that of the second one (after duplication).
(b) If the number of samples in this class is increased to $dn_i$ by copying the samples $d - 1$ times ($d > 1$), the training set of class $i$ becomes $\tilde{A}_i = [A_i, \ldots, A_i] \in \mathbb{R}^{m \times dn_i}$. Obviously, $y = \tilde{A}_i\tilde{x}_i$ is a representation of $y$, where $\tilde{x}_i = [\frac{1}{d}x_i, \ldots, \frac{1}{d}x_i] \in \mathbb{R}^{dn_i}$ and $\|\tilde{x}_i\|_1 = 1$. Then $\|\tilde{x}_i\|_2 = \frac{1}{\sqrt{d}}\|x_i\|_2 < \|x_i\|_2$.
Example A1 tells us that the group size affects the value of the $\ell_{2,1}$ regularizer. Even if the new training samples are only copies of the original samples (without adding any new information), the value of the $\ell_{2,1}$ regularizer decreases. Therefore, the $\ell_{2,1}$ regularizer is unfair for classes of different sizes: it tends to reject a class with relatively few samples, because the corresponding coefficient vector is more likely to have a large $\ell_{2,1}$ value. Please note that the $\ell_{2,0}$ regularizer is independent of group size and does not have such unfairness.
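A quick numerical check of Example A1(b) with $d = 2$ (ours): duplicating the samples of a class, while keeping the same representation of $y$, decreases the $\ell_{2,1}$ value but leaves $\ell_{2,0}$ unchanged.

```python
import numpy as np

x  = np.array([0.5, 0.5])                    # coefficients on class i, ||x||_1 = 1
xd = np.concatenate([x / 2, x / 2])          # same class duplicated once (d = 2), ||xd||_1 = 1
# With a single active group, the l_{2,1} value is just that group's Euclidean norm:
print(np.linalg.norm(x))    # 0.7071...
print(np.linalg.norm(xd))   # 0.5 = 0.7071.../sqrt(2): duplication lowers the penalty
# The l_{2,0} value is 1 in both cases: one active group, regardless of its size.
```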

Appendix B. Parameter Tuning and Proximal Splitting Algorithm for LiGME Model

Proposition A1
([24], Proposition 2). In (5), let $(\mathcal{X}, \mathcal{Y}, \mathcal{Z}) = (\mathbb{R}^n, \mathbb{R}^m, \mathbb{R}^l)$, $(A, L, \lambda) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{l \times n} \times \mathbb{R}_{++}$, and $\operatorname{rank}(L) = l$. Choose a nonsingular $\tilde{L} \in \mathbb{R}^{n \times n}$ satisfying $[O_{l \times (n-l)}\ \ I_l]\,\tilde{L} = L$. Then $B_\theta := \sqrt{\theta/\lambda}\,\Lambda^{1/2}U^\top \in \mathbb{R}^{l \times l}$, $\theta \in [0, 1]$, ensures $J_{\Psi_{B_\theta} \circ L} \in \Gamma_0(\mathbb{R}^n)$, where $[\tilde{D}_1\ \ \tilde{D}_2] := A\tilde{L}^{-1}$ and $U\Lambda U^\top := \tilde{D}_2^\top\tilde{D}_2 - \tilde{D}_2^\top\tilde{D}_1(\tilde{D}_1^\top\tilde{D}_1)^{\dagger}\tilde{D}_1^\top\tilde{D}_2 \in \mathbb{R}^{l \times l}$ is an eigendecomposition.
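A sketch (ours) of the construction in Proposition A1 for a full-row-rank $L \in \mathbb{R}^{l \times n}$. The completion $\tilde{L}$ is built here by stacking an orthonormal basis of $\operatorname{null}(L)$ on top of $L$, which is one of many valid choices; function and variable names are our assumptions.

```python
import numpy as np

def design_B_theta(A, L, lam, theta=0.9):
    """B_theta = sqrt(theta/lam) * Lambda^{1/2} U^T as in Proposition A1 (rank(L) = l)."""
    m, n = A.shape
    l = L.shape[0]
    # Nonsingular L_tilde whose last l rows equal L, so that [O_{l x (n-l)}  I_l] L_tilde = L.
    _, _, Vt = np.linalg.svd(L)
    L_tilde = np.vstack([Vt[l:, :], L])            # rows of Vt[l:] span null(L)
    D = A @ np.linalg.inv(L_tilde)
    D1, D2 = D[:, :n - l], D[:, n - l:]
    S = D2.T @ D2 - D2.T @ D1 @ np.linalg.pinv(D1.T @ D1) @ D1.T @ D2
    eigval, U = np.linalg.eigh(S)                  # S = U diag(eigval) U^T
    eigval = np.clip(eigval, 0.0, None)            # guard against tiny negative round-off
    return np.sqrt(theta / lam) * np.diag(np.sqrt(eigval)) @ U.T

# Usage sketch: B = design_B_theta(A, W, lam) gives a B with
# A^T A - lam * W^T B^T B W >= O for theta in [0, 1].
```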
Proposition A2
([24], Theorem 1). Consider the minimization of $J_{\Psi_B \circ L}$ in (5) under the overall-convexity condition (6). Let a real Hilbert space $(\mathcal{H} := \mathcal{X} \times \mathcal{Z} \times \mathcal{Z}, \langle\cdot,\cdot\rangle_{\mathcal{H}}, \|\cdot\|_{\mathcal{H}})$ be a product space and define an operator $T_{\mathrm{LiGME}} : \mathcal{H} \to \mathcal{H} : (x, u, v) \mapsto (\xi, \zeta, \eta)$ with parameters $(\iota, \tau) \in \mathbb{R}_{++} \times \mathbb{R}_{++}$ by
$$\begin{aligned}
\xi &:= \Big[\operatorname{Id} - \tfrac{1}{\iota}\big(A^*A - \lambda L^*B^*BL\big)\Big]x - \tfrac{\lambda}{\iota}L^*B^*Bu - \tfrac{\lambda}{\iota}L^*v + \tfrac{1}{\iota}A^*y, \\
\zeta &:= \operatorname{Prox}_{\frac{\lambda}{\tau}\Psi}\Big[\tfrac{2\lambda}{\tau}B^*BL\xi - \tfrac{\lambda}{\tau}B^*BLx + \Big(\operatorname{Id} - \tfrac{\lambda}{\tau}B^*B\Big)u\Big], \\
\eta &:= 2L\xi - Lx + v - \operatorname{Prox}_{\Psi}(2L\xi - Lx + v).
\end{aligned}$$
Then the following holds:
1. 
$\arg\min_{x \in \mathcal{X}} J_{\Psi_B \circ L}(x) = \big\{x \in \mathcal{X} \mid (x, u, v) \in \operatorname{Fix}(T_{\mathrm{LiGME}}) \text{ for some } (u, v) \in \mathcal{Z} \times \mathcal{Z}\big\}$, where $\operatorname{Fix}(T_{\mathrm{LiGME}}) := \{(x, u, v) \in \mathcal{H} \mid T_{\mathrm{LiGME}}(x, u, v) = (x, u, v)\}$.
2. 
Choose $(\iota, \tau, \kappa) \in \mathbb{R}_{++} \times \mathbb{R}_{++} \times (1, \infty)$ satisfying
$$\iota\operatorname{Id} - \frac{\kappa}{2}\big(A^*A + \lambda L^*L\big) \succeq O_{\mathcal{X}}, \qquad \tau \ge \Big(\frac{\kappa}{2} + \frac{2}{\kappa}\Big)\lambda\|B\|_{\mathrm{op}}^2, \qquad (A1)$$
where $\|\cdot\|_{\mathrm{op}}$ is the operator norm. Then
$$P := \begin{bmatrix} \iota\operatorname{Id} & -\lambda L^*B^*B & -\lambda L^* \\ -\lambda B^*BL & \tau\operatorname{Id} & O_{\mathcal{Z}} \\ -\lambda L & O_{\mathcal{Z}} & \lambda\operatorname{Id} \end{bmatrix} \succ O_{\mathcal{H}} \qquad (A2)$$
and $T_{\mathrm{LiGME}}$ is $\frac{\kappa}{2\kappa - 1}$-averaged nonexpansive in the Hilbert space $(\mathcal{H}, \langle\cdot,\cdot\rangle_P, \|\cdot\|_P)$.
3. 
Assume that condition (A1) holds. Then, for any initial point $(x^{(0)}, u^{(0)}, v^{(0)})$, the sequence $\{(x^{(k)}, u^{(k)}, v^{(k)})\}_{k \in \mathbb{N}}$ generated by
$$(x^{(k+1)}, u^{(k+1)}, v^{(k+1)}) = T_{\mathrm{LiGME}}(x^{(k)}, u^{(k)}, v^{(k)}) \qquad (A3)$$
converges to a point $(x^\star, u^\star, v^\star) \in \operatorname{Fix}(T_{\mathrm{LiGME}})$, and
$$\lim_{k \to \infty} x^{(k)} = x^\star \in \arg\min_{x \in \mathcal{X}} J_{\Psi_B \circ L}(x).$$

References

  1. Theodoridis, S. Machine Learning: A Bayesian and Optimization Perspective; Academic Press: Cambridge, MA, USA, 2015.
  2. Ma, S.; Song, X.; Huang, J. Supervised group Lasso with applications to microarray data analysis. BMC Bioinform. 2007, 8, 60.
  3. Wang, L.; Chen, G.; Li, H. Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 2007, 23, 1486–1494.
  4. Wang, X.; Zhong, Y.; Zhang, L.; Xu, Y. Spatial group sparsity regularized nonnegative matrix factorization for hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6287–6304.
  5. Drumetz, L.; Meyer, T.R.; Chanussot, J.; Bertozzi, A.L.; Jutten, C. Hyperspectral image unmixing with endmember bundles and group sparsity inducing mixed norms. IEEE Trans. Image Process. 2019, 28, 3435–3450.
  6. Huang, J.; Huang, T.Z.; Zhao, X.L.; Deng, L.J. Nonlocal tensor-based sparse hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2020, 59, 6854–6868.
  7. Qiao, B.; Mao, Z.; Liu, J.; Zhao, Z.; Chen, X. Group sparse regularization for impact force identification in time domain. J. Sound Vib. 2019, 445, 44–63.
  8. Majumdar, A.; Ward, R.K. Classification via group sparsity promoting regularization. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 861–864.
  9. Elhamifar, E.; Vidal, R. Robust classification using structured sparse representation. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1873–1879.
  10. Huang, J.; Nie, F.; Huang, H.; Ding, C. Supervised and projected sparse coding for image classification. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, Bellevue, WA, USA, 14–18 July 2013.
  11. Tang, X.; Feng, G.; Cai, J. Weighted group sparse representation for undersampled face recognition. Neurocomputing 2014, 145, 402–415.
  12. Rao, N.; Nowak, R.; Cox, C.; Rogers, T. Classification with the sparse group lasso. IEEE Trans. Signal Process. 2015, 64, 448–463.
  13. Tan, S.; Sun, X.; Chan, W.; Qu, L.; Shao, L. Robust face recognition with kernelized locality-sensitive group sparsity representation. IEEE Trans. Image Process. 2017, 26, 4661–4668.
  14. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006, 68, 49–67.
  15. Natarajan, B.K. Sparse approximate solutions to linear systems. SIAM J. Comput. 1995, 24, 227–234.
  16. Argyriou, A.; Foygel, R.; Srebro, N. Sparse prediction with the k-support norm. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Siem Reap, Cambodia, 13–16 December 2018; Volume 1, pp. 1457–1465.
  17. Deng, W.; Yin, W.; Zhang, Y. Group sparse optimization by alternating direction method. In Wavelets and Sparsity XV; International Society for Optics and Photonics: Bellingham, WA, USA, 2013; Volume 8858, p. 88580R.
  18. Huang, J.; Breheny, P.; Ma, S. A selective review of group selection in high-dimensional models. Stat. Sci. A Rev. J. Inst. Math. Stat. 2012, 27.
  19. Breheny, P.; Huang, J. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat. Comput. 2015, 25, 173–187.
  20. Hu, Y.; Li, C.; Meng, K.; Qin, J.; Yang, X. Group sparse optimization via ℓp,q regularization. J. Mach. Learn. Res. 2017, 18, 960–1011.
  21. Jiang, L.; Zhu, W. Iterative Weighted Group Thresholding Method for Group Sparse Recovery. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 63–76.
  22. Jiao, Y.; Jin, B.; Lu, X. Group Sparse Recovery via the ℓ0(ℓ2) Penalty: Theory and Algorithm. IEEE Trans. Signal Process. 2016, 65, 998–1012.
  23. Chen, P.Y.; Selesnick, I.W. Group-sparse signal denoising: Non-convex regularization, convex optimization. IEEE Trans. Signal Process. 2014, 62, 3464–3478.
  24. Abe, J.; Yamagishi, M.; Yamada, I. Linearly involved generalized Moreau enhanced models and their proximal splitting algorithm under overall convexity condition. Inverse Probl. 2020, 36, 035012.
  25. Chen, Y.; Yamagishi, M.; Yamada, I. A Generalized Moreau Enhancement of ℓ2,1-norm and Its Application to Group Sparse Classification. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021.
  26. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  27. Larsson, V.; Olsson, C. Convex low rank approximation. Int. J. Comput. Vis. 2016, 120, 194–214.
  28. Blake, A.; Zisserman, A. Visual Reconstruction; MIT Press: Cambridge, MA, USA, 1987.
  29. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
  30. Nikolova, M.; Ng, M.K.; Tam, C.P. Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction. IEEE Trans. Image Process. 2010, 19, 3073–3088.
  31. Selesnick, I. Sparse regularization via convex analysis. IEEE Trans. Signal Process. 2017, 65, 4481–4494.
  32. Yin, L.; Parekh, A.; Selesnick, I. Stable principal component pursuit via convex analysis. IEEE Trans. Signal Process. 2019, 67, 2595–2607.
  33. Abe, J.; Yamagishi, M.; Yamada, I. Convexity-edge-preserving signal recovery with linearly involved generalized minimax concave penalty function. In Proceedings of the ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 4918–4922.
  34. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 210–227.
  35. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
  36. Xu, Y.; Sun, Y.; Quan, Y.; Luo, Y. Structured sparse coding for classification via reweighted ℓ2,1 minimization. In CCF Chinese Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2015; pp. 189–199.
  37. Zheng, J.; Yang, P.; Chen, S.; Shen, G.; Wang, W. Iterative re-constrained group sparse face recognition with adaptive weights learning. IEEE Trans. Image Process. 2017, 26, 2408–2423.
  38. Zhang, C.; Li, H.; Chen, C.; Qian, Y.; Zhou, X. Enhanced group sparse regularized nonconvex regression for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020.
  39. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.; Springer International Publishing: New York, NY, USA, 2017.
  40. Zhao, L.; Hu, Q.; Wang, W. Heterogeneous feature selection with multi-modal deep neural networks and sparse group lasso. IEEE Trans. Multimed. 2015, 17, 1936–1948.
  41. Qin, Z.; Scheinberg, K.; Goldfarb, D. Efficient block-coordinate descent algorithms for the group lasso. Math. Program. Comput. 2013, 5, 143–169.
  42. Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554.
  43. Samaria, F.S.; Harter, A.C. Parameterisation of a stochastic model for human face identification. In Proceedings of the 1994 IEEE Workshop on Applications of Computer Vision, Sarasota, FL, USA, 5–7 December 1994; pp. 138–142.
  44. Cai, D.; He, X.; Hu, Y.; Han, J.; Huang, T. Learning a spatially smooth subspace for face recognition. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–7.
Figure 1. Simple examples of two group sparse regularizers (one group case): (a) the $\ell_{2,1}$ regularizer; (b) the regularizer $(\|\cdot\|_{2,1})_{I_n}$.
Figure 2. Estimated sparse coefficients $\hat{x}$ by GSC and the proposed method, respectively.
Figure 3. An example of results by WGSC and the proposed method.
Table 1. Recognition results on the USPS database ($\alpha = \max_i\{n_i\}$, $\beta = \min_i\{n_i\}$).

Method              | α=10, β=5 | α=10, β=10 | α=25, β=5 | α=25, β=25 | α=50, β=25 | α=50, β=50
GSC (λ = 1.5)       | 81.4%     | 86.6%      | 73.6%     | 91.4%      | 88.4%      | 93.2%
Proposed (λ = 1.5)  | 82.0%     | 87.2%      | 79.0%     | 92.2%      | 89.4%      | 93.0%
Proposed (λ = 3)    | 82.6%     | 87.8%      | 80.8%     | 92.2%      | 90.6%      | 93.4%
Table 2. Recognition results on the ORL database ($\alpha = \max_i\{n_i\}$, $\beta = \min_i\{n_i\}$).

Method               | α=4, β=4 | α=6, β=4 | α=6, β=6 | α=8, β=4 | α=8, β=6 | α=8, β=8
GSC                  | 86.3%    | 85.0%    | 91.3%    | 85.0%    | 92.5%    | 93.8%
Proposed (W = I_n)   | 88.8%    | 86.3%    | 93.8%    | 86.3%    | 93.8%    | 95.0%
WGSC                 | 90.6%    | 87.5%    | 95.0%    | 88.8%    | 93.8%    | 96.3%
Proposed (W by (9))  | 91.3%    | 89.4%    | 95.6%    | 91.9%    | 94.4%    | 96.3%

