Article

Robust Sparse Reduced-Rank Regression with Response Dependency

Wenchen Liu, Guanfu Liu and Yincai Tang
1 School of Statistics and Mathematics, Interdisciplinary Research Institute of Data Science, Shanghai Lixin University of Accounting and Finance, Shanghai 201209, China
2 School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
3 KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200241, China
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(8), 1617; https://doi.org/10.3390/sym14081617
Submission received: 22 June 2022 / Revised: 3 August 2022 / Accepted: 4 August 2022 / Published: 6 August 2022

Abstract: In multiple response regression, the reduced-rank regression model is an effective way to reduce the number of model parameters while exploiting the interrelations among the response variables. To improve the prediction performance of multiple response regression, a method for sparse robust reduced-rank regression with covariance estimation (Cov-SR4) is proposed, which carries out variable selection, outlier detection, and covariance estimation simultaneously. The random error term of this model follows a multivariate normal distribution, which is a symmetric distribution, and the covariance matrix or precision matrix must be a symmetric matrix, which reduces the number of parameters. Either an element-wise or a row-wise penalty function can be used to handle different types of outliers. A numerical algorithm with a covariance estimation step is proposed to solve the robust sparse reduced-rank regression. We compare our method with three recent reduced-rank regression methods in a simulation study and real data analysis. Our method exhibits competitive performance in both prediction error and variable selection accuracy.

1. Introduction

In traditional statistical models, the response variable is usually studied in the one-dimensional case, as in, for example, references [1,2,3,4]. Recently, there has been an increasing need to predict several response variables from a common set of covariates. For example, in bioinformatics, multiple phenotypes of a patient are to be predicted based on measures of genetic variation. In economics, the returns of multiple stocks can be predicted based on a common set of economic and market variables. In neuroscience, the responses of different regions of the brain can be predicted based on the input stimulus. Such data analysis problems arise widely in multiple response regression.
For multiple response regression, one can naively fit a linear regression for each response variable individually, ignoring the inherent correlation structure between the response variables. In fact, performing a separate ordinary least squares regression for each response variable to estimate the coefficient matrix coincides with maximum likelihood estimation. Another practical problem is that the number of parameters in the regression coefficient matrix can be huge even for a modest number of response variables.
It is natural to look for ways to improve this simple approach. Two directions can be used to reduce the effective number of parameters of the model and to improve its predictive power and interpretability. One direction is to reduce the dimensionality of the coefficient matrix. To take advantage of correlations between the response variables, the rank of the coefficient matrix is restricted, which gives rise to the well-known reduced-rank regression (RRR), as in references [5,6]. RRR allows the response variables to borrow information from each other through a common set of latent factors to improve prediction accuracy. However, each latent factor is a linear combination of all predictors, and when there are a large number of predictor variables, some of them may not be useful for prediction. Therefore, an alternative approach is to reduce the number of free parameters by variable selection, excluding redundant predictors from the latent factors. Rank-reduced estimation and variable selection in the RRR framework have been extensively studied in the recent literature [7,8,9,10,11,12]. However, in all these works, the components of the error terms are assumed to be independently and identically distributed. Ref. [13] added variable selection and covariance estimation to the RRR model to improve the predictive performance and facilitate interpretation.
Although RRR can substantially reduce the number of free parameters in multivariate problems, Ref. [14] found that it is extremely sensitive to outliers. In real-world data analysis, outliers and leverage points are bound to occur, and thus the genuine reduced-rank structure can easily be masked or distorted. Ref. [15] proposed a method to detect outliers in a continuous distribution based on the cumulative distribution function. Ref. [16] proposed the use of order statistics to detect extreme values in samples from continuous distributions; ten case studies of outlier detection are given in reference [16], in which the order-statistics method performs excellently in a comparative study. However, these outlier detection methods do not involve covariates, whereas the effect of covariates needs to be taken into account in our paper.
The research most closely related to this paper includes sparse reduced-rank regression with covariance estimation (Cov-SRRR) in reference [13] and robust reduced-rank regression (R4) in reference [14]. However, the Cov-SRRR method does not consider robustness, and the R4 method does not consider the correlations among error terms or the sparsity of the coefficient matrix; they each address only a subset of our objectives. The main contributions of this paper are highlighted as follows.
  • In this work, the mean-shift method is used to obtain a robust estimator that can explicitly detect outliers, account for general correlations among error terms, and improve estimation accuracy by applying shrinkage to the precision matrix. In addition, we also incorporate variable selection into the model estimation through a penalty function on the coefficient matrix.
  • A block coordinate descent algorithm is proposed to obtain the parameter estimates by minimizing the objective function. The monotonicity of the algorithm is also established.
The rest of the paper is organized as follows. In Section 2, we formally introduce the framework of reduced-rank regression with a sparse mean-shift outlier component and an unknown error covariance matrix; a penalized estimation method is used to obtain a sparse estimate. A block coordinate descent algorithm is developed in Section 3 to compute the estimates for the model proposed in Section 2. Simulation studies comparing the proposed model with relevant competitors are given in Section 4. A real data application is shown in Section 5. Finally, a summary and brief discussion are given in Section 6.

2. Robust Reduced-Rank Regression Model with Unknown Covariance

Firstly, we write the multivariate regression model in matrix notation as
$$Y = XB + E, \quad \text{subject to } r(B) = r, \tag{1}$$
resulting in the famous RRR model, where $Y = (y_1, \ldots, y_n)^T$, $X = (x_1, \ldots, x_n)^T$, $y_i \in \mathbb{R}^m$, $x_i \in \mathbb{R}^p$, $r \le \min(p, m)$, $r(\cdot)$ denotes the rank, $B$ is a $p \times m$ coefficient matrix, and $E = (\varepsilon_1, \ldots, \varepsilon_n)^T$ is a random $n \times m$ matrix whose rows $\varepsilon_i$ are drawn from the $N(0, \Sigma_\varepsilon)$ distribution.
Ref. [17] investigates outlier detection from the perspective of adding a mean-shift parameter for each data point in the case of a one-dimensional response variable. Ref. [14] introduces a multivariate mean-shift regression model for detecting outliers. We introduce a mean-shift regression model that also accounts for the dependence among the multivariate responses. The mean-shift multivariate response regression model can be written as
$$Y = XB + C + E, \tag{2}$$
where $Y$ is the $n \times m$ response matrix, $X$ is the $n \times p$ predictor matrix, $B$ is the $p \times m$ coefficient matrix, $C$ is the $n \times m$ matrix describing the outlying effects on $Y$, and $E$ is the $n \times m$ error matrix whose rows are independent draws from the symmetric distribution $N(0, \Sigma_\varepsilon)$. The unknown parameters of the model can be collectively written as $(B, C, \Sigma_\varepsilon)$ or $(B, C, \Omega)$, where $\Omega = \Sigma_\varepsilon^{-1}$ denotes the precision matrix. Allowing correlated errors is necessary in many applications, for example, when the predictors are not sufficient to eliminate all the correlations between the responses. Obviously, this leads to an over-parameterized model, so the unknown matrices must be regularized appropriately. We assume that $B$ has a low-rank structure and that $C$ is a sparse matrix with only a few nonzero entries, corresponding to observations that are inconsistent with the majority of the data. The following sparse robust reduced-rank regression problem is therefore proposed:
$$\min_{B, C, \Omega}\ \frac{1}{n}\operatorname{tr}\{(Y - XB - C)\Omega(Y - XB - C)^T\} - \log|\Omega| + \lambda_1 \sum_{j \neq j'} |\omega_{jj'}| + P(B; \lambda_2) + P(C; \lambda_3), \quad \text{subject to } r(B) \le r, \tag{3}$$
where $P(B; \lambda_2)$ and $P(C; \lambda_3)$ are the penalty functions for $B$ and $C$ with penalty parameters $\lambda_2$ and $\lambda_3$, respectively. The term $\omega_{jj'}$ is the $(j, j')$ entry of $\Omega$, and the lasso penalty on the off-diagonal entries of $\Omega$ serves two purposes. First, it ensures that the optimal $\Omega$ has a finite objective value when there are more responses than samples; second, the penalty yields a sparse estimate. The penalty $P(B; \lambda_2)$ encourages a sparse estimate of the coefficient matrix $B$, and $P(C; \lambda_3)$ is used to obtain a sparse estimate of $C$. Here $\lambda_1$, $\lambda_2$, and $\lambda_3$ are tuning parameters. Owing to the symmetry of the multivariate normal distribution, the covariance matrix or precision matrix must be a symmetric matrix, which reduces the number of parameters and improves the efficiency of parameter estimation. Following reference [18], we estimate the precision matrix using the Gaussian likelihood together with a sparsity-inducing $\ell_1$-penalty on its off-diagonal entries:
$$\min_{\Omega}\ \operatorname{tr}(S_e\Omega) - \log|\Omega| + \lambda_1 \sum_{j \neq j'} |\omega_{jj'}|, \tag{4}$$
where $\operatorname{tr}$ denotes the trace of a matrix and $S_e = \frac{1}{n}(Y - XB - C)^T(Y - XB - C)$.
As noted above, the constraint $\operatorname{rank}(B) = r$ with $r \le \min(p, m)$ yields the reduced-rank regression (RRR) model described in the book [19]. The reduced-rank restriction imposes a number of linear constraints on the regression coefficients, which reduces the number of parameters and improves estimation efficiency. Under the rank constraint, $B$ can be expressed as the product of two rank-$r$ matrices:
$$B = SV^T. \tag{5}$$
Inspired by penalized regression with a group lasso penalty [12,20], consider the following optimization problem:
$$\min_{S, V}\ \frac{1}{n}\operatorname{tr}\{(Y - XSV^T)^T(Y - XSV^T)\} + \sum_{j=1}^{p} \lambda_j \|S_j\|_2, \quad \text{subject to } V^TV = I_r, \tag{6}$$
where $S_j$ is the $j$th row of $S$, $\lambda_j$ is the corresponding tuning parameter, and $I_r$ is the identity matrix of order $r$. Row-wise sparsity means that the unselected predictors are not associated with any response variable. An element-wise sparsity penalty can also be applied to $S$. The two types of sparsity can be blended: row sparsity is first employed in a screening step to reduce $p$ to some desired dimension $d$ ($d < p$), and element-wise sparsity is then pursued in each individual loading vector. In this paper, as in the sparse reduced-rank regression (SRRR) of reference [12], only row-wise sparsity is considered for $S$.
The problem (3) can then be expressed as
$$\min_{S, V, C, \Omega}\ f = \frac{1}{n}\operatorname{tr}\{(Y - XSV^T - C)\Omega(Y - XSV^T - C)^T\} - \log|\Omega| + \lambda_1 \sum_{j \neq j'} |\omega_{jj'}| + P(S; \lambda_2) + P(C; \lambda_3), \quad \text{subject to } V^T\Omega V = I_r. \tag{7}$$
As mentioned above, the assumptions of the model are very mild, so it can be applied in a wide range of settings. In addition, the model parameters are appropriately regularized, so the model fit should be robust and the results are easy to interpret. As shown in Theorem 2 of reference [14], when the covariance matrix is not considered, the mean-shift model is equivalent to a robust M-estimation problem for $B$. Both the simulation studies and the form of the model itself demonstrate the robustness of the method.

3. Computational Algorithm

This section presents an algorithm for solving the optimization problem (7) by iteratively optimizing with respect to $S$, $V$, $C$, and $\Omega$.
For given $(S^{[t]}, V^{[t]}, C^{[t]})$, the problem of solving for the precision matrix $\Omega$ can be expressed as
$$\min_{\Omega}\ \operatorname{tr}(S_R\Omega) - \log|\Omega| + \lambda_1 \sum_{j \neq j'} |\omega_{jj'}|, \tag{8}$$
where $S_R = \frac{1}{n}(Y - XS^{[t]}V^{[t]T} - C^{[t]})^T(Y - XS^{[t]}V^{[t]T} - C^{[t]})$. Reference [21] shows that this problem can be solved with the DP-GLASSO algorithm in the R package dpglasso. The DP-GLASSO algorithm optimizes the primal objective in terms of the precision matrix, whereas the GLASSO algorithm of reference [22] solves the dual of the graphical lasso penalized likelihood in terms of the covariance matrix; reference [21] shows that DP-GLASSO performs better than GLASSO.
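For illustration, a minimal R sketch of the $\Omega$-update (8) is given below. The paper applies the DP-GLASSO algorithm from the dpglasso package; here the widely used glasso package is substituted purely as a stand-in, since it solves the same $\ell_1$-penalized Gaussian likelihood, and all object names (update_omega, R_t, lambda1, and so on) are placeholders rather than the authors' code.

```r
# Minimal sketch of the Omega-update (8), assuming the current iterates are available.
# The paper uses DP-GLASSO (package dpglasso); glasso is substituted here only for
# illustration. All names are placeholders.
library(glasso)

update_omega <- function(Y, X, S_t, V_t, C_t, lambda1) {
  R_t <- Y - X %*% S_t %*% t(V_t) - C_t        # current residual matrix
  S_R <- crossprod(R_t) / nrow(Y)              # S_R = (1/n) * t(R) %*% R
  fit <- glasso(S_R, rho = lambda1, penalize.diagonal = FALSE)
  fit$wi                                       # penalized estimate of the precision matrix
}
```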
For fixed $\Omega^{[t+1]}$ and $C^{[t]}$, the objective for the minimization over $(S, V)$ is
$$\frac{1}{n}\operatorname{tr}\{(Y - XSV^T - C^{[t]})\Omega^{[t+1]}(Y - XSV^T - C^{[t]})^T\} + P(S; \lambda_2), \tag{9}$$
subject to the constraint $V^T\Omega^{[t+1]}V = I_r$. Applying the transformation $\tilde{V} = (\Omega^{[t+1]})^{1/2}V$, we have $\tilde{V}^T\tilde{V} = I_r$. The first term in the objective function (9) can be written as
$$\frac{1}{n}\operatorname{tr}\{(Y(\Omega^{[t+1]})^{1/2} - C^{[t]}(\Omega^{[t+1]})^{1/2} - XS\tilde{V}^T)(Y(\Omega^{[t+1]})^{1/2} - C^{[t]}(\Omega^{[t+1]})^{1/2} - XS\tilde{V}^T)^T\} = \frac{1}{n}\|\tilde{Y} - XS\tilde{V}^T\|_F^2 = H(S, V), \tag{10}$$
where $\tilde{Y} = Y(\Omega^{[t+1]})^{1/2} - C^{[t]}(\Omega^{[t+1]})^{1/2}$. When $P(S; \lambda_2) = \sum_{i=1}^{p}\lambda_2\|s_i\|_2$, problem (9) becomes the SRRR problem of reference [12], which can be solved iteratively. There are many choices for the sparsity-inducing penalty $P(S; \lambda_2)$. The $\ell_1$ penalty of reference [23] is the most popular in the sparse-penalty literature, but it suffers from inconsistency and biased estimates, especially when the predictors are correlated. To alleviate these problems, non-convex penalties such as the $\ell_p$ ($0 < p < 1$) penalty, SCAD, and the capped $\ell_1$ penalty have been proposed to approximate the ideal non-convex $\ell_0$ penalty ($\|S\|_0 = \sum_{i,j} 1_{s_{ij} \neq 0}$).
To solve the $S$-optimization problem for a general penalty function $P(t; \lambda)$, we propose to use a thresholding-rule-based $\Theta$-estimator, since most penalty functions correspond to a thresholding rule. The threshold-based iterative selection procedure (TISP) of reference [24] can be used to solve the penalized problem for any $P$ associated with a thresholding rule (an odd, unbounded, monotone shrinkage function), as defined in reference [17].
According to reference [25], a $\Theta$-estimator is linked to a general penalty function $P(t; \lambda)$ by
$$P(t; \lambda) - P(0; \lambda) = \int_0^{|t|}\big(\sup\{s : \Theta(s, \lambda) \le u\} - u\big)\,du + q(t; \lambda), \tag{11}$$
for some nonnegative $q(\cdot; \lambda)$ such that $q(\Theta(s, \lambda); \lambda) = 0$ for all $s$. Any penalty associated with a thresholding rule can be handled in this way, including, but not limited to, the aforementioned $\ell_1$, SCAD, $\ell_0$, and $\ell_p$ ($0 < p < 1$) penalties; in this sense, the method is universally applicable. The partial derivative of the function $H(S, V)$ with respect to $S$ is
$$\nabla_S H(S, V) = X^T(XS\tilde{V}^T - \tilde{Y})\tilde{V}. \tag{12}$$
The closed-form solution for $S$ given $V^{[t]}$ is
$$S^{[t+1]} = \Theta\big(S^{[t]} - \nabla_S H(S^{[t]}, V^{[t]})/\rho;\ q_g\big), \tag{13}$$
where $\rho > \|X^TX\|_2^2$ and $\Theta$ is the thresholding function corresponding to the penalty $P(S; \lambda_2)$. Further details are given in reference [25].
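As a concrete illustration of the update (13), the following R sketch performs one gradient step followed by a row-wise (group) soft-thresholding rule, i.e., the thresholding operator associated with the group lasso penalty. This is only one admissible choice of $\Theta$, and all function and variable names are illustrative, not the authors' implementation.

```r
# One TISP-style S-update in the spirit of (13), sketched with the group (row-wise)
# soft-thresholding rule as an example of Theta. The step size follows the
# condition rho > ||X'X||_2^2 stated in the text.
row_soft_threshold <- function(S, lambda) {
  nrm   <- sqrt(rowSums(S^2))                      # Euclidean norm of each row
  scale <- ifelse(nrm > lambda, 1 - lambda / nrm, 0)
  S * scale                                        # shrink or zero out each row
}

update_S <- function(Y_tilde, X, S_t, V_tilde, lambda2) {
  rho  <- norm(crossprod(X), type = "2")^2 + 1e-8
  grad <- t(X) %*% (X %*% S_t %*% t(V_tilde) - Y_tilde) %*% V_tilde   # gradient (12)
  row_soft_threshold(S_t - grad / rho, lambda2 / rho)
}
```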
When $S^{[t+1]}$ is given, the optimization problem for $V$ becomes
$$\min_{\tilde{V}}\ \|\tilde{Y} - XS^{[t+1]}\tilde{V}^T\|_F^2, \quad \text{subject to } \tilde{V}^T\tilde{V} = I_r. \tag{14}$$
This can be identified as an orthogonal Procrustes rotation problem, solved by computing the singular value decomposition $\tilde{Y}^TXS^{[t+1]} = PDQ^T$; the optimal $\tilde{V}$ is $PQ^T$, so that $V^{[t+1]} = (\Omega^{[t+1]})^{-1/2}PQ^T$. The full algorithm is presented in Algorithm 1.
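A minimal R sketch of this Procrustes step is given below; the function name and Omega_half_inv, standing for $(\Omega^{[t+1]})^{-1/2}$, are illustrative assumptions.

```r
# Sketch of the V-update (14): the SVD of t(Y_tilde) %*% X %*% S gives P D Q^T, and
# the orthonormal solution is V_tilde = P Q^T; V is recovered by back-transformation.
update_V <- function(Y_tilde, X, S_new, Omega_half_inv) {
  sv      <- svd(t(Y_tilde) %*% X %*% S_new)   # P D Q^T
  V_tilde <- sv$u %*% t(sv$v)                  # optimal orthonormal V_tilde
  Omega_half_inv %*% V_tilde                   # V = Omega^{-1/2} P Q^T
}
```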
For fixed $\Omega^{[t+1]}$, $S^{[t+1]}$, and $V^{[t+1]}$, the problem of solving for $C$ reduces to the minimization of
$$F(C) = \frac{1}{n}\operatorname{tr}\{(Y - XS^{[t+1]}V^{[t+1]T} - C)\Omega^{[t+1]}(Y - XS^{[t+1]}V^{[t+1]T} - C)^T\} + P(C; \lambda_3). \tag{15}$$
Let $\ell(C) = \frac{1}{n}\operatorname{tr}\{(Y - XS^{[t+1]}V^{[t+1]T} - C)\Omega^{[t+1]}(Y - XS^{[t+1]}V^{[t+1]T} - C)^T\}$. To solve problem (15), construct the surrogate function
$$g(C; C^{[t]}) = \ell(C^{[t]}) + \langle\nabla\ell(C^{[t]}), C - C^{[t]}\rangle + \frac{\rho}{2}\|C - C^{[t]}\|_F^2 + P(C; \lambda_3), \tag{16}$$
where $\nabla\ell(C^{[t]})$ is the gradient of $\ell$ with respect to $C$ at $C^{[t]}$. Now define the $(t+1)$th iterate as
$$C^{[t+1]} = \operatorname*{argmin}_C\ g(C; C^{[t]}). \tag{17}$$
The update of $C$ thus boils down to the $g$-minimization in (17), which is simpler than directly minimizing $F$. We rewrite the problem as
$$\min_C\ \frac{\rho}{2}\Big\|C - C^{[t]} - \frac{1}{\rho}(Y - XS^{[t+1]}V^{[t+1]T} - C^{[t]})\Omega^{[t+1]}\Big\|_F^2 + P(C; \lambda_3). \tag{18}$$
To solve problem (18) for a general penalty function $P(C; \lambda_3)$, we again propose to use thresholding-rule-based $\Theta$-estimators. The $\ell_0$ penalty has the appealing property of promoting sparsity and can directly restrict the number of nonzero elements/rows of the estimated matrix. Moreover, tuning the constraint form $\sum_{i,j} 1_{c_{ij} \neq 0}/(nm) \le q_e$ is intuitive and easy compared with the penalty form $\lambda\|C\|_0$; here $q_e$ is an upper bound on the proportion of nonzero entries. In this paper, to guarantee the sparsity of $C$, we therefore advise using the $\ell_0$ constraint in place of the $\ell_0$ penalty $\lambda\|C\|_0$ for ease of tuning. By employing a quantile thresholding rule $\Theta^{\#}$, this problem can be solved easily through TISP. The quantile thresholding rule is a special case of hard thresholding, which corresponds to the $\ell_0$ penalty. Let $\Theta^{\#}(C; q_e)$ be the element-wise quantile thresholding function; then
$$\Theta^{\#}(C; q_e)_{ij} = \begin{cases} 0, & |c_{ij}| \le \lambda_e, \\ c_{ij}, & |c_{ij}| > \lambda_e, \end{cases} \tag{19}$$
where $\lambda_e$ is the $(1 - q_e)$th quantile of the $|c_{ij}|$. The tuning parameter $q_e$ has a bounded range ($0 < q_e < 1$) and, as an upper bound on the proportion of nonzero entries, is easy to interpret.
Row-sparsity regularization can be used for variable selection in our multiple response problem and is used in our main algorithm. In particular, the row-wise $\ell_0$ constraint $\|C\|_{2,0}/n \le q_g$ calls for the multivariate version of quantile thresholding, $\vec{\Theta}^{\#}(\cdot; q_g)$: for a row vector $c$,
$$\vec{\Theta}^{\#}(c; q_g) = \begin{cases} 0, & \|c\|_2 \le \lambda_g, \\ c, & \|c\|_2 > \lambda_g, \end{cases} \tag{20}$$
where $\lambda_g$ is the $(1 - q_g)$th quantile of the row norms $\|c_i\|_2$. The row-wise constrained version is obtained simply by replacing $\Theta^{\#}$ with $\vec{\Theta}^{\#}$.
Both row-wise and element-wise sparsity can be considered for $P(C; \lambda_3)$, to detect outlying rows and outlying elements, respectively.
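A minimal R sketch of the two quantile thresholding rules (19) and (20) follows; the function names and the handling of ties are illustrative assumptions.

```r
# Element-wise quantile thresholding (19): keep the entries with the largest |c_ij|
# so that at most a fraction q_e of the entries remain nonzero.
quantile_threshold_elem <- function(C, q_e) {
  lambda_e <- quantile(abs(C), probs = 1 - q_e)   # (1 - q_e)th quantile of |c_ij|
  C * (abs(C) > lambda_e)
}

# Row-wise (multivariate) quantile thresholding (20): keep the rows with the largest
# Euclidean norms so that at most a fraction q_g of the rows remain nonzero.
quantile_threshold_row <- function(C, q_g) {
  row_norms <- sqrt(rowSums(C^2))
  lambda_g  <- quantile(row_norms, probs = 1 - q_g)
  C * (row_norms > lambda_g)                      # zeroes out entire rows
}
```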
Assuming that $P(C; \lambda_3)$ corresponds to a thresholding rule through (11), our algorithm iteratively applies DP-GLASSO, SRRR, and simple multivariate thresholding operations until the objective function (7) has converged. The complete numerical procedure is summarized in Algorithm 1.
Algorithm 1 Algorithm to solve Cov-SR4
Require: $Y \in \mathbb{R}^{n \times m}$, $X \in \mathbb{R}^{n \times p}$; $r$: the desired rank; $M$: the maximum number of iterations; $\varepsilon$: error tolerance; initial estimates $S^{[0]} \in \mathbb{R}^{p \times r}$, $V^{[0]} \in \mathbb{R}^{m \times r}$, $C^{[0]} \in \mathbb{R}^{n \times m}$.
1: $t \leftarrow 0$;
2: repeat
3:  for fixed $(S^{[t]}, V^{[t]}, C^{[t]})$, solve for $\Omega^{[t+1]}$ from objective function (8) using the DP-GLASSO algorithm;
4:  for fixed $(\Omega^{[t+1]}, C^{[t]})$, solve for $(S^{[t+1]}, \tilde{V}^{[t+1]})$ from the SRRR-type problem (9);
5:  for fixed $(S^{[t+1]}, V^{[t+1]}, \Omega^{[t+1]})$, apply the multivariate thresholding operation to problem (18) to obtain $C^{[t+1]}$;
6:  $t \leftarrow t + 1$;
7:  calculate $f^{[t+1]}$ and $f^{[t]}$ according to Equation (7);
8: until $t \ge M$ or $|f^{[t+1]} - f^{[t]}| \le \varepsilon$
9: return $S^{[t+1]}$, $V^{[t+1]}$, $C^{[t+1]}$, and $\Omega^{[t+1]}$.
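To make the flow of Algorithm 1 concrete, the schematic R loop below strings the steps together. It is a minimal sketch under the notation above, not the authors' implementation: update_omega(), update_S(), and update_V() are the illustrative helpers sketched earlier, while update_C() and objective_value() are hypothetical analogues for the $C$-step (18) and the objective value (7).

```r
# Schematic outer loop of Algorithm 1 (illustrative only). update_C() and
# objective_value() are hypothetical helpers for the C-step (18) and objective (7).
cov_sr4 <- function(Y, X, r, lambda1, lambda2, q_g, max_iter = 100, tol = 1e-4) {
  n <- nrow(Y); m <- ncol(Y); p <- ncol(X)
  S <- matrix(0, p, r)
  V <- svd(cov(Y))$u[, 1:r, drop = FALSE]                  # simple orthonormal initializer
  C <- matrix(0, n, m)
  f_old <- Inf
  for (t in seq_len(max_iter)) {
    Omega <- update_omega(Y, X, S, V, C, lambda1)          # step 3: graphical lasso
    eig   <- eigen(Omega, symmetric = TRUE)                # symmetric square root of Omega
    O_h   <- eig$vectors %*% diag(sqrt(eig$values)) %*% t(eig$vectors)
    O_hi  <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
    Y_til <- (Y - C) %*% O_h                               # transformed response
    S     <- update_S(Y_til, X, S, O_h %*% V, lambda2)     # step 4: thresholded gradient step
    V     <- update_V(Y_til, X, S, O_hi)                   # step 4: Procrustes rotation
    C     <- update_C(Y, X, S, V, C, Omega, q_g)           # step 5: quantile thresholding
    f_new <- objective_value(Y, X, S, V, C, Omega, lambda1, lambda2)
    if (abs(f_new - f_old) <= tol) break                   # stopping rule of Algorithm 1
    f_old <- f_new
  }
  list(S = S, V = V, C = C, Omega = Omega)
}
```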
Theorem 1.
Assume that the largest eigenvalue of $\Omega^{[t]}$ equals $L$. Then, as long as $\rho \ge L$, the sequence of iterates produced by Algorithm 1 satisfies
$$f(S^{[t+1]}, V^{[t+1]}, C^{[t+1]}, \Omega^{[t+1]}) \le f(S^{[t]}, V^{[t]}, C^{[t]}, \Omega^{[t]}). \tag{21}$$
That is, the objective function values are non-increasing during the iteration.
Proof. 
Firstly, for fixed $S^{[t]}$, $V^{[t]}$, and $C^{[t]}$, the optimization for $\Omega$ with objective function (8) is carried out by the DP-GLASSO algorithm, which is a block coordinate descent (BCD) method and hence does not increase the objective, as shown in reference [21].
Secondly, for fixed $\Omega^{[t+1]}$, $V^{[t]}$, and $C^{[t]}$, the update of $S$ via the closed-form solution (13) is equivalent to minimizing
$$h(S; S^{[t]}) = H(S^{[t]}, V^{[t]}) + \langle\nabla_S H(S^{[t]}, V^{[t]}), S - S^{[t]}\rangle + \frac{\rho}{2}\|S - S^{[t]}\|_F^2 + P(S; \lambda_2). \tag{22}$$
Let $G(S) = H(S, V^{[t]}) + P(S; \lambda_2)$. Then
$$G(S^{[t+1]}) \le h(S^{[t+1]}; S^{[t]}) \le h(S^{[t]}; S^{[t]}) = G(S^{[t]}), \tag{23}$$
where the first inequality comes from the Taylor expansion together with $\rho > \|X^TX\|_2^2$, and the second inequality holds because $S^{[t+1]}$ is a minimum point of $h(S; S^{[t]})$. The monotonicity of the $V$-update follows from the fact that it is a closed-form minimization.
The monotonicity of the $C$-update can be deduced from the inequality
$$F(C^{[t+1]}) \le g(C^{[t+1]}; C^{[t]}) \le g(C^{[t]}; C^{[t]}) = F(C^{[t]}), \tag{24}$$
where the first inequality comes from the Taylor expansion together with $\rho \ge L$, and the second inequality holds because $C^{[t+1]}$ is a minimum point of $g(C; C^{[t]})$. □

4. Simulation

4.1. Related Methods

In this section, simulation is used to illustrate the proposed method (Cov-SR4) and compare it with some related methods, namely SRRR, Cov-SRRR, and R4. For SRRR and R4, the R package rrpack is available. Ref. [13] gives two Cov-SRRR methods according to two cross-validation criteria for tuning: the first (Cov-SRRR1) is based on the likelihood and the second (Cov-SRRR2) on the prediction error. Since the two criteria gave similar results in reference [13], only Cov-SRRR2 is considered in our paper for convenience. To ensure comparability of model performance and algorithm times, all algorithms were rewritten in R, and the R packages CVXR and glasso were used in the optimization algorithm. In our simulation, SRRR and R4 assume uncorrelated and homogeneous errors, which means the unweighted residual sum of squares is used to measure the goodness-of-fit. For Algorithm 1, the stopping criterion is based on the difference between the current and previous objective function values; that is, the algorithm stops if
$$|f(S^{[t]}, V^{[t]}, C^{[t]}, \Omega^{[t]}) - f(S^{[t-1]}, V^{[t-1]}, C^{[t-1]}, \Omega^{[t-1]})| \le \delta,$$
where $\delta = 10^{-4}$ is used in our simulation study.

4.2. Simulation Setups

In the simulation study, the training data are generated from the model $Y = XB + C + E$ and the prediction data from the model $Y = XB + E$. The design matrix $X$ is generated by sampling its $n$ rows from the multivariate normal distribution $N(0, \Sigma_x)$, where $\Sigma_x(i, j) = 0.5^{|i-j|}$. The rows of the $n \times m$ random noise matrix $E$ are generated from $N(0, \sigma^2\Sigma_\varepsilon)$. The error structure is $\Sigma_\varepsilon(i, j) = \rho_\varepsilon^{|i-j|}$, denoted by ar($\rho_\varepsilon$); the corresponding precision matrix $\Omega$ is tri-diagonal and therefore sparse. We set $\rho_\varepsilon = 0$, $0.5$, or $0.9$, representing independent, moderately correlated, or strongly correlated errors. In each simulation, $\sigma^2$ is chosen to control the signal-to-noise ratio; in our paper, $\sigma^2$ is chosen so that $\operatorname{tr}(B^T\Sigma_xB)/\operatorname{tr}(E^TE)$ equals 1.
According to reference [10], the coefficient matrix $B$ has the low-rank structure $B = SV^T$ and row-wise sparsity. For the $p \times r$ component matrix $S$, the elements of its first $p_0$ rows are generated from $N(0, 1)$ and the remaining $p - p_0$ rows are set to zero. The elements of the $m \times r$ matrix $V$ are generated from $N(0, 1)$.
All the elements of the first $q$ rows of $C$ are set to 10, which yields outliers with high leverage values. Overall, the signal is contaminated by both random errors and gross outliers. Under each setting, the entire data generation process described above is replicated 50 times. The rank for all methods in our simulation is set to the true value. To choose the optimal combination of tuning parameters for the Cov-SR4 method, cross-validation (CV) might seem an obvious option; however, it lacks theoretical support in the robust low-rank setting, and with three tuning parameters cross-validation can be quite expensive. As in reference [14], a predictive information criterion is therefore used for tuning Cov-SR4, and five-fold CV is used for tuning the parameters of the other methods.
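A minimal R sketch of one replicate of this data-generating process is given below; sigma2 is treated as an input here (in the study it is chosen to achieve the stated signal-to-noise ratio), and all names are illustrative.

```r
# One simulated data set as described in Section 4.2 (illustrative sketch).
library(MASS)   # for mvrnorm

simulate_data <- function(n, p, p0, m, r, rho_e, q, sigma2 = 1) {
  Sigma_x <- 0.5^abs(outer(1:p, 1:p, "-"))                 # Sigma_x(i, j) = 0.5^|i-j|
  Sigma_e <- rho_e^abs(outer(1:m, 1:m, "-"))               # ar(rho_e) error structure
  X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma_x)
  S <- rbind(matrix(rnorm(p0 * r), p0, r),                 # first p0 rows nonzero
             matrix(0, p - p0, r))
  V <- matrix(rnorm(m * r), m, r)
  B <- S %*% t(V)                                          # low-rank coefficient matrix
  E <- mvrnorm(n, mu = rep(0, m), Sigma = sigma2 * Sigma_e)
  C <- matrix(0, n, m)
  if (q > 0) C[1:q, ] <- 10                                # outlying rows
  list(X = X, Y = X %*% B + C + E, B = B, C = C)
}
```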

4.3. Performance Evaluation

In our simulation, performance is assessed in terms of prediction error and variable selection accuracy. We compare the predictive performance of the methods using the mean squared prediction error (MSE), defined as
$$\mathrm{MSE} = \|Y - X\hat{B}\|_F^2/(mn),$$
where $\hat{B} = S^{[t]}V^{[t]T}$ and the algorithm stops at step $t$. As in reference [13], two further aspects are considered: row-wise variable selection for the coefficient matrix $B$ and element-wise variable selection for the precision matrix $\Omega$. The first criterion is sensitivity, the proportion of nonzero elements that are correctly selected, and the second is specificity, the proportion of zero elements that are correctly excluded.
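A short R sketch of these criteria, with placeholder names for the estimates and the test data, is:

```r
# Mean squared prediction error on the test data, as defined above.
prediction_mse <- function(Y_test, X_test, B_hat) {
  sum((Y_test - X_test %*% B_hat)^2) / (nrow(Y_test) * ncol(Y_test))
}

# Row-wise support recovery for the coefficient matrix: sensitivity is the proportion
# of truly nonzero rows that are selected, specificity the proportion of zero rows
# that are correctly excluded.
row_selection_accuracy <- function(B_hat, B_true, tol = 1e-8) {
  sel  <- rowSums(abs(B_hat))  > tol
  true <- rowSums(abs(B_true)) > tol
  c(sensitivity = sum(sel & true) / sum(true),
    specificity = sum(!sel & !true) / sum(!true))
}
```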

4.4. Simulation Results

The prediction data set has the same size as the training data set and is used to evaluate the prediction error. Five-fold cross-validation is used to choose the tuning parameters of the competing methods based on the prediction error.
The prediction errors are shown in Table 1. Firstly, consider the results without outliers. When $\rho_\varepsilon = 0$, the four methods show similar prediction errors. However, when $\rho_\varepsilon = 0.5$ or $0.9$, Cov-SR4 performs similarly to Cov-SRRR2, and these two methods perform better than the other two, which do not account for the dependence among the responses beyond the predictor matrix $X$. When outliers are present, Cov-SR4 and R4 show better results than the other two methods, because highly leveraged outliers strongly influence the latter. The mean-shift approach can stably capture outliers; Cov-SR4 and R4 yield robust estimates, and the performance of Cov-SR4 is slightly better than that of R4.
The results in terms of variable selection for the coefficient matrix and the precision matrix are given in Table 2, where Sen is sensitivity and Spe is specificity; their formulas can be found in reference [26]. Firstly, consider variable selection for the coefficient matrix. The R4 method of reference [14] does not consider the sparsity of the coefficient matrix, and the function r4 in the R package rrpack does not return a sparse coefficient matrix estimate, so the variable selection accuracy of R4 is not reported. When outliers are absent, the other three methods all perform well in terms of the sensitivity of the coefficient matrix. When $\rho_\varepsilon \neq 0$, Cov-SR4 and Cov-SRRR2 show similar specificity, slightly better than SRRR. When outliers are present, Cov-SR4 still shows good sensitivity, while Cov-SRRR2 and SRRR, influenced by the outliers, show poor sensitivity; in this case, the specificity results are similar across methods. From the simulation study we see that, on the whole, the Cov-SR4 method consistently gives satisfactory results. Secondly, consider variable selection for the precision matrix. When outliers are present and $\rho_\varepsilon \neq 0$, Cov-SR4 performs better than Cov-SRRR2 in both sensitivity and specificity.
The average running times are given in Table 3. Runtimes are measured in CPU time (seconds). Each algorithm was run on a Windows 10 laptop with an Intel(R) Core(TM) i7-7500U CPU at 2.70 GHz and 8 GB of RAM, and all simulations were computed using R-4.1.0. There is no significant difference in the running time of these models for different values of $\rho_\varepsilon$, so only the results for $\rho_\varepsilon = 0$ are shown. The algorithm of Cov-SR4 is the most complex, so its running time is the longest, while the algorithm of SRRR is the simplest, so its running time is the shortest.

5. Real Data Applications

5.1. Stock Data

The weekly log returns of nine stocks from 2004, available from the R package MRCE of reference [27], are considered. These data were also analyzed in reference [14]. As in references [14,27], the data are modeled with a first-order vector autoregressive model,
$$Y = \tilde{Y}B + C + E,$$
where $Y$ and $\tilde{Y}$ are $(n-1) \times q$ matrices: $Y$ has rows $y_2, \ldots, y_n$ and the predictor $\tilde{Y}$ has rows $y_1, \ldots, y_{n-1}$, with $y_i$ denoting the log returns of the nine companies in week $i$. For convenience of calculation, we multiplied the data by 100.
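A small R sketch of this VAR(1) setup, assuming the weekly log returns are already loaded into an n × 9 matrix `returns` (rows indexed by week), is:

```r
# Build the response and predictor matrices of the first-order VAR model:
# Y collects weeks 2..n and Y_tilde the corresponding previous weeks 1..(n-1).
build_var1 <- function(returns) {
  returns <- 100 * returns                           # scale by 100, as in the text
  n <- nrow(returns)
  list(Y       = returns[2:n, , drop = FALSE],       # rows y_2, ..., y_n
       Y_tilde = returns[1:(n - 1), , drop = FALSE]) # rows y_1, ..., y_{n-1}
}
```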
Following the approach of reference [14], the weekly log returns in the first 26 weeks ($n = 26$) are used for training and those in the last 26 weeks for forecasting. All four methods introduced in our paper are used to analyze these data, and all of them result in unit-rank models. The row-wise outlier detection method is used. The Cov-SR4 and R4 methods automatically detect week sixteen as an outlier; this outlier corresponds to a real major market disturbance in the automotive industry. Our robust approach automatically accounts for outlying samples, leading to a more reliable model. The mean squared prediction errors for Cov-SR4, SRRR, R4, and Cov-SRRR2 are 7.008, 7.526, 8.543, and 7.022, respectively. Row sparsity is used to estimate the coefficient matrix B in our method. The estimate of the unit-lag coefficient matrix B for the Cov-SR4 method is reported in Table 4. The rows of Exxon, GM, and IBM are zero in our results, as in the coefficient matrix estimated by MRCE (Rothman et al., 2010); the only difference is that the coefficients of Walmart are also zero in our work. From the coefficient matrix B, it can be seen that GE, CPhillips, Citigroup, and AIG have a significant effect on the stock prices of all companies.
The estimate of the inverse error covariance matrix for the Cov-SR4 method is reported in Table 5; this result is also similar to the inverse error covariance matrix estimated by MRCE in reference [27]. A nonzero entry in Table 5 implies that the errors of the two corresponding firms are partially correlated given the errors of the other firms. We see that AIG (an insurance company) is estimated to be partially correlated with most other companies, and companies with similar businesses are also more likely to be partially correlated, such as Ford and GM, both in the automotive industry, GE and IBM, both in the technology industry, and CPhillips and Exxon, both related to the oil industry. These results are meaningful for practical applications in financial markets.

5.2. US COVID-19 State-Level Mortality Rate Analysis

As the coronavirus spreads, causing millions of deaths and making hundreds of millions of people sick, it is extremely important to know the case fatality rate (CFR). The data come from the R package covid19nytimes, which contains the daily numbers of infections and deaths in the various states of the United States. Naive estimates of the CFR from the reported numbers of total confirmed cases and total deaths are difficult to interpret because of advances in treatment and differences in medical resources. The daily data are quite irregular; once an appropriate lag time is settled on, we can look at the fatality rate per identified case, which is one possible measure of the fatality rate over time. Seven-day moving averages are used to smooth the series: the daily mortality rate is obtained by dividing the 7-day moving average of the number of deaths by the 7-day moving average of the number of infections.
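A minimal R sketch of this smoothing step, with `deaths` and `cases` as placeholder vectors of daily counts for one state, is:

```r
# 7-day moving averages of deaths and cases, and their ratio as the daily mortality rate.
moving_average_7 <- function(x) {
  as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 1))  # average of the last 7 days
}

daily_cfr <- function(deaths, cases) {
  moving_average_7(deaths) / moving_average_7(cases)
}
```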
Data from eight states in the southeastern United States are used, namely Alabama, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, and Tennessee, covering 110 days from 8 April 2020 to 29 July 2020. These data are also modeled with a first-order vector autoregressive model,
$$Y = \tilde{Y}B + C + E,$$
where $Y$ and $\tilde{Y}$ are $(n-1) \times q$ matrices: $Y$ has rows $y_2, \ldots, y_n$ and the predictor $\tilde{Y}$ has rows $y_1, \ldots, y_{n-1}$, with $y_i$ denoting the daily mortality rates of the eight states on day $i$. For convenience of calculation, we again multiplied the data by 100.
Similar to the previous example, the daily mortality rates of the first 60 days ($n = 60$) are used for training, and those of the following 50 days are used for prediction. All four methods described in our paper are used to analyze these data, and the best models for all methods have rank 6. The row-wise outlier detection method is used. Both the Cov-SR4 and R4 methods automatically detect 25 April 2020 as an outlier. According to the CDC, the outlying mortality rate on that day may be due to a decrease in COVID-19 mortality compared with the previous week, which may rise again as more death certificates are counted. The outlier has a surprising effect on both coefficient estimation and model prediction; this can be seen from $\|\hat{B} - \tilde{B}\|_F/\|\tilde{B}\|_F \approx 61.6\%$ and $\|X\hat{B} - X\tilde{B}\|_F/\|X\tilde{B}\|_F \approx 14.5\%$, where $\hat{B}$ and $\tilde{B}$ denote the Cov-SR4 and the Cov-SRRR2 estimates, respectively. The mean squared prediction errors for Cov-SR4, SRRR, R4, and Cov-SRRR2 are 0.130, 0.167, 0.133, and 0.156, respectively, so the robust methods Cov-SR4 and R4 show better performance. The estimate of the coefficient matrix B for the Cov-SR4 method is reported in Table 6; from it, a significant positive correlation can be found for the CFR across the southeastern states of the United States. The estimate of the inverse error covariance matrix for the Cov-SR4 method is reported in Table 7; this cannot be obtained from the R4 method. The nonzero entries in Table 7 imply that Louisiana is partially correlated with most other states, and South Carolina is also partially correlated with Alabama and Georgia. These results have practical value for the study of the CFR due to COVID-19 in the southeastern United States.

6. Discussion

In this paper, we propose a block coordinate descent algorithm to solve the sparse robust reduced-rank regression with covariance estimation and prove the monotonicity of the new algorithm. This method can explicitly detect outliers, account for general correlations among error terms, and incorporate variable selection. In the multiple response variable regression model, the new method outperformed other methods in terms of prediction error and variable selection. Both simulation studies and real data analysis demonstrate the effectiveness of the new models and algorithms.
Although the mean-shift method is used to detect the outliers in the new model, in the future we will consider general robust losses to deal with leverage outliers. The method of order statistics used in reference [16] is another promising approach to outlier detection. In addition, for non-Gaussian multivariate response variables, how to capture the dependence among the responses is a very challenging but widely encountered problem, and the application of such dependence structures to regression models with multivariate non-Gaussian responses is a topic worthy of further study. As far as we know, there are few studies on statistical tests in this direction.

Author Contributions

Data curation, W.L.; Formal analysis, W.L.; Writing—original draft, W.L. and G.L.; Writing—review and editing, W.L., G.L. and Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

The research is supported by the Natural Science Foundation of China (Nos. 11803159, 11271136, 81530086, 11671303, 11201345, 71771089), the 111 Project of China (No. B14019), the Shanghai Philosophy and Social Science Foundation (No. 2015BGL001) and the National Social Science Foundation Key Program of China (No. 17ZDA091).

Data Availability Statement

Stock data and COVID-19 data are available from the R packages MRCE and covid19nytimes, respectively.

Acknowledgments

The authors thank the editor, the associate editor, and the reviewers for their helpful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ancha, X.; Shirong, Z.; Yincai, T. A unified model for system reliability evaluation under dynamic operating conditions. IEEE Trans. Reliab. 2021, 70, 65–72.
  2. Chunling, L.; Lijuan, S.; Ancha, X. Modelling and estimation of system reliability under dynamic operating environments and lifetime ordering constraints. Reliab. Eng. Syst. Saf. 2022, 218, 108136.
  3. Hu, J.; Chen, P. Predictive Maintenance of Systems Subject to Hard Failure Based on Proportional Hazards Model. Reliab. Eng. Syst. Saf. 2020, 196, 106707.
  4. Chen, P.; Ye, Z.S. Approximate Statistical Limits for a Gamma Distribution. Int. J. Qual. Eng. Technol. 2017, 49, 64–77.
  5. Anderson, T.W. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Ann. Math. Stat. 1951, 22, 327–351.
  6. Izenman, A.J. Reduced-rank regression for the multivariate linear model. J. Multivar. Anal. 1975, 5, 248–264.
  7. Obozinski, G.; Wainwright, M.J.; Jordan, M.I. Support union recovery in high-dimensional multivariate regression. Ann. Stat. 2011, 39, 1–47.
  8. Negahban, S.; Wainwright, M.J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat. 2011, 39, 1069–1097.
  9. Chen, K.; Chan, K.S.; Stenseth, N.C. Reduced rank stochastic regression with a sparse singular value decomposition. J. R. Stat. Soc. B 2012, 74, 203–221.
  10. Chen, L.; Huang, J.Z. Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection in Multivariate Regression. J. Am. Stat. Assoc. 2012, 107, 1533–1545.
  11. Bunea, F.; She, Y.; Wegkamp, M.H. Optimal selection of reduced rank estimators of high-dimensional matrices. Ann. Stat. 2011, 39, 1282–1309.
  12. Bunea, F.; She, Y.; Wegkamp, M.H. Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. Ann. Stat. 2012, 40, 2359–2388.
  13. Chen, L.; Huang, J.Z. Sparse reduced-rank regression with covariance estimation. Comput. Stat. 2016, 26, 461–470.
  14. She, Y.; Chen, K. Robust reduced-rank regression. Biometrika 2017, 104, 633–647.
  15. Jäntschi, L. A test detecting the outliers for continuous distributions based on the cumulative distribution function of the data being tested. Symmetry 2019, 11, 835.
  16. Jäntschi, L. Detecting extreme values with order statistics in samples from continuous distributions. Mathematics 2020, 8, 216.
  17. She, Y.; Owen, A.B. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 2011, 106, 626–639.
  18. Yuan, M.; Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika 2007, 94, 19–35.
  19. Reinsel, G.C.; Velu, R.P. Reduced-Rank Regression Model. In Multivariate Reduced-Rank Regression, Theory and Applications, 1st ed.; Springer: New York, NY, USA, 1998; pp. 15–55.
  20. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 2006, 68, 49–67.
  21. Mazumder, R.; Hastie, T. The graphical lasso: New insights and alternatives. Electron. J. Stat. 2012, 6, 2125–2149.
  22. Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2007, 9, 432–441.
  23. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 1996, 58, 267–288.
  24. She, Y. Thresholding-based iterative selection procedures for model selection and shrinkage. Electron. J. Stat. 2009, 3, 384–415.
  25. She, Y. An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Comput. Stat. Data Anal. 2012, 56, 2976–2990.
  26. Trevethan, R. Sensitivity, specificity, and predictive values: Foundations, pliabilities, and pitfalls in research and practice. Front. Public Health 2017, 5, 307.
  27. Rothman, A.; Levina, E.; Zhu, J. Sparse multivariate regression with covariance estimation. J. Comput. Graph. Stat. 2010, 19, 947–962.
Table 1. Comparison of four methods in terms of prediction errors with rank r = 3. The means and SEs (in parentheses) are given based on 50 simulation runs.

n | p | p0 | m | q | Σε | Cov-SR4 | SRRR | R4 | Cov-SRRR2
50 | 10 | 5 | 5 | 0 | ar(0) | 0.448 (0.272) | 0.548 (0.212) | 0.370 (0.228) | 0.452 (0.274)
50 | 10 | 5 | 5 | 0 | ar(0.5) | 0.361 (0.324) | 0.470 (0.247) | 0.355 (0.259) | 0.369 (0.254)
50 | 10 | 5 | 5 | 0 | ar(0.9) | 0.312 (0.270) | 0.478 (0.242) | 0.329 (0.250) | 0.315 (0.268)
50 | 10 | 5 | 5 | 1 | ar(0) | 0.484 (0.240) | 0.636 (0.203) | 0.542 (0.211) | 1.056 (0.552)
50 | 10 | 5 | 5 | 1 | ar(0.5) | 0.394 (0.234) | 0.571 (0.205) | 0.488 (0.207) | 0.802 (0.465)
50 | 10 | 5 | 5 | 1 | ar(0.9) | 0.343 (0.181) | 0.524 (0.188) | 0.428 (0.184) | 0.851 (0.665)
100 | 20 | 10 | 5 | 0 | ar(0) | 0.275 (0.035) | 0.402 (0.047) | 0.267 (0.029) | 0.276 (0.036)
100 | 20 | 10 | 5 | 0 | ar(0.5) | 0.330 (0.091) | 0.451 (0.056) | 0.344 (0.082) | 0.332 (0.092)
100 | 20 | 10 | 5 | 0 | ar(0.9) | 0.507 (0.098) | 0.626 (0.086) | 0.408 (0.112) | 0.512 (0.095)
100 | 20 | 10 | 5 | 1 | ar(0) | 0.395 (0.120) | 0.584 (0.133) | 0.425 (0.170) | 1.292 (0.311)
100 | 20 | 10 | 5 | 1 | ar(0.5) | 0.417 (0.143) | 0.578 (0.179) | 0.416 (0.151) | 1.835 (1.445)
100 | 20 | 10 | 5 | 1 | ar(0.9) | 0.368 (0.151) | 0.561 (0.153) | 0.403 (0.177) | 1.312 (0.773)
Table 2. Comparison of four methods in terms of variable selection accuracy with rank r = 3. Each cell gives (Sen, Spe), where Sen is sensitivity and Spe is specificity.

Variable selection for the coefficient matrix B:
n | p | p0 | m | q | Σε | Cov-SR4 | SRRR | R4 | Cov-SRRR2
50 | 10 | 5 | 5 | 0 | ar(0) | 1, 0.96 | 1, 0.98 | -, - | 1, 0.96
50 | 10 | 5 | 5 | 0 | ar(0.5) | 0.98, 0.98 | 0.99, 0.96 | -, - | 0.98, 1
50 | 10 | 5 | 5 | 0 | ar(0.9) | 0.99, 1 | 0.98, 0.99 | -, - | 1, 0.99
50 | 10 | 5 | 5 | 1 | ar(0) | 1, 0.98 | 1, 0.84 | -, - | 0.98, 0.96
50 | 10 | 5 | 5 | 1 | ar(0.5) | 0.98, 0.98 | 0.99, 0.86 | -, - | 0.98, 1
50 | 10 | 5 | 5 | 1 | ar(0.9) | 0.98, 0.99 | 0.94, 0.96 | -, - | 0.93, 0.99
100 | 20 | 10 | 5 | 0 | ar(0) | 0.99, 0.99 | 1, 0.99 | -, - | 0.99, 1
100 | 20 | 10 | 5 | 0 | ar(0.5) | 1, 0.98 | 0.99, 0.99 | -, - | 0.99, 0.98
100 | 20 | 10 | 5 | 0 | ar(0.9) | 1, 0.99 | 1, 1 | -, - | 0.99, 1
100 | 20 | 10 | 5 | 1 | ar(0) | 1, 0.98 | 0.96, 0.99 | -, - | 0.96, 0.99
100 | 20 | 10 | 5 | 1 | ar(0.5) | 0.98, 0.98 | 0.99, 0.98 | -, - | 0.97, 0.98
100 | 20 | 10 | 5 | 1 | ar(0.9) | 1, 0.99 | 1, 1 | -, - | 0.99, 1

Variable selection for the precision matrix Ω:
n | p | p0 | m | q | Σε | Cov-SR4 | SRRR | R4 | Cov-SRRR2
50 | 10 | 5 | 5 | 0 | ar(0) | -, 0.92 | -, - | -, - | -, 0.92
50 | 10 | 5 | 5 | 0 | ar(0.5) | 0.30, 0.90 | -, - | -, - | 0.32, 0.88
50 | 10 | 5 | 5 | 0 | ar(0.9) | 0.50, 0.68 | -, - | -, - | 0.52, 0.63
50 | 10 | 5 | 5 | 1 | ar(0) | -, 0.95 | -, - | -, - | -, 0.67
50 | 10 | 5 | 5 | 1 | ar(0.5) | 0.58, 0.80 | -, - | -, - | 0.52, 0.68
50 | 10 | 5 | 5 | 1 | ar(0.9) | 0.73, 0.57 | -, - | -, - | 0.65, 0.47
100 | 20 | 10 | 5 | 0 | ar(0) | -, 0.98 | -, - | -, - | -, 0.98
100 | 20 | 10 | 5 | 0 | ar(0.5) | 0.40, 0.98 | -, - | -, - | 0.42, 0.98
100 | 20 | 10 | 5 | 0 | ar(0.9) | 0.86, 0.58 | -, - | -, - | 0.86, 0.67
100 | 20 | 10 | 5 | 1 | ar(0) | -, 0.97 | -, - | -, - | -, 0.50
100 | 20 | 10 | 5 | 1 | ar(0.5) | 0.57, 0.87 | -, - | -, - | 0.55, 0.53
100 | 20 | 10 | 5 | 1 | ar(0.9) | 0.69, 0.71 | -, - | -, - | 0.56, 0.48
Table 3. Comparison of four methods in terms of running time with rank r = 3. Average running time (s) based on 50 simulation runs.

n | p | p0 | m | q | Cov-SR4 | SRRR | R4 | Cov-SRRR2
50 | 10 | 5 | 5 | 0 | 64.58 | 4.44 | 51.12 | 47.39
50 | 10 | 5 | 5 | 1 | 103.19 | 4.65 | 81.51 | 77.03
100 | 20 | 10 | 5 | 0 | 176.41 | 4.51 | 116.28 | 68.22
100 | 20 | 10 | 5 | 1 | 241.49 | 4.47 | 146.68 | 123.85
Table 4. Estimated coefficient matrix B of Cov-SR4 for stock data.

 | Wal | Exx | GM | Ford | GE | CPhil | Citi | IBM | AIG
Walmart | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Exxon | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
GM | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Ford | −0.008 | −0.004 | −0.016 | −0.002 | −0.006 | −0.006 | −0.002 | −0.005 | −0.006
GE | 0.047 | 0.028 | 0.100 | 0.012 | 0.035 | 0.040 | 0.009 | 0.030 | 0.037
CPhillips | −0.138 | −0.082 | −0.292 | −0.034 | −0.103 | −0.116 | −0.028 | −0.089 | −0.107
Citigroup | 0.133 | 0.079 | 0.281 | 0.033 | 0.099 | 0.112 | 0.027 | 0.085 | 0.103
IBM | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
AIG | 0.130 | 0.077 | 0.276 | 0.032 | 0.097 | 0.110 | 0.026 | 0.084 | 0.101
Table 5. Inverse error covariance estimate of Cov-SR4 for stock data.

 | Wal | Exx | GM | Ford | GE | CPhil | Citi | IBM | AIG
Walmart | 1547.025 | 0.000 | −214.459 | 0.000 | 0.000 | 91.522 | 0.000 | 0.000 | 0.000
Exxon | 0.000 | 3037.352 | 0.000 | 0.000 | 0.000 | −936.069 | 0.000 | 0.000 | −45.210
GM | −214.459 | 0.000 | 2116.922 | −1103.131 | −167.819 | 0.000 | −255.571 | 0.000 | −20.347
Ford | 0.000 | 0.000 | −1103.131 | 1007.451 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
GE | 0.000 | 0.000 | −167.819 | 0.000 | 2031.920 | 0.000 | −118.701 | −1013.879 | −118.590
CPhillips | 91.522 | −936.069 | 0.000 | 0.000 | 0.000 | 2233.820 | 0.000 | 0.000 | −96.543
Citigroup | 0.000 | 0.000 | −255.571 | 0.000 | −118.701 | 0.000 | 2918.238 | 0.000 | −257.842
IBM | 0.000 | 0.000 | 0.000 | 0.000 | −1013.879 | 0.000 | 0.000 | 3203.400 | 0.000
AIG | 0.000 | −45.210 | −20.347 | 0.000 | −118.590 | −96.543 | −257.842 | 0.000 | 1668.546
Table 6. Estimated coefficient matrix B of Cov-SR4 for US COVID-19 data.

 | Ala | FL | Ge | Lou | Mis | NC | SC | Ten
Alabama | 0.550 | 0.055 | 0.027 | −0.201 | −0.076 | −0.127 | 0.015 | 0.024
Florida | 0.207 | 0.222 | 0.009 | 0.016 | 0.127 | −0.021 | 0.333 | −0.074
Georgia | 0.037 | 0.116 | 0.677 | 0.032 | 0.110 | 0.151 | 0.031 | −0.027
Louisiana | 0.088 | 0.119 | −0.044 | 0.738 | −0.001 | 0.210 | 0.017 | −0.011
Mississippi | −0.125 | 0.316 | 0.141 | −0.134 | 0.778 | −0.106 | −0.225 | 0.094
North Carolina | −0.090 | 0.012 | 0.220 | 0.932 | −0.092 | 0.380 | −0.075 | 0.051
South Carolina | 0.050 | 0.207 | 0.008 | −0.010 | 0.088 | 0.072 | 0.789 | 0.025
Tennessee | 0.234 | −0.206 | 0.116 | 0.222 | 0.078 | 0.232 | 0.113 | 0.826
Table 7. Inverse error covariance estimate of Cov-SR4 for US COVID-19 data.

 | Ala | FL | Ge | Lou | Mis | NC | SC | Ten
Alabama | 5.119 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.188 | 0.000
Florida | 0.000 | 5.355 | 0.000 | −0.059 | 0.000 | 0.000 | 0.000 | 0.000
Georgia | 0.000 | 0.000 | 5.590 | 0.722 | 0.000 | 0.000 | −0.097 | 0.000
Louisiana | 0.000 | −0.059 | 0.722 | 1.482 | −0.171 | 0.000 | 0.000 | 0.000
Mississippi | 0.000 | 0.000 | 0.000 | −0.171 | 7.379 | 0.000 | 0.000 | 0.000
North Carolina | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 7.503 | 0.000 | 0.000
South Carolina | 0.188 | 0.000 | −0.097 | 0.000 | 0.000 | 0.000 | 4.171 | 0.000
Tennessee | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 19.458
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

