1. Introduction
Despite various types and goals, portfolio managing strategies can be classified into two main groups according to styles, namely, active and passive strategies. The former ones attempt to beat the market by exploiting market inefficiency, whereas the latter ones prefer to follow the market. However, the majority of actively managed funds do not outperform the market in the long run [
1]. On the contrary, passive funds provide market-level profits without taking an active risk. Passive funds become more popular in recent years. U.S. equity index fund assets have surpassed the assets of their actively managed counterparts for the first time, according to Morningstar’s latest fund flows report. This trend is also growing in other parts of the world.
The most commonly used passive investing strategy is index tracking, which aims to mimic the performance of a specified basket of underlying assets. Without exposure to active risk, an index tracking fund needs to follow the targeting index as closely as possible. The performance of an index tracker is measured by tracking error [
2], which is defined as the divergence between the index tracker and the targeting index. Tracking error gives investors a sense of how “tight” the index tracker goes after the index. Smaller tracking error means less exposure to active risk.
A straightforward method to construct an index tracking portfolio is full replication, which is to buy appropriate quantities of all assets composing the index. Theoretically, the performance of a full replication portfolio should match that of the targeting index perfectly. However, full replication portfolios often incur higher transaction and administration costs than portfolios of relatively fewer stocks. An alternative to full replication is sparse index tracking, which is to replicate the performance of an index by holding a representative sample of the securities in the index [
3,
4,
5]. The challenge of building a sparse index tracking portfolio lies in the trade-off between transaction costs and tracking efficiency. A sparse index tracking portfolio manager attempts to hold as few securities as possible to reduce the transaction costs and eliminate potential illiquid stocks and curb the tracking error to maintain tracking efficiency at the same time. Sparse index portfolios are often constructed via portfolio optimization. Such an algorithm-driven portfolio aims to minimize the tracking error in both training samples and testing samples. Specifically, index tracking optimization needs to incorporate certain constraints, such as no short selling and balance among industrial sectors. The latter constraint aims at taming sector risk of sparse portfolios that a set of factors particular to a sector drags down the sector’s overall performance in financial risk management. Upon the occurrence of events affecting the entire sector, such as sudden policy changes or technology innovation, the stock prices of many of the companies in the same sector may undergo drastic changes simultaneously. Unbalanced allocation of assets among different sectors can result in the failure of index tracking for a portfolio. In
Table 1, we present several periods when sector risk appears for China CSI300 Index. The table shows the performance of the index, the sector gainers and losers during the selected period. The large divergence between the index and the gainers or losers indicates that inappropriate exposure to sector risk can inflate the tracking error and undermine the portfolio performance. To manage the sector risk with index tracking, it is necessary to build a sector-neutral portfolio in which the total weight of stocks within each sector remains the same as that in the index. In particular, a sector-neutral strategy means not to overweight or underweight any given sector compared to its weight in the index, so as to ensure the performance of the portfolio will be stable and accordingly will be not affected by style switch in the market. Therefore, a sector-neutral portfolio can closely fit and help track the performance of a benchmark index, and essentially it turns out to be a passive investment strategy.
To overcome the drawbacks of full replication and accommodate the sector risk in traditional sparse portfolios, we propose a novel method to construct sparse index-tracking portfolios, assuming that short positions are forbidden and no cash is allowed in the constructed portfolios. Finally, we formulate the sparse index tracking problem with sector neutrality as the minimization of the tracking error under constraints of nonnegativity, sparsity, and sector neutrality.
2. Literature Review
There is a vast amount of research on quantitative methods for index tracking. Ref. [
6] studied the traditional Markowitz asset allocation problem and proposed an algorithm for designing a sparse index fund. This algorithm yields a locally optimal index portfolio with a fixed number of securities. Ref. [
7] investigated the relation between error measures in statistical tracking and asset allocation restrictions expressed as admissible weight ranges. It addressed the relationship between tracking errors caused by active portfolio management and given tactical asset allocation ranges. Ref. [
8] imposed the no-short-sale constraint on the Markowitz mean-variance optimization and gave an insightful explanation and demonstration of why the constraints help. Ref. [
9] considered maximizing the index fund’s tracking accuracy by rebalancing the composition of the index fund’s tracking portfolio in response to new market information and cash deposits and withdrawals from investors. Ref. [
10] investigated the inclusion of portfolio liquidity constraints for the construction of index tracking portfolios and proposed two liquidity modelling approaches for index tracking.
Many statistical regularization methods have been applied to the index tracking problem recently. In general, a sparse index tracking portfolio can be constructed via variable selection, which removes non-informative features and yields sparse models, especially for the high-dimensional data, and consequently facilitates inference, interpretability and prediction. Ref. [
11] formulated sparse index tracking problem as an optimization problem that minimizes the tracking error, subject to the number of selected assets less than or equal to a preset threshold. Ref. [
12] formulated the index tracking as a regression problem whose objective was to minimise the tracking error and adds a
penalty on weights corresponding to the amount to invest in each stock, then solved the optimization problem with stochastic neural networks. Ref. [
13] investigated the applications of sparse auto-encoder deep-learning architectures with
regularization in selecting representative stocks from the index constituents. Ref. [
14] analyzed the constraints effect on covariance regularization for index tracking, and developed an
and
norm constrained minimum-variance portfolio. Ref. [
15] reformulated the classical Markowitz mean-variance framework as a constrained least-squares regression problem and added a penalty function to construct a sparse index tracking portfolio. They emphasized that adding an
penalty on weights to the objective function may bring several advantages, including promoting sparsity and stabilizing the out-of-sample performance. Ref. [
16] offered deep mathematical insights into the utility approximations with the gross-exposure constraint and gave a theoretical justification for the empirical results by [
8]. These approaches consist in solving index tracking problems through constraining portfolio norms, for example, the
norm as implied in Lasso and the
norm as imposed in ridge regression. Besides, many other variable selection methods introduced in various fields can be applied to the index tracking problems as well. Truncated
penalty (TLP) [
17] has advantages over Lasso [
18] in that Lasso gives biased parameter estimates and possibly inconsistent variable selection. Intuitively, unlike Lasso which penalizes all variables, TLP does not penalize variables of large values, and thus enables us to incorporate more complicated constraints into optimization as in
Section 3. We asserted in
Section 1 that the novelty of our method lies in the construction of a sparse index tracking portfolio with exposure to sector risk. Sector neutrality can be achieved by constrained variable selection procedures. The constrained Lasso [
19] introduced additional equality and inequality constraints to the traditional Lasso method for asset allocation, which generalized the work of [
16]. However, with the constrained Lasso, the nonnegativity and sector neutrality constraints nullify the Lasso penalty and cause the method fails to give sparsity. More details will be discussed further in
Section 3. Ref. [
20] followed and added a ridge term into the objective function in constrained Lasso so that it may work in a high-dimensional case where sparsity is necessary.
Computationally solving a constrained variable selection problem is not an easy task, especially when complex penalty functions and constraints are used. The alternating direction method of multipliers (ADMM) algorithm [
21] is an efficient and scalable optimization algorithm for convex optimization. Ref. [
22] exploited the structure of distributed optimization framework and illustrated this framework via applications such as
mean and variance filtering. The idea of the ADMM algorithm is to break the original optimization into iterations of easier problems. However, optimization with the TLP term is not a convex problem. To solve this nonconvex problem, ref. [
17] presented an efficient algorithm based on a difference of convex (DC) decomposition [
23] and a coordinate descent (CD) algorithm [
24,
25]. The DC method decomposes the nonconvex constraint into a difference of two convex functions to produce a sequence of approximating convex constraints. Thus the original nonconvex optimization is broken down into a series of convex optimization problems. In minimization of a multivariate function, the CD algorithm iteratively minimizes the objective function with respect to each coordinate direction until convergence. This is an efficient algorithm for large-scale convex optimization problems.
In this paper, we propose a novel method to solve the sparse index tracking problem with sector neutrality. An error bound for variable selection is obtained for the method, and then variable selection consistency and asymptotic distribution are established for effective inference. An efficient minimization algorithm is developed by combining the ADMM algorithm, DC decomposition and CD algorithm. The new procedure is tested via numerical simulations under different data generation settings. An application is given to index tracking in the Chinese stock market. Both the numerical experiments and application confirm the good performance of our method in general.
This paper is organized as follows.
Section 3 formulates the sparse index tracking with sector neutrality as an optimization problem under constraints.
Section 4 discusses the theory for high-dimensional constrained variable selection.
Section 5 describes the algorithm for the optimization problem.
Section 6 presents the results of the simulated experiments.
Section 7 shows the application of the proposed method to index tracking portfolio construction. Summary and discussion are given in
Section 8. Technical proofs and some tables are relegated to the
Appendix A.
3. Methodology
From a statistical point of view, index tracking is a linear regression problem:
with the response
and the covariates
, where
n is the sample size and
p is the dimension of covariates.
is the parameter of covariates
, and
is the error term, we assume
, where
I is the identity matrix. In index tracking,
and
represent the returns of an index and its constituents, respectively, and
is the weight vector of the index’s constituents.
Suppose that there are q sectors. We rewrite the covariates , where are the covariates in the i-th sector, and is the number of stocks in the i-th sector, . Accordingly, we write , and .
For sparse index tracking, three categories of constraints are under consideration:
- (1)
the sparsity constraint within sector: ;
- (2)
the sector neutrality constraint: , for ;
- (3)
the nonnegativity constraint: , for .
Here are constants satisfying . Each is the sum of the original constituent weights in the i-th sector. These original non-sparse weights are pre-specified by portfolio managers and hence are known in advance from the definition of the index.
The penalty function needs to be chosen carefully. The Lasso penalty is widely used in variable selection problems because of its ease of computation. Nevertheless, it turns out that the Lasso penalty fails to yield a sparse portfolio under sector neutrality and nonnegativity constraints for index tracking, as explained below. Under the nonnegativity constraint, the Lasso penalty becomes , while the sector neutrality constraint makes this sum a constant: . Therefore, the Lasso penalty becomes invalid and gives no penalty on the portfolio size.
To solve this problem, we take
to be the truncated
penalty (TLP) function [
17], which can achieve sparse portfolio selection under the sector neutrality and nonnegativity constraints. TLP can be thought of as a truncated version of the
penalty approximating the
-function. As a piecewise linear function, TLP permits efficient computation and adaptive variable selection. TLP is defined as
where
is a tuning parameter controlling the degree of approximation. With the TLP function, the estimation of coefficients
is obtained by minimization:
where
is the tracking error. This is the dual problem corresponding to the constrained primal problem
The dual problem is not equivalent to the constrained primal problem in general. However, with the TLP as the penalty function, the two optimization problems are equivalent [
17]. In the remainder of the paper, we will consider only the unconstrained dual problem for its computational advantages. Minimization of (
2) reduces to a general weighted Lasso problem, which can be solved by many efficient algorithms [
18,
26,
27].
The tracking error
can assume many forms [
28], for example, the empirical tracking error (ETE), defined as
With TLP and the empirical tracking error, we can rewrite (
2) as
4. Theory
Let us first introduce some notations. The true value of
is denoted by
. Without loss of generality, we assume that the last component
of
is nonzero for
. By virtue of the sector neutrality constraint, we can solve for the last component
in each sector:
For any subscripts set , we use to denote the matrix consisting of the columns of with subscripts in ; likewise, comprises all the elements of with subscripts in .
Let
be the set of subscripts of all nonzero true coefficients in
. We denote the cardinality of
as
, and the minimal value of nonzero true coefficients as
.
By (
5), the regression Equation (
1) can be rewritten as
We expand
to be the oracle estimator
by letting
if
, and obtaining
by Equation (
5).
Let
be the global minimizer of (
4), and let
be the estimated set of nonzero coefficients.
We now give two key assumptions when establishing theoretical results regarding the error bound and asymptotic distribution of the coefficient estimator, and model selection consistency.
Assumption 1. For some constants and , , where Assumption 2. For some constants and ,where and denote the minimum and maximum eigenvalues of a matrix, respectively. Theorem 1. Assume that Assumption 1 holds, and . Then is upper bounded bywhere is the distribution function of . Theorem 2. Assume there are constants , , , and satisfyingwhere for two sequences and we say if both sequences and are bounded. Also assume that the conditions of Theorem 1 and Assumption 2 hold. Then - (A)
- (B)
Asymptotic distribution: Let denote the distribution function of
, and the distribution function of -dimensional standard multivariate normal distribution. We have
Remark 1. Theorems 1 and 2 stabilize the index tracking problem. They encourage the non-negativity and sector neutrality in constructing sparse index tracking portfolios in high dimensional cases and allow practical and empirical work with only a moderate size of training data. The proofs of Theorems 1 and 2 are displayed in Appendix A.1. 5. Computation
The minimization of (
4) with TLP can be treated as a sequence of weighted Lasso problems [
17] and can be solved iteratively. However, the sector neutrality constraint and the nonnegativity constraint in (
4) are not easy to handle directly. Fortunately, the alternating direction method of multipliers (ADMM) algorithm [
21,
22] can be applied to solve the constrained minimization problem (
4).
We are now to put our optimization problem in the ADMM framework. First, using the Lagrangian multiplier method, we transform the original optimization into minimization of the objective function
over the new parameter space
:
It is straightforward to verify that the new parameter space
is convex and the objective function
f in (
8) is a convex loss function plus a piecewise linear regularization function. Using the results in [
21,
29], our optimization problem is equivalent to
where
is the indicator function of space
(i.e.,
= 0 for
, and
for
). This is a standard starting form of the ADMM algorithm. Before following three steps in the ADMM algorithm, we write the augmented Lagrangian for (
9),
where
is a scaled dual variable associated with the constraint
. Here
is a penalty parameter.
In each iteration of ADMM, we perform alternating minimization of the augmented Lagrangian over
and
. At the
k-th iteration we update variables
,
and
by the following steps:
where
denotes the Euclidean projection onto space
. In the first step of ADMM, we fix
and
and update the value of
by minimization of the augmented Lagrangian; in the second step, we fix
and
and update the value of
by projection onto space
; and finally, we we fix
and
and update the dual variable
. Algorithm 1 summarizes the framework of our algorithm and the details are referred to
Appendix A.2.
Algorithm 1 ADMM algorithm for the minimization of (10). |
(Initialization) Let , , be the initial parameter vectors; (Iteration) At each iteration k, Update using Algorithm A1 (see Appendix A.2), by the projection ( 12), and by direct calculation ( 13); (Termination) Stop when , , converge.
|
6. Simulation Results
In this section, we show the performance of our estimation procedure via simulations and discuss the selection of appropriate tuning parameters
,
and
in the augmented Lagrangian (
10).
The detailed information about the simulation study, including the model and its variations, the methods to conduct the experiments and the choice of tuning parameters are given in
Appendix A.3.
The performance of various methods is evaluated by both variable selection performance and estimation accuracy. In terms of variable selection, we use four criteria: the mean true positive(TP), the mean false positive(FP), positive selection rate(PSR), and negative selection rate(NSR) [
30], respectively. The true positive(TP) is defined as
, which counts the variables with true non-zero coefficients and estimated non-zero coefficients. The false positive (FP) is defined as
, which counts the variables with true zero coefficients but estimated as non-zero coefficients. Additionally, PSR is the ratio of TP and the total number of the true non-zero coefficients. Similar to PSR, NSR is the ratio of FP and the total number of the true zero coefficients. Regarding estimation accuracy, we adopt the mean and the standard deviation of model errors (ME) as criteria, where the model error is defined as
, and the expectation is taken only with respect to new observation
. The running time of each method is evaluated using the machine with Intel Core i5 CPU, 2.4 GHz, 8 GB RAM. All methods were implemented in R.
6.1. Case 1
In this case
, five methods are compared. The first one is our method, named as constrained TLP method (CTLP). The second method uses the Lasso penalty for sparsity constraint instead of TLP, and we name it the constrained
method (C
). This method will fail to give a sparse portfolio because of the neutrality constraint, as is pointed out in
Section 3; we present the results of C
here for confirmation. The third method uses TLP for the sparsity constraint, but ignores the sector neutrality, and we refer to this method as TLP. The fourth and fifth methods are the index tracking procedures in the following form,
with
being empirical tracking error (ETE) and Huber empirical tracking error (HETE), which are denoted by ETE and HETE, respectively. In ETE,
is defined in (
3), while in HETE,
where
, and
with
M being the Huber parameter. Note that ETE and HETE methods ignore the sector-neutral constraints. These two methods can be carried out using R-package
sparseIndexTracking [
31].
We first investigate the case when the covariates
have an independent structure.
Table 2 shows the simulation results for both
and
settings. For the
setting, methods CTLP and C
have smaller model errors than ETE and HETE, and the model errors of ETE and HETE are much smaller than that of TLP. When it comes to the variable selection ability, all methods correctly select non-zero variables. CTLP, ETE and HETE perform similarly, and better than C
in that C
tends to select too many trivial variables and thus fails to produce sparse portfolios, which confirms our earlier assertion. On the other hand, TLP makes the fewest mistakes in selecting both trivial and non-trivial variables. When the variance of error terms becomes larger, the simulated data are noisier. For the
setting, CTLP and C
still have the smallest model error while TLP performs quite poorly in terms of model errors. Every method selects almost all nontrivial variables except for TLP, which selects slightly fewer nontrivial and trivial variables.
Now we investigate the simulation settings when the covariates
have
correlation, without and with group structures. Similar conclusions can be reached from
Table 3 and
Table 4. In the large signal-to-noise ratio setting (
), the model errors of CTLP and C
are similar and smaller than that of their competitors. Besides, CTLP, ETE and HETE maintain a good balance between selections of trivial and nontrivial variables whereas C
selects too many trivial variables. TLP gives good variable selection results but has poor estimation accuracy. In the small signal-to-noise ratio setting (
), the model errors of all five methods become larger than those in the case of the large signal-to-noise ratio setting above. However, the relative performances of the five methods are similar in terms of model errors and variable selection ability as well.
From the last columns in
Table 2,
Table 3 and
Table 4 we see that CTLP takes more time than other methods in training because of additional loops required to satisfy the sector neutrality constraint. In contrast, TLP and ETE consume the least time among all methods.
6.2. Case 2
When
, the variable selection becomes a high-dimensional problem and necessitates sparsity. Theoretically, the C
method with the Lasso penalty fails to yield sparse models due to sector neutrality and nonnegativity constraints. However, we can solve the convex optimization problem using some common convex optimizers like
CVXR [
32] to obtain an approximate solution C
.
Table 5,
Table 6 and
Table 7 present the results of this case.
The results are similar to that of the low-dimensional case, but all four methods have larger model errors and select fewer nontrivial variables. In the large signal-to-noise ratio setting (), CTLP has smaller model errors than other methods. CTLP, ETE and HETE have a similar ability in variable selection. TLP has larger model errors and it tends to select fewer nontrivial variables. The average model error of C is between CTLP and other methods. It usually selects about 20% trivial variables as in tables if we consider the numbers less than as 0. As the accuracy of numbers increases, more trivial variables are selected. No estimated coefficients of variables are exactly 0 with moderate accuracy, which proves that C is just an approximate solution and the term does not work. In the small signal-to-noise ratio setting (), there is no significant difference between CTLP and ETE in terms of model errors, but CTLP selects fewer trivial variables. As to the other two methods, HETE gives less accurate estimation, and TLP performs less well in terms of both variable selection and estimation. Although C yields the smallest model errors, it cannot provide the exact solution with the sparsity that we need. All five methods run faster than in the case due to the smaller sample size. CTLP is still the slowest one while ETE is the fastest one.
In conclusion, although CTLP takes more time to run, it performs similarly or better than the competitors in terms of both model errors and variable selection ability.
7. Real Data Results
Now we are to apply the proposed methodology to sparse index tracking for the CSI 300 Index. In this application, we use the daily return series of the CSI 300 Index and all stocks in this index from 2014 to 2018. The CSI 300 is a capitalization-weighted stock market index designed to replicate the performance of the top 300 stocks traded in the Shanghai and Shenzhen stock exchanges. The index is compiled by the China Securities Index and is considered a blue chip index for Mainland China stock exchanges. The return series, the methodology of the CSI 300 Index, the names and weights of the index constituents, and the corresponding sectors are available from China Securities Index (
http://www.csindex.com.cn/) (accessed on 31 September 2018).
According to the guidelines for the industry classification by the Global Industry Classification Standard (GICS) and the weights of constituents making up the CSI 300 Index, the stocks can be divided into 11 major sectors, including Materials, Communication services, Real estate, Industrials, Utilities, Financials, Consumer Discretionary, Energy, Consumer Staples, Information Technology and Health Care. In the composition of the CSI 300 index, the constituents, as well as their weights, are reviewed every six months. As a consequence, every six months, the stock numbers and the weights of sectors will change. Due to its dynamic properties, we must update our model periodically. We train and tune the model on the first day after constituent adjustment is implemented with the daily return series of the index and its renewed constituents before half a year. The total weights within sectors are calculated based on the renewed weights as well. Then the model will be tested with daily return series before the next adjustment day. These steps are in accordance with the practical procedure.
We treat the estimated coefficients as the weights of stocks in building an index-tracking portfolio. The daily tracking error or prediction error is adopted to evaluate the model performance. The daily tracking error measures the deviation of the index daily return from the portfolio daily return, defined as
, where
is the daily return of the index,
is the daily return of the constructed portfolio. We will compare our method CTLP with TLP, ETE and HETE as described in
Section 6.1 in terms of tracking errors. Since the training sample size is smaller than the number of index constituents, there is no exact solution without sparsity, but an approximate solution by a common convex optimizer. Thus, we show the C
solution as in
Section 6.2. For an additional comparison in the Lasso family, we give the performance of standard sparse group lasso [
33] by
SGL. To give a more intuitive view of the advantages of the proposed method, we also present the performance of portfolios created by traditional methods, including equal-weighted portfolio (EW) and inverse volatility weighted portfolio (IVW). Both portfolios consist of 300 stocks. The weights in the EW portfolio are equal while the weights in the IVW portfolio are inversely proportional to their historical volatility.
The CSI 300 index has been adjusted eight times from December 2014 to November 2018. Each time the constituents of the index were adjusted, we update the model and test it with the return series for the next six months.
Table 8 presents the mean and standard deviation of the daily tracking errors in each test set. It also shows the number of stocks building the portfolio. We highlight the method with the smallest mean daily tracking error in each period. It is clear that in a majority of periods, CTLP has the smallest mean tracking error among its competitors. Even in the period, such as 2017H1 and 2017H2, when CTLP is not the best, it is still very close to the best one. In terms of the standard deviation of tracking errors, CTLP has a slightly larger standard deviation than ETE and HETE, but significantly smaller than TLP. However, TLP and HETE tend to yield sparser portfolios than CTLP and ETE. As to tracking errors, C
performs not bad. But it has difficulties to give a sparse portfolio in several periods which violates our motivation to propose this research. The standard sparse group lasso (SGL) may provide negative coefficients but satisfy sector-neutral constraints. Although it has sparse enough solutions, its tracking errors are the largest. The traditional EW and IVW portfolios are not sparse and yield unstable performance. In some periods they track the index tightly, while in other periods they produce large tracking errors. We also present the average runtime of the first four methods in
Table 9, indicating that CTLP and HETE run more slowly than the other four methods. Even though the computation of CTLP requires longer computation time due to additional sector neutral constraints, it is still worthwhile for its competitive performance and the fact that this computation is not a frequent task.
As illustration, we display analysis results for the most recent two periods – the first and second half of 2018, in
Figure 1 and
Figure 2,
Table 10 and
Table 11.
Figure 1 and
Figure 2 draw the cumulative profit and loss of CTLP, TLP, ETE, HETE and the index. The closer of the cumulative profit and loss line to the red line, which represents that of the index, the better the method. It is clear that the CTLP replicates the index the best among all four methods.
Table 10 and
Table 11 present the number of stocks and the total weights of different sectors. Compared with the index, all portfolios built by five methods are sparse. However, the portfolio given by TLP consists of such a small number of stocks that it cannot track the index well. In the TLP portfolio, some sectors even have zero weights. CTLP and C
selected a portfolio with sector weights strictly equal to that of the index because of the sector neutrality constraint, whereas ETE and HETE produced sparse portfolios with sector weights only roughly equal to that of the index.
In summary, the proposed CTLP method demonstrates its advantages over the competitive methods in sparse index tracking. The sector neutrality constraint of CTLP guarantees the resulted sparse portfolio has the same sector risk exposure as the index. Most of the time, the CTLP method also gives smaller tracking errors. In addition, the non-negativity and sparsity of the CTLP portfolios are often desired properties in practical applications.
8. Summary
Motivated by a sparse index tracking problem, we propose a new method to do sparse variable selection under constraints. Our methodology extends the traditional variable selection with added constraints. Constraints either represent the lower dimensional structure of the data or special characteristics of practical applications.
In the sparse index tracking problem, sparsity, sector neutrality, and nonnegativity constraints are necessary to build an efficient, sector-risk neutral portfolio with lower transaction costs to track the performance of the index. We proved the consistency and asymptotic distribution for the constrained high-dimensional variable selection using our method. We also developed an efficient algorithm for the estimation of the stock weights of the sparse portfolio. Both simulations and the real application confirmed the validity and advantages of the new methodology.
In portfolio management applications, additional constraints may be incorporated into index tracking, for example, low volatility or size neutrality constraints. We leave these problems for future investigations.