Sparse Index Tracking Portfolio with Sector Neutrality

Che, Yuezhang; Chen, Shuyan; Liu, Xin

doi:10.3390/math10152645

by Yuezhang Che, Shuyan Chen and Xin Liu *

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China

* Author to whom correspondence should be addressed.

Mathematics 2022, 10(15), 2645; https://doi.org/10.3390/math10152645

Submission received: 4 July 2022 / Revised: 22 July 2022 / Accepted: 25 July 2022 / Published: 28 July 2022

(This article belongs to the Special Issue Applications of Quantitative Methods in Business and Economics Research)


Abstract:

As a popular passive investment strategy, sparse index tracking has advantages over full index replication because of higher liquidity and lower transaction costs. Sparsity and nonnegativity constraints are usually assumed in the construction of portfolios in sparse index tracking. However, none of the existing studies has considered the sector risk exposure of such portfolios: prices of stocks in one sector may fall at the same time because of sudden policy changes or unexpected events that affect the whole sector. Therefore, sector neutrality appears to be critical when building a sparse index tracking portfolio without full replication. The statistical approach to sparse index tracking is a constrained variable selection problem. However, the constrained variable selection procedure using Lasso fails to produce a sparse portfolio under sector neutrality constraints. In this paper, we propose a high-dimensional constrained variable selection method using TLP for building index tracking portfolios under sparsity, sector neutrality and nonnegativity constraints. Selection consistency is established for the proposed method, and the asymptotic distribution is obtained for the sparse portfolio weights estimator. We also develop an efficient iterative algorithm for the weight estimation. We examine the performance of the proposed methodology through simulations and an application to the CSI 300 index of China. The results demonstrate the validity and advantages of our methodology.

Keywords:

constrained variable selection; high-dimensional variable selection; sparse index tracking; sector neutrality; TLP; ADMM algorithm

MSC:

62H12; 62F12; 62F30; 62P05

1. Introduction

Despite their various types and goals, portfolio management strategies can be classified into two main groups according to style, namely, active and passive strategies. The former attempt to beat the market by exploiting market inefficiency, whereas the latter prefer to follow the market. However, the majority of actively managed funds do not outperform the market in the long run [1]. In contrast, passive funds provide market-level profits without taking active risk. Passive funds have become more popular in recent years. U.S. equity index fund assets have surpassed the assets of their actively managed counterparts for the first time, according to Morningstar’s latest fund flows report. This trend is also growing in other parts of the world.

The most commonly used passive investing strategy is index tracking, which aims to mimic the performance of a specified basket of underlying assets. Without exposure to active risk, an index tracking fund needs to follow the target index as closely as possible. The performance of an index tracker is measured by tracking error [2], which is defined as the divergence between the index tracker and the target index. Tracking error gives investors a sense of how tightly the index tracker follows the index. A smaller tracking error means less exposure to active risk.

A straightforward method to construct an index tracking portfolio is full replication, which is to buy appropriate quantities of all assets composing the index. Theoretically, the performance of a full replication portfolio should match that of the target index perfectly. However, full replication portfolios often incur higher transaction and administration costs than portfolios of relatively fewer stocks. An alternative to full replication is sparse index tracking, which is to replicate the performance of an index by holding a representative sample of the securities in the index [3,4,5]. The challenge of building a sparse index tracking portfolio lies in the trade-off between transaction costs and tracking efficiency. A sparse index tracking portfolio manager attempts to hold as few securities as possible, to reduce transaction costs and eliminate potentially illiquid stocks, while curbing the tracking error to maintain tracking efficiency. Sparse index portfolios are often constructed via portfolio optimization. Such an algorithm-driven portfolio aims to minimize the tracking error in both training samples and testing samples. Specifically, index tracking optimization needs to incorporate certain constraints, such as no short selling and balance among industrial sectors. The latter constraint aims at taming the sector risk of sparse portfolios, that is, the risk that a set of factors particular to a sector drags down the sector’s overall performance. Upon the occurrence of events affecting the entire sector, such as sudden policy changes or technology innovation, the stock prices of many of the companies in the same sector may undergo drastic changes simultaneously. Unbalanced allocation of assets among different sectors can result in the failure of index tracking for a portfolio. In Table 1, we present several periods when sector risk appeared for the China CSI 300 Index. The table shows the performance of the index and of the sector gainers and losers during the selected periods. The large divergence between the index and the gainers or losers indicates that inappropriate exposure to sector risk can inflate the tracking error and undermine the portfolio performance. To manage the sector risk with index tracking, it is necessary to build a sector-neutral portfolio in which the total weight of stocks within each sector remains the same as that in the index. In particular, a sector-neutral strategy means not overweighting or underweighting any given sector compared to its weight in the index, so that the performance of the portfolio is stable and is not affected by style switches in the market. Therefore, a sector-neutral portfolio can closely track the performance of a benchmark index, and it is essentially a passive investment strategy.

To overcome the drawbacks of full replication and accommodate the sector risk in traditional sparse portfolios, we propose a novel method to construct sparse index-tracking portfolios, assuming that short positions are forbidden and no cash is allowed in the constructed portfolios. Specifically, we formulate the sparse index tracking problem with sector neutrality as the minimization of the tracking error under constraints of nonnegativity, sparsity, and sector neutrality.

2. Literature Review

There is a vast amount of research on quantitative methods for index tracking. Ref. [6] studied the traditional Markowitz asset allocation problem and proposed an algorithm for designing a sparse index fund. This algorithm yields a locally optimal index portfolio with a fixed number of securities. Ref. [7] investigated the relation between error measures in statistical tracking and asset allocation restrictions expressed as admissible weight ranges. It addressed the relationship between tracking errors caused by active portfolio management and given tactical asset allocation ranges. Ref. [8] imposed the no-short-sale constraint on the Markowitz mean-variance optimization and gave an insightful explanation and demonstration of why the constraints help. Ref. [9] considered maximizing the index fund’s tracking accuracy by rebalancing the composition of the index fund’s tracking portfolio in response to new market information and cash deposits and withdrawals from investors. Ref. [10] investigated the inclusion of portfolio liquidity constraints for the construction of index tracking portfolios and proposed two liquidity modelling approaches for index tracking.

Many statistical regularization methods have been applied to the index tracking problem recently. In general, a sparse index tracking portfolio can be constructed via variable selection, which removes non-informative features and yields sparse models, especially for high-dimensional data, and consequently facilitates inference, interpretability and prediction. Ref. [11] formulated the sparse index tracking problem as an optimization problem that minimizes the tracking error, subject to the number of selected assets being less than or equal to a preset threshold. Ref. [12] formulated index tracking as a regression problem whose objective was to minimize the tracking error, added an $L_{0}$ penalty on the weights corresponding to the amount to invest in each stock, and then solved the optimization problem with stochastic neural networks. Ref. [13] investigated the applications of sparse auto-encoder deep-learning architectures with $L_{1}$ regularization in selecting representative stocks from the index constituents. Ref. [14] analyzed the effect of constraints on covariance regularization for index tracking, and developed an $L_{1}$ and $L_{2}$ norm constrained minimum-variance portfolio. Ref. [15] reformulated the classical Markowitz mean-variance framework as a constrained least-squares regression problem and added a penalty function to construct a sparse index tracking portfolio. They emphasized that adding an $L_{1}$ penalty on weights to the objective function may bring several advantages, including promoting sparsity and stabilizing the out-of-sample performance. Ref. [16] offered deep mathematical insights into the utility approximations with the gross-exposure constraint and gave a theoretical justification for the empirical results of [8]. These approaches solve index tracking problems by constraining portfolio norms, for example, the $L_{1}$ norm as implied in Lasso and the $L_{2}$ norm as imposed in ridge regression. Besides, many other variable selection methods introduced in various fields can be applied to index tracking problems as well. The truncated $L_{1}$ penalty (TLP) [17] has advantages over Lasso [18] in that Lasso gives biased parameter estimates and possibly inconsistent variable selection. Intuitively, unlike Lasso, which penalizes all variables, TLP does not penalize variables of large values, and thus enables us to incorporate more complicated constraints into optimization as in Section 3. We asserted in Section 1 that the novelty of our method lies in the construction of a sparse index tracking portfolio that accounts for exposure to sector risk. Sector neutrality can be achieved by constrained variable selection procedures. The constrained Lasso [19] introduced additional equality and inequality constraints to the traditional Lasso method for asset allocation, which generalized the work of [16]. However, with the constrained Lasso, the nonnegativity and sector neutrality constraints nullify the Lasso penalty and cause the method to fail to produce sparsity. More details are discussed in Section 3. Ref. [20] followed this line and added a ridge term to the objective function of the constrained Lasso so that it may work in a high-dimensional case where sparsity is necessary.

Computationally solving a constrained variable selection problem is not an easy task, especially when complex penalty functions and constraints are used. The alternating direction method of multipliers (ADMM) algorithm [21] is an efficient and scalable optimization algorithm for convex optimization. Ref. [22] exploited the structure of a distributed optimization framework and illustrated this framework via applications such as $L_{1}$ mean and variance filtering. The idea of the ADMM algorithm is to break the original optimization into iterations of easier problems. However, optimization with the TLP term is not a convex problem. To solve this nonconvex problem, ref. [17] presented an efficient algorithm based on a difference of convex (DC) decomposition [23] and a coordinate descent (CD) algorithm [24,25]. The DC method decomposes the nonconvex constraint into a difference of two convex functions to produce a sequence of approximating convex constraints. Thus the original nonconvex optimization is broken down into a series of convex optimization problems. In minimization of a multivariate function, the CD algorithm iteratively minimizes the objective function with respect to each coordinate direction until convergence. This is an efficient algorithm for large-scale convex optimization problems.

In this paper, we propose a novel method to solve the sparse index tracking problem with sector neutrality. An error bound for variable selection is obtained for the method, and then variable selection consistency and asymptotic distribution are established for effective inference. An efficient minimization algorithm is developed by combining the ADMM algorithm, DC decomposition and CD algorithm. The new procedure is tested via numerical simulations under different data generation settings. An application is given to index tracking in the Chinese stock market. Both the numerical experiments and application confirm the good performance of our method in general.

This paper is organized as follows. Section 3 formulates sparse index tracking with sector neutrality as an optimization problem under constraints. Section 4 discusses the theory for high-dimensional constrained variable selection. Section 5 describes the algorithm for the optimization problem. Section 6 presents the results of the simulated experiments. Section 7 shows the application of the proposed method to index tracking portfolio construction. A summary and discussion are given in Section 8. Technical proofs and some tables are relegated to Appendix A.

3. Methodology

From a statistical point of view, index tracking is a linear regression problem:

$Y = X β + ϵ,$

(1)

with the response $Y \in R^{n \times 1}$ and the covariates $X \in R^{n \times p}$, where n is the sample size and p is the dimension of the covariates. Here $β \in R^{p}$ is the coefficient vector of the covariates $X$, and $ϵ$ is the error term; we assume $ϵ \sim N (0, σ^{2} I)$, where I is the identity matrix. In index tracking, $Y$ and $X$ represent the returns of an index and its constituents, respectively, and $β$ is the weight vector of the index’s constituents.

Suppose that there are q sectors. We rewrite the covariates as $X = (X_{1}, \dots, X_{q})$, where $X_{i} = (X_{i, 1}, \dots, X_{i, g_{i}})$ are the covariates in the i-th sector, and $g_{i}$ is the number of stocks in the i-th sector, $i \in \{1, \dots, q\}$. Accordingly, we write $β = {(β_{1}^{⊤}, \dots, β_{q}^{⊤})}^{⊤}$, and $β_{i} = {(β_{i, 1}, \dots, β_{i, g_{i}})}^{⊤}$.

For sparse index tracking, three categories of constraints are under consideration:

(1): the sparsity constraint within sector: $\sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} J (β_{i, j}) \leq K$ ;
(2): the sector neutrality constraint: $\sum_{j = 1}^{g_{i}} β_{i, j} = ω_{i}$ , for $i \in \{1, \dots, q\}$ ;
(3): the nonnegativity constraint: $β_{i, j} \geq 0$ , for $i \in \{1, \dots, q\}, j \in \{1, \dots, g_{i}\}$ .

Here $ω_{i}$ are constants satisfying $\sum_{i = 1}^{q} ω_{i} = 1$. Each $ω_{i}$ is the sum of the original constituent weights in the i-th sector. These original non-sparse weights are pre-specified by portfolio managers and hence are known in advance from the definition of the index.

The penalty function $J (\cdot)$ needs to be chosen carefully. The Lasso penalty is widely used in variable selection problems because of its ease of computation. Nevertheless, it turns out that the Lasso penalty fails to yield a sparse portfolio under sector neutrality and nonnegativity constraints for index tracking, as explained below. Under the nonnegativity constraint, the Lasso penalty $\sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} | β_{i, j} |$ becomes $\sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} β_{i, j}$, while the sector neutrality constraint makes this sum a constant: $\sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} β_{i, j} = \sum_{i = 1}^{q} ω_{i} = 1$. Therefore, the Lasso penalty becomes ineffective and imposes no penalty on the portfolio size.
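The argument above can be checked numerically. The following is a minimal sketch with two hypothetical sectors (the weights and sector totals are made up for illustration): a dense and a sparse portfolio that both satisfy nonnegativity and sector neutrality receive the same Lasso penalty, so the penalty cannot distinguish between them.

```python
# Hypothetical example: two sectors with index weights omega_1 = 0.6, omega_2 = 0.4.

def lasso_penalty(weights):
    """L1 norm of a flat weight vector."""
    return sum(abs(w) for w in weights)

def flatten(portfolio):
    """Concatenate per-sector weight lists into one flat list."""
    return [w for sector in portfolio for w in sector]

def n_holdings(portfolio):
    """Number of stocks with a nonzero weight."""
    return sum(1 for w in flatten(portfolio) if w != 0)

# Both portfolios are nonnegative and have sector sums (0.6, 0.4).
dense = [[0.2, 0.2, 0.2], [0.1, 0.3]]
sparse = [[0.6, 0.0, 0.0], [0.0, 0.4]]

for name, portfolio in (("dense", dense), ("sparse", sparse)):
    penalty = lasso_penalty(flatten(portfolio))
    print(name, "holdings:", n_holdings(portfolio), "L1 penalty:", round(penalty, 12))
```

Both portfolios have an L1 penalty of 1 (up to floating-point rounding), although one holds five stocks and the other only two.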

To solve this problem, we take $J (\cdot)$ to be the truncated $L_{1}$ penalty (TLP) function [17], which can achieve sparse portfolio selection under the sector neutrality and nonnegativity constraints. TLP can be thought of as a truncated version of the $L_{1}$ penalty approximating the $L_{0}$-function. As a piecewise linear function, TLP permits efficient computation and adaptive variable selection. TLP is defined as

$J (| z |) = \min (\frac{| z |}{τ}, 1),$

where $τ > 0$ is a tuning parameter controlling the degree of approximation. With the TLP function, the estimate of the coefficients $β$ is obtained by minimization:

$\min_{β} TE (β) + λ \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} J (| β_{i, j} |),$

(2)

where $TE (β)$ is the tracking error. This is the dual problem corresponding to the constrained primal problem

$\min_{β} TE (β), \text{ subject to } \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} J (| β_{i, j} |) \leq K .$

The dual problem is not equivalent to the constrained primal problem in general. However, with the TLP as the penalty function, the two optimization problems are equivalent [17]. In the remainder of the paper, we will consider only the unconstrained dual problem for its computational advantages. Minimization of (2) reduces to a general weighted Lasso problem, which can be solved by many efficient algorithms [18,26,27].
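To illustrate why TLP still penalizes portfolio size when the weights are nonnegative and sum to one, here is a minimal sketch of the penalty. The value of τ and the two weight vectors are hypothetical choices for illustration only.

```python
def tlp(z, tau):
    """Truncated L1 penalty J(|z|) = min(|z| / tau, 1)."""
    return min(abs(z) / tau, 1.0)

def tlp_penalty(weights, tau):
    """Sum of TLP values over all portfolio weights."""
    return sum(tlp(w, tau) for w in weights)

tau = 0.01  # hypothetical tuning parameter

# Two feasible portfolios with the same sector sums (0.6 and 0.4).
dense = [0.2, 0.2, 0.2, 0.1, 0.3]
sparse = [0.6, 0.0, 0.0, 0.0, 0.4]

print(tlp_penalty(dense, tau))   # every weight exceeds tau, so each contributes 1
print(tlp_penalty(sparse, tau))  # only the two nonzero weights contribute
```

When every nonzero weight exceeds τ, the TLP sum equals the number of holdings, which is the sense in which TLP approximates the $L_{0}$-function.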

The tracking error $TE (β)$ can assume many forms [28], for example, the empirical tracking error (ETE), defined as

$TE (β) = \frac{1}{n} {∥ Y - X β ∥}_{2}^{2} .$

(3)

With TLP and the empirical tracking error, we can rewrite (2) as

$\begin{aligned} \min_{β} \quad & f (β) = n^{- 1} {∥ Y - X β ∥}_{2}^{2} + λ \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} \min \{\frac{| β_{i, j} |}{τ}, 1\}, \\ \text{subject to} \quad & \sum_{j = 1}^{g_{i}} β_{i, j} = ω_{i}, \quad i \in \{1, \dots, q\}, \\ & β_{i, j} \geq 0, \quad i \in \{1, \dots, q\}, \ j \in \{1, \dots, g_{i}\} . \end{aligned}$

(4)
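As a concrete reading of problem (4), the sketch below evaluates the objective $f (β)$ and checks the two constraints for a candidate weight vector. The function names, the index-list representation of sectors, and the toy data are assumptions made for this illustration; they are not part of the paper's implementation.

```python
def objective(Y, X, beta, lam, tau):
    """f(beta) = (1/n) * ||Y - X beta||^2 + lam * sum_j min(|beta_j| / tau, 1)."""
    n = len(Y)
    residuals = [y - sum(x * b for x, b in zip(row, beta)) for y, row in zip(Y, X)]
    ete = sum(r * r for r in residuals) / n
    penalty = sum(min(abs(b) / tau, 1.0) for b in beta)
    return ete + lam * penalty

def is_feasible(beta, sectors, omega, tol=1e-9):
    """Check nonnegativity and sector neutrality.

    sectors: list of index lists, one per sector (hypothetical representation).
    omega:   list of sector weights in the index.
    """
    if any(b < -tol for b in beta):
        return False
    return all(abs(sum(beta[j] for j in idx) - w) <= tol
               for idx, w in zip(sectors, omega))

# Toy data: 4 periods, 3 stocks, two sectors {0, 1} and {2}.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
beta = [0.5, 0.0, 0.5]
Y = [sum(x * b for x, b in zip(row, beta)) for row in X]  # noiseless index returns

print(objective(Y, X, beta, lam=0.1, tau=0.01))
print(is_feasible(beta, sectors=[[0, 1], [2]], omega=[0.5, 0.5]))
```

With noiseless data the tracking error term vanishes, so the objective reduces to λ times the number of holdings whose weight exceeds τ.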

4. Theory

Let us first introduce some notation. The true value of $β$ is denoted by $β^{*}$. Without loss of generality, we assume that the last component $β_{i, g_{i}}^{*}$ of $β_{i}^{*}$ is nonzero for $i \in \{1, \dots, q\}$. By virtue of the sector neutrality constraint, we can solve for the last component $β_{i, g_{i}}$ in each sector:

$β_{i, g_{i}} = ω_{i} - \sum_{j = 1}^{g_{i} - 1} β_{i, j}, \quad i \in \{1, \dots, q\} .$

(5)

We write

$\begin{aligned} β_{- q} & = {(β_{1, 1}, \dots, β_{1, g_{1} - 1}, \dots, β_{q, 1}, \dots, β_{q, g_{q} - 1})}^{⊤}, \\ X_{- q} & = (X_{1, 1}, \dots, X_{1, g_{1} - 1}, \dots, X_{q, 1}, \dots, X_{q, g_{q} - 1}), \\ X_{G} & = (X_{1, 1} - X_{1, g_{1}}, \dots, X_{1, g_{1} - 1} - X_{1, g_{1}}, \dots, X_{q, 1} - X_{q, g_{q}}, \dots, X_{q, g_{q} - 1} - X_{q, g_{q}}) . \end{aligned}$

For any set of subscripts $A \subset \{(i, j) : i \in \{1, \dots, q\}, j \in \{1, \dots, g_{i} - 1\}\}$, we use $X_{A}$ to denote the matrix consisting of the columns of $X_{G}$ with subscripts in $A$; likewise, $β_{A}$ comprises all the elements of $β_{- q}$ with subscripts in $A$.

Let $A^{*} = \{(i, j) : β_{i, j}^{*} > 0, i \in \{1, \dots, q\}, j \in \{1, \dots, g_{i} - 1\}\}$ be the set of subscripts of all nonzero true coefficients in $β_{- q}^{*}$. We denote the cardinality of $A^{*}$ by $p_{0} = | A^{*} |$, and the minimal value of the nonzero true coefficients by $γ_{min} = \min \{β_{i, j}^{*} : β_{i, j}^{*} > 0, i \in \{1, \dots, q\}, j \in \{1, \dots, g_{i}\}\}$.

By (5), the regression Equation (1) can be rewritten as

$\begin{aligned} Y - \sum_{i = 1}^{q} ω_{i} X_{i, g_{i}} & = X_{- q} β_{- q}^{*} - \sum_{i = 1}^{q} X_{i, g_{i}} \sum_{j = 1}^{g_{i} - 1} β_{i, j}^{*} + ϵ \\ & = X_{G} β_{- q}^{*} + ϵ \\ & = X_{A^{*}} β_{A^{*}}^{*} + ϵ . \end{aligned}$

(6)
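The substitution behind (6) can be verified numerically. The sketch below builds $X_{G}$ and the offset $\sum_{i} ω_{i} X_{i, g_{i}}$ for a small made-up design matrix and confirms that $X β$ equals the offset plus $X_{G} β_{- q}$ for a weight vector satisfying sector neutrality. The helper names and the convention that the last index of each sector is the eliminated stock are assumptions of this illustration.

```python
def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def reduce_design(X, sectors, omega):
    """Eliminate the last stock of each sector via the sector-sum constraint.

    Returns (X_G, offset) such that X beta = offset + X_G beta_minus_q
    whenever the weights in sector i sum to omega[i].
    """
    n = len(X)
    offset = [sum(w * X[r][idx[-1]] for idx, w in zip(sectors, omega))
              for r in range(n)]
    X_G = [[X[r][j] - X[r][idx[-1]] for idx in sectors for j in idx[:-1]]
           for r in range(n)]
    return X_G, offset

# Hypothetical data: 3 periods, 5 stocks, sectors {0, 1, 2} and {3, 4}.
sectors = [[0, 1, 2], [3, 4]]
omega = [0.6, 0.4]
X = [[0.01, -0.02, 0.03, 0.00, 0.01],
     [0.02, 0.01, -0.01, 0.03, -0.02],
     [-0.01, 0.00, 0.02, 0.01, 0.02]]
beta = [0.1, 0.2, 0.3, 0.15, 0.25]  # sector sums are 0.6 and 0.4

X_G, offset = reduce_design(X, sectors, omega)
beta_minus_q = [beta[j] for idx in sectors for j in idx[:-1]]

lhs = matvec(X, beta)
rhs = [o + g for o, g in zip(offset, matvec(X_G, beta_minus_q))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # numerically zero
```

The reduced problem has $p - q$ free coefficients, one fewer per sector, which is why the theory in this section is stated in terms of $β_{- q}$.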

Define

${\hat{β}}_{A^{*}}^{o l s} = {(X_{A^{*}}^{⊤} X_{A^{*}})}^{- 1} X_{A^{*}}^{⊤} (Y - \sum_{i = 1}^{q} ω_{i} X_{i, g_{i}}) .$

We expand ${\hat{β}}_{A^{*}}^{o l s}$ to the oracle estimator ${\hat{β}}^{o l s}$ by letting ${\hat{β}}_{i, j}^{o l s} = 0$ if $β_{i, j}^{*} = 0$, and obtaining ${\hat{β}}_{i, g_{i}}^{o l s}$ by Equation (5).

Let ${\hat{β}}^{t l p}$ be the global minimizer of (4), and let

$\hat{A} = \{(i, j) : {\hat{β}}_{i, j}^{t l p} > 0, i \in \{1, \dots, q\}, j \in \{1, \dots, g_{i} - 1\}\}$

be the estimated set of nonzero coefficients.

We now give two key assumptions when establishing theoretical results regarding the error bound and asymptotic distribution of the coefficient estimator, and model selection consistency.

Assumption 1.

For some constants $d_{0} > 0$ and $α > 1$, $C_{min} \geq d_{0} σ^{2} \frac{\log p}{n}$, where

$C_{min} = \inf_{\{β_{A} : | A | \leq α | A^{*} |, A \neq A^{*}\}} \frac{{∥X_{A^{*}} β_{A^{*}}^{*} - X_{A} β_{A}∥}_{2}^{2}}{\max (| A^{*} ∖ A |, 1)} .$

Assumption 2.

For some constants $d_{1} > 0$ and $d_{2} > 0$,

$d_{1} \leq η_{min} (\frac{1}{n} X^{⊤} X) \leq η_{max} (\frac{1}{n} X^{⊤} X) \leq d_{2},$

where $η_{min} (\cdot)$ and $η_{max} (\cdot)$ denote the minimum and maximum eigenvalues of a matrix, respectively.

Theorem 1.

Assume that Assumption 1 holds, and $0 < τ \leq \sqrt{\frac{λ}{(n + 1) η_{max} (\frac{1}{n} X^{⊤} X)}}$. Then $Pr [{\hat{β}}^{t l p} \neq {\hat{β}}^{o l s}]$ is upper bounded by

$\begin{aligned} & \min \{\frac{\sqrt{2} p_{0} n^{1 / 2} τ}{σ \sqrt{π} η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{⊤} X_{A^{*}})} \exp (- \frac{n {(γ_{min} - τ)}^{2}}{2 σ^{2} η_{min}^{- 1} (\frac{1}{n} X_{A^{*}}^{⊤} X_{A^{*}})}), \ p_{0} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{⊤} X_{A^{*}})})\} \\ & + 4 \exp (- (\frac{n C_{min}}{20 σ^{2}} - (α + 1) \log (p + 1) - \frac{n λ}{4 σ^{2}})) \\ & + 4 \exp (- (\frac{(α - 1) n λ}{6 α σ^{2}} - (1 + \frac{1}{α}) (\log (p + 1) - \frac{5}{3}))) \\ & + p_{0} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{⊤} X_{A^{*}})}) + q Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{p_{0}^{1 / 2} σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{⊤} X_{A^{*}})}), \end{aligned}$

where $Φ (\cdot)$ is the distribution function of $N (0, 1)$.

Theorem 2.

Assume there are constants $a_{γ}$, $a_{C}$, $a_{λ}$, $a_{p_{0}}$ and $a_{l p}$ satisfying

$\begin{aligned} & a_{p_{0}} \geq 0, \ a_{l p} \geq 0, \ a_{p_{0}} < a_{γ}, \ a_{λ} < a_{γ}, \ a_{l p} - 1 < a_{λ}, \ a_{λ} < a_{C}, \text{ and} \\ & γ_{min} ≍ n^{- \frac{1 - a_{γ}}{2}}, \ C_{min} ≍ n^{a_{C}}, \ λ ≍ n^{a_{λ}}, \ p_{0} ≍ n^{a_{p_{0}}}, \ \log (p) ≍ n^{a_{l p}}, \end{aligned}$

where for two sequences $\{h_{n}\}$ and $\{g_{n}\}$ we say $h_{n} ≍ g_{n}$ if both sequences $\frac{h_{n}}{g_{n}}$ and $\frac{g_{n}}{h_{n}}$ are bounded. Also assume that the conditions of Theorem 1 and Assumption 2 hold. Then

(A): Selection consistency:

$Pr [\hat{A} \neq A^{*}] \leq Pr [{\hat{β}}^{t l p} \neq {\hat{β}}^{o l s}] \to 0, as n, p \to \infty .$
(B): Asymptotic distribution: Let $F_{n} (t)$ denote the distribution function of
$\sqrt{n} {(\frac{X_{A^{*}}^{⊤} X_{A^{*}}}{n})}^{\frac{1}{2}} ({\hat{β}}_{A^{*}}^{t l p} - β_{A^{*}}^{*})$ , and $Φ_{p_{0}} (t)$ the distribution function of $p_{0}$ -dimensional standard multivariate normal distribution. We have

$lim_{n \to \infty} sup_{t \in R^{p_{0}}} |F_{n} (t) - Φ_{p_{0}} (t)| = 0 .$

(7)

Remark 1.

Theorems 1 and 2 stabilize the index tracking problem. They encourage the non-negativity and sector neutrality in constructing sparse index tracking portfolios in high dimensional cases and allow practical and empirical work with only a moderate size of training data. The proofs of Theorems 1 and 2 are displayed in Appendix A.1.

5. Computation

The minimization of (4) with TLP can be treated as a sequence of weighted Lasso problems [17] and can be solved iteratively. However, the sector neutrality constraint and the nonnegativity constraint in (4) are not easy to handle directly. Fortunately, the alternating direction method of multipliers (ADMM) algorithm [21,22] can be applied to solve the constrained minimization problem (4).
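To make the "sequence of weighted Lasso problems" concrete, the sketch below follows the difference-of-convex decomposition described in Section 2 for the TLP of [17]: $\min (| z | / τ, 1) = | z | / τ - \max (| z | / τ - 1, 0)$. Linearizing the second convex term at the previous iterate leaves an $L_{1}$ penalty with weight $1 / τ$ on coefficients whose previous magnitude is at most τ and weight 0 on the others. The function names and example values are hypothetical, and this is only an outline of the idea, not the paper's Algorithm A1.

```python
def tlp(z, tau):
    """Truncated L1 penalty min(|z| / tau, 1)."""
    return min(abs(z) / tau, 1.0)

def dc_parts(z, tau):
    """Two convex functions S1, S2 with tlp(z) = S1(z) - S2(z)."""
    s1 = abs(z) / tau
    s2 = max(abs(z) / tau - 1.0, 0.0)
    return s1, s2

def weighted_lasso_weights(beta_prev, tau):
    """Per-coefficient L1 weights for the next convex subproblem.

    Coefficients that were already larger than tau are left unpenalized;
    the remaining ones receive the usual L1 penalty scaled by 1 / tau.
    """
    return [1.0 / tau if abs(b) <= tau else 0.0 for b in beta_prev]

tau = 0.01  # hypothetical tuning parameter
for z in (0.0, 0.004, 0.01, 0.05, 0.3):
    s1, s2 = dc_parts(z, tau)
    assert abs((s1 - s2) - tlp(z, tau)) < 1e-9  # the decomposition is exact

print(weighted_lasso_weights([0.0, 0.005, 0.3], tau))
```

Each subproblem is convex in β, so the nonconvex minimization is replaced by a short sequence of convex ones in which large weights escape the penalty.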

We now put our optimization problem in the ADMM framework. First, using the Lagrangian multiplier method, we transform the original optimization into minimization of the objective function

$f (β) = n^{- 1} {∥ Y - X β ∥}_{2}^{2} + λ \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} \min \{\frac{| β_{i, j} |}{τ}, 1\}$

(8)

over the new parameter space $C$:

$C = \{β : β_{i, j} \geq 0, \ \sum_{j = 1}^{g_{i}} β_{i, j} = ω_{i}, \ i \in \{1, \dots, q\}, \ j \in \{1, \dots, g_{i}\}\} .$

It is straightforward to verify that the new parameter space $C$ is convex and the objective function f in (8) is a convex loss function plus a piecewise linear regularization function. Using the results in [21,29], our optimization problem is equivalent to

$\begin{aligned} \min_{β, α} \quad & f (β) + I_{C} (α) \\ \text{subject to} \quad & β_{i, j} = α_{i, j}, \quad i \in \{1, \dots, q\}, \ j \in \{1, \dots, g_{i}\}, \end{aligned}$

(9)

where $I_{C} (α)$ is the indicator function of the space $C$ (i.e., $I_{C} (α) = 0$ for $α \in C$, and $I_{C} (α) = \infty$ for $α \notin C$). This is a standard starting form for the ADMM algorithm. Before describing the three steps of the ADMM algorithm, we write the augmented Lagrangian for (9),

$h (β) = n^{- 1} {∥ Y - X β ∥}_{2}^{2} + λ \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} \min \{\frac{| β_{i, j} |}{τ}, 1\} + \frac{ρ}{2} {∥ β - α + μ ∥}_{2}^{2},$

(10)

where $μ$ is a scaled dual variable associated with the constraint $β = α$. Here $ρ > 0$ is a penalty parameter.

In each iteration of ADMM, we perform alternating minimization of the augmented Lagrangian over $β$ and $α$. At the k-th iteration we update the variables $β$, $α$ and $μ$ by the following steps:

$β^{k + 1} = \arg \min_{β} \ n^{- 1} {∥ Y - X β ∥}_{2}^{2} + λ \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} \min \{\frac{| β_{i, j} |}{τ}, 1\} + \frac{ρ}{2} {∥ β - α^{k} + μ^{k} ∥}_{2}^{2}$

(11)

$α^{k + 1} = \Pi_{C} (β^{k + 1} + μ^{k})$

(12)

$μ^{k + 1} = μ^{k} + (β^{k + 1} - α^{k + 1}),$

(13)

where $\Pi_{C}$ denotes the Euclidean projection onto the space $C$. In the first step of ADMM, we fix $α$ and $μ$ and update the value of $β$ by minimization of the augmented Lagrangian; in the second step, we fix $β$ and $μ$ and update the value of $α$ by projection onto the space $C$; and finally, we fix $α$ and $β$ and update the dual variable $μ$. Algorithm 1 summarizes the framework of our algorithm; the details are given in Appendix A.2.

Algorithm 1 ADMM algorithm for the minimization of (10).

(Initialization) Let $β^{0}$ , $α^{0}$ , $μ^{0}$ $\in R^{p}$ be the initial parameter vectors;
(Iteration) At each iteration k, update $β$ using Algorithm A1 (see Appendix A.2), $α$ by the projection (12), and $μ$ by direct calculation (13);
(Termination) Stop when $β$ , $α$ , $μ$ converge.
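The α- and μ-updates (12) and (13) can be sketched as follows. Because $C$ is a product of scaled simplices, one per sector, the Euclidean projection can be computed sector by sector; the sketch uses the well-known sort-based simplex projection, which is a choice made for this illustration and may differ from the paper's implementation. It assumes every sector weight is strictly positive, and it omits the β-update, which relies on Algorithm A1. All function names are hypothetical.

```python
def project_simplex(v, total):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = total}, total > 0."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - total) / k
        if u_k - t > 0:  # holds for a prefix of k; keep the last valid shift
            theta = t
    return [max(x - theta, 0.0) for x in v]

def project_C(v, sectors, omega):
    """Project onto C by projecting each sector's block onto its own simplex."""
    out = list(v)
    for idx, w in zip(sectors, omega):
        block = project_simplex([v[j] for j in idx], w)
        for j, value in zip(idx, block):
            out[j] = value
    return out

def alpha_mu_step(beta_new, mu, sectors, omega):
    """One pass of updates (12) and (13), given the freshly updated beta."""
    alpha_new = project_C([b + m for b, m in zip(beta_new, mu)], sectors, omega)
    mu_new = [m + b - a for m, b, a in zip(mu, beta_new, alpha_new)]
    return alpha_new, mu_new

# Hypothetical iterate: two sectors with index weights 0.6 and 0.4.
sectors = [[0, 1, 2], [3, 4]]
omega = [0.6, 0.4]
beta_new = [0.5, 0.3, -0.2, 0.1, 0.5]
alpha_new, mu_new = alpha_mu_step(beta_new, [0.0] * 5, sectors, omega)
print([round(a, 6) for a in alpha_new])
print([round(m, 6) for m in mu_new])
```

After the projection, $α$ is nonnegative and sector-neutral by construction, while $μ$ accumulates the remaining disagreement between $β$ and $α$.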

6. Simulation Results

In this section, we show the performance of our estimation procedure via simulations and discuss the selection of appropriate tuning parameters $λ$, $τ$ and $ρ$ in the augmented Lagrangian (10).

The detailed information about the simulation study, including the model and its variations, the methods to conduct the experiments and the choice of tuning parameters are given in Appendix A.3.

The performance of the various methods is evaluated by both variable selection performance and estimation accuracy. In terms of variable selection, we use four criteria: the mean true positive (TP), the mean false positive (FP), the positive selection rate (PSR), and the negative selection rate (NSR) [30]. The true positive (TP) is defined as $TP = \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} I (β_{i, j}^{*} \neq 0, {\hat{β}}_{i, j} \neq 0)$, which counts the variables with true non-zero coefficients and estimated non-zero coefficients. The false positive (FP) is defined as $FP = \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} I (β_{i, j}^{*} = 0, {\hat{β}}_{i, j} \neq 0)$, which counts the variables with true zero coefficients that are estimated as non-zero. Additionally, PSR is the ratio of TP to the total number of true non-zero coefficients. Similarly, NSR is the ratio of FP to the total number of true zero coefficients. Regarding estimation accuracy, we adopt the mean and the standard deviation of the model error (ME) as criteria, where the model error is defined as $ME (\hat{β}) = {(\hat{β} - β^{*})}^{⊤} E (X X^{⊤}) (\hat{β} - β^{*})$, and the expectation is taken only with respect to a new observation $(X, y)$. The running time of each method is measured on a machine with an Intel Core i5 CPU (2.4 GHz) and 8 GB RAM. All methods were implemented in R.
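The four selection criteria can be computed for a single simulation replicate as in the sketch below; the reported means would then be averages over replicates. The function name, the optional threshold `tol` for treating tiny estimates as zero, and the example vectors are assumptions of this illustration.

```python
def selection_metrics(beta_true, beta_hat, tol=0.0):
    """TP, FP, PSR and NSR for one replicate.

    Estimates with |beta_hat_j| <= tol are treated as zero (tol is hypothetical).
    """
    pairs = list(zip(beta_true, beta_hat))
    tp = sum(1 for t, h in pairs if t != 0 and abs(h) > tol)
    fp = sum(1 for t, h in pairs if t == 0 and abs(h) > tol)
    n_nonzero = sum(1 for t in beta_true if t != 0)
    n_zero = len(beta_true) - n_nonzero
    return {
        "TP": tp,
        "FP": fp,
        "PSR": tp / n_nonzero,  # share of true non-zero coefficients selected
        "NSR": fp / n_zero,     # share of true zero coefficients selected
    }

beta_true = [0.5, 0.0, 0.5, 0.0, 0.0]
beta_hat = [0.4, 0.1, 0.5, 0.0, 0.0]
print(selection_metrics(beta_true, beta_hat))
```

A good sparse tracker has PSR close to 1 and NSR close to 0.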

6.1. Case 1

In this case, where $n > p$, five methods are compared. The first one is our method, named the constrained TLP method (CTLP). The second method uses the Lasso penalty for the sparsity constraint instead of TLP, and we name it the constrained $L_{1}$ method (C$L_{1}$). This method will fail to give a sparse portfolio because of the neutrality constraint, as pointed out in Section 3; we present the results of C$L_{1}$ here for confirmation. The third method uses TLP for the sparsity constraint but ignores sector neutrality, and we refer to this method as TLP. The fourth and fifth methods are the index tracking procedures of the following form,

$\begin{aligned} \min_{β} \quad & TE (β) + λ {∥ β ∥}_{0} \\ \text{subject to} \quad & β^{⊤} 1 = 1 \text{ and } β > 0, \end{aligned}$

with $TE (β)$ being the empirical tracking error (ETE) and the Huber empirical tracking error (HETE), which are denoted by ETE and HETE, respectively. In ETE, $TE (β)$ is defined in (3), while in HETE, $TE (β) = \frac{1}{n} 1^{⊤} ϕ (y - X β)$, where $ϕ (x) = {(ϕ (x_{1}), \dots, ϕ (x_{n}))}^{⊤}$, and

$ϕ (x) = \begin{cases} x^{2} & \text{if } | x | \leq M, \\ M (2 | x | - M) & \text{if } | x | > M, \end{cases}$

(14)

with M being the Huber parameter. Note that the ETE and HETE methods ignore the sector-neutral constraints. These two methods can be carried out using the R package sparseIndexTracking [31].
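For reference, the Huber function (14) and the resulting HETE can be written out as below. This is a plain restatement of the formulas in Python, not the sparseIndexTracking implementation; the function names and the toy numbers are hypothetical.

```python
def huber(x, M):
    """Huber function (14): quadratic for |x| <= M, linear beyond M."""
    if abs(x) <= M:
        return x * x
    return M * (2.0 * abs(x) - M)

def hete(Y, X, beta, M):
    """Huber empirical tracking error: (1/n) * sum_t huber(Y_t - X_t beta)."""
    n = len(Y)
    residuals = [y - sum(x * b for x, b in zip(row, beta)) for y, row in zip(Y, X)]
    return sum(huber(r, M) for r in residuals) / n

# The two branches agree at |x| = M, so the function is continuous there.
print(huber(1.0, 1.0), 1.0 * (2.0 * 1.0 - 1.0))

# Toy example with one large residual, which HETE grows with linearly.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [0.5, 3.5]
print(hete(Y, X, [0.5, 0.5], M=1.0))
```

Compared with the squared error in (3), the linear branch limits the influence of periods with unusually large tracking deviations.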

We first investigate the case when the covariates $X$ have an independent structure. Table 2 shows the simulation results for both the $σ = 1$ and $σ = 3$ settings. For the $σ = 1$ setting, methods CTLP and C$L_{1}$ have smaller model errors than ETE and HETE, and the model errors of ETE and HETE are much smaller than that of TLP. When it comes to variable selection ability, all methods correctly select the non-zero variables. CTLP, ETE and HETE perform similarly, and better than C$L_{1}$, in that C$L_{1}$ tends to select too many trivial variables and thus fails to produce sparse portfolios, which confirms our earlier assertion. On the other hand, TLP makes the fewest mistakes in selecting both trivial and non-trivial variables. When the variance of the error terms becomes larger, the simulated data are noisier. For the $σ = 3$ setting, CTLP and C$L_{1}$ still have the smallest model errors, while TLP performs quite poorly in terms of model error. Every method selects almost all nontrivial variables except for TLP, which selects slightly fewer nontrivial and trivial variables.

Now we investigate the simulation settings when the covariates $X$ have $A R (1)$ correlation, without and with group structures. Similar conclusions can be reached from Table 3 and Table 4. In the large signal-to-noise ratio setting ($σ = 1$), the model errors of CTLP and C$L_{1}$ are similar and smaller than those of their competitors. Besides, CTLP, ETE and HETE maintain a good balance between the selection of trivial and nontrivial variables, whereas C$L_{1}$ selects too many trivial variables. TLP gives good variable selection results but has poor estimation accuracy. In the small signal-to-noise ratio setting ($σ = 3$), the model errors of all five methods become larger than those in the large signal-to-noise ratio setting above. However, the relative performances of the five methods remain similar in terms of both model error and variable selection ability.

From the last columns in Table 2, Table 3 and Table 4 we see that CTLP takes more time than other methods in training because of additional loops required to satisfy the sector neutrality constraint. In contrast, TLP and ETE consume the least time among all methods.

6.2. Case 2

When

p > n

, the variable selection becomes a high-dimensional problem and necessitates sparsity. Theoretically, the C

L_{1}

method with the Lasso penalty fails to yield sparse models due to sector neutrality and nonnegativity constraints. However, we can solve the convex optimization problem using some common convex optimizers like CVXR [32] to obtain an approximate solution C

L_{1}^{*}

. Table 5, Table 6 and Table 7 present the results of this case.

The results are similar to those of the low-dimensional case, but all methods have larger model errors and select fewer nontrivial variables. In the large signal-to-noise ratio setting (

σ = 1

), CTLP has smaller model errors than other methods. CTLP, ETE and HETE have a similar ability in variable selection. TLP has larger model errors and it tends to select fewer nontrivial variables. The average model error of C

L_{1}^{*}

lies between that of CTLP and those of the other methods. As shown in the tables, it usually selects about 20% of the trivial variables if we treat estimates smaller than

10^{- 4}

as 0. As the numerical precision increases, more trivial variables are selected. No estimated coefficient is exactly 0 at moderate precision, which indicates that C

L_{1}^{*}

is just an approximate solution and the

L_{1}

term does not induce exact sparsity. In the small signal-to-noise ratio setting (

σ = 3

), there is no significant difference between CTLP and ETE in terms of model errors, but CTLP selects fewer trivial variables. As to the other two methods, HETE gives less accurate estimation, and TLP performs less well in terms of both variable selection and estimation. Although C

L_{1}^{*}

yields the smallest model errors, it cannot provide the exact solution with the sparsity that we need. All five methods run faster than in the

p < n

case due to the smaller sample size. CTLP is still the slowest one while ETE is the fastest one.

In conclusion, although CTLP takes more time to run, it performs similarly or better than the competitors in terms of both model errors and variable selection ability.

7. Real Data Results

We now apply the proposed methodology to sparse index tracking for the CSI 300 Index. In this application, we use the daily return series of the CSI 300 Index and all stocks in this index from 2014 to 2018. The CSI 300 is a capitalization-weighted stock market index designed to replicate the performance of the top 300 stocks traded on the Shanghai and Shenzhen stock exchanges. The index is compiled by China Securities Index and is considered a blue chip index for Mainland China stock exchanges. The return series, the methodology of the CSI 300 Index, the names and weights of the index constituents, and the corresponding sectors are available from China Securities Index (http://www.csindex.com.cn/) (accessed on 31 September 2018).

According to the industry classification guidelines of the Global Industry Classification Standard (GICS) and the weights of the constituents making up the CSI 300 Index, the stocks can be divided into 11 major sectors: Materials, Communication Services, Real Estate, Industrials, Utilities, Financials, Consumer Discretionary, Energy, Consumer Staples, Information Technology and Health Care. The constituents of the CSI 300 Index and their weights are reviewed every six months. As a consequence, the number of stocks and the weight of each sector change every six months. Because of this dynamic composition, we must update our model periodically. On the first day after a constituent adjustment takes effect, we train and tune the model using the daily return series of the index and its renewed constituents over the preceding half year. The total weights within sectors are calculated from the renewed weights as well. The model is then tested on the daily return series up to the next adjustment day. These steps follow the practical procedure.

We treat the estimated coefficients as the weights of stocks in building an index-tracking portfolio. The daily tracking error or prediction error is adopted to evaluate the model performance. The daily tracking error measures the deviation of the index daily return from the portfolio daily return, defined as

error = {\hat{r}}_{t} - r_{t}

, where

r_{t}

is the daily return of the index,

{\hat{r}}_{t}

is the daily return of the constructed portfolio. We compare our method CTLP with TLP, ETE and HETE, as described in Section 6.1, in terms of tracking errors. Since the training sample size is smaller than the number of index constituents, an exact solution is not available without sparsity; only an approximate solution can be obtained with a common convex optimizer. Thus, we report the C

L_{1}^{*}

solution as in Section 6.2. For an additional comparison within the Lasso family, we also report the performance of the standard sparse group lasso [33], denoted by SGL. To give a more intuitive view of the advantages of the proposed method, we also present the performance of portfolios created by traditional methods, namely the equal-weighted portfolio (EW) and the inverse volatility weighted portfolio (IVW). Both portfolios consist of 300 stocks. The weights in the EW portfolio are equal, while the weights in the IVW portfolio are inversely proportional to the historical volatilities of the stocks.
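The two benchmark weightings and the daily tracking error defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample standard deviation of daily returns is taken as the "historical volatility", and the function names are hypothetical.

```python
import numpy as np

def equal_weights(n_assets):
    # EW portfolio: every stock receives the same weight.
    return np.full(n_assets, 1.0 / n_assets)

def inverse_volatility_weights(returns):
    # returns: (T, p) array of historical daily returns.
    # Assumption: volatility is the sample standard deviation per stock.
    vol = returns.std(axis=0, ddof=1)
    inv = 1.0 / vol
    return inv / inv.sum()

def daily_tracking_error(weights, stock_returns, index_returns):
    # error_t = r_hat_t - r_t, with r_hat_t the portfolio daily return.
    return stock_returns @ weights - index_returns
```

The mean and standard deviation of the vector returned by `daily_tracking_error` over a test period correspond to the summary statistics reported per period.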

The CSI 300 index was adjusted eight times from December 2014 to November 2018. Each time the constituents of the index were adjusted, we updated the model and tested it with the return series for the next six months. Table 8 presents the mean and standard deviation of the daily tracking errors in each test set, together with the number of stocks in each portfolio. We highlight the method with the smallest mean daily tracking error in each period. In a majority of periods, CTLP has the smallest mean tracking error among its competitors. Even in periods such as 2017H1 and 2017H2, when CTLP is not the best, it is still very close to the best one. In terms of the standard deviation of tracking errors, CTLP has a slightly larger standard deviation than ETE and HETE, but a significantly smaller one than TLP. However, TLP and HETE tend to yield sparser portfolios than CTLP and ETE. As to tracking errors, C

L_{1}^{*}

performs reasonably well. However, it has difficulty producing a sparse portfolio in several periods, which runs counter to the motivation of this research. The standard sparse group lasso (SGL) satisfies the sector-neutral constraints but may produce negative coefficients. Although its solutions are sufficiently sparse, its tracking errors are the largest. The traditional EW and IVW portfolios are not sparse and perform unstably: in some periods they track the index tightly, while in others they produce large tracking errors. We also present the average runtime of the first four methods in Table 9, which indicates that CTLP and HETE run more slowly than the other methods. Although CTLP requires a longer computation time due to the additional sector neutrality constraints, the cost is worthwhile given its competitive performance and the fact that this computation is not a frequent task.

As an illustration, we display the results for the two most recent periods, the first and second halves of 2018, in Figure 1 and Figure 2, Table 10 and Table 11. Figure 1 and Figure 2 plot the cumulative profit and loss of CTLP, TLP, ETE, HETE and the index. The closer a method's cumulative profit and loss line is to the red line, which represents the index, the better the method. CTLP clearly replicates the index best among the four methods. Table 10 and Table 11 present the number of stocks and the total weights of the different sectors. Compared with the index, the portfolios built by all five methods are sparse. However, the portfolio given by TLP consists of so few stocks that it cannot track the index well. In the TLP portfolio, some sectors even have zero weight. CTLP and C

L_{1}^{*}

select portfolios with sector weights exactly equal to those of the index because of the sector neutrality constraint, whereas ETE and HETE produce sparse portfolios with sector weights only roughly equal to those of the index.

In summary, the proposed CTLP method demonstrates its advantages over the competing methods in sparse index tracking. The sector neutrality constraint of CTLP guarantees that the resulting sparse portfolio has the same sector risk exposure as the index. Most of the time, the CTLP method also gives smaller tracking errors. In addition, the non-negativity and sparsity of the CTLP portfolios are often desired properties in practical applications.

8. Summary

Motivated by a sparse index tracking problem, we propose a new method to do sparse variable selection under constraints. Our methodology extends the traditional variable selection with added constraints. Constraints either represent the lower dimensional structure of the data or special characteristics of practical applications.

In the sparse index tracking problem, sparsity, sector neutrality, and nonnegativity constraints are necessary to build an efficient, sector-risk neutral portfolio with lower transaction costs to track the performance of the index. We proved the consistency and asymptotic distribution for the constrained high-dimensional variable selection using our method. We also developed an efficient algorithm for the estimation of the stock weights of the sparse portfolio. Both simulations and the real application confirmed the validity and advantages of the new methodology.

In portfolio management applications, additional constraints may be incorporated into index tracking, for example, low volatility or size neutrality constraints. We leave these problems for future investigations.

Author Contributions

Conceptualization, Y.C.; Data curation, Y.C.; Investigation, Y.C.; Methodology, S.C.; Software, S.C.; Supervision, X.L.; Validation, S.C.; Writing—original draft, Y.C.; Writing—review & editing, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

Liu’s research is partially sponsored by the Shanghai Pujiang Program (21PJC056) and Shanghai Institute of International Finance and Economics.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: http://www.csindex.com.cn/.

Acknowledgments

The authors would like to thank the editor and anonymous referees for their helpful comments and suggestions. Our appreciation goes to Mathematics for consideration of publishing our work.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Appendix A.1. Proof of Theorems

Proof of Theorem 1.

For

(i, j) \in A^{*}

, by the definition of

{\hat{β}}_{A^{*}}^{o l s}

, the variance of

{\hat{β}}_{i, j}^{o l s}

is a diagonal entry of matrix

σ^{2} {(X_{A^{*}}^{⊤} X_{A^{*}})}^{- 1}

. Therefore,

\begin{aligned} Pr [{\hat{β}}_{i, j}^{o l s} \leq τ] & = Pr [n^{1 / 2} ({\hat{β}}_{i, j}^{o l s} - β_{i, j}^{*}) \leq - n^{1 / 2} (β_{i, j}^{*} - τ)] \\ & \leq Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) . \end{aligned}

The variance of

{\hat{β}}_{i, g_{i}}^{o l s}

is

σ^{2} v_{i}^{⊤} {(X_{A^{*}}^{⊤} X_{A^{*}})}^{- 1} v_{i}

, where

v_{i}

is the coefficient vector for the linear combination of

β_{i, j}^{*}

on the right hand side of Equation (5) using only nonzero values of

β^{*}

. Note that

v_{i}

is a

p_{0}

-dimensional vector consisting of 0’s and

- 1

’s only. The variance of

{\hat{β}}_{i, g_{i}}^{o l s}

is bounded above by

p_{0} σ^{2} η_{min}^{- 1} (X_{A^{*}}^{T} X_{A^{*}})

. Thus,

\begin{aligned} Pr [{\hat{β}}_{i, g_{i}}^{o l s} \leq τ] & = Pr [n^{1 / 2} ({\hat{β}}_{i, g_{i}}^{o l s} - β_{i, g_{i}}^{*}) \leq - n^{1 / 2} (β_{i, g_{i}}^{*} - τ)] \\ & \leq Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{p_{0}^{1 / 2} σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) . \end{aligned}

It follows that

\begin{aligned} & Pr [{\hat{β}}_{i, j}^{o l s} \leq τ \text{ for some } (i, j) \text{ with } β_{i, j}^{*} > 0] \\ & \leq p_{0} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) + q Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{p_{0}^{1 / 2} σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) . \end{aligned}

Using Theorem 5 of [34] for the global minimizer of (4) without the constraints, we obtain an upper bound for

Pr [{\hat{β}}^{o l s} \neq {\hat{β}}^{t l p}]

:

\begin{matrix} min & \{\frac{\sqrt{2} p_{0} n^{1 / 2} τ}{σ \sqrt{π} η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})} exp (- \frac{n {(γ_{min} - τ)}^{2}}{2 σ^{2} η_{min}^{- 1} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}), \\ p_{0} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})})\} \\ + 4 exp (- (\frac{n C_{min}}{20 σ^{2}} - (α + 1) log (p + 1) - \frac{n λ}{4 σ^{2}})) \\ + 4 exp (- (\frac{(α - 1) n λ}{6 α σ^{2}} - (1 + \frac{1}{α}) (log (p + 1) - \frac{5}{3}))) \\ + p_{0} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) + q Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{p_{0}^{1 / 2} σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}), \end{matrix}

which completes the proof. □

Proof of Theorem 2.

We first show that

P [{\hat{β}}^{o l s} \neq {\hat{β}}^{t l p}] \to 0

as

n \to \infty

. From Mills' ratio [35] it follows that

Φ (- x) \leq \frac{1}{\sqrt{2 π}} \frac{1}{x} e^{- \frac{x^{2}}{2}}, x > 0 .

Hence,

\begin{matrix} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) \\ \leq \frac{1}{\sqrt{2 π}} \frac{σ}{n^{1 / 2} d_{1}^{1 / 2} (γ_{min} - τ)} exp (- \frac{n d_{1} {(γ_{min} - τ)}^{2}}{2 σ^{2}}), \end{matrix}

and

\begin{matrix} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{p_{0}^{1 / 2} σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) \\ \leq \frac{1}{\sqrt{2 π}} \frac{p_{0}^{1 / 2} σ}{n^{1 / 2} d_{1}^{1 / 2} (γ_{min} - τ)} exp (- \frac{n d_{1} {(γ_{min} - τ)}^{2}}{2 p_{0} σ^{2}}) . \end{matrix}

Since

γ_{min} ≍ n^{- \frac{1 - a_{γ}}{2}}

,

λ ≍ n^{a_{λ}}

,

p_{0} ≍ n^{a_{p_{0}}}

,

a_{γ} > a_{λ}

,

a_{γ} > a_{p_{0}} \geq 0

, it follows that

\begin{matrix} max \{\frac{\sqrt{2} p_{0} n^{1 / 2} τ}{σ \sqrt{π} η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})} exp (- \frac{n {(γ_{min} - τ)}^{2}}{2 σ^{2} η_{min}^{- 1} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}), \\ p_{0} Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})})\} \\ \leq & max \{\frac{\sqrt{2} p_{0} n^{1 / 2} d_{2}^{1 / 2} τ}{σ \sqrt{π}}, \frac{p_{0}^{3 / 2} σ}{\sqrt{2 π} n^{1 / 2} d_{1}^{1 / 2} (γ_{min} - τ)}\} exp (- \frac{n d_{1} {(γ_{min} - τ)}^{2}}{2 σ^{2}}) \\ \to 0 . \end{matrix}

(A1)

Similarly,

\begin{matrix} q Φ (- \frac{n^{1 / 2} (γ_{min} - τ)}{p_{0}^{1 / 2} σ η_{min}^{- 1 / 2} (\frac{1}{n} X_{A^{*}}^{T} X_{A^{*}})}) \\ \leq \frac{p_{0}}{\sqrt{2 π}} \frac{p_{0}^{1 / 2} σ}{n^{1 / 2} d_{1}^{1 / 2} (γ_{min} - τ)} exp (- \frac{n d_{1} {(γ_{min} - τ)}^{2}}{2 p_{0} σ^{2}}) \to 0 . \end{matrix}

(A2)

Because

a_{l p} - 1 < a_{λ} < a_{C}

and

a_{l p} \geq 0

, it follows that

(\frac{n C_{min}}{20 σ^{2}} - (α + 1) log (p + 1) - \frac{n λ}{4 σ^{2}}) \to \infty,

(A3)

and

(\frac{(α - 1) n λ}{6 α σ^{2}} - (1 + \frac{1}{α}) (log (p + 1) - \frac{5}{3})) \to \infty .

(A4)

Combining the limits (A1)–(A4), we obtain the selection consistency from Theorem 1.

Asymptotic properties: From Equation (6) we know that

{\hat{β}}_{A^{*}}^{o l s}

has a multivariate normal distribution with mean

β_{A^{*}}^{*}

and covariance matrix

\frac{σ^{2}}{n} {(\frac{X_{A^{*}}^{⊤} X_{A^{*}}}{n})}^{- 1}

. By the selection consistency in part (A) we obtain the asymptotic distribution (7). □
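The Mills' ratio bound used in the proof of Theorem 2, Φ(-x) ≤ (2π)^{-1/2} x^{-1} exp(-x²/2) for x > 0, can be checked numerically. The short Python sketch below does so with the standard library only; the function names are illustrative and not part of the paper.

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mills_upper_bound(x):
    # Right-hand side of the Mills' ratio inequality, valid for x > 0.
    return math.exp(-0.5 * x * x) / (x * math.sqrt(2.0 * math.pi))

def bound_holds(x):
    # True when Phi(-x) <= upper bound at the given x > 0.
    return normal_cdf(-x) <= mills_upper_bound(x)
```

The bound is loose for small x but tightens quickly as x grows, which is the regime relevant to the limits (A1) and (A2).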

Appendix A.2. The Details of Algorithm

The objective function in the first minimization step (11) is a quadratic function plus a TLP function. This is a non-convex minimization problem, which can be solved by the difference of convex functions (DC) method [17,23]. The DC method decomposes a non-convex function into the difference of two convex functions so that the algorithms and properties of convex optimization can be applied. By DC decomposition and some calculations, the minimization subproblem (11) is equivalent to minimizing (A5) with respect to

β

,

\begin{matrix} h^{(m)} (β) = L (β) + \frac{λ}{τ} \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} | β_{i, j} | I (| {\hat{β}}_{i, j}^{(m - 1)} | \leq τ) . \end{matrix}

(A5)

Let us begin with the DC decomposition of the objective function

h (β) = h_{1} (β) - h_{2} (β)

, where

\begin{matrix} h_{1} (β) = L (β) + λ \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} J_{1} (| β_{i, j} |), \quad h_{2} (β) = λ \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} J_{2} (| β_{i, j} |), \\ L (β) = n^{- 1} {∥ Y - X β ∥}_{2}^{2} + \frac{ρ}{2} {∥ β - α^{k} + μ^{k} ∥}_{2}^{2}, \\ J_{1} (| β_{i, j} |) = \frac{| β_{i, j} |}{τ}, \quad J_{2} (| β_{i, j} |) = \max \{\frac{| β_{i, j} |}{τ} - 1, 0\} . \end{matrix}

It is clear that

L (β)

is a convex function of

β

. Using the DC decomposition, a sequence of linear approximations

h^{(m)} (β)

of

h (β)

is constructed. Let

\nabla h_{2}

be the subgradient of

h_{2}

. At the m-th iteration, replacing

h_{2} (β)

by its first-order approximation at the previous iterate, we obtain

\begin{matrix} h^{(m)} (β) = h_{1} (β) - (h_{2} ({\hat{β}}^{(m - 1)}) + {(| β | - | {\hat{β}}^{(m - 1)} |)}^{⊤} \nabla h_{2} (| {\hat{β}}^{(m - 1)} |)), \end{matrix}

where

| β |

denotes the vector obtained by replacing each component of

β

with its absolute value. After ignoring the terms that are independent of

β

, the

h^{(m)} (β)

becomes

\begin{matrix} h^{(m)} (β) = L (β) + \frac{λ}{τ} \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} | β_{i, j} | I (| {\hat{β}}_{i, j}^{(m - 1)} | \leq τ) . \end{matrix}

(A6)

Minimizing (A6) gives the updated value

{\hat{β}}^{(m)}

. The procedure is repeated until convergence.

To minimize (A6), we define the adaptive weights

λ_{i, j} = \frac{λ}{τ} I (| {\hat{β}}_{i, j}^{(m - 1)} | \leq τ)

, so that (A6) can be rewritten as

\begin{matrix} h^{(m)} (β) = L (β) + \sum_{i = 1}^{q} \sum_{j = 1}^{g_{i}} λ_{i, j} | β_{i, j} | . \end{matrix}

(A7)
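The DC decomposition and the resulting adaptive weights can be verified numerically. The Python sketch below assumes the usual truncated L1 penalty form min(|β|/τ, 1), which is exactly J_1 - J_2 as defined above, and uses hypothetical function names with a flat coefficient vector instead of the (i, j) group indexing.

```python
import numpy as np

def tlp(beta, tau):
    # Truncated L1 penalty: min(|beta| / tau, 1), elementwise.
    return np.minimum(np.abs(beta) / tau, 1.0)

def j1(beta, tau):
    # Convex part J_1(|beta|) = |beta| / tau.
    return np.abs(beta) / tau

def j2(beta, tau):
    # Convex part J_2(|beta|) = max(|beta| / tau - 1, 0).
    return np.maximum(np.abs(beta) / tau - 1.0, 0.0)

def adaptive_weights(beta_prev, lam, tau):
    # Weights in (A7): (lam / tau) * I(|beta_prev| <= tau).
    return (lam / tau) * (np.abs(beta_prev) <= tau)
```

Coefficients whose previous estimate already exceeds τ in absolute value receive zero weight and are therefore no longer shrunk in the next weighted L1 step.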

Note that the second term in (A7) is a separable function in variable

β

. This property enables us to use the coordinate descent algorithm [24,25] for minimization. Similar to the result for the regular Lasso [36], the updating formula for

β

is

\begin{matrix} β_{i, j} \leftarrow {(\frac{2}{n} X_{i, j}^{T} X_{i, j} + ρ)}^{- 1} S (Z_{i, j}, λ_{i, j}), \end{matrix}

(A8)

where

Z_{i, j} = \frac{2}{n} X_{i, j}^{T} r_{i, j} - ρ (- α_{i, j} + μ_{i, j})

,

r_{i, j} = Y - X_{- (i, j)} β_{- (i, j)}

,

X_{i, j}

is the j-th column of

X_{i}

,

X_{- (i, j)}

is the matrix

X

with the j-th column of

X_{i}

deleted, and

β_{- (i, j)}

is defined similarly.

S (z, λ) = Sign (z) {(| z | - λ)}_{+}

is the soft-thresholding operator, where the subscript + denotes the positive part. The updating Formula (A8) is cycled through all variables in turn. Repeating (A8) until convergence gives the estimate

{\hat{β}}^{(m)}

.
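The coordinate-wise update (A8) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: coefficients are stored in a flat vector instead of the (i, j) group indexing, and the names `coordinate_descent`, `n_sweeps` and `tol` are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0).
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def coordinate_descent(X, y, weights, alpha, mu, rho,
                       beta0=None, n_sweeps=200, tol=1e-10):
    """Minimize (1/n)||y - X b||^2 + (rho/2)||b - alpha + mu||^2
    + sum_j weights[j] * |b_j| by cycling the update (A8)."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else np.array(beta0, dtype=float)
    col_sq = (2.0 / n) * np.sum(X ** 2, axis=0)
    resid = y - X @ beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Partial residual excluding variable j.
            r_j = resid + X[:, j] * beta[j]
            z = (2.0 / n) * (X[:, j] @ r_j) - rho * (-alpha[j] + mu[j])
            new = soft_threshold(z, weights[j]) / (col_sq[j] + rho)
            resid = r_j - X[:, j] * new
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta
```

With all weights set to zero the update reduces to a ridge-type solve, which gives a simple way to sanity-check the sketch against a direct linear solution.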

Finally, with the inner iteration (A8) and outer iteration Algorithm A1, we solve the minimization subproblem (11).

We summarize the DC algorithm for minimization (11) below.

Algorithm A1 DC algorithm for the minimization of (11).

(Initialization) Use $β^{k}$ in k-th ADMM iteration as the initial estimate ${\hat{β}}^{(0)}$ .
(Iteration) At m-th DC iteration, compute ${\hat{β}}^{(m)}$ by minimizing (A5).
(Termination) Stop when $| h ({\hat{β}}^{(m - 1)}) - h ({\hat{β}}^{(m)}) | < ϵ$ and no component of ${\hat{β}}^{(m)}$ is at $\pm τ$ .

Next, we note that computing the projection (12) directly is not easy. Let

C_{1} = \{β : \sum_{j = 1}^{g_{i}} β_{i, j} = ω_{i}, i = 1, \dots, q\}

. We divide the projection into two easier sequential projections: the first one is the projection from

R^{p}

onto space

C_{1}

, and the second one is the projection from

C_{1}

to

C

. Theorem A1 below guarantees the equivalence between the direct projection onto

C

and the two sequential projections.

Theorem A1.

In Euclidean space, the projection onto space

C

,

β \overset{\prod_{C}}{\to} β_{2}

, is equivalent to the sequential projections onto space

C_{1}

and space

C

:

\begin{matrix} β \overset{\prod_{C_{1}}}{\to} β_{1} \overset{\prod_{C}}{\to} β_{2} . \end{matrix}

(A9)

There exists a closed-form solution for

β_{1} = \prod_{C_{1}} β

.

Let

β_{1} = {(β_{11}, \dots, β_{1 p})}^{⊤}

. Assume there are

m_{i}

elements in the i-th group

g_{i}

. We have

\begin{matrix} β_{1 j} = β_{j} + \frac{ω_{i} - \sum_{l = 1}^{g_{i}} β_{l}}{m_{i}}, i \in \{1, \dots, q\}, j \in \{1, \dots, g_{i}\} . \end{matrix}

(A10)

The computation of the second projection is given in the proof of Theorem A1. The composition of these two sequential projections updates the value of

α

. The third step (13) is a direct calculation.

Proof of Theorem A1.

Without loss of generality, we assume that all variables belong to the same group; the proof for more than one group is similar. The parameter space

C_{1}

becomes

C_{1} = \{β : \sum_{j} β_{j} = 1\}

.

Denote

β_{1} = \prod_{C_{1}} β

, and

β_{2} = \prod_{C} β_{1}

, where

β_{1} = {(β_{11}, \dots, β_{1 p})}^{⊤}

, and

β_{2} = {(β_{21}, \dots, β_{2 p})}^{⊤}

. The closed-form formula for

\prod_{C_{1}}

is given by (A10).

We now calculate the projection

\prod_{C}

. We assume that

β_{1}

has u positive elements, v zero elements, and

p - u - v

negative elements:

\begin{cases} β_{1 j} > 0, & 1 \leq j \leq u \\ β_{1 j} = 0, & u + 1 \leq j \leq u + v \\ β_{1 j} < 0, & u + v + 1 \leq j \leq p . \end{cases}

(A11)

Let

c_{u} = \frac{1 - \sum_{j = 1}^{u} β_{1 j}}{u}

. The projection

β_{2} = \prod_{C} β_{1}

is given by

β_{2 j} = \begin{cases} β_{1 j} + c_{u}, & 1 \leq j \leq u \\ 0, & u + 1 \leq j \leq p . \end{cases}

(A12)

It is possible that some of

β_{2 j}

,

1 \leq j \leq u

are negative. If this is the case, we will repeat the above procedure (A11) and (A12) until there are no negative elements in

β_{2}

. This completes the proof. □
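The two sequential projections can be sketched in Python as follows. The sketch assumes that C is the set of nonnegative weight vectors whose group sums equal the sector weights ω_i > 0, applies (A10) and then repeats (A11) and (A12) within each group; the function names and the integer group labels are hypothetical.

```python
import numpy as np

def project_sum(beta, total):
    # (A10): shift all elements equally so that they sum to `total`.
    return beta + (total - beta.sum()) / beta.size

def project_nonneg_sum(beta1, total, max_iter=1000):
    # Repeat (A11)-(A12): zero the non-positive elements, shift the
    # positive ones, and stop once no negative element remains.
    b = np.array(beta1, dtype=float)
    for _ in range(max_iter):
        pos = b > 0
        if not pos.any():
            raise ValueError("no positive element; requires total > 0")
        b[~pos] = 0.0
        b[pos] += (total - b[pos].sum()) / pos.sum()
        if (b >= 0).all():
            return b
    raise RuntimeError("projection did not terminate")

def project_sector_neutral(beta, groups, omega):
    # groups[j] is the (0-based) sector label of coefficient j.
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(beta)
    for i, w in enumerate(omega):
        idx = groups == i
        out[idx] = project_nonneg_sum(project_sum(beta[idx], w), w)
    return out
```

For example, with a single group and ω = 1, the vector (0.8, 0.6, -0.4) is mapped to (0.6, 0.4, 0): the first projection leaves it unchanged because it already sums to one, and the second removes the negative element and shifts the two positive ones by c_u = -0.2.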

Appendix A.3. The Details of Simulation

In the simulation study, we consider the model

y = X β + σ ϵ

. The covariates

X

are generated from a multivariate normal distribution with all marginal distributions being standard normal distribution

N (0, 1)

. The correlation matrix of

X

is denoted by

R

.

We assume three settings for the correlation matrix

R

:

(1): Independence structure: $R$ is the identity matrix;
(2): $A R (1)$ correlation without group structure: $R = {(r_{j, l})}_{j, l = 1}^{p}$ , where $r_{j, l} = 0.5^{| j - l |}$ for $j, l = 1, \dots, p$ ;
(3): $A R (1)$ correlation with group structure: $R = {(r_{j, l})}_{j, l = 1}^{p}$ , where $r_{j, l} = 0.5^{| j - l |}$ for $j, l = 1, \dots, g_{i}$ and $r_{j, l} = 0$ for $j = 1, \dots, g_{i}$ and $l = 1, \dots, g_{i^{'}}$ , $i \neq i^{'}$ .

As to the dimension of the covariates

X

, two cases

n < p

and

n > p

will be considered. We use

p = 100

. The parameter vector

β

is divided into 5 groups, with 20 elements in each group. The numbers of non-zero elements in each group are 3, 3, 1, 2, and 5, respectively. The true values of

β

are

\begin{matrix} β^{0} = (\underbrace{0.12, 0.04, 0.05, \underbrace{0, \dots, 0}_{17}}, \underbrace{0.09, 0.02, 0.03, \underbrace{0, \dots, 0}_{17}}, \underbrace{0.17, \underbrace{0, \dots, 0}_{19}}, \\ \underbrace{0.2, 0.12, \underbrace{0, \dots, 0}_{18}}, \underbrace{0.02, 0.04, 0.03, 0.02, 0.05, \underbrace{0, \dots, 0}_{15}}) . \end{matrix}

The sums of the coefficients in the groups are

γ_{1} = 0.21

,

γ_{2} = 0.14

,

γ_{3} = 0.17

,

γ_{4} = 0.32

, and

γ_{5} = 0.16

, respectively. In addition, the numbers of zero elements in the groups are 17, 17, 19, 18, and 15, respectively. The sample size n is set to 50 for the high-dimensional case and 2000 for the low-dimensional case. The distribution of the error term

ϵ

is the normal distribution

N (0, 0.05^{2})

, and the scaling parameter

σ

takes two values:

σ = 1

and

σ = 3

.
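The data-generating process described above can be sketched with NumPy as follows. This is an illustrative reconstruction under the stated settings (p = 100, five groups of 20, correlation parameter 0.5, error standard deviation 0.05); the function names and the choice of random generator are assumptions.

```python
import numpy as np

def ar1_corr(p, r=0.5):
    # AR(1) correlation matrix with entries r^{|j - l|}.
    idx = np.arange(p)
    return r ** np.abs(idx[:, None] - idx[None, :])

def grouped_ar1_corr(group_sizes, r=0.5):
    # AR(1) correlation within each group, zero correlation across groups.
    p = sum(group_sizes)
    R = np.zeros((p, p))
    start = 0
    for g in group_sizes:
        R[start:start + g, start:start + g] = ar1_corr(g, r)
        start += g
    return R

def true_beta():
    # Non-zero coefficients lead each group of 20; the rest are zero.
    nonzero = [[0.12, 0.04, 0.05], [0.09, 0.02, 0.03], [0.17],
               [0.2, 0.12], [0.02, 0.04, 0.03, 0.02, 0.05]]
    beta = np.zeros(100)
    for i, vals in enumerate(nonzero):
        beta[20 * i:20 * i + len(vals)] = vals
    return beta

def simulate(n, R, sigma, rng):
    # y = X beta + sigma * eps, eps ~ N(0, 0.05^2), rows of X ~ N(0, R).
    beta = true_beta()
    X = rng.multivariate_normal(np.zeros(beta.size), R, size=n)
    eps = rng.normal(0.0, 0.05, size=n)
    return X, X @ beta + sigma * eps, beta
```

The independence setting corresponds to passing `np.eye(100)` as `R`, and the two AR(1) settings to `ar1_corr(100)` and `grouped_ar1_corr([20] * 5)`.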

In each experiment, we randomly divide a dataset into training, tuning, and testing sets of 60%, 20%, and 20% of original sample sizes, respectively. We repeat the experiment 100 times and report the means of ME, TP, FP, PSR, NSR and the standard error of ME for each of the simulation settings.
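The selection measures can be computed as in the Python sketch below. The definitions are inferred from the reported tables and are therefore an assumption: TP and FP count the selected non-zero and zero coefficients, PSR divides TP by the 14 true non-zero coefficients, and NSR divides FP by the 86 true zeros (for example, 18.85/86 ≈ 21.92%). ME is omitted because its definition is not given in this section; the function name and the `tol` argument are hypothetical.

```python
import numpy as np

def selection_metrics(beta_hat, beta_true, tol=0.0):
    # A coefficient counts as selected when its magnitude exceeds `tol`.
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    selected = np.abs(beta_hat) > tol
    nonzero = beta_true != 0
    tp = int(np.sum(selected & nonzero))   # correctly selected variables
    fp = int(np.sum(selected & ~nonzero))  # trivial variables selected
    psr = tp / int(nonzero.sum())
    nsr = fp / int((~nonzero).sum())
    return tp, fp, psr, nsr
```

Setting `tol` to a small positive value mimics the convention of treating numerically tiny estimates as zero.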

As to tuning parameter selection, we will use five-fold cross-validation. The optimal choice of tuning parameters

λ

and

τ

can be found by grid search [17]. Finding an optimal value for

ρ

is not a straightforward problem [22]; however, ref. [21] provides many heuristics that work well in practice. Here, we find that a good choice of

ρ

is around 1.

References

Barber, B.M.; Odean, T. Trading is hazardous to your wealth: The common stock investment performance of individual investors. J. Financ. 2000, 55, 773–806.
Jansen, R.; Van Dijk, R. Optimal benchmark tracking with small portfolios. J. Portf. Manag. 2002, 28, 33–39.
Ben-David, I.; Franzoni, F.; Moussawi, R. Exchange-traded funds. Annu. Rev. Financ. Econ. 2017, 9, 169–189.
Fuller, S.L. The evolution of actively managed exchange-traded funds. Rev. Secur. Commod. Regul. 2008, 41, 89–96.
Santos, A.A. Beating the market with small portfolios: Evidence from Brazil. EconomiA 2015, 16, 22–31.
Tabata, Y.; Takeda, E. Bicriteria Optimization Problem of Designing an Index Fund. J. Oper. Res. Soc. 1995, 46, 1023–1032.
Ammann, M.; Zimmermann, H. Tracking Error and Tactical Asset Allocation. Financ. Anal. J. 2001, 57, 32–43.
Jagannathan, R.; Ma, T. Risk reduction in large portfolios: Why imposing the wrong constraints helps. J. Financ. 2003, 58, 1651–1684.
Strub, O.; Baumann, P. Optimal construction and rebalancing of index-tracking portfolios. Eur. J. Oper. Res. 2018, 264, 370–387.
Vieira, E.B.F.; Filomena, T.P.; Sant’anna, L.R.; Lejeune, M.A. Liquidity-constrained index tracking optimization models. Ann. Oper. Res. 2021, 1–46.
Li, X.P.; Shi, Z.L.; Leung, C.S.; So, H.C. Sparse Index Tracking with K-Sparsity or ε-Deviation Constraint via ℓ₀-Norm Minimization. IEEE Trans. Neural Netw. Learn. Syst. 2022.
Zheng, Y.; Chen, B.; Hospedales, T.M.; Yang, Y. Index tracking with cardinality constraints: A stochastic neural networks approach. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1242–1249.
Zhang, C.; Liang, S.; Lyu, F.; Fang, L. Stock-index tracking optimization using auto-encoders. Front. Phys. 2020, 8, 388.
Demiguel, V.; Garlappi, L.; Nogales, F.; Uppal, R. A Generalized Approach to Portfolio Optimization: Improving Performance by Constraining Portfolio Norms. Manag. Sci. 2009, 55, 798–812.
Brodie, J.; Daubechies, I.; De Mol, C.; Giannone, D.; Loris, I. Sparse and stable Markowitz portfolios. Proc. Natl. Acad. Sci. USA 2009, 106, 12267–12272.
Fan, J.; Zhang, J.; Yu, K. Vast portfolio selection with gross-exposure constraints. J. Am. Stat. Assoc. 2012, 107, 592–606.
Shen, X.; Pan, W.; Zhu, Y. Likelihood-based selection and sharp parameter estimation. J. Am. Stat. Assoc. 2012, 107, 223–232.
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
James, G.M.; Paulson, C.; Rusmevichientong, P. Penalized and Constrained Optimization: An Application to High-Dimensional Website Advertising. J. Am. Stat. Assoc. 2019, 115, 1–31.
Gaines, B.R.; Kim, J.; Zhou, H. Algorithms for Fitting the Constrained Lasso. J. Comput. Graph. Stat. 2018, 27, 861–871.
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
Wahlberg, B.; Boyd, S.; Annergren, M.; Wang, Y. An ADMM algorithm for a class of total variation regularized estimation problems. IFAC Proc. Vol. 2012, 45, 83–88.
Tao, P.D.; An, L.T.H. Convex analysis approach to dc programming: Theory, algorithms and applications. Acta Math. Vietnam. 1997, 22, 289–355.
Wu, T.T.; Lange, K. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2008, 2, 224–244.
Wright, S.J. Coordinate descent algorithms. Math. Program. 2015, 151, 3–34.
Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499.
Osborne, M.R.; Presnell, B.; Turlach, B.A. On the lasso and its dual. J. Comput. Graph. Stat. 2000, 9, 319–337.
Benidis, K.; Feng, Y.; Palomar, D.P. Sparse Portfolios for High-Dimensional Financial Index Tracking. IEEE Trans. Signal Process. 2018, 66, 155.
Wang, Y.; Yin, W.; Zeng, J. Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 2015, 78, 29–63.
Chen, Z.; Chen, J. Tournament screening cum EBIC for feature selection with high-dimensional feature spaces. Sci. China Ser. A Math. 2009, 52, 1327–1341.
Benidis, K.; Palomar, D.P. sparseIndexTracking: Design of Portfolio of Stocks to Tracks an Index; R Package Version 0.1.0. Available online: https://CRAN.R-project.org/package=sparseIndexTracking (accessed on 1 September 2019).
Fu, A.; Narasimhan, B.; Boyd, S. CVXR: An R Package for Disciplined Convex Optimization. J. Stat. Soft. 2020, 94, 1–34.
Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R. A sparse-group lasso. J. Comput. Graph. Stat. 2013, 22, 231–245.
Shen, X.; Pan, W.; Zhu, Y.; Zhou, H. On constrained and regularized high-dimensional regression. Ann. Inst. Stat. Math. 2013, 65, 807–832.
Gasull, A.; Utzet, F. Approximating Mills ratio. J. Math. Anal. Appl. 2014, 420, 1832–1853.
Friedman, J.; Hastie, T.; Höfling, H.; Tibshirani, R. Pathwise coordinate optimization. Ann. Appl. Stat. 2007, 1, 302–332.

Figure 1. The cumulative profit and loss of portfolios built by different methods and the index in 2018H1.

Figure 2. The cumulative profit and loss of portfolios built by different methods and the index in 2018H2.

Table 1. The periods with sector risk for CSI300 index from 2014 to 2018.

Period	CSI300	Gainer	Gainer Return	Loser	Loser Return
October 2018–November 2018	−4.16%	Financials	2.50%	Consumer Staple	−13.18%
February 2018–March 2018	−5.23%	Information	6.44%	Financials	−9.17%
June 2017–August 2017	9.75%	Consumer Staple	17.51%	Utilities	−3.77%
October 2015–December 2015	8.19%	Information	26.40%	Energy	3.18%
November 2012–January 2013	17.01%	Financials	30.92%	Consumer Staple	−5.94%

Table 2. The comparison of five methods in the case of $n > p$, independent correlation matrix.

	Method	ME	SE	TP	FP	PSR	NSR	Runtime (s)
$σ = 1$	CTLP	0.0742	0.0025	14	18.85	100.00%	21.92%	2.78
	C $L_{1}$	0.0741	0.0025	14	52.36	100.00%	60.88%	1.10
	TLP	0.1650	0.0042	14	6.77	100.00%	7.87%	0.12
	ETE	0.0858	0.0026	14	18.76	100.00%	21.81%	0.02
	HETE	0.0924	0.0026	14	18.8	100.00%	21.86%	1.45
$σ = 3$	CTLP	0.6415	0.0202	13.98	15.64	99.86%	18.19%	1.66
	C $L_{1}$	0.6712	0.0198	13.98	31.55	99.86%	36.69%	1.00
	TLP	1.2989	0.0338	13.55	7.52	96.79%	8.74%	0.11
	ETE	0.7673	0.0215	13.95	18.74	99.64%	21.79%	0.02
	HETE	0.9818	0.0278	13.9	19.33	99.29%	22.48%	1.47
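The PSR and NSR columns appear to be consistent with PSR = TP/14 and NSR = FP/86 (for example, 13.98/14 ≈ 99.86% and 18.85/86 ≈ 21.92%), which would correspond to 14 truly nonzero weights among 100 candidate stocks. This reading is inferred from the tabulated values rather than quoted from a definition, so the following Python sketch should be taken as an illustration under that assumption. The function name `selection_rates` and the example support sets are hypothetical.

```python
def selection_rates(true_support, selected, p):
    """Return (TP, FP, PSR, NSR) for one estimated support set.

    Assumed definitions (inferred from the tables, not stated in them):
      PSR = TP / (number of truly nonzero coefficients)
      NSR = FP / (number of truly zero coefficients)
    """
    true_support = set(true_support)
    selected = set(selected)
    tp = len(selected & true_support)   # truly nonzero and selected
    fp = len(selected - true_support)   # truly zero but selected
    psr = tp / len(true_support)
    nsr = fp / (p - len(true_support))
    return tp, fp, psr, nsr


# Hypothetical example: 100 candidates, the first 14 are truly nonzero.
truth = range(14)
picked = list(range(13)) + [20, 21, 22]   # misses one true stock, adds three false ones
tp, fp, psr, nsr = selection_rates(truth, picked, p=100)
print(tp, fp, round(psr, 4), round(nsr, 4))
```

Averaging TP and FP over simulation replications and dividing by 14 and 86, respectively, would then give columns of the kind reported in the tables.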

Table 3. The comparison of five methods in the case of $n > p$, $AR(1)$ correlation matrix without group structure.

	Method	ME	SE	TP	FP	PSR	NSR	Runtime (s)
$σ = 1$	CTLP	0.0697	0.0024	14	16.65	100.00%	19.36%	2.81
	C $L_{1}$	0.0696	0.0024	14	51.04	100.00%	59.35%	1.10
	TLP	0.1382	0.0043	14	6.74	100.00%	7.84%	0.14
	ETE	0.0796	0.0025	14	16.22	100.00%	18.86%	0.02
	HETE	0.0884	0.0026	14	16.21	100.00%	18.85%	1.49
$σ = 3$	CTLP	0.5956	0.0171	13.98	12.92	99.86%	15.02%	1.95
	C $L_{1}$	0.6161	0.0179	13.97	29.21	99.79%	33.97%	1.12
	TLP	1.0480	0.0268	13.81	7.26	98.64%	8.44%	0.13
	ETE	0.7136	0.0196	13.95	16.2	99.64%	18.84%	0.02
	HETE	0.9427	0.0285	13.91	17.73	99.36%	20.62%	1.43

Table 4. The comparison of five methods in the case of $n > p$, $AR(1)$ correlation matrix with group structure.

	Method	ME	SE	TP	FP	PSR	NSR	Runtime (s)
$σ = 1$	CTLP	0.0651	0.0020	14	15.26	100.00%	17.74%	2.75
	C $L_{1}$	0.0651	0.0020	14	50.32	100.00%	58.51%	1.04
	TLP	0.1337	0.0039	14	6.28	100.00%	7.30%	0.14
	ETE	0.0765	0.0022	14	15.32	100.00%	17.81%	0.02
	HETE	0.0841	0.0025	14	15.4	100.00%	17.91%	1.42
$σ = 3$	CTLP	0.5740	0.0171	13.99	13.36	99.93%	15.53%	1.96
	C $L_{1}$	0.5836	0.0185	13.99	44.79	99.93%	52.08%	1.20
	TLP	0.9061	0.0276	13.89	10.65	99.21%	12.38%	0.13
	ETE	0.6742	0.0203	13.96	17.07	99.71%	19.85%	0.02
	HETE	0.9259	0.0253	13.9	17.05	99.29%	19.83%	1.49

Table 5. The comparison of five methods in the case of $n < p$, independent correlation matrix.

	Method	ME	SE	TP	FP	PSR	NSR	Runtime (s)
$σ = 1$	CTLP	0.1352	0.0114	13.61	18.27	97.21%	21.24%	2.35
	C $L_{1}^{*}$	0.1758	0.0101	13.65	19.10	97.50%	22.21%	0.64
	TLP	0.3433	0.0299	11.09	12.77	79.21%	14.85%	0.10
	ETE	0.1724	0.0134	13.34	18.39	95.29%	21.38%	0.07
	HETE	0.1772	0.0134	13.32	18.2	95.14%	21.16%	0.33
$σ = 3$	CTLP	0.9915	0.0441	10.53	12.08	75.21%	14.05%	2.73
	C $L_{1}^{*}$	0.9599	0.0437	11.31	17.34	80.78%	20.16%	0.63
	TLP	1.7058	0.0816	6.08	4.25	43.43%	4.94%	0.04
	ETE	1.0885	0.0432	10.01	16.54	71.50%	19.23%	0.03
	HETE	1.2494	0.0507	9.8	15.73	70.00%	18.29%	0.40

Table 6. The comparison of five methods in the case of $n < p$, $AR(1)$ correlation matrix without group structure.

	Method	ME	SE	TP	FP	PSR	NSR	Runtime (s)
$σ = 1$	CTLP	0.1268	0.0088	13.69	14.68	97.79%	17.07%	2.17
	C $L_{1}^{*}$	0.1328	0.0062	13.79	16.86	98.50%	19.60%	0.65
	TLP	0.3060	0.0261	11.87	11.02	84.79%	12.81%	0.10
	ETE	0.1643	0.0111	13.36	15.02	95.43%	17.47%	0.06
	HETE	0.1628	0.0110	13.33	14.71	95.21%	17.10%	0.31
$σ = 3$	CTLP	1.0728	0.0772	10.79	11.67	77.07%	13.57%	2.67
	C $L_{1}^{*}$	0.8185	0.0304	11.24	15.54	80.29%	18.07%	0.64
	TLP	1.8885	0.0644	7.03	4.53	50.21%	5.27%	0.05
	ETE	1.0795	0.0439	10.86	15.4	77.57%	17.91%	0.03
	HETE	1.2447	0.0507	10.39	15.27	74.21%	17.76%	0.37

Table 7. The comparison of five methods in the case of $n < p$, $AR(1)$ correlation matrix with group structure.

	Method	ME	SE	TP	FP	PSR	NSR	Runtime (s)
$σ = 1$	CTLP	0.1189	0.0067	13.71	16.23	97.93%	18.87%	2.03
	C $L_{1}^{*}$	0.1328	0.0082	13.66	16.66	97.57%	19.37%	0.63
	TLP	0.2726	0.0229	12.94	15.04	92.43%	17.49%	0.10
	ETE	0.1600	0.0100	13.62	16.03	97.29%	18.64%	0.05
	HETE	0.1625	0.0104	13.57	16	96.93%	18.60%	0.32
$σ = 3$	CTLP	1.0246	0.0526	10.7	11.09	76.43%	12.90%	2.24
	C $L_{1}^{*}$	0.8340	0.0472	11.36	15.41	81.14%	17.92%	0.62
	TLP	1.9927	0.0993	6.72	4.12	48.00%	4.79%	0.05
	ETE	1.0223	0.0412	10.88	14.78	77.71%	17.19%	0.03
	HETE	1.0964	0.0404	10.53	13.94	75.21%	16.21%	0.39

Table 8. The performance of the portfolios selected by different methods in eight periods. Mean, Std. Dev and Num stand for the mean of the daily tracking error, the standard deviation of the daily tracking error, and the number of stocks in the portfolio. In the Period columns, for example, 2018H1 stands for the first half of 2018.

Period	Method	Mean	Std. Dev	Num	Period	Method	Mean	Std. Dev	Num
2018H2	CTLP	0.018%	0.0016	124	2016H2	CTLP	0.015%	0.0013	107
	C $L_{1}^{*}$	0.028%	0.0016	297		C $L_{1}^{*}$	0.027%	0.0007	300
	SGL	0.123%	0.0126	57		SGL	−0.073%	0.0056	65
	TLP	0.081%	0.0046	42		TLP	0.020%	0.0020	52
	ETE	0.031%	0.0013	188		ETE	0.023%	0.0009	219
	HETE	0.026%	0.0013	98		HETE	0.016%	0.0011	128
	EW	0.091%	0.0064	300		EW	−0.016%	0.0022	300
	IVW	0.087%	0.0066	300		IVW	0.055%	0.0060	300
2018H1	CTLP	0.008%	0.0031	116	2016H1	CTLP	0.002%	0.0022	75
	C $L_{1}^{*}$	0.014%	0.0017	126		C $L_{1}^{*}$	0.025%	0.0021	120
	SGL	0.043%	0.0093	57		SGL	0.062%	0.0109	90
	TLP	0.024%	0.0051	34		TLP	0.031%	0.0033	49
	ETE	0.022%	0.0013	129		ETE	0.031%	0.0018	140
	HETE	0.024%	0.0013	79		HETE	0.033%	0.0017	121
	EW	0.018%	0.0086	300		EW	0.052%	0.0040	300
	IVW	0.027%	0.0079	300		IVW	0.052%	0.0035	300
2017H2	CTLP	−0.008%	0.0019	123	2015H2	CTLP	0.055%	0.0034	117
	C $L_{1}^{*}$	−0.005%	0.0015	296		C $L_{1}^{*}$	0.064%	0.0021	130
	SGL	−0.095%	0.0060	35		SGL	0.195%	0.0208	84
	TLP	−0.040%	0.0035	25		TLP	0.135%	0.0087	51
	ETE	−0.012%	0.0013	220		ETE	0.064%	0.0017	179
	HETE	−0.017%	0.0016	84		HETE	0.062%	0.0019	121
	EW	−0.015%	0.0048	300		EW	0.076%	0.0049	300
	IVW	0.020%	0.0040	300		IVW	−0.935%	0.0479	300
2017H1	CTLP	−0.014%	0.0012	129	2015H1	CTLP	0.060%	0.0032	120
	C $L_{1}^{*}$	−0.016%	0.0009	299		C $L_{1}^{*}$	−0.011%	0.0020	147
	SGL	0.050%	0.0058	11		SGL	−0.358%	0.0153	58
	TLP	−0.024%	0.0032	33		TLP	−0.166%	0.0075	22
	ETE	−0.010%	0.0009	177		ETE	0.004%	0.0018	166
	HETE	−0.011%	0.0012	83		HETE	−0.009%	0.0024	90
	EW	0.017%	0.0023	300		EW	0.009%	0.0041	300
	IVW	0.015%	0.0017	300		IVW	0.045%	0.0176	300

Bold means the method has the smallest mean daily tracking error in this period.
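As an illustration of how the Mean and Std. Dev columns could be obtained, the Python sketch below computes the mean and sample standard deviation of a daily tracking error series. It assumes the daily tracking error is the signed difference between the portfolio return and the index return (the negative means in the table suggest a signed quantity); the function name `tracking_error_summary` and the return series are hypothetical.

```python
from statistics import mean, stdev


def tracking_error_summary(portfolio_returns, index_returns):
    """Mean and sample standard deviation of the daily tracking error.

    Assumption: tracking error on day t is r_portfolio(t) - r_index(t).
    """
    if len(portfolio_returns) != len(index_returns):
        raise ValueError("return series must have the same length")
    diffs = [rp - ri for rp, ri in zip(portfolio_returns, index_returns)]
    return mean(diffs), stdev(diffs)


# Hypothetical daily returns over four trading days.
portfolio = [0.010, -0.004, 0.006, 0.001]
index = [0.009, -0.005, 0.007, 0.001]
te_mean, te_std = tracking_error_summary(portfolio, index)
print(f"mean = {te_mean:.5%}, std = {te_std:.6f}")
```

A mean close to zero with a small standard deviation indicates that the portfolio follows the index closely, which is the sense in which the methods are compared in the table.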

Table 9. The average runtime of different methods.

	CTLP	C $L_{1}^{*}$	SGL	TLP	ETE	HETE
Runtime (s)	47.60	0.75	0.14	18.48	1.67	46.78

Table 10. The number of stocks and the sum of weights within each sector for the portfolios selected by different methods and for the index in 2018H1.

Sector	Number of Stocks						Sum of Weights
Sector	Index	CTLP	C $L_{1}^{*}$	TLP	ETE	HETE	Index	CTLP	C $L_{1}^{*}$	TLP	ETE	HETE
Materials	25	11	14	3	12	7	0.062	0.062	0.062	0.004	0.068	0.077
Communication	2	1	2	0	1	1	0.008	0.008	0.008	0.000	0.004	0.005
Real Estate	20	6	8	2	6	5	0.057	0.057	0.057	0.074	0.049	0.056
Industrials	60	14	22	5	22	10	0.137	0.137	0.137	0.117	0.133	0.131
Utilities	8	3	2	1	4	3	0.025	0.025	0.025	0.006	0.041	0.047
Financials	58	22	24	10	27	17	0.353	0.353	0.353	0.340	0.323	0.323
Consumer dis.	42	12	17	4	19	12	0.114	0.114	0.114	0.103	0.127	0.139
Energy	11	3	5	0	5	3	0.023	0.023	0.023	0.000	0.026	0.008
Consumer sta.	14	5	6	1	7	5	0.075	0.075	0.075	0.070	0.085	0.078
Information	42	16	18	5	17	8	0.100	0.100	0.100	0.088	0.065	0.050
Health Care	18	8	8	3	9	8	0.047	0.047	0.047	0.198	0.079	0.086
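In the table above, the CTLP and C $L_{1}^{*}$ columns under Sum of Weights match the Index column in every sector, which is what sector neutrality requires, while the TLP, ETE and HETE columns do not. The Python sketch below shows one way such per-sector counts and weight sums could be computed and checked. The tickers, sector assignments, weights and the tolerance `tol` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def sector_summary(weights, sector_of):
    """Count held stocks and sum their weights per sector.

    weights:   dict mapping stock -> portfolio weight
    sector_of: dict mapping stock -> sector name
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for stock, w in weights.items():
        if w > 0:  # only stocks actually held count toward the portfolio
            counts[sector_of[stock]] += 1
            totals[sector_of[stock]] += w
    return dict(counts), dict(totals)


def is_sector_neutral(portfolio_totals, index_totals, tol=1e-3):
    """True if every sector weight matches the index within tol (assumed tolerance)."""
    return all(
        abs(portfolio_totals.get(sector, 0.0) - w) <= tol
        for sector, w in index_totals.items()
    )


# Hypothetical universe of four stocks in two sectors.
sector_of = {"A": "Financials", "B": "Financials", "C": "Energy", "D": "Energy"}
index_totals = {"Financials": 0.6, "Energy": 0.4}

sparse_weights = {"A": 0.35, "B": 0.25, "C": 0.40, "D": 0.0}
counts, totals = sector_summary(sparse_weights, sector_of)
neutral = is_sector_neutral(totals, index_totals)

tilted_weights = {"A": 0.50, "B": 0.30, "C": 0.20, "D": 0.0}
_, tilted_totals = sector_summary(tilted_weights, sector_of)
tilted_neutral = is_sector_neutral(tilted_totals, index_totals)

print(counts, neutral, tilted_neutral)
```

The first portfolio holds three of the four stocks yet keeps the index sector weights, whereas the second overweights Financials and fails the check.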

Table 11. The number of stocks and the sum of weights within each sector for the portfolios selected by different methods and for the index in 2018H2.

Sector	Number of Stocks						Sum of Weights
Sector	Index	CTLP	C $L_{1}^{*}$	TLP	ETE	HETE	Index	CTLP	C $L_{1}^{*}$	TLP	ETE	HETE
Materials	36	9	35	5	23	9	0.067	0.067	0.067	0.094	0.066	0.063
Communication	2	1	2	0	1	0	0.006	0.006	0.006	0.000	0.005	0.000
Real Estate	17	5	17	1	12	7	0.050	0.050	0.050	0.005	0.058	0.065
Industrials	57	14	57	10	25	12	0.122	0.122	0.122	0.132	0.118	0.128
Utilities	10	4	10	0	7	3	0.027	0.027	0.027	0.184	0.034	0.038
Financials	59	24	59	11	41	25	0.368	0.368	0.368	0.000	0.334	0.330
Consumer dis.	37	14	36	5	25	13	0.107	0.107	0.107	0.439	0.120	0.113
Energy	13	2	13	1	7	1	0.026	0.026	0.026	0.035	0.024	0.018
Consumer sta.	13	8	13	4	10	5	0.082	0.082	0.082	0.045	0.079	0.066
Information	34	10	33	4	23	14	0.080	0.080	0.080	0.046	0.098	0.108
Health Care	22	10	22	1	14	9	0.063	0.063	0.063	0.015	0.063	0.071

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

