1. Introduction
Many present-day portfolio optimization techniques are based on the mean-variance optimization framework developed by Markowitz (1952). Due to the practical challenges associated with forecasting mean returns, the most popular portfolio risk optimization techniques require only a forecast of the covariance of returns. Some of the notable risk-based portfolio allocation methods that rely only on covariance forecasts are the minimum variance of Clarke et al. (2006), maximum diversification of Choueifaty and Coignard (2008), equal risk budget of Leote et al. (2012), and equal risk contribution of Maillard et al. (2010).
The most well-known and common estimator for the forecast of the covariance of returns is the sample-based covariance, calculated from the time series of historical returns. For a covariance matrix of size $N \times N$, there need to be at least $N(N+1)/2$ independent and identically distributed (iid) returns observations to estimate the sample-based forecast. Therefore, in order to construct a covariance matrix of returns for 50 assets, one would ideally need at least 5 years of daily returns, with the hope that they are iid data. There is ample evidence that asset returns exhibit heteroskedasticity with volatility clustering, and also that correlation structures do not remain invariant over such long periods (Zakamulin 2015; Lopez de Prado 2016). There are broadly two major directions of work to address the above concern. The first approach is related to the development of better covariance forecast models. Some of the notable works in this direction are the shrinkage estimation of the covariance matrix proposed by Ledoit and Wolf (2003) and the exponentially weighted covariance matrix popularized by Riskmetrics (1996). The sophisticated dynamic conditional correlation (DCC) model of Engle (2002), where persistence in the variance and correlation dynamics is achieved using a GARCH(1,1)-type model, is one of the most popular multivariate GARCH models for covariance forecasts. Another popular multivariate GARCH model is the constant conditional correlation (CCC) proposed by Bollerslev et al. (1990), where, unlike the DCC-GARCH, a constant conditional correlation is used. The advantage of the CCC model is its easy estimation, although it comes with the assumption that conditional correlations are time-invariant.
Hierarchical Risk Parity (HRP), as proposed in Lopez de Prado (2016), uses graph theory and machine learning algorithms to infer the hierarchical relationships between assets, which are then directly utilized for portfolio diversification. This approach constitutes the second, more recent, direction of work to circumvent issues related to covariance matrix forecasts. Most of the traditional risk-based optimal allocations require the inversion of the covariance matrix, a step that is avoided in HRP. This provides an additional advantage, as the inversion of ill-conditioned matrices required in most risk-based portfolios can add significant estimation errors. The technique is extended in Raffinot (2017), where different methods for hierarchical clustering are employed and the robustness and performance of these algorithms relative to traditional risk-based portfolios are studied.
Zakamulin (2015) investigated the impact of various covariance matrix forecasting methodologies on the performance of minimum variance and target volatility strategies. The study, however, does not consider the performance of other popular risk-based allocation methodologies with these forecasting techniques. The impact of covariance matrix misspecification on the optimal weights that result from different risk-based optimization methods is reported in Ardia et al. (2017). That paper, however, studies only the portfolio weights, not the impact of covariance matrix misspecification on portfolio performance. In Trucíos et al. (2019), the performance of one-step-ahead covariance estimates from various covariance forecasting methods was empirically studied using several performance metrics. In Cesarone and Colucci (2018), a CVaR-based ERC portfolio was introduced and its performance against other risk and capital diversification approaches was empirically studied. While Raffinot (2017) showed better performance for HRP and its variants when compared to traditional risk-based allocation techniques, the study does not account for the impact of covariance misspecification on the outcomes due to the possible use of inferior covariance matrix forecasting methods.
We would also like to refer readers to the growing literature on the use of regularization techniques for improving the out-of-sample performance of risk-based portfolios, with some notable works being Brodie et al. (2009), Fastrich et al. (2015), and Carrasco and Noumon (2011). As portfolios can be evaluated using multiple performance criteria, Sawik (2012) introduces a multi-objective portfolio model. A review of other robust optimization methods and their applications is provided in Gabrel et al. (2014).
The objectives of this paper are two-fold. The first objective is to empirically study whether there are covariance matrix forecasting methodologies that provide superior performance for both traditional risk-based and machine learning-based portfolios. This is achieved by looking at the out-of-sample performance of the portfolios, constructed using covariance matrices obtained from different forecasting methodologies, at the daily, weekly, and monthly forecasting horizons. The second objective is to study whether the more sophisticated machine learning algorithms provide better portfolio performance when compared with the traditional risk-based portfolios constructed using an appropriate covariance forecasting methodology. For both objectives, we use the stationary bootstrap-based superior predictive ability (SPA) test proposed in Hansen (2005). The SPA test is designed to evaluate whether an observed excess performance is significant or could have occurred by chance.
This paper is organized as follows. Section 2 describes the various risk-based portfolio allocation methods considered in the paper, while Section 3 describes the covariance forecast models. Section 4 explains HRP, the machine learning-based portfolio allocation approach we consider in this work. In Section 5 we describe the data and methodology used for the out-of-sample performance evaluations. We present our empirical results in Section 6, and Section 7 contains some concluding remarks.
3. Covariance Matrix Forecasting Methods
Given the time series of T past returns, we want to forecast the covariance of returns. We discuss below the methods used in our study for the forecast of the covariance matrix.
3.1. Sample-Based Covariance (SMPL)
Following the notation of Zakamulin (2015), we assume that the vector of daily asset returns is given by
$r_t = \mu + \varepsilon_t$,
where $\varepsilon_t$ is the vector of white noise on day $t$ such that $E[\varepsilon_t] = 0_N$, where $0_N$ is the $N \times 1$ vector of zeros. To estimate the sample-based covariance matrix, we use a rolling window of $T$ historical log returns. The covariance matrix on day $t$ is given by
$\hat{\Sigma}_t = \frac{1}{T-1} \sum_{i=t-T+1}^{t} (r_i - \bar{r}_t)(r_i - \bar{r}_t)'$,
where $\bar{r}_t$ is the sample mean of the returns in the window. Because of the time aggregation property of log returns, the covariance matrix for weekly and monthly return projections can be obtained as the sum of the iterated one-day-ahead covariance predictions.
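As an illustration, the rolling-window estimator and the time-aggregation step can be sketched in Python as follows (function and variable names are ours, not from the paper):

```python
import numpy as np

def sample_covariance(log_returns, T):
    """Sample-based covariance forecast from a rolling window of the
    T most recent daily log returns (rows of an (n_obs, N) array)."""
    window = log_returns[-T:]                    # rolling window of T observations
    return np.cov(window, rowvar=False, ddof=1)  # unbiased N x N estimate

def aggregate_forecast(daily_cov, h):
    """Time aggregation of log returns: the h-day forecast is the sum of the
    iterated one-day-ahead forecasts, which here is simply h * daily_cov."""
    return h * daily_cov
```

For the sample estimator the iterated one-day-ahead forecasts are constant, so the weekly (or monthly) projection reduces to scaling the daily matrix by the number of trading days in the horizon.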
3.2. Exponentially Weighted Moving Average (EWMA)
This estimator is designed to put more weight on recent returns, a method popularized by Riskmetrics (1996). The exponentially weighted covariance matrix is estimated using the following recursion:
$\hat{\Sigma}_t = \lambda \hat{\Sigma}_{t-1} + (1-\lambda)\, r_{t-1} r_{t-1}'$,
where, based on the recommendation of the RiskMetrics group, a decay constant $\lambda = 0.94$ for daily returns is used. We calculate the forecast for the weekly and monthly EWMA covariance matrix by multiplying the daily covariance matrix by the number of days in the subsequent week and month, respectively.
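A minimal sketch of the recursion (the initialization with the window's sample covariance is our choice; the text does not specify one):

```python
import numpy as np

def ewma_covariance(log_returns, lam=0.94):
    """EWMA covariance via the recursion
    Sigma_t = lam * Sigma_{t-1} + (1 - lam) * r_{t-1} r_{t-1}'.
    lam = 0.94 is the RiskMetrics daily decay constant."""
    r = np.asarray(log_returns)
    sigma = np.cov(r, rowvar=False, ddof=1)      # assumed starting value
    for row in r:                                # roll the recursion forward
        sigma = lam * sigma + (1.0 - lam) * np.outer(row, row)
    return sigma
```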
3.3. Dynamic Conditional Correlation GARCH (DCC-GARCH)
The DCC-GARCH proposed by Engle (2002) models two latent processes, the conditional variance and the conditional correlation $R_t$, where $D_t = \mathrm{diag}(\sqrt{h_{1,t}}, \ldots, \sqrt{h_{N,t}})$ is diagonal. The conditional covariance is then given by:
$H_t = D_t R_t D_t$.
The elements $h_{i,t}$ of the conditional variance are modelled using the univariate GARCH(1,1) model so as to incorporate conditional heteroskedasticity. This can be compactly written as
$h_t = \omega + \alpha \odot \varepsilon_{t-1}^{(2)} + \beta \odot h_{t-1}$,
where $h_t = (h_{1,t}, \ldots, h_{N,t})'$, similarly $\varepsilon_t^{(2)} = \varepsilon_t \odot \varepsilon_t$, and $\odot$ is the Hadamard product.
The conditional correlation is modelled as
$R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2}$, with $Q_t = (1 - a - b)\bar{Q} + a\, z_{t-1} z_{t-1}' + b\, Q_{t-1}$,
where $\bar{Q}$ is the unconditional correlation matrix of the standardized residuals $z_t = D_t^{-1} \varepsilon_t$. The parameter estimation is done via a two-stage optimization, where first the parameters $(\omega, \alpha, \beta)$ that maximize the log-likelihood of the conditional variance are determined. In the second stage, the values of $a$ and $b$ that maximize the log-likelihood of the conditional correlation are determined while taking into account the results from stage one. We use the log returns for calibrating the model, as then, using the time aggregation property of log returns, the covariance matrix for weekly and monthly return projections can be obtained as the sum of the iterated one-day-ahead covariance predictions.
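One step of the correlation dynamics can be sketched as follows (a minimal illustration with the parameters $a$ and $b$ taken as given; the two-stage calibration itself is omitted):

```python
import numpy as np

def dcc_correlation_step(Q_prev, z_prev, Q_bar, a, b):
    """One step of the DCC(1,1) correlation recursion:
    Q_t = (1 - a - b) Q_bar + a z_{t-1} z_{t-1}' + b Q_{t-1},
    R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2},
    where z are the GARCH-standardized residuals and Q_bar is their
    unconditional correlation matrix."""
    Q = (1.0 - a - b) * Q_bar + a * np.outer(z_prev, z_prev) + b * Q_prev
    s = 1.0 / np.sqrt(np.diag(Q))                # rescale to a correlation matrix
    R = Q * np.outer(s, s)
    return Q, R
```

The rescaling in the last step guarantees that $R_t$ has unit diagonal regardless of the scale of $Q_t$.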
4. Hierarchical Risk Parity (HRP)
The traditional risk-based portfolios are sensitive to the accuracy of the forecasted covariance matrix (see Ardia et al. 2017; Zakamulin 2015). When the assets are highly correlated, there is a greater need for diversification. However, for highly correlated returns, the condition number of the covariance matrix, i.e., the ratio between its maximal and minimal eigenvalues, is large. The weights calculated when the covariance matrix has a high condition number can have large estimation errors, as their calculation involves inversion of the covariance matrix. Therefore, the benefit of diversification in such a case cannot be materialized, due to the large estimation errors for the portfolio. This is what De Prado (2018) refers to as the Markowitz curse.
The Hierarchical Risk Parity approach (Lopez de Prado 2016) addresses the problems of traditional risk-based portfolio optimisation by using the covariance matrix without inverting it. In essence, HRP calculates inverse volatility weights for groups of similar assets, which are iteratively scaled down as one moves to ever smaller sub-groups until each asset forms a sub-group of its own. The algorithm operates in three stages. The first stage involves determining the hierarchical relationships between the assets using a recursive cluster formation scheme. The clusters are formed using correlations to identify similar groups of assets that are successively merged until one large cluster remains. The next stage involves the quasi-diagonalisation of the covariance matrix by rearranging rows and columns based on the information from the first stage. The aim of the second stage is to achieve a more diagonal representation of the covariance matrix, with high correlations placed close to each other and, therefore, to the diagonal. Quasi-diagonalisation ensures that similar investments are grouped together and dissimilar ones are kept fairly apart. After this quasi-diagonalisation of the covariance matrix, weights are distributed using inverse variance allocation between sub-groups that are obtained by recursively bisecting the rearranged covariance matrix from the second stage. We detail the three stages of the HRP algorithm below.
4.1. Clustering
Clustering is a partitioning technique to group data points based on their characteristics. In the case of HRP, the correlation coefficient is used as the characteristic to measure the similarity between time series, and therefore to cluster assets that have similar time series. HRP uses agglomerative nesting for clustering, where initially all the individual assets behave as separate clusters. Then, on the basis of their correlation, they start forming bigger clusters until all the similar assets are clustered together. First, a suitable distance metric is defined as:
$d_{i,j} = \sqrt{\tfrac{1}{2}(1 - \rho_{i,j})}$,
where $d_{i,j}$ is the correlation-distance index between the $i$-th and $j$-th assets and $\rho_{i,j}$ is the corresponding Pearson's correlation coefficient. The matrix $D = \{d_{i,j}\}$ defined in such a way will be an appropriate metric space (see De Prado 2018 for proof). Next, a matrix $\tilde{D}$ that defines the Euclidean distance between any two columns of $D$ is defined, whose elements are
$\tilde{d}_{i,j} = \sqrt{\sum_{k=1}^{N} (d_{k,i} - d_{k,j})^2}$.
Agglomerative clustering starts with every asset representing a single cluster. At each step, the closest two clusters are merged into one. The measure of dissimilarity between the clusters is known as the linkage criterion.
There are three different agglomerative clustering linkage criteria that are used in this study.
Single Linkage: The single linkage (SL) clustering method takes the distance between two clusters $A$ and $B$ as the minimum of the distances between any two points in the clusters:
$d_{SL}(A, B) = \min\{\tilde{d}_{i,j} : i \in A,\ j \in B\}$.
This method is simpler to implement but sensitive to outliers and might result in long, chained clusters (Raffinot 2017).
Average Linkage: In the average linkage (AL) technique the distance is defined as the average of the distances between any two points in the clusters. For clusters $A$ and $B$:
$d_{AL}(A, B) = \frac{1}{|A|\,|B|} \sum_{i \in A} \sum_{j \in B} \tilde{d}_{i,j}$.
Ward's Method: The most popularly used method is Ward's method (Ward 1963). Here, the distance between two clusters is how much the sum of squared errors will increase when they are merged:
$\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|}\, \lVert c_A - c_B \rVert^2$,
where $|A|, |B|$ are the cluster sizes and $c_A, c_B$ are the centers of the clusters $A$ and $B$. It starts at zero, and then grows as clusters merge.
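Using standard library routines, the clustering stage can be sketched as below (a sketch assuming `scipy` is available; the linkage is computed on the Euclidean distances between columns of $D$, as in the text):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def correlation_linkage(returns, method="single"):
    """Agglomerative clustering of assets with the HRP correlation distance
    d_ij = sqrt((1 - rho_ij) / 2); method in {'single', 'average', 'ward'}."""
    corr = np.corrcoef(returns, rowvar=False)
    # clip guards against tiny numerical excursions of rho outside [-1, 1]
    D = np.sqrt(0.5 * np.clip(1.0 - corr, 0.0, 2.0))
    np.fill_diagonal(D, 0.0)
    d_tilde = pdist(D)                 # Euclidean distance between columns of D
    return linkage(d_tilde, method=method)  # (N-1, 4) merge tree
```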
Figure 1 gives a schematic of the outcome of agglomerative clustering of the assets.
4.2. Quasi Diagonalisation
This step of the HRP algorithm, rearranges the covariance matrix using the information from the clustering algorithm. It places the assets with high correlations adjacently and close to the matrix diagonal, making sure that similar assets are placed together. It allows us to allocate weights optimally following an inverse-volatility allocation described below.
4.3. Recursive Bisection
The weights are allocated by the inverse-variance technique between two clusters and are scaled down as each cluster is recursively bisected until a single asset is left in each cluster. The algorithm for recursive bisection has the following steps (see Lopez de Prado 2016 for details):
1. Initialize a list of clusters $L = \{L_0\}$ with $L_0 = \{1, \ldots, N\}$, the (reordered) assets in the portfolio, and initialize a vector of weights as $w_n = 1$ for $n = 1, \ldots, N$.
2. Stop if $|L_i| = 1$ for all $L_i \in L$.
3. For each $L_i \in L$ such that $|L_i| > 1$:
(a) Bisect $L_i$ into two subsets $L_i^{(1)} \cup L_i^{(2)} = L_i$, where each subset preserves the order from quasi-diagonalisation.
(b) Calculate the variance of $L_i^{(j)}$, $j = 1, 2$, as $\widetilde{V}_i^{(j)} = \tilde{w}^{(j)\prime} V_i^{(j)} \tilde{w}^{(j)}$, where $V_i^{(j)}$ is the covariance matrix of the elements within cluster $L_i^{(j)}$ and $\tilde{w}^{(j)} = \mathrm{diag}(V_i^{(j)})^{-1} / \mathrm{tr}(\mathrm{diag}(V_i^{(j)})^{-1})$, which is the inverse-variance weight vector for the elements of the cluster.
(c) Compute the split factor $\alpha_i = 1 - \widetilde{V}_i^{(1)} / (\widetilde{V}_i^{(1)} + \widetilde{V}_i^{(2)})$.
(d) Rescale allocations $w_n$ by a factor of $\alpha_i$ for all $n \in L_i^{(1)}$.
(e) Rescale allocations $w_n$ by a factor of $(1 - \alpha_i)$ for all $n \in L_i^{(2)}$.
4. Loop to Step 2.
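The steps above can be sketched as follows (a minimal implementation in the spirit of Lopez de Prado (2016); `order` stands for the asset ordering produced by quasi-diagonalisation):

```python
import numpy as np

def hrp_recursive_bisection(cov, order):
    """Recursive bisection stage of HRP: inverse-variance weights within each
    cluster, split factor alpha = 1 - V1 / (V1 + V2) between the two halves."""
    def cluster_var(items):
        sub = cov[np.ix_(items, items)]
        ivp = 1.0 / np.diag(sub)
        w = ivp / ivp.sum()                      # inverse-variance weights in cluster
        return float(w @ sub @ w)

    weights = np.ones(len(order))
    clusters = [list(order)]
    while clusters:
        # bisect every cluster that still holds more than one asset
        clusters = [half for c in clusters if len(c) > 1
                    for half in (c[:len(c) // 2], c[len(c) // 2:])]
        for left, right in zip(clusters[::2], clusters[1::2]):
            v1, v2 = cluster_var(left), cluster_var(right)
            alpha = 1.0 - v1 / (v1 + v2)
            weights[left] *= alpha               # scale first half by alpha
            weights[right] *= 1.0 - alpha        # and second half by 1 - alpha
    return weights
```

For two uncorrelated assets with variances 1 and 4, the split factor is $\alpha = 1 - 1/5 = 0.8$, so the final weights are 0.8 and 0.2.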
5. Data and Methodology
The optimal weights in a portfolio depend on the general level of correlation between the assets of the investment universe (Schumann 2013) and the specific composition of the investment universe (Bertrand and Lapointe 2018). We try to capture different correlation and composition structures by creating five different universes, as summarized in Table 1. We use the individual stocks that comprise the NIFTY 50 index of the National Stock Exchange (NSE) in India to create these sub-universes. The first universe includes the top 10 stocks in the financial sector, the second includes the top 10 stocks by market capitalization (as of December 2016), the third and fifth universes contain randomly selected individual stocks from NIFTY 50, and the fourth universe contains individual stocks from the energy sector. The composition of the five universes is listed in Appendix A.
For each dataset, we divide the observations into an estimation period and an evaluation period. We use the daily adjusted closing prices of the individual stocks from November 2010 to December 2016 for the estimation period. The parameters of the three covariance forecast models, as described in Section 3, are estimated using the first $T$ inter-day observations. We consider the following three cases for portfolio rebalancing: (a) daily, (b) weekly, and (c) monthly. For daily rebalancing we need to forecast, given the returns up until time $t$ and the model parameters calibrated using the rolling window of $T$ past observations, the covariance of returns on the $(t+1)$-th day. To obtain the weekly and monthly covariance forecasts, we sum the iterated one-step-ahead covariance predictions using the parameters calibrated on daily returns. Obtaining weekly and monthly forecasts as the iterated sum of daily covariance forecasts is possible if we work with log returns, because of their time aggregation property. The $h$-period forecast of the covariance matrix of the log returns is then converted to the $h$-period covariance matrix of linear returns (see Appendix B for a detailed explanation) for calculating the optimal weights. This conversion is essential, as only the weighted sum of linear (as opposed to log) individual assets' returns equals the portfolio return.
5.1. Intra-Day Realized Covariance Estimator
During the evaluation period, we calculate the out-of-sample realized portfolio performance based on different risk measures, as described in Section 5.2. In order to compute the realized performance, we use the minute-by-minute intra-day prices (400 data points per day) collected from NSE for the period from 2 January 2017 until 31 December 2017. For each of the $n$ evaluation dates, the intra-day returns data is constructed artificially by fitting a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) on the available data in order to obtain returns on an equally spaced time grid of $m$ intra-day points between 9:15 and 15:30 IST for all the assets. The intra-day returns are then defined as
$r_{t,j} = \ln P_{t,j} - \ln P_{t,j-1}, \quad j = 1, \ldots, m$,
where $P_{t,j}$ is a vector of asset prices. It is reasonable to assume that $E[r_{t,j}] \approx 0$ and that intra-day returns have no autocorrelation for moderately large values of $m$ (see Hansen and Lunde 2005). The relationship between the log intra-day returns and the daily returns is given by
$r_t = \sum_{j=1}^{m} r_{t,j}$.
The realized daily covariance can then be estimated as
$RC_t = \sum_{j=1}^{m} r_{t,j}\, r_{t,j}'$,
where dropping the cross terms is justified by the assumed absence of autocorrelations in the returns time series, and dropping the mean term is the outcome of the assumption that the expected value of intra-day returns is nearly equal to zero. We therefore use $RC_t$ as the estimator for realized intra-day covariance. As the NSE stock market is not open 24 h, the intra-day covariance misses the covariance contribution from the time the market closes until it opens on the next working day. We follow the approach of Martens (2002) and Koopman and Hol Uspensky (2002), where a scaling factor is used to convert intra-day volatility into a measure of volatility for the whole day. The scaling factor for returns of the $i$-th stock is computed as
$c_i = \frac{\sigma^2_{co,i} + \sigma^2_{oc,i}}{\sigma^2_{oc,i}}$,
where $\sigma^2_{co,i}$ is the variance of the close-to-open log returns for the $i$-th stock, and $\sigma^2_{oc,i}$ is the corresponding open-to-close variance of log returns measured in the evaluation period. Let $C = (\sqrt{c_1}, \ldots, \sqrt{c_N})'$; then the measure for daily covariance from the intra-day returns is obtained as
$\hat{\Sigma}_t = RC_t \odot C C'$,
where $\odot$ denotes the Hadamard product.
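A sketch of the estimator and the overnight scaling (variable names are ours; the per-pair scaling by $\sqrt{c_i c_j}$ is an assumption consistent with a Hadamard-product adjustment):

```python
import numpy as np

def realized_covariance(intraday_log_prices):
    """Realized daily covariance RC_t = sum_j r_{t,j} r_{t,j}' from an
    (m + 1, N) array of log prices on the equally spaced intra-day grid."""
    r = np.diff(intraday_log_prices, axis=0)     # m intra-day log-return vectors
    return r.T @ r                               # N x N realized covariance

def scale_overnight(rc, var_close_to_open, var_open_to_close):
    """Scale by c_i = (sigma_co_i^2 + sigma_oc_i^2) / sigma_oc_i^2 to account
    for the overnight period, applied element-wise as sqrt(c_i * c_j)."""
    c = (var_close_to_open + var_open_to_close) / var_open_to_close
    s = np.sqrt(c)
    return rc * np.outer(s, s)
```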
5.2. Portfolio Risk Measures
To assess the out-of-sample performance of the different portfolio strategies and the different covariance forecast methods, we use the following risk measures. These risk measures define the loss functions for the superior predictive ability test described in Section 5.3.
- (1)
Portfolio variance: We use the total daily variance of the portfolio as the first measure of performance. The realized variance of the portfolio is given by
$\sigma^2_{p,t} = w_t' \hat{\Sigma}_t w_t$,
where $w_t$ is the vector of weights obtained for a particular choice of portfolio allocation technique and covariance forecast, and $\hat{\Sigma}_t$ is the realised covariance matrix obtained from the intra-day returns data, as described in Section 5.1. A higher realized value of portfolio variance is an indicator of bad performance, and therefore we can directly use portfolio variance as a loss function for the SPA test.
- (2)
Conditional Value-at-Risk (CVaR), also known as the expected shortfall, is a measure of risk defined as follows (see Acerbi and Tasche 2002). Let $X$ be the profit-loss of a portfolio over a specified time horizon and let $\alpha$ be some specified probability level. The expected shortfall of the portfolio is defined as
$ES_\alpha(X) = -\frac{1}{\alpha}\left( E\left[X\, 1_{\{X \le x_\alpha\}}\right] - x_\alpha \left( P[X \le x_\alpha] - \alpha \right) \right)$,
where $P$ is the appropriate probability measure and $x_\alpha = \inf\{x : P[X \le x] \ge \alpha\}$. Note that the corresponding value-at-risk (VaR) is given by $VaR_\alpha(X) = -x_\alpha$. We first compute the out-of-sample realized intra-day returns for the constructed portfolio, then sort them according to increasing profits, $x_{(1)} \le x_{(2)} \le \cdots \le x_{(m)}$, and approximate the number of $\alpha$-tail elements in the sample by $s = \lfloor \alpha m \rfloor$. The set of worst-case losses corresponding to parameter $\alpha$ is then represented by the least $s$ outcomes $\{x_{(1)}, \ldots, x_{(s)}\}$. The VaR of the portfolio would be $-x_{(s)}$, and the expected shortfall can be estimated as
$\widehat{ES}_\alpha = -\frac{1}{s} \sum_{i=1}^{s} x_{(i)}$.
Again, as higher CVaR values are indicators of bad performance, we use the CVaR values directly as a loss function for the SPA test.
- (3)
Herfindahl Index ($H^*$) of percentage risk contributions: The normalized Herfindahl index is an indicator of concentration risk. It takes values between 0 and 1, where 0 signifies a perfectly diversified portfolio. It is calculated as:
$H^* = \frac{\sum_{i=1}^{N} p_i^2 - 1/N}{1 - 1/N}$,
where $p_i$ is the percentage contribution of the $i$-th asset to the total portfolio risk. As a greater value of the index reflects greater risk concentration, we use the index directly as one of our loss functions for the SPA test.
- (4)
Diversification Ratio (DR): It is computed as defined in Equation (4). In order to compute the realized DR, we use the portfolio weights computed using the forecasted covariance matrix, and the covariance matrix in the equation is substituted with the realized covariance matrix. It gives the measure of diversification in the portfolio and takes values greater than or equal to 1. As a higher diversification ratio is a better performance indicator, we use the negative of the DR as our loss function.
- (5)
Sharpe Ratio: The Sharpe ratio, also called the reward-to-variability ratio, is a measure of excess return per unit of risk. It is defined as
$SR = \frac{E[r_p] - r_f}{\sigma_p}$,
where $r_p$ are the portfolio returns, $r_f$ is the risk-free rate, and $\sigma_p$ is the standard deviation of the excess returns of the portfolio. The portfolio variance is calculated using the intra-day returns, as described in Section 5.1. For our SPA test we calculate the weekly realised Sharpe ratio. As an increasing Sharpe ratio implies reduced losses, we use the negative of the Sharpe ratio as the loss function for the SPA test.
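For concreteness, the first three loss functions might be computed as follows (a sketch with illustrative names):

```python
import numpy as np

def portfolio_variance(w, realized_cov):
    """Realized portfolio variance w' Sigma w (loss 1)."""
    return float(w @ realized_cov @ w)

def empirical_cvar(pnl, alpha=0.05):
    """Historical expected shortfall: mean loss over the worst
    s = floor(alpha * m) outcomes of the sorted profit-loss sample (loss 2)."""
    x = np.sort(np.asarray(pnl))                 # ascending: worst losses first
    s = max(1, int(np.floor(alpha * len(x))))
    return float(-x[:s].mean())

def herfindahl_index(risk_contributions):
    """Normalized Herfindahl index of percentage risk contributions (loss 3):
    0 for equal contributions, 1 for full concentration."""
    p = np.asarray(risk_contributions, dtype=float)
    p = p / p.sum()
    n = len(p)
    return float((np.sum(p ** 2) - 1.0 / n) / (1.0 - 1.0 / n))
```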
5.3. Test for Superior Predictive Ability
In our study we want to evaluate whether a particular benchmark model is significantly outperformed by other models, while taking into account the large number of models being compared. Let $k = 0, 1, \ldots, l$ index the models being considered, with $k = 0$ being the chosen benchmark model and $k = 1, \ldots, l$ the models the benchmark is being compared against. Each model leads to a sequence of daily losses $L_{k,t}$, $t = 1, \ldots, n$, where the losses are chosen as the realized portfolio variance, CVaR, $H^*$, and the negative of the Diversification Ratio, as described in Section 5.2. The relative performance variables are defined as
$d_{k,t} = L_{0,t} - L_{k,t}, \quad k = 1, \ldots, l.$
Let $d_t = (d_{1,t}, \ldots, d_{l,t})'$ be the vector of relative performances and, if $\mu = E[d_t]$ is defined, our null hypothesis is
$H_0: \mu \le 0,$
that is, the benchmark model is not inferior to any of the alternative models when the objective is to minimize the expectation of the loss function considered.
The SPA test is based on the test statistic
$T^{SPA}_n = \max\left[\max_{k} \frac{\sqrt{n}\,\bar{d}_k}{\hat{\omega}_k},\, 0\right],$
where $\bar{d}_k = \frac{1}{n}\sum_{t=1}^{n} d_{k,t}$ and $\hat{\omega}_k^2$ is the consistent estimator of $\omega_k^2 = \mathrm{var}(\sqrt{n}\,\bar{d}_k)$, and thus $T^{SPA}_n$ represents the largest test statistic of relative performance. We want to find whether $T^{SPA}_n$ is too large for it to be plausible that $\mu \le 0$. This is achieved through the SPA test, where the distribution of $T^{SPA}_n$ is estimated under the null hypothesis and the critical value of $T^{SPA}_n$ is obtained.
Under the assumptions that $d_t$ is stationary and has well-defined moments (see Gonçalves and de Jong 2003 for the necessary assumptions and Hansen and Lunde 2005 for the justification of the assumptions), it is known that the distribution of $\sqrt{n}(\bar{d} - \mu)$ converges to a multivariate normal distribution with mean $0$ and covariance $\Omega$. This result can be used to determine the distribution of $T^{SPA}_n$; however, as $n$ is practically not large enough relative to $l$, it is not possible to obtain a reliable estimate of the $l \times l$ covariance matrix $\Omega$. One then has to rely on the stationary bootstrap of Politis and Romano (1994) to estimate the distribution of $T^{SPA}_n$.
5.4. Stationary Bootstrap Based Implementation
We obtain $B$ bootstrap re-samples $\{d^*_{b,t}\}$, $b = 1, \ldots, B$, using the stationary bootstrap approach of Politis and Romano (1994). The bootstrapped re-samples are then used to estimate $\hat{\omega}_k^2$ and the distribution of $T^{SPA}_n$. First we calculate the sample averages
$\bar{d}^*_{b,k} = \frac{1}{n}\sum_{t=1}^{n} d^*_{b,k,t},$
and next estimate $\hat{\omega}_k^2$ from the bootstrapped re-samples, as the empirical distribution of $\sqrt{n}(\bar{d}^*_{b,k} - \bar{d}_k)$ converges to the true asymptotic distribution of $\sqrt{n}(\bar{d}_k - \mu_k)$ (see Gonçalves and de Jong 2003). As we seek the distribution of $T^{SPA}_n$ under the null hypothesis, we must recentre the bootstrap variables about the true value of $\mu$. As we do not have a true value for $\mu$, we use the three estimates proposed in Hansen (2005), i.e.,
$\hat{\mu}^l_k = \min(\bar{d}_k, 0), \quad \hat{\mu}^c_k = \bar{d}_k\, 1\{\bar{d}_k \le -A_{k,n}\}, \quad \hat{\mu}^u_k = 0,$
where $A_{k,n} = \sqrt{(\hat{\omega}_k^2/n)\, 2\log\log n}$ is the correction factor. Now we redefine our performance variables for each bootstrapped re-sample as
$Z^{*,i}_{b,k,t} = d^*_{b,k,t} - \bar{d}_k + \hat{\mu}^i_k$, for $i \in \{l, c, u\}$, where $b = 1, \ldots, B$, $k = 1, \ldots, l$, and $t = 1, \ldots, n$. Hence we can approximate the distribution of $T^{SPA}_n$ by the empirical distribution of
$T^{*,i}_b = \max\left[\max_k \frac{\sqrt{n}\,\bar{Z}^{*,i}_{b,k}}{\hat{\omega}_k},\, 0\right],$
and calculate the $p$-values as
$\hat{p}^i = \frac{1}{B}\sum_{b=1}^{B} 1\{T^{*,i}_b > T^{SPA}_n\}$, for $i \in \{l, c, u\}$. We reject the null hypothesis for small $p$-values. Here, the three $p$-values obtained for $i \in \{l, c, u\}$ correspond to the consistent estimate of the true $p$-value ($\hat{p}^c$), and the lower and upper bounds for the true $p$-value ($\hat{p}^l$ and $\hat{p}^u$, respectively).
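A simplified sketch of the consistent-recentring variant of the test (illustrative only; `q` controls the mean block length $1/q$ of the stationary bootstrap, and the full test also reports the lower and upper $p$-values):

```python
import numpy as np

def stationary_bootstrap_indices(n, B, q=0.1, seed=0):
    """Politis-Romano stationary bootstrap: blocks of geometric length, mean 1/q."""
    rng = np.random.default_rng(seed)
    idx = np.empty((B, n), dtype=int)
    for b in range(B):
        t = rng.integers(n)
        for j in range(n):
            idx[b, j] = t
            # continue the block with prob. 1 - q, else restart at a random point
            t = (t + 1) % n if rng.random() > q else rng.integers(n)
    return idx

def spa_pvalue(d, B=500, q=0.1):
    """SPA p-value sketch for relative losses d (n x l), d_kt = L_0t - L_kt."""
    n, l = d.shape
    dbar = d.mean(axis=0)
    idx = stationary_bootstrap_indices(n, B, q)
    boot_means = d[idx].mean(axis=1)              # (B, l) bootstrap sample means
    omega = np.sqrt(n * boot_means.var(axis=0))   # bootstrap estimate of omega_k
    t_spa = max(np.max(np.sqrt(n) * dbar / omega), 0.0)
    # consistent recentring: only means that are not 'too negative' are recentred
    g = dbar * (dbar >= -omega * np.sqrt(2.0 * np.log(np.log(n)) / n))
    t_boot = np.maximum((np.sqrt(n) * (boot_means - g) / omega).max(axis=1), 0.0)
    return float(np.mean(t_boot >= t_spa))
```

A large $p$-value keeps the benchmark (it is not inferior); a small one rejects it.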
6. Results
The paper has two major objectives. The first is to determine, for a given portfolio allocation method (at different rebalancing horizons), whether there is a benchmark covariance forecast method that is not inferior to the other methods considered. The second is to determine, based on risk objectives, allocation methods that are not inferior to the other allocation methods at different rebalancing horizons. For both objectives, we study whether the outcomes are consistent across the different rebalancing frequencies of the portfolios.
6.1. Superior Method for Forecasting Covariance Matrix
The covariance forecast models considered for this study are the SMPL, EWMA and DCC-GARCH, details of which are described in
Section 3. The portfolio allocation methods considered and the corresponding loss functions that were used for the SPA test are reported in
Table 2.
In order to compute the out-of-sample loss we perform the following steps:
1. Forecast the covariance matrix using the three approaches for the appropriate forecast horizon.
2. Compute the portfolio weights, using the above covariance matrix, for the portfolio allocation method being considered.
3. Compute, using the intra-day data, the realized returns and realized covariance matrix.
4. Use (2) and (3) to compute the time series of realized losses, using the appropriate loss function for the allocation method (as provided in Table 2), for each covariance forecast methodology.
5. For different choices of benchmark covariance forecast models, compute the p-values for the null hypothesis, which is that a chosen model is as good as any other model.
The results from the SPA test for the case of daily rebalancing, in the form of p-values (we only report the consistent $\hat{p}^c$-values, as the lower and upper bounds $\hat{p}^l$ and $\hat{p}^u$ are not significantly different), are reported in Table 3. The p-values correspond to the null hypothesis that a chosen model is as good as any other model. A low p-value (we take a value ≤0.05) rejects the null hypothesis, which implies that the chosen model cannot be considered a benchmark model and is inferior to the other models being considered.
The results show that, for both the machine learning-based HRP variants and the traditional risk-based portfolios, DCC-GARCH can be considered the benchmark model in the majority of the universes. In a few universes other covariance forecast methodologies can result in not-inferior performance, especially when HRP (Ward), IVWP, and MVP are used for portfolio allocation. However, for MDP, amongst the forecast methods considered, only DCC-GARCH results in superior performance.
We reach similar conclusions for weekly and monthly rebalancing, as the results (not reported here) are not significantly different from those for daily rebalancing.
6.2. Benchmark Allocation Methods for Different Portfolio Performance Objectives
From
Section 6.1 it is evident that DCC-GARCH can be used as the benchmark model for the traditional risk-based allocation methods as well as the machine learning-based allocation methods, as it provides performance that is not inferior to any other covariance forecast model in most of the universes considered. We now try to determine whether there are benchmark allocation methods whose performance is not inferior to the other models when different risk objectives are considered as loss functions. Unless specified otherwise, we use DCC-GARCH to forecast the covariance matrix for all the allocation methods. The market-cap weighted portfolio is also included in the study, as it serves as a proxy for passive investment strategies.
6.2.1. Out-of-Sample Portfolio Variance
We first study the out-of-sample daily realized portfolio variance for the different allocation models considered.
Table 4 reports the corresponding
p-values for the null hypothesis that the performance of the portfolio constructed using the benchmark allocation method is not inferior to the performance of the other allocation methods. The results reported in the table are for the case of daily rebalancing of the portfolio. Clearly, MVP, designed with the objective of minimizing portfolio variance, does not perform well as a benchmark model. We find that at least one of the variants of HRP has a large p-value in every universe. The performance of HRP (SL) cannot be considered inferior to the other models in any universe when DCC-GARCH is used for covariance estimation. In contrast, when SMPL, an inferior covariance estimation method, is used, every allocation method is discarded as a benchmark model in at least one of the universes. IVWP seems most robust amongst the traditional risk-based portfolios and HRP (AL) amongst the machine learning-based portfolios, as both cannot be considered inferior to the other models in three out of five universes.
We next study whether different portfolio rebalancing frequencies can affect the choice of benchmark model for minimizing the portfolio's out-of-sample daily realized variance.
Table A2 in
Appendix C reports the
p-values for the null hypothesis that a chosen benchmark allocation method is not inferior in minimizing the portfolio variance, when the portfolio is rebalanced weekly and monthly, respectively. Again, DCC-GARCH is used to forecast the covariance matrix for the weekly and monthly time horizons, and the realized intra-day asset returns and covariances are used to measure the realized portfolio variance. With a weekly rebalancing frequency, amongst the risk-based portfolios, ERC and IVWP are not inferior to the other allocation methods in three and two out of five universes, respectively. With monthly rebalancing they are not inferior in one and two out of five universes, respectively. With the machine learning-based allocations, we see that with longer forecasting and rebalancing horizons, the fraction of universes in which a variant of HRP did not have an inferior performance goes down.
A summary of the realized annual volatilities of the portfolios constructed using the above allocation methods with daily rebalancing is reported in
Table 5. A few observations can be made. First, the market-cap weighted portfolios have the highest volatilities. Second, the volatilities of the risk-based and machine learning-based portfolios are in a similar range, although when DCC-GARCH is used, the HRP variants have the minimum portfolio variance in all the universes. Finally, an inferior covariance estimation model, in this case SMPL, results in higher volatilities. Even in this case, the HRP variants have the minimum out-of-sample volatility in each of the universes considered, except Universe 2, where ERC has the lowest volatility.
Finally, we look at the realized annual volatilities of the portfolios when different rebalancing horizons were considered. The results are reported in
Table A3 in
Appendix C. An observation that can be made here is that portfolio volatility increases with increasing rebalancing horizon. The increase is not linear, with a greater relative change in volatility when moving from daily to weekly than from weekly to monthly. The exception is the market-cap weighted portfolio, whose volatility marginally decreased (on average) when moving from daily to weekly rebalancing. The greatest increase in volatility, in expectation over the universes, with increasing rebalancing horizon is that of MVP, followed by the variants of HRP.
6.2.2. Out-of-Sample Weekly CVaR
Expected shortfall is a widely used coherent risk measure, especially for computing capital reserves against unforeseen losses. We next look at the out-of-sample realized weekly CVaR, using the intra-day returns of the portfolios constructed with the different allocation methods.
Table 6 reports the
p-values for the null hypothesis that a chosen benchmark model does not, in expectation, have a higher realized expected shortfall than the other models. From the reported values, it is clear that when MVP and MDP are considered as benchmark models, the null hypothesis is rejected due to low p-values. IVWP and ERC are the only two risk-based portfolios that have significant p-values in at least a couple of universes. However, the results show that at least one of the variants of HRP has a large p-value in each of the universes. The performance of HRP (SL) is not inferior to the other models considered in four out of five universes. HRP (AL) is not inferior in three out of five universes, while HRP (Ward) comes out as not inferior in only two out of five universes. With an inferior forecast model for the covariance, the only risk-based portfolio that does not have inferior performance is IVWP, which cannot be considered inferior in three out of five universes. With larger covariance misspecification, HRP (SL) does not perform well; however, HRP (Ward) comes out as a benchmark model in four out of five universes.
How does the portfolio rebalancing frequency affect the choice of benchmark model for minimizing the portfolio's out-of-sample weekly CVaR values?
Table A4 in
Appendix C reports the
p-values for weekly and monthly rebalancing frequencies, with DCC-GARCH used to forecast the covariance matrix. With weekly rebalancing, amongst the risk-based portfolios, ERC's performance is not inferior to the other models in four out of five universes, while IVWP's performance is not inferior in three universes. The HRP variants are not inferior in four out of five universes, with HRP (SL) still performing better (in terms of the fraction of universes in which it is not inferior) than HRP (AL) and HRP (Ward). With monthly rebalancing, both ERC and IVWP come out as not inferior choices in three out of five universes, while the null hypothesis with a variant of HRP considered as the benchmark model is not rejected in just two out of five universes.
6.2.3. Out-of-Sample Herfindahl Index and Diversification Ratio
While minimizing portfolio variance and expected shortfall are seen as outcomes of better diversification, we next study directly the extent of out-of-sample diversification using the Herfindahl index of the realized percentage risk contribution, and the realized diversification ratio of the portfolio.
Table 7 reports the
p-values corresponding to the different portfolio allocation methods considered as benchmark models with daily rebalancing. Clearly, MDP, designed with the objective of maximizing portfolio diversification, does not perform well as a benchmark model, with the null hypothesis being rejected in all the universes. Only ERC and IVWP have large p-values, in three and two out of five universes, respectively. With weekly rebalancing (see Table A5 in Appendix C) this becomes four and three out of five universes, respectively, while with monthly rebalancing it is again three and two out of five universes, respectively. IVWP can be considered as the benchmark model when the objective is to minimize the Herfindahl index computed from the realized percentage risk contributions of the underlying assets. In two out of five universes, ERC is not inferior to the others when the loss function is taken as the negative of the realized diversification ratio.
The conclusions regarding the choice of benchmark model with the objective of minimizing the Herfindahl index remain the same when different rebalancing horizons are considered, as reported in
Table A5.
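The two diversification measures used in this subsection can be stated compactly: the percentage risk contribution of asset i is RC_i = w_i(Σw)_i / (w'Σw); the Herfindahl index is the sum of the squared RC_i (its minimum, 1/N, corresponds to perfectly even risk contributions); and the diversification ratio is the weighted average of the asset volatilities divided by the portfolio volatility. A minimal sketch:

```python
import numpy as np

def risk_contributions(w, cov):
    """Percentage risk contributions: RC_i = w_i * (Sigma w)_i / (w' Sigma w)."""
    w = np.asarray(w, dtype=float)
    cov = np.asarray(cov, dtype=float)
    marginal = cov @ w
    return w * marginal / (w @ marginal)

def herfindahl(w, cov):
    """Herfindahl index of the percentage risk contributions (minimum 1/N)."""
    rc = risk_contributions(w, cov)
    return float(np.sum(rc ** 2))

def diversification_ratio(w, cov):
    """Weighted average asset volatility divided by portfolio volatility."""
    w = np.asarray(w, dtype=float)
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))
    return float((w @ vols) / np.sqrt(w @ cov @ w))
```

For an equal-weight portfolio of four uncorrelated unit-variance assets, each RC_i is 0.25, the Herfindahl index attains its minimum of 0.25, and the diversification ratio is 2.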
6.2.4. Out-of-Sample Sharpe Ratios
We have so far tried to identify whether there are benchmark models that perform in a way that is not inferior to the other models with respect to purely risk-driven objectives. We now bring realized portfolio returns into our analysis by comparing the performance of the different models when the objective is to maximize the Sharpe ratios. We take as the loss function for our SPA test the negative of the realized weekly Sharpe ratios. The weekly Sharpe ratios are computed using the intra-day returns.
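A sketch of the weekly Sharpe computation follows. The exact aggregation of intra-day returns to a weekly figure involves conventions the text does not spell out, so this version makes the standard assumptions: iid returns, square-root-of-time scaling, and a risk-free rate of zero.

```python
import numpy as np

def weekly_sharpe(returns, periods_per_week):
    """Sharpe ratio at the weekly horizon from higher-frequency returns,
    using sqrt-of-time scaling under an iid assumption (risk-free rate = 0)."""
    r = np.asarray(returns, dtype=float)
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods_per_week))
```

Because the SPA test works with losses, the paper feeds it the negative of this quantity; a model with a higher realized Sharpe ratio therefore has a lower loss.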
Table 8 reports the
p-values for the null hypothesis that a chosen benchmark model is not inferior to the others when it comes to maximizing the Sharpe ratios. When DCC-GARCH is used for forecasting the covariance matrix, there are candidates from both the risk-based portfolios and the machine learning-based portfolios whose performance cannot be considered inferior to the other models in most of the universes. The minimum variance portfolio can be considered as a benchmark model in four out of five universes. Market-cap weighted portfolios are also not inferior to the other methods in four out of five universes. Amongst the ML-based portfolios, HRP (SL) can be considered as a benchmark model in almost all the universes for this case.
When SMPL is used for forecasting the covariance matrix, the performance of both traditional risk-based portfolios and machine learning-based portfolios is significantly impacted, which is clear from the corresponding p-values. However, the market-capitalization weighted portfolio, which does not require a covariance forecast, can be considered not inferior to other models in all the universes.
The impact of the rebalancing frequency on the choice of the benchmark model, whose weekly out-of-sample Sharpe ratios are not lower than those of the other models, is reported in
Table 9. An observation that can be made is that with longer rebalancing horizons, the relative performance of the machine learning-based portfolios becomes inferior. While HRP (SL) could be considered not inferior in almost all the universes when the portfolio was rebalanced daily, with weekly and monthly rebalancing this reduces to just two. The relative performance of the minimum variance portfolio, on the other hand, improves, with p-values close to one in most of the universes. Another interesting observation is that, in this case, the relative performance of the market-cap weighted portfolio also deteriorates with longer rebalancing horizons. It should be noted that we comment only on the relative performance of the models for the different rebalancing horizons considered. It should not be inferred, for instance, that HRP gives the highest Sharpe ratio when the portfolio is rebalanced daily, but rather that if a choice of weekly rebalancing has been made, MVP would, in expectation, provide a Sharpe ratio not inferior to that of HRP.
For the sake of completeness, we provide the summary of the realized annual Sharpe ratios for the different portfolio strategies for the year 2017.
Table 10 provides the realized Sharpe ratios when DCC-GARCH and SMPL, respectively, are used for the covariance forecast, while the portfolio is rebalanced daily. We see that, amongst the traditional risk-based portfolios, the realized Sharpe ratios of MVP and MDP are significantly affected by the covariance misspecification. The results for IVWP and ERC appear more robust in the presence of covariance misspecification, a result consistent with the findings of
Ardia et al. (
2017). The machine learning-based portfolios, as expected from the previous experiments, perform better with DCC-GARCH. However, an inferior covariance estimator does not affect their outcomes as significantly as it does for MVP. The market-cap weighted portfolio is outperformed in the majority of the universes only by IVWP, when the portfolio is rebalanced daily.
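IVWP's relative robustness is intuitive: inverse-volatility weighting uses only the diagonal of the covariance matrix, so errors in the estimated correlations cannot affect its weights at all. A minimal sketch with hypothetical volatilities:

```python
import numpy as np

def ivwp_weights(vols):
    """Inverse-volatility weights: w_i proportional to 1 / sigma_i."""
    inv = 1.0 / np.asarray(vols, dtype=float)
    return inv / inv.sum()

# Hypothetical annualized volatilities for three assets.
w = ivwp_weights([0.10, 0.20, 0.40])
```

Here the least volatile asset receives 4/7 of the capital and the most volatile 1/7, regardless of how the assets co-move.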
The realized annual Sharpe ratio for different rebalancing horizons, when DCC-GARCH is used for covariance forecast, is reported in
Table 11. The Sharpe ratios improve for most of the allocation methods when moving from daily to monthly rebalancing, except for the market-capitalization weighted portfolio. The most significant improvement in the Sharpe ratios is for MVP, followed by the three variants of HRP. Overall, with longer rebalancing horizons, MVP performs the best, while the performance of the variants of HRP, IVWP, and ERC is similar for the dataset we consider. For our dataset, MDP and MWP perform comparatively poorly (in that order) when the objective is to maximize the Sharpe ratio.
If an inferior covariance forecast method is used, the above inference can change significantly for longer rebalancing horizons. This is illustrated in
Table 12, which reports the annual Sharpe ratios of a portfolio rebalanced monthly using the covariance forecasts obtained from SMPL. For the dataset that we consider, IVWP, which is more robust to covariance misspecification, performs the best in most of the universes. The other allocation methods that are not significantly affected are the variants of HRP and ERC. For the dataset considered, MVP and MDP appear to be the most significantly affected by inferior covariance forecasts.
7. Conclusions
We have compared the out-of-sample performance of portfolios constructed using traditional risk-based allocation methods with those constructed using machine learning methods. We summarize the outcomes of the different experiments performed.
7.1. Choice of Covariance Estimator
As the forecasted covariance matrix plays an important role in risk-based allocation methods, we first determined whether there were covariance forecasting methods that led to superior out-of-sample performance of the different portfolio allocation strategies. For each portfolio allocation method, we used an appropriate risk objective and measured, using an SPA test, which of the covariance forecast methodologies resulted in superior performance on that objective. The risk objective chosen for a portfolio allocation method was the one closest to the objective it was trying to optimize. For instance, for MVP the chosen performance measure was the portfolio variance (the lower the better), while for the maximum diversification portfolio it was the diversification ratio (the higher the better). As the portfolio weights in HRP are not calculated by optimizing a particular risk objective, we used the variance of the portfolio as the objective for measuring superior performance for HRP. The following were the key observations from the analysis of the results.
The performance of all the portfolios, with respect to their corresponding objectives, is superior in the majority of the universes when DCC-GARCH is used for forecasting the covariance.
MDP appears to be the most sensitive to the choice of covariance forecast methodology.
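For contrast with the DCC-GARCH and SMPL estimators compared above, the simplest dynamic alternative mentioned in the introduction, the RiskMetrics-style exponentially weighted covariance, can be sketched as follows. The zero-mean assumption and λ = 0.94 are the usual RiskMetrics conventions for daily data, not parameters taken from this study:

```python
import numpy as np

def ewma_cov(returns, lam=0.94):
    """Exponentially weighted covariance forecast (RiskMetrics-style):
    Sigma_t = lam * Sigma_{t-1} + (1 - lam) * r_t r_t',
    assuming zero-mean returns, as is standard in RiskMetrics."""
    r = np.asarray(returns, dtype=float)  # shape (T, N)
    cov = np.outer(r[0], r[0])            # initialize from the first observation
    for x in r[1:]:
        cov = lam * cov + (1.0 - lam) * np.outer(x, x)
    return cov
```

Unlike the rolling sample covariance, the exponential decay lets the forecast react to volatility clustering, which is one step toward the persistence that DCC-GARCH models explicitly.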
7.2. Portfolio Variance
When the objective is to minimize the portfolio variance, it turns out that MVP, whose weights are estimated by minimizing the in-sample portfolio variance, does not have superior out-of-sample performance. The other notable outcomes can be summarized as follows:
HRP variants are superior in the majority of the universes when the objective is to minimize the out-of-sample variance.
With a poor covariance estimator, IVWP and HRP (AL) are the superior methods in the majority of the universes.
With longer rebalancing horizons, the HRP variants have superior performance in the majority of the universes, with a few exceptions where IVWP and ERC have, in expectation, lower portfolio variance.
7.3. Expected Shortfall
From our experiments on the out-of-sample 5-day CVaR, we can make the following observations:
HRP variants are superior when it comes to minimizing the out-of-sample CVaR. Amongst them, HRP (SL) performs relatively better in the majority of the universes.
With inferior covariance estimates, IVWP and HRP (Ward) result in superior performance in the majority of the universes.
With longer rebalancing horizons, ERC consistently results in superior performance when it comes to minimizing the out-of-sample expected shortfall.
7.4. Herfindahl Index and Diversification Ratio
While minimizing the Herfindahl index or maximizing the diversification ratio is not the end goal of a portfolio manager, they do serve as indicators of a diversified portfolio and lower portfolio concentration risk. While MDP is designed to maximize the in-sample diversification ratio, we find that its out-of-sample performance is inferior to the other allocation methods in all the universes. For these two objectives, HRP also comes out as an inferior choice. The main observations from our analysis can be summarized as follows:
ERC, followed by IVWP, is superior in the majority of the universes when the objective is to maximize the out-of-sample diversification ratio.
IVWP, followed by ERC, is superior in the majority of the universes when the objective is to minimize the Herfindahl index of the realized percentage risk contributions.
These results are consistent across the different rebalancing horizons.
7.5. Sharpe Ratio
The outcomes of the experiments for superior performance when it comes to maximizing out-of-sample weekly Sharpe ratios can be summarized as follows:
With daily rebalancing and the covariance estimated using DCC-GARCH, many allocation methods, including MVP, IVWP, MWP, and HRP, result in performance that is not inferior when it comes to maximizing the out-of-sample weekly Sharpe ratios.
With an inferior covariance estimate, the market-weighted portfolio has superior performance in the majority of the universes, followed by IVWP.
With increasing rebalancing horizons, MVP clearly has superior performance in the majority of the universes.
7.6. Strengths and Weaknesses
We summarize the strengths and weaknesses of the different allocation methods.
MVP: We see that the out-of-sample performance of MVP is poor when it comes to minimizing the portfolio variance or expected shortfall. Its performance is good when it comes to maximizing the Sharpe ratio, especially when the portfolio is to be rebalanced less frequently. However, its performance with respect to the Sharpe ratio is highly sensitive to covariance misspecification.
IVWP: IVWP has superior performance when it comes to minimizing the out-of-sample Herfindahl index. It also has lower out-of-sample portfolio variance and expected shortfall amongst the risk-based portfolios, especially when an inferior covariance estimator is used. However, with a superior covariance estimator, it is often not the best choice for most risk objectives.
ERC: ERC has superior performance when it comes to maximizing the out-of-sample diversification ratio. It also appears to be the best choice for minimizing the expected shortfall when the portfolio is not rebalanced often. It seems to have inferior performance when it comes to maximizing the Sharpe ratio with longer rebalancing horizons.
MDP: For our dataset, MDP had inferior performance for most objectives in the majority of the universes. It seems to be the most sensitive to covariance misspecification.
MWP: The market-weighted portfolios did not perform well on the objectives of portfolio variance and expected shortfall minimization. They, however, showed up as superior models for maximizing the weekly Sharpe ratios, especially when only an inferior covariance matrix estimator is available. With longer rebalancing horizons and good covariance forecast models, they result in inferior Sharpe ratios when compared to the other methods considered.
HRP: The HRP variants have superior performance in terms of the realized portfolio variance and expected shortfall when DCC-GARCH is used to forecast the covariance matrix. They are also not inferior when the objective is to maximize the weekly Sharpe ratio, when the portfolio is rebalanced daily and DCC-GARCH is used to forecast the covariance matrix. They do seem to be sensitive to the choice of the covariance forecast model. The performance of the different variants of HRP is similar, although HRP (SL) appears to have a slight edge for our dataset. It is important to note that the key strength of HRP lies in portfolios constructed with many underlying assets, where the inversion of the covariance matrix and good estimation of the correlations become challenging. We have considered portfolios with only ten constituents; it would be interesting to see the outcomes of further studies carried out for portfolios with larger numbers of underlying assets.
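For readers unfamiliar with the method, HRP's three steps (tree clustering, quasi-diagonalization, and recursive bisection) can be sketched compactly following Lopez de Prado (2016). The `method` argument selects the linkage variant studied above (SL = 'single', AL = 'average', Ward = 'ward'). This is an illustrative re-implementation, not the exact code used in this study:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def cluster_variance(cov, idx):
    """Variance of a cluster under inverse-variance weights within the cluster."""
    sub = cov[np.ix_(idx, idx)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)

def hrp_weights(cov, corr, method="single"):
    """Hierarchical Risk Parity weights (sketch after Lopez de Prado 2016)."""
    # 1. Tree clustering on the correlation-based distance matrix.
    dist = np.sqrt(0.5 * (1.0 - corr))
    link = linkage(squareform(dist, checks=False), method=method)
    # 2. Quasi-diagonalization: order assets along the dendrogram leaves.
    order = leaves_list(link).tolist()
    # 3. Recursive bisection: split each cluster in two and allocate capital
    #    between the halves in inverse proportion to their variances.
    w = np.ones(len(order))
    clusters = [order]
    while clusters:
        clusters = [c[j:k] for c in clusters
                    for j, k in ((0, len(c) // 2), (len(c) // 2, len(c)))
                    if len(c) > 1]
        for left, right in zip(clusters[0::2], clusters[1::2]):
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            alpha = 1.0 - var_left / (var_left + var_right)
            w[left] *= alpha          # lower-variance half receives more weight
            w[right] *= 1.0 - alpha
    return w
```

Note that the covariance matrix is never inverted, which is precisely why the method remains usable when the number of assets is large relative to the number of observations. For two uncorrelated assets with variances 0.04 and 0.16, the bisection reduces to inverse-variance allocation (weights 0.8 and 0.2); the linkage choice only starts to matter with three or more assets.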