1. Introduction
Empirical investigation of patterns in asset returns has been an active area of research in finance, and momentum, or persistence, in asset returns is one of the most popular examples of this line of research. Perhaps the most prominent form of momentum is cross-sectional momentum (CSM), which refers to the observation that assets that outperform others over a prior period tend to continue to outperform over a subsequent period. The existence of CSM is usually tested empirically by sorting the assets according to their returns over a prior “ranking” period, and constructing a portfolio over a subsequent “holding” period by taking a long position in the “winners” and a short position in the “losers”. Statistically significant excess returns from following such a strategy would then support the existence of cross-sectional momentum.
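The sort-and-split test described above can be sketched in Python as follows. This is a minimal illustration only, assuming equally weighted winner and loser legs of the same size `k` and ignoring transaction costs; the function name `csm_return` and its arguments are hypothetical.

```python
from typing import Sequence


def csm_return(ranking: Sequence[float], holding: Sequence[float], k: int) -> float:
    """Holding-period return of an equally weighted winners-minus-losers portfolio.

    ranking: asset returns over the ranking period
    holding: the same assets' returns over the holding period
    k: number of winners held long and of losers held short
    """
    n = len(ranking)
    if len(holding) != n or not 1 <= k <= n // 2:
        raise ValueError("need matching lengths and 1 <= k <= n // 2")
    # Order asset indices from the highest to the lowest ranking-period return.
    order = sorted(range(n), key=lambda i: ranking[i], reverse=True)
    winners, losers = order[:k], order[n - k:]
    # Long the winners and short the losers, each leg equally weighted.
    long_leg = sum(holding[i] for i in winners) / k
    short_leg = sum(holding[i] for i in losers) / k
    return long_leg - short_leg


# Four assets, one winner and one loser: asset 0 is bought and asset 1 is sold.
print(csm_return([0.05, -0.02, 0.01, 0.03], [0.02, -0.01, 0.00, 0.01], k=1))
```

Significance testing would then be applied to a time series of such portfolio returns.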
Cross-sectional momentum strategies are popular with practitioners since they tend to generate positive returns, and with academics because their existence would run contrary to an implication of the efficient market hypothesis, namely that there are no discernible patterns in asset returns. There is an extensive academic literature investigating the properties of CSM returns covering various asset classes, markets, and jurisdictions. The most notable findings are that CSM returns are generally slightly positive, but become highly negative during times of market uncertainty, and that losses during such periods tend to cancel out, or at least significantly reduce, the prior gains.
Various authors, including Fama and French (1992), Jegadeesh and Titman (1993, 2001), Asness (1994), and Israel and Moskowitz (2013), found that momentum strategies are profitable in US equity markets over different time periods dating back to 1927. Analogous results were found for country equity indices by Richards (1997), Asness et al. (1997), Chan et al. (2000), and Hameed and Yuanto (2002), for emerging markets by Rouwenhorst (1998), for exchange rate markets by Okunev and White (2003) and Menkhoff et al. (2012), for commodities by Erb and Harvey (2006), for futures contracts by Moskowitz et al. (2012), and for industries by Sefton and Scowcroft (2004). Similar results were also found by Asness et al. (2013) and Daniel and Moskowitz (2016) for markets in the European Union, Japan, the United Kingdom, and the United States, and across asset classes including fixed income, commodities, foreign exchange, and equity from 1972 through 2013.
Despite the extensive literature on the empirical properties of momentum-based returns, relatively few studies consider the distributional properties of these returns from a theoretical viewpoint, with Kwon and Satchell (2018) being a notable exception that addresses CSM returns as defined in this paper. Most of the known theoretical results, obtained for example by Lo and MacKinlay (1990), Jegadeesh and Titman (1993), Lewellen (2002), and Moskowitz et al. (2012), are concerned only with the expected values and first-order autocorrelations of returns from the so-called weighted relative strength strategy, in which the portfolio over the holding period is constructed from all underlying assets weighted, essentially, in proportion to their absolute or relative returns over the ranking period. We wish to calculate the distribution of CSM returns because it allows us to calculate percentiles, quantiles, and related quantities, and to deduce which moments of the returns exist and their precise form. Such information can be used, for example, to assess the fatness of the tails of the distribution, which is valuable for risk management calculations as well as for understanding the benefits and limitations of portfolio construction.
Assuming that the underlying asset returns are Gaussian, Kwon and Satchell (2018) derived the distribution and the moments of the CSM returns. In this paper, we extend their results to the case where the underlying asset returns are Student’s t, and derive the probability density function and the moments of the CSM returns. The t distribution arises naturally, for example, in a framework where asset volatility is stochastic, and conventional mean-variance analysis will create returns which are very similar to t-distributed returns. The important distinction between Student’s t returns and normal returns is that the Student’s t distribution has an additional parameter which governs the fatness of the tails of the distribution and can be used to assess tail risk. There is a trade-off between realism and complexity: we would like to use a more complex distribution, such as the skewed Student’s t considered in Theodossiou (1998) and Hansen et al. (2010), but the resulting analytical complexity becomes prohibitive.
Although the individual asset returns do not exhibit skewness under the generalization to Student’s t, they can be leptokurtic, which is a well-established feature in the empirical literature. Moreover, the CSM returns can, and do, exhibit skewness that depends on the statistical properties of the underlying assets. A detailed analysis of the special case of two underlying assets reveals that many of the salient features of the CSM returns reported in the empirical literature are consistent with the theoretical implications of this framework. This analysis is of interest because Kwon and Satchell (2018) were able to show that non-normality was a consequence of the momentum structure, even when the underlying returns were normal. We therefore wish to assess the impact that non-normality in the underlying returns has on CSM returns. For example, does it exacerbate the non-normality, or does it make very little difference? Answers to these questions will shed light on applying CSM to universes of assets which are fundamentally non-normal, such as emerging markets.
It should be pointed out that since we work under the assumption that asset returns over the ranking and holding periods are jointly t-distributed, there are limitations in the properties of momentum returns that can be addressed in the theoretical framework of this paper. For example, it is not possible to adequately address properties that depend on certain firm specific, economic, or financial factors such as liquidity, credit spread, market sentiment, business cycle, and information asymmetry since these factors cannot easily be captured in the distributional assumption on asset returns. Theoretical investigation of such properties would require an extension with the ability to incorporate such factors.
Finally, it may be asked what the connection is between our analysis and the extensive linear factor modelling that dominates the asset pricing literature. This literature essentially says that the time t mean of an asset, say the first, is a linear function of factor returns. In the framework of this paper, we can accommodate such modelling by interpreting the asset mean to be conditional on factor returns.
The remainder of this paper is organized as follows: Section 2 introduces the notation and the key results on multivariate normal, Student’s t, and unified skew t distributions, and Section 3 provides a mathematically precise definition of CSM returns and derives their density and moments. Although the expressions for the CSM return density and the associated moments are quite complex in general, they simplify considerably in the case of two assets with one winner and one loser; this special case is examined in detail in Section 4, along with its implications for the empirically observed features reported in the literature, and the paper concludes with Section 5.
2. Notation and Preliminaries
For the convenience of the reader, we introduce in this section the notation that will be used throughout the paper, and present some known results that will be relied upon in subsequent sections.
2.1. Notation
For any , we will write for the i-th coordinate of , and given write if and only if for all . Similarly, given a matrix , we will write for the -th entry of M, and the transpose of a vector or a matrix will be denoted by the superscript . The vector in with all entries equal to 1 will be denoted , and given a subset , we will denote by the indicator function on A.
Given a random vector $X$ with values in a region $\Omega \subseteq \mathbb{R}^n$, we will write $f_X$ and $F_X$ for the probability density and the cumulative distribution functions of $X$, respectively. Moreover, given another random vector $Y$ with values in $\Omega' \subseteq \mathbb{R}^{n'}$, we will denote by $f_{X|Y}$ and $F_{X|Y}$ the conditional probability density and conditional cumulative distribution functions of $X$ given $Y$, respectively.
For any
, let
and let
be the set of permutations of
. We will denote the permutation that maps
by a sequence
, and given any
write
for the image of
i under
so that if
, for example, then
,
, and
. Given a permutation
, we will denote by
the permutation matrix corresponding to
and denote by
the matrix
The elements of the permutation group
act naturally on the set of polynomials,
by the rule
for any polynomial
and
. For any
, let
be the polynomial
Denote by
the stabilizer of
under the action of
so that
and let
be the quotient group,
1 with elements of
identified with their coset representatives
. Finally, define
as the
m-fold Cartesian product of
.
2.2. Multivariate Normal Distributions
The density of an $n$-dimensional normal distribution with mean $\mu$ and covariance $\Sigma$ at $x$ will be denoted $\phi_n(x; \mu, \Sigma)$, and the corresponding cumulative distribution function will be denoted $\Phi_n(x; \mu, \Sigma)$. In general, given random variables $X_1, \ldots, X_n$, their joint probability density function will be denoted $f_{X_1, \ldots, X_n}$, and we will write $F_{X_1, \ldots, X_n}$ for the cumulative distribution function.
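Theorem 1. Let $n = n_1 + n_2$ and suppose $X \sim N_n(\mu, \Sigma)$, where
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
with $X_i, \mu_i \in \mathbb{R}^{n_i}$ and $\Sigma_{ij} \in \mathbb{R}^{n_i \times n_j}$ for $i, j \in \{1, 2\}$, and $\Sigma$ positive definite. Then, the conditional distribution of $X_1$ given $X_2 = x_2$ is normal with mean and covariance
$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$
respectively, and $\phi_n(x; \mu, \Sigma)$ decomposes as
$$\phi_n(x; \mu, \Sigma) = \phi_{n_1}(x_1; \mu_{1|2}, \Sigma_{1|2})\, \phi_{n_2}(x_2; \mu_2, \Sigma_{22}).$$

Given an $n$-dimensional random vector $X$ and $i_1, \ldots, i_m \in \{1, \ldots, n\}$, we will denote by $\mu_{i_1 \cdots i_m}(X)$ the $(i_1, \ldots, i_m)$-th moment of $X$, so that
$$\mu_{i_1 \cdots i_m}(X) = \mathbb{E}\left[X_{i_1} X_{i_2} \cdots X_{i_m}\right]. \tag{9}$$
Note that the subscripts, $i_1, \ldots, i_m$, in (9) may be repeated, so that the above definition is equivalent to the more familiar definition of moments in which the powers of the components of $X$ appear inside the expectation on the right-hand side, viz.
$$\mathbb{E}\left[X_1^{k_1} X_2^{k_2} \cdots X_n^{k_n}\right], \qquad \text{with } k_j = \left|\{l : i_l = j\}\right| \text{ for } j = 1, \ldots, n.$$
For example, $\mathbb{E}[X_1^2 X_2] = \mathbb{E}[X_1 X_1 X_2]$, and so in the notation of (9) we have $i_1 = 1$, $i_2 = 1$, and $i_3 = 2$.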
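The correspondence between the repeated-subscript definition of moments and the familiar power form can be made concrete with a small helper. This is an illustrative sketch; the function name `indices_to_powers` is hypothetical, and subscripts are taken to be 1-based as in the text.

```python
from collections import Counter
from typing import Sequence, Tuple


def indices_to_powers(indices: Sequence[int], n: int) -> Tuple[int, ...]:
    """Convert 1-based subscripts (i_1, ..., i_m) into powers (k_1, ..., k_n),
    where k_j counts how often component j appears among the subscripts."""
    counts = Counter(indices)
    if any(not 1 <= i <= n for i in counts):
        raise ValueError("subscripts must lie in 1..n")
    return tuple(counts.get(j, 0) for j in range(1, n + 1))


# X_1 X_1 X_2 in a three-dimensional vector corresponds to X_1^2 X_2^1 X_3^0.
print(indices_to_powers((1, 1, 2), n=3))
```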
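Theorem 2. Let $X \sim N_n(\mu, \Sigma)$, where $\mu \in \mathbb{R}^n$, and $\Sigma \in \mathbb{R}^{n \times n}$ is positive definite. Then, for any $i_1, \ldots, i_m \in \{1, \ldots, n\}$, we have
$$\mathbb{E}[X_{i_1} \cdots X_{i_m}] = \sum_{\substack{S \subseteq \{1, \ldots, m\} \\ |S| \text{ even}}} \Bigg(\prod_{l \notin S} \mu_{i_l}\Bigg) \sum_{P \in \mathcal{P}(S)} \prod_{\{a, b\} \in P} \Sigma_{i_a i_b}, \tag{10}$$
where $\mathcal{P}(S)$ denotes the set of partitions of $S$ into unordered pairs, and empty products are equal to 1. In particular, if $\mu = 0$ and $m$ is even, then
$$\mathbb{E}[X_{i_1} \cdots X_{i_m}] = \sum_{P \in \mathcal{P}(\{1, \ldots, m\})} \prod_{\{a, b\} \in P} \Sigma_{i_a i_b}.$$
An alternative expression for $\mathbb{E}[X_{i_1} \cdots X_{i_m}]$, where $X$ is multivariate normal, is given in Kan (2008), Proposition 2.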
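As a cross-check on Theorem 2, moments of a multivariate normal vector can be computed by brute force: write $X = \mu + Z$ with $Z \sim N(0, \Sigma)$, expand the product, and evaluate each group of centred factors with Isserlis' theorem, which sums over all ways of pairing the factors. The sketch below is exponential in $m$ and intended only for small orders; the function names are hypothetical and subscripts are 0-based.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple


def pairings(items: List[int]) -> Iterator[List[Tuple[int, int]]]:
    """Yield every partition of `items` (of even length) into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for j in range(len(rest)):
        remaining = rest[:j] + rest[j + 1:]
        for tail in pairings(remaining):
            yield [(first, rest[j])] + tail


def normal_moment(idx: Sequence[int], mu: Sequence[float],
                  cov: Sequence[Sequence[float]]) -> float:
    """E[X_{i_1} ... X_{i_m}] for X ~ N(mu, cov), with 0-based subscripts in idx."""
    m = len(idx)
    total = 0.0
    for size in range(0, m + 1, 2):  # odd-sized centred groups have zero mean
        for S in combinations(range(m), size):
            # Positions outside S contribute their mean ...
            mean_part = 1.0
            for pos in range(m):
                if pos not in S:
                    mean_part *= mu[idx[pos]]
            # ... and positions in S are paired off via Isserlis' theorem.
            pair_sum = 0.0
            for pairing in pairings(list(S)):
                prod = 1.0
                for a, b in pairing:
                    prod *= cov[idx[a]][idx[b]]
                pair_sum += prod
            total += mean_part * pair_sum
    return total


# Fourth moment of N(0, 2): 3 * 2^2.
print(normal_moment([0, 0, 0, 0], [0.0], [[2.0]]))
```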
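Corollary 1. Let $X \sim N(\mu, \sigma^2)$. Then, for any $m \in \mathbb{N}$, the $m$-th moment of $X$ is given by
$$\mathbb{E}[X^m] = \sum_{k=0}^{\lfloor m/2 \rfloor} \binom{m}{2k} (2k-1)!!\, \sigma^{2k} \mu^{m-2k},$$
where $(2k-1)!!$ is the double factorial of $2k-1$, with the convention $(-1)!! = 1$. In the special case where $\mu = 0$ and $m$ is even, we have $\mathbb{E}[X^m] = (m-1)!!\, \sigma^m$.
Proof. Follows from Theorem 2, since the inner sum in (10), for which $|S| = 2k$, consists of $(2k-1)!!$ identical terms that are all equal to $\sigma^{2k}$. □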
2.3. Multivariate Student’s t Distribution
Asset returns are often assumed to be normally distributed in the academic literature for theoretical convenience, in which case they are completely determined by the location and the scale parameters. However, it is widely reported in the empirical literature that observed asset returns exhibit excess kurtosis. Student’s t distribution is a member of the elliptical family with an additional parameter, $\nu > 0$, viz. the number of degrees of freedom, that controls the kurtosis. The Student’s t distribution reduces to the normal in the limit as $\nu \to \infty$, and hence provides a convenient framework under which to investigate the impact of excess kurtosis in the underlying asset returns on the distributional properties of the CSM return. This subsection provides a brief summary of the key properties of the Student’s t distribution that will be required in the remainder of this paper.
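The t distribution has a long history in mathematical statistics. The univariate probability density function (pdf), $t(x; \nu, \mu, \sigma)$, of a t-variate with location $\mu$ (the mean when $\nu > 1$), scale parameter $\sigma$, and degrees of freedom $\nu$ is given by
$$t(x; \nu, \mu, \sigma) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sigma \sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{(x - \mu)^2}{\nu \sigma^2}\right)^{-\frac{\nu + 1}{2}}.$$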
From the origins of the t-test in mathematical statistics, it is clear that we can write the corresponding random variable as $X = \mu + \sigma z / \sqrt{Y}$, where $z$ and $Y$ are independent, $z \sim N(0, 1)$, $Y = W/\nu$, and $W \sim \chi^2_\nu$.
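The representation $X = \mu + \sigma z / \sqrt{Y}$, with $z \sim N(0, 1)$ independent of $\nu Y \sim \chi^2_\nu$, also gives a simple way to simulate t-variates. The sketch below uses Python's standard `random` module and draws $\chi^2_\nu$ as a gamma variate with shape $\nu/2$ and scale 2; the function name `sample_t` and the parameter values are illustrative. For $\nu > 2$ the sample variance should be close to $\sigma^2 \nu / (\nu - 2)$.

```python
import math
import random


def sample_t(nu: float, mu: float, sigma: float, rng: random.Random) -> float:
    """One draw of X = mu + sigma * z / sqrt(Y), where z ~ N(0, 1) and
    Y = W / nu with W ~ chi^2_nu independent of z."""
    z = rng.gauss(0.0, 1.0)
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi^2_nu = Gamma(shape nu/2, scale 2)
    return mu + sigma * z / math.sqrt(w / nu)


rng = random.Random(12345)
nu, mu, sigma = 10.0, 0.0, 1.0
draws = [sample_t(nu, mu, sigma, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(mean, var, sigma ** 2 * nu / (nu - 2))  # sample variance vs. nu / (nu - 2)
```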
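In extending the definition of the t distribution to the multivariate case, we are faced with a choice. Although the choice of $z$ is clear, namely a multivariate normal vector, $Y$ can be defined in various ways. For example:
(i) each component, $z_i$, of $z$ is normalized by an independent $Y_i$ with the same or differing degrees of freedom $\nu_i$;
(ii) the $Y_i$ could be jointly dependent;
(iii) a single common $Y$.
All three of these choices have a stochastic volatility interpretation, corresponding to the random volatility $\sigma_i / \sqrt{Y_i}$ of the $i$-th component, with $Y_i = Y$ for all $i$ in the third case.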
Although we have chosen the final characterization, cross sectional momentum could also be analyzed under other characterizations.
Throughout this paper, the probability density function of the $n$-dimensional Student’s t distribution, $t_n(\nu, \mu, \Sigma)$, with $\nu$ degrees of freedom, location $\mu$, and shape matrix $\Sigma$, evaluated at $x$, will be denoted $t_n(x; \nu, \mu, \Sigma)$, and we will write $T_n(x; \nu, \mu, \Sigma)$ for the corresponding cumulative distribution function. The next theorem shows that the multivariate Student’s t distribution is closed under conditioning, in the sense that the conditional density of a subset of the components given its complement is again Student’s t. This property will be crucial in the investigation of the CSM returns in later sections.
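Theorem 3. Let $n = n_1 + n_2$ and suppose $X \sim t_n(\nu, \mu, \Sigma)$, where
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
with $X_i, \mu_i \in \mathbb{R}^{n_i}$ and $\Sigma_{ij} \in \mathbb{R}^{n_i \times n_j}$ for $i, j \in \{1, 2\}$, and $\Sigma$ positive definite. Then, the conditional distribution of $X_1$ given $X_2 = x_2$ is Student’s t with $\nu + n_2$ degrees of freedom, and location and shape matrix
$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1|2} = \frac{\nu + (x_2 - \mu_2)^\top \Sigma_{22}^{-1}(x_2 - \mu_2)}{\nu + n_2}\left(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right),$$
respectively, and $t_n(x; \nu, \mu, \Sigma)$ decomposes as
$$t_n(x; \nu, \mu, \Sigma) = t_{n_1}(x_1; \nu + n_2, \mu_{1|2}, \Sigma_{1|2})\, t_{n_2}(x_2; \nu, \mu_2, \Sigma_{22}).$$
Although a Student’s t distribution does not have finite moments of all orders, the next theorem provides an explicit expression for those that do exist.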
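In the bivariate case ($n_1 = n_2 = 1$), the conditioning result of Theorem 3 can be checked numerically by confirming that the joint density equals the product of the conditional and marginal densities. The sketch below assumes the standard conditional parameters for the multivariate t: degrees of freedom $\nu + n_2$, location $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$, and shape $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ inflated by $(\nu + \delta)/(\nu + n_2)$ with $\delta = (x_2 - \mu_2)^\top \Sigma_{22}^{-1}(x_2 - \mu_2)$. The function names and parameter values are illustrative.

```python
import math


def t1_pdf(x: float, nu: float, loc: float, shape: float) -> float:
    """Univariate Student's t density; `shape` is the squared scale."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi * shape))
    return c * (1 + (x - loc) ** 2 / (nu * shape)) ** (-(nu + 1) / 2)


def t2_pdf(x1, x2, nu, mu, S):
    """Bivariate Student's t density with location mu and 2x2 shape matrix S."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    d1, d2 = x1 - mu[0], x2 - mu[1]
    # Quadratic form d' S^{-1} d written out with the explicit 2x2 inverse.
    q = (S[1][1] * d1 * d1 - (S[0][1] + S[1][0]) * d1 * d2 + S[0][0] * d2 * d2) / det
    c = math.gamma((nu + 2) / 2) / (math.gamma(nu / 2) * nu * math.pi * math.sqrt(det))
    return c * (1 + q / nu) ** (-(nu + 2) / 2)


def conditional_t(x2, nu, mu, S):
    """Parameters of X_1 given X_2 = x2 when n_1 = n_2 = 1."""
    d2 = x2 - mu[1]
    loc = mu[0] + S[0][1] / S[1][1] * d2
    schur = S[0][0] - S[0][1] * S[1][0] / S[1][1]
    shape = (nu + d2 * d2 / S[1][1]) / (nu + 1) * schur
    return nu + 1, loc, shape


nu, mu, S = 4.0, (0.1, -0.2), ((1.0, 0.6), (0.6, 2.0))
x1, x2 = 0.7, 1.3
df_c, loc_c, shape_c = conditional_t(x2, nu, mu, S)
joint = t2_pdf(x1, x2, nu, mu, S)
product = t1_pdf(x1, df_c, loc_c, shape_c) * t1_pdf(x2, nu, mu[1], S[1][1])
print(joint, product)  # the two numbers should agree up to rounding
```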
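Theorem 4. Let $X \sim t_n(\nu, \mu, \Sigma)$, where $\nu > 0$, $\mu \in \mathbb{R}^n$, and $\Sigma \in \mathbb{R}^{n \times n}$ is positive definite. Moreover, for any $m \in \mathbb{N}$ and $k \in \{0, 1, \ldots, m\}$, denote by $\mathcal{S}_k$ the set of subsets of $\{1, \ldots, m\}$ of size $k$ if $k$ is even, and let $\mathcal{S}_k = \emptyset$ if $k$ is odd. Then, for any $i_1, \ldots, i_m \in \{1, \ldots, n\}$ such that $m < \nu$, we have
$$\mathbb{E}[X_{i_1} \cdots X_{i_m}] = \sum_{k=0}^{m} \frac{\nu^{k/2}\, \Gamma\left(\frac{\nu - k}{2}\right)}{2^{k/2}\, \Gamma\left(\frac{\nu}{2}\right)} \sum_{S \in \mathcal{S}_k} \Bigg(\prod_{l \notin S} \mu_{i_l}\Bigg) \sum_{P \in \mathcal{P}(S)} \prod_{\{a, b\} \in P} \Sigma_{i_a i_b},$$
where $\mathcal{P}(S)$ denotes the set of partitions of $S$ into unordered pairs, and empty products are equal to 1 in the final sum. Setting $n = 1$ gives the moments of the one-dimensional Student’s t distribution.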
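Corollary 2. Let $X \sim t(\nu, \mu, \sigma)$. Then, for any $m \in \mathbb{N}$ such that $m < \nu$, the $m$-th moment of $X$ is given by
$$\mathbb{E}[X^m] = \sum_{k=0}^{\lfloor m/2 \rfloor} \binom{m}{2k} (2k-1)!!\, \frac{\nu^{k}\, \Gamma\left(\frac{\nu}{2} - k\right)}{2^{k}\, \Gamma\left(\frac{\nu}{2}\right)}\, \sigma^{2k} \mu^{m-2k},$$
where $(2k-1)!!$ is the double factorial defined in Corollary 1.
Proof. Follows from similar arguments to Theorem 4, noting that the indices are all equal in this case. □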
For an alternative derivation of the moments of the Student’s t distribution, refer to Kirkby et al. (2019).
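The univariate moment formula follows from the representation $X = \mu + \sigma z / \sqrt{Y}$ by expanding $X^m$ binomially and using $\mathbb{E}[z^{2k}] = (2k-1)!!$ together with $\mathbb{E}[Y^{-k}] = \nu^k \Gamma(\nu/2 - k) / (2^k \Gamma(\nu/2))$ for $k < \nu/2$. A direct Python transcription is sketched below; it works on the log scale via `math.lgamma` so that large $\nu$, where the normal limit applies, does not overflow. The function names are hypothetical.

```python
import math


def double_factorial(k: int) -> int:
    """k!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def t_moment(m: int, nu: float, mu: float, sigma: float) -> float:
    """E[X^m] for X = mu + sigma * z / sqrt(Y), z ~ N(0, 1), nu * Y ~ chi^2_nu."""
    if m >= nu:
        raise ValueError("the m-th moment is finite only for m < nu")
    total = 0.0
    for k in range(m // 2 + 1):
        # E[Y^-k] = nu^k Gamma(nu/2 - k) / (2^k Gamma(nu/2)), computed on the log scale.
        log_inv_y = (k * math.log(nu) + math.lgamma(nu / 2 - k)
                     - k * math.log(2.0) - math.lgamma(nu / 2))
        # Binomial term times E[z^(2k)] = (2k - 1)!!; odd powers of z have zero mean.
        total += (math.comb(m, 2 * k) * double_factorial(2 * k - 1)
                  * math.exp(log_inv_y) * sigma ** (2 * k) * mu ** (m - 2 * k))
    return total


print(t_moment(2, 5.0, 0.0, 1.0))   # variance of a standard t_5
print(t_moment(4, 1e6, 0.0, 1.0))   # close to the normal value 3 for large nu
```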
2.4. Unified Skew t Family of Distributions
Multivariate skew-normal (SN) distributions were introduced in Azzalini and Valle (1985) to generalize normal distributions to those that allow non-zero skewness, and the seemingly disparate distributions related to the multivariate SN distributions were brought together under the umbrella of the so-called unified skew-normal (SUN) family of distributions in Arellano-Valle and Azzalini (2006), where it was shown that the SUN family contains many of these skew-normal variants as special cases. This extension of the normal family to distributions with non-zero skewness was subsequently carried over to the elliptical family of distributions in Arellano-Valle and Genton (2010). In what follows, we only summarize the results on the extension for the multivariate Student’s t distributions that will be required in this paper, and refer the reader to Arellano-Valle and Genton (2010) and Jamalizadeh and Balakrishnan (2012) for the details.
Given
,
,
,
for
, where
and
are positive definite, let
Then, the probability density function,
, of an
-dimensional unified skew
t (SUT) distributed random variable,
, associated with
given by
where
The key characteristic of the SUT density is that it is the product of a Student’s t density and a cumulative Student’s t distribution function, with the variable appearing as the main argument of the former and in the location and scale parameters of the latter. As will be seen, the density of the cross-sectional momentum return is a weighted sum of these SUT densities.
3. Cross-Sectional Momentum Returns with Student’s t Distributed Asset Returns
In this section, we derive the distributional properties of the cross-sectional momentum (CSM) returns under the assumption that the underlying asset returns are multivariate Student’s t. We begin by recalling the mathematically precise definition of the CSM return from Kwon and Satchell (2018).
Let
such that
, and for each
denote by
the return on asset
i at time
t. Moreover, let
and for any
, define
Note that any given defines an ordering, , of the components of . Thus, represents the return on a portfolio where the top ranked assets are equally weighted and held long while the bottom assets are equally weighted and held short. The assumption of equal weighting is for notational simplicity only, and not crucial for the general theoretical results. Note also that is defined to allow the ranking of the components of corresponding to to be written succinctly as .
Definition 1. The -cross sectional momentum return
, , is defined bywhere , for any subset , denotes the indicator function on the set A. For intuition behind the definition of
, note that the components of
, representing asset returns over the ranking period, can be arranged in any of the
orderings corresponding to the permutations
. For each such ranking
, the
winner returns over the holding period are
while the
loser returns are
. Equally weighting the returns in the winner and the loser portfolios gives
, and since the ranking of components of
determined by
is equivalent to the condition
, summing over all possible
and prefixing by the matching indicator function gives the expression for
in (
28).
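The equivalence between the indicator-weighted sum over all $n!$ orderings and the usual sort-based construction can be verified directly for small $n$. The sketch below assumes, for illustration, $k$ equally weighted winners held long and $k$ equally weighted losers held short; ties in the ranking-period returns (a probability-zero event for continuous distributions) make every indicator vanish. The function names are hypothetical.

```python
import random
from itertools import permutations
from typing import Sequence


def csm_by_permutations(ranking: Sequence[float], holding: Sequence[float], k: int) -> float:
    """Sum over all n! orderings, each multiplied by the indicator that the
    ordering matches the ranking-period returns (highest first)."""
    n = len(ranking)
    total = 0.0
    for sigma in permutations(range(n)):
        indicator = all(ranking[sigma[j]] > ranking[sigma[j + 1]] for j in range(n - 1))
        if indicator:  # at most one ordering passes when there are no ties
            winners = sum(holding[i] for i in sigma[:k]) / k
            losers = sum(holding[i] for i in sigma[n - k:]) / k
            total += winners - losers
    return total


def csm_by_sorting(ranking: Sequence[float], holding: Sequence[float], k: int) -> float:
    """Direct construction: sort once, long the top k, short the bottom k."""
    n = len(ranking)
    order = sorted(range(n), key=lambda i: ranking[i], reverse=True)
    return (sum(holding[i] for i in order[:k]) - sum(holding[i] for i in order[n - k:])) / k


rng = random.Random(7)
worst = 0.0
for _ in range(200):
    x = [rng.gauss(0.0, 1.0) for _ in range(5)]
    y = [rng.gauss(0.0, 1.0) for _ in range(5)]
    worst = max(worst, abs(csm_by_permutations(x, y, 2) - csm_by_sorting(x, y, 2)))
print(worst)  # only floating-point rounding differences are expected
```

Enumerating all $n!$ orderings is only practical for small $n$, but it mirrors the structure of the definition term by term.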
For the remainder of this paper, we make the following assumption on the distribution of .
Assumption 1. The vector of returns, , is multivariate Student’s t distributed so thatwith , , and , where . Since the
t-distribution is symmetric, there are limitations on the properties of asset returns that can be captured adequately by the above assumption as already discussed in
Section 1. Nevertheless, the assumption is sufficiently general to accommodate linear factor models and econometric models such as vector autoregressive moving average models where the factors and noise terms, respectively, are
t-distributed. Moreover, the framework also allows consideration of more general cases where
and
are conditional means linear in factors without requiring the factors themselves to be multivariate
t. However, since the analysis in later sections will show that momentum returns are nonlinear in the underlying asset returns, the common practice of regressing momentum returns on the various factors must be interpreted as a best linear prediction rather than a conditional expectation in such cases. We now derive the probability density function,
, of the CSM return
.
Theorem 5. Suppose that satisfies Assumption 1. Then, the probability density function, , of the cross sectional momentum return, , is given bywhere ,and we define . Alternatively,where Note that the summands that appear in the pdf of the CSM return in (
30) have the characteristic form of the SUT densities given in (
21) other than for the omission of the normalization factor
2 that appears in the denominator of (
21). It follows that the pdf of the CSM return is a weighted sum of SUT densities. The next result gives the pdf in the special case where
and
are independent, which can be considered as the case where the market is efficient.
Corollary 3. If satisfies Assumption 1, and and are independent, thenwhere Proof. Follows from Theorem 5 since in this case. □
We next derive the expressions for the non-central moments of the CSM returns. Since t distributions do not have moments of all orders as noted in Theorem 4, the moments of CSM returns will also only exist up to a certain order.
Theorem 6. Suppose satisfies Assumption 1, and let such that . Then, the m-th non-central moment of is given by (38), where and are as defined in (34) and (35), respectively.
4. Special Case of Two Assets
In this section, we examine in detail the special case of two assets, and begin by computing the partial moments of one-dimensional Student’s
t distributions that will be required. To reduce notational burden, we define for
,
, and
so that from (
13) we have
Lemma 1. Let , and . Then, for , we have
The next theorem will play a key role in the derivation of the non-central moments of the CSM returns.
Theorem 7. For any , , , and such that , let Then, , , and for . More explicitly, if m is even and if m is odd
Proof. Firstly, the expression for
follows directly from the definition of
, and, for
, we obtain on setting
in Lemma 1 that
Next, for (
43), using Lemma 1 and applying integration by parts gives
which is (
43). The explicit expressions for the odd and even cases follow by induction. □
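The two-asset analysis relies on partial moments of the univariate Student’s t, and when implementing closed-form expressions such as those above it helps to have an independent numerical check. The sketch below approximates an upper partial moment $\mathbb{E}[X^m \mathbb{1}_{\{X > a\}}]$ by quadrature. It is our own illustration rather than the method of this paper: the substitution, function names, and step count are arbitrary choices, and it assumes $m < \nu - 1$ so that the transformed integrand vanishes at the endpoint (the partial moment itself exists for all $m < \nu$).

```python
import math


def t_density(x: float, nu: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Univariate Student's t density with location mu and scale sigma."""
    z = (x - mu) / sigma
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) / sigma * (1 + z * z / nu) ** (-(nu + 1) / 2)


def upper_partial_moment(m: int, a: float, nu: float, mu: float = 0.0,
                         sigma: float = 1.0, steps: int = 20_000) -> float:
    """Approximate E[X^m 1{X > a}] by composite Simpson's rule after the
    substitution x = a + u / (1 - u), which maps [0, 1) onto [a, infinity)."""
    if m >= nu - 1:
        raise ValueError("this simple quadrature assumes m < nu - 1")
    if steps % 2:
        steps += 1
    total = 0.0
    for j in range(steps + 1):
        u = j / steps
        if u == 1.0:
            value = 0.0  # the transformed integrand tends to 0 when m < nu - 1
        else:
            x = a + u / (1.0 - u)
            value = x ** m * t_density(x, nu, mu, sigma) / (1.0 - u) ** 2
        weight = 1 if j in (0, steps) else (4 if j % 2 else 2)
        total += weight * value
    return total / (3.0 * steps)


print(upper_partial_moment(0, 0.0, 5.0))  # P(X > 0) = 0.5 for a centred t
print(upper_partial_moment(1, 0.0, 5.0))
```

For a centred t with $a = 0$ and $m = 1$, the exact value is $\sqrt{\nu}\,\Gamma\left(\frac{\nu - 1}{2}\right) / \left(2\sqrt{\pi}\,\Gamma\left(\frac{\nu}{2}\right)\right)$, half the mean absolute value, which gives a convenient benchmark.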
As will be seen, the quantities that play a key role in the two-asset case are the spreads,
and
, and so we define
where
and
. Note that
is the variance of the spread
, and
is the correlation between
and
. Next, we compute the terms
and
that appear in the expression (
33) for the pdf of the CSM return. For
and
, we have
and so
If we define the sign of permutations in
by
and
, then the expressions for
and
can be written succinctly as
The next lemma will provide the building blocks for the non-central moments of CSM returns.
Lemma 2. Let and be as defined in (46) and (47) respectively, where . Then, for such that , we havewhere is as defined in (42). Proof. Follows from using the binomial formula to expand the powers of and , and applying the definition of . □
For notational convenience, given any
,
and
such that
, we define
and note that
can be computed explicitly using (
48). We now present the non-central moments of the CSM return
.
Theorem 8. Let , and . Then, for such that , the m-th non-central moment, , of is given as follows: where is as defined in (49). In particular, the first four non-central moments are
Proof. Follows from the general expression (
38) for the moments of
and the definition of
. □
We remark that the non-central moments of
given in Theorem 8 are sums indexed by
which consists of two elements, and that each term appearing in these sums can be computed recursively using (
43), (
48), and (
49). Since the right-hand side of (
48) consists of a finite number of terms and (
43) is equivalent to the explicit expressions (
44) or (
45) depending on the index
m, these moments of
can be computed without having to make any simplifying approximations. For example, the first moment is given explicitly by
which is reassuring since it has the same functional form as the corresponding expression obtained for the normally distributed asset return case in Kwon and Satchell (2018), except that the distribution functions are Student’s t rather than normal. The numerical calculations in this paper were performed using code written in C++ that relied on the Boost library to compute the required functions.
Returning briefly to the linear factor structure discussed in Section 1, we could consider the conditional means of the asset returns to be linear combinations of factors, which in a Carhart (1997) model context would consist of size, market, value, and momentum. Thus, if we were to go long asset 1 and short asset 2 in our CSM portfolio, we might expect a larger exposure to the momentum factor for asset 1 and a smaller exposure for asset 2. We could carry out a more detailed analysis to accommodate these features, but leave this for future research.
If we denote by the mean, the variance, the skewness, and the excess kurtosis of the CSM return, then these quantities are easily computed from the non-central moments given in Theorem 8. It should be noted that the quantities corresponding to the odd moments, viz. and are approximately odd as functions of , and those associated with the even moments, viz. and , are approximately even as functions of . This is because the return from a portfolio formed by taking a long position in the loser and a short position in the winner when would have the same distributional properties as the return from taking the opposite positions when .
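The conversion from the non-central moments to the mean, standard deviation, skewness, and excess kurtosis uses the standard identities for central moments. A minimal sketch follows; the function name is hypothetical, and its inputs would be the first four non-central moments of the CSM return, of which the fourth is finite only when the degrees of freedom exceed 4.

```python
import math


def summary_from_raw_moments(m1: float, m2: float, m3: float, m4: float):
    """Mean, standard deviation, skewness and excess kurtosis from the
    first four non-central moments E[X], E[X^2], E[X^3], E[X^4]."""
    var = m2 - m1 ** 2
    if var <= 0:
        raise ValueError("the implied variance must be positive")
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                     # third central moment
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    return m1, math.sqrt(var), c3 / var ** 1.5, c4 / var ** 2 - 3.0


# Standard t_5: E[X^2] = 5/3 and E[X^4] = 25, so the excess kurtosis is 6.
print(summary_from_raw_moments(0.0, 5.0 / 3.0, 0.0, 25.0))
```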
In the analysis that follows, we assume that the asset returns are stationary in order to reduce the number of parameters. Moreover, we have set , , and , where the asset variances have been scaled by a factor dependent on to ensure that they are independent of the number of degrees of freedom. It should be noted that the cross-sectional correlation, , then determines the variance, , of the spread, .
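A Monte Carlo simulation offers an independent check on the analytical moments and yields quantiles directly. The sketch below rests on simplifying assumptions that are ours rather than the exact parameterization used for the figures: with two assets, one winner and one loser, the CSM return equals the holding-period spread multiplied by the sign of the ranking-period spread, and the two spreads are drawn as a bivariate Student’s t with zero location, a single common $Y$ with $\nu$ degrees of freedom, correlation parameter `rho` (playing the role of the spread autocorrelation), and scales rescaled to unit variance (requiring $\nu > 2$). The function names, seed, and parameter values are illustrative.

```python
import math
import random


def simulate_two_asset_csm(rho: float, nu: float, n_draws: int = 100_000, seed: int = 0):
    """Draws of sign(S_rank) * S_hold, the two-asset CSM return when the asset
    with the higher ranking-period return is held long and the other short."""
    if nu <= 2 or not -1.0 < rho < 1.0:
        raise ValueError("need nu > 2 and -1 < rho < 1")
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Common factor 1/sqrt(Y) shared by both spreads, rescaled by sqrt((nu-2)/nu)
        # so that each spread has unit variance.
        vol = math.sqrt((nu - 2.0) / rng.gammavariate(nu / 2.0, 2.0))
        s_rank, s_hold = z1 * vol, z2 * vol
        draws.append(s_hold if s_rank > 0 else -s_hold)
    return draws


def sample_stats(xs):
    """Sample mean, standard deviation, skewness and excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    c2 = sum((x - mean) ** 2 for x in xs) / n
    c3 = sum((x - mean) ** 3 for x in xs) / n
    c4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, math.sqrt(c2), c3 / c2 ** 1.5, c4 / c2 ** 2 - 3.0


def quantile(xs, p):
    """Empirical p-quantile, interpolating linearly between order statistics."""
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


for rho in (-0.2, 0.0, 0.2):
    draws = simulate_two_asset_csm(rho, nu=6.0)
    mean, sd, skew, ex_kurt = sample_stats(draws)
    print(f"rho={rho:+.1f} mean={mean:.3f} sd={sd:.3f} skew={skew:.3f} "
          f"ex_kurt={ex_kurt:.3f} q01={quantile(draws, 0.01):.3f}")
```

Sample kurtosis converges slowly for small $\nu$, because its sampling variance depends on the eighth moment, which is infinite for $\nu \le 8$; the analytical moments are preferable there.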
The mean of the CSM return is shown in
Figure 1 as a function of the degrees of freedom
, and the spread autocorrelation,
, for
,
, and
. As expected,
is an increasing function of
. For
, the mean decreases slightly with
, while the opposite is the case when
. In the region
, the mean is slightly positive. Since this is the region corresponding to small autocorrelations in the underlying asset returns, which is the situation most commonly observed in practice, it is reassuring that the small positive CSM returns implied by the model are consistent with the findings reported in the empirical literature. Moreover, we see that the degrees of freedom parameter, $\nu$
, that controls the kurtosis of asset returns has very little impact on
. Interpreting small $\nu$ as representative of assets from emerging markets, we find this to be consistent with the findings of Rouwenhorst (1999) and Bekaert et al. (1997) that, although there is evidence of momentum in emerging markets, it is not significantly different from that observed in developed markets, despite the assets from the respective markets having different distributional properties. The surface flattens out as the cross-sectional correlation increases from
to
. Since an increase in
corresponds to a decrease in the variance,
, of the spread
, this behavior is consistent with the positive relationship usually associated with risk and return in finance.
The standard deviation of the CSM return as a function of
and
is shown in
Figure 2. Although not clearly evident from the figure,
is a decreasing function of
, and for a fixed value of
the standard deviation is convex in
and takes the maximum value at
. Finally, since the variance of the spread decreases as
increases,
likewise decreases with increasing
.
The skewness of the CSM return in
Figure 3 shows that
is negative when
and positive otherwise. In fact, although it is not clearly evident from the figure,
is negative even for small positive values of
. As discussed above, the autocorrelations in the asset returns tend to be small in practice, and hence
will also be small. The corresponding model-implied skewness in the CSM return will then be slightly negative, which is consistent with the observations in the empirical literature.
In contrast to the surfaces for other quantities that flatten to a large extent as
increases from
to
, the surface for
in
Figure 4 remains relatively unchanged. The excess kurtosis of CSM returns is generally positive, in line with the findings reported in the literature, and increases significantly when
is small and
is high. It should be noted that
is largest when
is small. Since the deviation of the Student’s
t distribution from the normal is greatest when
is small, it follows that the extension considered in this paper will be useful in situations where the observed kurtosis in the CSM returns is higher than the value implied under the assumption of normal asset returns. This would be the case, for example, when considering emerging markets.