1. Introduction
Financial crises, such as the Great Financial Crisis of 2007–2009 and the COVID-19 Crisis of 2020–2021, lead to high volatility in financial markets and highlight the importance of the debate on the Efficient Markets Hypothesis, a corollary of which is that in an efficient market it should not be possible to systematically make excess returns. Because of this, during the past few decades, it has become ever more popular to consider financial markets from other than a neoclassical rational expectations point of view. The latter considers financial markets to be in continuous equilibrium with informationally efficient prices. However, empiricists have questioned the validity of this model, pointing to evidence of inefficiencies and anomalies (for recent reviews of the literature on efficiency and anomalies see
Titan 2015;
Woo et al. 2020;
Jacobs and Müller 2020).
Pesaran (
1989) notes that the idea of a rational expectations equilibrium (REE) involves much more than the familiar concept of the equilibrium of demand and supply. A REE can be characterized by three main features: (1) all markets clear at equilibrium prices, (2) every agent knows the relationship between equilibrium prices and private information of all other agents, and (3) the information contained in equilibrium prices is fully exploited by all agents in making inferences about the private information of others. Thus, in a REE prices perform a dual role—apart from clearing the markets they also reveal to every agent the private information of all other agents. In effect, the concept of the REE requires that everybody knows (in a probabilistic sense) everything about the way the market economy functions. However, as
Von Hayek (
1937) puts it: “The statement that, if people know everything, they are in equilibrium is true simply because that is how we define equilibrium. The assumption of a perfect market in that sense is just another way of saying that equilibrium exists, but does not get us any nearer an explanation of when and how such a state will come about. It is clear that if we want to make the assertion that under certain conditions people will approach that state we must explain by what process they will acquire the necessary knowledge.”
The preceding implies that, for the REE to have any operational meaning, the processes by which people learn from experience and acquire the common knowledge necessary for the achievement of the REE must be specified fully and explicitly.
In this sense, one might expect that, as stock-market trading is increasingly dominated by sophisticated professionals as opposed to individual investors, markets are becoming more efficient. As pointed out by
Stein (
2009), however, crowding and leverage are two factors which limit this tendency to efficiency and therefore to an associated REE. Furthermore,
Rösch et al. (
2017) observe that the mainstream finance literature does not allow for market efficiency to vary through time. Market efficiency is governed by arbitrage activity and market-making capacity, both of which facilitate price convergence on efficient market benchmarks. In turn, the efficacy of arbitrage and market making is influenced by financial frictions (such as limited capital, transaction costs, short-sale constraints, and idiosyncratic volatility) whose severity varies considerably over time.
The Efficient Markets Hypothesis (EMH) has been traditionally linked to the idea that security prices fully reflect the available information. As
Shleifer (
2000) points out, this implies that when news about the value of a security hits the market, its price should react and incorporate this news both quickly and correctly. The “quickly” part means that those who receive the news late should not be able to profit from this information, while “correctly” means that the price adjustment in response to the news should be accurate on average, i.e., that prices should neither underreact nor overreact to particular news announcements.
Shiller (
2003) notes that we have to distance ourselves from the presumption that financial markets always work well and that price changes always reflect genuine information. According to Shiller, evidence from behavioral finance helps us to understand, for example, that the worldwide stock market boom in the late 1990s, and then crash after 2000, had its origins in human foibles and arbitrary feedback relations and must have generated a real and substantial misallocation of resources.
Even more fundamentally, the confirmation or refutation of the EMH has been highly controversial, due to the existence of the joint-hypothesis problem, formulated by Fama (1970, 1991) in his seminal overview papers on efficient capital markets, wherein efficiency can be determined only within the context of a particular asset-pricing model.
A common corollary of the EMH, which some have taken to be its definition, is that in an efficient market it is impossible to systematically make excess, above-average returns without accepting above-average risk (for an overview see, for example,
Malkiel 2003), or that excess returns are unpredictable. However,
Pesaran and Timmermann (
1995) note that predictability of excess returns does not imply stock market inefficiency, and can be interpreted only in conjunction with, and in relation to, an intertemporal equilibrium model of the economy. Inevitably, all theoretical attempts at interpretation of excess return predictability will be model-dependent, and hence inconclusive.
Fama (
1991) states that it is only possible to test whether information is properly reflected in prices in the context of a pricing model that defines the meaning of “properly”. As a result, when anomalous evidence is found on the behavior of returns, the way it should be split between market inefficiency or a bad model of market equilibrium is ambiguous. Furthermore, as
Balvers et al. (
1990) have pointed out, it is possible to formulate an equilibrium model that leads to predictable returns.
As
Fama (
1991) argues, it is a disappointing fact that, because of the joint-hypothesis problem, precise inferences about the degree of market efficiency are likely to remain impossible. However, one way to avoid the pitfalls of the joint-hypothesis problem is to take a completely empirical approach, defining inefficiency with respect to some measure that does not depend on the existence of an underlying model, such as an asset-pricing model. Thus, one evaluates the economic significance of stock market predictability by seeing whether the associated information could have been exploited successfully in investment strategies, leading to systematic excess returns. Of course, this raises the question of how we define “systematic” and “excess”. Excess relative to what? In the literature, it is common to measure excess relative to some “fixed” benchmark, such as the risk-free interest rate, or to an index portfolio (the logical extreme of that being the market portfolio). One of the chief drawbacks of such measures is that they permit the possibility that uninformed traders acquire excess profits, even when the market is efficient. For instance, noise traders who, on average, do not change their portfolio holdings may have excess profits relative to a fixed benchmark simply because, by chance, the price rose. Thus, as emphasized by
Bagehot (
1971), it is important to distinguish between trading gains and market gains.
The contribution of this paper is threefold. First, in order to eliminate the defects of a fixed benchmark, we propose as benchmark in
Section 2 a “moving target”, where excess profit during timestep
t is related to the increase in the market value of an active trading portfolio in timestep
t, relative to the increase in the market value of a buy and hold portfolio in the same timestep. In this way an excess profit or loss for a given trader over the timestep
t can only arise when there has been a net change in the trader’s portfolio holdings in the asset and a net change in the asset’s price. This choice of benchmark always refers the market dynamics to a zero sum game. Second, in
Section 3 we develop a new measure for the relative inefficiency between two trading strategies, or trader groups, evolving from time
t to a later time. The relative inefficiency will be defined as the excess returns of trading strategy
i relative to trading strategy
j, divided by the relevant standard error. Third, in
Section 4 we propose an Inefficiency Matrix, which aims at giving a complete description of the relative inefficiencies that exist in an entire market, thus enabling us to determine the degree of market (in)efficiency. In
Section 5, we illustrate the use of the Inefficiency Matrix as a diagnostic by applying it to a pair of model markets—an agent-based market and an experimental market.
Section 6 contains a discussion of the application of the Inefficiency Matrix to commercial exchanges, such as stock markets, while
Section 7 concludes the paper.
2. When Are Returns “Excessive” or “Abnormal”?
Canonically, “excess” returns for securities have been defined with respect to exogenous benchmarks such as a risk-free interest rate, while “abnormal” returns have been defined with respect to a benchmark portfolio, the logical extreme being the market portfolio itself. Both have defects. The main defect of defining excess via the risk-free interest rate is that it does not distinguish trading returns from market returns, i.e., what was earned through a general increase in the market versus what was earned through a particular active trading strategy that tries to beat the market. On the other hand, judging a security’s return as being “abnormal” requires that it be benchmarked against an asset pricing model (Schwert 2003), such as the Capital Asset Pricing Model (Sharpe 1964), or some other empirical model, such as the Fama–French three- or four-factor models (Fama and French 1996), in order to determine what is “normal”. The latter type of empirical approach usually takes a firm-specific model for specifying what is considered to be a normal return.
Although in accounting for short-horizon abnormal returns, where returns are close to zero anyway, one might expect the “bad model” problem, wherein the security is mispriced, to be relatively unimportant, this is certainly not the case for longer-horizon returns (Fama 1998). This is related to another problem: how often the benchmark should be updated. The update frequency is usually tied to the period used for calculating excess or abnormal returns (monthly, yearly, etc.), especially in the case of abnormal returns defined with respect to an asset pricing model, where the standard periods tend to be short, a month or less.
Further, in the case of a risk-free interest rate benchmark, it is possible for traders to make excess profits simply by chance. To understand this, one must remember that a market is unique in that, although it is a realization of an underlying stochastic process, one observes only one particular realization of that process. It may be that the realization over a given interval was such that the asset price rose, whereupon one would find that many agents were apparently making “excess” profits relative to the risk-free benchmark. Just how much profit depends on the “luck of the draw”, i.e., how atypical the particular observed realization was relative to its expected behavior. This problem is exacerbated if one thinks of noise traders with heterogeneous trading horizons, wherein an agent with a long horizon, who randomly makes one single trading decision to buy in a market that goes up by random chance, would be seen to have a potentially large profit relative to the risk-free benchmark. Finally, in commercial markets, there is a general increase in the prices of assets such as stocks that is traditionally viewed as being compensation for the extra risk taken on. However, traditional pricing models have not been able to properly account for this increase, leading to the equity premium puzzle (see, for example, Kocherlakota 1996; Mehra and Prescott 1985). The implication of this is that one can earn excess returns relative to the risk-free interest rate just by doing “nothing”.
So, the question is: how does one arrive at a measure that can completely distinguish between market gains and trading gains and that avoids the pitfalls of the joint-hypothesis problem? Essentially, we are led to ask: what are the “excess” returns associated with a particular trade, or sequence of trades in a market, relative to what one would have earned just by doing nothing, i.e., by not trading? We will denote this as the Excess Trading Return (ETR) (Benink et al. 2004). The magnitude of the returns one would have from the market by maintaining a particular set of portfolio holdings obviously depends on the precise values of those holdings. Hence, in this case, any relevant benchmark will be both specific to a particular agent and potentially dynamic. With these thoughts in mind, one may define the ETR associated with a particular trade by evaluating the return on the active trading portfolio that included that trade against the return on the passive Buy and Hold portfolio wherein the trade was not executed.
For an agent
i, using a trading strategy
, we denote the agent’s portfolio value at time
t by
. Hence, the ETR associated with a single trade,
n, executed at time
and made in the time period
, where
, by an active trading portfolio using a trading strategy,
, relative to the portfolio that uses a Buy and Hold strategy over that time interval, is
where
is the increase in portfolio value between
and
t for agent
i using trading strategy
, while
is the same quantity for the Buy and Hold portfolio. To go further one needs to specify the portfolio in terms of the agent’s assets. Thus, we define
where
is the portfolio value associated with asset
k and we are considering a universe of
K assets.
is the holding that an agent
i using a strategy
would have in the asset
k at time
t, and
is the value of one unit of the asset. For example, for stocks, ignoring dividends,
, the price of the
kth asset. Note that, in principle, any asset type may be included in this definition, not only stocks. One obtains then for the ETR
and hence
where
is the change in holdings of asset
k at
. If one denotes the total number of agents in the market by
then one has the trivial constraint
. Now, it is important to remember that (
5) is also a constrained sum as one may only transfer resources between assets. Thus, one must impose the constraint
, i.e., that a trade
is just a transfer of resources between assets, so that total portfolio value for a given agent must be the same immediately before and after the trade in the absence of trading costs. Taking the trade under consideration to be a transfer between assets
and
then
. Thus,
where
is the change in holdings of asset
due to the trade, and
is the change in the relative value of the asset from the trade
to
t. Note that this relative asset value change is the difference between the value of asset
at time
t and its value at
multiplied by the relative increase in the value of
. Hence, for
, for example, if the value of asset
does not increase proportionately more than that of asset
, then the ETR is negative, i.e., it is only worth trading if the net increase in portfolio value due to the trade is more than that associated with doing nothing. If the holding of asset
is exchanged for an asset,
, whose value does not change, such as cash, then
and, hence,
, i.e., the pure change in the value of the asset
. If the cash is invested at a continuously compounded interest rate
r, then
.
Thus, we see that in order to generate an ETR for a given trade one requires both a change in relative asset value and a change in asset holding. This is the chief distinction with any measure of excess returns defined via static, global benchmarks, where returns can be made in the absence of changes in portfolio holdings. The key difference here is that the ETR tests directly the profitability of a particular trade, i.e., what was the return on that particular trade versus maintaining one’s portfolio constant. This is a crucial point, as it means that, by construction, as desired, the ETR distinguishes completely between market gains and trading gains by explicitly removing the market return that would have accrued if the trade had not taken place.
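The argument above can be condensed into a short derivation. This is a sketch in an assumed notation, which need not coincide with the symbols of the numbered equations: $t_n$ is the time of the trade, $P_k(t)$ is the value of one unit of asset $k$, $\delta n_k$ and $\delta n_{k'}$ are the changes in holdings of the two assets $k$ and $k'$ involved in the trade, and trading costs are ignored.

```latex
\begin{align*}
\mathrm{ETR}(t_n, t)
  &= \delta n_k \,\bigl[P_k(t) - P_k(t_n)\bigr]
   + \delta n_{k'} \,\bigl[P_{k'}(t) - P_{k'}(t_n)\bigr], \\
0 &= \delta n_k \, P_k(t_n) + \delta n_{k'} \, P_{k'}(t_n)
   \qquad \text{(portfolio value unchanged by the trade itself)}, \\
\Rightarrow\quad
\mathrm{ETR}(t_n, t)
  &= \delta n_k \left[ P_k(t) - P_k(t_n)\,\frac{P_{k'}(t)}{P_{k'}(t_n)} \right].
\end{align*}
% Special cases of the last line:
%   k' is cash with constant value:        ETR = \delta n_k [P_k(t) - P_k(t_n)]
%   k' is cash compounding at rate r:      ETR = \delta n_k [P_k(t) - P_k(t_n) e^{r (t - t_n)}]
```

As a numerical illustration under these assumptions, buying 10 shares at a price of 100, paid for in non-interest-bearing cash, and valuing the position when the price is 105 gives an ETR of 10 × (105 − 100) = 50, whereas a portfolio that merely held its shares over the same price rise has an ETR of zero.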
It is clear that the net ETR across the market associated with this trade is zero: for the agent i, who changes holdings in the two assets involved, there is a counterparty j to the trade, using their own strategy, whose holdings in those two assets change by exactly the opposite amounts. Thus, the net ETR for the entire market due to the trade is zero. Trading in this sense, unlike the case of the standard benchmarks, is a zero-sum game.
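The zero-sum property can be checked numerically. The following is a minimal sketch, assuming a frictionless trade between two assets; the function name `single_trade_etr` and all prices are hypothetical and chosen only for illustration.

```python
def single_trade_etr(dn_k, pk_trade, pk_now, pkp_trade, pkp_now):
    """ETR of one self-financing trade of dn_k units of asset k against asset k'.

    Assumption (not from the source): no trading costs, so the value acquired
    in asset k is paid for exactly by the value given up in asset k'.
    """
    # Change in holdings of k' implied by the self-financing constraint.
    dn_kp = -dn_k * pk_trade / pkp_trade
    # Gain on the traded amounts relative to not having traded at all.
    return dn_k * (pk_now - pk_trade) + dn_kp * (pkp_now - pkp_trade)


# Agent i buys 10 units of asset k from agent j and pays in asset k'.
# Asset k moves 100 -> 105, asset k' moves 50 -> 51 (hypothetical prices).
etr_i = single_trade_etr(+10, 100.0, 105.0, 50.0, 51.0)
etr_j = single_trade_etr(-10, 100.0, 105.0, 50.0, 51.0)

# Same purchase financed with cash whose value does not change.
etr_cash = single_trade_etr(+10, 100.0, 105.0, 1.0, 1.0)

print(etr_i, etr_j, etr_i + etr_j, etr_cash)
```

Here agent i earns 30 because asset k rose by 5% while asset k' rose by only 2%; the counterparty's ETR is the exact negative, so the two sum to zero.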
To proceed further, it is necessary to decide how to determine the total ETR for an agent associated with more than a single trade. One could, of course, merely sum
over all the agent
i’s trades,
n, that take place at times
such that
, the total number of trades being taken to be
, which depends on the trading frequency of the agent. Thus, with this definition, for an agent
i, using a trading strategy
, the total ETR is
Although this is an adequate definition for a single trader, it has the defect that, when summed across the market over multiple traders, the net ETR is non-zero. This is because it is being applied to agents with potentially very different trading horizons.
Requiring a measure that corresponds to a zero-sum game when taken across multiple traders, we may consider two different measures of ETR. For the first, we consider the ETR associated with looking ahead to the next trade in the market, i.e., the next “tick” (Benink et al. 2004, 2010). In this case, the ETR is
where
N is the total number of trades in the interval and, now
where we have assumed that the trade is between assets
and
. In distinction to (
7), in (
8) the sum is over all trades in the interval, irrespective of whether or not trader i was a party to the trade. This definition is natural for markets where trading takes place at fixed, regular intervals, such as a call market, where the above refers to returns from the nth auction to the (n+1)th one. Behind this definition of ETR is the notion of “perfect foresight”: an agent is judged on what was done at every single trade that takes place. To illustrate this, if we consider a market with one risky asset and cash, with fixed trade size
V per trade, say, then the maximum ETR possible over a time interval,
to
t is given by
An advantage of having excess profits measured at such high frequency is that statistical inference is enhanced due to the greater sampling. For example, one would expect to be able to better judge the utility of an agent’s proprietary strategy if it has been used to make 1000 trades as opposed to 10. Of course, this presumes that the relevant agents trade frequently. In principle, this definition of ETR could be implemented using real, high-frequency trading data, where it could be used to evaluate the ability of traders to exploit short-term profit-making opportunities. In the case where trading is typically over longer multi-period time horizons, it would not be sensible to use this criterion, for instance, to judge between different fund managers. For example, a manager might increase their portfolio holding in a given stock to 2.6% from 2.1% in January as a result of expecting an appreciation in the price of the stock over a six-month interval. It would be unfair to judge this investment over the first tick after the purchase!
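The tick-by-tick measure can be illustrated with a small sketch for a market with one risky asset and cash. It assumes that each tick contributes the agent's change in risky holdings at that tick multiplied by the price change to the next tick, and that the perfect-foresight bound is the trade size times the sum of absolute price changes; these readings, the function names, and all data are assumptions made for illustration.

```python
def tick_etr(holding_changes, prices):
    """Tick-by-tick ETR for one agent trading a risky asset against cash.

    holding_changes[n] is the agent's change in risky holdings at tick n
    (zero if the agent was not a party to that trade). prices needs one
    more entry than holding_changes, so every tick has a 'next' price.
    """
    if len(prices) != len(holding_changes) + 1:
        raise ValueError("prices must have one more entry than holding_changes")
    return sum(dn * (prices[n + 1] - prices[n])
               for n, dn in enumerate(holding_changes))


def max_tick_etr(trade_size, prices):
    """Perfect-foresight bound: buy before every rise, sell before every fall."""
    return trade_size * sum(abs(prices[n + 1] - prices[n])
                            for n in range(len(prices) - 1))


prices = [100.0, 101.0, 99.0, 102.0]   # hypothetical tick prices
# Hypothetical holding changes; at every tick buys and sells balance.
trades = {"a": [1, 0, -1], "b": [-1, 1, 0], "c": [0, -1, 1]}

etr = {agent: tick_etr(changes, prices) for agent, changes in trades.items()}
print(etr, sum(etr.values()), max_tick_etr(1, prices))
```

Because every purchase is matched by a sale at the same tick, the individual ETRs sum to zero, and no agent exceeds the perfect-foresight bound for a trade size of one share.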
Of course, one can readily generalize (
8) to the case where
and
refer not to trades, i.e., ticks but, rather that
refers to a fixed time interval,
. In the case that trading is carried out only at regular intervals this reduces to (
8) above. When trading occurs continuously, then a fixed time interval would be appropriate for evaluating the performance of traders with trading horizons not too dissimilar to
. So, just as it is inappropriate to consider a fund manager’s performance over a short time interval, so it would be inappropriate to consider the performance of a day trader using a time horizon of a year, i.e., to consider the net change in the portfolio weighting over the year weighted by the net change in price over the year.
In the above, changes in portfolio holdings associated with a trade were always referred to a dynamic reference point—the actual holdings of the agent before the trade. Keeping to the idea that ETR should measure the returns for a particular agent of executing a sequence of trades relative to not having executed those trades, we may also define the ETR associated with a trade at
between assets
and
as
Superficially, this expression looks the same as (
6). However, the crucial difference here is that
refers to the change in portfolio holdings relative to the initial time
. The ETR associated with the sequence of trades is then, in analogy with (
8),
To illustrate the difference between the two, consider an agent trading a specific risky asset and cash. The specific sequence of four trades is taken to be the purchase of one share at each of the first two ticks, a held position on the third tick and the sale of one share at the fourth tick. In this case, for
, one has
whereas
is given by
Thus, for the tick-by-tick measure one considers the change in portfolio holdings from one tick to the next, whereas for the cumulative measure one considers the cumulative change relative to some initial holding. A consequence of this is that the cumulative measure tends to be more volatile, as a large net position leads to potentially large changes in ETR of opposite sign every time there is a price change.
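The four-trade example can be worked through in code. This is a sketch under the reading that the tick-by-tick measure weights each price change by the change in holdings at that tick, while the cumulative measure weights it by the net change in holdings since the initial time; the price path and function names are hypothetical.

```python
def etr_tick(changes, prices):
    """Each price move is weighted by the holdings change at that tick."""
    return sum(dn * (prices[n + 1] - prices[n])
               for n, dn in enumerate(changes))


def etr_cumulative(changes, prices):
    """Each price move is weighted by the net holdings change since the start."""
    total = 0.0
    position = 0  # net change in holdings relative to the initial time
    for n, dn in enumerate(changes):
        position += dn
        total += position * (prices[n + 1] - prices[n])
    return total


# Buy one share at each of the first two ticks, hold, then sell one share.
changes = [1, 1, 0, -1]
prices = [10.0, 11.0, 12.0, 11.0, 12.0]  # hypothetical prices, one per tick plus a final one

print(etr_tick(changes, prices), etr_cumulative(changes, prices))
```

On this path the tick-by-tick ETR is 1, while the cumulative ETR is 2: the held position contributes nothing to the former but exposes the latter to every subsequent price move.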
It should be noted that all three measures of ETR have been tested in the context of both agent-based markets (Benink et al. 2004, 2010) and experimental markets (Stephens et al. 2006). Although it is possible to imagine situations where the different measures can give quite different results, these tend to be rather pathological. Up to now, in applications, all three measures have led to remarkably coherent but complementary results, giving different perspectives as to how market inefficiencies evolve.
Having defined the ETR associated with a single trade, a fixed time interval, or a sequence of trades, one may proceed to define an average ETR per trade and an average ETR per unit time, obtained by dividing the total ETR by the number of trades or by the length of the time interval, respectively.
Furthermore, one may define a total, or average, ETR associated with any relevant subset of traders simply by summing, or averaging, the ETR over the agents in that subset to obtain the average ETR per trader, or the average ETR per agent per trade, in the group. In the former case we will refer to it as the ETR of a “representative” agent. As an illustration, one could consider trying to determine if financial institutions are making excess profits at the expense of the rest of the market, or whether the particular investment decisions of a fund of funds style hedge fund have resulted in excess returns, where in this case the sum will be over the funds in the group.
One could also take the group to be defined by a trading strategy itself, whereupon one would sum over all agents using that strategy. At this juncture it is worth discussing what we mean by “strategy”. This can best be done by appealing to a biological analogy, that between phenotype and genotype, the former being the physical manifestation of the organism whose genetic makeup is encoded in the genotype. In this context the phenotype of a trading strategy used during a given interval of time is the actual pattern of trades that leads to changes in portfolio holdings over that interval. Both the agent’s portfolio holdings and wealth depend uniquely on this sequence. However, there may be many underlying trading strategies that lead to this sequence. This underlying strategy, which will depend on the agent’s information set, utility function, etc., can be thought of as the genotype. In this sense, a strategy label could refer to either the phenotypic or the genotypic strategy.
ETR as a Stochastic Variable
As excess profit is a stochastic variable, there is always a non-zero probability that, over a given time interval, an agent makes a profit just by chance. Hence, it is natural to refer the magnitude of any ETR (“signal”) to the degree of variance (“noise”) in the ETR, measured, for example, in units of the standard deviation. One may consider different statistical ensembles associated with different ways of grouping trades together—across time for a particular trader, or across a set of traders at a particular time, or a combination of the two. Which one is the most natural will depend on the question under investigation. For a given agent,
i, one may consider the volatility associated with a particular sequence of trades, or the volatility per trade, or per unit time. For instance, the variance per trade is
while the variance per trader is
Now, one may construct analogs of the Sharpe ratio (Sharpe 1964) that measure the ETR associated with a particular trader, set of traders or trading strategy. These may be constructed by taking any measure of ETR and dividing by the corresponding volatility measure. For an agent i using a given strategy, we may thus introduce a reward-to-variability ratio: the agent’s ETR divided by the corresponding standard deviation.
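A ratio of this kind can be sketched as follows. The choice of the per-trade ensemble and of the sample standard deviation as the volatility measure are assumptions made for illustration; the function name `reward_to_variability` and the data are hypothetical.

```python
import statistics


def reward_to_variability(per_trade_etr):
    """Average ETR per trade divided by the standard deviation of ETR per trade.

    Assumption (not from the source): the ensemble is the agent's own sequence
    of trades, and volatility is the sample standard deviation over that sequence.
    """
    if len(per_trade_etr) < 2:
        raise ValueError("at least two trades are needed to estimate a variance")
    mean_etr = statistics.mean(per_trade_etr)
    std_etr = statistics.stdev(per_trade_etr)
    if std_etr == 0:
        raise ValueError("zero variance: the ratio is undefined")
    return mean_etr / std_etr


# Hypothetical per-trade ETRs for one agent.
ratio = reward_to_variability([1.0, 2.0, 3.0])
print(ratio)
```

For this sample the mean is 2 and the sample standard deviation is 1, so the ratio is 2, i.e., the average ETR per trade is two standard deviations away from zero.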
4. A New Measure of Market Inefficiency
With a definition of the relative inefficiency between agents or agent groups in hand, one may consequently attack the question of inefficiency in the market as a whole. To this end, to consider the entire market, the division into agent groups should be a partition, i.e., it should cover all agents, and any agent should appear in one and only one group—the finest partition being one where every agent is considered separately. Thus, we characterize the inefficiency of a market by making the following definition:
Definition 2. The Inefficiency Matrix for a market m evolving over a given time interval is the matrix whose elements are the relative inefficiencies given by Equation (22).
We take the Inefficiency Matrix to give a complete description of the relative inefficiencies that exist in a market. Note that this definition of inefficiency, with the measures of ETR we have proposed, is totally endogenous, making no reference whatsoever to any external benchmark. Naturally, the Inefficiency Matrix may also be defined by defining its matrix elements with respect to some exogenous benchmark, such as the risk-free interest rate. Such a fixed, universal benchmark would clearly cancel from each matrix element, being the same for both i and j.
Note that the Inefficiency Matrix is antisymmetric: if agent group i is making profits relative to trader group j, then agent group j is making losses of exactly the same magnitude relative to trader group i. Furthermore, of course, a trader cannot make profits from themselves, so the diagonal elements vanish. A market will be defined as inefficient over the interval if the magnitude of any matrix element exceeds n over the interval, where n is chosen according to the degree of confidence one requires.
At the most fine-grained level, for a given definition of ETR, the Inefficiency Matrix is unique when calculated in terms of individual agents. However, in this case the dimensionality of the matrix is that of the number of agents. Any reduction in the dimension of the matrix via an aggregation or coarse graining to agent groups will not be unique. In this case a market may be observed to be inefficient with respect to one market partition but not with respect to another. This is not a defect of the definition but rather a caution about how to aggregate. For instance, imagine a market where there are two similar informed traders making profits relative to a group of two noise traders. The market may be divided up into two groups, 1 and 2, in several different ways. If group 1 corresponds to the two informed traders, and group 2 to the noise traders, then the matrix element between the two groups is non-zero, with a magnitude that depends on the informational advantage of the informed relative to the uninformed. On the other hand, if both groups consist of one informed and one noise trader, then the matrix element between them vanishes.
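The dependence on the partition can be made concrete with a small sketch. It assumes, in line with the verbal description of relative inefficiency as excess returns of one group relative to another divided by the relevant standard error, that a matrix element is the difference of mean per-trade ETRs divided by a combined standard error; this specific form, the function names, and the data are assumptions, not the definition in Equation (22).

```python
import math
import statistics


def inefficiency_matrix(groups):
    """Pairwise relative inefficiency between groups of per-trade ETR samples.

    Assumed element: (mean_i - mean_j) / sqrt(se_i^2 + se_j^2). This is an
    illustrative stand-in and is antisymmetric with a zero diagonal.
    """
    names = list(groups)
    mean = {g: statistics.mean(groups[g]) for g in names}
    se2 = {g: statistics.variance(groups[g]) / len(groups[g]) for g in names}
    matrix = {}
    for i in names:
        for j in names:
            if i == j:
                matrix[(i, j)] = 0.0  # a group cannot profit from itself
                continue
            matrix[(i, j)] = (mean[i] - mean[j]) / math.sqrt(se2[i] + se2[j])
    return matrix


def is_inefficient(matrix, n=3.0):
    """Flag the market if any element exceeds n (in standard errors) in magnitude."""
    return any(abs(value) > n for value in matrix.values())


# Hypothetical per-trade ETRs: two similar informed traders, two noise traders.
informed_1 = [2.0, 1.0, 3.0, 2.0]
informed_2 = [1.0, 3.0, 2.0, 2.0]
noise_1 = [-2.0, -1.0, -3.0, -2.0]
noise_2 = [-1.0, -3.0, -2.0, -2.0]

by_type = inefficiency_matrix({"informed": informed_1 + informed_2,
                               "noise": noise_1 + noise_2})
mixed = inefficiency_matrix({"group_1": informed_1 + noise_1,
                             "group_2": informed_2 + noise_2})

print(by_type[("informed", "noise")], is_inefficient(by_type))
print(mixed[("group_1", "group_2")], is_inefficient(mixed))
```

With the informed and noise traders separated, the off-diagonal element is large and the market is flagged as inefficient; with one informed and one noise trader in each group, the element is zero, even though the underlying trades are identical.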
An associated single inefficiency measure for the whole market is
where the trace is over all strategies or agent groups and the normalization factor
, where
N is the number of strategies or agent groups in the market. With this single market measure we could in principle also consider the relative inefficiency of one market versus another.
It is important to emphasize that in a real-world commercial market the question of whether a market is efficient or not is really an empirical one, as we do not have a valid underlying theory that can demonstrably prove a market to be efficient or not. Moreover, it is one that can only be answered statistically, given that the evolution of a market is stochastic. In that sense the empirical question boils down to one of: Can one infer that a financial market is efficient from a set of data?
6. Applications to Real-World Commercial Stock Markets
Using two types of financial market—artificial and experimental—we have shown how our inefficiency measures can be used to not only measure the degree of inefficiency in a market, and how it evolves over time, but also to understand how and why it arises. An obvious question then arises: why not just apply it to data from commercial stock markets such as the LSE or Euronext? Our inefficiency measures can clearly be applied to such stock markets. The question is more: what data is required in order to apply them?
The data challenges are the following: firstly, we require trade data that is labelled, i.e., that we can unambiguously label the counterparties in any trade. However, generic market data is anonymous. Some datasets have broker identifiers, but very few have the associated buy-side counterparties identified. The labelling is important in that, at the most fine-grained level, we wish to determine if a particular market participant, or group, is making excess profits or, indeed, excess losses. We thus need an identifier. Moreover, we require that all market participants have such an identifier.
If we have a labelled set of trades then the inefficiency matrix can be constructed on an agent-by-agent basis. However, the statistical significance of each element depends on how many trades were done by the corresponding agent. Although an agent might have made a large excess profit, if this was the result of only a few trades then it might not be distinguishable from luck (Stephens et al. 2009).
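The role of the number of trades can be illustrated with a simple calculation. The sketch below assumes independent trades, so that the standard error of the mean ETR shrinks with the square root of the number of trades; the function name `etr_t_statistic` and the numbers are hypothetical.

```python
import math


def etr_t_statistic(mean_etr, std_etr, n_trades):
    """Mean ETR per trade in units of its standard error.

    Assumption (not from the source): trades are independent, so the standard
    error of the mean is std_etr / sqrt(n_trades).
    """
    if n_trades < 2 or std_etr <= 0:
        raise ValueError("need at least two trades and a positive standard deviation")
    return mean_etr / (std_etr / math.sqrt(n_trades))


# Same average profit and volatility per trade, very different sample sizes.
few = etr_t_statistic(0.5, 2.0, 16)
many = etr_t_statistic(0.5, 2.0, 1600)
print(few, many)
```

With 16 trades the average profit is only one standard error from zero and cannot be distinguished from luck, whereas with 1600 trades the same average is ten standard errors from zero.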
By grouping agents we may obtain better statistics. However, as emphasized, the degree of inefficiency is dependent on how agents are grouped. As we have shown, a natural grouping is that of choosing those agents that are the most/least successful. However, in order to eliminate pure selection bias as the cause of the inefficiency, we must consider analysing an out-of-sample set of data.
There are many interesting labels by which trades could be grouped—trades by mutual funds, funds of a given type, institutional trades in general etc. In all cases however, we require that every trade is labeled by a buyer and a seller. For instance, one might imagine using publicly available mutual fund data to try to identify inefficiencies. However, although we may note the change in holdings of a fund from one period of time to another, we do not know how the portfolios of the counterparties changed. Additionally, mutual funds do not form a partition, i.e., there are trades done where the counterparty is not another mutual fund.
In summary: our empirical characterization of inefficiency does not suffer from any theoretical barrier, as is manifest with information based approaches, which are limited by the joint-hypothesis problem. As our artificial and experimental market examples show, any difficulty is purely practical, in that the application of our formalism requires a data set with particular characteristics. Moreover, it is not that such data does not currently exist but, rather, that such data is highly confidential and therefore very difficult to obtain. Finally, as our examples show, our inefficiency measures are not just “black boxes” that simply determine if there is a market inefficiency or not. By studying the characteristics of the groups that exhibit inefficiencies we may begin to infer why such inefficiencies exist in the first place and therefore potentially allow for the design of more efficient markets.