1. Introduction
Futures contracts have long been used in finance to harness the wisdom of the crowd and make predictions about the future value of an asset by exploiting people’s aggregate expectations. Prediction markets are a more recent form of futures market and, although they were originally meant only to forecast the outcomes of important political events, they are nowadays used in several different contexts. For instance, alongside public markets that allow betting on political or sporting events, private prediction markets, used by companies such as Google, Intel, and General Electric, gather people’s beliefs about business activities, such as sales forecasts or the likelihood of a team meeting certain performance goals [1,2]. Although prediction markets are often heralded as effective mechanisms for making highly accurate predictions [3,4], it has been shown that they are prone to bias [5] and manipulation [6], and that these phenomena can spread to financial markets with dreadful consequences [6,7].
To better understand prediction markets and, consequently, account for these adverse events, thereby limiting their impact outside the prediction market itself, models are needed that realistically reproduce the underlying processes driving price and opinion formation. This can be achieved by building adequately complex frameworks that can be validated against real-world data; that is, models that are simple enough to be understood and controlled, but complex enough to allow realistic dynamics to emerge from simple interactions between agents. Despite the large number of existing models of prediction markets [5,8,9,10], there is limited work focussed on modelling prediction market exchange platforms, arguably the most important type of prediction market.
To address this gap, in this paper, we propose a model that matches the empirical properties (often referred to as stylised facts) of historical price and volume time series of political prediction markets. To achieve this, we consider a social network where agents possess an opinion about the probability of a given event occurring, and either buy or sell contracts on a prediction market exchange based on their opinions. To model the opinion dynamics process, we use the Deffuant model [11]. Opinion dynamics have long been a subject to which numerous physicists have applied statistical physics tools to gain a better understanding of human interactions and the complex phenomena that emerge from them (see, for example, [11,12,13,14,15] for seminal opinion dynamics models and [16] for a thorough review of the topic). Research has also been conducted using opinion dynamics models with binary opinions to model agents in the stock market [17,18,19,20]. Among the many models proposed to describe opinion diffusion in social networks, we follow the Deffuant model, which has three features that make it especially suitable to represent the underlying opinion propagation process that determines prices in prediction markets. First, the Deffuant model considers continuous opinions bounded by arbitrary values. This makes it well suited to describing the diffusion of opinions about the probability of an event occurring, as the opinion can be bounded between 0 and 1 and take any value in this range. Second, since it has only two free parameters, the Deffuant model has the merit of being extremely simple, which allows us to gain deeper insights into what drives price properties in prediction markets. This also guarantees a good degree of realism without having to make assumptions about other parameters in the model, which would otherwise have to be fine-tuned, a common (and often necessary) practice for models of financial markets [21,22]. Third, similar to other bounded confidence models of opinion diffusion, in the Deffuant model only people with similar opinions update their beliefs after interacting, which allows for the possibility of not reaching a consensus at equilibrium. This property enables us to analyse the similarity of our results to historical data depending on the number of opinion clusters that coexist at equilibrium.
This paper makes three important contributions. First, we introduce a model of prediction markets that uses opinion dynamics in social networks as its underlying mechanism. This model has the merit of being particularly simple, since it possesses only two free parameters, while at the same time being highly robust: it is capable of generating price time series that match the stylised facts of historical data for any combination of the two free parameters. Second, we show that our model reproduces historical data best when opinions are heterogeneous but revolve around a single opinion cluster. This suggests that participants in prediction markets tend to have a similar, yet not identical, view on the outcome, especially in markets that have a limited duration. Third, our results support the presence of Deffuant-like opinion dynamics. This contribution helps tackle an important challenge posed by Sobkowicz, who argued that opinion dynamics models are often disconnected from the real world and lack empirical validation [23]. Since his paper, the explosion in popularity that social media has experienced has certainly offered a compelling means of validation [24,25]. However, the availability and popularity of such vast datasets has resulted in most opinion dynamics models being validated only on social media data, potentially introducing a bias. In this paper, we use market data to provide evidence that continuous opinion diffusion models, and specifically the Deffuant model, are compatible with the exchange of opinions in prediction markets. To achieve this, we use data from PredictIt, a political prediction market exchange platform, and show that the Deffuant model provides an excellent representation of the underlying opinion diffusion process in social networks when opinions are scalar.
The remainder of the paper is organised as follows. In Section 2, we define the model of opinion formation and market exchange. In Section 3, we explain the experimental settings that we used to run agent-based simulations of our model and discuss our findings, showing that our model provides a qualitatively good description of historical price and volume time series even in the worst-case scenario. Finally, we conclude with a short discussion and outline future work in Section 4.
2. Model
In this section, we describe the price formation model and argue why it is appropriate to represent prediction markets. We start by describing how prediction markets work; we define normalised prices and true probabilities, and then describe the model dynamics in detail.
Prediction markets are time-limited markets in which contracts are traded on the outcomes of a given event $E_i$, where $i$ denotes the $i$-th event, and the outcome $\omega_i$ can take only two values: $\omega_i = 1$ if $E_i$ occurs, and $\omega_i = 0$ otherwise. Markets for which multiple options are available (e.g., Who will win the presidential elections?) can be seen as markets in which there are $N$ events, where $N$ is the number of candidates running for president. The payoff of a contract on the $i$-th event is 1 if the corresponding event occurs, and 0 otherwise. Let us denote with $\pi_i$ the price of the contract on the event $E_i$. In prediction markets, $\sum_i \pi_i = 1 + \theta$, where $\theta$ is the turnaround (e.g., spread, bookmaker fees, etc.); thus, for our analysis, we consider normalised prices $p_i = \pi_i / \sum_j \pi_j$, which provide a better representation of the corresponding realisation probabilities. Also, if a prediction market is completely efficient, the normalised price of a security reflects exactly the probability of the corresponding outcome occurring, i.e., $p_i = p_i^*$, where $p_i^*$ is referred to as the true probability.
To model the opinion diffusion process, we follow the Deffuant model [11]. We start by considering a population of $N$ agents, who belong to an undirected, unweighted social network $G$. Without loss of generality, we assume that there is only one event with two possible outcomes, $\omega = 1$ and $\omega = 0$. Then, agent $j$ possesses an opinion $x_j$, which can take any real value between 0 and 1, and corresponds to the subjective probability that the agent attaches to the outcome $\omega = 1$. The opinion update process is iterative. At each time step, agents may discuss the event with their neighbors and update their opinions. To model this process, in every round, agent $i$ is randomly chosen from the network to discuss with agent $j$, who is, in turn, randomly chosen among agent $i$’s neighbors. If their opinions are too different, they refuse to update their beliefs. More precisely, the agent pair $(i, j)$ interacts only if $|x_i - x_j| < \epsilon$, where $\epsilon$ is the threshold of this process and can take any real value between 0 and 1. If the agent pair interacts, they update their opinions as follows:

$x_i \leftarrow x_i + \mu\,(x_j - x_i), \qquad x_j \leftarrow x_j + \mu\,(x_i - x_j),$

where $\mu \in (0, 0.5]$ is the convergence parameter.
In this model, $\epsilon$ represents the open-mindedness of agents, who will discuss with, or listen to, other agents only if their opinions are sufficiently close. In this paper, we follow the basic Deffuant model and consider $\epsilon$ constant among all agents, but there exist other versions of this model in which agents have heterogeneous open-mindedness [26,27]. In general, $\epsilon$ affects the number of clusters at equilibrium (i.e., the number of opinions that coexist), while $\mu$ drives the convergence time [11].
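The opinion dynamics just described can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the agent count, the iteration budget, and the gap-based cluster-counting rule are our own choices.

```python
import random

def deffuant_step(x, eps, mu, rng):
    """One interaction: a randomly chosen pair updates only if their
    opinions differ by less than the threshold eps."""
    i, j = rng.sample(range(len(x)), 2)  # complete graph: any pair may meet
    if abs(x[i] - x[j]) < eps:
        xi, xj = x[i], x[j]
        x[i] = xi + mu * (xj - xi)
        x[j] = xj + mu * (xi - xj)

def count_clusters(x, eps):
    """Opinions further apart than eps can no longer interact, so we count
    groups of sorted opinions separated by gaps larger than eps."""
    clusters, last = 0, None
    for v in sorted(x):
        if last is None or v - last > eps:
            clusters += 1
        last = v
    return clusters

def simulate_opinions(n_agents, eps, mu, n_steps, seed=0):
    """Run the Deffuant dynamics from uniform initial opinions on [0, 1]."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_agents)]
    for _ in range(n_steps):
        deffuant_step(x, eps, mu, rng)
    return x
```

A large threshold (e.g., 0.6) drives the population toward a single opinion cluster, while a small one (e.g., 0.05) leaves several clusters coexisting, in line with the role of the threshold described above.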
To reflect the temporal volume dynamics observed by Restocchi et al. [28], each agent has a probability $p_a(t)$ of participating in the market, where $T$ is the duration of the market (in days), $t$ represents the number of days elapsed from the beginning of the market (with $0 \le t \le T$), and $\gamma$ is a scaling parameter estimated on the same dataset [28]. If agent $i$ is chosen to participate in the market, they can either buy or short-sell a contract, whose payoff is 1 if $\omega = 1$ and 0 otherwise. For the sake of simplicity, we assume that there exists only one event with two possible outcomes. Since agent $i$ believes that the outcome occurs with probability $x_i$, they will buy a contract only if the current price $p(t) < x_i$, and sell (or short-sell) it if $p(t) > x_i$. They will neither buy nor sell if $p(t) = x_i$. Following influential agent-based models of financial markets, we assume that the price is purely driven by excess demand. Agents can only buy/sell at the current trading price, and contracts are created whenever necessary. The demand of agent $i$, $D_i(t)$, is proportional to the distance between their opinion and the price:

$D_i(t) = x_i - p(t).$
That is, the more mispriced the agent believes the contract to be, the more they will trade. The excess demand $ED(t)$ is simply the sum of each agent’s demand, multiplied by a noise term $\eta(t)$:

$ED(t) = \eta(t) \sum_i D_i(t),$

where the specification of $\eta(t)$ follows [29]. The reason we model the process with a multiplicative (cf. additive) noise term is that, to ensure that the price does not go beyond the boundaries too often, we need a quenching term. To ensure that this decision does not affect the model’s robustness, we ran extensive simulations in which the equation for the excess demand is $ED(t) = \sum_i D_i(t) + \eta(t)$. We find that, within the explored parameter range, the two models return qualitatively similar results. However, adding such a quenching term would require us to run additional calibration and optimisation, eventually risking overfitting our model. By adding a multiplicative noise, we avoid this issue. Importantly, below, we will show that the empirical properties of the time series obtained by running our simulations depend heavily on $\epsilon$ and $\mu$, suggesting that the noise term has little impact on the emerging properties of the model. At each trading round, the price is updated depending only on $ED(t)$. Since prices are bounded between 0 and 1, they are set to 0 or 1 if they become less than 0 or greater than 1, respectively, following an update. Therefore, $p(t+1)$ takes the following values:

$p(t+1) = \begin{cases} p(t) + ED(t) & \text{if } 0 \le p(t) + ED(t) \le 1,\\ 0 & \text{if } p(t) + ED(t) < 0,\\ 1 & \text{if } p(t) + ED(t) > 1, \end{cases}$

where, without loss of generality, we include the boundary cases in the first equation (i.e., $p(t) + ED(t) = 0$ or $1$).
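A single trading day of this mechanism can be sketched as follows. This is an illustration only, not the paper's implementation: the functional form of the participation profile (with scaling parameter `gamma`), the lognormal multiplicative noise, the unit price-impact coefficient, and the per-capita normalisation of the excess demand are all assumptions made here for concreteness.

```python
import random

def participation_probability(t, T, gamma=0.5):
    """Hypothetical participation profile: probability of trading changes
    with elapsed time t over the market duration T, controlled by a
    scaling parameter gamma. The actual form is estimated from data."""
    return min(1.0, 0.1 + (t / T) ** gamma)

def trading_day(opinions, price, t, T, sigma=0.1, rng=random):
    """Participating traders demand D_i = x_i - p; the excess demand,
    scaled by a multiplicative lognormal noise term, moves the price,
    which is then clipped to the [0, 1] band."""
    p_tr = participation_probability(t, T)
    demands = [x - price for x in opinions if rng.random() < p_tr]
    noise = rng.lognormvariate(0.0, sigma)
    # per-capita normalisation keeps a unit price-impact coefficient sensible
    excess = noise * sum(demands) / max(1, len(opinions))
    return min(1.0, max(0.0, price + excess))
```

By construction, a trader whose opinion sits above the price contributes positive demand (a buy) and one below contributes negative demand (a sell), matching the buy/sell rule above.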
3. Results
In this section, we describe the experimental setting that we used to run agent-based simulations of the model, and the results we obtained from these experiments. The simulations were run with all possible combinations of $\epsilon$ and $\mu$, the only two free parameters of our model, within the ranges $0 < \epsilon \le 1$ and $0 < \mu \le 0.5$, which represent the whole possible space for the Deffuant model, with a precision of 0.02 units. Our results show that this model provides a particularly accurate and robust description of prediction markets, because even under the worst-performing conditions, the synthetic time series produced by our simulations capture (at least to some degree) the emerging properties of prediction markets, such as volatility clustering and the absence of autocorrelation of returns [30].
We tested the model on three different network topologies: a random network, a scale-free network, and a complete graph. We show the results obtained on a complete graph (i.e., any agent is free to exchange opinions with any other agent), since we find that these replicate historical data most accurately. However, we do not find any significant difference in the qualitative behaviour of the results, suggesting that (i) price formation does not depend heavily on the network structure and that (ii) our model is robust to topological changes. Results on the other two network topologies are shown in Appendix A.
Although it has been demonstrated that, at least under some circumstances, diffusion processes display finite-size effects [14,31], we ran experiments with 1000 agents to reflect the scarce liquidity that prediction markets usually exhibit [28], with a median of approximately 300 trades per day per market. In fact, if any finite-size effect exists, it must be displayed by real prediction markets; therefore, by keeping the number of participants low, we can capture such an effect in our simulations. We leave a detailed analysis of the relation between network size and price formation in prediction markets to future work. The choice of Barabasi–Albert networks for the scale-free topology is motivated by the fact that social networks exhibit hubs and that network topology does not influence the equilibria of the Deffuant model [32].
To specify the other parameters of the simulations, we follow references [28,30], where the authors provide a comprehensive quantitative analysis of the empirical properties of prediction markets using the same dataset from PredictIt. Specifically, for each combination of $(\epsilon, \mu)$, we ran 3385 simulations, which is the number of markets used by Restocchi et al. [28,30] in their analysis, and the duration $T$ of each market was randomly drawn from the empirical distribution of durations observed among these markets. In this way, our simulations produce time series that are directly comparable with historical data on prediction markets. We initialised our agents with a uniform opinion distribution with an average of 0.5. This is for three reasons. First, these are the initial conditions that were first used in the paper that introduced the Deffuant model, and they are the most widely used in practice. Second, we believe that a uniform distribution with an average of 0.5 constitutes the most “uninformative prior”, and as such, it should not introduce a bias in our model; that is, other assumptions, especially those with different average opinions, could change the results. Given the impossibility of calibrating the initial opinion distribution with the existing dataset, we decided to remove what would have been another free parameter of the model. Third, for the sake of simplicity, we assume that the true probability is equal to 0.5 and is constant throughout the market. This assumption reflects the fact that, on PredictIt and other prediction market exchange platforms, it is possible to bet on an event and its opposite (i.e., there is the possibility to buy and sell contracts on the event Will X happen? but also on the event Will X not happen?).
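The experimental setting above can be sketched as follows. The helper names and the example duration list are illustrative stand-ins (the actual empirical distribution of durations comes from the PredictIt dataset and is not reproduced here); the grid bounds and the 0.02 precision follow the text.

```python
import random

def parameter_grid(step=0.02):
    """All (eps, mu) pairs with 0 < eps <= 1 and 0 < mu <= 0.5,
    explored at a precision of 0.02 units."""
    eps_values = [round(step * k, 2) for k in range(1, int(round(1.0 / step)) + 1)]
    mu_values = [round(step * k, 2) for k in range(1, int(round(0.5 / step)) + 1)]
    return [(e, m) for e in eps_values for m in mu_values]

def initial_opinions(n_agents, rng):
    """Uniform initial opinions on [0, 1]: an 'uninformative prior' with mean 0.5."""
    return [rng.random() for _ in range(n_agents)]

def draw_duration(empirical_durations, rng):
    """Duration T of each simulated market, drawn from the empirical
    distribution of historical market durations."""
    return rng.choice(empirical_durations)
```

Each $(\epsilon, \mu)$ pair in the grid would then be paired with 3385 simulated markets, mirroring the number of historical markets in the dataset.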
To find the optimal values for the pair $(\epsilon, \mu)$, we follow [33] and define the following objective function:

$f(\epsilon, \mu) = |k_s - k_h| + \lambda\,|\beta_s - \beta_h|, \qquad (5)$

where $k_s$ and $k_h$ represent the kurtosis of the distribution of the returns of all 3385 markets, for simulated and historical data, respectively. For the second term, rather than using the value of ARCH(1), i.e., the first autoregressive term of the time series, as suggested by Gilli and Winker [33], we use the value of the scaling parameter $\beta$ that describes the power-law decay of the autocorrelation function of absolute returns. Since the two terms in (5) can significantly differ in magnitude, to ensure that no component of the objective function outweighs the other, Gilli and Winker suggest that the second term, in our case $|\beta_s - \beta_h|$, be multiplied by a constant $\lambda$ that rescales its magnitude. For our dataset, we derive $\lambda$ from the historical values of $k_h$ and $\beta_h$ found by Restocchi et al. [30].
, instead of the first-order autocorrelation term
a, because we believe this gives the calibration a better accuracy. Specifically, we also tried to calibrate the model by using two different functions, namely
, where
, and
, and found that
exhibits a qualitatively similar behavior to
f, without adding information. Also, we found that by using
, as suggested by Gilli and Winker, reduces the sensitivity of the objective function to
and
, and generates two regions of local minima in which values are not significantly different. These results are shown by
Figure 1.
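The calibration metrics entering the objective can be computed as in the sketch below (pure Python). The helper names, the lag window used for the power-law fit, and the use of raw Pearson kurtosis are our choices for illustration; the objective combines the kurtosis gap and the decay-exponent gap, rescaled by $\lambda$, as described above.

```python
import math

def kurtosis(returns):
    """Pearson kurtosis of a sample (equals 3 for a Gaussian)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return m4 / m2 ** 2

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((s - mean) ** 2 for s in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

def decay_exponent(abs_returns, max_lag=20):
    """Slope beta of a log-log fit of the autocorrelation of absolute
    returns against the lag, assuming ACF(lag) ~ lag^(-beta)."""
    points = []
    for lag in range(1, max_lag + 1):
        ac = autocorrelation(abs_returns, lag)
        if ac > 0:  # keep only positive values so the log is defined
            points.append((math.log(lag), math.log(ac)))
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    slope = sum((x - mx) * (y - my) for x, y in points) / \
            sum((x - mx) ** 2 for x, _ in points)
    return -slope

def objective(k_sim, beta_sim, k_hist, beta_hist, lam):
    """f = |k_s - k_h| + lambda * |beta_s - beta_h|, as in Equation (5)."""
    return abs(k_sim - k_hist) + lam * abs(beta_sim - beta_hist)
```

Minimising `objective` over the $(\epsilon, \mu)$ grid then selects the parameter pair whose synthetic time series best match the historical stylised facts.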
The objective function values computed from our simulations for each pair $(\epsilon, \mu)$ are displayed by the heatmap in Figure 2. It is clear from this figure that there exists one region where $f$ is significantly lower than throughout the rest of the space. This region also includes the global minimum of $f$.
It is also interesting to note that $f$ displays regular behaviour when $\epsilon$ and $\mu$ are varied. To better visualise this, in Figure 3, we cut the objective function space into slices and show the behaviour of $f$ separately as a function of $\epsilon$ ($\mu$), for a few values of $\mu$ ($\epsilon$) around the optimum. By observing these two figures, it is easy to see two regularities. First, from the right-hand side plot, one can see that there is a sharp transition from high values of $f$ to low ones, and that this behaviour barely depends on the value of the other parameter. Second, the left-hand side plot shows that $f$ always has a minimum, but, depending on the slice considered, this minimum is more or less pronounced, and it can almost disappear.
These results suggest that both $\epsilon$ and $\mu$ have a significant impact on the objective function, i.e., both parameters contribute to shaping the statistical properties of the time series generated by the model. This is expected since, within short time horizons, they both contribute to the emergence of consensus and the speed at which it happens. For instance, for high values of $\epsilon$ and $\mu$, consensus is reached too soon, and the generated time series become less accurate, as suggested by the high value of $f$ in that region. However, our results suggest that $\epsilon$ has a greater impact on $f$ than $\mu$. Specifically, we observe that there is one region, delimited by $\epsilon < 0.5$, for which the objective function value becomes particularly high. Interestingly, this is the same value of $\epsilon$ below which consensus on a single opinion is not reached in the Deffuant model, and two or more opinions coexist at equilibrium [11], suggesting a relation between the accuracy of our model and whether one or more opinion clusters coexist in the underlying opinion dynamics model. These results suggest that the quantitative accuracy with which $k$ and $\beta$ are reproduced depends heavily on whether a single opinion cluster exists at equilibrium, and on the time it takes for opinions to converge. Further evidence is provided by the results shown in Figure 4, in which we show the dependence of $k$ (Figure 4a) and $\beta$ (Figure 4b) on $\epsilon$ and $\mu$.
The results shown in Figure 4a suggest that the kurtosis of the distribution of raw returns is affected by both $\epsilon$ and $\mu$, but loses its dependence on $\mu$ for low values of $\epsilon$, approximately for $\epsilon < 0.5$, i.e., in the region where multiple opinions coexist at equilibrium. However, the dependence on $\mu$ when $\epsilon > 0.5$ suggests that, when only one opinion exists at equilibrium, the kurtosis depends strongly on the time it takes to reach consensus. Not surprisingly, the shorter the time to reach consensus, the higher the kurtosis, as once consensus is reached, only a few agents trade, and those who do trade have low absolute values of demand, since their opinion is closer to the mean. Similarly, Figure 4b shows that $\beta$ seems to depend mostly on $\epsilon$, but exhibits a far sharper phase transition around $\epsilon = 0.5$ than $k$, its value decreasing significantly for $\epsilon < 0.5$. Figure 4b also shows that, similar to $k$, in the $\epsilon > 0.5$ region, $\beta$ depends mainly on $\mu$, and its value becomes larger the faster opinions converge.
These results provide further evidence that the accuracy of our model depends on whether there is a single opinion cluster or more, except for the region where both $\epsilon$ and $\mu$ are high, in which consensus is reached early in the market and little or no trading occurs after a certain point. Finally, Figure 4 shows that, for any pair $(\epsilon, \mu)$, the time series generated by our simulations exhibit excess kurtosis and volatility clustering, which are the two main features of the prediction market time series we want to replicate. This is important because the optimal values of $\epsilon$ and $\mu$ can change significantly depending on the dataset used for calibration, but our results suggest that our model is capable of reproducing, at least qualitatively, the empirical properties of prediction markets regardless of the value of its parameters. It is important to note, however, that while combinations of parameters far from the optimum still show the emergence of the discussed stylised facts, these are quantitatively far from those observed in historical data.
Figure 5 shows two comparisons between historical data and simulation results, obtained with the best and the worst configurations of the pair $(\epsilon, \mu)$, respectively. The comparisons are based on the autocorrelation of absolute returns and the probability density function of absolute returns, which are commonly used metrics to describe time series in financial markets [30,34]. From this figure, we observe that both the best and worst configurations generate distributions of returns that are similar to the historical ones, only with slightly heavier tails. The results displayed in Figure 5 suggest that, whereas the time series generated by the best configuration perfectly match the decay slope of the absolute return autocorrelation function, those generated by the worst configuration do not match historical data accurately. Indeed, although in this case the decay of the autocorrelation function exhibits a long tail, its values drop quickly when the lag considered is greater than 10 days. This is because, for the worst-performing pair of values, the underlying opinion dynamics process converges to one opinion cluster far earlier than the end of the market for most market durations. This causes price changes to be very small or zero, in contrast to the beginning of the market, when price changes are larger due to the higher heterogeneity of opinions.