1. Introduction
Decision timing is a key ingredient of decision making in many settings. Whenever the effect of a choice depends on a future state of the world (e.g., in betting, financial markets, or firm strategy), agents face the additional choice of whether to make their decision close to or far from the future event. On the one hand, waiting until the last minute may allow them to improve their information set. On the other hand, if they cannot efficiently process all the inputs that accrue as the event approaches, information overload may be detrimental.
We study this tradeoff in the context of sports betting for two reasons. First, in terms of internal validity, because we exploit a large dataset of online bets, we can estimate the effect of the distance from the event on the probability of success without losing statistical accuracy, even when we control for unobservable heterogeneity and for a number of time-varying confounding factors. Second, in terms of external validity, because we focus on a population of non-professional bettors, we isolate behavioral regularities that may extend beyond our context.
To test our hypothesis that decision timing matters, we analyze the winning probability of bets placed in two different seasons of the Italian Major Soccer League (Serie A). The dataset contains more than one million online bets. The 7093 individuals in our dataset are non-experts, who bet small amounts of money on multiple events to increase their potential profits and win only if all of the events occur. Betting on soccer relies on the availability of objective information, such as team rankings and win-loss records, which are reasonably good predictors of game outcomes. For these reasons, we believe that the distance from the game day is a significant factor among those determining how these non-professional bettors process and use the available information. The tradeoff highlighted above is clearly at work. Betting too early might force individuals to forgo relevant information, such as player injuries that occur close to the game. On the other hand, betting late confronts individuals with a large amount of information, which increases with the public relevance of the event, comes from multiple sources, and may not be easy to handle.
In our empirical strategy, we control for individual fixed effects, thereby accounting for (time-invariant) unobservable ability. Indeed, when we refer to “early” vs. “late” bettors, we mean the same individual placing different bets at different time distances from the relevant event. Because in some specifications we control for individual-times-team fixed effects, we also account for the fact that individuals might systematically bet earlier or later on specific teams (e.g., their favorite one). To control for learning as individuals place more and more bets, we use flexible control functions. Finally, to control for potential time-varying sources of omitted variable bias, we include the betting odds, which capture the strategic interaction between bettors and the other side of the market, and other attributes of the bet (e.g., financial amount, event’s characteristics).
According to our empirical evidence, for the same bettor, the probability of making a correct forecast is higher when the bet is made on the days before the event: compared with bets on game day, the chance of winning increases by 1.3 percentage points (that is, by about 3% with respect to the average). The effect is larger when big teams or multiple bets are involved (about 5% in both cases). The relationship between betting early and winning is monotonic, as the probability of a correct forecast is larger the higher the number of days from the event, up to the maximum effect of 6.7 percentage points (about 15% with respect to the average) 5 days before the event. This evidence supports the hypothesis that information overload may occur: as the event draws closer, individuals receive more information than they are able to properly digest, which increases the probability of mistakes.
The estimated individual fixed effects show that successful (non-professional) bettors also tend to place their bets in advance. Furthermore, they are more selective, as they place a smaller number of bets in the same week, and tend to focus on events associated with lower betting odds, which are arguably easier to forecast.
The paper is organized as follows. Section 2 reviews the related literature. Section 3 describes our empirical strategy. The results are discussed in Section 4. Section 5 concludes.
2. Related Literature
Since the 1970s, sports forecasting has been the object of extensive research motivated by two main reasons: (i) to ascertain whether betting markets are informationally efficient and enable learning processes, and (ii) to check whether experts make more accurate predictions than non-experts. Both strands of the literature aim at analyzing the conditions under which the availability of comprehensive information and professional advice is fully discounted by market prices (that is, betting odds) and rules out observable biases that could allow speculators to make higher-than-average returns. A large body of empirical evidence supports the view that bettors’ behavior does not conform to the rational decision model and is affected by cognitive biases (Diecidue et al. 2004; Osborne 2001). First, bettors show a clear tendency to under-bet favorites and over-bet long shots (Golec and Tamarkin 1995; Paul and Weinbach 2005; Newall and Cortis 2021). Second, they exhibit decision biases such as confirmation bias, the gambler’s fallacy, and overconfidence, which are related to inaccurate information processing (Blavatskyy 2009; Palomino et al. 2009). Third, they adopt a series of heuristics whose suitability is context-dependent (Conlisk 1993; Kochman et al. 2015). Finally, they are not effective enough in discounting the effect of noisy and redundant information and in reducing the impact of information overload (Bleichrodt and Schmidt 2002; Kaufmann and Weber 2013). If information directly enters the agent’s utility function, it can create an incentive to avoid information, even when it is useful, free, and independent of strategic considerations. For a survey of the theoretical and empirical literature on information avoidance, see Golman et al. (2017).
A major strand of research concerns horse-race betting, which is a naturally occurring asset market in which the transmission of information from informed to uninformed traders is not typically smooth. This betting market is efficient if it aggregates the less-than-perfect information held by all the participants and disseminates it to all bettors through the publicly available information given by track and bookmakers’ odds and handicappers’ picks (e.g., see Snyder 1978; Figlewski 1979; Hausch et al. 1981; Asch et al. 1984). Baseball, basketball, football, and soccer are sports in which the sources of insider information are less relevant than in racetrack betting.
Pope and Peel (1989) analyze the fixed odds offered by bookmakers and the forecasts made by professional tipsters on UK soccer league games; they argue that betting markets are efficient in preventing bettors from gaining abnormal returns based on public information, but that odds do not fully reflect all the available information. This finding is confirmed by Forrest and Simmons (2000), who consider newspaper tipsters offering professional advice on English and Scottish soccer games.
The fact that being an expert is not necessarily associated with a high degree of forecasting accuracy is extensively discussed by Camerer and Johnson (1991) for various domains (medical, financial, academic). They conclude that experts’ superiority in processing information is not strictly related to performance superiority, which is crucially affected by the matching of experts’ cognitive abilities with “environmental demands” (Camerer and Johnson 1991, p. 213). An interpretation of this finding can be traced back to the paper by Oskamp (1965), who argues that the extent of collected information cannot be directly related to predictive accuracy. While predictive ability reaches a ceiling once a limited amount of information has been collected, confidence in the ability to make accurate decisions continues to grow proportionally (Davis et al. 1994; Kaufmann and Weber 2013). In the context of geopolitical questions, Atanasov et al. (2020) find that high-skill forecasters who make frequent, small updates outperform low-skill forecasters, who tend to confirm their initial judgments or make infrequent, large revisions. Therefore, small-increment updating is seen as a signal of early accuracy.
Gigerenzer et al. (1999), Benartzi and Thaler (2001), Martignon and Hoffrage (2002), Rieskamp and Otto (2006), and Gigerenzer and Goldstein (2011) argue that decision making can be better explained by models of heuristics than by the standard rational decision model. Anderson et al. (2005) use the recognition heuristic to account for non-experts’ performance in soccer betting. According to Newell and Shanks (2004), the recognition heuristic is assumed to demand little time, information, and cognitive effort, and exploits the relationship between a criterion value (e.g., success in home win) and its predictors (e.g., team rank position).
Heuristics perform quite well in environments affected by noisy and redundant information, such as sports forecasting. Noisy information is defined as an information structure in which not only can one signal indicate several states, but several signals can also occur in the same state (Bichler and Butler 2007; Crawford and Sobel 1982). In Dieckmann and Rieskamp (2007), redundant information is defined as information composed of pieces that are highly correlated with each other and support the same prediction (positive redundancy), or that contradict each other and suggest incompatible predictions (negative redundancy).
Following Oskamp (1965) again, if bettors are provided with a very rich source of information without activating a costly search process, their confidence in the beliefs that they held before increases. For example, Bettman et al. (1993) provide support for the notion that people also select strategies adaptively in response to information redundancy; they show that participants choosing between gambles search only for a subset of the available information when they encounter a redundant environment with positively correlated attributes. Negatively correlated attributes, in contrast, give rise to search patterns consistent with compensatory strategies that integrate more information. This cognitive bias is known as the illusion of knowledge, according to which, beyond a threshold, more information on the event increases self-confidence more than accuracy (Barber and Odean 2002). In recent years, computational intelligence applications to sports have boomed; see Fister et al. (2015) for a survey. Computational intelligence involves algorithms that solve real-world problems in a way that resembles how natural systems solve similar problems. Its ability to adapt quickly and efficiently to a changing environment is promising in the field of betting, in contrast with the biases shown by humans.
The condition of “information overload” characterizes media information on Italian soccer, which provides the ground for our empirical analysis. The amount of information to be processed is greatly increased by the variety of communication channels on TV, the internet, and newspapers. Furthermore, much of the information is not original, and viewers continuously process information received from other sources but presented differently. The introduction of online betting has further increased the availability of information, which is also diffused by online betting sites. Our dataset, which is described in the next section, includes small bets, generally evenly distributed across individuals. Therefore, it can be safely assumed that the individuals contained in our dataset are “non-expert bettors.”
3. Empirical Strategy and Data
Based on the literature surveyed in the previous section and on the available data, we test the following hypothesis.
Hypothesis 1 (H1). (information overload) As the event approaches and the amount of noisy information available to bettors increases, their winning ability decreases.
At the same time, we control for the following confounding hypothesis.
Hypothesis 2 (H2). (learning) Bettors improve their performance over time, as they get more acquainted with the environment and the relative strength of the soccer teams.
We use a unique (large) dataset of online bets from a provider specialized in the field. The company is located in Southern Italy, but bets are made from all over the country. Users have to register and then bet online through credit card payments. We were provided with bets on all games of 20 game weeks of the Italian Soccer Major League (Serie A), namely, the last 10 weeks of the 2004–2005 season and the first 10 weeks of the 2005–2006 season. Our dataset includes 1,205,597 single bets made by 7093 registered users. For comparison, the large study by Buhagiar et al. (2018) analyzes a total sample of 163,992 soccer odds from 41,003 matches for ten leagues over twelve seasons. Single bets may also be part of multiple bets including more than one event and may concern several types of events (e.g., which team wins, draw, goals scored, goals scored in the first half, and so forth). Multiple bets increase potential profits and are won only if all of the events occur. In our analysis, we focus on the simplest events: 1, X, 2, 12, 1X, and X2 (where 1 stands for home win, X for draw, and 2 for away win). These types of events account for 85% of all bets. Using all bets does not affect our results (available upon request).
The fact that bettor $j$ correctly forecasts event $i$ at game week $t$ ($W_{ijt}$) is modeled as:

$$W_{ijt} = \gamma_j + X_{ijt}'\beta + Z_{it}'\delta + g(D_{ijt}) + f(t) + \varepsilon_{ijt} \qquad (1)$$

where $\gamma_j$ are individual fixed effects (capturing all time-invariant characteristics of bettor $j$, including her intrinsic level of sophistication and ability); $X_{ijt}$ is a vector of time-varying attributes linked to bettor $j$ (such as the amount of money bet at game week $t$, or the number of other events linked to event $i$ in a multiple bet), with coefficient vector $\beta$; $Z_{it}$ is a vector of time-varying attributes of event $i$ (such as whether the home team or the favorite team won the game, and the day-by-day odds decided by the provider), with coefficient vector $\delta$; $g(\cdot)$ is a flexible function of the distance from the day individual $j$ places the bet to the day event $i$ occurs ($D_{ijt}$); $f(\cdot)$ is a flexible function of game week $t$; and $\varepsilon_{ijt}$ is an idiosyncratic error clustered at the event level.
To test H1 (information overload), we consider three specifications of $g(\cdot)$: a linear function of $D_{ijt}$ (“betting distance”); a dummy equal to one if the bet is placed before the game day and zero otherwise (“betting early”); and a non-parametric specification including a set of dummies for each value of $D_{ijt}$ (which varies from zero for bets on game day to a maximum of 5 days). To control for H2 (learning), we use three specifications of $f(\cdot)$: a linear trend; a quadratic trend; and game week dummies (with $t$ varying from 1 to 20 across the two seasons in our dataset). The inclusion of individual fixed effects accounts for all time-invariant bettor characteristics correlated with both the outcome and the treatment. The inclusion of betting odds in $Z_{it}$ controls for the decisions of the other side of the market, that is, the betting company, which might strategically adjust the timing of the odds as the event approaches. The inclusion of the event characteristics identified as relevant by the previous literature, such as the victory of the home or favorite team, controls for the fact that bettors might bet earlier on events that are easier to forecast. We also estimated specifications including an interaction term between the betting distance and the amount of money bet by the user, so as to partly account for overconfidence, but the coefficient was never statistically different from zero, and we therefore excluded the interaction term from the baseline estimations. Finally, we estimated specifications including a set of interactions between the individual fixed effects and the home or away team, so as to account for the fact that bettors might adopt different timing strategies with respect to different teams, such as their favorite one; results are again unchanged and available upon request.
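The role of the individual fixed effects can be illustrated with a minimal Python sketch on synthetic data. It is not the estimation code behind our tables: the two bettors, their baseline success rates, the noise-free outcomes, and the function names are assumptions made purely for illustration, and the within (demeaning) estimator is one standard way to absorb fixed effects, not necessarily the routine used for the estimates reported here. The hypothetical bettor with the higher baseline also bets early more often, so a pooled regression overstates the effect of betting early, whereas the within estimator recovers it.

```python
from collections import defaultdict

TRUE_EFFECT = 0.013  # assumed effect of betting early, in probability units


def pooled_slope(rows):
    """OLS slope of y on x, ignoring who placed the bet."""
    n = len(rows)
    mean_x = sum(x for _, x, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for _, x, y in rows)
    sxx = sum((x - mean_x) ** 2 for _, x, _ in rows)
    return sxy / sxx


def within_slope(rows):
    """Fixed-effects slope: demean x and y within each bettor, then run OLS."""
    by_bettor = defaultdict(list)
    for bettor, x, y in rows:
        by_bettor[bettor].append((x, y))
    sxy = sxx = 0.0
    for obs in by_bettor.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in obs)
        sxx += sum((x - mean_x) ** 2 for x, _ in obs)
    return sxy / sxx


# Hypothetical bettors: (id, baseline success rate, early-bet indicators).
bettors = [("a", 0.40, [1, 0, 0, 0]), ("b", 0.50, [1, 1, 1, 0])]
rows = [
    (bettor, early, baseline + TRUE_EFFECT * early)
    for bettor, baseline, pattern in bettors
    for early in pattern
]

print(f"pooled estimate: {pooled_slope(rows):.3f}")
print(f"within estimate: {within_slope(rows):.3f}")
```

In this toy sample the pooled slope is several times larger than the assumed effect of 0.013, because it picks up the difference in baselines between the two bettors, while the within slope matches the assumed effect.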
Specifically, among the covariates related to event $i$, we consider the dummy “main teams,” equal to one if the bet concerns at least one of the four leading teams during our sample period (F.C. Internazionale, Juventus F.C., A.C. Milan, and A.S. Roma); the dummy “strong team wins,” equal to one if the stronger team (measured by the relative ranking position in the league) wins; and the dummy “home team wins,” equal to one if the home team wins. Among the time-varying attributes of each bettor $j$’s decision, we consider the amount spent by the user in each game week (“amount by user”); the number of the other single bets associated with $i$ within a multiple bet (“other events”); and the official evaluation that the betting company gives to each event when the bet on event $i$ is placed by individual $j$ (betting “odds”). To capture any systematic difference between the two seasons in our dataset, we also include a dummy for the 2005–2006 season.
Table 1 reports the descriptive statistics of our variables. In our data, 45% of single bets are successful. This does not mean that bettors have such a high winning rate, because single bets may be part of multiple bets, which are lost if any of their single bets is wrong. Indeed, the winning rate in multiple bets is just 5% on average. Most bets are placed on game day, while early bets are about 32%. The average amount spent per bettor in a game week is 211 Euros, with a large standard deviation. Almost 40% of bets are made on the main four teams.
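The gap between the 45% success rate of single bets and the roughly 5% winning rate of multiple bets follows from the rule that a multiple bet pays only if every single bet in it is correct. A back-of-the-envelope Python sketch makes the point, assuming (purely for illustration) that the single bets are independent and that each is correct with the sample-average probability of 0.45; neither assumption is tested in our analysis, and the function name is hypothetical.

```python
def multiple_bet_win_probability(leg_probabilities):
    """Probability that every leg of a multiple bet is correct, assuming independence."""
    probability = 1.0
    for p in leg_probabilities:
        probability *= p
    return probability


SINGLE_BET_SUCCESS = 0.45  # share of successful single bets reported in Table 1

for legs in range(1, 6):
    p_win = multiple_bet_win_probability([SINGLE_BET_SUCCESS] * legs)
    print(f"{legs} leg(s): {p_win:.3f}")
```

Under these assumptions a multiple bet with four events wins about 4% of the time, the same order of magnitude as the 5% average winning rate observed in the data.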
Table 2 provides information on the above variables and bettors’ socio-economic characteristics by betting distance. We also test whether means are different between bets placed on game day and bets placed before. Thanks to the large sample size, many differences are statistically significant, although most of them are economically small. Early bets tend to be placed on stronger teams, and to be associated with a larger number of multiple bets.
4. Empirical Results and Discussion
Table 3, Table 4, and Table 5 report our baseline specifications as in Equation (1). In the first three columns, we do not control for individual fixed effects, whereas we do so in the last three columns. The latter are our preferred specifications, but it is instructive to compare results with and without fixed effects. As discussed above, to control for possible learning we use three specifications: a linear trend in game week (columns 1 and 4); a quadratic trend (columns 2 and 5); and a full set of game week dummies (columns 3 and 6). The difference between the three tables concerns how we model betting distance: linearly in Table 3; with the dummy “betting early” in Table 4; and with a full set of dummies for each value of the betting distance, which is measured in days, in Table 5.
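The three ways of modeling betting distance can be made concrete with a small Python sketch that builds the corresponding regressors for a single bet. The function and key names are hypothetical, and treating game day (distance zero) as the omitted reference category in the dummy specification is an assumption, chosen because the effects we report are expressed relative to bets placed on game day.

```python
MAX_DISTANCE = 5  # bets can be placed at most 5 days before the event


def encode_betting_distance(days_before_event):
    """Return the regressors implied by the three specifications of g(.)."""
    if not 0 <= days_before_event <= MAX_DISTANCE:
        raise ValueError("betting distance must lie between 0 and 5 days")
    return {
        # Linear specification: the distance itself.
        "betting_distance": days_before_event,
        # Dummy specification: one if the bet precedes game day.
        "betting_early": int(days_before_event > 0),
        # Non-parametric specification: one dummy per day, game day omitted.
        "distance_dummies": [
            int(days_before_event == d) for d in range(1, MAX_DISTANCE + 1)
        ],
    }


for d in (0, 1, 5):
    print(d, encode_betting_distance(d))
```

A bet placed on game day has all regressors equal to zero, so each coefficient on a distance dummy is read as the difference in winning probability with respect to game-day bets.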
Table 3 shows very similar results across all specifications. The coefficient of betting distance is significantly positive and very stable: the farther away from the event date the bet is placed, the higher the probability of winning. On average and for the same bettor, betting one day earlier increases the chance of winning by about 0.8 percentage points, that is, by about 1.8% with respect to the average probability of a correct forecast. This provides evidence of possible information overload. Moreover, as the season progresses, bettors’ performance worsens, as highlighted by the significantly negative coefficients for the game week trend in both the linear and quadratic specifications.
Consistent with the previous literature, we find very strong effects for both “home team wins” and “strong team wins” (equal to 40.8% and 60.9%, respectively, with respect to the average outcome). The probability of winning is positively and significantly affected by the monetary amount that each player bets, with a large effect with respect to the average outcome (37.4% for an increase of the amount bet equal to its standard deviation). Both betting on the main teams and betting on more than one event increase the probability of winning. Columns 2 and 5 include the variable game week squared. We do not report its coefficient since it is extremely small (in the order of four decimals); the linear specification therefore shows a fairly good fit. As we would also expect, higher odds are associated with a lower probability of winning (on average by −46.0% for an increase of the odds equal to the standard deviation).
In Table 4 the regressor of interest is the dummy “betting early,” equal to one if the bet was placed on one of the 5 days preceding game day. This variable is significantly positive, meaning that the probability of making the correct forecast is higher when the bet is made in advance. On average and for the same bettor, the chance of winning increases by 1.3 percentage points (that is, by 2.9% with respect to the average). All of the other variables confirm their behavior from both a qualitative and a quantitative point of view.
Table 5 includes a full set of dummies for each value of betting distance. The effect of the distance from the event on the probability of winning is monotonic, as it increases to its maximum when individuals bet 5 days in advance (the largest distance from the event day that is allowed in this betting context). At this maximum distance, as opposed to betting on the event day, the probability of a correct forecast is larger by 6.7 percentage points (i.e., by about 15% with respect to the average). Wald tests on the equality of coefficients confirm the statistical significance of this increasing effect as we move away from game day. Again, all of the other variables confirm their behavior.
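The relative effects quoted in this section are obtained by dividing each percentage-point estimate by the average probability of a correct forecast. The Python sketch below reproduces that arithmetic, assuming the baseline is the 45% success rate of single bets from the descriptive statistics; the function and variable names are illustrative.

```python
AVERAGE_SUCCESS_RATE = 45.0  # percent of single bets that are successful


def relative_effect(percentage_points, baseline=AVERAGE_SUCCESS_RATE):
    """Express a percentage-point change as a percent of the baseline rate."""
    return 100.0 * percentage_points / baseline


estimates = {
    "one day earlier (linear)": 0.8,
    "betting early (dummy)": 1.3,
    "5 days before (dummies)": 6.7,
}
for label, pp in estimates.items():
    print(f"{label}: {pp} pp = {relative_effect(pp):.1f}% of the average")
```

The resulting values, roughly 1.8%, 2.9%, and 14.9%, match the relative effects reported for the three specifications.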
As further robustness checks aimed at assessing the validity of the information-overload mechanism, in Appendix A we address heterogeneity issues, that is, we assess whether the effect of betting distance is stronger in specific subsamples. Specifically, in Table A1, we distinguish between bets on one of the main teams and bets on all the other teams. In Table A2, we distinguish between bets placed on many events (that is, above the median number of events associated with multiple bets) and bets placed on fewer events. Table A3 distinguishes between “hard bets” (that is, bets whose amount is above the median value, where we consider the amount of the multiple bets made by the individual) and all the others. We find evidence of heterogeneity in the first two exercises. In particular, the effect of betting early is quantitatively larger for bets linked to main teams and for bets linked to other bets in a multiple play. This is consistent with our information-overload interpretation of the positive effect of betting early.
Finally, the estimated individual fixed effects allow us to shed light on additional behavioral patterns in our data. Figure A1 in Appendix A shows that more successful bettors (that is, those with a larger fixed effect) also tend to bet in advance. This regularity, of course, does not affect the estimates discussed above, as they account for unobservable heterogeneity, but it is an interesting finding per se. More skilled bettors seem to anticipate information overload and place their bets in advance. They are also more selective, as they place a smaller number of bets per game week, focus on bets associated with smaller betting odds, and tend to spend lower amounts.
5. Conclusions
We find that betting timing matters. From the analysis of more than 1,200,000 online bets, we detect a statistically significant and stable difference in the winning probability of early versus late bettors. The estimated effect controls for time-invariant unobservable heterogeneity, learning, betting odds, and observable characteristics of the event. Therefore, when we refer to “late” versus “early” bettors we are comparing the same individual making bets at different distances from each event. We attribute the poorer forecasting performance of late bettors to inefficient processing of information, an interpretation that is also consistent with the heterogeneity results that the richness of our data allows us to uncover. The late bettors’ decision process is affected by various cues that, unknown to the earlier bettors, have little relevance for predicting the outcomes. The excess of noisy information (especially severe if the same individual decides to bet on the main teams or on multiple events) reduces the possibility of using very simple prediction methods, such as team rankings or home team winning. The use of these criteria and cues greatly improves the possibility of placing a winning bet. Some skilled bettors partly anticipate the issue, as individuals with larger fixed effects tend to bet from 3 to 5 days in advance.
We acknowledge two limitations of our results. First, they are based on small stakes, and we cannot rule out that information processing could become more efficient when stakes are higher. Second, we cannot rule out that bettors pursue emotional objectives other than standard profit maximization. We leave to future research the generalization of our results to betting contexts characterized by a larger degree of sophistication. We also leave to future research a more direct test of the information-overload mechanism that we indirectly uncover while estimating the causal effect of betting early on forecasting accuracy.