1. Introduction
The pandemic has caused a disruption in the evolution of macroeconomic aggregates. Consequently, anticipating upcoming developments becomes one of the fundamental objectives of economic analysis, especially in periods of high uncertainty, such as the current one. Recent trade disputes and growing investor concerns about the global economic outlook have led the International Monetary Fund (IMF) to downgrade global growth projections for 2020, which are at their lowest levels since the 2008 financial crisis [1]. In this context, agents’ expectations about future economic conditions are a key feature in macroeconomic forecasting.
Expectations are not directly observable. Consequently, agents’ expectations tend to be elicited via surveys. Survey expectations present several advantages over experimental expectations, such as the following: (a) they are based on the knowledge of respondents operating in the market, (b) they provide detailed information about a wide range of economic variables, and (c) they are available ahead of the publication of official quantitative data. These features make them very useful for prediction.
One of the main sources of expectation information is economic tendency surveys (ETS). In ETS, respondents are asked whether they expect variables to rise, fall, or remain unchanged. Some of the most well-known ETS are collected by the University of Michigan, the Federal Reserve Bank of Philadelphia, the Organisation for Economic Co-operation and Development (OECD), and the European Commission (EC). In 1961, the EC launched the Joint Harmonised Programme of Business and Consumer Surveys with the aim of unifying the survey methodologies in the member states of the European Economic Community, now the European Union (EU), allowing comparability between countries.
Survey responses from ETS are commonly used to design composite confidence and sentiment indicators, such as the ifo World Economic Climate Index, the University of Michigan Consumer Confidence Index or the Purchasing Managers’ Index calculated by the Markit Group. The EC constructs business and consumer confidence indicators as the arithmetic mean of a subset of predetermined survey expectations.
The selection of variables for the construction of confidence indicators is fundamentally determined by their fit to a reference series. Economic relationships between variables change over time and require periodic overhaul [2]. Therefore, in this study, we propose a machine-learning method for the generation of economic sentiment indicators that allows both an automated variable selection procedure and an update of the relationships between the selected variables. One can refer to [3,4] for a comprehensive assessment of the automated selection procedures.
The proposed approach allows the determination of an optimal combination of expectations that minimizes a given loss function. The obtained expressions differ from the confidence indicators constructed by the EC in the following ways: (a) they are based on information from all the available variables of each survey, (b) they select expectations with the highest forecasting power and their optimal lag structure, (c) they capture the existing non-linear relationships between survey expectations, and (d) they generate direct estimates of economic growth. In [5], the authors found evidence of the non-linear relationship between expectations and economic growth.
The objective of the paper is threefold. First, we aim to provide practitioners with easy-to-implement business and consumer confidence indicators. To this end, we used all the variables contained in the industrial and consumer surveys conducted by the EC for 19 EU states and for the euro area (EA). With this information, we generated country-specific confidence indicators that estimate the GDP growth rate expected by firms and consumers. Second, because the algorithm selects the expectational variables with the highest predictive capacity, including the number of lags, we evaluate the relative importance of the variables in each survey, as well as their lag structure.
Finally, we assess the forecasting performance of the generated indicators. On the one hand, we compare them to the confidence indicators constructed by the EC in a nowcasting exercise. On the other hand, we design a recursive out-of-sample forecasting experiment in which we iteratively re-compute the indicators to predict economic growth. The obtained forecasts are then compared to univariate time-series models that are used as a benchmark.
The proposed methodology is based on genetic programming (GP), which is a soft computing search technique based on the application of evolutionary algorithms. GP simultaneously evolves the structure and the parameters of expressions, allowing formalization of the interactions between the variables that best fit a reference series. This approach is especially useful in situations where the exact functional form of the solution is not known in advance, such as the present one, where there is no a priori combination of survey expectations that best tracks economic growth.
GP has been successfully used as a tool for automatic problem-solving in areas such as image processing [6], but very seldom for macroeconomic modelling and forecasting [7,8,9,10,11]. In this study, we fill this gap by applying GP to the estimation of free-form regressions that link economic growth with survey expectations in order to generate country-specific machine-learning sentiment indicators. We design an independent experiment for each country and for each type of survey, obtaining data-driven confidence indicators that allow us to independently monitor economic growth dynamics from both the demand and the supply sides of the economy.
The rest of the paper proceeds as follows: the next section describes the methodological approach and the experimental setup. In Section 3, we present the obtained indicators. In Section 4, we assess the performance of the evolved confidence indicators in a nowcasting exercise. In Section 5, we perform an iterative forecasting experiment. Finally, Section 6 concludes the paper.
2. Methods
GP is a heuristic search technique based on the evolution of programs. This optimization approach represents programs as tree structures that learn and adapt by changing their size, shape, and composition. As opposed to conventional regression analysis, which is based on a certain ex-ante model specification, GP-based symbolic regression (SR) searches for relationships between a given set of variables and evolves the functions until it reaches a solution, which in our case corresponds to the algebraic expression that best fits the data. One can refer to [12] for a comparative analysis of the metaheuristic optimization algorithms.
GP simultaneously evolves the structure and the parameters of the expressions. This feature provides a quick overview of the most relevant interactions between the variables of the system and can help to identify new unknown links. As a result, due to its suitability for finding patterns in large datasets and handling complex modelling tasks, this empirical modelling approach is beginning to be used in more and more applications in different scientific fields, from lung cancer prediction [13] and automatic skin cancer image classification [14], to a wide range of engineering applications [15,16,17,18,19,20]. A large part of the applications is related to complex optimization problems [21,22,23] and predictive tasks in different fields [24,25].
Although GP-based SR was first used as a means to assess the non-linear interactions between price level, gross national product, money supply, and the velocity of money [26], applications of GP in macroeconomics have been very limited since then. Using a similar procedure, in [27], the authors identified interactions between economic indicators in order to estimate the evolution of prices in the US. In [28], the authors assessed the performance of different model selection approaches, including GP, in an out-of-sample forecasting exercise to predict the growth rates of quarterly GDP and monthly inflation. Similarly, in [29], the authors used GP to track GDP growth by combining ten science and technology factors. In [30], the authors applied GP in a vector error correction model for macroeconomic forecasting. One can refer to [31] for a review of the application of GP to economic modelling.
Most of the applications of evolutionary computation in economics have been made in finance [32]. In [33], the authors used a genetic algorithm to predict the financial failure of firms. In [34], the authors applied genetic algorithms to optimize the signals generated by technical trading tools. However, most of the works focus on the prediction of exchange rates or stock price trends (e.g., [35,36,37,38,39,40]). One can refer to [41] for a recent review of the applications of genetic algorithms to forecasting prices of commodities. For a review of the applications of evolutionary algorithms for financial forecasting, one can refer to [42].
Evolutionary computation is based on the application of the principles of the theory of natural selection to an iterative optimization problem. The implementation of GP starts with the creation of an initial random population of M individuals (functions or programs), from which the algorithm selects the fittest ones (parents). In order to preserve diversity in the population, we used a tournament of size three as the strategy for selecting the parents for replacement, meaning that, out of three randomly selected individuals, the best two are mated.
Genetic operators (reproduction, crossover and mutation) are applied to the selected parents (N). Reproduction results in the copying of the function; crossover consists of exchanging random parts of selected pairs; and mutation involves substitution of some random part of a function with some other part.
In each successive simulation (generation), a new and fitter offspring is generated. The fitness of each member of the population is evaluated by a loss function. Operations are recursively applied to the new generations until a stopping criterion is reached. The recursion stops when some individual program reaches a predefined fitness level or when the process reaches a given number of generations (Ng). The output of this process consists of the best individual function from all generations.
In our case, we generated a first random population of 70,000 functions, and selected the best 10,000 individuals according to the obtained mean square error. We set a maximum number of 100 generations as the termination criterion.
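The evolutionary loop described above can be illustrated with a deliberately small, self-contained sketch. It is not the DEAP setup used in the study: the tree encoding, the helper names (random_tree, evolve, and so on), the operator probabilities, the depth limit, the use of elitism, the toy data and the tiny population are all assumptions chosen for readability. Crossover is also simplified to produce a single child by grafting a random part of one parent into the other.

```python
import math
import operator
import random

def protected_div(a, b):
    """Division that falls back to 1.0 when the denominator is close to zero."""
    return a / b if abs(b) > 1e-9 else 1.0

# Function set restricted to the four elementary operations.
FUNCTIONS = [operator.add, operator.sub, operator.mul, protected_div]
LEAVES = ("var", "const")

def random_tree(n_vars, max_depth, rng):
    """Grow a random tree: ('var', i), ('const', c) or (func, left, right)."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("var", rng.randrange(n_vars))
        return ("const", rng.uniform(-1.0, 1.0))
    return (rng.choice(FUNCTIONS),
            random_tree(n_vars, max_depth - 1, rng),
            random_tree(n_vars, max_depth - 1, rng))

def evaluate(tree, row):
    if tree[0] == "var":
        return row[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return tree[0](evaluate(tree[1], row), evaluate(tree[2], row))

def mse(tree, X, y):
    """Mean square error; non-finite results are treated as the worst fitness."""
    total = 0.0
    for row, target in zip(X, y):
        diff = evaluate(tree, row) - target
        total += diff * diff
    return total / len(y) if math.isfinite(total) else float("inf")

def depth(tree):
    return 0 if tree[0] in LEAVES else 1 + max(depth(tree[1]), depth(tree[2]))

def paths(tree, prefix=()):
    """Index paths of every node, used to pick random crossover/mutation points."""
    found = [prefix]
    if tree[0] not in LEAVES:
        found += paths(tree[1], prefix + (1,)) + paths(tree[2], prefix + (2,))
    return found

def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def evolve(X, y, pop_size=100, generations=20, max_depth=6, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    history = []  # best error found in each generation
    for _ in range(generations):
        fit = [mse(t, X, y) for t in pop]
        elite = min(range(pop_size), key=fit.__getitem__)
        history.append(fit[elite])
        new_pop = [pop[elite]]  # elitism (an assumption) keeps the best program
        while len(new_pop) < pop_size:
            # Size-three tournament: the two fittest of three random programs mate.
            a, b, _worst = sorted(rng.sample(range(pop_size), 3), key=fit.__getitem__)
            r = rng.random()
            if r < 0.1:    # reproduction: copy the parent unchanged
                child = pop[a]
            elif r < 0.9:  # crossover: graft a random part of b into a
                donor = subtree(pop[b], rng.choice(paths(pop[b])))
                child = replace(pop[a], rng.choice(paths(pop[a])), donor)
            else:          # mutation: swap a random part for a new random tree
                child = replace(pop[a], rng.choice(paths(pop[a])),
                                random_tree(n_vars, 2, rng))
            new_pop.append(child if depth(child) <= max_depth else pop[a])
        pop = new_pop
    fit = [mse(t, X, y) for t in pop]
    best = min(range(pop_size), key=fit.__getitem__)
    history.append(fit[best])
    return pop[best], history

# Toy data (hypothetical): two inputs and a non-linear target.
X = [(i / 10.0, (i % 5) / 5.0) for i in range(20)]
y = [a * b + a for a, b in X]
best, history = evolve(X, y)
print(history[0], history[-1])
```

Because the best program of each generation is carried over unchanged, the best error recorded in history can never increase from one generation to the next. The stopping rule here is only the generation limit; a fitness threshold could be added as a second exit condition.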
In this study, we implemented GP to generate composite indicators that capture the combinations of survey variables that best track the actual evolution of economic growth. Formally, the objective of the algorithm is to infer a functional relationship from a set of observations, such that the inferred function is as near as possible to the reference series in the Euclidean distance sense over all the observations in the sample. The search process is characterized by a trade-off between accuracy and simplicity. To limit the complexity of the resulting expressions, the set of functions is restricted to the four elementary mathematical operations (addition, subtraction, product, and division). One can refer to [43] for a detailed study on the effect of the choice of function sets on the generalization performance of SR models.
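One way to write this objective down is given below. The notation is an assumption introduced here for illustration, since the original symbols are not available: y_t is the reference series (GDP growth), x_t is the vector of survey balances and their lags at time t, n is the sample size, and \mathcal{F} is the set of expressions that can be built from the four elementary operations.

```latex
\hat{f}
  = \arg\min_{f \in \mathcal{F}} \; \lVert y - f(x) \rVert_2
  = \arg\min_{f \in \mathcal{F}} \; \sqrt{\sum_{t=1}^{n} \bigl( y_t - f(x_t) \bigr)^2 }
```

Since the square root and the division by n are monotone transformations, minimizing this distance selects the same function as minimizing the mean square error.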
With the aim of further restricting the complexity of the resulting functional forms, we additionally introduced regularization terms in the slope and curvature of the inferred functions. For a justification of the need to regularize, one can refer to [44]. We used the Distributed Evolutionary Algorithms in Python (DEAP) framework, developed by [45].
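The text does not give the functional form of the regularization terms, so the following is only one possible reading, stated as an assumption: the slope and curvature of the fitted series are approximated by first and second differences over time, and their mean squares are added to the mean square error. The function name penalised_loss and both penalty weights are hypothetical.

```python
def penalised_loss(fitted, actual, slope_weight=0.1, curvature_weight=0.1):
    """MSE plus penalties on the first and second differences of the fit.

    Assumed form, not taken from the paper; requires at least three observations.
    """
    n = len(actual)
    mse = sum((f - a) ** 2 for f, a in zip(fitted, actual)) / n
    # First differences approximate the slope of the fitted series.
    slopes = [fitted[t] - fitted[t - 1] for t in range(1, n)]
    # Second differences approximate its curvature.
    curvatures = [slopes[t] - slopes[t - 1] for t in range(1, len(slopes))]
    slope_penalty = sum(s * s for s in slopes) / len(slopes)
    curvature_penalty = sum(c * c for c in curvatures) / len(curvatures)
    return mse + slope_weight * slope_penalty + curvature_weight * curvature_penalty

# A perfectly fitted straight line is penalised only through its slope.
print(penalised_loss([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]))
```

A penalty of this kind favours smoother candidate expressions when two programs fit the data about equally well.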
Finally, by way of synthesis, we want to highlight some of the advantages and disadvantages of the proposed methodology. As for the advantages, we want to note that, unlike the confidence indicators constructed by the EC, the obtained evolved expressions proposed in the present study generate direct estimates of economic growth, which are based on information from all the available variables of the industry and consumer surveys. Additionally, the proposed approach automatically selects the expectational variables with the highest forecasting power and their optimal lag structure, which is predefined in the design of the experiment, ranging in our case from one to a maximum of four quarters. Finally, the proposed approach detects existing non-linear relationships between survey expectations and allows them to be modelled.
Regarding the limitations of the proposed methodology, we want to note that due to the empirical nature of the proposed approach, the evolved expressions lack any theoretical background. Another limitation of the proposed approach is that, as opposed to standard regression, the significance of the parameters obtained in SR cannot be assessed. Finally, given that GP-based SR is a stochastic, high-variance algorithm, its sensitivity to changes in training data can be a drawback for certain practical applications. In this sense, the implementation of preliminary sensitivity analyses through Monte Carlo simulations and the incorporation of prior knowledge are ways to mitigate this effect.
3. Evolved Indicators
This study matches two sources of information: official quantitative GDP data, and firms’ and consumers’ qualitative expectations about a wide array of variables. Regarding the quantitative information, we used seasonally adjusted year-on-year growth rates of GDP provided by Eurostat. With respect to agents’ expectations, we used all monthly and quarterly data from the Joint Harmonised EU Industry and Consumer surveys conducted by the EC (see Table A1 in Appendix A). Monthly survey indicators were aggregated on a quarterly basis and can be freely downloaded at the website of the EC.
The sample period runs from 2003.Q1 to 2020.Q1. The last seventeen quarters were used as the out-of-sample period to evaluate forecast accuracy. We focused on 19 European countries, namely Austria (AT), Belgium (BE), Bulgaria (BG), the Czech Republic (CZ), Denmark (DK), Finland (FI), France (FR), Germany (DE), Greece (EL), Hungary (HU), Italy (IT), the Netherlands (NL), Poland (PL), Portugal (PT), Romania (RO), Slovenia (SI), Spain (ES), Sweden (SE) and the United Kingdom (UK), as well as on the EA.
In both the industry survey and the consumer survey, respondents are asked about their expectations regarding future developments and their perceptions about past and present changes. In either case, the results are presented as balance series, which are obtained from the percentage of positive replies minus the percentage of negative replies. The EC publishes one composite indicator for each survey, including the industry confidence indicator (ICI) for the industry survey and the consumer confidence indicator (CCI) for the consumer survey. Both indicators are obtained from the arithmetic mean of the balance series of a subset of questions.
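As a minimal illustration of these two definitions, the sketch below computes balances and their arithmetic mean. The reply shares are invented for the example, and the function names are hypothetical. It does not reproduce the EC's actual question selection or any seasonal adjustment.

```python
def balance(pct_positive, pct_negative):
    """Balance: percentage of positive replies minus percentage of negative replies."""
    return pct_positive - pct_negative

def confidence_indicator(balances):
    """Arithmetic mean of the balances of the selected questions."""
    return sum(balances) / len(balances)

# Hypothetical reply shares (in %) for three questions in a single period.
replies = [(35.0, 20.0), (28.0, 30.0), (40.0, 25.0)]
balances = [balance(pos, neg) for pos, neg in replies]
indicator = confidence_indicator(balances)
print(balances, indicator)
```

Respondents who answer "unchanged" do not enter the balance directly, which is why the two shares need not add up to 100.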
In this section, we present the industry and consumer confidence indicators obtained for each country and for the EA after the evolutionary process. We ran two independent experiments for each country. In the first one, we linked GDP growth to the industry survey indicators. In the second one, we linked GDP growth to consumer survey indicators. The outputs of the first set of experiments are country-specific evolved industrial confidence indicators that generate estimations of firms’ expectations of economic growth (Exp.IND), while the outputs of the second set of experiments are evolved consumer confidence indicators for each country that yield estimations of households’ expectations of the evolution of economic activity (Exp.CONS). The obtained industrial and consumer confidence indicators are respectively presented in Table 1 and Table 2.
When comparing the resulting indicators of industrial and consumer confidence, we observed that GP generated more linear expressions for firms’ expectations. In most countries, the derived expression is a linear combination of several industry survey expectational variables, as opposed to the evolved consumer confidence indicators, which are mostly non-linear and include ratios and more complex interactions between survey indicators.
Regarding the lag structure, most variables tend to appear both lagged and contemporaneously, sometimes for the same country. In the case of the evolved consumer indicators, the financial and general economic situation over the last 12 months, as well as future unemployment expectations, mostly appear contemporaneously, in the same way as the observed production trend and new orders in recent months do in the evolved industry indicators. The results of Table 1 and Table 2 are summarized in Figure 1.
In the bar chart, which shows the relative frequency with which each survey variable appears in the evolved expressions, we can observe that variable B5 from the industry survey (‘production expectations for the months ahead’) is the most frequent in the evolved industry confidence indicators. Regarding consumer expectations, variables C13 (‘intention to buy a car’) and C14 (‘intention to purchase a home in the next 12 months’) are the ones most frequently selected by the algorithm, both contemporaneously and lagged. The distribution of the industry survey variables is more concentrated than that of the consumer survey variables, which is more flat-topped: each variable of the consumer survey appears at least 3 or 4 times in the evolved consumer confidence indicators, whereas in the industry survey, production expectations appear 16 times, while other variables such as the ‘competitive position inside the EU’ do not appear for any country.
These results suggest the predictive potential of production expectations in the industry. In the case of consumers, the intention to buy a car and the intention to buy a house are the variables with the highest informational content to capture economic growth dynamics. In [46], the authors found evidence of the predictive potential of variable B5 when evaluating the usefulness of expectations from the industry survey to improve the forecasting performance of time-series models in 26 European countries. It is also noteworthy that in spite of the leading properties of the variables contained in the consumer quarterly surveys, which are the variables most frequently selected by the algorithm, they have always been omitted by the EC in the construction of the official consumer confidence indicators.
4. Results
In this section, we examine the predictive performance of the proposed confidence indicators in tracking economic activity in a nowcasting exercise. We used the last 17 quarters (2016.Q1 to 2020.Q1) as the out-of-sample period, and the root mean square forecasting error (RMSFE) as a measure of forecast accuracy. First, we compared the forecasts obtained with the evolved confidence indicators (Exp.IND and Exp.CONS) to those obtained with the corresponding confidence indicators constructed by the EC, previously re-scaled (Cof.IND and Cof.CONS). Because the output of the evolved indicators is directly expressed as expected annual GDP growth rates, we re-scaled the indicators presented in expressions (1) and (2), by regressing the GDP growth of each country on the components of the indicators during the in-sample period (2003.Q1 to 2015.Q4).
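For reference, the accuracy measure named above can be computed as in the short sketch below. The function name rmsfe and the numbers in the usage line are illustrative only and are not results from the study.

```python
import math

def rmsfe(forecasts, actuals):
    """Root mean square forecasting error over the out-of-sample period."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical growth forecasts and outturns (in %) for four quarters.
print(rmsfe([1.8, 2.1, 1.5, 0.9], [2.0, 2.0, 1.2, 1.1]))
```

Because errors are squared before averaging, a single large miss weighs more heavily than several small ones.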
In Figure 2, we graphically compare the evolution of the two GP-generated indicators to that of the GDP of each country. The last seventeen quarters of the sample are used as the out-of-sample period, in which we use the results of the surveys to estimate the period-to-period economic growth prior to the publication of official data.
The EC publishes one composite indicator for the industry (ICI) and another one for households (CCI). Both indicators are obtained from the arithmetic mean of the balance series of a subset of questions.
The in-sample OLS estimates of the weights of each of the components of the confidence indicators published by the EC allow us to compute the scaled confidence indicators that are directly comparable with the evolved confidence indicators. This experiment can be regarded as a nowcasting exercise, given that for each period, the indicators provide an estimation of the current state of the economy before the official figures are released, making exclusive use of the latest survey data published by the EC. For further discussion of nowcasting, one can refer to [47,48], and the references cited therein.
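The re-scaling step can be sketched as an ordinary least squares regression of GDP growth on the components of a confidence indicator, estimated in-sample and then applied to new survey data. The code below is a plain-Python illustration under assumptions: the data are invented, an intercept is included, and the helper names ols and scaled_indicator are hypothetical. Statistical software would normally be used instead of hand-written elimination.

```python
def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def scaled_indicator(coefs, components):
    """Apply the in-sample weights to the components observed in a new period."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], components))

# Hypothetical in-sample data: two survey balances per quarter and GDP growth.
X_in = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
y_in = [0.5 + 2.0 * a - 1.0 * b for a, b in X_in]
coefs = ols(X_in, y_in)
print(coefs, scaled_indicator(coefs, (6.0, 4.0)))
```

The fitted values are expressed in the units of the dependent variable, which is what makes the scaled indicators comparable with the evolved ones.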
To test whether the difference in accuracy is statistically significant, we computed the Harvey–Leybourne–Newbold (HLN) statistic [49], which is a modification for small samples of the Diebold–Mariano (DM) statistic [50]. Under the null hypothesis that there is no significant difference in precision, the statistic follows a Student’s t distribution. A negative sign indicates that the second model has larger forecast errors. The results are presented in Table 3.
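A compact sketch of the test is given below, assuming a squared-error loss differential and the standard small-sample correction factor, with the resulting statistic compared against a Student's t distribution with n - 1 degrees of freedom. The function name hln_statistic and the forecast errors in the usage line are hypothetical.

```python
import math

def hln_statistic(errors_1, errors_2, h=1):
    """HLN-corrected Diebold-Mariano statistic under squared-error loss.

    Negative values indicate that the second model has the larger errors.
    """
    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_1, errors_2)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(k):
        return sum((d[t] - mean_d) * (d[t - k] - mean_d) for t in range(k, n)) / n

    # Long-run variance of the loss differential, using h - 1 autocovariances.
    # It is zero if d is constant and can turn negative for h > 1 in small
    # samples; both cases would need separate handling in practice.
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    dm = mean_d / math.sqrt(long_run_var / n)
    # Small-sample correction applied to the DM statistic.
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return dm * correction

# Hypothetical one-step-ahead forecast errors of two competing models.
print(hln_statistic([0.1, -0.2, 0.1, 0.3, -0.1], [0.5, -0.4, 0.6, 0.2, -0.7]))
```

For one-step-ahead forecasts (h = 1) the correction reduces to the square root of (n - 1)/n, so it matters most in short evaluation samples such as the seventeen quarters used here.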
In Table 3, we can observe that in most countries, the lowest forecast errors are obtained using the evolved indicators contained in Table 1 and Table 2, although the difference in accuracy is not always statistically significant. For the industry, we obtained significantly lower forecast errors for Bulgaria, Germany, the Netherlands, Poland and Sweden, whereas for consumers, we obtained significantly lower errors for Belgium, Germany, Hungary, Italy, the Netherlands, Poland, Sweden, and the UK. We also observed notable differences in accuracy between firms and households in countries such as the Czech Republic and Greece.
The EC weights the confidence indicators obtained for each of the surveys to calculate an aggregate index, the economic sentiment indicator (ESI). In [51,52], the authors showed that letting the aggregation weights of each component of the ESI be data-driven improves its forecasting performance. Hence, we next combined the estimations obtained from the evolved industry and consumer confidence indicators by means of constrained optimization. We used a generalized reduced gradient non-linear algorithm to minimize the summation of squared forecast errors and imposed the following two restrictions: (a) the sum of both weights must equal one, and (b) the weights must be equal to or larger than zero. The resulting weights are annexed in Appendix B (Table A2).
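With only two forecasts and these two restrictions, the problem reduces to choosing a single weight w between zero and one. The sketch below does not use a generalized reduced gradient solver. It relies instead on the closed-form minimizer of the quadratic objective, clipped to the unit interval, which yields the same constrained optimum in this two-forecast case. The function name optimal_weight and the data are hypothetical.

```python
def optimal_weight(forecast_a, forecast_b, actual):
    """Weight w in [0, 1] on forecast_a that minimises the sum of squared
    errors of the combination w * forecast_a + (1 - w) * forecast_b."""
    ea = [y - f for f, y in zip(forecast_a, actual)]
    eb = [y - f for f, y in zip(forecast_b, actual)]
    denom = sum((a - b) ** 2 for a, b in zip(ea, eb))
    if denom == 0.0:
        return 0.5  # identical forecasts: every weight gives the same error
    # Unconstrained minimiser of sum((eb + w * (ea - eb)) ** 2).
    w = sum(b * (b - a) for a, b in zip(ea, eb)) / denom
    # The objective is convex in w, so clipping gives the constrained optimum.
    return min(1.0, max(0.0, w))

# Hypothetical industry and consumer based estimates of GDP growth (in %).
industry = [1.9, 2.2, 1.4, 1.0]
consumer = [2.3, 1.7, 1.1, 1.4]
outturn = [2.0, 2.0, 1.2, 1.1]
w = optimal_weight(industry, consumer, outturn)
print(w, 1.0 - w)
```

The second weight follows from the first because the two must sum to one.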
We applied the computed relative weights to combine firms’ and consumers’ expectations obtained from the evolved confidence indicators (Exp.Agg) and the scaled confidence indicators (Cof.Agg). We additionally computed Av.Cof.Agg as the average between the expectations obtained from the scaled confidence indicators. Results of the forecasting comparison are presented in the last five columns of Table 3.
Again, we can observe that in most cases, the lowest forecast errors are obtained with the aggregated expectations from the proposed confidence indicators (Exp.Agg), although the difference in accuracy is only statistically significant in seven countries (Belgium, Germany, Hungary, the Netherlands, Poland, Sweden, and the UK). We also found that data-driven weights improved the forecasting performance of the scaled confidence indicators.
This forecasting exercise addresses the question about the information content of business and consumer survey expectations, and whether more sophisticated aggregation schemes based on machine learning could provide composite indicators that can better track economic activity. Our findings are in line with recent research by [53], who found that the use of optimized news-based sentiment values yielded accuracy gains for forecasting US industrial production. For Switzerland and Germany, [54] obtained improvements in accuracy of one-step-ahead GDP forecasts by augmenting benchmark autoregressive models with variations in the recession-word index. Similarly, [55] found that accounting for consumer and business sentiments led to the improved forecast accuracy of consumption in Indonesia.
There is ample evidence that survey expectations are useful for predicting economic variables [56,57,58,59,60,61]. In this sense, the obtained results are consistent with recent research regarding the predictive content of survey expectations. In [62], the authors showed the usefulness of diffusion indexes from the Markit survey in nowcasting and forecasting GDP in emerging markets by means of machine-learning and dimensionality-reduction techniques. Using qualitative survey responses from the ifo’s World Economic Survey (WES), in [63], it was found that the respondents provided statistically significant directional forecasts. In [64], the authors used survey data from South Africa to investigate the accuracy of directional and point forecasts of investment, and found that for shorter horizons, survey forecasts enhanced by time-series data significantly improved point forecasting accuracy.
5. Iterative Forecasting Experiment
To further explore the potential of the proposed approach for short-term economic forecasting, we designed an iterative out-of-sample forecasting experiment in which we re-ran the evolutionary process for each period of the out-of-sample subset using a rolling estimation window. We compared the obtained results with autoregressive integrated moving average (ARIMA) forecasts used as a benchmark. The selected models are displayed in Table A3 in Appendix C.
In order to determine the number of lags that should be included in the model, we selected the model with the lowest value of the Akaike information criterion (AIC), considering models with a minimum of 1 lag and a maximum of 4, including all the intermediate lags. In Table 4, we present the results of comparing the out-of-sample forecasting performance of the proposed approach to rolling ARIMA forecasts used as a benchmark for two different forecast horizons (h).
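The mechanics of a rolling-window evaluation can be sketched as follows. To keep the example short, the benchmark is a first-order autoregression fitted by least squares on each window, which is a stand-in and not the AIC-selected ARIMA specification used in the study. The function names, the window length and the toy series are assumptions.

```python
def fit_ar1(series):
    """OLS estimates (intercept, slope) of y_t = c + phi * y_{t-1} + e_t."""
    x, y = series[:-1], series[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return mean_y - phi * mean_x, phi

def rolling_forecasts(series, window, h=1):
    """h-step-ahead forecasts, re-estimating the model on each rolling window."""
    forecasts, actuals = [], []
    for end in range(window, len(series) - h + 1):
        c, phi = fit_ar1(series[end - window:end])
        pred = series[end - 1]          # last observation inside the window
        for _ in range(h):              # iterate the AR(1) recursion h steps ahead
            pred = c + phi * pred
        forecasts.append(pred)
        actuals.append(series[end + h - 1])
    return forecasts, actuals

# Toy series that follows y_t = 1 + 0.5 * y_{t-1} exactly.
series = [0.0]
for _ in range(12):
    series.append(1.0 + 0.5 * series[-1])
forecasts, actuals = rolling_forecasts(series, window=6, h=1)
print(list(zip(forecasts, actuals)))
```

In the experiment described above, the same loop would re-run the evolutionary process on each window instead of refitting an autoregression, and the resulting forecast errors would feed the accuracy comparison.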
We find that in all countries except Bulgaria, iterative sentiment indicators (Evo.Exp) produce lower RMSFE values than ARIMA models, regardless of the forecast horizon. This gain in forecast accuracy is significant in ten of the countries for one-quarter-ahead predictions (h = 1), and in nine economies for four-quarter-ahead forecasts (h = 4). Consequently, the iterative approach allows us to refine the accuracy of the estimations obtained in the nowcasting exercise (Table 3). Compared to ARIMA predictions, the relative improvement of the proposed methodology increases along with the predictive horizon. Evidence of this is that the RMSFE obtained for one- and four-quarter-ahead predictions is practically identical in most countries. The explanation lies fundamentally in the fact that the generated indicators tend to show stable behaviour over long periods.
These results show the predictive potential of the proposed procedure, and provide evidence regarding the ability of GP to solve optimization problems related to economic modelling and forecasting. In this sense, our study connects with previous research by [30], who incorporated GP in a vector error correction framework and obtained better forecasts of US imports than with ARIMA models. Using information from the ifo’s WES, in [65], the authors implemented GP to construct a leading indicator and a coincident indicator, obtaining more accurate forecasts with the latter. Similarly, in [66], the authors applied GP to develop a set of empirical models to forecast GDP, investment and loan rates in Poland, and found that the proposed approach outperformed artificial neural network models. Focusing on the EA, in [28], the usefulness of genetic algorithms to forecast quarterly GDP growth and monthly inflation was empirically demonstrated. Previous applications of evolutionary computing in finance have also shown the potential of GP for the prediction of exchange rates [7,8,39], and for stock price forecasting [35,36,37,38].
6. Conclusions
Economic sentiment indicators are key for monitoring the current state of the economy and providing forward-looking information regarding imminent economic developments. In this paper, we propose a machine-learning method for sentiment indicator construction. The proposed approach allows us to find optimal combinations of a wide range of qualitative survey expectations that minimize the loss function and generate quantitative estimates of economic growth. By means of genetic algorithms, we obtained country-specific industry and consumer confidence indicators that allow for the monitoring of the dynamics of economic activity in nineteen European countries and the EA.
First, when examining the obtained mathematical expressions, we observed that firms’ production expectations for the months ahead and consumers’ assessments about the general economic situation over the previous months are, respectively, the survey variables that most frequently appear in the evolved indicators, both lagged and contemporaneous. We also found that all questions of the consumer survey appeared in the indicators, while in the case of the industry survey, the distribution between variables is less uniform, with the two questions related to production being the most frequent. These results can be very useful when using data from business and consumer surveys for economic analysis.
Second, we assessed the forecasting performance of the proposed indicators. On the one hand, we compared them to the confidence indicators constructed by the EC in a nowcasting exercise and found that the evolved expressions outperformed the scaled confidence indicators in most cases. On the other hand, we designed a recursive out-of-sample forecasting experiment in which we iteratively re-computed the indicators to track economic growth. We found that the proposed approach significantly outperformed univariate time-series models in terms of accuracy.
The obtained results provide evidence regarding the ability of genetic programming to solve optimization problems related to economic modelling, and show the potential of the methodology as a predictive tool. Furthermore, the proposed indicators are easy to implement and help to monitor the evolution of the economy, from both the demand and the supply sides. This set of country-specific indicators can also be used to transform the qualitative expectations of firms and consumers into advanced estimates of national GDP growth, without making any assumptions regarding economic agents’ behaviours.
We want to note that due to the empirical nature of the proposed approach, the evolved expressions lack any theoretical background. In this sense, an issue left for further research is the introduction of restrictions in the design of the experiments with the objective of generating expressions that admit an economic interpretation. Another limitation of the proposed approach is that, as opposed to standard regression, the significance of the parameters obtained in symbolic regression cannot be assessed. Additionally, the evaluation of the stability of the evolved indicators through Monte Carlo simulations also remains to be explored. Other aspects left for further research are the implementation of the analysis using mixed-data sampling, as well as the extension of the analysis to other economic tendency surveys, such as the construction and retail trade surveys of the Joint Harmonised Programme of Business and Consumer Surveys conducted by the EC or the Consumer Survey of the University of Michigan.