1. Introduction
This investigation explores the predictive effectiveness of the mixed-frequency data sampling (MIDAS) model with error correction and a principal component. Gross domestic product (GDP) is an important indicator of a country's macroeconomic performance, and the prosperity or recession of the national economy is closely related to it. Because macroeconomic policy takes effect with a time lag, its formulation must be forward-looking, which makes the analysis and prediction of GDP particularly essential. GDP is released by the National Bureau of Statistics as quarterly data, while the variables used to predict GDP, such as consumption, investment, trade, the inflation rate and money supply, are mainly monthly. Because of this difference in data frequency, mixed-frequency data must be processed before a traditional model can be established: summing, averaging or an alternative technique must first be applied to convert the monthly data into quarterly data [1,2], after which a same-frequency model is built to predict quarterly GDP. However, these conversion methods may cause a loss or an artificial increase of information, lowering the predictive accuracy for GDP. Given that current predictive methods and their accuracy need to be improved, the MIDAS model and its extended forms may be employed to predict quarterly GDP in a more timely and accurate manner, thus supporting real-time and effective macroeconomic policy formulation.
To overcome the limitations of same-frequency models, scholars have proposed mixed-frequency models that can be built directly on mixed-frequency data without processing the original sequences. Among them, the MIDAS model is widely used, and existing studies have expanded its modelling theory and fields of application. Ghysels et al. [3,4] developed the MIDAS model from the modelling theory of the autoregressive distributed lag (ARDL) model; it constructs a mixed-frequency model by assigning weights to the high-frequency explanatory variables, and the authors put forward the corresponding weight functions (e.g., the Almon, beta and step weight functions). Although existing studies have explored the MIDAS model extensively, research on the MIDAS model with error correction is insufficient, and few scholars have considered it from the perspective of the principal component. Hence, the contributions of this investigation are primarily threefold. Firstly, we consider the strong correlation among the consumption, investment and trade sequences selected in this paper, which was ignored in previous studies. To eliminate the estimation bias caused by collinearity, we use principal component analysis (PCA) to extract the principal component of these three variables and build the MIDAS model to probe whether a mixed-frequency model based on the principal component is effective. Secondly, the extant research seldom considers how the long-term relationship adjusts short-term dynamics, so some information in the data is lost. Thus, relying on error correction is another novelty of this study: the predictive effectiveness of the principal-component-based mixed-frequency error correction model is verified by constructing the ECM-MIDAS and CoMIDAS models and taking the extracted principal component as the predictor of GDP. Thirdly, through quantitative analysis, we further improve the modelling theory of the mixed-frequency model and provide new ideas for more accurate GDP prediction, which offer relevant insights into how the authorities concerned might better predict GDP and formulate macroeconomic policies.
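To make the weighting idea concrete, the following minimal Python sketch shows how normalised weight functions might collapse several monthly observations into a single regressor. It is an illustration only, not the implementation used in this study: the exponential Almon and beta parameterisations, the function names and the parameters theta1 and theta2 are assumptions introduced here.

```python
import numpy as np


def exp_almon_weights(theta1, theta2, n_lags):
    """Normalised exponential Almon weights for lags 1..n_lags (assumed form)."""
    k = np.arange(1, n_lags + 1)
    raw = np.exp(theta1 * k + theta2 * k**2)
    return raw / raw.sum()


def beta_weights(theta1, theta2, n_lags, eps=1e-6):
    """Normalised beta weights evaluated on a grid inside (0, 1) (assumed form)."""
    x = np.linspace(eps, 1 - eps, n_lags)
    raw = x ** (theta1 - 1) * (1 - x) ** (theta2 - 1)
    return raw / raw.sum()


def midas_aggregate(x_high, weights):
    """Weighted sum of the most recent high-frequency observations.

    x_high is ordered oldest to newest; weights[0] applies to the newest value.
    """
    recent = np.asarray(x_high, dtype=float)[-len(weights):][::-1]
    return float(np.dot(weights, recent))
```

With both parameters set to zero, the exponential Almon weights are equal, so the weighted aggregate reduces to the simple average used in same-frequency modelling; other parameter values let the data decide how quickly the influence of older months decays.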
The investigation is organised as follows: Section 2 reviews the extant literature. The materials and methods are introduced in Section 3. Section 4 and Section 5 present the results and discussion. Finally, the conclusions are drawn in Section 6.
2. Literature Review
As GDP prediction is essential to the development of a country, existing studies have predicted the GDPs of various countries and regions using different methods, which fall into two categories. The first forecasts GDP from its own lag periods, for example with the autoregressive integrated moving average (ARIMA) model [5,6], the grey prediction model and its extended forms [7,8] or the BP neural network model [9]. Because these methods have certain downsides, the second category includes GDP-related variables in the prediction, mainly employing same-frequency data [10,11]. In order to avoid information loss, Ghysels et al. [3,4] proposed the MIDAS model, which was initially designed to analyse and predict stock market volatility based on mixed-frequency data. Since then, many researchers have used this model to analyse the stock market [12,13]. Although the MIDAS model is effective in analysing stock market volatility, since Engle et al. [14] proposed the generalised autoregressive conditional heteroscedasticity MIDAS (GARCH-MIDAS) model and Colacito et al. [15] developed the dynamic conditional correlation MIDAS (DCC-MIDAS) model, researchers have been more inclined to use these two methods to analyse the stock market [16,17,18,19,20,21,22].
Then, Clements and Galvao [23] proposed the MIDAS model with autoregressive terms (MIDAS-AR), which accounts for the autocorrelation in sequences such as GDP. They provided evidence that adding autoregressive terms to the MIDAS model makes it more effective at predicting quarterly GDP growth in the U.S. based on monthly indicators. Since then, scholars have made more use of the MIDAS model to predict the GDPs of various countries. Through in-sample and out-of-sample empirical analyses, Hogrefe [24] showed that a model built on mixed-frequency data could improve forecasts of GDP revisions. According to the modelling theory of the MIDAS model, Andreou et al. [25] offered evidence that the predictive effectiveness for quarterly GDP is improved after the addition of daily financial indicators. Aprigliano et al. [26] predicted the quarterly GDP growth rate of the eurozone based on monthly and daily indicators and suggested that combining unconstrained MIDAS (U-MIDAS) models, with larger weights given to models with a smaller mean square error, produces higher predictive accuracy. Fu et al. [27] suggested that the MIDAS model has a smaller root mean squared error (RMSE) than the VAR system in short-term forecasting and thus provides more stable real-time predictions and short-term forecasts of quarterly GDP growth rates in China. Mishra et al. [28] found that the RMSE was low both in-sample and out-of-sample at the one- and four-quarter horizons, while it increased at the ten-quarter horizon. In addition, Chikamatsu et al. [29], Pan et al. [30], Chernis et al. [31], Xu et al. [32], Jiang et al. [33], Pettenuzzo et al. [34], Barsoum and Stankiewicz [35] and Degiannakis [36] also ascertained the effectiveness of the MIDAS model in forecasting from different perspectives.
However, in constructing the above mixed-frequency models, a growth rate sequence or first-order difference series is adopted to ensure stationarity and avoid the “spurious regression” problem, which means that part of the important information is lost. In order to overcome this problem, Miller [37] introduced the idea of co-integration into the MIDAS model and, accordingly, constructed the co-integration MIDAS (CoMIDAS) model for predicting the variables of real global economic activities. Gotz et al. [38], meanwhile, added an error correction term to the MIDAS model and constructed the error correction MIDAS (ECM-MIDAS) model for forecasting the monthly inflation rate of the U.S. Their results showed that the ECM-MIDAS and CoMIDAS models had improved predictive effectiveness.
5. Discussion
Previous studies confirmed the MIDAS model’s effectiveness in predicting GDP [16,17,18,19,20,21,22,23,24,25,26,27,28], and this analysis produced further evidence of that conclusion by considering consumption, investment and trade. In Table 3, it can be seen that regardless of whether consumption, investment or trade was the explanatory variable, the RMSE of the mixed-frequency model was smaller than that of the same-frequency one. The main reason is that the MIDAS model makes full use of the information in high-frequency data and avoids the information loss that occurs when monthly data are averaged into quarterly sequences, which is conducive to improving prediction accuracy. In addition, the same- and mixed-frequency models based on investment had the highest predictive accuracy, followed by those based on trade and consumption. Furthermore, among the five weight functions, the beta class and Almon weight functions had better predictive abilities, while the step weight function had poorer predictive effectiveness. Thus, the beta class and Almon weight functions can be given priority when constructing a mixed-frequency model. However, research on the MIDAS model with error correction and a principal component is insufficient, and this paper only begins to fill the gap.
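The comparison in this paragraph rests on two simple computations, sketched below in Python under stated assumptions: the RMSE of a forecast, and the averaging of monthly observations into quarterly values. The function names are hypothetical, and the sketch does not reproduce the data or results in Table 3.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def monthly_to_quarterly_mean(monthly):
    """Average each consecutive block of three months into one quarterly value."""
    monthly = np.asarray(monthly, dtype=float)
    if monthly.size % 3 != 0:
        raise ValueError("expected a whole number of quarters")
    return monthly.reshape(-1, 3).mean(axis=1)
```

As a made-up illustration, the monthly paths 1, 2, 3 and 3, 2, 1 both average to 2, so a same-frequency model cannot distinguish a rising quarter from a falling one, whereas a model that weights the individual months can.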
On the one hand, we probed the predictive effectiveness of the principal-component-based MIDAS model. From Table 6, we can confirm that the accuracy of mixed-frequency prediction based on the principal component was better than that of the same-frequency prediction, highlighting that the MIDAS model is more effective in forecasting than the ARDL model. More importantly, the same- and mixed-frequency models based on the principal component had better predictive effectiveness than those based on consumption, investment or trade alone, mainly because the extracted principal component contained information on all three variables. Thus, combining the PCA and MIDAS models can enhance predictive accuracy. Additionally, we could further ascertain that the beta weight function had the highest predictive effectiveness.
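The extraction of a single principal component from several correlated predictors can be sketched as follows. This is a minimal Python illustration, not the procedure used in this study: standardising each variable before the decomposition and the function name are assumptions made here.

```python
import numpy as np


def first_principal_component(data):
    """Return the first principal component scores and its explained-variance share.

    data has shape (n_observations, n_variables); each column is standardised
    first (an assumption of this sketch).
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Singular value decomposition of the standardised data matrix.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = z @ vt[0]  # projection onto the first loading vector
    return scores, float(explained[0])
```

The sign of the scores is arbitrary, so only their co-movement with GDP matters. When the input series are strongly correlated, the first component captures most of their joint variance, which is why one regressor can stand in for all three variables.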
On the other hand, we explored the predictive effectiveness of the principal-component-based mixed-frequency error correction model. From Table 8, we can determine that the mixed-frequency model had better predictive accuracy than the same-frequency model both without an error correction term and after one was added, and the prediction based on the beta weight function still had the best effectiveness. Furthermore, the predictive error was reduced when the error correction term was added to the mixed- and same-frequency models. The primary cause is that some vital information in the original data may be lost if the first-order difference is used to avoid “spurious regression” and the error correction term is missing, resulting in unsatisfactory predictive effectiveness. Adding the error correction term, by contrast, allows for adjustment according to the long-term relationship, making the prediction more effective. Moreover, the predictive accuracy of the ECM-MIDAS model was slightly better than that of the CoMIDAS model in this case, mainly because the fitting effect when constructing error correction terms based on data of the same period was better than that of different periods. However, the actual selection between these two mixed-frequency error correction models should be analysed in detail in the future.
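The role of the error correction term can be illustrated with a simplified, same-frequency two-step estimation. The Python sketch below is an assumption-laden illustration rather than the ECM-MIDAS or CoMIDAS specification of this study: it uses ordinary least squares, a single predictor, and hypothetical function names.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_step_ecm(y, x):
    """Two-step error correction estimation for same-frequency series.

    Step 1 fits the long-run relation in levels; step 2 regresses the change
    in y on the change in x and the lagged residual from step 1.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step 1: y_t = a + b * x_t + u_t
    X_long = np.column_stack([np.ones_like(x), x])
    long_run = ols(X_long, y)
    ect = y - X_long @ long_run  # deviation from the long-run relation
    # Step 2: dy_t = c + d * dx_t + g * ect_{t-1} + e_t
    dy, dx = np.diff(y), np.diff(x)
    X_short = np.column_stack([np.ones_like(dx), dx, ect[:-1]])
    short_run = ols(X_short, dy)
    return long_run, short_run
```

A negative coefficient on the lagged residual (the last element of short_run) indicates that deviations from the long-run relation are corrected in later periods, which is the information a pure first-difference model discards.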
6. Conclusions
In this analysis, we selected consumption, investment and trade in order to construct MIDAS models to predict quarterly GDP in China. Furthermore, we utilised PCA to extract the principal component, and built a principal-component-based mixed-frequency error correction model, following which we probed the predictive effectiveness. Based on the RMSE, which measures the predictive accuracy, the effectiveness of different models in forecasting could be compared, and the following conclusions are drawn:
Firstly, the predictive accuracy of the mixed-frequency model is better than that of the same-frequency model. This conclusion can be observed in the MIDAS models based on consumption, investment, trade and principal component, and the ECM-MIDAS and CoMIDAS models. Thus, making predictions based on the mixed-frequency model is effective. In doing so, relevant policymakers could combine mixed-frequency data to predict quarterly GDP in China, supporting the real-time and accurate formulation of macroeconomic policies.
Secondly, consumption, investment and trade have different forecasting effects on GDP. The same- and mixed-frequency models show that investment has the best predictive effectiveness for GDP, followed by trade and consumption. In addition, the predicted value based on consumption is higher than the real value, while those based on investment and trade are moderate, meaning that consumption plays a greater role in boosting GDP than investment and trade in China. Hence, China should not only implement relevant policies to stimulate investment and foreign trade but also give full play to the potential of consumption to boost GDP growth, which is beneficial for promoting economic development.
Thirdly, PCA is effective in mixed-frequency prediction. When applying the PCA technique here, the principal component that could reflect 96.93% of the information on the three variables was extracted to build the MIDAS model. A comparison shows that the predictive accuracy of the principal-component-based MIDAS model was significantly better than that of the MIDAS models based on consumption, investment and trade individually. This is because the principal-component-based MIDAS model not only makes full use of multiple variables but also overcomes problems such as inaccurate predictions and excessive parameters caused by multivariable collinearity.
Fourthly, combining the ECM and MIDAS models is effective in forecasting GDP. By constructing same- and mixed-frequency error correction models, it was found that adding the error correction term improved the predictive accuracy. In this case, the predictive effectiveness of the ECM-MIDAS model was better than that of the CoMIDAS model. Although the choice between these two mixed-frequency error correction models still needs to be analysed in detail, we can conclude that constructing the principal-component-based mixed-frequency error correction model is appropriate, and its prediction is effective.
Fifthly, the beta weight function has better predictive effectiveness. The beta weight function generally has the smallest predictive error and significant parameters; thus, in this study, we could directly select this weight function to simplify the analysis process when performing the mixed-frequency prediction.
In the future, research should focus on the following aspects: First, the construction of mixed-frequency error correction models should not be limited to same-frequency co-integration; instead, the theory and method of mixed-frequency co-integration need to be further explored. Second, consumption, investment and trade are not the only data used to predict GDP. Other predictors (e.g., money supply and inflation) should also be taken into account, and we would advise studying which predictor or combination possesses the most powerful predictive effect. Third, according to Ang et al. [49] and Evgenidis et al. [50], researchers should also consider the yield curve in order to predict GDP.