1. Introduction
The stock market, or equity market, consists of numerous stock exchanges across the globe. The general public and investors buy and sell shares, whose prices fluctuate constantly according to the law of supply and demand. A stock or share represents partial ownership of a company or corporation. Buyers attempt to purchase a share at the lowest feasible price, while sellers attempt to sell it at the highest price [1]. The stock market is one of the most significant venues for raising capital, alongside debt markets, which are more intimidating to ordinary investors and not publicly traded. Because the stock market is highly liquid, investors can quickly and easily buy and sell securities. A rising stock market and widespread participation in it are two key indicators of an improving economy.
Stock market fluctuations can have a considerable influence on individuals as well as on the whole economy. A dramatic drop in stock prices can be extremely destabilizing for economic activity. For example, the 1929 stock market crash is often cited as a primary cause of the Great Depression of the 1930s [2]. When stock prices are high, many companies are likely to launch an initial public offering (IPO) to raise capital by selling ownership stakes in their businesses. Mergers and acquisitions also become more common during a bull market. The resulting increase in investment accelerates economic development [1].
What if investors could predict when the price of a stock would rise or fall? They would invest all their funds in that company to maximize their profits. Although perfect foresight is impossible, unknown parameters can be estimated and forecasts produced from historical and current data on specific shares. Such analysis is known as technical analysis, and it is increasingly carried out with machine learning (ML). ML models have proven effective in a variety of financial tasks, including portfolio management [3] and bankruptcy forecasting [4].
ML is a subfield of artificial intelligence (AI) concerned with developing and testing algorithms that learn from data. Automation is taking over many industries; in online trading, computers use mathematical models to make rapid decisions [5]. This produces markets in which the long-term outlook is displaced by short-term fluctuations and sell-offs. The algorithms most often used to predict and analyze stock market movements are support vector machines (SVM) and artificial neural networks (ANN), and accuracies of up to 99.9% have been reported for such systems using tick data. Financial forecasting is difficult because financial data are data-intensive, non-stationary, noisy, and unstructured, with hidden relationships [6].
Ref. [7] used neural networks to predict US stock prices and showed that they outperform conventional models such as generalized linear models, principal component regressions, and regression trees. Ref. [8] used long short-term memory (LSTM) networks to predict stock trends linked to investor sentiment and reported large profits. Ref. [9] used neural networks to predict bond excess returns and reported large economic gains. Neural networks have also been applied to cryptocurrencies, and these studies report that the approach predicts future price changes more accurately [10,11]. Fathali et al. [12] used several neural network techniques, including recurrent neural networks (RNNs), LSTM, and convolutional neural networks (CNNs), to anticipate stock market price movements; after numerous experiments with different inputs and epochs, they found LSTM to be the best model. Ref. [13] used random forests to examine how investor confidence, together with a large number of financial and macroeconomic variables, affects US monthly aggregate realized stock market volatility. They found that investor confidence, and specifically the uncertainty of investor confidence, predicts overall realized volatility and its “good” and “bad” variants out-of-sample. Ref. [14] introduced an investor attention index built from proxies found in the existing literature and showed that it effectively forecasts the stock market risk premium both in-sample and out-of-sample, whereas the individual proxies have limited predictive ability on their own. Ref. [15] showed that the Markov-switching multifractal (MSM) model achieves higher predictive accuracy than the dynamic conditional correlation generalized autoregressive conditional heteroscedasticity (DCC-GARCH) model. Ref. [16] predicted three stock market indexes of SAARC countries using the ARIMA model and novel machine learning techniques, including multilayer perceptrons and recurrent neural networks, and showed that hybrid models are a viable choice for forecasting financial time series. Ref. [17] demonstrated that combining ARIMA and ANN models yields better predictive performance than either model alone. To predict stock market movements, Ref. [18] compared a variety of ML algorithms with standard time series models and found that LSTM predicts stock market data accurately. To address the challenge of predicting stock closing prices, Ref. [19] proposed a Deep Convolutional Generative Adversarial Network (DCGAN) architecture that outperforms existing tools in both single-step and multi-step forecasting, suggesting that deep learning, and GANs in particular, is a promising tool for financial time series forecasting.
Ref. [20] compared the volatility forecasting performance of two hybrid ANN models with that of GARCH-type models. The results reveal notable leverage effects in the Chinese energy market, and the EGARCH-ANN model outperforms the other models in predicting the volatility of log-return series.
Ref. [21] developed a novel parallel hybrid model that provides a comprehensive framework for simulating all pure and mixed linear and/or nonlinear patterns found in real-world time series. The proposed model performs better than the individual ARIMA, MLPNN, RBFNN, and LSTM models, the series hybrids ARIMA-MLPNN and MLPNN-ARIMA, and the parallel hybridization of ARIMA and MLP models.
Ref. [22] studied numerous time series forecasting techniques that employ linear and nonlinear models, alone or in combination, and found that integrating linear and nonlinear models can improve forecasting accuracy. Nevertheless, in some circumstances, the performance of existing methods may be limited by the specific assumptions they make. The authors therefore proposed a new hybrid ARIMA-ANN model that operates within a more general framework and showed that combining their hybrid approach, or any of the other approaches they employed, with empirical mode decomposition (EMD) can improve on the forecasting accuracy attained by conventional hybrid methods.
In economics and finance, there is a pressing need to make forecasts as accurate as possible. Implementing sound macroeconomic policies requires empirical analysis and strategic planning based on projections of key macroeconomic indicators. A range of univariate and multivariate methods has therefore been devised to manage noise in the data and improve forecast accuracy. However, real-world phenomena rarely follow purely linear or purely nonlinear patterns, so linear and nonlinear models used alone often fail to represent the underlying structure of the data. This study combines the two into a hybrid ARIMA-ANN model that incorporates both the linear and the nonlinear components of a series and thereby improves predictive accuracy relative to the individual linear (ARIMA) and nonlinear (ANN) models.
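One common way to write such a hybrid, shown here only as a sketch because the exact specification belongs to the methodology in Section 2, is an additive decomposition in which ARIMA models the linear part and the ANN is trained on the ARIMA residuals. The symbols L_t, N_t, e_t, f, and n are our notation, not the paper's:

```latex
\begin{aligned}
y_t &= L_t + N_t, \\
e_t &= y_t - \hat{L}_t \quad \text{(residual of the fitted ARIMA model)}, \\
\hat{N}_t &= f\left(e_{t-1}, e_{t-2}, \ldots, e_{t-n}\right) \quad \text{(ANN fed } n \text{ lagged residuals)}, \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t .
\end{aligned}
```

Under this formulation, the network is asked to learn only the structure that the linear model leaves in its residuals; if those residuals are already white noise, the nonlinear term should stay close to zero.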
Our research aims to fill a significant gap in the existing literature by applying hybrid models to the stock market indices of the G7 countries. These nations, namely the United States, Canada, Japan, Germany, France, the United Kingdom, and Italy, are among the world’s largest and most influential economies. Despite their critical role in the global financial landscape, few studies have applied hybrid models to stock market indices in this group of countries.
The central objective of our research is to improve prediction accuracy by combining a linear model (ARIMA) with a nonlinear model (ANN). Our study therefore analyzes the historical closing prices of key stock indices from three G7 countries: the Nasdaq in the United States, the Nikkei in Japan, and the CAC 40 in France. Using this sample, we evaluate and compare the predictive capabilities of the standalone linear and nonlinear models against the hybrid ARIMA-ANN model.
In the specific context of the G7 countries, numerous prior studies have employed forecasting techniques such as AR, ARIMA, ANN, and VAR, among others, but hybrid models have rarely been used for this purpose. As discussed above, hybrid models are well suited to forecasting because they can capture both linear and nonlinear patterns in the data, which leads to more accurate forecasts. The primary objective of our research is therefore to assess the efficacy of the hybrid ARIMA-ANN model relative to the individual ARIMA and ANN models, using a dataset of stock market indices.
The remainder of the paper is organized as follows. Section 2 discusses the data and the procedures, Section 3 presents the empirical findings, and Section 4 concludes.
3. Empirical Results
This section provides a thorough analysis and graphical representation of the three stock markets.
3.1. Nasdaq USA Stock Market
In Figure 2a, the original series increases over time, which indicates that it is non-stationary: its statistical characteristics change over time. To dampen the growth in scale and smooth the data, we first take the natural logarithm of the series and then take the first difference to achieve stationarity. Figure 2b plots the transformed series, which appears difference stationary. In Figure 3a, the ACF of the original series decays slowly, which is a further indication of a unit root. After the transformation, the ACF declines very quickly (Figure 3c), consistent with a difference-stationary series, so we proceed with the transformed series. The patterns in the ACF and PACF plots of this series then indicate the moving-average order q and the autoregressive order p, respectively.
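The log-difference transformation and the ACF diagnostic described above can be sketched in plain Python. This is an illustrative sketch, not the authors' code; the function names are hypothetical, and the plus-or-minus 1.96/sqrt(n) band is the usual large-sample approximation drawn on ACF plots.

```python
import math


def log_diff(prices):
    """First difference of the natural log: d_t = ln(P_t) - ln(P_{t-1})."""
    if any(p <= 0 for p in prices):
        raise ValueError("log transform requires strictly positive prices")
    logs = [math.log(p) for p in prices]
    return [cur - prev for prev, cur in zip(logs, logs[1:])]


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag, normalized by the lag-0 sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def acf_band(n, z=1.96):
    """Approximate 95% significance band (+/- z / sqrt(n)) drawn on ACF plots."""
    return z / math.sqrt(n)


# Usage: a steadily rising series has a slowly decaying ACF, a unit-root symptom.
trend = [float(t) for t in range(1, 101)]
trend_acf = sample_acf(trend, 5)
returns = log_diff([100.0 * 1.01 ** t for t in range(20)])
```

For the linear trend 1, 2, ..., 100 the lag-1 autocorrelation is 0.97, the kind of slow decay visible in Figure 3a. The biased (lag-0) normalization is the one conventionally used for ACF plots.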
The randomness of the residuals of the estimated model can be checked in several ways; Figure 4 combines a graphical and a statistical approach. The ACF of the residuals reveals no serious autocorrelation. The bottom panel shows p-values of the Ljung–Box statistic for each lag up to 10; each test considers the accumulated residual autocorrelation from lag 1 up to that lag. The dashed blue line marks the 5 percent significance level, and all p-values (shown as circles) lie above it. We therefore cannot reject the hypothesis that the residuals are white noise, and the model is considered suitable for prediction.
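The Ljung–Box statistic behind the bottom panel of Figure 4 is Q(h) = n(n + 2) times the sum over k = 1..h of r_k^2 / (n - k), compared with a chi-square distribution. The standard-library sketch below computes it for each lag; the chi-square tail probability is evaluated with a series expansion of the incomplete gamma function. This section does not say whether the degrees of freedom were reduced by the number of ARMA parameters, so the `fitdf` argument is an assumption that defaults to no adjustment.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0 for k in range(1, max_lag + 1)]


def chi2_sf(q, df):
    """P(X > q) for X ~ chi-square(df), via the series for the regularized
    lower incomplete gamma function P(df/2, q/2)."""
    if q <= 0:
        return 1.0
    s, x = df / 2.0, q / 2.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= x / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(x) - x - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def ljung_box(residuals, max_lag=10, fitdf=0):
    """Return (lag, Q, p_value) for every lag 1..max_lag.

    Degrees of freedom are lag - fitdf; fitdf is often set to p + q for
    ARIMA residuals (an assumption here, default 0).
    """
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    results, acc = [], 0.0
    for h in range(1, max_lag + 1):
        acc += r[h - 1] ** 2 / (n - h)
        q = n * (n + 2) * acc
        df = h - fitdf
        p = chi2_sf(q, df) if df > 0 else float("nan")
        results.append((h, q, p))
    return results
```

A residual series passes the check in the sense used above when every returned p-value exceeds 0.05; a strongly autocorrelated series, such as one that alternates in sign, fails it already at lag 1.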
After ARIMA modeling, we use a second forecasting approach, the ANN. ANNs are among the best-known machine learning techniques for forecasting, and this study adopts them to capture the complex behavior of the Nasdaq US stock market and thereby obtain a better forecast. The configuration of the ANN is described in detail in Section 2.2. To fit the ANN, we determine the optimal number of hidden layers iteratively by trial and error: we start with a single hidden layer and add layers one at a time until the most accurate result is obtained. The minimum test error was attained with three hidden layers and five input layers.
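The trial-and-error search can be sketched as a loop over candidate configurations that keeps the one with the lowest test RMSE. In this sketch we read the number of "input layers" as the number of lagged values fed to the network, which is an assumption, and `fit_predict` is a hypothetical hook where any ANN implementation would be plugged in; no particular library is implied.

```python
import math
from itertools import product


def make_lagged(series, n_lags):
    """Pair each observation with the n_lags values that precede it."""
    inputs = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    targets = series[n_lags:]
    return inputs, targets


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def select_config(series, fit_predict, lag_grid, hidden_grid, test_size):
    """Return (test_rmse, n_lags, n_hidden_layers) for the best candidate.

    fit_predict(train_inputs, train_targets, test_inputs, n_hidden_layers)
    -> list of predictions is a hypothetical hook for the ANN being tuned.
    """
    best = None
    for n_lags, n_hidden_layers in product(lag_grid, hidden_grid):
        inputs, targets = make_lagged(series, n_lags)
        train_x, train_y = inputs[:-test_size], targets[:-test_size]
        test_x, test_y = inputs[-test_size:], targets[-test_size:]
        error = rmse(test_y, fit_predict(train_x, train_y, test_x, n_hidden_layers))
        # Strict "<" keeps the earliest candidate in grid order when errors tie.
        if best is None or error < best[0]:
            best = (error, n_lags, n_hidden_layers)
    return best
```

Because every configuration is scored on the same final `test_size` observations, the candidates are compared on identical data regardless of how many lags they use.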
The same procedure was used to configure the hybrid model; here, the iterative search selected two hidden layers and four input layers as the configuration with the best results.
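The hybrid forecast itself can be sketched as a linear forecast plus a forecast of the linear model's residuals. To keep the example self-contained, a least-squares AR(1) stands in for the fitted ARIMA model, and `residual_model` is a hypothetical placeholder for the trained ANN; both are assumptions for illustration only.

```python
def fit_ar1(x):
    """Least-squares AR(1) with intercept, x_t = c + phi * x_{t-1} + e_t.

    A deliberately simple stand-in for the ARIMA component.
    """
    prev, cur = x[:-1], x[1:]
    m_prev = sum(prev) / len(prev)
    m_cur = sum(cur) / len(cur)
    cov = sum((a - m_prev) * (b - m_cur) for a, b in zip(prev, cur))
    var = sum((a - m_prev) ** 2 for a in prev)
    phi = cov / var
    return m_cur - phi * m_prev, phi


def hybrid_one_step(x, residual_model, n_lags):
    """One-step hybrid forecast: linear forecast plus a forecast of its residuals.

    residual_model(last_residuals) -> float is a hypothetical hook standing in
    for the trained ANN; it receives the most recent n_lags residuals.
    """
    c, phi = fit_ar1(x)
    fitted = [c + phi * v for v in x[:-1]]
    residuals = [obs - fit for obs, fit in zip(x[1:], fitted)]
    linear_next = c + phi * x[-1]
    nonlinear_next = residual_model(residuals[-n_lags:])
    return linear_next + nonlinear_next
```

If the residual model always returns zero, the hybrid reduces to the linear forecast, which mirrors the idea that the ANN should only add what the linear part misses.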
Figure 5 compares the different time series and machine learning models. The height of each bar shows how far the predicted values deviate from the actual values, so a lower bar indicates a smaller error and therefore a more accurate prediction.
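Error measures of the kind shown as bars in Figure 5, and the RMSE and MAE values mentioned in the conclusions, can be computed as in the sketch below. The model names and forecast lists are placeholders to be filled with actual hold-out forecasts; no results from this study are reproduced.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def error_table(actual, forecasts):
    """Map each model name to (RMSE, MAE), ordered from lowest to highest RMSE."""
    rows = {name: (rmse(actual, f), mae(actual, f)) for name, f in forecasts.items()}
    return dict(sorted(rows.items(), key=lambda item: item[1][0]))
```

RMSE penalizes large misses more heavily than MAE, so reporting both shows whether a model's advantage comes from avoiding occasional large errors or from being uniformly closer.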
A detailed examination of Figure 5 yields several observations. First, the ANN model captures the directional movements of the Nasdaq US stock market well, which implies that it can provide a relatively accurate forecast even when used in isolation. This reflects the ability of neural networks to uncover complex patterns and relationships in financial time series data.
The most notable finding, however, concerns the hybrid ARIMA-ANN model: its forecast errors are markedly lower than those of both the standalone ARIMA and ANN models, indicating higher predictive accuracy for the hybrid approach.
The observed improvement in forecast accuracy achieved with the ARIMA-ANN hybrid model can be attributed to its unique ability to combine the strengths of two distinct forecasting methodologies. The ARIMA component excels in modeling linear trends and capturing seasonality, while the ANN component is adept at handling complex, nonlinear relationships in the data. By integrating these two approaches, the hybrid model leverages their complementary strengths, resulting in a more precise forecast.
3.2. Nikkei Japan Stock Market
Figure 6a shows a clear upward trend in the series, indicating that the underlying series is non-stationary. To dampen the growth in scale and smooth the series, we apply a logarithmic transformation followed by first differencing to achieve stationarity. Figure 6b plots the transformed series, which appears difference stationary. Figure 7a shows a slowly decaying autocorrelation function (ACF), further evidence of a unit root, whereas after the transformation the ACF drops off quickly (Figure 7c), indicating that stationarity has been achieved. The orders q and p are identified from characteristic patterns in the ACF and partial autocorrelation function (PACF) plots, respectively.
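In the usual Box–Jenkins reading, an AR(p) process has a PACF that cuts off after lag p, while an MA(q) process has an ACF that cuts off after lag q. The sketch below computes sample partial autocorrelations with the Durbin–Levinson recursion; it is illustrative and not necessarily how the figures in this paper were produced.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0 for k in range(1, max_lag + 1)]


def durbin_levinson(r, max_lag):
    """Partial autocorrelations phi_kk, k = 1..max_lag.

    r[0] must be 1.0 and r[k] the lag-k autocorrelation.
    """
    pacf, phi_prev = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the order-k AR coefficients from the order-(k-1) ones.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def sample_pacf(x, max_lag):
    """Sample PACF of a series, for reading off a candidate AR order p."""
    return durbin_levinson([1.0] + sample_acf(x, max_lag), max_lag)
```

For example, the theoretical autocorrelations of an AR(1) with coefficient 0.5 (1, 0.5, 0.25, 0.125) give partial autocorrelations 0.5, 0, 0: a cutoff after lag 1, which points to p = 1.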
In Figure 8, the autocorrelations of the residuals of the fitted ARIMA model for lags 1–30 lie within the significance limits, and the Ljung–Box test supports this result. The residuals are therefore consistent with white noise, and the model can be used for forecasting. After the ARIMA forecasts, we applied the ANN algorithm and then the hybrid of both. We fitted the ANN model iteratively, determining the optimal number of hidden layers by trial and error: we started with one hidden layer and added layers one at a time until we reached the most accurate result. Two hidden layers and three input layers yielded the lowest test error.
Figure 9 summarizes the performance of the forecasting models for the Nikkei Japan stock market. As before, the bar heights show the forecast errors, so lower bars indicate smaller errors and greater predictive precision.
A closer examination of Figure 9 shows that the ANN algorithm captures the overall trend of the Nikkei Japan stock market well, so that as a standalone model it provides forecasts that align with actual market movements. This underscores the ability of neural networks to capture intricate patterns in the time series data of the Nikkei index.
The most striking result, however, again concerns the hybrid ARIMA-ANN model, whose forecast errors are notably lower than those of the separate ARIMA and ANN models, reflecting the higher predictive accuracy of the hybrid approach.
The hybrid model's advantage stems from combining the strengths of two different modelling approaches. The ARIMA component captures linear trends and addresses seasonality, while the ANN component deals with the complexity of nonlinear relationships in the data. By integrating the two, the hybrid model exploits their complementary strengths and produces a more precise and reliable forecast.
3.3. France Stock Market (CAC 40 Index)
Figure 10a shows that the original stock market series increases over time, suggesting that it contains a unit root. To dampen the growth in scale and smooth the data, we apply a logarithmic transformation and then difference the series to obtain a stationary series. Figure 10b shows the transformed series, which appears stationary. In Figure 11a, the gradually decaying ACF is further evidence of a unit root, whereas after the transformation the ACF falls sharply (Figure 11c), confirming that the series is first-difference stationary. The orders q and p are associated with specific patterns in the ACF and PACF plots, respectively.
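Because the models are fitted to log-differenced data, forecasts have to be mapped back to price levels if errors are measured in index points. This section does not state on which scale the errors were computed, so the following is only a sketch of the inversion under that assumption; the function name is hypothetical.

```python
import math


def invert_log_diff(last_price, diff_forecasts):
    """Turn forecasts of d_t = ln(P_t) - ln(P_{t-1}) back into price levels.

    last_price is the last observed price P_T; the forecasts are accumulated
    on the log scale and then exponentiated.
    """
    level = math.log(last_price)
    prices = []
    for d in diff_forecasts:
        level += d
        prices.append(math.exp(level))
    return prices
```

Exponentiating a forecast made on the log scale tends to target the median rather than the mean of the future price; the difference is small for short horizons with low volatility but grows with both.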
The residual correlogram and the Ljung–Box test in Figure 12 show no noticeable spike, and the p-values of the Ljung–Box test are above the 5% significance level. We therefore fail to reject the null hypothesis of no residual autocorrelation: the residuals behave like white noise, and the model can be used for prediction. After the ARIMA forecasts, we apply the ANN technique and then the combination of ARIMA and ANN. We fit the ANN iteratively to find the optimal number of hidden layers, starting with a single hidden layer and adding layers one at a time until the most accurate result was obtained. The lowest test error was achieved with three input layers and three hidden layers.
Figure 13 summarizes the performance of the forecasting models for the French stock market; lower bars indicate smaller forecast errors and, by extension, higher predictive accuracy.
A detailed examination of Figure 13 shows that the ANN algorithm captures the underlying trends of the French stock market well, so that as a standalone model it provides forecasts that closely follow actual market behavior. This again underscores the capacity of neural networks to capture subtle patterns in the time series data of the French stock market.
The most notable result, however, concerns the hybrid ARIMA-ANN model: compared with the individual ARIMA and ANN models, it substantially reduces forecast errors, confirming the superior forecasting capability of the hybrid approach.
This improvement in precision follows from the hybrid model's combination of two distinct modeling methodologies. The ARIMA component captures linear trends and addresses seasonality in the data, while the ANN component handles the complexities of nonlinear relationships. By integrating the two, the hybrid model leverages their complementary strengths, yielding a forecast that is both accurate and robust.
3.4. Differences among the Results for the Three Datasets
This study presents a novel approach that combines the ARIMA and ANN models and applies it to three G7 financial markets. The findings show that hybridizing these models improves forecast accuracy. Across markets, the hybrid approach yields notably lower forecast errors for the Nasdaq USA stock market than for the other markets considered, whereas its errors are largest for the Nikkei Japan stock market.
4. Conclusions
Almost all financial decision-makers, such as investors, money managers, hedge funds, and investment banks, need to forecast financial variables such as exchange rates, options, bonds, interest rates, and stock prices in order to make productive decisions. Consequently, the refinement of existing models and the development of new ones remain active areas of research on financial markets. Previous research shows that prediction plays a key role in financial markets but is a difficult task, so financial stakeholders face many difficulties in achieving accurate forecasts. In the forecasting literature, combining multiple models is one of the most popular ways to gain accuracy over individual models, and a number of methods have been proposed to overcome the limitations of the separate approaches and generate more trustworthy results. The most popular of these decomposes a time series into a linear and a nonlinear part; it has been shown theoretically as well as empirically to be more successful than individual models because it can capture both the linear and the nonlinear structure of a time series.
The current study compares the predictive power of the linear-nonlinear hybrid ARIMA-ANN model with that of its components (ARIMA and ANN), using three stock market indices from G7 countries. Empirical results for three popular real datasets, namely the Nasdaq (United States), the Nikkei (Japan), and the CAC 40 (France), show that the hybrid model yields more accurate forecasts than its separate components. This is consistent with the general view that hybrid models can deliver results that are, to some extent, better than those of individual models. Overall, the hybrid ARIMA-ANN outperforms the individual ANN and ARIMA models: for all the stock indices considered, its RMSE and MAE values are markedly lower than those of the individual models.
The scope of this study is limited to univariate analysis, in which forecasting models are built solely on the historical data of the stock market indices under consideration. Numerous external variables, including economic indicators, political events, and global trends, can have a profound impact on market movements, and incorporating such economic and financial information, for example geopolitical events or macroeconomic data, into the forecasting models could enhance their predictive power. Future studies could therefore explore the impact of exogenous variables on model accuracy. A combination of LSTM and ANN models could also be explored for predicting complex stock market data.