1. Introduction
The exploration of stock price prediction models has long been a significant topic. From traditional forecasting methods such as Autoregressive Integrated Moving Average (ARIMA), Generalized Autoregressive Conditional Heteroskedasticity (GARCH), and Vector Autoregressive Moving Average (VARMA) (Atsalakis & Valavanis [1]) to Machine Learning (ML) and, more recently, Artificial Neural Networks (ANNs) (Kao et al. [2]), numerous scholars have continually attempted to improve models for predicting stock prices. In 2019, Shahzad [3] constructed a short-term stock index prediction model using Support Vector Regression (SVR), a nonlinear time series forecasting method. In 2018, Ao and Zhu [4] used Support Vector Machine (SVM) technology to build a model predicting the movements of the CSI 300 Index, demonstrating the effectiveness of SVM in stock index prediction.
Compared to traditional machine learning algorithms, neural network models can effectively address the non-stationary and nonlinear characteristics of stock price data (Zhang and Lou [5]). Liu et al. (2019) [6] used a BP neural network model to predict stock price fluctuations. Kong et al. (2021) [7] employed a GRU neural network model to predict Amazon spot prices, verifying the applicability of neural network models in the financial domain [8].
In particular, LSTM networks are highly favored for their ability to address the vanishing gradient problem common in RNNs. This is attributed to their internal memory cells [9], which store past information and use it together with the latest input to predict sequences, automatically deciding when to delete information from memory (Gers, Schraudolph, & Schmidhuber, 2003 [10]; Hochreiter & Schmidhuber, 1997 [11]), thus maintaining long-term dependencies.
BiLSTM, a variant of LSTM, has been utilized in both classification and regression problems (Peng et al., 2021 [12]). BiLSTM combines a forward and a backward LSTM, gaining additional training signal by traversing the input data both from left to right and from right to left. Several studies (Cheng, Ding, Zhou, & Ding, 2019 [13]; Kulshrestha, Krishnaswamy, & Sharma, 2020 [14]) have demonstrated its effectiveness in handling long-term dependencies, achieved primarily through back-to-back encoding, which allows more hidden information to be extracted from the data.
However, the gating mechanism of LSTM units is not designed in a sufficiently streamlined way, which may limit the efficiency of BiLSTM in processing data. Experimental results indicate that the input gate and output gate mechanisms in LSTM are crucial, and that coupling the forget gate and input gate can achieve performance comparable to standard LSTM while enhancing the data processing capability of the unit. The He-BiLSTM takes advantage of the BiLSTM architecture and a modified neural network unit: it replaces the traditional LSTM units in the BiLSTM framework with the modified units, thereby influencing and optimizing the overall model.
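The gate-coupling idea can be illustrated with a small NumPy sketch (a hypothetical stand-alone cell update for illustration only, not the exact He-BiLSTM unit; all parameter names here are our own): tying the input gate to the forget gate, so that the input weight equals one minus the forget weight, removes one full set of gate parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coupled_gate_step(x_t, h_prev, c_prev, Wf, Uf, bf, Wc, Uc, bc):
    """One cell-state update with coupled forget/input gates: i_t = 1 - f_t.

    Illustrative sketch only; parameter names and shapes are assumptions.
    """
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)       # forget gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)   # candidate state
    # Coupling: the input gate is the complement of the forget gate,
    # so no separate input-gate parameters are needed.
    c_t = f_t * c_prev + (1.0 - f_t) * c_tilde
    return c_t
```

Because the new cell state is a convex combination of the old state and the candidate, the coupled cell keeps the memory magnitude bounded while learning fewer parameters.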
Based on this situation, this paper optimizes the LSTM design and proposes a He-BiLSTM, aiming to improve the data processing efficiency of BiLSTM and enhance the overall model performance.
Based on the aforementioned situation, the research findings of this paper are summarized as follows:
To further enhance the robustness, accuracy, and generalization capability of the BiLSTM model, a novel LSTM unit structure is adopted to reconstruct the BiLSTM. Compared to the existing BiLSTM, this paper reconstructs a novel bidirectional neural network structure (He-BiLSTM) by replacing the traditional neural units in BiLSTM. This gives He-BiLSTM higher data transmission efficiency, enabling the model to better handle the non-stationary and nonlinear characteristics of stock market data. Consequently, the model's ability to handle noise is improved, leading to an overall performance enhancement. This paper also presents the backpropagation algorithm for the He-BiLSTM.
Applying the He-BiLSTM to stock price prediction results in higher accuracy, providing significant reference value for predicting stock prices in financial markets.
2. Related Work
Since the construction of the He-BiLSTM is based on the improved design of LSTM and BiLSTM, this section will first introduce these two aspects.
2.1. LSTM (Long Short-Term Memory) Neural Networks
LSTM, proposed by Hochreiter and Schmidhuber [11] in 1997, is a variant of the RNN. It was introduced because training standard Recurrent Neural Networks (RNNs) on problems with long-term temporal dependencies presents numerous challenges. Recent studies (e.g., Abe and Nakayama, 2018 [15]; Fischer & Krauss, 2018 [16]; Kraus & Feuerriegel, 2017 [17]; Minami, 2018 [18]; Choi, 2018 [19]) have utilized deep learning methods, including LSTM and BiLSTM, to predict financial time series data, achieving notable predictive performance.
The significance of LSTM neural networks lies in the gate mechanisms present in each of their units: the forget gate, input gate, and output gate. These three gates play different roles in processing data information. The forget gate determines which information from the input should be discarded, the input gate decides which data information will be modified, and the output gate determines which data information will be passed to the next neural unit. The specific structure of an LSTM neural unit is shown in
Figure 1:
The relevant gating mechanism equations are as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad (1)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad (2)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad (3)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad (4)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad (5)$$
$$h_t = o_t \odot \tanh(c_t) \quad (6)$$

Among these, Equation (1) is the calculation formula for the forget gate $f_t$, Equation (2) represents the calculation formula for the input gate $i_t$, Equations (3) and (4) represent the calculation formulas for the cell state $c_t$, indicating the long-term memory transmission state of data information, Equation (5) represents the calculation formula for the output gate $o_t$, and Equation (6) represents the hidden state $h_t$ output to the next unit. In Equations (1)–(6), $\sigma$ represents the sigmoid activation function, $\odot$ is the Hadamard product, and $x_t$ represents the input vector of the current unit. $W$ and $U$ represent the weight matrices for the input vector $x_t$ and the previous hidden state $h_{t-1}$, respectively, and $b$ represents the bias vectors.
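As a concrete reference, Equations (1)–(6) can be sketched as a single NumPy forward step (a minimal illustration; packing the weights into dictionaries keyed by gate name is our own convention, not the paper's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell, following Eqs. (1)-(6).

    W, U, b are dicts keyed by gate name ('f', 'i', 'c', 'o');
    shapes and packing are illustrative assumptions.
    """
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])    # (1) forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])    # (2) input gate
    c_hat = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # (3) candidate
    c_t = f_t * c_prev + i_t * c_hat                          # (4) cell state
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])    # (5) output gate
    h_t = o_t * np.tanh(c_t)                                  # (6) hidden state
    return h_t, c_t
```

Since the output gate lies in $(0,1)$ and $\tanh$ in $(-1,1)$, the hidden state is always bounded in magnitude by 1.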
2.2. BiLSTM (Bidirectional LSTM) Network
The core of the BiLSTM architecture lies in utilizing two LSTM layers to process the input sequence in both forward and backward directions, enabling bidirectional information flow. This design allows the model to simultaneously capture historical and future information within the sequence. In non-financial domains, scholars (Fan, Qian, Xie, & Soong, 2014 [20]; Graves, Jaitly, & Mohamed, 2013 [21]; Graves & Schmidhuber, 2005 [22]; Schuster & Paliwal, 1997 [23]; Siami-Namini, Tavakoli, & Namin, 2019 [24]) have found that its performance surpasses that of LSTM. Later, Ren et al. found that the BiLSTM model is better suited than LSTM to nonlinear data such as stock prices.
Figure 2 illustrates the structure of the BiLSTM framework. Since forward propagation proceeds in two directions, we refer to the left-to-right pass as the positive direction and the right-to-left pass as the negative direction; in this way, the reverse direction of forward propagation is distinguished from the concept of backpropagation. Compared to the LSTM, the structure of the BiLSTM is more complex. In an LSTM unit, there are 12 internal parameters involved in backpropagation updates: the weight matrices $W_f, U_f, W_i, U_i, W_c, U_c, W_o, U_o$ and the bias vectors $b_f, b_i, b_c, b_o$. For the BiLSTM, the parameters include not only this set in the positive direction but also a second set in the negative direction, making a total of 24 parameters. For clarity, the parameters of the positive-direction unit are marked with a rightward arrow (e.g., $\overrightarrow{W}_f$) and those of the negative-direction unit with a leftward arrow (e.g., $\overleftarrow{W}_f$).
The forward propagation calculations of the BiLSTM are also divided into two parts: the positive-direction LSTM and the negative-direction LSTM. The calculation formulas for the positive-direction LSTM are (7)–(12):

$$\overrightarrow{f}_t = \sigma(\overrightarrow{W}_f x_t + \overrightarrow{U}_f \overrightarrow{h}_{t-1} + \overrightarrow{b}_f) \quad (7)$$
$$\overrightarrow{i}_t = \sigma(\overrightarrow{W}_i x_t + \overrightarrow{U}_i \overrightarrow{h}_{t-1} + \overrightarrow{b}_i) \quad (8)$$
$$\tilde{\overrightarrow{c}}_t = \tanh(\overrightarrow{W}_c x_t + \overrightarrow{U}_c \overrightarrow{h}_{t-1} + \overrightarrow{b}_c) \quad (9)$$
$$\overrightarrow{c}_t = \overrightarrow{f}_t \odot \overrightarrow{c}_{t-1} + \overrightarrow{i}_t \odot \tilde{\overrightarrow{c}}_t \quad (10)$$
$$\overrightarrow{o}_t = \sigma(\overrightarrow{W}_o x_t + \overrightarrow{U}_o \overrightarrow{h}_{t-1} + \overrightarrow{b}_o) \quad (11)$$
$$\overrightarrow{h}_t = \overrightarrow{o}_t \odot \tanh(\overrightarrow{c}_t) \quad (12)$$

The calculation formulas for the negative-direction LSTM are (13)–(18), identical in form but traversing the sequence from right to left with the negative-direction parameters:

$$\overleftarrow{f}_t = \sigma(\overleftarrow{W}_f x_t + \overleftarrow{U}_f \overleftarrow{h}_{t+1} + \overleftarrow{b}_f) \quad (13)$$
$$\overleftarrow{i}_t = \sigma(\overleftarrow{W}_i x_t + \overleftarrow{U}_i \overleftarrow{h}_{t+1} + \overleftarrow{b}_i) \quad (14)$$
$$\tilde{\overleftarrow{c}}_t = \tanh(\overleftarrow{W}_c x_t + \overleftarrow{U}_c \overleftarrow{h}_{t+1} + \overleftarrow{b}_c) \quad (15)$$
$$\overleftarrow{c}_t = \overleftarrow{f}_t \odot \overleftarrow{c}_{t+1} + \overleftarrow{i}_t \odot \tilde{\overleftarrow{c}}_t \quad (16)$$
$$\overleftarrow{o}_t = \sigma(\overleftarrow{W}_o x_t + \overleftarrow{U}_o \overleftarrow{h}_{t+1} + \overleftarrow{b}_o) \quad (17)$$
$$\overleftarrow{h}_t = \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t) \quad (18)$$
In BiLSTM, the input vector at time $t$ is $x_t$ (where $t = 1, 2, \ldots, n$), with the initial hidden state and cell state in each direction being zero vectors. The final $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ transmitted from the two directions are multiplied together to obtain $h_t$. The calculation formula is given by Equation (19), as follows:

$$h_t = \overrightarrow{h}_t \odot \overleftarrow{h}_t \quad (19)$$
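The two-direction traversal and the combination step in Equation (19) can be sketched as follows (a simplified NumPy illustration in which `step` stands in for any LSTM-style cell; note that a real BiLSTM uses separate parameter sets for the two directions, which this sketch omits):

```python
import numpy as np

def bidirectional(seq, step, h0, c0):
    """Run a recurrent cell over `seq` in both directions and combine
    the two hidden-state sequences element-wise, as in Eq. (19).

    `step(x, h, c) -> (h, c)` is any LSTM-style cell function.
    """
    h, c = h0, c0
    forward = []
    for x in seq:                      # positive direction: left to right
        h, c = step(x, h, c)
        forward.append(h)
    h, c = h0, c0
    backward = []
    for x in reversed(seq):            # negative direction: right to left
        h, c = step(x, h, c)
        backward.append(h)
    backward.reverse()                 # align time indices with the forward pass
    return [f * b for f, b in zip(forward, backward)]
```

Each output element thus combines the history seen from the left with the "future" seen from the right at the same time step.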
3. Methodology Proposal
Although BiLSTM networks alleviate some of the issues related to insufficient dependency modeling between information compared to traditional LSTM, they still inherit many of the inherent problems of LSTM units. One of these issues lies in the imperfect design of the gating mechanisms within LSTM units. In 2000, Felix Gers [25] observed that LSTM units needed to update their internal states while processing time series data, leading to the introduction of the forget gate mechanism. The study referenced in [18] also found that within LSTM units the forget gate is the critical gate, while improving the input and output gates is crucial for enhancing LSTM unit performance. Therefore, it is advisable to retain the LSTM forget gate while improving the other gating mechanisms.
In the LSTM unit structure, because few gating mechanisms are involved in controlling the long-term memory $c_t$, the handling of long-term memory information may be inadequate. Additionally, LSTM units require learning 12 parameters, and when applied to BiLSTM this number increases to 24. This increase in model parameters significantly impacts the model's computational speed.
Given these issues, the concept of He-BiLSTM is proposed to address the inefficiencies in information processing. This approach aims to improve accuracy, enhance robustness, strengthen generalization capabilities, and mitigate the slow computational speed caused by inefficient information handling in traditional models.
3.1. Structure and Forward Propagation Algorithm of He-BiLSTM
The structure diagram of He-BiLSTM is shown in
Figure 3:
The He-BiLSTM consists of two parts, the variant LSTM and the BiLSTM; it is constructed on the basis of the V-LSTM (Variant LSTM) unit. The V-LSTM unit internally adds a new forget gate aimed at the long-term memory $c_t$. Since in the traditional LSTM unit the information in $c_t$ is little used by the gating mechanism, excessive noise may accumulate during transmission and impair the unit's ability to process information. Therefore, selectively forgetting information in the long-term memory $c_t$ while retaining useful information can improve the performance of the LSTM unit. On the other hand, many researchers have shown that reasonably simplifying the gating mechanism of the traditional LSTM does not degrade its performance; instead, the simplified structure reduces the number of parameters that must be learned inside the unit, improving its information processing efficiency and, at the model level, its running speed while preserving accuracy. Based on these two factors, the V-LSTM was designed. In [26], experiments also show that the V-LSTM unit effectively alleviates some of the problems of the classical LSTM unit and, to a certain extent, improves the model's robustness, accuracy, and generalization performance.
The BiLSTM integrates the structures of LSTM in two directions, establishing relationships between units. Therefore, in the BiLSTM, the advantages and disadvantages of the LSTM are both magnified. The disadvantage of the traditional LSTM, which is the inefficiency in processing information that leads to slower model operation, is even more pronounced in the BiLSTM. Based on this situation, considering the improvement of the traditional BiLSTM, the V-LSTM is seamlessly embedded into the BiLSTM to design a He-BiLSTM, thereby enhancing the overall performance of the BiLSTM.
Here, the forward propagation formula of He-BiLSTM is as follows:
After obtaining $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ at time $t$, compute their product using Formula (34) to obtain $h_t$ as the final result. After computing $h_t$, pass it through a fully connected layer to compute the predicted value $\hat{y}_t$ using Formula (35), as follows:
The loss function here is the mean squared error (MSE), denoted as $E$, with the per-step calculation formula

$$E_t = \frac{1}{2}\left(y_t - \hat{y}_t\right)^2,$$

and the total error (total loss) is the sum over all time steps:

$$E = \sum_{t} E_t.$$
3.2. Backpropagation Algorithm of He-BiLSTM
BPTT (Backpropagation Through Time) is a backpropagation algorithm for handling time series data; combined with gradient descent, it iteratively adjusts parameters to reduce the error and ultimately find suitable parameter values.
In traditional BiLSTM models, there are 24 parameters that need to be learned through backpropagation: 12 for the positive direction and another 12 for the negative direction (the weight matrices $W$ and $U$ and the bias vector $b$ for each of the four gates). For He-BiLSTM, due to improvements within the units, there are 10 parameters to be learned in the positive direction and likewise 10 in the negative direction, reducing the total number of parameters to be learned by 4 compared to traditional BiLSTM.
Let the partial derivative of the loss function $E$ with respect to the hidden state at the final time step $T$ be denoted as $\delta_T$. The formula for calculating $\delta_T$ is as follows:
After obtaining $\delta_T$, the partial derivative of the loss function $E$ with respect to the hidden state at earlier time steps is
The partial derivative of the loss function $E$ with respect to the cell state is
The partial derivatives of the loss function $E$ with respect to the first group of weight matrices are, respectively, as follows:
The partial derivatives of the loss function $E$ with respect to the remaining weight matrices are, respectively, as follows:
The calculation formulas for the corresponding intermediate gradient terms are as follows:
The partial derivatives of the loss function $E$ with respect to the bias vectors are, respectively, as follows:
The above are the formulas for calculating the partial derivatives of all parameters in the positive direction. For the negative direction, the computation of the delta term differs. If we denote the negative-direction activation during forward propagation accordingly, then its calculation formula is as follows:
In subsequent discussions, references to this term in the negative direction can be replaced with its negative-direction counterpart.
At this point, all the backpropagation formulas for He-BiLSTM have been derived. For the parameter update method, this paper uses the Adam optimizer to update each parameter.
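The Adam update applied to each parameter can be sketched as follows (this is standard Adam with its common default hyperparameters, which are our assumption rather than the paper's settings):

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a single parameter array.

    m, v are the running first/second moment estimates; t is the
    1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In He-BiLSTM, one such state triple (parameter, first moment, second moment) would be maintained for each of the 20 learned parameters.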
3.3. The Algorithmic Process of He-BiLSTM
Accordingly, the entire algorithmic procedure of He-BiLSTM is presented in Algorithm 1:
Algorithm 1: The algorithmic process of He-BiLSTM

Divide the dataset into training and testing sets; set the batch size, the dimensions of the hidden layer, the dimensions of the cell state, the time step t, the learning rate of the Adam optimizer, the number of iterations, and the random seed.
Input: the input vector $x_t$ (where $t = 1, 2, \ldots$)
Output: training-set error, test-set error, and running time
while the number of iterations has not been reached do
  1. During forward propagation, calculate the values of the various variables.
  2. Calculate the partial derivatives of the loss function $E$ with respect to all parameters for backpropagation.
  3. Update each parameter with the Adam optimizer.
  4. Calculate the error on the training set.
  5. Calculate the error on the test set.
  6. Record the model's running time.
end while
return
  1. The error on the training set.
  2. The error on the test set.
  3. The model's running time.
4. Experiment
Since the model in this paper is an improvement based on BiLSTM, the main comparison model for He-BiLSTM is BiLSTM. The comparison involves aspects such as accuracy, robustness, generalization ability, and running speed between the two. In terms of robustness, the experiment compares the models at moments of stock price volatility, observing the models’ adaptability to sudden changes through a graphical analysis. Additionally, the experiment also considers the ability to predict stock prices for multiple future days as a factor in judging the model’s robustness. In terms of generalization ability, the same model is used in different financial datasets to evaluate the model’s generalization capability.
In the experiment, the data used were stock index data (S&P 500) and futures data (gold) to verify the performance of the model. For the S&P 500 data, 5000 data points from 26 January 2000 to 9 December 2019 were selected. For the gold data, 4960 data points from 25 January 2005 to 23 April 2024 were selected. The input vector dimensions are all four-dimensional, representing the opening price, highest price, lowest price, and closing price. The time step is 10 days, using the data from the previous 10 days to predict the prices for the 11th, 12th, and 13th days.
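The windowing described above (10 days of OHLC features predicting the closing price 1–3 days ahead) can be sketched as follows (the column order open/high/low/close and the use of the close as the target are assumptions based on the description):

```python
import numpy as np

def make_windows(data, window=10, horizon=1):
    """Build (X, y) pairs: `window` days of features predict the
    closing price `horizon` days after the window ends.

    `data` is shaped (n_days, 4): open, high, low, close (assumed order).
    """
    X, y = [], []
    for i in range(len(data) - window - horizon + 1):
        X.append(data[i:i + window])                  # e.g., 10 days of features
        y.append(data[i + window + horizon - 1, 3])   # close price of target day
    return np.asarray(X), np.asarray(y)
```

Setting `horizon` to 1, 2, or 3 yields the 11th-, 12th-, and 13th-day prediction targets, respectively.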
In the data processing stage, the dataset is first divided into a training set of 3000 entries and a test set of 500 entries. The data are then checked for missing values, specifically cases where the opening price is zero. Once the check is complete, the data are normalized. The formula for normalization is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Here, $x$ represents the value of each element in the vector, $x_{\min}$ indicates the minimum value in each vector, and $x_{\max}$ indicates the maximum value in each vector.
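The min-max normalization step can be sketched as follows (applied per feature column, which we assume matches the per-vector description above):

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization: scale each feature column of x into [0, 1]."""
    x_min = x.min(axis=0)   # per-column minimum
    x_max = x.max(axis=0)   # per-column maximum
    return (x - x_min) / (x_max - x_min)
```

In practice, `x_min` and `x_max` would be computed on the training set only and reused for the test set to avoid look-ahead leakage.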
The evaluation metrics for the model are the mean squared error (MSE), the mean absolute error (MAE), and the coefficient of determination ($R^2$). The smaller the values of MSE and MAE, the higher the model's predictive accuracy and performance; the closer $R^2$ is to 1, the closer the model's predicted values are to the actual values. The calculation formulas for the three evaluation metrics are as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
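The three evaluation metrics can be computed directly in NumPy:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Note that $R^2$ equals 1 for a perfect fit and 0 for a model no better than predicting the mean of the targets.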
4.1. Predict the Price for the Next Day
In this section, we initially use the prices from 10 days to predict the price on the 11th day. With a random seed number set to 10, a batch size of 500 samples, and an Adam learning rate of 0.59, the BiLSTM model achieves a prediction accuracy of 93.80% on the training set and 92.79% on the test set. This represents a certain degree of improvement in accuracy compared to the classical LSTM. The classical LSTM is less stable in prediction, easily influenced by hyperparameters, and exhibits greater volatility. The BiLSTM, leveraging the advantages of bidirectional transmission, enhances the dependency of information over time, which significantly improves this issue. In the He-BiLSTM model, the prediction accuracy is even higher than that of BiLSTM, with the training set reaching an accuracy of 95.41% and the test set reaching 94.23%. The prediction effect diagrams for both BiLSTM and He-BiLSTM are shown in
Figure 4.
From
Figure 4, it is evident that during periods of price volatility, the predictions of He-BiLSTM are closer to the actual values. To compare the predictive capabilities of the two models during periods of stable prices, a section of
Figure 4 is enlarged for comparison. The comparative chart is shown in
Figure 5.
Upon comparing the prediction effect diagrams during periods of stable prices, it is observed that the predictions of He-BiLSTM are more accurate than those of BiLSTM. This indicates that He-BiLSTM demonstrates superior robustness compared to BiLSTM, regardless of whether the prices are stable or experiencing sudden changes.
For the comparison of evaluation metrics for the two models, see
Table 1:
From
Table 1, it can be seen that compared to BiLSTM, He-BiLSTM has superior evaluation metrics across the board. On the test set, the MSE of He-BiLSTM is reduced by 38.36% compared to BiLSTM, and the MAE is reduced by 33.42% compared to BiLSTM. In terms of accuracy, He-BiLSTM has improved by an additional 3.87% over BiLSTM.
4.2. Predicting the Price for Multiple Future Days
To validate the generalization capability of He-BiLSTM, it is employed to predict the prices of the 12th and 13th days based on the prices of the first 10 days, and the results are compared with those of BiLSTM. The specific evaluation metrics are shown in
Table 2.
When predicting the prices for the next two days, it can be observed that the test accuracy of He-BiLSTM can still reach 92.79%. Compared to predicting the price for the next day, the accuracy only drops by less than 2% from 94.22%. However, for BiLSTM, although the accuracy on the test set can still exceed 85%, it drops by nearly 4% compared to 90.36% when predicting the price for the next day. This indicates that in terms of robustness and generalization ability, He-BiLSTM performs better.
As can be seen from
Table 3, when predicting the prices for the next three days, He-BiLSTM's test set accuracy is 90.03%, still maintaining a benchmark above 90%. In contrast, BiLSTM's test set accuracy has dropped to 74.28%. He-BiLSTM's evaluation metrics are now significantly ahead of BiLSTM's. This demonstrates that He-BiLSTM has excellent capability for predicting prices over multiple future days, which, in the practical field of financial investment, can provide investors and decision makers with a broader range of strategies.
Since the He-BiLSTM model achieves an accuracy of 94.2% in predicting the stock index price for the next day, it is considered a high-probability event. Therefore, in real market conditions (excluding the impact of sudden news events), if the predicted lowest price for the next day is within a 6% fluctuation range from the current price, buying at that point and selling when the predicted highest price is within a 6% fluctuation range from the current price can maximize profits. If the He-BiLSTM predicts a higher price within the next two days and one aims for greater returns, the accuracy decreases to 92.8%; thus, the buying and selling price fluctuation range should be around 8%. Similarly, if a higher price is predicted within the next three days, the accuracy is 90.3%, and the fluctuation range for buying and selling should be around 10%.
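The threshold rule just described can be expressed as a small helper function (a hypothetical illustration of the stated fluctuation bands, not a tested trading strategy; the function name and interface are our own):

```python
def trade_signal(price_now, pred_low, pred_high, band=0.06):
    """Return (buy, sell) flags following the rule in the text:
    act when the predicted extreme lies within `band` (e.g., 6%)
    of the current price. Per the text, the band widens to roughly
    8% and 10% for 2- and 3-day-ahead predictions, respectively."""
    buy = abs(pred_low - price_now) / price_now <= band
    sell = abs(pred_high - price_now) / price_now <= band
    return buy, sell
```

The widening band reflects the drop in prediction accuracy (94.2% to 92.8% to 90.3%) as the horizon extends.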
In individual stock price predictions, Microsoft and Kweichow Moutai were also selected for further stock price forecasting. The prediction results are shown in
Figure 6 and
Figure 7, respectively. The experiments showed that the overall performance of the He-BiLSTM model surpassed that of the BiLSTM.
Additionally, to further demonstrate the model's generalization capability, He-BiLSTM is applied to futures data and compared with BiLSTM in experiments, with results also showing that He-BiLSTM is superior to BiLSTM. The prediction results are shown in
Figure 8.
4.3. Comparison of Model Convergence Speed
In terms of model running speed, the time for one iteration of both He-BiLSTM and BiLSTM is recorded, with the first 10 iterations being counted. The statistical results of the running time are shown in
Table 4.
From
Table 4, it can be observed that as the number of iterations increases, the advantage of He-BiLSTM in terms of reduced time consumption becomes increasingly evident. After 10 iterations, He-BiLSTM takes a total of 159.378 s, while BiLSTM takes 172.683 s. Compared to BiLSTM, He-BiLSTM’s time consumption is reduced by 7.70%.
5. Conclusions
BiLSTM is currently a relatively advanced model. To further improve the performance of BiLSTM and better cope with the aperiodicity and volatility of financial market prices, this paper starts from the internal units of BiLSTM to improve the overall performance of the model. The V-LSTM unit is seamlessly integrated into BiLSTM, reconstructing it into He-BiLSTM, and the corresponding backpropagation algorithm is provided. The V-LSTM unit adds a new forget gate acting on the long-term memory $c_t$; the information retained after this forgetting, together with the information from the coupled forget gate, serves as the input to the next unit. This allows each unit to utilize the long-term memory when transmitting information backward while also reducing the number of parameters each LSTM unit needs to learn, improving the efficiency of internal information processing. These advantages of the V-LSTM unit, when applied to BiLSTM, further enhance the performance of He-BiLSTM.
To validate the robustness, accuracy, generalization ability, and model running speed of He-BiLSTM as superior, this paper has designed three experiments.
Firstly, He-BiLSTM and BiLSTM are applied to predict the stock price of S&P 500 for the next day. The experimental results show that compared to BiLSTM, He-BiLSTM has a smaller mean squared error, being reduced by 38.36% in comparison to BiLSTM’s mean squared error of 0.000318619, and with the accuracy rate being increased by 3.87% from 90.36%. Moreover, the improvement in He-BiLSTM’s predictive capability is not only reflected in parts where the price changes abruptly but also during periods of stable prices, where the predictive accuracy is further enhanced. This also proves the strong robustness of the model.
Secondly, to verify the generalization capability of He-BiLSTM, it was used to predict prices for multiple days in the future and compared with BiLSTM. The experiments showed that even when predicting up to 3 days into the future, He-BiLSTM can still maintain an accuracy rate of over 90% on the test set. Furthermore, He-BiLSTM not only performs better in stock index data prediction but also yields good results in the futures field. This proves that He-BiLSTM has stronger generalization capabilities.
Lastly, in terms of model running speed, He-BiLSTM operates faster. When iterating 10 times under the same conditions, He-BiLSTM can reduce the time by approximately 7.70%.
The performance characteristics of He-BiLSTM, when applied to the field of financial investment, can provide investors with more accurate price forecasts, more strategic choices for the future, and faster trading operations, thereby helping to address some of the risks inherent in the financial markets.
6. Limitations
Stock price movements are also influenced by factors such as financial reports, interest rates, economic indicators, and geopolitical events, and these non-technical factors may affect price trends in ways the model does not capture. This is an area to be considered and improved in future research. In terms of code implementation, the model currently runs on custom-written NumPy code, and we will optimize the code in the future to facilitate easier invocation of the model. Additionally, the model will be applied to a broader range of fields for experimentation to further verify its generalization capabilities.