Implicit-Causality-Exploration-Enabled Graph Neural Network for Stock Prediction

Li, Ying; Xue, Xiaosha; Liu, Zhipeng; Duan, Peibo; Zhang, Bin

doi:10.3390/info15120743

by Ying Li *, Xiaosha Xue, Zhipeng Liu, Peibo Duan and Bin Zhang

College of Software, Northeastern University, Shenyang 110819, China

* Author to whom correspondence should be addressed.

Information 2024, 15(12), 743; https://doi.org/10.3390/info15120743

Submission received: 22 October 2024 / Revised: 8 November 2024 / Accepted: 19 November 2024 / Published: 21 November 2024

Abstract: Accurate stock prediction plays an important role in financial markets and can aid investors in making well-informed decisions and optimizing their investment strategies. Relationships exist among stocks in the market, leading to high correlation in their prices. Recently, several methods have been proposed to mine such relationships in order to enhance forecasting results. However, previous works have focused on exploring the correlations among stocks while neglecting the causal characteristics, thereby restricting the predictive performance. Furthermore, due to the diversity of relationships, existing methods are unable to handle both dynamic and static relationships simultaneously. To address these limitations, we introduce a novel stock trend forecasting framework that mines the causal relationships affecting changes in companies’ stock prices and simultaneously extracts both dynamic and static features to enhance the forecasting performance. Extensive experiments on the Chinese stock market demonstrate that the proposed framework achieves clear improvements over multiple state-of-the-art approaches.

Keywords: stock prediction; graph neural networks; Granger causality

1. Introduction

The uncertainty of the stock market increases the difficulties in stock prediction, while the stock price is a critical indicator for finance practitioners to reasonably allocate their assets to gain profit [1,2]. In recent decades, forecasting stocks using deep learning methods has been of interest to researchers. In particular, the performance of graph neural network (GNN)-based methods surpasses that of most competitors due to their superior ability to capture a wider range of relational features leveraging the associations inherent in stock market data [3,4].

The tremendous success of GNN-based methods is mainly attributed to the research efforts dominated by the development of advanced GNN models. For instance, refs. [5,6,7] directly construct company relations, e.g., industry relation and shareholding relation, based on explicit knowledge in the market, and [8,9,10,11] explore the stock correlations, e.g., Pearson coefficient and cosine similarity, among stocks based on historical price information in the market. Subsequently, GNN-based models have been employed to aggregate the relational features associated with target stocks for prediction.

However, the performance of a GNN-based stock predictor is also closely related to the input graph structure, which encodes the relations between stocks [7]. In practice, the relation between stocks can be either explicit or implicit (see Section 4 for details). Explicit relations have been widely employed in most existing GNN-based methods; more recently, growing interest in discovering the intrinsic causality of stock variation has motivated a shift toward implicit relation exploration across a number of GNN-based methods for stock prediction.

Powered by the pioneering techniques of GNN, implicit relation exploration is usually carried out by deep learning modules embedded in GNN models [8,10]. This approach has two drawbacks. First, it lacks interpretability. Second, and more pressing, the implicit relation is still explored on the basis of the input graph structure, so potential relations between non-connected nodes cannot be discovered.

Although some studies use correlation-based measurements (e.g., Pearson correlation) to build graphs for the GNN model [9,11], correlation by nature indicates only the presence of an association between stocks without establishing causation between them. Worse still, little prior work has accounted for the dynamic characteristics of implicit relations, which can potentially affect the prediction performance. It is worth noting that Granger causality has been shown to be an effective mechanism for modeling stock market graphs [12,13]. Nevertheless, the directional attribute of a Granger causal graph, together with its dynamic characteristic, makes it difficult to incorporate into a GNN model for stock prediction.

In light of the aforementioned challenges, we start with a discussion about the necessity of the Granger causal graph and use this insight in our proposed GNN model with the aim of capturing causation rather than statistical association. Furthermore, we design a unified GNN-based framework that captures not only explicit relations but also implicit relations, whether they are static or vary with time. The contributions of this article are summarized as follows:

We propose a unified GNN-based stock predictor with the incorporation of explicit and implicit relations with the aim of making a tradeoff between the interpretability of structural relations and the power of black-box deep learning models.
We design an equivalent undirected graph of Granger causality as a feed of the proposed dynamic and static fusion GNN model, named DSF-GNN, which captures both static and dynamic characteristics of implicit and explicit relations by leveraging two proposed modules, namely the static-relation-based feature extractor and dynamic-relation-based feature extractor.
In experiments on more than three years of data from the Chinese stock market, our proposed model outperforms its counterparts, with an accuracy improvement of 2.63% to 6.76%.

2. Related Work

2.1. Euclidean-Input Based Methods

Tasks related to stock forecasting involve predicting future price trends or prices of stocks. Accurate predictions can significantly enhance investment decisions and ultimately lead to better returns. Due to the temporal nature of stock prices, traditional RNN networks and their variants, such as LSTM and GRU, can extract the timing feature of stocks, and they are widely used by researchers. Some studies have taken into account the long-term time dependence in the stock price sequence [14,15]. Subsequently, some methods further improved upon this approach by considering the varying contribution of features at different moments for stock prediction. Attention-based RNN methods are employed in stock forecasting tasks. For example, Qin et al. [16] adopted an attention-based LSTM to extract driving input signals and hidden states. Zhang, Aggarwal, and Qi [17] introduced a deep learning model utilizing LSTM to address the dynamic market factors in stock prediction. Feng et al. [18] utilized adversarial training to simulate stochasticity during model training. Md et al. [19] introduced a multi-layer sequential long short-term memory (MLS-LSTM) model designed for predicting stock index prices. The proposed approach involves normalizing time series data and segmenting it into time steps to assess the correlation between past and future values. Smith et al. [20] proposed an optimized LSTM-RNN model that enhanced investors’ ability to understand and operate in dynamic stock market environments, enabling better prediction of stock trends.

The stock prices are also influenced by other factors, such as macro factors, investor sentiment, company news, etc. Natural language processing methods and text mining techniques are the commonly employed approaches to process such data. Wang et al. [21] considered both stock sectors and relevant macroeconomic variables to facilitate predictions. Mittal and Goel and Kraus and Feuerriegel [22,23] utilized twitter data to analyze public investor sentiment. Li et al. [24] incorporated both quantitative indicators and sentiment extracted from sentiment dictionaries into their LSTM-based model for predicting stock prices. Liapis et al. [25] proposed a multivariate prediction method that combines time convolutional networks and BERT-based multi-label sentiment classification, using sentiment-related features to predict financial time series. Yang, Zhen, et al. [26] utilized multiple investor sentiment features to investigate the effectiveness of stock index prediction, and combined the boosting method to forecast stock trends. Ding et al. [27] employed the open information extraction method to extract company events from news texts and represent them as dense vectors for prediction. Furthermore, various studies have utilized sentiment lexicons to extract features from financial news for stock price prediction. Deng et al. [28] utilized a sentiment lexicon, specifically SentiWordNet 3.0, to assess the sentiment of news articles, which employed a multiple kernel learning (MKL) regression model, integrating sentiments and stock prices, to predict stock price movements. Chen et al. [29] integrated quantitative indicators with news features extracted using a sentiment dictionary into their RNN-boost model, resulting in improved stock prediction performance. However, while the Euclidean-input based methods exhibit impressive accuracy in forecasting, they have their limitations. 
The majority of the aforementioned studies concentrated on integrating an individual stock’s historical data with additional textual information, while neglecting the interrelations among different stocks and, in particular, the attributes of those relationships.

2.2. Graph-Based Methods

In recent years, a novel research direction has investigated graph-structured data to capture the interconnections between stocks, which can be divided into two types, explicit-based and implicit-based. Chen, Wei, and Huang [7] constructed a corporate graph using the shareholding relationships, transforming stock prediction into a node classification problem using GCNs. Sawhney et al. [30] utilized attentional graph neural networks to analyze the connections formed by social media texts and correlations among companies, facilitating a comprehensive understanding of market dynamics. Feng et al. [5] utilized two types of explicit relationships, namely industry or Wikipedia relationship, to construct a graph and enhance the GCNs with LSTM cells to model stock interconnections for prediction. Inspired by the remarkable success of attention mechanisms, refs. [31,32] employed two layers of GATs [31] to extract meaningful relational features and filter out irrelevant ones for movement trend prediction. Ye et al. [6] built multiple graphs to find the representations of cross effects. Zhang et al. [33] proposed a novel graph attention network (GAN) driven by dynamic attributes that combined sentiment (DGATS) information, trading data, and textual data. They captured real-time market dependencies and key attribute information through graph networks, dynamically updating relationships and relationship strengths within predefined graphs for stock prediction. Shi et al. [34] introduced prior knowledge on relationships among four types of stocks, constructed a graph based on internal information of trading features, and utilized an integrated GCN-LSTM approach for predicting stock price movements. Qian et al. [35] proposed a hierarchical multi-relational dynamic graph framework that utilized discrete dynamic graphs to comprehensively capture the multifaceted relationships between stocks and their temporal changes, aiming to simulate stock investment prediction.

Although superficial connections can be observed in explicit relationships, the graphs constructed by these methods were constrained by fixed, predefined corporate relationships, which heavily relied on pre-established external knowledge. As a result, capturing deeper interconnections remained elusive. Therefore, various methods were employed to uncover implicit relationships among stocks. Hamilton, Ying, and Leskovec [36] proposed the GAT model in GraphSAGE, which was good at implicitly assigning different weights to different nodes in a neighborhood, avoiding the need for labor-intensive matrix operations or relying on prior knowledge of graph structure. Li et al. [11] constructed matrices of positive and negative correlations based on historical market prices to predict overnight stock movements. Cheng and Li [10] introduced an attribute-mattered aggregator to model the momentum spillovers among firms for movement prediction. Xu et al. [8] utilized cosine similarity to construct hidden concepts and extract concept-oriented shared information from both predefined and hidden concepts. Tian et al. [9] constructed the time-varying Pearson coefficient and Manhattan matrix to predict stocks’ return rate and movement trends, respectively. Xing et al. [37] gained understanding of implicit connections through attribute-guided vague graph comprehension, followed by the derivation of relational characteristics using GNN. Despite the progress achieved in implicit-relation-based graph neural network stock prediction models, their focus was mainly on identifying stock relations grounded in correlation rather than causality. To provide a detailed overview of the related methodologies discussed above, Table 1 presents a comparison between these works and shows their respective pros and cons.

3. Preliminaries

3.1. Granger Causality

With the intention of determining whether one time series is useful for predicting another, Granger causality was developed on the basis of statistical hypothesis testing and has been widely adopted as an operational measure of causality in the field of econometrics. Mathematically, consider a sequence of d-dimensional time series X = \{X_{1}, X_{2}, \dots, X_{T}\} with X_{t} \in R^{d}. Each X_{t} can be expressed as a linear combination of the T previous lags of the whole time series:

X_{t} = \sum_{n = 1}^{T} A_{n} X_{t - n} + ε_{t}    (1)

where A_{n} \in R^{d \times d} is a coefficient matrix with each element A_{n} (i, j) denoting the degree of influence of time series j on i, while ε_{t} is Gaussian noise. The null hypothesis that time series j does not Granger-cause time series i is not rejected if the following condition is satisfied:

H_{0} : A_{1} (i, j) = A_{2} (i, j) = \dots = A_{T} (i, j) = 0    (2)
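As an illustration of Equation (2), the following minimal sketch reads a Granger adjacency matrix off a list of VAR coefficient matrices. It assumes the coefficients are already known or estimated; the function name `granger_adjacency` and the tolerance `tol` are hypothetical, and the tolerance check is only a stand-in for the statistical test that would be used on estimated coefficients.

```python
def granger_adjacency(coeffs, tol=1e-8):
    """coeffs[n][i][j] holds A_{n+1}(i, j), the influence of series j on series i.

    Returns gc with gc[j][i] = 1 when at least one lag has |A_n(i, j)| > tol,
    i.e., when the condition in Equation (2) fails for the pair (i, j).
    """
    d = len(coeffs[0])
    gc = [[0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j and any(abs(A[i][j]) > tol for A in coeffs):
                gc[j][i] = 1
    return gc


# Two lags, three series: series 0 drives series 1, series 1 drives series 2.
A1 = [[0.5, 0.0, 0.0],
      [0.3, 0.4, 0.0],
      [0.0, 0.0, 0.2]]
A2 = [[0.1, 0.0, 0.0],
      [0.0, 0.1, 0.0],
      [0.0, 0.6, 0.1]]
gc = granger_adjacency([A1, A2])
```

Here `gc[0][1]` and `gc[1][2]` are set because `A1[1][0]` and `A2[2][1]` are nonzero; every other pair satisfies the null hypothesis.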

3.2. Message Passing Based GNN

Message passing is a main technique used in most existing GNN models [5,40]. We use G = (X, A) to represent a stock market graph, where X \in R^{N \times D} is an attribute matrix of N nodes, assuming each node has D attributes, while A \in R^{N \times N} denotes the adjacency matrix. Message passing between nodes proceeds as follows:

H^{(l + 1)} = f (A, H^{(l)}; θ^{(l)})    (3)

where H^{(l + 1)} \in R^{N \times F} are the features of the nodes (for example, “messages”) computed after l + 1 steps, f (\cdot) is the message propagation function, and θ^{(l)} denotes the trainable parameters. There are multiple variants of the function f (\cdot). We follow the work of [38]:

f (A, H^{(l)}; θ^{(l)}) = ReLU ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} H^{(l)} W^{(l)})    (4)

where \tilde{A} = A + I, \tilde{D} is the diagonal degree matrix with {\tilde{D}}_{i i} = \sum_{j} {\tilde{A}}_{i j}, and W^{(l)} is a weight matrix. A GNN enables a graph node (a node usually denotes a stock) to receive features sent from its l-th-order (l \geq 1) neighbors. Thus, the correlation between two distinct nodes can be explored and captured.
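The propagation rule in Equation (4) can be sketched in plain Python as follows. This is an illustrative, unoptimized version using nested lists; the function name `gcn_layer` and the toy graph are assumptions made for the example, not part of the original work.

```python
import math


def gcn_layer(A, H, W):
    """One propagation step: ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W)."""
    n = len(A)
    # Add self-loops: A~ = A + I.
    A_t = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Row sums of A~ give the diagonal of D~.
    deg = [sum(row) for row in A_t]
    # Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j).
    A_hat = [[A_t[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, then apply the weight matrix and ReLU.
    AH = [[sum(A_hat[i][k] * H[k][f] for k in range(n)) for f in range(len(H[0]))]
          for i in range(n)]
    return [[max(0.0, sum(AH[i][f] * W[f][o] for f in range(len(W))))
             for o in range(len(W[0]))] for i in range(n)]


# Path graph 0 - 1 - 2 with two input features and two output features.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0 = [[1.0, -1.0], [0.5, 2.0]]
H1 = gcn_layer(A, H0, W0)
```

Stacking l such layers lets each node receive features from neighbors up to l hops away.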

4. Methodology

We model the Granger causal graph of stocks using a multi-layer feed-forward neural network, called neural Granger causality (NGC) [41], for capturing Granger causality, followed by the proposed GNN-based predictor on the basis of moralization. Before illustrating more details, we first define the explicit and implicit relations as follows:

Definition 1.

Explicit relation: an explicit relation is directly constructed through observable financial domain knowledge without analysis, such as industrial relation and shareholding relation [5,40].

Definition 2.

Implicit relation: an implicit relation is extracted and analyzed from observable financial domain knowledge using techniques that can be further classified into linear/nonlinear measurement techniques, e.g., Pearson’s correlation [9].

In this paper, we consider industry, shareholding, and media information as explicit relations. Correspondingly, we take Granger causality as the implicit relation. In the following, we present the details about the Granger causal graph modeling, which is then fed to the proposed GNN model.

4.1. Granger Causal Graph Modeling

The power of Granger causality in time series analysis has been discussed in extensive studies [42,43]. In this paper, we implement Granger causality detection as a pre-task of stock prediction, motivated by the fact that Granger causality verifies whether one stock is useful for forecasting another. Compared to correlation-based measures, existing investigation indicates that the Granger causality measure provides a much more stringent criterion for causation than simply observing a high correlation with some lag-lead relationship [13,44]. In addition, further research shows that standard correlation can be misleading about taxon interactions, since correlation is a sign of causality but is not sufficient to infer it [45].

To illustrate, we apply Pearson correlation analysis and Granger causality detection to two stocks. As shown in Figure 1, the Pearson correlation between the two stocks is weak, and no explicit relation, e.g., a shareholding relation, is found between them. In contrast, Figure 1 reveals that there exist contemporaneous and lagged relations between the two stocks at different time periods, allowing us to identify the underlying causality between these two stocks.

Unfortunately, Granger causality is unable to provide insight into the true causal relationship between two variables. The follow-on task of discovering true causal relations is assigned to our proposed GNN-based model. Unlike the undirected graph generated from the Pearson correlation, a Granger causal graph is directed, and the difficulties of discovering Granger causality and deploying the causal graph as the input of a GNN model are as follows:

Most classical approaches to Granger causality detection are limited when applied to time series with nonlinear features, resulting in an inconsistent estimation of Granger causal interactions.
Granger causality leads to the relation between two variables being directed rather than undirected, whereas the work so far on GNN models has focused primarily on undirected graphs [46]. Moreover, this is followed by another challenge, namely that a hybrid graph structure (directed and undirected graph structure)-enabled GNN model is necessary for our stock predictor.

To overcome the first challenge, we employ a nonlinear approach to investigate Granger causality among stocks and establish a dynamic Granger causality graph, GC, accordingly. Taking stock i as an example, our objective is to identify the stocks that contribute to its modeling, thereby revealing those that Granger-cause stock i. This consists of two phases. First, we design an L-layer feed-forward neural network to model the closing price of stock i using the historical closing prices of the N stocks in the stock pool, which is formulated as:

c_{i}^{t} = N_{i} (c_{1}^{t - T : t - 1}, c_{2}^{t - T : t - 1}, \dots, c_{N}^{t - T : t - 1}) + e_{t}    (5)

where c_{i}^{t} indicates the closing price of stock i on day t. The reason for choosing the closing price is that, compared to other price indicators, it intuitively reflects the trend of stocks. T is the time lag and e_{t} is the error term. Specifically, the parameters of N_{i} consist of the weights, W = (W^{1}, W^{2}, \dots, W^{L}), and biases, b = (b^{1}, b^{2}, \dots, b^{L}), of the L layers. To enable interpretability, we partition the network N_{i} into two parts: the first layer and the remaining L - 1 layers. At the first layer, the weight W^{1} = (W_{1}^{1, 1 : T}, W_{2}^{1, 1 : T}, \dots, W_{N}^{1, 1 : T}) \in R^{N \times T \times H} of N_{i} represents the influence of each stock on stock i over the historical T days. The embedding of the first layer is calculated as:

h_{t}^{1} = ReLU (\sum_{k = 1}^{T} (W^{1, k} c_{1 : N}^{t - k} + b^{1}))    (6)

Note that for \forall j \in N, if W_{j}^{1, 1 : T} is a zero matrix, stock j does not Granger-cause stock i, i.e., GC_{j, i}^{t} = 0; otherwise, stock j Granger-causes stock i, i.e., GC_{j, i}^{t} = 1. Subsequently, the closing price of stock i at time t is simulated:

c_{i}^{t} = FC (h_{t}^{1}; Θ) + e_{i}^{t}    (7)

where FC (\cdot) is the fully connected network with L - 1 layers, Θ denotes its training parameters, and e_{i}^{t} is the error term.

We utilize the weights W^{1} of N_{i} in the first layer to play the role of A_{n} in the VAR model of Equation (1). Thus, if the j-th matrix W_{j}^{1} contains zeros for all time steps, stock j does not Granger-cause stock i. The group lasso penalty divides the weights of the first layer into N groups and shrinks the groups that are ineffective to zero for each network. To identify the stocks that influence the i-th stock, the loss function is as follows:

L = \min_{W} {(c_{i}^{t} - N_{i} (c_{1 : N}^{t - T : t - 1}))}^{2} + λ \sum_{j = 1}^{N} {‖ W_{j}^{1} ‖}_{F}    (8)

where the hyperparameter λ controls the degree of group sparsity, and {‖ \cdot ‖}_{F} is the Frobenius matrix norm. The group lasso penalty uniformly penalizes the weights associated with all lags for stock j. Since the group lasso term in Equation (8) is not differentiable, the conventional gradient descent method is not directly applicable. Thus, proximal gradient descent, which handles objective functions that involve non-differentiable convex terms, is employed to update the network parameters. At each timestamp, a Granger causality graph, GC^{t}, is constructed to model Granger causality:

GC_{j, i}^{t} = \begin{cases} 1, & \text{if stock } j \text{ Granger-causes stock } i \\ 0, & \text{otherwise} \end{cases}    (9)
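To make the role of the group lasso penalty concrete, the sketch below shows only the proximal step for the penalty term of Equation (8), namely group-wise soft-thresholding of the first-layer weights, and then reads the Granger parents of stock i off the surviving groups as in Equation (9). It assumes a gradient step on the squared-error term has already been taken; the names `group_soft_threshold`, `granger_parents`, `lam`, and `lr`, as well as the toy weights, are hypothetical.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_F for one weight block."""
    norm = math.sqrt(sum(w * w for row in group for w in row))
    if norm <= threshold:
        # The whole group is set exactly to zero.
        return [[0.0 for _ in row] for row in group]
    scale = 1.0 - threshold / norm
    return [[scale * w for w in row] for row in group]


def granger_parents(first_layer, lam, lr):
    """first_layer[j] is the (T x H) block W_j^1 tied to input stock j.

    Returns the shrunk blocks and a 0/1 list whose j-th entry plays the
    role of GC_{j,i} for the target stock i modeled by this network.
    """
    shrunk = [group_soft_threshold(g, lr * lam) for g in first_layer]
    parents = [int(any(w != 0.0 for row in g for w in row)) for g in shrunk]
    return shrunk, parents


# Toy first-layer weights for N = 3 input stocks, T = 2 lags, H = 2 hidden units.
W1 = [
    [[0.9, -0.4], [0.3, 0.2]],    # stock 0: large weights
    [[0.01, 0.0], [0.0, -0.02]],  # stock 1: negligible weights
    [[0.0, 0.5], [0.5, 0.0]],     # stock 2: moderate weights
]
shrunk, parents = granger_parents(W1, lam=0.5, lr=0.1)
```

Because the threshold acts on the norm of a whole block, all lags of a stock are kept or removed together, which is what allows a zero block to be interpreted as the absence of Granger causality.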

4.2. Moralization

To tackle the challenge posed by the limited ability of GNNs to directly process directed graphs, we adopt a method that converts directed causal graphs into equivalent undirected moral graphs. This transformation makes the directed graphs compatible with GNNs. The transformation process consists of two steps. First, \forall x_{i} \in GC, connect every pair of nodes in Parent (x_{i}), where Parent (x_{i}) is the set of parent nodes of x_{i}. After that, replace the directed edges in GC with undirected edges.
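The two moralization steps can be sketched as follows. The adjacency convention `gc[j][i] = 1` for an edge from j to i follows Equation (9); the function name `moralize` and the four-node example are assumptions made for illustration.

```python
from itertools import combinations


def moralize(gc):
    """Convert a directed 0/1 adjacency matrix into its undirected moral graph."""
    n = len(gc)
    moral = [[0] * n for _ in range(n)]
    for i in range(n):
        parents = [j for j in range(n) if j != i and gc[j][i]]
        # Step 1: connect every pair of parents of node i.
        for p, q in combinations(parents, 2):
            moral[p][q] = moral[q][p] = 1
        # Step 2: keep each original edge, dropping its direction.
        for j in parents:
            moral[i][j] = moral[j][i] = 1
    return moral


# Directed edges: 0 -> 2, 1 -> 2, 2 -> 3.
gc = [[0, 0, 1, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1],
      [0, 0, 0, 0]]
moral = moralize(gc)
```

In this example, nodes 0 and 1 share the child 2, so the moral graph gains the edge 0-1 in addition to the undirected versions of the three original edges.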

4.3. Model Architecture

The overall architecture is illustrated in Figure 2 and consists of the following components: time feature representation module (TFR), relationship feature encoding module (RFE), relationship feature fusion module (RFF), and the final prediction layer. The input stock time series data are processed through these modules to generate predicted movement trends for each stock. The primary functions of each component are summarized as follows:

Time feature representation module (TFR): This module is devised to capture the temporal dependencies of historical stock prices on future fluctuations. We employ a classic time series analysis model, i.e., GRU, to extract the historical dynamic features of stocks, serving as input for subsequent modules.
Relationship feature encoding module (RFE): This module is designed to capture relational dependencies among related stocks and contains two sub-modules: a dynamic-relation-based feature extractor (DRE) and a static-relation-based feature extractor (SRE). DRE aims to capture dynamic relational dependencies, whereas SRE focuses on capturing static relational dependencies among stocks.
Relationship feature fusion module (RFF): To capture additional potential relational dependencies, RFF is designed to integrate both dynamic and static embeddings, generating a novel high-level stock representation.
Prediction layer: Finally, after the feature fusion process, the fused embeddings are input into the prediction layer, a fully connected network that generates predicted movement trends for each stock.

4.3.1. Temporal Feature Representation

Stock prices are highly influenced by both their own historical indicators and those of other stocks. The TFR module aims to capture the influence of each stock’s historical information on its own price. For a given stock i, the historical price attributes, X_{i} = (x_{i}^{t}, x_{i}^{t - 1}, \dots, x_{i}^{t - T + 1}) \in R^{T \times D}, are utilized as the input data for the TFR module, where x_{i}^{t} \in R^{D} denotes the attributes of stock i at time t, including opening price, closing price, high price, low price, and trading volume. The TFR module maps X_{i} to higher-dimensional features, which carry abundant historical information:

H_{i}, o_{i} = RNN (X_{i})    (10)

where H_{i} = (h_{i}^{t}, h_{i}^{t - 1}, \dots, h_{i}^{t - T + 1}) \in R^{T \times F_{1}} is the output sequence, o_{i} = h_{i}^{t} \in R^{F_{1}} is the final state, and F_{1} is the number of hidden units of the RNN. In this work, we use a GRU network as the RNN, which, in contrast to a vanilla RNN, has the remarkable ability to retain information in long-term sequences.

4.3.2. Relational Feature Encoding

In this paper, we consider four types of stock relationships: industry relationship, shareholding relationship, media relationship, and Granger causality relationship. Among these, the industry relationship is static, while the other three relationships are dynamic. For static relationships, we can directly capture dependencies of stocks by aggregating features according to static stock graphs. Dynamic relationships, in contrast, change over time, so it is necessary to capture the historical daily relational dependencies and consider how the accumulation of these dependencies impacts future stock fluctuations. Accordingly, we devise DRE and SRE for extracting dynamic and static relational features, respectively.

DRE aims to capture the influence of features from various related stocks at different time steps and to incorporate them into an embedding with dynamic relational features, according to the various dynamic relations. We obtain a comprehensive representation of stock i by taking a weighted sum of the output sequence generated by TFR over all neighboring stocks at each time step, for each dynamic relation:

{\tilde{h}}_{i}^{k, n} = σ (\sum_{j \in G_{k} (i)} \frac{1}{d_{i j}} W_{a}^{k} h_{j}^{n} + b_{a}^{k})    (11)

where n \in \{t, t - 1, \dots, t - T + 1\}, k \in \{1, 2, \dots, P\}, and P indicates the number of dynamic relations. {\tilde{h}}_{i}^{k, n} \in R^{D_{1}} is the resulting relational embedding, G_{k} (i) represents the set of stocks related to stock i under the k-th dynamic relation, d_{i j} balances the influence of different neighbor stocks on stock i, W_{a}^{k} \in R^{F_{1} \times D_{1}} is the weight matrix, and b_{a}^{k} \in R^{D_{1}} is the bias vector.

Subsequently, considering the temporal characteristics of dynamic features, we construct a temporal-attention-based GRU to integrate the relational embeddings of stocks at various time steps. First, the GRU generates a latent vector representation of historical features:

{\tilde{D}}_{i}^{k} = GRU ({\tilde{h}}_{i}^{k, t}, {\tilde{h}}_{i}^{k, t - 1}, \dots, {\tilde{h}}_{i}^{k, t - T + 1})    (12)

where {\tilde{D}}_{i}^{k} = ({\tilde{d}}_{i}^{k, t}, {\tilde{d}}_{i}^{k, t - 1}, \dots, {\tilde{d}}_{i}^{k, t - T + 1}) \in R^{T \times D_{2}}. In order to capture the impact of historical features across different time steps, we incorporate a temporal attention layer. The output, a_{i}^{k, t}, of this layer dynamically assigns a weight to the vector {\tilde{d}}_{i}^{k, t}:

a_{i}^{k, t} = \frac{\exp (\tanh ({\tilde{d}}_{i}^{k, t} w_{a^{'}} + b_{a^{'}}))}{\sum_{n = t - T + 1}^{t} \exp (\tanh ({\tilde{d}}_{i}^{k, n} w_{a^{'}} + b_{a^{'}}))}    (13)

where w_{a^{'}} \in R^{D_{2}} and b_{a^{'}} \in R are learnable parameters. After that, the vectors {\tilde{d}}_{i}^{k, n} are aggregated into a high-level vector representation, with a_{i}^{k, n} assigning a weight to {\tilde{d}}_{i}^{k, n} at each time step n from t - T + 1 to t. As a result, we obtain the dynamic embedding d_{i}^{t} \in R^{D_{d}} for each stock based on temporal attention:

d_{i}^{t} = \sum_{k = 1}^{P} α_{k} (ReLU (\sum_{n = t - T + 1}^{t} a_{i}^{k, n} {\tilde{d}}_{i}^{k, n}))    (14)

where P is the number of dynamic relations, and α_{k} controls the weight of the k-th dynamic feature.
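The temporal attention of Equations (13) and (14) can be sketched in plain Python as below. The GRU outputs are taken as given inputs; the function names, the toy sequences, and the values chosen for the attention parameters and for α_{k} are all assumptions made for the example.

```python
import math


def temporal_attention(seq, w, b):
    """seq: T vectors of length D2; w: vector of length D2; b: scalar.

    Returns the attention weights of Equation (13) and the attention-weighted
    sum over time steps used inside Equation (14).
    """
    scores = [math.exp(math.tanh(sum(x * wi for x, wi in zip(vec, w)) + b))
              for vec in seq]
    total = sum(scores)
    weights = [s / total for s in scores]
    dim = len(seq[0])
    pooled = [sum(a * vec[f] for a, vec in zip(weights, seq)) for f in range(dim)]
    return weights, pooled


def dynamic_embedding(per_relation_seqs, w, b, alphas):
    """Sum over relations k of alpha_k * ReLU(pooled_k), as in Equation (14)."""
    dim = len(per_relation_seqs[0][0])
    out = [0.0] * dim
    for seq, alpha in zip(per_relation_seqs, alphas):
        _, pooled = temporal_attention(seq, w, b)
        for f in range(dim):
            out[f] += alpha * max(0.0, pooled[f])
    return out


# Two dynamic relations, T = 3 time steps, D2 = 2 features per step.
seq_a = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5]]
seq_b = [[0.2, 0.2], [0.2, 0.2], [0.2, 0.2]]
w, b = [1.0, -1.0], 0.0
weights_a, pooled_a = temporal_attention(seq_a, w, b)
weights_b, pooled_b = temporal_attention(seq_b, w, b)
d_it = dynamic_embedding([seq_a, seq_b], w, b, alphas=[0.6, 0.4])
```

When every time step carries the same vector, as in `seq_b`, the attention weights are uniform and the pooled vector equals that common vector.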

Based on the characteristics of multiple static relations, SRE aims to capture the static relational embedding. We employ an attention-based GCN to capture the association strength between stocks according to the predefined static relations. Subsequently, we aggregate the features of the associated stocks according to the association strength between stocks.

For \forall i, j \in N, to mine the association strength between a stock pair, we first apply a linear transformation to extract their latent representations:

h_{i}^{' k} = W_{b}^{k} o_{i} + b_{b}^{k}, \quad h_{j}^{' k} = W_{b}^{k} o_{j} + b_{b}^{k}    (15)

where k \in \{1, 2, \dots, Q\}, and Q indicates the number of static relations. o_{i} and o_{j} are the final states generated by TFR, and W_{b}^{k} \in R^{F_{1} \times S_{1}} and b_{b}^{k} \in R^{S_{1}} are learnable parameters. Subsequently, the vectors h_{i}^{' k} and h_{j}^{' k} are concatenated to form a new vector. The concatenated vector is linearly transformed, parameterized by a learnable weight vector a_{s}^{k}. After the nonlinear transformation, the softmax function is employed to map the association strength values to the range [0, 1]:

e_{i j}^{k} = \frac{\exp (LeakyReLU (a_{s}^{k} [h_{i}^{' k} ‖ h_{j}^{' k}]))}{\sum_{l \in G_{k} (i)} \exp (LeakyReLU (a_{s}^{k} [h_{i}^{' k} ‖ h_{l}^{' k}]))}    (16)

According to the association strength scores of the neighbors of stock i, we obtain a more comprehensive representation by performing a weighted sum of the latent representations of all neighboring stocks:

s_{i}^{t} = \sum_{k = 1}^{Q} (β_{k} ReLU (\sum_{j \in G_{k} (i)} e_{i j}^{k} h_{j}^{' k}))    (17)

where s_{i}^{t} \in R^{D_{s}} indicates the static embedding, and β_{k} controls the weight of the k-th static feature.
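For a single static relation, the attention scores of Equation (16) and the inner aggregation of Equation (17) can be sketched as follows. The latent vectors of Equation (15) are taken as given; the function names, the LeakyReLU slope of 0.2, and the toy values are assumptions, since the source does not specify them. Subtracting the maximum logit before exponentiating is a standard numerical safeguard that leaves the softmax unchanged.

```python
import math


def leaky_relu(x, slope=0.2):
    # The slope is an assumed value for illustration.
    return x if x > 0 else slope * x


def attention_coefficients(h, i, neighbors, a):
    """h maps a stock id to its latent vector (a list); a has length 2 * len(h[i])."""
    logits = [leaky_relu(sum(w * x for w, x in zip(a, h[i] + h[j])))
              for j in neighbors]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def static_aggregate(h, i, neighbors, a):
    """ReLU of the attention-weighted sum of neighbor representations."""
    e = attention_coefficients(h, i, neighbors, a)
    dim = len(h[i])
    return [max(0.0, sum(e_ij * h[j][f] for e_ij, j in zip(e, neighbors)))
            for f in range(dim)]


# Stock 0 with neighbors 1 and 2 under one static relation.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
a = [0.5, -0.5, 1.0, 0.2]
e0 = attention_coefficients(h, 0, [1, 2], a)
s0 = static_aggregate(h, 0, [1, 2], a)
```

With Q relations, the per-relation outputs would be combined with the weights β_{k} as in Equation (17).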

4.4. Relational Feature Fusion

Subsequently, we aim to integrate the dynamic and static embeddings to form a new embedding, which preserves both the dynamic and static characteristics. The challenge is to effectively capture their interactions while preserving the independent features. To solve the above problems, we adopt a feature fusion method to capture potential interactions between

d_{i}^{t}

and

s_{i}^{t}

, while extracting high-level features.

To analyze the inherent associations between

d_{i}^{t}

and

s_{i}^{t}

, we perform multiple vector-matrix-vector (VMV) product,

d_{i}^{t} F_{i}^{[1 : F^{'}]} s_{i}^{t}

, where

F_{i}^{[1 : F^{'}]} \in R^{F^{'} \times D_{d} \times D_{s}}

denotes a stack of

F^{'}

learnable weight matrices. More specifically, we multiply the kth matrix with

d_{i}^{t}

and

s_{i}^{t}

by a VMV product, which yields a scalar intermediate result. Repeating this process for every matrix, we obtain

F^{'}

results, where each result represents a feature of the fused embedding.

To maintain their individual features,

d_{i}^{t}

and

s_{i}^{t}

are combined through concatenation, and linear transformation is performed using a weight matrix. Thus, the formula of RFF is as follows:

f_{i}^{t} = \tanh (d_{i}^{t} F_{i}^{[1 : F^{'}]} s_{i}^{t} + W_{f} [d_{i}^{t} ‖ s_{i}^{t}] + b_{f})

(18)

where ‖ indicates the concatenation, and

W_{f} \in R^{(D_{d} + D_{s}) \times F^{'}}

and

b_{f} \in R^{F^{'}}

are learnable parameters.
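A minimal sketch of Equation (18) in plain Python may help make the shapes concrete. It is an illustration under stated assumptions rather than the authors' code: the function name fuse and the toy values are hypothetical, and the linear weight is stored with one row per fused feature (F' rows of length D_d + D_s) so that it can be applied directly to the concatenated vector.

```python
import math

def fuse(d, s, F, W_f, b_f):
    """Eq. (18): f_k = tanh(d^T F_k s + W_f[k] . [d || s] + b_f[k])."""
    concat = d + s  # list concatenation plays the role of [d || s]
    fused = []
    for F_k, w_k, b_k in zip(F, W_f, b_f):
        # VMV product: one scalar per slice F_k of shape D_d x D_s
        bilinear = sum(d[p] * F_k[p][q] * s[q]
                       for p in range(len(d)) for q in range(len(s)))
        # Linear term keeps the individual dynamic and static features
        lin = sum(w * v for w, v in zip(w_k, concat))
        fused.append(math.tanh(bilinear + lin + b_k))
    return fused

# Toy example (hypothetical values): D_d = 2, D_s = 1, F' = 2.
d = [1.0, 2.0]
s = [3.0]
F = [[[0.1], [0.0]],   # first slice: only d[0] * s[0] interacts
     [[0.0], [0.0]]]   # second slice: no interaction term
W_f = [[0.0, 0.0, 0.0],
       [0.5, 0.0, 0.0]]
b_f = [0.0, 0.0]
f = fuse(d, s, F, W_f, b_f)  # two fused features
```

The first fused feature comes only from the bilinear interaction and the second only from the linear term, which shows how the two parts of the formula contribute separately.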

4.5. Prediction Layer

The relational embedding of stock i, generated by the RFF module, serves as input for the prediction layer. The prediction layer is constructed with a fully connected (FC) neural network. For stock movement prediction, an FC layer with softmax function is more suitable for classification,

{\hat{y}}_{i}^{t + 1} = Softmax (W_{p} f_{i}^{t} + b_{p})

(19)

where

{\hat{y}}_{i}^{t + 1}

is the output for stock i from the prediction layer parameterized by

W_{p} \in R^{F^{'} \times C}

,

b_{p} \in R^{C}

, and C represents the number of classes. The standard cross-entropy loss function can be employed for backpropagation to train the proposed model.
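The prediction step in Equation (19) and the cross-entropy loss can be sketched as follows. This is a simplified illustration, not the authors' training code: the function names are hypothetical, the weight matrix is stored with one row per class, and the loss is shown for a single sample with an integer class label.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(f, W_p, b_p):
    """Eq. (19): class probabilities Softmax(W_p f + b_p)."""
    logits = [sum(w * v for w, v in zip(row, f)) + b
              for row, b in zip(W_p, b_p)]
    return softmax(logits)

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])

# Toy example (hypothetical values): C = 2 classes, untrained zero weights.
f_i = [0.2, -0.1]
W_p = [[0.0, 0.0], [0.0, 0.0]]
b_p = [0.0, 0.0]
probs = predict(f_i, W_p, b_p)
loss = cross_entropy(probs, label=1)
```

With zero weights the model assigns probability 0.5 to each class, so the loss equals log 2, the value expected from an uninformed binary classifier.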

5. Experiments

5.1. Data

We validate the effectiveness of our proposed model using a dataset obtained from Tushare, which provides extensive information about the Chinese stock market (https://www.tushare.pro/ accessed on 21 October 2024).

More than 4000 stocks are filtered. The screening criteria are as follows: Firstly, to ensure data completeness, stocks that were not suspended from 1 January 2019 to 4 January 2022 were selected. After that, we ensured that during this three-year period, the stock prices were all above 10 yuan, as stocks below 10 yuan generally entail higher investment risk and are considered as microcap stocks with limited investment value. Among those identified, precisely 600 companies met both of the above conditions. We aim to predict the stock movement based on historical indicators, including opening price, closing price, high price, low price, volume, and other useful information in the market. Thus, stock movement prediction can be considered as a binary classification problem:

y = \begin{cases} 1, & close_{t}^{c} \geq close_{t - 1}^{c} \\ 0, & close_{t}^{c} < close_{t - 1}^{c} \end{cases}

(20)
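The labeling rule in Equation (20) can be sketched in a few lines of Python. The function name and the sample prices are hypothetical. Note that an unchanged closing price counts as an upward movement, because the rule uses a non-strict inequality.

```python
def movement_labels(close):
    """Eq. (20): label day t as 1 if close[t] >= close[t-1], else 0."""
    return [1 if today >= prev else 0
            for prev, today in zip(close, close[1:])]

# Hypothetical closing prices for one stock over four trading days.
labels = movement_labels([10.2, 10.5, 10.5, 10.1])
# labels == [1, 1, 0]
```

A series of n closing prices yields n - 1 labels, since the first day has no previous close to compare against.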

5.2. Baselines

To assess the performance of the proposed model, we compare it against two types of baselines: Euclidean-input-based and non-Euclidean-input-based methods.

5.2.1. Euclidean-Input-Based Methods

Long short-term memory (LSTM) [47] is one of the most widely recognized recurrent neural network (RNN) models used for processing time series data.
Gated recurrent unit (GRU) [48] is another model commonly used for processing time series data. In comparison to LSTM, GRU has a simpler structure, with only two gated units.
The dual-stage attention-based recurrent neural network (ALSTM) [16] employs a recurrent neural network model with a two-stage attention mechanism, consisting of an input attention layer and a temporal attention layer.

5.2.2. Non-Euclidean-Input-Based Methods

The graph convolutional network (GCN) [38] updates features by aggregating the information of neighboring nodes.
Temporal graph convolution (TGC) [5] makes the prediction by considering the sequential embedding and relational embedding.
The multi-graph convolutional gated recurrent unit (Multi-GCGRU) [6] emphasizes the diversity of relations by constructing three types of graphs to enhance the representations of cross effects.
Shared information for stock trend forecasting (HIST) [8] makes the prediction by combining predefined concepts, hidden concepts, and individual information.
Multi-relational graph attention ranking (MGAR) [39] employs a graph aggregation network that concurrently incorporates multiple stock relation graphs as input to examine the interactions between stocks, particularly focusing on similarity relationships.

5.3. Metrics

Following previous stock movement prediction works [21,49], we take two metrics as reference: accuracy (ACC) and Matthews correlation coefficient (MCC). The detailed definitions of the metrics are as follows.

ACC measures the proportion of correct predictions among the total number of predictions made. In the context of stock prediction, it is commonly used to evaluate the model’s accuracy in predicting the direction of stock price movements. It is calculated as:

ACC = \frac{TP + TN}{TP + FP + FN + TN}

(21)

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.

MCC is a measure of the quality of binary classifications, considering both true and false positives and negatives. In the realm of stock prediction, it serves as a valuable metric for assessing the overall performance of a model, taking into account its ability to correctly predict positive and negative movements in stock prices. It is calculated as:

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}}

(22)

Here, MCC ranges between −1 and +1, where +1 indicates perfect prediction, 0 indicates random prediction, and −1 indicates perfect inverse prediction.
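As a small worked example of Equations (21) and (22), the sketch below computes both metrics from lists of true and predicted labels. The function names and the sample labels are illustrative, and returning 0 when the MCC denominator vanishes is an assumed convention that the paper does not specify.

```python
import math

def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Eq. (21): share of correct predictions."""
    return (tp + tn) / (tp + fp + fn + tn)

def mcc(tp, tn, fp, fn):
    """Eq. (22); returns 0.0 if the denominator is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical labels for six trading days.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion(y_true, y_pred)   # (2, 2, 1, 1)
acc = accuracy(tp, tn, fp, fn)               # 4 / 6
m = mcc(tp, tn, fp, fn)                      # (4 - 1) / 9 = 1 / 3
```

Here four of the six predictions are correct, giving an ACC of about 0.667, while the MCC of about 0.333 is lower because it also accounts for the one false positive and the one false negative.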

5.4. Experimental Results

To demonstrate the effectiveness of the proposed model, we compare it with the above baselines in Table 2. The following observations can be made from the table:
Our method achieves the highest ACC and MCC, outperforming some of the latest stock prediction methods, such as Multi-GCGRU, TGC, MGAR, and HIST.
ALSTM, LSTM, and GRU, which solely consider the temporal dependencies among historical price information, achieve similar performance.
The standalone use of GCN yields the lowest accuracy among non-Euclidean-input-based methods due to its inability to handle sequential data.
Multi-GCGRU, TGC, and MGAR, which take into account both temporal dependency and stock relations, achieve higher ACC than ALSTM, LSTM, GRU, and GCN.
HIST achieves the second-best ACC. It models implicit relations by aggregating stocks with similar hidden concept characteristics. However, this approach essentially discovers connections between stocks and overlooks the causal influence of other companies on the target company. In contrast, our model captures not only the causal effects but also the dynamic and static characteristics of other predefined relations, while considering temporal dependency. Therefore, DSF-GNN outperforms the compared stock trend forecasting methods.
All models achieve low MCC values, which can be attributed to the significant volatility and high randomness of Chinese stock data. This randomness stems from factors such as macroeconomic conditions, policy changes, and market sentiment, and existing research also supports this view [50].

5.5. Ablation Experiments

We conduct ablation experiments to verify the effectiveness of different modules in the DSF-GNN, including the TFR, DRF, SRF, and RFF modules. To study the effects of these modules, we remove specific components and observe the results. Table 3 shows the comparative models and the results. The following conclusions can be drawn:

To assess the effectiveness of RFF, we remove it from DSF-GNN. Instead, we concatenate the outputs of DRF and SRF for prediction. The results demonstrate that incorporating the interaction between dynamic and static relational features, while preserving their independent characteristics, can enhance the prediction outcomes.
To assess the effectiveness of DRF and SRF, we conduct experiments by removing the DRF and SRF modules, respectively. As one of the modules is removed, the RFF module is not utilized. The results demonstrate that DRF and SRF contribute to the improvement in prediction performance.
Removing the TFR module would decrease the predictive performance. Therefore, the utilization of historical price indicators is crucial for stock trend forecasting.

5.6. Relations Comparison

In this part, we further investigate the effectiveness of different relations in stock prediction. We evaluate the contribution of each relation by removing it from DSF-GNN. As shown in Table 4, performance is similar when the shareholding, causal, or industry relation is removed. Among the ablated variants, accuracy is highest when media relations are removed (54.85 versus 54.97 for the full model), which indicates that media relations have limited usefulness in stock prediction. This can be explained by the lag in news dissemination and the questionable authenticity of some news content. Major shareholders, who hold a significant portion of the stocks, have access to firsthand news and can influence public opinion through the Internet. Minority shareholders, on the other hand, commonly lack the resources to obtain timely news, resulting in a lag in information acquisition. In some cases, when major shareholders intend to short-sell their own stocks, they can manipulate the market by generating positive news to attract small shareholders’ attention and then selling their stocks to other shareholders, which results in a subsequent decrease in the stock price.

However, for shareholding relation, the information released by the company every quarter is genuine, as other companies can only become shareholders after purchasing the stocks of the target company. Due to the sector rotation effect in the stock market, investors often select stocks based on industries. As a result, stocks within the same industry typically exhibit similar trends over a certain period of time. Therefore, industry relations are also of great importance.

5.7. Conclusions

In our research, we introduced a framework aimed at uncovering the causal relationships influencing changes in a company’s stock price, demonstrating its potential advantages in stock forecasting. First, we proposed the neural Granger causality model to identify Granger causal relationships between stocks, followed by a GNN-based model. We differentiated between explicit relations, including industry, shareholding, and media information, and implicit relations, such as Granger causality. Second, we introduced moralization to convert directed causal graphs into equivalent undirected moral graphs, making them compatible with GNNs. The GNN-based model includes a temporal feature representation module and a relational feature encoding module, used respectively to capture the influence of historical information on a stock’s own price and to explore the dynamic and static influences exerted by other stocks. Third, we presented a relational feature fusion module that integrates dynamic and static embeddings into a new feature, which captures the interactions between them while preserving their independent characteristics. Overall, our proposed approach addresses the challenges in discovering true causal relations between stocks, leveraging Granger causality modeling, and enhancing the predictive power of GNNs in stock movement prediction. Through extensive experimentation and analysis, our model demonstrates promising results and opens avenues for further research in this domain.

Author Contributions

Conceptualization, Y.L. and Z.L.; methodology, Y.L.; software, Y.L.; validation, Y.L., X.X. and Z.L.; resources, P.D. and B.Z. Data curation, Y.L. and X.X. Writing—original draft preparation, Y.L. and X.X. Writing—review and editing, Z.L., P.D. and B.Z. Visualization, Y.L. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Department of Science and Technology of Liaoning province (02110076523001) and Northeastern University, Shenyang, China (02110022124005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Menkveld, A.J. The economics of high-frequency trading: Taking stock. Annu. Rev. Financ. Econ. 2016, 8, 1–24.
2. Dash, R.; Dash, P.K. A hybrid stock trading framework integrating technical analysis with machine learning techniques. J. Financ. Data Sci. 2016, 2, 42–57.
3. Arya, A.N.; Xu, Y.L.; Stankovic, L.; Mandic, D.P. Hierarchical Graph Learning for Stock Market Prediction Via a Domain-Aware Graph Pooling Operator. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
4. Sawhney, R.; Agarwal, S.; Wadhwa, A.; Derr, T.; Shah, R.R. Stock selection via spatiotemporal hypergraph attention network: A learning to rank approach. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021.
5. Feng, F.; He, X.; Wang, X.L.; Cheng, L.; Yiqun, C. Temporal relational ranking for stock prediction. ACM Trans. Inf. Syst. (TOIS) 2018, 37, 1–30.
6. Ye, J.; Zhao, J.; Ye, K.; Xu, C. Multi-graph convolutional network for relationship-driven stock movement prediction. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR), Milano, Italy, 10–15 January 2021; pp. 6702–6709.
7. Chen, Y.; Wei, Z.; Huang, X. Incorporating corporation relationship via graph convolutional neural networks for stock price prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; pp. 1655–1658.
8. Xu, W.; Liu, W.; Wang, L.; Xia, Y.; Bian, J.; Yin, J.; Liu, T.Y. HIST: A graph-based framework for stock trend forecasting via mining concept-oriented shared information. arXiv 2021, arXiv:2110.13716.
9. Tian, H.; Zheng, X.; Zhao, K.; Liu, M.W.; Zeng, D.D. Inductive Representation Learning on Dynamic Stock Co-Movement Graphs for Stock Predictions. INFORMS J. Comput. 2022, 34, 1940–1957.
10. Cheng, R.; Li, Q. Modeling the momentum spillover effect for stock prediction via attribute-driven graph attention networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 1, pp. 55–62.
11. Li, W.; Bao, R.; Harimoto, K.; Chen, D.; Xu, J.; Su, Q. Modeling the stock relation with graph network for overnight stock movement prediction. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; pp. 4541–4547.
12. Papana, A.; Kyrtsou, C.; Kugiumtzis, D.; Diks, C. Financial networks based on Granger causality: A case study. Phys. A Stat. Mech. Appl. 2017, 482, 65–73.
13. Saha, S.; Gao, J.; Gerlach, R. A survey of the application of graph-based approaches in stock market analysis and prediction. Int. J. Data Sci. Anal. 2022, 14, 1–15.
14. Botunac, I.; Bosna, J.; Matetić, M. Optimization of Traditional Stock Market Strategies Using the LSTM Hybrid Approach. Information 2024, 15, 136.
15. Chen, K.; Zhou, Y.; Dai, F. A LSTM-based method for stock returns prediction: A case study of China stock market. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA, 29 October–1 November 2015.
16. Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G. A dual-stage attention-based recurrent neural network for time series prediction. arXiv 2017, arXiv:1704.02971.
17. Zhang, L.; Aggarwal, C.; Qi, G.J. Stock price prediction via discovering multi-frequency trading patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 2141–2149.
18. Feng, F.; Chen, H.; He, X.; Ding, J.; Sun, M.; Chua, T.S. Enhancing stock movement prediction with adversarial training. arXiv 2019, arXiv:1810.09936.
19. Md, A.Q.; Kapoor, S.; Junni, A.V.C.; Sivaraman, A.K.; Tee, K.F.; Sabireen, H.; Janakiraman, N. Novel optimization approach for stock price forecasting using multi-layered sequential LSTM. Appl. Soft Comput. 2023, 134, 109830.
20. Smith, N.; Varadharajan, V.; Kalla, D.; Kumar, G.R.; Samaah, F. Stock Closing Price and Trend Prediction with LSTM-RNN. J. Artif. Intell. Big Data 2024, 4, 877.
21. Wang, G.; Cao, L.; Zhao, H.; Liu, Q.; Chen, E. Coupling macro-sector-micro financial indicators for learning stock representations with less uncertainty. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 4418–4426.
22. Mittal, A.; Goel, A. Stock Prediction Using Twitter Sentiment Analysis; CS229; Stanford University: Stanford, CA, USA, 2021.
23. Kraus, M.; Feuerriegel, S. Decision support from financial disclosures with deep neural networks and transfer learning. Adv. Neural Inf. Process. Syst. 2017, 104, 38–48.
24. Li, X.; Wu, P.; Wang, W. Incorporating stock prices and news sentiments for stock market prediction: A case of Hong Kong. Inf. Process. Manag. 2020, 57, 102212.
25. Liapis, C.M.; Kotsiantis, S. Temporal Convolutional Networks and BERT-Based Multi-Label Emotion Analysis for Financial Forecasting. Information 2023, 14, 596.
26. Deng, S.; Zhu, Y.; Yu, Y.; Huang, X. An integrated approach of ensemble learning methods for stock index prediction using investor sentiments. Expert Syst. Appl. 2024, 238, 121710.
27. Ding, X.; Zhang, Y.; Liu, T.; Duan, J. Deep learning for event-driven stock prediction. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
28. Deng, S.; Mitsubuchi, T.; Shioda, K.; Shimada, T.; Sakurai, A. Combining technical analysis with sentiment analysis for stock price prediction. In Proceedings of the 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, Sydney, Australia, 12–14 December 2011; pp. 800–807.
29. Chen, W.; Yeo, C.K.; Lau, C.T.; Lee, B.S. Leveraging social media news to predict stock index movement using RNN-boost. Data Knowl. Eng. 2018, 118, 14–24.
30. Sawhney, R.; Agarwal, S.; Wadhwa, A.; Shah, R. Deep attentive learning for stock movement prediction from social media text and company correlations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 8415–8426.
31. Kim, R.; So, C.H.; Jeong, M.; Lee, S.; Kim, J.; Kang, J. HATS: A hierarchical graph attention network for stock movement prediction. arXiv 2019, arXiv:1908.07999.
32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
33. Zhang, Q.; Zhang, Y.; Yao, X.; Li, S.; Zhang, C.; Liu, P. A dynamic attributes-driven graph attention network modeling on behavioral finance for stock prediction. ACM Trans. Knowl. Discov. Data 2023, 18, 1–29.
34. Shi, Y.; Wang, Y.; Qu, Y.; Chen, Z. Integrated GCN-LSTM stock prices movement prediction based on knowledge-incorporated graphs construction. Int. J. Mach. Learn. Cybern. 2024, 15, 161–176.
35. Qian, H.; Zhou, H.; Zhao, Q.; Chen, H.; Yao, H.; Wang, J.; Liu, Z.; Yu, F.; Zhang, Z.; Zhou, J. MDGNN: Multi-Relational Dynamic Graph Neural Network for Comprehensive and Dynamic Stock Investment Prediction. arXiv 2024, arXiv:2402.06633.
36. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. arXiv 2017, arXiv:1706.02216.
37. Xing, R.; Cheng, R.; Huang, J.; Li, Q.; Zhao, J. Learning to Understand the Vague Graph for Stock Prediction with Momentum Spillovers. IEEE Trans. Knowl. Data Eng. 2024, 36, 1698–1712.
38. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
39. Song, G.; Zhao, T.; Wang, S.; Wang, H.; Li, X. Stock ranking prediction using a graph aggregation network based on stock price and stock relationship information. Inf. Sci. 2023, 643, 119236.
40. Wang, H.; Li, S.; Wang, T.; Zheng, J. Hierarchical Adaptive Temporal-Relational Modeling for Stock Trend Prediction. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–27 August 2021; pp. 3691–3698.
41. Tank, A.; Covert, I.; Foti, N.; Shojaie, A.; Fox, E.B. Neural granger causality. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4267–4279.
42. Tank, A.; Fox, E.B.; Shojaie, A. Granger causality networks for categorical time series. arXiv 2017, arXiv:1706.02781.
43. Arnold, A.; Liu, Y.; Abe, N. Temporal causal modeling with graphical granger methods. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 66–75.
44. Damos, P. Using multivariate cross correlations, Granger causality and graphical models to quantify spatiotemporal synchronization and causality between pest populations. BMC Ecol. 2016, 16, 33.
45. Mainali, K.; Bewick, S.; Vecchio-Pagan, B.; Karig, D.; Fagan, W.F. Detecting interaction networks in the human microbiome with conditional Granger causality. PLoS Comput. Biol. 2019, 15, e1007037.
46. Zhang, X.; He, Y.; Brugnone, N.; Perlmutter, M.; Hirn, M. MagNet: A neural network for directed graphs. Adv. Neural Inf. Process. Syst. 2021, 34, 27003–27015.
47. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
48. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
49. Li, S.; Liao, W.; Chen, Y.; Yan, R. PEN: Prediction-explanation network to forecast stock price movement with better explainability. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023.
50. Liao, S.; Xie, L.; Du, Y.; Chen, S.; Wan, H.; Xu, H. Stock trend prediction based on dynamic hypergraph spatio-temporal network. Appl. Soft Comput. 2024, 154, 111329.

Figure 1. Two stocks with Granger causality have low correlation.

Figure 2. The framework of DSF-GNN. DSF-GNN contains four steps: temporal feature representation (TFR), relational feature encoding (RFE), relational feature fusion (RFF), and prediction.

Table 1. The comparison between related works.

	Reference	Relation Type	Pros	Cons
Euclidean-input-based	LSTM [14,15]	Dynamic	The time-based patterns in historical price indicators are effectively utilized.	Failing to recognize that the impacts of various historical trading days differ.
	MLS-LSTM [19]	Dynamic, implicit	Taking into account the varying impacts of different historical trading days.	Overlooking the aggregate effect of associated stocks.
	ALSTM [16]	Dynamic, implicit
	LSTM-RNN [20]	Dynamic, implicit
Graph-based	GCN [38]	Static, implicit	A stock graph efficiently reveals impacts from linked stocks.	A static graph cannot model the complex, dynamic relationships between stocks.
	TGC [5]	Static, explicit	Dynamically learn and capture interactions between stocks.	Using one graph for prediction limits integrating multiple stock graphs.
	ADGAT [10]	Dynamic, implicit	Dynamically learn and capture interactions between stocks.
Graph-based	DGATS [33]	Static, explicit	Explores multifaceted stock relationships, capturing interactions single graphs may miss.	The main focus is on the correlation between related stocks, without fully exploring causal relationships.
	MGAR [39]	Dynamic, explicit
	MDGNN [35]	Dynamic, implicit

Table 2. Performance comparison with baseline models.

	Methods	ACC	MCC
Euclidean-input-based methods	ALSTM	52.55	0.0613
	LSTM	52.56	0.0596
	GRU	52.72	0.0620
Non-Euclidean-input-based methods	GCN	51.49	0.0295
	Multi-GCGRU	52.95	0.0582
	TGC	53.15	0.0596
	HIST	53.56	0.0618
	MGAR	53.34	0.0602
	Proposed Model	54.97	0.0821

Table 3. Ablation experiments for verifying the effectiveness of different modules. TFR indicates the temporal feature representation module; DRF and SRF represent the dynamic and static relational feature extractors, respectively; and RFF is the relational feature fusion module.

TFR	DRF	SRF	RFF	ACC	MCC
✓	✓	✓	×	53.86	0.0659
✓	✓	×	×	53.58	0.0632
✓	×	✓	×	53.21	0.0564
×	✓	✓	✓	54.23	0.0712
✓	✓	✓	✓	54.97	0.0821

Table 4. Relational comparison. Removal of the shareholding, causal, media, and industry relations is indicated by w/o -S, w/o -C, w/o -M, and w/o -I, respectively.

	ACC	MCC
w/o -S	53.41	0.0592
w/o -C	53.87	0.0656
w/o -M	54.85	0.0801
w/o -I	53.58	0.0632
DSF-GNN	54.97	0.0821

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
