Article

Double Decomposition and Fuzzy Cognitive Graph-Based Prediction of Non-Stationary Time Series

by Junfeng Chen 1,2,*, Azhu Guan 3 and Shi Cheng 4

1 College of Artificial Intelligence and Automation, Hohai University, Changzhou 213200, China
2 Jiangsu Key Laboratory of Power Transmission & Distribution Equipment Technology, Hohai University, Changzhou 213200, China
3 College of Information Science and Engineering, Hohai University, Changzhou 213200, China
4 School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(22), 7272; https://doi.org/10.3390/s24227272
Submission received: 8 October 2024 / Revised: 2 November 2024 / Accepted: 11 November 2024 / Published: 14 November 2024
(This article belongs to the Special Issue Emerging Machine Learning Techniques in Industrial Internet of Things)

Abstract
Deep learning models, such as recurrent neural network (RNN) models, are suitable for modeling and forecasting non-stationary time series but are not interpretable. A prediction model that combines interpretability with high accuracy can improve decision makers’ trust in the model and provide a basis for decision making. This paper proposes a double decomposition strategy based on wavelet decomposition (WD) and empirical mode decomposition (EMD). We construct a prediction model based on high-order fuzzy cognitive maps (HFCM), called the WE-HFCM model, which combines interpretability with strong reasoning ability. Specifically, we use the WD and EMD algorithms to decompose the time series signal and deeply extract its high-frequency, low-frequency, time-domain, and frequency-domain features. Then, the ridge regression algorithm is used to learn the HFCM weight vector for modeling and prediction. Finally, we apply the proposed WE-HFCM model to stationary and non-stationary datasets in simulation experiments and compare the predicted results with those of the autoregressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA), and long short-term memory (LSTM) models. For stationary time series, the prediction accuracy of the WE-HFCM model is about 45% higher than that of the ARIMA model, about 35% higher than that of the SARIMA model, and about 16% higher than that of the LSTM model. For non-stationary time series, the prediction accuracy of the WE-HFCM model is 69% higher than that of the ARIMA and SARIMA models.

1. Introduction

Time series forecasting can help enterprises and governments make reasonable economic decisions and is widely used in many fields. For example, in the economic field, it can be used to predict future sales, stock prices, economic growth rates, etc. [1]. A time series forecasting model in meteorology can predict future temperature, rainfall, and other meteorological conditions to help people make reasonable travel and production plans [2]. In transportation, it can be used to predict future traffic flow and help traffic management departments carry out traffic scheduling and planning [3]. In the energy field, time series prediction models can predict elements of future power generation, such as power consumption, and provide a reasonable basis for decision making in power-dispatching departments [4,5,6]. In addition, time series prediction models have been widely used in medicine [7,8], tourism [9], industry [10], and other fields. Therefore, the study of time series prediction is of great significance.
Time series can be divided into stationary and non-stationary time series. The observed values of a stationary series fluctuate around a fixed level. Although the degree of fluctuation differs across periods, there is no specific rule, and the fluctuation can be regarded as random. A non-stationary series exhibits trend, seasonality, or periodicity and may contain only one of these components or a combination of several [11]. Ghaderpour et al. [12] derived the underlying probability distribution function for the normalized least-squares wavelet spectrogram; their method can simultaneously estimate trend and seasonal components in a time series while accounting for their correlation, which significantly improves component estimation. Baidya et al. [13] proposed a novel time series forecasting (TSF) model designed to address the challenges posed by real-life data, delivering accurate forecasts in both multivariate and univariate settings. In real life, most series are non-stationary, yet stationarity is an essential premise for time series prediction modeling. Dealing with large-scale, non-stationary time series with changing trends and rapid variations therefore remains challenging, and explaining the latent features of non-stationary time series and their correlations is still a complex problem.
In recent years, with the rapid development of deep learning, it has been widely applied to time series prediction in various fields. Treating the model as a black box, researchers have developed several time series prediction models with excellent performance, including models based on long short-term memory (LSTM) [8,14], convolutional neural networks (CNN) [15], and the Transformer [16]. Li et al. [17] converted time series values into images, which were then fed as inputs to a CNN for prediction. Nie et al. [18] used a CNN to predict the remaining life of rolling bearings and showed experimentally that the proposed model is superior to the traditional model. However, CNNs lose information when too many layers are stacked. Zan et al. [19] used the TCN-TPA prediction model to forecast fused meteorological data, avoiding the problems of vanishing and exploding gradients. Some researchers have applied artificial neural networks to traffic flow prediction [20] and room temperature prediction [21]. Compared with traditional time series prediction models, deep learning-based models can mine more feature information. However, they behave as black boxes and require long training times, which makes them difficult for people to understand. Therefore, designing a model with both interpretability and good prediction accuracy is of great importance.
Some scholars have proposed applying fuzzy cognitive maps (FCMs) [22] to time series prediction. FCMs have a strong capacity for fuzzy reasoning and semantic understanding, making them a powerful tool for constructing interpretable time series prediction models. The nodes of an FCM represent events, features, or targets; the connections between nodes represent relationships; and both node values and connection strengths can be expressed as fuzzy values. Because of their interpretability and causal reasoning ability, FCMs have been widely used in time series prediction in recent years. Stach et al. [23] used real-coded genetic algorithms to learn fuzzy cognitive maps and make numerical and linguistic predictions of time series. To improve the prediction accuracy of the FCM model, Lu et al. [24] designed a high-order fuzzy cognitive map to model and predict time series. To address the low precision, hyperparameter sensitivity, and non-robust prediction results of FCM-based time series models, Jin et al. [25] proposed an SO2 concentration prediction method based on EMD and LSTM; their results show that EMD can effectively reduce the non-stationarity of the original time series and improve prediction accuracy. Some scholars have also proposed handling non-stationary time series with the wavelet transform. Aussem et al. [26] proposed a wavelet-based feature decomposition strategy and combined it with recurrent neural networks for financial forecasting.
To construct an interpretable model and address the neglect of frequency-domain features in time series data, this paper applies wavelet decomposition (WD), empirical mode decomposition (EMD), and HFCM to time series prediction. The high-frequency component of the wavelet decomposition is still non-stationary; since EMD can reduce the non-stationarity of a sequence, EMD is used to decompose the high-frequency components after wavelet decomposition. The low-frequency component of the wavelet decomposition and the IMF components of the empirical mode decomposition are then taken as the input of the HFCM, and the ridge regression algorithm optimizes the parameters. We call this model WE-HFCM. The ridge regression algorithm used to learn the HFCM weights can effectively optimize the model parameters when dealing with large-scale time series, so WE-HFCM can be effectively applied to large-scale time series with hundreds or thousands of data points. Finally, to demonstrate the prediction model’s performance, this paper compares it with the classical time series prediction model, the autoregressive integrated moving average (ARIMA), and the deep learning model, LSTM, on two non-stationary and two stationary time series datasets.
The main contributions are as follows:
(1)
We design a double decomposition stage, which extracts the low-frequency and high-frequency features of the time series by wavelet decomposition and EMD and smooths the non-stationary time series.
(2)
We construct a WE-HFCM model to increase interpretability. By aggregating the eigenvalues of different frequencies and making better use of the critical information in the latent features of the time series, representation learning of node relations is realized, and a high-order fuzzy cognitive map (HFCM) is constructed for prediction.
(3)
Comparison and ablation experiments show that the proposed method better predicts non-stationary univariate time series.

2. Materials and Methods

2.1. Datasets

We select benchmark time series with different statistical characteristics from different fields to test the validity of the proposed model. The non-stationary time series are the stock opening price (open-price) and wind speed (wind-speed) datasets. The open-price dataset records the daily opening price of a stock from 25 November 2015 to 17 November 2017, with a total of 500 data points. The wind-speed dataset was used by our research group in a competition; it records the wind speed at one location every 15 min from 2 January 2021 to 11 January 2021, for a total of 1000 data points. The stationary time series are the sunspot and daily minimum temperature (min-temp) datasets. The sunspot dataset records annual sunspot numbers from 1700 to 1987, with 289 observations. The min-temp dataset contains 800 data items. The open-price, sunspot, and min-temp datasets are available on Baidu’s website.

2.2. Wavelet Decomposition

WD decomposes the original signal into high-frequency (HF) and low-frequency (LF) components through wavelet basis functions. HF represents changes in the details of the original signal data, and LF represents the overall trend of the original signal data. The decomposition and refactoring process is shown in Figure 1. The corresponding fast algorithm in wavelet decomposition reconstruction is called the Mallat algorithm [27], expressed as follows:
$$\begin{cases} \mathrm{LF}_{i+1}(t) = d \times C_i(t) \\ \mathrm{HF}_{i+1}(t) = g \times C_i(t) \end{cases} \tag{1}$$

where $i = 0, 1, 2, \ldots, N$; $d$ is a low-pass filter for the low-frequency signal; $g$ is a high-pass filter for the high-frequency signal; $N$ is the number of decomposition layers; $t$ is the time point; and $C_0(t)$ is the original input time series.
The wavelet coefficients and the lengths of the components after WD are inconsistent and do not have the characteristics of the actual sequence, so the components need to be reconstructed, as shown in Figure 1b.
$$C_d = \tilde{d} \times A_{d+1} + \tilde{g} \times D_{d+1} \tag{2}$$

where $d = N-1, N-2, \ldots, 0$, and $\tilde{d}$ and $\tilde{g}$ are the dual operators of $d$ and $g$, respectively.
The reconstructed time series $C_0$ is expressed as follows:

$$C_0 = C_N + D_1 + D_2 + \cdots + D_N \tag{3}$$
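To make the decomposition concrete, the following is a minimal sketch of Equations (1)–(3), assuming the PyWavelets package and an illustrative Daubechies (db4) basis with N = 3 levels; the paper does not specify its wavelet family, library, or level count, so these are placeholder choices.

```python
# Minimal sketch of multi-level wavelet decomposition and reconstruction.
# The db4 basis and N = 3 levels are illustrative assumptions.
import numpy as np
import pywt

signal = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.1 * np.random.randn(512)

N = 3  # number of decomposition layers
coeffs = pywt.wavedec(signal, "db4", level=N)  # [A_N, D_N, ..., D_1]

# Reconstruct each component at the original resolution so that, as in
# Equation (3), the low-frequency part plus all detail parts resum to C_0.
components = []
for i in range(len(coeffs)):
    kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
    components.append(pywt.waverec(kept, "db4")[: len(signal)])

assert np.allclose(np.sum(components, axis=0), signal, atol=1e-8)
```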

2.3. EMD

EMD [28] is a powerful tool that decomposes a time series into intrinsic mode functions (IMFs) and a residual. The method arranges the details of the time series in order from high frequency to low frequency and has clear advantages in handling unstable and aperiodic signals. Each IMF satisfies two conditions: the number of extreme points and the number of zero crossings of the local signal differ by at most one, and the upper and lower envelopes of every part of the curve are symmetric.
Wavelet decomposition (WD) produces the low-frequency component A and the high-frequency components D. Since the high-frequency part is still a non-stationary sequence, it is summed and denoted $C_d(t)$. We perform EMD on $C_d(t)$ [20,29] as follows; a code sketch is given after Equation (7).
(1)
We interpolate the time series $C_d(t)$ with cubic splines and connect the extreme points to form the upper and lower envelopes $e_{min}(t)$ and $e_{max}(t)$. The average envelope $m(t)$ is calculated as in (4).
(2)
The intrinsic mode function (IMF) $d_1$ is defined as the difference between the time series $C_d(t)$ and the mean envelope $m(t)$, as shown in (5).
(3)
The maximum-frequency component of the time series $C_d(t)$ is determined as $c_i$ $(i = 1, 2, \ldots, n)$ and separated from $C_d(t)$, as shown in Equation (6). We continue the decomposition with $r_1$ as input. The complete decomposition formula is shown in (7).
$$m(t) = \left( e_{min}(t) + e_{max}(t) \right) / 2 \tag{4}$$

$$d_1 = C_d(t) - m(t) \tag{5}$$

$$r_1 = C_d(t) - c_1 \tag{6}$$

$$C_d(t) = \sum_{i=1}^{n} d_i(t) + r_n \tag{7}$$

where $C_d(t)$ is the sum of the high-frequency parts of the wavelet-decomposed sequence, $m(t)$ is the average of the upper and lower envelopes of the extreme points, $d_i(t)$ is the decomposed $\mathrm{IMF}_i$, $d_1$ is $\mathrm{IMF}_1$, $c_1$ is the maximum-frequency component of the input sequence, $r_1$ is the $C_d(t)$ sequence with the $c_1$ part removed, and $r_n$ is the residual term.
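The sifting steps of Equations (4)–(7) are implemented by standard EMD libraries. Below is a minimal sketch assuming the EMD-signal (PyEMD) package, which the paper does not name, with a synthetic two-tone signal standing in for the summed high-frequency component $C_d(t)$.

```python
# Minimal sketch of EMD on a stand-in for the summed high-frequency part
# C_d(t); the envelope averaging and sifting of Equations (4)-(7) run inside
# emd(). Assumes the EMD-signal (PyEMD) package.
import numpy as np
from PyEMD import EMD

t = np.linspace(0, 1, 1000)
c_d = np.sin(50 * np.pi * t) + 0.5 * np.sin(120 * np.pi * t)

emd = EMD()
emd.emd(c_d)
imfs, residue = emd.get_imfs_and_residue()  # IMF_1..IMF_n and r_n

# Equation (7): the IMFs plus the residual reproduce the input.
assert np.allclose(imfs.sum(axis=0) + residue, c_d)
print(f"{imfs.shape[0]} IMFs extracted")
```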

2.4. FCMs

FCMs [22] were developed by Kosko based on Axelrod’s cognitive maps, extending the ternary relation between concepts (−1, 0, 1) to a fuzzy relation on the interval [−1, 1], which makes FCMs more informative. An FCM is a graph structure that connects causal events, participation values, goals, and trends in a fuzzy feedback dynamic system through arcs between concepts. The nodes are concepts, entities, etc., and the arcs represent causal relationships between them. The degree of causal influence can be expressed by fuzzy values in [0, 1] or described in natural language, such as very weak, weak, medium, and strong.
The semantics of a standard FCM are represented by a 4-tuple $\langle C, W, A, f \rangle$. An FCM consists of $n$ nodes, where $X = \{X_1, X_2, \ldots, X_n\}$ is the set of nodes and $W$ is an $n \times n$ weight matrix:

$$W = \begin{bmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{bmatrix} \tag{8}$$
The state value of node $X_i$ at time $t+1$ is determined by the state values and weights of all nodes connected to it at time $t$. In this paper, $X$ is the set formed by the low-frequency component A from WD and the components $d_i(t)$ from EMD. The state value of node $X_i$ at time $t+1$ can be expressed by (9):

$$A_i(t+1) = f\left( \sum_{j=1}^{n} w_{ji} A_j(t) \right) \tag{9}$$

where $A_i(t)$ and $A_i(t+1)$ represent the state value of node $X_i$ at times $t$ and $t+1$, respectively, $t = 1, 2, 3, \ldots, T$, and $f$ is the activation function.
$$w_{ji} \in [-1, 1]: \quad w_{ji} > 0, \quad w_{ji} = 0, \quad w_{ji} < 0 \tag{10}$$

The weight $w_{ji}$ in (10) reflects the degree and direction of the causal influence between node $X_j$ and node $X_i$. If $w_{ji} > 0$, the two nodes are positively correlated; if $w_{ji} < 0$, they are negatively correlated; and if $w_{ji} = 0$, there is no relationship between the two nodes.
The current state of the FCM is determined not only by the state at the previous moment but also by earlier states. Stach et al. [30] introduced higher-order state values into the FCM model to enhance its approximation ability. The calculation of the HFCM of order $k$ is shown in (11):

$$A_i(t+1) = f\left( \sum_{j=1}^{n} \left[ w_{ji}^{1} A_j(t) + w_{ji}^{2} A_j(t-1) + \cdots + w_{ji}^{k} A_j(t-k+1) \right] + w_{i0} \right) \tag{11}$$

where $w_{ji}^{k}$ represents the relation of the $j$-th node to the $i$-th node at the $k$-th time lag, and $w_{i0}$ is the constant bias of the $i$-th node.
Figure 2 shows a fuzzy cognitive map with five nodes and its weight matrix. A one-way arrow pointing from $X_2$ to $X_1$ indicates that node $X_2$ influences node $X_1$, with weight $w_{21}$.
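To make the update rule concrete, the following is a minimal sketch of one order-k HFCM step per Equation (11), written with NumPy; the node states, weights, bias, and the tanh activation are illustrative placeholders rather than values learned by the paper’s method.

```python
# Minimal sketch of one order-k HFCM state update (Equation (11)).
# All inputs are random placeholders; f = tanh is an assumed activation.
import numpy as np

def hfcm_step(history, W, bias, f=np.tanh):
    """history: (k, n) array with rows A(t), A(t-1), ..., A(t-k+1).
    W: (k, n, n) array where W[m, j, i] is w_ji^(m+1).
    bias: (n,) array holding w_i0. Returns A(t+1) for all n nodes."""
    k, n = history.shape
    s = bias.copy()
    for m in range(k):          # sum over the k time lags
        s += history[m] @ W[m]  # adds sum_j w_ji^(m+1) * A_j(t-m) for each i
    return f(s)

rng = np.random.default_rng(0)
k, n = 2, 4  # a second-order HFCM with four nodes, as in Section 2.5
print(hfcm_step(rng.uniform(-1, 1, (k, n)),
                rng.uniform(-1, 1, (k, n, n)),
                rng.uniform(-1, 1, n)))
```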

2.5. WE-HFCM Prediction Model

The nodes and the relationship weights are the two key factors of an HFCM, and obtaining meaningful nodes and a node relation matrix is a challenging research problem: a univariate time series is a one-dimensional numerical series and cannot directly form the multi-node structure of an HFCM. This paper proposes a double decomposition strategy based on WD and EMD to solve the time series prediction problem within the HFCM framework; we call the resulting model WE-HFCM.
Figure 3 illustrates the process of time series prediction with WE-HFCM. The original time series is first normalized, and the numerical data are transformed into HFCM nodes through WD and EMD. WD decomposes the time series into a low-frequency component and a high-frequency component, and EMD further decomposes the summed high-frequency component into multiple IMFs. The input of the HFCM is the set of the low-frequency component and the multiple IMFs. The weight matrix of the HFCM is then optimized with the ridge regression algorithm. Finally, at each time step, the values of all nodes are summed, and the predicted value is output.

2.5.1. Double Decomposition of Time Series

Wavelet decomposition provides signal analysis in both time and frequency, allowing decision makers to observe time series at different resolution levels. However, the high-frequency components obtained by wavelet decomposition still contain non-stationary subsequences, and EMD has clear advantages in dealing with unstable and non-periodic signals. Therefore, this paper introduces EMD to further decompose the high-frequency part of the WD output. The double decomposition of the time series is shown in Figure 4. We first normalize the input time series and map it to the range [−1, 1]. Then, through discrete wavelet transformation and reconstruction, the normalized time series is decomposed into the low-frequency component A and the high-frequency components D by (1) and (2). The low-frequency component represents the overall trend of the input time series, and the high-frequency part represents its detailed information. As far as we know, previous studies focus only on the low-frequency component, ignoring the high-frequency information in the time series. Therefore, this paper sums the high-frequency components and decomposes the sum through EMD to obtain multiple IMFs, reflecting the intrinsic characteristics of the time series.
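A minimal sketch of this stage is given below, under the same assumptions as the earlier snippets (PyWavelets with an illustrative db4 basis and the EMD-signal package): WD yields the low-frequency component A, the high-frequency details are summed, and EMD splits the sum into IMFs, which together with A form the HFCM node series.

```python
# Minimal sketch of the double decomposition stage. The library, wavelet
# basis, and level count are illustrative assumptions.
import numpy as np
import pywt
from PyEMD import EMD

def double_decompose(series, wavelet="db4", level=3):
    coeffs = pywt.wavedec(series, wavelet, level=level)  # [A_N, D_N, ..., D_1]
    # Reconstruct only the low-frequency component A at full resolution.
    approx = pywt.waverec(
        [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]], wavelet
    )[: len(series)]
    c_d = series - approx  # summed high-frequency part C_d(t)
    emd = EMD()
    emd.emd(c_d)
    imfs, _ = emd.get_imfs_and_residue()
    # Each row is one HFCM node series: [A, IMF_1, ..., IMF_n].
    return np.vstack([approx, imfs])

nodes = double_decompose(np.cumsum(np.random.default_rng(1).normal(size=512)))
print(nodes.shape)  # (1 + number of IMFs, 512)
```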

2.5.2. Ridge Regression for Learning HFCM

Ridge regression is a regularized form of linear regression that is well suited to datasets with correlated predictors [31]. It adds a penalty term to the least squares objective and shrinks the coefficients, mitigating the shortcomings of ordinary linear regression.
The multivariate time series obtained from wavelet decomposition and EMD is used to learn the weight matrix of the HFCM. In the following, we use a second-order HFCM as an example to illustrate how to learn HFCM weights by ridge regression; the method extends easily to HFCMs of any order. Wu et al. [32] pointed out that the problem of learning the weight matrix of an FCM can be reduced to learning the local connections of each node separately. As shown in Figure 5, where different spheres represent different nodes, we adopt the same decomposition strategy in the HFCM learning method. First, a subnetwork is constructed between node i and its neighboring nodes, and the HFCM learning problem with four nodes is decomposed into four subproblems, one per subnetwork. Each subproblem is essentially a signal reconstruction problem, involving the difference between the available sequence and the generated sequence and the sparse structure from all nodes to a specific node, and each is optimized by ridge regression. Taking node $X_2$ as an example, we learn the structure of node $X_2$ from nodes $X_1$, $X_2$, $X_3$, and $X_4$ and obtain the relationship from node $X_2$ to node $X_1$ with $w_{21} = 0.68$. Finally, after learning the neighbors of all nodes, we combine the local connections into the whole HFCM.
The nonlinear dynamic equation of the HFCM is linearized by the inverse transformation, expressed as

$$\varphi^{-1}\left( A_i(t+1) \right) = \sum_{j=1}^{n} \left[ w_{ij}^{1} A_j(t) + w_{ij}^{2} A_j(t-1) \right] + w_{i0} \tag{12}$$

where $\varphi^{-1}$ is the inverse of the transfer function $\varphi$. Given a time series of length $L$ observed at successive time steps $t$, the transformed dynamic Equation (12) can be rewritten in vector form:
$$Y_i = X W_i \tag{13}$$

where $Y_i$ is the vector containing $\varphi^{-1}(A_i(t+1))$, $X$ is the state matrix of all node states $A_j(t)$ at the different times $t$, and $W_i$ is the weight vector between all nodes and the $i$-th node. Equations (14)–(16) give the three variables:

$$Y_i = \begin{bmatrix} \varphi^{-1}(A_i(3)) \\ \varphi^{-1}(A_i(4)) \\ \vdots \\ \varphi^{-1}(A_i(L)) \end{bmatrix} \tag{14}$$

$$W_i^{T} = \begin{bmatrix} w_{i1}^{1} & w_{i1}^{2} & w_{i2}^{1} & w_{i2}^{2} & \cdots & w_{iN_c}^{1} & w_{iN_c}^{2} & w_{i0} \end{bmatrix} \tag{15}$$

$$X = \begin{bmatrix} A_1(2) & \cdots & A_{N_c}(1) & 1 \\ A_1(3) & \cdots & A_{N_c}(2) & 1 \\ \vdots & & \vdots & \vdots \\ A_1(L-1) & \cdots & A_{N_c}(L-2) & 1 \end{bmatrix} \tag{16}$$
Ridge regression is used to solve the following optimization problem, which improves the generalization ability of the WE-HFCM model and determines the local connections of the $i$-th node:

$$\min_{W_i} \frac{1}{2L} \left\| Y_i - X W_i \right\|_2^2 + \alpha \left\| W_i \right\|_2^2 \tag{17}$$

where $\left\| W_i \right\|_2^2 = \sum_k W_{ik}^2$ sums over the $k N_c + 1$ entries of $W_i$ for an HFCM of order $k$, and $\alpha$ is a regularization parameter that is generally non-negative: the greater the value of $\alpha$, the greater the shrinkage and the stronger the model’s robustness to collinearity. Equation (17) determines the weight vector $W_i$ between all nodes and the $i$-th node. We used the ridge regression implementation [33] from the Python library scikit-learn to learn the weight vector.
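The following is a minimal sketch of learning one node’s incoming weights with scikit-learn’s Ridge estimator, following Equations (13)–(17) for a second-order HFCM; the tanh transfer function (hence the arctanh inverse), the clipping guard, and the random node series are our illustrative assumptions.

```python
# Minimal sketch of learning node i's weight vector W_i by ridge regression
# (Equations (13)-(17)). phi = tanh (so phi^-1 = arctanh) is an assumed
# transfer function; the node series A is a random placeholder.
import numpy as np
from sklearn.linear_model import Ridge

def learn_node_weights(A, i, k=2, alpha=1e-12):
    """A: (L, n) matrix of node states. Returns (coefficients, intercept w_i0)."""
    L, n = A.shape
    rows, targets = [], []
    for t in range(k - 1, L - 1):
        # One row of X: [A_1(t)..A_n(t), A_1(t-1)..A_n(t-1)], per Equation (16).
        rows.append(np.concatenate([A[t - m] for m in range(k)]))
        # One entry of Y_i: phi^-1(A_i(t+1)), clipped to keep arctanh finite.
        targets.append(np.arctanh(np.clip(A[t + 1, i], -0.999, 0.999)))
    model = Ridge(alpha=alpha, fit_intercept=True)  # intercept acts as w_i0
    model.fit(np.asarray(rows), np.asarray(targets))
    return model.coef_, model.intercept_

A = np.random.default_rng(2).uniform(-0.9, 0.9, size=(200, 4))
coef, w0 = learn_node_weights(A, i=1)
print(coef.shape)  # (k * n,) = (8,)
```

Repeating this fit for every node and stacking the resulting vectors yields the full HFCM weight matrix.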

2.6. Data Preprocessing and Evaluation Indicators

2.6.1. Data Preprocessing

The amplitude signal of the time series is used as the input to the proposed model and must first be normalized. This paper uses the Min-Max normalization method to map the input time series uniformly to the range [−1, 1]. Let $X_{max}$ and $X_{min}$ denote the maximum and minimum values of the original time series, and let $X_{high}$ and $X_{low}$ denote the maximum and minimum values of the normalized time series. We normalize the time series from the range $[X_{min}, X_{max}]$ to the range $[X_{low}, X_{high}]$ by (18):

$$\bar{X} = \frac{\left( X - X_{min} \right)\left( X_{high} - X_{low} \right)}{X_{max} - X_{min}} + X_{low} \tag{18}$$

where $X$ represents the original time series and $\bar{X}$ represents the normalized time series.
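A minimal sketch of this preprocessing step follows; the target range [−1, 1] matches the paper’s setting, while the sample values are arbitrary.

```python
# Minimal sketch of Min-Max normalization to [X_low, X_high] (Equation (18)).
import numpy as np

def min_max_normalize(x, x_low=-1.0, x_high=1.0):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) * (x_high - x_low) / (x.max() - x.min()) + x_low

print(min_max_normalize([3.0, 7.5, 12.0, 5.25]))  # values mapped into [-1, 1]
```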
This paper divides the normalized time series data into three subsets: training set, validation set, and test set, as shown in Table 1. The training dataset is used to learn the weight matrix of the HFCM prediction model, the validation dataset is used to select the best model, and the test dataset is employed to evaluate the prediction accuracy.

2.6.2. Evaluation Indicators

In this paper, three evaluation criteria are used to evaluate the performance of the method: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). MAE and RMSE measure the absolute deviation of the predicted values from the actual values. MAE ignores the sign of the error and focuses on its absolute size. RMSE is easy to understand, convenient to calculate, and sensitive to outliers. MAPE measures the relative (percentage) deviation of the predicted values from the actual values. MAE and MAPE are relatively insensitive to extreme values, whereas RMSE squares the errors, amplifying large prediction errors and making the criterion more sensitive to outlying data. The smaller these criteria, the closer the predictions are to the original time series. The evaluation formulas are defined as follows.
$$\mathrm{RMSE} = \sqrt{ \frac{1}{L} \sum_{i=1}^{L} \left( X_i - \bar{X}_i \right)^2 } \tag{19}$$

$$\mathrm{MAE} = \frac{1}{L} \sum_{i=1}^{L} \left| X_i - \bar{X}_i \right| \tag{20}$$

$$\mathrm{MAPE} = \frac{1}{L} \sum_{i=1}^{L} \left| \frac{X_i - \bar{X}_i}{X_i} \right| \tag{21}$$

where $L$ is the time series length, and $X_i$ and $\bar{X}_i$ represent the normalized original and predicted time series, respectively.
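For reference, a minimal sketch of the three criteria written directly with NumPy; the example arrays are arbitrary.

```python
# Minimal sketch of RMSE, MAE, and MAPE (Equations (19)-(21)).
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100  # as a percentage

y = np.array([1.0, 2.0, 4.0])
y_hat = np.array([1.1, 1.9, 4.4])
print(rmse(y, y_hat), mae(y, y_hat), mape(y, y_hat))
```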

3. Results and Analysis

This section explores the influence of different parameter values on the prediction performance of the WE-HFCM model and comprehensively evaluates the model.

3.1. Model Parameters

To verify the prediction performance of the model proposed in this paper, the WE-HFCM model is compared with the ARIMA, SARIMA, and LSTM models. The model parameters are shown in Table 2.
The optimal model parameters differ across datasets. The regularization parameter $\alpha$ is selected by cross-validation; here, $\alpha = 1 \times 10^{-12}$. Figure 6 compares the RMSE of HFCM models over 2 to 7 wavelet decomposition levels and different HFCM orders on the four datasets. From Figure 6, we can see that for the sunspot dataset, the optimal HFCM parameters are $k = 4$ and $n = 2$; for the min-temp dataset, $k = 5$ and $n = 2$; and for the wind-speed and open-price datasets, $k = 2$ and $n = 2$. In general, the order of the optimal HFCM parameters on non-stationary datasets is lower than that on stationary datasets.

3.2. Analysis of Experimental Results

We tested the proposed model on the four datasets, comparing the original and predicted data. Following the model parameter settings in Section 3.1, we determined the optimal WE-HFCM parameters for each dataset. The prediction results of the WE-HFCM model on the four datasets are presented in Figure 7: Figure 7a,b show the results on the non-stationary time series, and Figure 7c,d show the results on the stationary time series. As seen in Figure 7, the model’s predictions follow the data trend and are accurate for both non-stationary and stationary time series.
To better demonstrate the effectiveness of the proposed WE-HFCM model, it is compared with the classical time series prediction models ARIMA and SARIMA and the deep learning model LSTM; the results are shown in Table 3. The best-performing result is indicated in bold italics, and the second-best result in italics; that is, the model with the lowest error is considered the best-performing model, and the one with the second-lowest error the second best. The experimental results show that on the non-stationary datasets, the error of the WE-HFCM model is much smaller than that of the ARIMA model, although its predictions are slightly inferior to those of the LSTM model. The prediction errors of the WE-HFCM model on the stationary time series are higher than those on the non-stationary time series; however, they are lower than those of the ARIMA and LSTM models.
RMSE values, i.e., root mean square error values, measure the differences between the values predicted by a model and the observed values. The error of the ARIMA model is about twice that of WE-HFCM, and the error of the LSTM model is comparable to that of WE-HFCM. In the final ablation experiment, the proposed model combines wavelet decomposition and empirical mode decomposition to extract features. To verify the effectiveness of the double decomposition, the RMSE values of the proposed WE-HFCM model and of Wave-HFCM, which uses only wavelet decomposition for feature extraction, are compared in Table 4. The error values of WE-HFCM are all smaller than those of the single-decomposition Wave-HFCM model, verifying that double decomposition improves the model’s prediction accuracy. Table 5 records the RMSE of WE-HFCM on the training, validation, and test subsets of each time series, as well as on the full series.
To illustrate the interpretability of the proposed WE-HFCM model, the min-temp dataset is taken as an example. The minimum, median, and maximum values of the min-temp time series are selected as three fuzzy variables, with corresponding semantic interpretations defined as low amplitude, medium amplitude, and high amplitude, respectively, as shown in Figure 8. The top orange area represents high amplitude, the middle white area medium amplitude, and the bottom blue area low amplitude; the purple line represents the model’s interval prediction. The experimental results show that the value corresponding to the semantic “low amplitude” is 0, that corresponding to “medium amplitude” is 11.4, and that corresponding to “high amplitude” is 26.3. From Figure 8, we obtain not only the predicted values of the time series but also prediction intervals and a semantic interpretation of the predicted values, which facilitates practical application.

4. Conclusions

To construct an interpretable model and address the tendency of previous research to focus only on the time-domain features of time series data while neglecting frequency-domain features, this paper constructs a time series prediction model, WE-HFCM, for non-stationary time series. The WE-HFCM model is designed to offer both interpretability and high prediction accuracy. It uses a double decomposition strategy that combines the advantages of wavelet decomposition and EMD to extract multiple effective time series features. We then construct high-order fuzzy cognitive maps based on these features and use the ridge regression algorithm to learn the weights and determine the optimal model. Finally, we apply the WE-HFCM model to predict stationary and non-stationary time series, comparing the results with those of the ARIMA, SARIMA, and LSTM models. The experimental results demonstrate the superiority of the WE-HFCM model: in predicting stationary series, its accuracy is about 45% higher than that of the ARIMA model, about 35% higher than that of the SARIMA model, and about 16% higher than that of the LSTM model. In predicting non-stationary series, its accuracy is 69% higher than that of the ARIMA and SARIMA models, providing a robust solution for time series prediction.

Author Contributions

J.C. is responsible for the overall structure and composition of the paper. S.C. provided critical revision of the manuscript for intellectual content. A.G. is responsible for the sequence data prediction experiment and data analysis. The personal contributions of the authors are as follows. Conceptualization, J.C. and A.G.; methodology, J.C. and A.G.; validation, J.C. and A.G.; formal analysis, A.G.; investigation, A.G.; data curation, A.G.; writing—original draft preparation, J.C. and A.G.; writing—review and editing, J.C. and A.G.; project administration, J.C.; funding acquisition, J.C.; supervision, J.C. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Plan of Jiangsu Province (grant number BE20219042), the National Natural Science Foundation of China (grant number 61806119), the Natural Science Basic Research Plan in Shaanxi Province of China (grant number 2024JC-YBMS-516), and the Jiangsu Key Laboratory of Power Transmission & Distribution Equipment Technology (grant number 2022JSSPD05, 2023JSSPD07).

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

The study did not involve humans.

Data Availability Statement

The raw data provided in the study can be obtained from public databases. We have saved the data used, and they can be accessed at https://gitee.com/gaz123816/second-hand-trading-platform/tree/master/ (23 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
WD: Wavelet Decomposition
FCM: Fuzzy Cognitive Maps
HFCM: High-Order Fuzzy Cognitive Maps
LSTM: Long Short-Term Memory
CNN: Convolutional Neural Network
TCN: Temporal Convolutional Network
EMD: Empirical Mode Decomposition
HF: High-Frequency
LF: Low-Frequency
IMF: Intrinsic Mode Functions
RNN: Recurrent Neural Network
ARIMA: Autoregressive Integrated Moving Average

References

  1. Ali, Y.; Nakti, S. Sales forecasting: A comparison of traditional and modern times-series forecasting models on sales data with seasonality. In Proceedings of the 10th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 15–17 March 2023; pp. 159–163. [Google Scholar]
  2. Liu, G.; Xiao, F.; Lin, C.T.; Cao, Z. A fuzzy interval time-series energy and financial forecasting model using network-based multiple time-frequency spaces and the induced-ordered weighted averaging aggregation operation. IEEE Trans. Fuzzy Syst. 2021, 28, 2677–2690. [Google Scholar] [CrossRef]
  3. Wang, Y.; Yin, H.; Chen, H.; Wo, T.; Xu, J.; Zheng, K. Origin-destination matrix prediction via graph convolution: A new perspective of passenger demand modeling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), Anchorage, AK, USA, 4–8 August 2019; pp. 1227–1235. [Google Scholar]
  4. Wang, L.; Chen, J.; Wang, W.; Song, R.; Zhang, Z.; Yang, G. Review of time series traffic forecasting methods. In Proceedings of the 4th International Conference on Control and Robotics (ICCR), Guangzhou, China, 2–4 December 2022; pp. 1–5. [Google Scholar]
  5. Li, X.; Wang, J.; Tian, R. LGA-based short-term power load time series forecasting. In Proceedings of the IEEE 2nd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Changchun, China, 24–26 February 2023; pp. 1695–1698. [Google Scholar]
  6. Saha, E.; Saha, R.; Mridha, K. Short-term electricity consumption forecasting: Time-series approaches. In Proceedings of the 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 13–14 October 2022; pp. 1–5. [Google Scholar]
  7. Do, T.H.; Lakew, D.S.; Cho, S. Building a time-series forecast model with automated machine learning for heart rate forecasting problem. In Proceedings of the 13th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2022; pp. 1097–1100. [Google Scholar]
  8. Chimmula, V.K.R.; Zhang, L. Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos Solitons Fractals 2020, 135, 109864. [Google Scholar] [CrossRef] [PubMed]
  9. Liu, F.; Wang, W. Forecasting of short-term tourism demand based on multivariate time series clustering and lssvm. In Proceedings of the IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Beijing, China, 3–5 October 2022; pp. 174–178. [Google Scholar]
  10. Sun, X.; Liu, N.; Wang, Y.; Chen, F.; Li, Y. Application of time series ARIMA model in coal mine ground sound monitoring system. Coal Eng. 2023, 55, 111–117. [Google Scholar]
  11. Yang, H.; Pan, Z.; Bai, W. Review of time series prediction methods. Comput. Sci. 2019, 46, 21–28. [Google Scholar]
  12. Ghaderpour, E.; Pagiatakis, S.D.; Mugnozza, G.S.; Mazzanti, P. On the stochastic significance of peaks in the least-squares wavelet spectrogram and an application in GNSS time series analysis. Signal Process 2024, 223, 109581. [Google Scholar] [CrossRef]
  13. Baidya, R.; Lee, S.W. Addressing the Non-Stationarity and Complexity of Time Series Data for Long-Term Forecasts. Appl. Sci. 2024, 14, 4436. [Google Scholar] [CrossRef]
  14. Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 2019, 323, 203–219. [Google Scholar] [CrossRef]
  15. Kirisci, M.; Cagcag Yolcu, O. A new CNN-based model for financial time series: TAIEX and FTSE stocks forecasting. Neural Process. Lett. 2022, 54, 3357–3374. [Google Scholar] [CrossRef]
  16. Kim, J.; Kang, H.; Kang, P. Time-series anomaly detection with stacked Transformer representations and 1D convolutional network. Eng. Appl. Artif. Intell. 2023, 120, 105964. [Google Scholar] [CrossRef]
  17. Li, L.; Ota, K.; Dong, M. Everything is image: CNN-based short-term electrical load forecasting for smart grid. In Proceedings of the 14th International Symposium on Pervasive Systems, Algorithms and Networks & 11th International Conference on Frontier of Computer Science and Technology & Third International Symposium of Creative Computing (ISPAN-FCST-ISCC), Exeter, UK, 21–23 June 2017; pp. 244–351. [Google Scholar]
  18. Nie, L.; Zhang, L.; Xu, S.; Cai, W.; Yang, H. Remaining life prediction of rolling bearings based on similarity feature fusion and convolutional neural network. Noise Vib. Control 2023, 43, 115–121. [Google Scholar] [CrossRef]
  19. Zan, S.F.; Zhang, Q. Short-Term Power Load Forecasting Based on an EPT-VMD-TCN-TPA Model. Appl. Sci. 2023, 13, 4462. [Google Scholar] [CrossRef]
  20. Yan, W.; Xu, Z.; Xue, G.; Song, J.; Du, X. Solar irradiance forecasting model based on EMD-TCN multi-energy heating systems. Acta Energiae Solaris Sin. 2023, 43, 182–188. [Google Scholar]
  21. Romeu, P.; Zamora-Martínez, F.; Botella-Rocamora, P.; Pardo, J. Time-series forecasting of indoor temperature using pre-trained deep neural networks. In Proceedings of the 23rd International Conference on Artificial Neural Networks (ICANN), Sofia, Bulgaria, 10–13 September 2013; pp. 451–458. [Google Scholar]
  22. Kosko, B. Fuzzy cognitive maps. Int. J. Man-Mach. Stud. 1986, 24, 65–75. [Google Scholar] [CrossRef]
  23. Stach, W.; Kurgan, L.A.; Pedrycz, W. Numerical and linguistic prediction of time series with the use of fuzzy cognitive maps. IEEE Trans. Fuzzy Syst. 2008, 16, 61–72. [Google Scholar] [CrossRef]
  24. Lu, W.; Yang, J.; Liu, X.; Pedrycz, W. The modeling and prediction of time series based on synergy of high-order fuzzy cognitive map and fuzzy c-means clustering. Knowl.-Based Syst. 2014, 70, 242–255. [Google Scholar] [CrossRef]
  25. Jin, X.; Liu, Y.; Yu, J.; Wang, J.; Qie, Y. Prediction of outlet SO2 concentration based on variable selection and EMD-LSTM network. Proc. Chin. Soc. Elect. Eng. 2021, 41, 8475–8484. [Google Scholar]
  26. Aussem, A.; Campbell, J.; Murtagh, F. Wavelet-based feature extraction and decomposition strategies for financial forecasting. J. Comput. Intell. Financ. 1998, 6, 5–12. [Google Scholar]
  27. Han, Z.; Liu, Z.; Lu, X.; Zhou, D. High-speed parallel wavelet algorithm based on CUDA and its application in power system harmonic analysis. Electr. Power Autom. Equip. 2010, 30, 98–101. [Google Scholar]
  28. Zhao, B.; Li, H. Noise reduction method of vibration signal combining EMD and LSF. J. Vib. Meas. Diagn. 2022, 42, 606–610. [Google Scholar]
  29. Ye, X.; Zhang, H.; Wang, J.; Li, D.; Zhang, M.; Zhang, Y.; Du, X.; Li, J.; Wang, W. EMD-LSTM-based group prediction algorithm of container resource load in preprocessing molecular spectral line data. J. Jilin Univ. (Eng. Technol. Ed.) 2024, 54, 1–10. [Google Scholar]
  30. Stach, W.; Kurgan, L.; Pedrycz, W. Higher-order fuzzy cognitive maps. In Proceedings of the 2006 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS), Montreal, QC, Canada, 3–6 June 2006; pp. 166–171. [Google Scholar]
  31. Zhang, S.; Zhang, J. Wind power load combination forecasting based on improved SOA and ridge regression weighting. J. North China Electr. Power Univ. (Nat. Sci. Ed.) 2024, 51, 1–10. [Google Scholar]
  32. Wu, K.; Liu, J. Robust learning of large-scale fuzzy cognitive maps via the lasso from noisy time series. Knowl.-Based Syst. 2016, 113, 23–38. [Google Scholar] [CrossRef]
  33. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Figure 1. Wavelet decomposition and reconstruction: (a) wavelet decomposition (b) wavelet reconstruction.
Figure 2. (a) Fuzzy cognitive map with five nodes. (b) The weight matrix of FCM.
Figure 3. Basic framework of the WE-HFCM prediction model.
Figure 4. Double decomposition frame.
Figure 5. The learning process of HFCM with four nodes.
Figure 6. RMSE values of HFCMs with 2–7 nodes on the four datasets, for HFCM orders 1–6: (a) k = 1, (b) k = 2, (c) k = 3, (d) k = 4, (e) k = 5, (f) k = 6.
Figure 7. WE-HFCM predictions on four datasets: (a) wind-speed, (b) open-price, (c) sunspot, (d) min-temp.
Figure 8. WE-HFCM interpretation analysis on the min-temp test dataset.
Table 1. Data set partitioning.
Dataset      Total Length   Training Set Length   Validation Set Length   Test Set Length
open-price   500            319                   79                      102
sunspot      289            177                   44                      67
min-temp     800            448                   112                     240
wind-speed   1000           640                   160                     200
Table 2. Model parameter.
Model   Main Parameter   Explanation
Wave    N                Number of WD levels
HFCM    n                Number of HFCM nodes
        α                Ridge regression parameter
Table 3. Comparison of the experimental results of four models.
Dataset      Model     RMSE       MAE        MAPE
wind-speed   WE-HFCM   0.509655   0.430687   12.18910
wind-speed   ARIMA     2.887541   2.618241   80.83926
wind-speed   LSTM      0.245350   0.201483   5.843880
wind-speed   SARIMA    2.035321   0.463132   16.63178
open-price   WE-HFCM   1.287810   1.056692   0.921886
open-price   ARIMA     128.4047   128.3749   99.96622
open-price   LSTM      0.614700   0.465886   0.388289
open-price   SARIMA    8.183435   7.234018   7.234018
min-temp     WE-HFCM   3.086318   2.414340   22.02933
min-temp     ARIMA     8.039095   6.946333   65.90470
min-temp     LSTM      4.204377   3.418290   31.70666
min-temp     SARIMA    5.173325   4.050270   4.050270
sunspot      WE-HFCM   16.82442   11.45599   28.79121
sunspot      ARIMA     55.01083   39.21089   154.4593
sunspot      LSTM      62.51576   48.02626   218.0822
sunspot      SARIMA    21.99944   16.3178    0.463132
The second best performance result is indicated in italics. The best-performing result is represented in bold and italics.
Table 4. Results of RMSE comparison in the ablation experiment.
Dataset      WE-HFCM    Wave-HFCM
open-price   1.287810   1.984186
sunspot      16.82442   20.99618
min-temp     3.086318   3.389726
wind-speed   0.509655   0.520299
The second best performance result is indicated in italics. The best-performing result is represented in bold and italics.
Table 5. RMSE of WE-HFCM on different subsets.
Dataset      Stage        RMSE
wind-speed   all          0.509655
wind-speed   training     0.509846
wind-speed   validation   0.473800
wind-speed   test         0.536034
open-price   all          1.287810
open-price   training     1.238265
open-price   validation   1.391657
open-price   test         1.343341
min-temp     all          3.086318
min-temp     training     3.024576
min-temp     validation   3.123155
min-temp     test         3.181652
sunspot      all          16.82442
sunspot      training     15.22556
sunspot      validation   13.92508
sunspot      test         21.92508