1. Introduction
In complex equipment manufacturing enterprises such as shield tunneling, rail transportation, and wind energy, the cost of maintaining an inventory of spare parts generally accounts for more than 60% of inventory costs [1]. Qualified inventory optimization [2] and flexible scheduling of parts [3] are critical to improving the efficiency of aftermarket service in lifecycle product management [4]. Due to various uncertain factors, such as element failures, project schedules, and safety stock levels, warehouses experience shortages of spare parts, which in turn trigger demand for spare parts. This demand can then serve as a kind of sensing data to monitor maintenance efficiency and evaluate aftermarket service quality. Accurate prediction of the demand for spare parts therefore plays a supporting role in intelligent inventory optimization. However, in practical operations, parts planning is often linked to newly launched projects or to the unavailability of spare parts, resulting in sporadic demand. The distribution of demand data thus has intermittent characteristics, and precisely predicting the demand of intermittent time series is challenging in spare parts management for manufacturing enterprises.
Due to the intermittent characteristics of these demand data, predicting demand relies heavily on time series prediction. Currently, time series prediction methods can be divided into three categories [5]: (1) statistical methods (e.g., exponential smoothing [6] and moving average [7]); (2) machine learning methods (e.g., support vector regression (SVR) [8], random forests (RF) [9], and LightGBM (light gradient boosting machine) [10,11]); and (3) deep learning methods (e.g., recurrent neural networks (RNN) [12] and long short-term memory (LSTM) [13]). These methods are most applicable to time series with strong periodicity and apparent trends. However, effectively extracting evolutionary patterns from time series with strong randomness, poor continuity, and, especially, small sample sizes remains challenging, leading to low prediction accuracy for intermittent series. To address this problem, Croston [14] improved the exponential smoothing algorithm by decomposing an intermittent time series into a zero-interval sequence and a demand sequence; exponential smoothing is applied to each sequence separately, and the results are combined to improve prediction performance. Syntetos et al. [15] further improved the Croston algorithm and designed the Syntetos–Boylan approximation (SBA) method, which introduces a bias-correction term to mitigate the uncertainty of intermittent distributions. In addition, some studies proposed metrics such as the average demand interval (ADI) and the squared coefficient of variation (CV²) [16] to characterize intermittency and better extract intrinsic information from demand sequences [17]. Another typical approach is to perform hierarchical clustering on demand sequences [18], which divides the original sequences with weak overall patterns into multiple clusters with more significant patterns; different regression algorithms can then be applied to the clusters for prediction. Shi et al. [19] proposed the block Hankel tensor autoregressive integrated moving average (BHT-ARIMA) model, which uses tensor decomposition [20] to extract the intrinsic correlations among multidimensional small-sample sequences. Although the aforementioned methods have achieved certain results, they still have some limitations. First, they are mostly based on the assumption that all sequence demands have predictive value, and they disregard the interference of abnormal values. In actual business, due to special events such as natural disasters, emergencies, and market fluctuations, some spare parts have abnormal demand patterns, which require manual analysis to recognize. Second, demand prediction results are random and unreliable: operators still want trustworthy demand predictions, but the existing methods cannot support valid decisions when the prediction results are distorted.
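To make the Croston/SBA mechanics concrete, the sketch below implements both updates on a toy intermittent series; the smoothing parameter and the data are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def croston_sba(y, alpha=0.1, variant="sba"):
    """One-step-ahead Croston / SBA forecast for an intermittent series.
    y: 1-D array of demands (zeros allowed); alpha: smoothing parameter."""
    y = np.asarray(y, dtype=float)
    nz = np.flatnonzero(y)
    if nz.size == 0:
        return 0.0
    z = y[nz[0]]       # smoothed demand size
    p = nz[0] + 1.0    # smoothed inter-demand interval
    q = 1              # periods elapsed since the last demand
    for t in range(nz[0] + 1, len(y)):
        if y[t] > 0:
            # Croston: update size and interval only when a demand occurs.
            z = alpha * y[t] + (1 - alpha) * z
            p = alpha * q + (1 - alpha) * p
            q = 1
        else:
            q += 1
    forecast = z / p
    if variant == "sba":
        forecast *= 1 - alpha / 2   # SBA bias-correction term
    return forecast

demand = [0, 0, 3, 0, 0, 0, 5, 0, 2, 0, 0, 4]
print(croston_sba(demand), croston_sba(demand, variant="croston"))
```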
There is also a similar concept, i.e., abnormal demand forecasting, which refers to forecasting and analyzing abnormal demands that may occur in the future. Guo et al. [21] fed passenger flow characteristics extracted by SVR into an LSTM to predict abnormal passenger flow. Liu et al. [22] combined statistical learning and linear regression to model the relationship between price discounts and the demand for medical consumables. Li et al. [23] proposed a multi-scale radial basis function (MSRBF) network to predict subway passenger flow under special events. Nguyen et al. [24] combined LSTM with SVM to detect outliers in a demand sequence, and then used an LSTM neural network to predict the occurrence of outliers. Li et al. [25] proposed a model combining time series and regression analysis, plus dummy variables, to predict special factors related to passenger flow. These methods aim to predict the occurrence of abnormal but meaningful events in many fields. However, in the situation described in this paper, abnormal demands commonly arise from reporting mistakes made by personnel or from environmental changes. Such abnormal demands are harmful to the prediction of intermittent time series, since they disturb the evolutionary pattern of the demand for spare parts. There is no value in predicting abnormal demands that have no predictability. This paper is, therefore, devoted solely to predicting normal demands against the disturbances induced by abnormal demands, instead of predicting the abnormal demands themselves.
To solve the problems mentioned above, this paper employs a tensor decomposition technique to smooth the abnormal demands in demand sequences. Tucker decomposition factorizes a tensor into a set of factor matrices and one core tensor that captures the intrinsic information of the raw data; in this decomposed form, it offers the potential to extract the evolutionary trend from intermittent time series. The concept of a prediction interval is further introduced to tackle the high uncertainty and low reliability of the prediction results. The methodology is as follows: First, a tensor autoencoder network is constructed, which performs tensor Tucker decomposition on the features extracted by the hidden layers of the autoencoder. Second, the core tensor is decoded and reconstructed with an alternating optimization strategy to obtain the optimal feature representation of the intermittent time series. Third, an adaptive prediction interval (API) algorithm is developed with a dynamic update mechanism to obtain point predictions and prediction intervals, thus improving the reliability of the results. Finally, the performance of the proposed method is validated on a set of real-life aftersales data from a large engineering manufacturing enterprise in China.
The contributions of this paper can be summarized as follows:
(1) An intermittent series smoothing algorithm is proposed. By integrating tensor Tucker decomposition into a stacked autoencoder network with an alternating optimization scheme, the proposed algorithm extracts the evolutionary trend of an intermittent time series under irregular noise interference. Compared with existing time series anomaly detection methods, the proposed algorithm does not require any pre-fixed detection thresholds and can adaptively identify outliers in the series, making it highly suitable for smoothing outliers in intermittent time series. Moreover, the algorithm is universally applicable; e.g., the stacked autoencoder can easily be replaced by other deep models;
(2) An adaptive prediction interval algorithm is designed. Unlike existing point prediction methods that use a deterministic prediction model, this algorithm combines prediction intervals with point predictions, which matches the intermittent characteristics of spare parts demand sequences. The algorithm provides an effective way to address the uncertainty in the demand for spare parts. According to our literature survey, there is no related research on interval prediction specifically for demand prediction.
3. Proposed Method
In this section, a tensor optimization-based robust interval prediction method for intermittent time series is presented. This method consists of two parts: (1) a sequence smoothing network based on tensor decomposition and a stacked autoencoder, which smooths anomalous demand values in the original sequence and extracts the evolutionary trend of demand; and (2) an adaptive prediction interval algorithm, which constructs a reliable prediction interval to avoid the oversupply risk or inventory shortage caused by inaccurate predictions. In this method, the role of tensor Tucker decomposition is: (1) extracting core tensors from the demand sequence to represent the evolutionary trend; and (2) smoothing the outliers in the sequence. The negative interference of anomalous demands can then be effectively reduced in an unsupervised mode. Moreover, training a LightGBM model on the core tensors can better mine the trend information in the demand sequence. Tensor Tucker decomposition is therefore believed to be suitable for intermittent time series forecasting with small samples.
The flowchart of the proposed method is shown in Figure 2. First, hidden features are extracted from the original data using a stacked autoencoder, and tensor Tucker decomposition is performed on the hidden features to obtain the core tensors. An alternating optimization scheme is designed to obtain the optimal core tensors by alternately updating the autoencoder parameters and the tensor factor matrices. Second, an adaptive interval prediction algorithm is constructed, in which the interval is calculated from the predicted values and prediction residuals of LightGBM estimators. Finally, a dynamic update mechanism is used to adjust the width of the prediction interval. The detailed implementation is presented below.
3.1. Sequence Smoothing Network
Denote the demand sequence by $X \in \mathbb{R}^{m \times n}$, where $m$ indicates the sample number and $n$ represents the time dimension. The sequence can be encoded and mapped into a hidden layer, as shown in Equation (10):

$$h = \sigma\left( W x + b \right) \tag{10}$$

where $h$ represents the hidden layer features, $\sigma(\cdot)$ is the activation function of the encoding layer, and $W$ and $b$ are the weight matrix and bias vector of the encoding layer, respectively. The obtained feature set is denoted by $H \in \mathbb{R}^{m \times l}$, where $l$ represents the dimension of the deep features.
In order to adequately represent the temporal information between samples in the feature set $H$, a multi-way delay embedding transform (MDT) with the operations of multi-linear duplication and multi-way folding is employed to transform the original sequence into a third-order tensor. This process can also reduce noise disturbance. Denote by $\mathcal{H}$ the tensor form of $H$; then the MDT for $H$ can be defined as:

$$\mathcal{H} = \mathrm{MDT}_{\tau}(H) = \mathrm{Fold}_{(m,\tau)}\left( H \times_1 S \right) \tag{11}$$

where $\tau$ and $m$ represent the time window size and sample length, respectively, and $S$ is a duplication matrix. In this paper, $\tau$ is set to 6.
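As a concrete reference for Equation (11), the following numpy sketch builds the duplication matrix and folds the delayed copies into a third-order tensor; the $(l, \tau, m-\tau+1)$ orientation and the feature-major layout are assumptions of this sketch, not necessarily the paper's exact convention.

```python
import numpy as np

def duplication_matrix(m, tau):
    """Binary matrix S such that S @ h stacks the tau-lagged copies of h."""
    k = m - tau + 1
    S = np.zeros((tau * k, m))
    for j in range(k):
        for i in range(tau):
            S[j * tau + i, j + i] = 1.0
    return S

def mdt(H, tau):
    """Multi-way delay embedding of H (l features x m time steps) into a
    third-order Hankel tensor of shape (l, tau, m - tau + 1)."""
    l, m = H.shape
    S = duplication_matrix(m, tau)
    stacked = H @ S.T                                               # multi-linear duplication
    return stacked.reshape(l, m - tau + 1, tau).transpose(0, 2, 1)  # multi-way folding
```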
With the tensors constructed by MDT, the Tucker decomposition technique is introduced to obtain the third-order core tensor $\mathcal{G}$ that represents the essential information of the sequence:

$$\mathcal{H} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)} \tag{12}$$

where $U^{(v)}$ is the projection matrix and can maximally preserve the temporal continuity between core tensors, the superscript $v$ represents the tensor dimension, and $\hat{m}$ represents the sample length after reconstruction. Equation (12) means that the decomposition result consists of a core tensor and a series of factor matrices. Obviously, the core tensor $\mathcal{G}$ contains the intrinsic information of the evolutionary trend. By optimizing $\mathcal{G}$, the inherent correlation between feature sequences can be sufficiently captured, while noise interference can also be reduced. The loss of tensor reconstruction can be calculated as:

$$L_{ts} = \left\| \mathcal{H} - \hat{\mathcal{H}} \right\|_F^2 = \left\| \mathcal{H} - \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)} \right\|_F^2 \tag{13}$$

where $\mathcal{H}$ is the original tensor, and $\hat{\mathcal{H}}$ is the tensor reconstructed from the factor matrices $U^{(v)}$.
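One standard way to obtain the factors and core of Equation (12) is truncated higher-order SVD; the numpy sketch below is a minimal version of it, with illustrative ranks and no claim to match the paper's exact solver.

```python
import numpy as np

def unfold(T, mode):
    """Mode-i unfolding of a tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold for a given target shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def mode_dot(T, M, mode):
    """Mode-i product T x_i M."""
    shape = list(T.shape)
    shape[mode] = M.shape[0]
    return fold(M @ unfold(T, mode), mode, shape)

def hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from the leading singular vectors
    of each unfolding; core tensor per Equation (12) with U transposed."""
    U = []
    for i, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(T, i), full_matrices=False)
        U.append(u[:, :r])
    G = T
    for i, u in enumerate(U):
        G = mode_dot(G, u.T, i)
    return G, U

def reconstruct(G, U):
    """Rebuild the tensor from core and factors (the H-hat of Equation (13))."""
    T = G
    for i, u in enumerate(U):
        T = mode_dot(T, u, i)
    return T
```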
To implement the decoding operation, it is necessary to transform the core tensor $\mathcal{G}$ back to the original input space. Here, the inverse MDT transform [6] is applied to $\mathcal{G}$ by reversing the transformation along the time dimension to obtain a second-order core tensor $G$, as shown in Equation (14). The core tensor $G$ is then used as the input of the decoder to update the network:

$$G = \mathrm{MDT}_{\tau}^{-1}(\mathcal{G}) = \mathrm{Unfold}_{(\hat{m},\tau)}(\mathcal{G}) \times_1 S^{\dagger} \tag{14}$$

where $\dagger$ denotes the Moore–Penrose pseudo-inverse.
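Pairing with the mdt sketch above, a minimal inverse transform for Equation (14) undoes the folding and maps back through the pseudo-inverse of the duplication matrix; the shapes assume the same toy layout as before.

```python
import numpy as np

def inverse_mdt(T3, S):
    """Inverse MDT: unfold an (l, tau, k) tensor along time and recover the
    (l, m) matrix via the Moore-Penrose pseudo-inverse of S."""
    l, tau, k = T3.shape
    stacked = T3.transpose(0, 2, 1).reshape(l, tau * k)  # undo multi-way folding
    return stacked @ np.linalg.pinv(S).T                 # undo duplication
```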
Through decoding $G$, the reconstructed data $\hat{X}$ can be obtained as follows:

$$\hat{X} = \sigma'\left( W' G + b' \right) \tag{15}$$

where $\sigma'(\cdot)$ is the activation function of the decoding layer, $W'$ is the weight matrix of the decoding layer, and $b'$ is the bias vector of the decoding layer. Consequently, the reconstruction loss of the autoencoder can be calculated as:

$$L_{ae} = \frac{1}{2m} \left\| X - \hat{X} \right\|_F^2 + \frac{\lambda}{2} \left( \left\| W \right\|_F^2 + \left\| W' \right\|_F^2 \right) \tag{16}$$

where $\lambda$ is the weight decay parameter and $\left\| \cdot \right\|_F$ is the Frobenius norm.
Based on the aforementioned analysis, the whole loss function is:

$$L = L_{ae} + L_{ts} \tag{17}$$

Minimizing Equation (17) can smooth anomalous demands in the demand sequence and extract the evolutionary trend of the demand for spare parts. The key idea of this process is to reduce the significant deviations of anomalous demands, making them suitable for intermittent sequence anomaly detection. The crucial aspect lies in utilizing the core tensor to represent the evolutionary trend. As Equation (12) indicates, $U^{(v)}$ is randomly initialized. Therefore, optimizing the tensor decomposition while minimizing Equation (17) is required to obtain the optimal representation of the core tensors.
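Putting the pieces together, the sketch below runs one forward pass of the smoothing network and evaluates Equations (10)-(17) on toy data, reusing the duplication_matrix, mdt, hosvd, reconstruct, and inverse_mdt helpers sketched above. The tanh activations, the rank choice, and applying the inverse MDT to the reconstructed tensor (so that the toy decoder shapes match) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, l, tau = 24, 8, 4, 6                        # time steps, input dim, hidden dim, window
X = rng.poisson(0.6, size=(n, m)).astype(float)   # toy intermittent demands

W, b = rng.normal(0, 0.1, (l, n)), np.zeros((l, 1))
W2, b2 = rng.normal(0, 0.1, (n, l)), np.zeros((n, 1))

H = np.tanh(W @ X + b)                            # Equation (10)
T = mdt(H, tau)                                   # Equation (11)
G, U = hosvd(T, ranks=(3, 4, 10))                 # Equation (12), truncated core
T_hat = reconstruct(G, U)
loss_ts = np.linalg.norm(T - T_hat) ** 2          # Equation (13)

S = duplication_matrix(m, tau)
H_rec = inverse_mdt(T_hat, S)                     # Equation (14); the paper feeds the core itself
X_hat = np.tanh(W2 @ H_rec + b2)                  # Equation (15)
loss_ae = np.linalg.norm(X - X_hat) ** 2 / (2 * m)  # Equation (16), weight decay omitted
print("total loss (17):", loss_ae + loss_ts)
```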
3.2. Alternating Optimization Scheme
Minimization of Equation (17) includes the optimization of the network parameters $\Theta = \{W, b, W', b'\}$ and the factor matrices $U^{(v)}$. The parameters $\Theta$ can be solved using a stochastic gradient descent (SGD) strategy [29]. However, SGD cannot be directly applied to tensor decomposition. Therefore, this paper adopts an alternating optimization strategy: fix $U^{(v)}$, and update $\Theta$; then fix the updated $\Theta$, and update $U^{(v)}$. These two steps are performed alternately until convergence. It should be noted that, since the number of tensor optimization iterations is typically smaller than the number of $\Theta$ updates, the update of $U^{(v)}$ is set to stop when the difference between two consecutive tensor optimizations of $U^{(v)}$ is less than a specific threshold, while $\Theta$ is updated until the model converges. With the initialized $W$ and $U^{(v)}$, the specific optimization process is as follows:
(1) Fix $U^{(v)}$, and update $\Theta$;
1. Encoding stage: the partial derivatives of $L$ with respect to the encoding parameters are:

$$\frac{\partial L}{\partial W} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L_i}{\partial W} + \lambda W, \qquad \frac{\partial L}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L_i}{\partial b} \tag{18}$$

The partial derivatives corresponding to the error term on each sample can be obtained by:

$$\frac{\partial L_i}{\partial W} = \left( \mathrm{diag}\left( \dot{\sigma}(W x_i + b) \right) \cdot \delta_i \right) x_i^{\top}, \qquad \frac{\partial L_i}{\partial b} = \mathrm{diag}\left( \dot{\sigma}(W x_i + b) \right) \cdot \delta_i \tag{19}$$

where "·" represents the dot product operator, $\mathrm{diag}(\cdot)$ is the diagonal matrix formed from a vector, $\mathbf{1}_r$ is the unit column vector of size $r$, and $\dot{\sigma}(\cdot)$ is the derivative of the activation function;
2. Decoding stage: the partial derivatives of $L$ with respect to the decoding parameters are:

$$\frac{\partial L}{\partial W'} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L_i}{\partial W'} + \lambda W', \qquad \frac{\partial L}{\partial b'} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L_i}{\partial b'} \tag{20}$$

For ease of analysis, the error propagation term (i.e., the derivative of the error term with respect to the hidden layer output) is briefly denoted by:

$$\delta_i = \left( \hat{x}_i - x_i \right) \cdot \dot{\sigma}\left( W' g_i + b' \right) \tag{21}$$

Furthermore, following the chain rule, we have:

$$\frac{\partial L_i}{\partial W'} = \delta_i\, g_i^{\top}, \qquad \frac{\partial L_i}{\partial b'} = \delta_i \tag{22}$$

where "·" and $\dot{\sigma}(\cdot)$ have the same meaning as in the encoding stage;
3. Substituting Equations (19) and (22) into Equations (18) and (20), respectively, the model parameters can be updated as:

$$W \leftarrow W - \eta \frac{\partial L}{\partial W}, \quad b \leftarrow b - \eta \frac{\partial L}{\partial b}, \quad W' \leftarrow W' - \eta \frac{\partial L}{\partial W'}, \quad b' \leftarrow b' - \eta \frac{\partial L}{\partial b'} \tag{23}$$

where $\eta$ is the learning rate.
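For reference, below is a compact numpy sketch of step (1) for the autoencoder term, following the pattern of Equations (18)-(23); the tanh activation and the placement of the weight decay term are assumptions, and the gradient of the tensor term is omitted for brevity.

```python
import numpy as np

def sgd_step(X, W, b, W2, b2, eta=1e-3, lam=1e-4):
    """One SGD update of the autoencoder parameters (Equations (18)-(23))."""
    m = X.shape[1]
    H = np.tanh(W @ X + b)
    X_hat = np.tanh(W2 @ H + b2)
    # Error propagation term of Equation (21), using tanh' = 1 - tanh^2.
    d2 = (X_hat - X) * (1 - X_hat ** 2)
    gW2 = d2 @ H.T / m + lam * W2                  # Equations (20), (22)
    gb2 = d2.mean(axis=1, keepdims=True)
    d1 = (W2.T @ d2) * (1 - H ** 2)                # chain rule back to the encoder
    gW = d1 @ X.T / m + lam * W                    # Equations (18), (19)
    gb = d1.mean(axis=1, keepdims=True)
    # Equation (23): gradient descent with learning rate eta.
    return W - eta * gW, b - eta * gb, W2 - eta * gW2, b2 - eta * gb2
```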
(2) Fix $\Theta$, and update $U^{(v)}$.
Updating $U^{(v)}$ can be achieved by minimizing the tensor reconstruction error in Equation (13). Since $U^{(v)}$ is an orthogonal matrix, Equation (13) can be rewritten as follows:

$$L_{ts} = \left\| \mathcal{H} \right\|_F^2 - \left\| \mathcal{G} \right\|_F^2 \tag{24}$$

where $\mathcal{G} = \mathcal{H} \times_1 U^{(1)\top} \times_2 U^{(2)\top} \times_3 U^{(3)\top}$. To minimize Equation (24), we should maximize the following equation:

$$\max_{U^{(i)}} \left\| U^{(i)\top} \mathcal{H}_{(i)} \left( U^{(k)} \otimes U^{(j)} \right) \right\|_F^2, \quad j, k \neq i \tag{25}$$

where $\mathcal{H}_{(i)}$ is the unfolding matrix of the tensor $\mathcal{H}$ along the $i$-th dimension. The alternating least squares method [30] can be used to solve this problem, where the factor matrices in the other directions are fixed in turn and the least squares solution is found for each. This process is shown in Equation (26):

$$U^{(i)} \leftarrow \mathrm{svd}_{r_i}\left( \mathcal{H}_{(i)} \left( U^{(k)} \otimes U^{(j)} \right) \right), \quad i = 1, 2, 3 \tag{26}$$

where $\mathrm{svd}_{r_i}(\cdot)$ returns the leading $r_i$ left singular vectors.
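A minimal sketch of the alternating factor update in Equation (26) follows, reusing unfold and mode_dot from the HOSVD sketch above; the HOOI-style SVD update shown here is one common ALS realization, assumed rather than taken verbatim from the paper.

```python
import numpy as np

def als_update_factors(T, U, ranks, n_iter=5):
    """Alternate over the three modes: with the other factors fixed, set
    U^(i) to the leading left singular vectors of the projected unfolding."""
    for _ in range(n_iter):
        for i in range(3):
            P = T
            for j in range(3):
                if j != i:
                    P = mode_dot(P, U[j].T, j)   # project onto the fixed factors
            u, _, _ = np.linalg.svd(unfold(P, i), full_matrices=False)
            U[i] = u[:, :ranks[i]]               # least squares / SVD solution
    return U
```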
3.3. Adaptive Prediction Intervals (API)
After obtaining the optimal feature representation, this paper designs the API algorithm to generate point predictions together with reliable prediction intervals. The API algorithm consists of two stages: training and prediction. At the training stage, a fixed number of LightGBM estimators are fitted on subsets of the training data. The predicted values from all LightGBM estimators are then aggregated using the leave-one-out (LOO) strategy to generate LOO prediction factors and residuals. At the prediction stage, the API algorithm calculates the LOO predicted value by averaging the prediction factors for each test sample. The predicted value determines the center of the prediction interval, and the interval itself is established using the LOO residuals; its width is adjusted with a dynamic updating strategy. The specific implementation steps are as follows:
First, a LightGBM model $f$ is trained using the training samples $\{(x_i, y_i)\}_{i=1}^{T}$, and the prediction interval at the $t$-th time step is calculated as:

$$C_t^{\alpha} = \left[ \hat{f}_{-t}(x_t) + q_{\alpha}\left\{ \hat{\epsilon}_i \right\}, \; \hat{f}_{-t}(x_t) + q_{1-\alpha}\left\{ \hat{\epsilon}_i \right\} \right] \tag{27}$$

where $\alpha$ is the significance factor, $\hat{f}_{-t}$ represents the $t$-th LOO estimator of $f$, $q_{1-\alpha}\{\hat{\epsilon}_i\}$ is the $(1-\alpha)$-quantile on the past residuals, and $q_{\alpha}\{\hat{\epsilon}_i\}$ is the $\alpha$-quantile on the past residuals. The LOO prediction residual $\hat{\epsilon}_t$ and the LOO prediction $\hat{f}_{-t}(x_t)$ are defined as follows:

$$\hat{\epsilon}_t = y_t - \hat{f}_{-t}(x_t) \tag{28}$$

$$\hat{f}_{-t}(x_t) = \frac{1}{\left| \{ b : t \notin S_b \} \right|} \sum_{b : \, t \notin S_b} f_b(x_t) \tag{29}$$

where $f_b$ denotes the $b$-th LightGBM estimator and $S_b$ the subset of training samples on which it was fitted.
Second, the interval center is $\hat{f}_{-t}(x_t)$, and the interval width is the difference between the $(1-\alpha)$- and $\alpha$-quantiles of the past $T$ residuals.
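The sketch below mirrors this construction with a small bootstrap ensemble of LightGBM regressors: each sample's LOO prediction averages only the estimators whose bootstrap draw excluded it, and the interval endpoints come from residual quantiles as in Equations (27)-(29). The ensemble size, significance level, and bootstrap scheme are illustrative assumptions.

```python
import numpy as np
import lightgbm as lgb

def loo_interval(X, y, B=20, alpha=0.2, seed=0):
    """LOO-aggregated LightGBM predictions and residual-quantile intervals."""
    rng = np.random.default_rng(seed)
    T = len(y)
    draws = [rng.integers(0, T, T) for _ in range(B)]
    models = [lgb.LGBMRegressor(n_estimators=100).fit(X[s], y[s]) for s in draws]
    preds = np.stack([mdl.predict(X) for mdl in models])            # (B, T)
    out_of_bag = np.stack([~np.isin(np.arange(T), s) for s in draws])
    # LOO prediction per Equation (29): average out-of-bag estimators only.
    f_loo = (preds * out_of_bag).sum(0) / np.maximum(out_of_bag.sum(0), 1)
    resid = y - f_loo                                               # Equation (28)
    lo, hi = np.quantile(resid, [alpha, 1 - alpha])
    return f_loo, (f_loo + lo, f_loo + hi)                          # Equation (27)
```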
However, the obtained interval values cannot adequately address the problem of large fluctuations in the demands for different spare parts. This section therefore proposes an adaptive update mechanism to improve the reliability of the prediction interval. The specific steps are as follows: First, each demand sequence is divided into a zero-interval sequence (i.e., the gaps between successive demands) and a non-zero sequence (i.e., the real demand values). The squared coefficient of variation is then calculated for the two sequences, denoted by $CV_0^2$ and $CV_d^2$, respectively. A larger coefficient indicates greater sequence fluctuation, thus requiring a larger interval width for prediction, and vice versa. Second, the interval width parameter $\alpha$ is initialized and updated by Equation (30).

The interval update mechanism in Equation (30) improves the rationality and reliability of the prediction interval for demand sequences with different volatility and intermittency characteristics. Obviously, different interval widths directly affect inventory management decisions, e.g., the spare part coverage rate. The spare part coverage rate, reflected by the interval coverage rate in this paper, is defined as the ratio of the number of demands covered by the interval to the total number of demands in the sequence. The mechanism begins with an initial interval of larger width and then reduces the width to fit the fluctuations of the demand sequence. The update process stops when the interval width reaches a certain threshold. In this study, the threshold of the interval width is determined solely by the interval coverage rate. The setting of the coverage rate relies heavily on business logic, as different kinds of maintenance tasks and enterprises have different requirements. In this experiment, with the help of maintenance engineers from our cooperating enterprise, the lower limit of the coverage rate was set to 60%. We also observe that the prediction results become stable with this threshold, indicating that it works well.
5. Conclusions
In this paper, a new tensor optimization-based robust interval prediction method for intermittent time series is proposed to forecast the demand for spare parts. With the introduction of tensor decomposition, the proposed method smooths anomalous demands while preserving the intrinsic information of the evolutionary trend in the demand sequences. Moreover, to tackle the distorted or biased prediction results of intermittent time series, the API algorithm is designed to transform traditional point prediction into adaptive interval prediction. The proposed method provides a trustworthy interval for the prediction results and accurately reflects their uncertainty.
This study is fundamental to improving the effectiveness of inventory management. As a typical extended application, the proposed method can upgrade the current safety stock model to a dynamic version by integrating the evolutionary trend of the demand for spare parts. For instance, the prediction results of this paper can adjust the upper bound of the safety stock to match the fluctuations in actual demand. We believe this operation can decrease inventory costs while maintaining the quality of maintenance service.
Another interesting issue is the correlation among the demand sequences of different spare parts. The demands for two spare parts are probably correlated due to factors such as the project cycle, climatic influence, and failure probability. As proven by many prior works, the correlation information between two time series is valuable. Especially for intermittent time series, analyzing the correlation among sequences is believed to improve predictability, which is critical for this kind of data. The joint prediction of two or more sequences can enhance prediction performance in terms of accuracy and numerical stability. A feasible implementation is to run adaptive clustering algorithms with silhouette coefficients to determine proper sequence clusters before prediction, which is expected to improve predictability. Multi-output regression or multi-task learning algorithms can then be adopted to achieve joint prediction of the sequences in the same cluster. We plan to explore the utilization of correlation information in future work.
In our future work, transfer learning will also be studied for the prediction of the demand for spare parts. In actual enterprises, demand data are usually insufficient, especially for newly deployed equipment. Considering the distinctions between different types of equipment, we plan to build a transfer learning model to incorporate the evolutionary information of demand from existing equipment. The reliability of the prediction interval is also an interesting direction; it is necessary to find a reliable method to converge the prediction interval to a prediction point with a high confidence level.