1. Introduction
Air traffic has risen in recent years, in step with the expansion of the national economy and the civil aviation transportation sector. As a result, airport delays have become a major drain on the global economy and a difficult problem for the aviation industry to resolve. The complexity of the air transportation network makes flight regularity management necessary. How to effectively explore the delay patterns of an airport network, accurately infer the spatial and temporal evolution of a delay, and control delay propagation at the macroscopic level have thus become crucial questions.
Numerous studies have been conducted on airport delays, both domestically and abroad. Some researchers have used traditional machine learning algorithms such as Decision Tree, Random Forest, Bayesian networks, and Linear Regression to predict delays [1,2,3,4,5]. While conventional algorithms are simpler to interpret, they tend to lack accuracy and are not sufficiently effective at predicting delays. With the exponential growth of airline data, these algorithms must also process massive volumes of flight data, which creates computational bottlenecks. Alongside the extensive use of big data mining technology, neural network algorithms are also widely employed in the civil aviation industry. By combining multiple models under the deep learning paradigm, Reference [6] proposed a Long Short-Term Memory (LSTM) recurrent neural network and demonstrated that prediction accuracy improved as the structure was deepened. To solve the coding issue in delay prediction, Reference [7] proposed a multilayer artificial neural network to predict airport delays at JFK Airport. Reference [8] proposed a method combining a deep belief network and support vector regression to predict the airport delay of PEK-HGH. References [9,10,11] designed LSTM networks and improved variants for delay prediction and achieved satisfactory results. Reference [12] applied a Graph Convolutional Neural Network (GCN) to explore the spatial interaction hidden in an airport network; the results show that deep learning with graph-structured input has great potential for air traffic delay prediction. Concerning the causality of flight delay propagation between airports, Reference [13] studied flight delay prediction from the perspective of an airport network and established a DGLSTM deep learning framework based on 4 years of historical data from 325 airports in the United States, whose accuracy and robustness exceed those of currently popular methods. Reference [14] established a deep learning model for predicting flight departure time by analyzing the influence of different factors on departure time. Reference [15] established a delay prediction method for the whole process of transit flights, constructed an unbalanced data classification model, identified delayed flights at each prediction guarantee node, and achieved a recognition rate of 96.5% for delayed flights. Reference [16] established an airport delay prediction model based on the airport network method: the researchers built clustering models on the network characteristics of multiple airports and concluded, through experimental comparison and verification, that the model based on Betweenness Centrality achieved a satisfactory prediction effect.
The studies above have produced fruitful results on airport delay prediction, mainly through algorithmic innovation and the application of airport networks. On the algorithm side, they integrate long short-term memory recurrent networks, multilayer artificial neural networks, deep belief networks, and support vector regression; apply intelligent optimization algorithms to tune the parameters of a traditional BP neural network; and adopt other advanced algorithm structures for airport delay prediction. On the airport network side, many scholars have demonstrated that combining graph-structured input with network eigenvalues can improve the accuracy of delay prediction; integrating airport delay prediction with the structure of the airport network is the latest research direction.
Generally speaking, previous methods for flight delay prediction have focused mainly on improving the algorithm itself, while the complexity of flight delay propagation and the spatial correlation of delays at adjacent airports have not received sufficient attention. To address the process of airport delay propagation, this paper proposes a method that captures the regularity of delay propagation between adjacent airports based on the temporal and spatial correlation of delay transmission in the aviation network. In addition, this paper proposes a new data set construction method that not only builds airport cluster data sets with similar delay patterns, but also increases the robustness of the prediction set and captures the inherent mechanism of delay propagation from high-dimensional features. This provides key data support for improving prediction accuracy. Meanwhile, this paper proposes a bidirectional long short-term memory (BiLSTM) network that integrates the Attention mechanism to distinguish the impact of different characteristic variables on the delay of the target airport. The network selectively captures the key features that influence the delay among multidimensional, complex, and disordered temporal and spatial variables, and suppresses the interference of secondary factors on the prediction results. Finally, different prediction models are established on different data sets to compare the structural stability of the data sets and the prediction accuracy of the proposed method against traditional algorithms, which verifies the effectiveness of the method proposed in this paper.
The remainder of this paper is organized as follows. Section 2 introduces the concept and construction method of an airport cluster and analyzes the temporal and spatial correlation of delay propagation in the airport network. Section 3 introduces the relevant algorithm concepts, the algorithm structure, the extraction and analysis of characteristic variables, and the construction of the TS-BiLSTM-Attention model together with its data processing workflow. Section 4 verifies, through a case analysis, the stability of the airport cluster data set proposed in this paper and the prediction accuracy of the algorithm. Section 5 summarizes the paper and discusses future work. The experimental flow chart of data processing is shown in Figure 1.
3. TS-BiLSTM-Attention Airport Delay Prediction Model
3.1. BiLSTM Neural Network
Bi-directional Long Short-Term Memory (BiLSTM) is an improvement over the traditional unidirectional LSTM [24]. Because the LSTM can only predict the output at the next instant from the sequence information of previous moments, it is insufficient for multidimensional temporal variables, whose current output depends not only on the previous state but also on the future state. BiLSTM combines a forward LSTM layer and a backward LSTM layer, which capture “past” information from front to back and “future” information from back to front, respectively. By combining forward and backward temporal inputs, BiLSTM can fully account for past and future information in multidimensional time-series data and can further improve the accuracy of the model prediction.
The structure of the BiLSTM cell is shown in Figure 3. $x_1, x_2, \ldots, x_i, \ldots, x_t$ denote the input data at each moment $t_i$ ($i \in [1, t]$), and $F_1, F_2, F_3, \ldots, F_t$ and $B_1, B_2, B_3, \ldots, B_t$ denote the hidden states of the forward and backward LSTM iterations, respectively.
The hidden layer updates of the forward LSTM and the backward LSTM, and the final output of the BiLSTM, are given in Equations (6)–(8), respectively.
where $f_1$, $f_2$, and $f_3$ are the activation functions between different layers, and $\omega_1, \omega_2, \omega_3, \ldots, \omega_6$ are the corresponding weights of each layer.
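For orientation, a commonly used form of the BiLSTM update that is consistent with the symbols defined above is sketched below. The output symbol $y_t$ and the assignment of the weight indices $\omega_1, \ldots, \omega_6$ to individual terms are assumptions made for illustration; only the roles of $x_t$, $F_t$, $B_t$, $f_1$ to $f_3$, and the six weights are taken from the text.

```latex
\begin{align}
F_t &= f_1\left(\omega_1 x_t + \omega_2 F_{t-1}\right) && \text{forward layer, reads the sequence front to back} \\
B_t &= f_2\left(\omega_3 x_t + \omega_5 B_{t+1}\right) && \text{backward layer, reads the sequence back to front} \\
y_t &= f_3\left(\omega_4 F_t + \omega_6 B_t\right)     && \text{output combines both directions}
\end{align}
```

The key point of this form is that $F_t$ depends on $F_{t-1}$ while $B_t$ depends on $B_{t+1}$, so the output at moment $t$ draws on both earlier and later inputs.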
3.2. Attention Mechanism
The Attention mechanism is modeled on the attentional characteristics of the human brain and was initially applied to image processing. In the field of deep learning, the Attention mechanism assigns relative importance weights to various features: key contents are assigned greater weights and other contents are assigned lower weights. This differential weight assignment improves the efficiency of information processing and highlights the most important temporal characteristics, yielding higher-quality multidimensional variables [25].
Figure 4 depicts the architecture of the Attention unit.
The essence of the attention mechanism is to assign a weighting factor to each value, as demonstrated by Equation (9).
where Source is a known element consisting of the data pair <Key, Value>; Query is the objective function.
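A widely used way to write this weighting, given here as a sketch rather than as the exact form used in this paper, is a similarity-weighted sum over the <Key, Value> pairs in Source. The length symbol $L_x$ and the generic function name $\mathrm{Similarity}$ are assumptions introduced for illustration.

```latex
\begin{equation}
\mathrm{Attention}(\mathrm{Query}, \mathrm{Source})
  = \sum_{i=1}^{L_x} \mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) \cdot \mathrm{Value}_i
\end{equation}
```

Here each $\mathrm{Value}_i$ contributes in proportion to how well its $\mathrm{Key}_i$ matches the Query, which is the weighting factor described above.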
During the training process, the attention model dynamically adjusts the weight of each time step and calculates the weight coefficient of each variable feature; the calculation follows Equations (10)–(12).
where $h_{i-k}$ is the output value of the BiLSTM hidden layer; $\alpha_i$ is the attention weight coefficient of the current input; $h_{i-1}, h_{i-2}, \ldots, h_{i-N}$ is the input sequence; $S_{i-1}, S_{i-2}, \ldots, S_{i-N}$ are the hidden layer state values corresponding to the input sequence $h_{i-1}, h_{i-2}, \ldots, h_{i-N}$; and $V_S^T$, $W_S$, and $b_S$ are the learnable parameters of the model, which are optimized continuously during model training.
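The weight calculation can be illustrated with a minimal sketch of additive attention over a window of hidden states. It assumes the common scoring form score = V^T tanh(W h + b) followed by a softmax; this form, and the function names `attention_weights` and `attention_context`, are assumptions for illustration and are not confirmed to match the paper's equations.

```python
import math


def attention_weights(hidden_states, V, W, b):
    """Return one softmax-normalized weight per hidden state.

    hidden_states: list of N vectors of length d (e.g. BiLSTM outputs)
    W: k x d matrix, b: vector of length k, V: vector of length k
    Assumed scoring form (not taken from the paper): V^T tanh(W h + b).
    """
    scores = []
    for h in hidden_states:
        # Project the hidden state and squash it with tanh.
        projected = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        # Reduce the projection to a single scalar score.
        scores.append(sum(v_i * p_i for v_i, p_i in zip(V, projected)))
    # Numerically stable softmax over the time steps.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(hidden_states, weights):
    """Weighted sum of the hidden states, one value per feature dimension."""
    dim = len(hidden_states[0])
    return [
        sum(a * h[j] for a, h in zip(weights, hidden_states))
        for j in range(dim)
    ]
```

The weights always sum to one, so the context vector is a convex combination of the hidden states; time steps with higher scores dominate the result.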
3.3. TS-BiLSTM-Attention Prediction Model Construction
We propose a TS-BiLSTM-Attention delay prediction model based on spatio-temporal sequences that capture the spatial and temporal characteristics of airport delays from the overall airport cluster delays.
Figure 5 depicts the four components of the model: data pre-processing, feature engineering, model training, and effect evaluation.
In the data pre-processing phase, the historical flight data contain missing values and outliers, so the resulting prediction set has an unstable data structure and is affected by data noise, which in turn affects the experimental results. To address this issue, this paper constructs airport clusters based on the similarity of airport network attribute values, reducing the impact of data anomalies and other factors on the experimental results. In the feature extraction phase, the precision of delay prediction depends on accurately extracting the feature variables that affect delays and on constructing a stable data set. Within one airport, a delayed flight affects the departure of subsequent flights, so the delay propagates rapidly throughout the airport and causes further departure delays there. At the same time, when flights cannot land at an airport for a short period because of inclement weather or other factors such as flow control at the landing airport, the departures of all flights scheduled from the associated airports during that period are also affected. As incoming and outgoing flights carrying delays operate within the aviation network, delays propagate in space. Using the inbound time series of the target airport and the time series of its associated airports as the characteristic variables for delay prediction captures the inherent mechanism of delay propagation from high-dimensional characteristic variables. In addition, the prediction time granularity in this study is set to 1 h, and the prediction target is the unit hour delay, defined as the ratio of the total delay time of all flights during a given time period to the total number of flights in that period. This quantity describes the degree of delay at an airport within a given time frame.
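The unit hour delay defined above can be computed with a short sketch like the following. It assumes that each flight is given as a (scheduled departure, delay in minutes) pair and that flights are grouped by the clock hour of their scheduled departure; the record layout and the function name `unit_hour_delay` are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from datetime import datetime


def unit_hour_delay(flights):
    """Average delay per flight for each 1 h bucket.

    flights: iterable of (scheduled_departure: datetime, delay_minutes: float).
    Returns a dict mapping the start of each hour to
    total delay time / number of flights in that hour.
    """
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [delay sum, flight count]
    for scheduled, delay_minutes in flights:
        # Truncate the timestamp to the start of its hour (assumed bucketing).
        hour = scheduled.replace(minute=0, second=0, microsecond=0)
        totals[hour][0] += delay_minutes
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}
```

Hours without any flights are simply absent from the result; how such gaps should be filled before training is a separate pre-processing decision.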
We construct the TS-BiLSTM-Attention model and feed the features generated by the BiLSTM hidden layer into the Attention mechanism. The Attention mechanism uses weighting to automatically differentiate the importance of the spatio-temporal information extracted from the BiLSTM hidden layer. This effectively exploits the time-series properties of the multidimensional features and their deep spatio-temporal correlation. Attention reduces the loss of historical information and highlights the information of key historical nodes, which limits the impact of redundant information on the predicted unit delay time. The output of the Attention layer is then used as the input of the Fully Connected layer, which outputs the final unit delay time. In the network parameter optimization phase, the Adam (Adaptive Moment Estimation) optimization algorithm is used to update the network parameters of each layer, and the Mean Squared Error (MSE) is employed as the loss function. The trained TS-BiLSTM-Attention model is then saved, and its validity is verified on the test set.
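To make the optimization step concrete, the sketch below applies the standard Adam update rule to an MSE loss. A one-variable linear model stands in for the network purely for illustration; the stand-in model, the hyperparameter values, and the function names are assumptions and do not reflect the paper's configuration.

```python
import math


def mse(preds, targets):
    """Mean Squared Error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def train_linear_adam(xs, ys, lr=0.05, beta1=0.9, beta2=0.999,
                      eps=1e-8, steps=2000):
    """Fit y = w * x + b by minimizing MSE with Adam; returns [w, b]."""
    params = [0.0, 0.0]   # weight, bias
    m = [0.0, 0.0]        # first-moment (mean) estimates of the gradient
    v = [0.0, 0.0]        # second-moment (uncentered variance) estimates
    n = len(xs)
    for t in range(1, steps + 1):
        errors = [params[0] * x + params[1] - y for x, y in zip(xs, ys)]
        # Gradients of the MSE with respect to weight and bias.
        grads = [
            2.0 * sum(e * x for e, x in zip(errors, xs)) / n,
            2.0 * sum(errors) / n,
        ]
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * grads[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grads[i] ** 2
            # Bias correction compensates for the zero initialization.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

In a full model the same update is applied to every trainable parameter of the BiLSTM, Attention, and Fully Connected layers, typically through a deep learning framework's built-in optimizer rather than hand-written code.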
The TS-BiLSTM-Attention prediction model consists primarily of the BiLSTM-Attention network, whose construction is illustrated in Figure 6. The inputs consist of $T$ spatio-temporal feature variables extracted from airport clusters. The features are passed to the output layer, which outputs the predicted delay value $y$.
3.4. Model Construction
The essence of training a deep learning model is the iterative adjustment of model parameters. Model training involves tuning hyperparameters such as the batch size, time step, number of hidden layers, and number of neural network nodes on the training set so as to minimize the loss function between the prediction outputs and the actual data. The objective is to continuously update the model in order to minimize prediction errors.
Step 1: Obtain the experimental dataset after data pre-processing and feature engineering. Split the dataset into training data and testing data in a ratio of 7:3.
Step 2: Build the TS-BiLSTM-Attention model for delay prediction. Input the processed multidimensional temporal features into the BiLSTM model.
Step 3: The multidimensional temporal data are fed into the BiLSTM unit structure, and the “forward” and “backward” temporal processing is carried out using Equations (6) and (7), which integrates temporal information from the “past” and the “future”. Equation (8) then outputs the hidden layer features, which serve as the input variables of the Attention model. The Attention layer captures the spatio-temporal information obtained by the BiLSTM: Equations (10)–(12) assign weights to the features, and the parameters are optimized continuously during model training. This differentiates time series with spatio-temporal delay characteristics and further explores the spatio-temporal delay features based on a network analysis.
Step 4: The output of the Attention layer is used as input to a fully connected layer, which outputs the final unit delay time. The optimization of network parameters in the model structure is performed using the Adam (Adaptive Moment Estimation) optimization algorithm for iterative updates of internal parameters. The Mean Squared Error (MSE) is used as the loss function.
Step 5: Save the trained TS-BiLSTM-Attention model and validate its effectiveness on the testing dataset.
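Steps 1 and 5 can be sketched as follows. The split keeps the samples in time order, which is an assumption (the text only states the 7:3 ratio), and the evaluation uses MAE and RMSE, the indicators reported in the conclusions. The function names are hypothetical.

```python
import math


def chronological_split(samples, train_parts=7, test_parts=3):
    """Split time-ordered samples into training and testing sets (7:3 by default).

    The earliest samples go to training and the latest to testing, so no
    future information leaks into the training set (assumed design choice).
    """
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return samples[:cut], samples[cut:]


def mae(preds, targets):
    """Mean Absolute Error."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def rmse(preds, targets):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    )
```

Because RMSE squares the errors before averaging, it penalizes occasional large misses more heavily than MAE, which is why the two indicators are usually reported together.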
5. Conclusions
In this paper, airport clusters are constructed based on the similarity of airport network attribute values. The spatio-temporal correlation between airport clusters and nearby airports is extracted in order to construct the TS-BiLSTM-Attention prediction model. The effectiveness of both the clustered data set and the model is demonstrated through comparative experiments. The experiments show that, when different algorithms are used to build delay prediction models on different data sets, the resulting MAE and RMSE differ considerably. The MAE values on the cluster-airport data set are 5.532, 5.556, 4.725, and 4.491, respectively, which are no higher than the values of 5.899, 6.041, 4.752, and 4.491, respectively, obtained on the single-airport data set. This indicates that the delay prediction model constructed using clustered airports (C-ADM) predicts the departure delay of a single airport in the cluster more accurately, and that its model structure is more stable than that of the delay prediction model constructed for single airports (S-ADM). The C-ADM model can reduce the effect of missing values and outliers on the accuracy of predictions. The clustered prediction set has a stable data structure and encompasses a significant amount of anomalous data. Compared with the traditional BP, ARIMA, and LSTM algorithms, the MAE values of BP, ARIMA, and LSTM under C-ADM are 5.532, 5.556, and 4.725, respectively, all greater than the 4.494 obtained by TS-BiLSTM-Attention; the corresponding RMSE values are 7.633, 7.572, and 7.433, respectively, all higher than the 6.787 obtained by TS-BiLSTM-Attention.
Therefore, the MAE and RMSE of the TS-BiLSTM-Attention prediction model are smaller than those of the traditional models regardless of how the data set changes. This indicates that the BiLSTM model integrated with the Attention mechanism can accurately extract multidimensional variables and complex time-series characteristics for regression prediction, and that its predictability, generalization, and universality are superior to those of traditional models.
Limitations: This paper studies only the propagation of airport flight delays in the spatio-temporal network and the prediction of airport delays, with the aim of exploring the propagation mechanism of airport delays and finding better-performing prediction methods. Several topics deserve further exploration. Weather, one of the important factors influencing flight delays, was not considered in the cause analysis and data collection. In addition, the factors that influence airport delays constrain and affect one another. Collecting more complete data and extracting the key factors influencing delays could further improve the prediction of airport delays.
This paper focuses on the dimension of airport networks and extracts the spatio-temporal correlations of delay propagation among adjacent airports. It aims to capture the inherent mechanisms of delay propagation in high-dimensional space. Additionally, it constructs the TS-BiLSTM-Attention temporal prediction model to achieve more accurate predictions of departure delays at the target airport. In practical airport operations, obtaining the delay status of adjacent airports makes it possible to dynamically predict departure delays at the target airport for the next moment, because any delay at an adjacent airport dynamically affects the departure time of flights at the target airport. The theoretical and algorithmic models proposed in this paper therefore enhance real-time awareness of the delay status at adjacent airports and provide real-time updates on the delay status at the target airport. This helps airports anticipate delays and respond promptly with appropriate action plans.
Future research can be extended and deepened in two directions. First, when an airport encounters flight delays, the delays often propagate rapidly through the airline network, so how to accurately construct the delay characteristic variables deserves further study. Second, applying the latest deep learning algorithms to fit and model multidimensional spatio-temporal data under realistic operating conditions is of great significance for delay prediction.