1. Introduction
Air transport is a complex system whose dynamics evolve over multiple temporal scales. When focusing on operational aspects, the largest time horizon is given by the daily cycle, with the system resetting every night—as at least most passenger flights stop operating, new crews start their shift, and (except in the case of extreme events) delays are absorbed. At the opposite extreme, decisions can be made by multiple actors on very short time scales; to illustrate, air traffic controllers have to react to potential separation losses that develop on the scale of a few minutes [
1] and that require reaction times of between 2 and 3 s [
2].
While those time scales are relatively easy to measure, a relevant open problem is the description of the one underpinning the propagation of delays (i.e., secondary or knock-on delays). Delay propagation is the result of processes that exist at different time scales. One may intuitively think that delays are related to the duration of flights, such that, if a flight
a is delayed at departure and it is to pass the delay to a second flight
b, the minimum time required to see the effect is the actual duration of
a. In addition, if delays are measured at landing, the observed time scale would be the duration of
b plus the turn-around time between
a and
b. On the other hand, information about delays can be transmitted much faster; for instance, an airline can delay a flight knowing another one with connecting passengers is delayed well before the latter has landed, and therefore well before its actual delay is recorded [
3,
4]. The existence of upstream delay propagations, in which the delay of one flight affects the flight prior to, it has also been documented [
5]; for instance, if the destination airport is congested, flights may not be allowed to land and may instead be retained on the ground at the airport of origin.
The problem of describing the time scale of delay propagation is made even more complex by the fact that limited options exist for measuring such phenomenon. Two main alternatives have been extensively studied: the analysis of the local dynamics of individual flights and airports [
6,
7,
8] and the use of large-scale synthetic models [
5,
8,
9,
10,
11]. To the best of our knowledge, only one research work has tried to explicitly assess the time scale of network-wide delay propagation from real data [
12] by detecting sliding windows for which a stable correlation is observed between the aggregate delays at two airports and then extracting the time lag maximising such correlation; the use of a linear correlation nevertheless introduces some problems, as for instance, it does not discriminate direct (i.e., true causalities) from indirect (spurious correlations) relationships.
In recent years, a complementary approach has been proposed based on the detection of instances of delay propagation through causality metrics, such as, e.g., the celebrated Granger causality test [
13,
14], and on mapping these instances as links of a complex network—see for instance, Refs. [
15,
16,
17,
18,
19,
20,
21]. Even if it comes with its own limitations, the Granger test presents the advantage of detecting instances of true “predictive causality” and is thus less affected by the presence of confounding elements—and hence of spurious correlations. A network representation is also easier to describe, as many techniques are available to quantify properties of the emerging structure [
22,
23].
Intuitively, a study similar to that proposed in Ref. [
12] could be performed using the Granger causality test, as this test yields the temporal lag maximising the causal connection for each analysed pair of time series: a lag that can be interpreted as the time required for the information describing delays to propagate between the corresponding airports. This nevertheless comes at a cost: a new time scale has to be defined over which the causality is tested. More specifically, this test requires as input a time series per airport, representing its dynamics. In this case, it represents the evolution of average landing delays; it is thus necessary to define the length of the window in which delays are averaged. Choosing the best time scale for evaluating the presence of a causality in the Granger sense is an open problem, known to be highly complex. Most research works have focused on the case of continuous dynamic processes sampled at a specific resolution—for being the most common problem found in science and engineering. In this case, increasing the temporal sampling rate yields improved causality estimations, both in synthetic models [
24] and in empirical studies [
25]. This is intuitively to be expected, not only because higher sampling frequencies yield longer time series (which are easier to analyse), but also because important dynamic patterns may otherwise be lost. However, increasing the sampling resolution is not always beneficial, with the detected causality approaching zero almost linearly as the sampling interval tends to zero [
26]. On the other hand, is has been shown that using too large a sampling interval can result in statistically significant yet spurious causality relationships [
27,
28,
29,
30,
31].
The problem becomes even more complex in the analysis of delay propagations, as the underlying process is a fixed point as opposed to a continuous process. In other words, even if the propagation of delays could be visualised as a continuous process, we only observe its result at discrete moments in time, i.e., whenever an aircraft actually lands. This implies that the sampling frequency cannot be made arbitrarily large. Otherwise, most samples would correspond to time windows for which no information (i.e., no landing events) is available [
32,
33]. On the other hand, safety limitations imply that at best only tens of landing events are available per hour. Ensuring a large number of events in each time window would therefore require impractically large windows.
In this paper, we explore how the Granger causality test can be used to describe the characteristic time scale of delay propagation processes. We show how results can be framed within three complementary viewpoints: a methodological one, an operational one and a systemic or network one. We explore this issue by constructing and analysing a minimal synthetic model of delay events, simulating how delays can propagate between two airports and how such propagation can be detected by the Granger causality test. We also extend the scope to a whole airport network by evaluating such causality on all possible pairs of airports, representing the structure as a complex network. In the latter case, the concept is tested on real data about operations in 50 large European airports during September 2019.
Results indicate that optimising the size of the window used to calculate the Granger causality for each pair of airports results in a substantial increase in the number of detected propagation links and, in turn, in different macro-scale (but notably not micro-scale) propagation structures. We further show how the choice of this temporal resolution strongly affects the ranking of airports in terms of their importance for the propagation of delays. We finally study how the Granger causality can be used to assess the time scale of delay propagation and how the chosen window size affects such estimation.
In the remainder of this paper, we first introduce the data (
Section 2.1), the Granger causality test (
Section 2.2), and the network reconstruction procedure (
Section 2.3). We then present a minimal model, generating synthetic time series of landing events at two airports with custom parameters in
Section 3. The results for the real data are analysed in
Section 4, organised according to methodological (
Section 4.1), operational (
Section 4.2),and network (
Section 4.3) viewpoints. We finally discuss the validation of the results in terms of time series length (
Section 5) and draw some methodological and operational conclusions (
Section 6).
3. Synthetic Model of Delay Propagation
Before delving deeper into the study of real data, we construct and evaluate a synthetic model of delays to understand the relationship between, on one hand, the size w of the window used to estimate the Granger causality and, on the other hand, the efficacy of the causality detection. The approach consists of generating two time series representing the evolution of average delays at two fictitious airports with similar dynamics. By lagging the second one a certain number of time periods and applying the Granger causality test between the original and the lagged series to assess whether a causality is detected, we run several iterations of the simulation for different window sizes and calculate the fraction of times we detect a statistically significant causality.
The first step is the creation of time series representing the global evolution of delays, for which we resort to the well-known Lorenz model [
58]:
With the three parameters fixed to
,
and
. It is well-known that the system exhibits a chaotic behaviour for these three values, and chaotic systems have a long history for being used as models of complex dynamics, e.g., in economics [
59] or biology [
60]. The
x variable of the Lorenz system, originally representing the rate of convection in a two-dimensional fluid layer, is here used to model the evolution of the average delay observed at an airport through the day, or in other words, the expected delay of an aircraft landing at time
t. The time series of
x is smooth, i.e., no sudden jumps occur. Additionally, it is deterministic but also chaotic; thus, no self-correlations are present. This last point is of relevance, as the Granger test requires stationary time series; if simpler functions are used to create such average delay evolution, e.g., a sinusoidal signal, the resulting time series would have, periodic dynamics that may cause spurious causalities. Once the time series has been generated, we normalise it in the range
. Finally, two time series are generated from it, respectively, referred to as the master (i.e., the cause) and the dependent (i.e., the consequence) time series, by lagging the latter a certain number of time periods representing the time needed for the delay to propagate.
Once the two time series representing the overall evolution of delays have been created, it is necessary to reconstruct a set of landing events, as these are the only data available when analysing the real system. These events, for both the master and the dependent airports, are generated with a time separation between subsequent ones given by random numbers drawn from a uniform distribution
. Thus,
defines the expected separation between consecutive landings and is therefore inversely proportional to the amount of traffic. The delay assigned to each landing event is drawn from a second uniform distribution
, where
is the value of the variable
x of the Lorenz model at time
t. The result is then two sets of events representing synthetic delays, each one described by a time stamp and an observed landing delay. Finally, for a fixed value of
w, we extract the average delay observed in non-overlapping windows of length
w, both for the events of the master and the dependent airports. These averages are merged in two time series, which can then be analysed using the Granger causality test. Note that the detrend step is not necessary here, as the use of a chaotic system ensures the stationarity of the time series. An example of the generation of one average delay time series can be seen in
Figure 1, top panels. From left to right, the three panels depict the generation of the Lorentz series representing the global evolution of delays, the generation of the landing events and the calculation of the average delays.
This model allows us to evaluate the behaviour of the Granger causality under well-controlled conditions. First of all, a causality is present between both time series by construction, as landing events in both airports are generated from time-shifted time series; it is thus possible to evaluate how sensitive the test actually is. Second, it is possible to fully control the parameters of the model and thus test the hypothesis that too small and too large values of w can result in an underestimation of the causality.
We proceed with the analysis of the model by varying the two main parameters: the average time between landing events and the window size w. For each combination of parameters, 1000 repetitions are generated, and the fraction of times that the p-value is significant is calculated.
The left middle panel of
Figure 1 reports the proportion of significant tests for different window lengths
w and for different event separation times
. All plots follow a similar trend, where for very low values of the window length (between zero and one), most of the tests are not statistically significant, as not enough landing events are available to obtain meaningful average values. The fraction then increases until it reaches a maximum and finally decreases for large window lengths. This is different than that observed for small
w, which is potentially due to the fact that too long a window results in the average of too much information, thus losing the high-frequency dynamics in both airports—or, in other words, that just the global average landing delay is obtained. The right middle panel of
Figure 1 additionally reports a scatter plot of the optimal window size as a function of the separation time
. In synthesis, the model confirms that an ideal (optimal) window size exists, for which the maximum number of causality relationships are detected; deviating from such an optimum results in an underestimation of the propagation.
We then change the simulation, specifically the synthesis of the landing events. The time separation between consecutive landings is now given by random numbers drawn from a distribution , e being an asymmetry exponent that stretches the positive tail of the distribution. In other words, while keeping the average constant, increasing this exponent results in clusters of landing events, with some of them happening after long periods of inactivity. Thus, this exponent is used to break the regularity of landing events and to simulate a more realistic burst distribution. We fix the average time between landing events and vary the window size w and the asymmetry exponent e; similar to before, we execute 1000 iterations for each pair of parameter values and record the proportion of times the p-value is significant.
The left bottom panel of
Figure 1 depicts how the fraction of significant tests varies when considering different window sizes and asymmetry exponents. In order to compensate for the fluctuations due to the stochastic nature of the analysis, the dotted lines represent second order polynomial fits. The right bottom panel further depicts a scatter plot of the best window size as a function of the exponent
e. A clear linear relation can be observed, where for larger asymmetry exponents, namely for less uniform separation times, the best window size grows; in other words, a larger window size is needed to detect causality relationships in order to cope with the periods of less activity.
In synthesis, results from the model confirm that the ideal window size to detect delay propagation is a function of the number of landing events; in other words, the higher the frequency with which we obtain information from the system, the higher the potential temporal resolution of the results. This optimum is a balance between two forces: the need for analysing large number of events to obtain a reliable average delay estimation, on one hand, and the risk of considering too long time windows, with the consequent smoothing of the fast part of the dynamics, on the other. Still, the estimation of such window length is made more complex by other factors, such as, for instance, the degree of regularity of landing events.
4. Analysis of Real Delay Propagation Patterns
When moving to the analysis of real delay data, the approach proposed here can be applied at three different levels. Firstly, there is a methodological point of view, i.e., one may simply be interested in optimising the analysis of delay propagation patterns and hence derive the best window size for detecting causality relationships. Secondly, such an optimal window size, along with the optimal lag as yielded by the Granger causality test, can be used to derive the time needed for delays to propagate between two airports. Finally, the last step is to move to a network level and see whether and how changing the window size modifies the resulting delay propagation topology. These three aspects will be tackled in this section.
4.1. Methodological Viewpoint: Best Time Scale for Detecting Delay Propagation
The identification of the best time scale for detecting delay propagation in real data is not dissimilar from what is presented in
Figure 1; specifically, for a pair of target airports, one needs to reconstruct the corresponding delay time series for different values of
w; to apply the Granger causality test, thus obtaining a
p-value as a function of
w and finally to select the
w minimising such a
p-value. Note that a smaller
p-value implies a more statistically significant relationship, but also that such relationship manifests clearer in the data.
The top left panel of
Figure 2 shows the window size
w minimising the
p-value, for those pairs of airports that display a statistically significant relationship (i.e.,
p-value
, with a Bonferroni correction for multiple testing). No clear pattern can be discerned, with the optimal
w varying from 30 min to up to two hour (see top right panel for a corresponding histogram). Specifically, no clear relationship seems to exist with the size of the airport (see middle right panel of
Figure 2 representing a box plot of all
ws for each airport and with airports sorted by decreasing number of operations and
Figure A1, reporting a scatter plot of the optimal
w as a function of the number of operations of airports).
Then, what are the elements that drive the size of such an optimal window? In order to answer this question, we extracted a set of time series describing the two most important aspects of the operations at an airport, i.e., the separation between landings and their delays—see the first column of
Table A4. Each one of these sets of values has been synthesised using metrics such as the average or the standard deviation—second column of
Table A4, and these features have further been combined for each pair of airports under analysis, using standard mathematical manipulations (e.g., the product or the maximum)—right columns of
Table A4. The result is a set of 90 features per link. As a first approach, the linear correlation between each one of these features and the optimal
w has been obtained, and the corresponding coefficient of determination
has been calculated—see results in
Table A4. Taken individually, none of them seem to substantially explain the best window size; the best result is given by the maximum of the median absolute deviation (MAD) of the landing times although the
is only
. We then exhaustively explored all combinations of four features and evaluated linear models based on them. The best four, yielding an
of
, are: the standard deviation of the landing separation for the receiving airport, the minimum Hurst exponent (HE) of the landing separation, the MAD of the landing times of the source airport, and the maximum of the MAD of the landing times. Scatter plots for these features and the corresponding linear fits are reported in the bottom panels of
Figure 2. Finally, when combining all 90 features in a linear model, the resulting
is
.
What conclusions can be obtained from these results? From a general point of view, the landing and delay dynamics of the involved airports are partly responsible for defining the best window length
w although a linear model can only explain
of the variability. Additionally, it can be seen that such
w depends mainly on the landing dynamics, with the delays themselves having a minor role—see
Table A4. While selecting a few features is not enough to construct a model able to predict the best
w, this latter seems to be related with the variability of the separation between consecutive landings, something that is in agreement with the results of
Section 3.
In synthesis, obtaining the best window size w for a pair of airports is not a trivial process. Although it depends on the characteristics of the landing events, and specifically on the time between consecutive operations, the best w strongly changes between different pairs of airports, such that its value cannot be predicted at this stage. While disappointing, this is to be expected, as real operations (and data) are more complex than any synthetic model. Consequently, the only reliable alternative is testing several values of w and selecting the one yielding the minimal p-value.
4.2. Operational Viewpoint: Delay Propagation Time
Next, we tackle the problem of estimating the time it takes for delays to propagate within the network. This is achieved by considering the optimal window size w for each pair of airports, as estimated in the previous subsection, for then performing an exhaustive search to find the maximum lag yielding the best (lowest) p-value. Finally, the delay propagation time is obtained by multiplying these two values.
The left panel of
Figure 3 reports the resulting delay propagation time for each pair of airports. As in the case of
Figure 2, only results for those pairs of airports that have a statistically significant propagation are reported. Intuitively, one may hypothesise that such propagation time should be proportional to (or at least, be a function of) the distance between the two involved airports. As is evident in the scatter plot of the right panel of
Figure 3, no simple pattern can be observed; a linear correlation analysis yields a coefficient of
, with a
p-value of
.
We further estimate the time required for a delay to be propagated by an aircraft performing two consecutive flights, respectively, departing from airports
a and
c, as a function of the distance between the two airports. The geographical distance is transformed into time by considering a ground speed of 500 knots, plus one hour for departure and arrival procedures and turn-around operations. In other words, the objective is to obtain an estimation of the minimum time required between the two consecutive departures. The result is depicted by the dashed diagonal line in the right panel of
Figure 3. It can be seen that most propagation times are well above the line, suggesting the presence of indirect propagation patterns—e.g., when the propagation between two airports
a and
c requires an intermediate airport
b. This is consistent with the use of the Granger causality test, which is designed to detect these indirect patterns.
At the same time, a few pairs of airports have delays propagating between them in very short times, as low as 50 min—see
Table A6. In other words, there are instances in which a delay takes less time to propagate than the duration of the shortest flight connecting the corresponding pair of airports. To illustrate this point,
Figure 4 depicts a graphical representation of the five pairs of airports with the largest (red arrows), smallest (green arrows) and most asymmetrical (blue lines) propagation times—numerical data are reported in
Table A5,
Table A6 and
Table A7. No clear trend can be identified. While most of the largest propagation times occur between distant airports, this also happens between Brussels and Hamburg and between Hamburg and Stuttgart. If delays between nearby airports are propagated by indirect connections, this does not explain the short time observed from Amsterdam and Hamburg or between Paris Orly and Stuttgart.
Following what was proposed in Ref. [
12], the left panel of
Figure 5 reports the distribution of the propagation time when airports are classified according to their size, i.e., in large and small ones—see also
Figure A2 for full results. The former ones are those with a number of flights larger than the median of all considered airports; the latter ones are those with less flights. Four distributions of propagation times are then calculated corresponding to all possible combinations of airports at each end of a propagation link. As opposed to what was reported in Ref. [
12], no significant differences can be observed. Additionally, the right panel of
Figure 5 reports two distributions of the propagation times, respectively, corresponding to pairs of airports that are nearer or farther away than the median of all pairwise distances. While the propagation time is slightly larger for airports located farther away, the difference in medians is not statistically significant (Mood’s median test,
p-value
).
We hypothesise that several factors may contribute to the complex relationship between the propagation time and airports’ characteristics. On one hand, it is clear that delays at neighbouring airports are influenced by shared weather patterns or even by interactions between their approach procedures. On the other hand, it is possible for information about delays to be transmitted before the corresponding flights reach their destination, as for instance when managed through ground delay programs. In short, it seems that the delay propagation time is driven by different and intermingling factors, including (but not exclusively) indirect connections and localised weather patterns.
4.3. Network Viewpoint: Propagation Network and Its Structure
The next logical step is to reconstruct the whole functional propagation network by varying the size of the window used to assess the Granger causality on the time series of pairs of airports. In
Figure 6, blue lines report the evolution of six topological metrics as defined in
Section 2.3 as a function of the window size used in the network reconstruction process. Note that this corresponds to the standard approach of using a single time scale for all possible propagation links. On the other hand, the dotted horizontal lines in the same figure report the network metrics when the optimal time scale is used for each airport pair.
Let us start by analysing the link density in the top left panel of
Figure 6. It is apparent that using a fixed time window of 60 min, as common in the literature [
15,
16,
17,
18,
19,
20,
21], leads to a significant information loss. In other words, and as seen in
Figure 1, long windows imply a loss of information about the fast dynamics of delays and hence in an underestimation of the propagation. It is worth noting the magnitude of this effect: almost three-fourths of the causality links are lost for windows of 60 min compared to the use of optimal window lengths. At the same time, using very short time windows seems beneficial, with an increase in the link density. This is nevertheless misleading, as very short windows necessarily include more periods of inactivity—see the aqua line, right
Y axis, representing the reliability, i.e., the fraction of airport pairs for which at least
of the windows have one or more landing events. As previously discussed, these correspond to missing values, and it has been shown that too large a share of them results in an overestimation of the causality [
33].
As to be expected, these changes in the link density, and hence in the number of detected links, have an impact on the values of all other topological metrics. Assortativity, transitivity and IC yield qualitatively similar values when comparing the results of using the optimal window size and a fixed 60 minimum one. The same is not true for the other two metrics, with the diameter and the efficiency being, respectively, under- and overestimated by using a fixed time window of 60 min. It is interesting to note that the former group of metrics are mostly local in nature, while the latter (i.e., those not correctly estimated) focuses on the macro-scale structure of the network. In other words, using a fixed (non-optimal) window size does not bias the structure created by pairs and triplets of airports, possibly those more strongly connected, but instead changes the overall structure of the propagation network.
To better understand how the centrality of the airports is modified by the use of different window sizes,
Figure 7 reports the evolution of the airport ranking according to three centrality measures: in-degree, out-degree and betweenness centrality. Specifically, in each case, we consider the five airports that have the largest centrality when reconstructing the propagation network with the optimal window size and calculate the position in the ranking of these airports when considering a fixed window size. A simple look at
Figure 7 shows that the window size has a profound effect on the centrality of airports, with these top five airports ending, in many cases, in the bottom half of the ranking. The sensitivity of the ranking to the window size is also noticeable, especially in the case of the betweenness. It is worth noting that this centrality metric strongly depends on the macro-scale structure of the network and that this result is therefore aligned with what is observed in
Figure 6.
5. Resampling Validation
As a last point, we analyse another aspect that can influence the optimal window size for estimating the Granger causality. Several studies indicated that the sensitivity and stability of the test can be improved with higher temporal sampling rates [
24,
25]; this nevertheless can be due to two factors: the presence of new high-frequency information which is lost for low sampling rates or alternatively, the simple fact that longer time series (i.e., more data) make tests more statistically significant, even if no new information is included. Here, we approach this problem by leveraging the test suggested in Ref. [
25] involving artificially resampling the analysed time series and comparing the evolution of the resulting
p-values.
We start by considering the two original time series for a given pair of airports, reconstructed using the corresponding optimal window length. We then consider a resampling rate , and synthesise a new pair of time series by firstly downsampling the original data by a factor of and secondly upsampling them by the same factor. The result is a new set of time series whose length is preserved but in which high-frequency information is deleted. We finally compare the two p-values obtained with the original and resampled time series by calculating , where and represent the p-values obtained with, respectively, the resampled and original time series. Note that implies that the causality calculated with the resampled time series is stronger than that seen in the original case.
Figure 8 reports violin plots of the distribution of
as a function of the resampling rate. In all cases, most
are greater than zero—see also the aqua line, right
Y axis, representing the percentage of links for which
. In addition, the average of all distributions sits around five, thus indicating that the
p-values for the original time series are significantly smaller than those for the resampled time series. In synthesis, one can conclude that the length of the optimal window is a function of the information the time series contains, specifically of its high-frequency part, and not directly of the time series length.
6. Discussion and Conclusions
In this paper, we have studied the effect of the temporal resolution used in the analysis of delay propagation in air transport, both in a synthetic model and in real operational data. While this is a problem well-known in the literature [
24,
25,
26], air transport presents several idiosyncrasies, most notably the fact that events generating the time series (i.e., landings and their delays) are discrete. As shown in
Section 3, the window size that maximises the sensitivity of a Granger causality test is a balance between several elements; in the simplest and noiseless case, it is between the need for detecting high-frequency dynamics through small windows and the need for windows long enough to contain a significant number of landing operations. Additional elements affecting the optimal window length include the presence of inactivity periods for which no events are available and the shape of the probability distribution of landing separation times, such that more heterogeneously spaced events require longer windows.
When applying these ideas to real data, three main conclusions can be drawn. First of all, due to the high complexity of real operations, the estimation of the best window size for a given airport pair can only be obtained numerically by systematically checking different window sizes and choosing the one minimising the test
p-value. While these optima are related to some properties of the landing operations (see
Figure 2), as previously discussed, such relationships are too weak to support the creation of an analytical solution. Secondly, the same approach can be used to infer the time required by delays to propagate between airports. Once again, results are loosely related to some airport characteristics, such as their distance and size, but no clear pattern can be discerned (see
Figure 3). Thirdly, the use of a non-optimal window size has profound effects on the estimated structure of the delay propagation networks, with global features being more affected than micro-scale ones (see
Figure 6).
It is worth noting that research works have hitherto used a fixed window length of one hour [
15,
16,
17,
18,
19,
20,
21]. This prima facie made sense, as it is a natural way of dividing the day in equal parts; simplifies calculations, as windows of different days always start at the same time and is aligned with the way airport capacity is defined (i.e., operations per hour). We have nevertheless shown that the use of this fixed window substantially underestimates causality relationships and that the reconstructed network topology is thus misrepresented. This should be interpreted as a note of caution by the community, as previously published results may be partly incorrect. The present work is thus not only a theoretical exercise. Rather, it has direct implications on the analysis of real air transport delays.
Regarding the delay propagation times between pairs of airports reported in
Figure 3, it is relevant to compare the results obtained here with those of Ref. [
12]. Specifically, those authors reported a correlation between the distance between two airports and the corresponding propagation timescale, and the latter was also modified by the role of the airport in the network (e.g., hub vs. peripheral airport), albeit to a lesser degree. Similar trends can be observed in our results, albeit they never are statistically significant (see
Figure 3 and
Figure 5). Several causes can explain such a discrepancy. First of all, the detection of the delay propagation patterns and the estimation of their time scale is performed in Ref. [
12] by using lagged linear correlations as opposed to the Granger causality considered here. Significant differences ought to be expected, especially taking into account that the Granger test is designed to detect indirect propagation patterns, i.e., when delays propagate between two airports through an intervening one. Secondly, Ref. [
12] is based on the use of windows of a fixed size of 30 min; even though the analysis is based on a different technique to detect the delay propagation, the use of a fixed window length seems sub-optimal in light of what is presented here. Thirdly, the possibility that differences between the US and European systems, e.g., in the management of delays [
61,
62], could be the basis of such discrepancies cannot be ruled out.
Future works will have to be targeted at confirming and extending the results obtained here. Firstly, given the variability observed when changing the temporal scale of the time series reconstruction, it stands to reason to expect even more heterogeneous results when links are obtained with different metrics, e.g., correlation or non-linear causality ones. Specifically, non-linear causality metrics, such as non-linear versions of the Granger causality [
42,
63,
64] or the Transfer Entropy [
65], usually require longer time series to obtain reliable results. As a consequence, correctly estimating the optimal time scale may become even more critical. Secondly, similar analyses could be performed when considering alternative ways of reconstructing delay time series, e.g., by analysing each airline independently [
15], thus effectively moving from a single- to a multi-layer representation [
66], or by estimating the departure (as opposed to arrival) delays. Thirdly, it would be interesting to check whether similar results are also obtained in other large air transport systems, e.g., the US or Chinese ones. It would also simplify future studies to have an explicit a priori estimation of the optimal window length, for instance obtained through a machine learning model trained over a large set of airports and their operations.
As a final note of caution, it is worth recalling that functional networks are a powerful tool to describe delay propagation patterns in air transport, as customarily conducted in other scientific fields [
44,
45,
46,
47,
48,
49]. At the same time, this kind of analysis provides little information about the factors and reasons behind them. In other words, they allow to describe, but not to explain the dynamics of delay propagation. While it is in principle possible to combine functional network representations with other operational data, and hence understand if and how much the latter affect the former, this has yet to be accomplished and represents an open field of research.