6.1. Example I: Bathtub-Shaped Functions
In this first example, the least squares method (Equation (8)) was applied to estimate the parameters of the quadratic and competing risk bathtub models (Equations (1) and (2)) using 90% of each dataset shown in Figure 3. The remaining 10% of the data not used for model fitting were then predicted by the models, and the RMSE (Equation (13)), PRMSE (Equation (14)), $R^2$ (Equation (15)), and EC were computed, as shown in Table 3. In each case, the sample size was four years (48 months), with the exception of the 2020–2022 time series, which contained 34 months of data at the time the calculations were performed.
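To make the fitting procedure concrete, the following minimal Python sketch fits a quadratic bathtub curve to the first 43 months of a 48-month series and evaluates the fit and prediction errors. The functional form, synthetic data, and variable names are illustrative assumptions; the paper's exact Equations (1) and (13)–(15) may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical quadratic bathtub form; the paper's Equation (1) may differ.
def quadratic_bathtub(t, a, b, c):
    return a + b * t + c * t ** 2

t = np.arange(48.0)                               # 48 monthly intervals
rng = np.random.default_rng(0)
y = 100 - 1.5 * t + 0.03 * t ** 2 + rng.normal(0, 1, t.size)  # synthetic series

n_fit = 43                                        # initial 43 months used for fitting
params, _ = curve_fit(quadratic_bathtub, t[:n_fit], y[:n_fit])
y_hat = quadratic_bathtub(t, *params)

rmse = np.sqrt(np.mean((y[:n_fit] - y_hat[:n_fit]) ** 2))    # fit error
prmse = np.sqrt(np.mean((y[n_fit:] - y_hat[n_fit:]) ** 2))   # predictive error
r2 = 1 - np.sum((y[:n_fit] - y_hat[:n_fit]) ** 2) / np.sum(
    (y[:n_fit] - np.mean(y[:n_fit])) ** 2)
```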
Table 3 shows that the competing risk model yielded better results on four out of the seven datasets. Additionally, it was very close across almost all the measures when the quadratic model achieved the best performance. However, both models performed poorly on the 1980 and 2020–2022 datasets due to their respective W- and K-shaped curves, resulting in undesirably low and negative $R^2$ values for the quadratic model.
Figure 4 illustrates the U.S. recession data from 2001–2005, along with the fitted quadratic model and the 95% confidence interval (represented by the grey region around the quadratic model fit), calculated using Equation (16). At $t = 43$, a dashed vertical line indicates the use of the initial 43 months to fit the model and the final 5 months to check its predictive capabilities. Two out of the forty-eight observed datapoints fall outside the confidence interval, resulting in a conservative empirical coverage of 95.83%.
Figure 5 presents the 1990–1993 dataset, along with the fitted competing risk model and its confidence interval. Five out of the forty-eight observed datapoints are outside the confidence interval, achieving a slightly optimistic empirical coverage (EC) of 89.58%. A better model fit and more accurate predictions result in a lower sum of squared errors (SSE) and consequently a smaller variance of the SSE (Equation (17)). This leads to a narrower confidence interval (CI) (Equation (16)), indicating more precise predictions for that model. However, narrower confidence intervals tend to have lower empirical coverage, as more datapoints fall outside the confidence limits.
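A minimal sketch of the empirical coverage computation follows, assuming the per-point interval half-widths implied by Equation (16) are already available; the function and argument names are illustrative.

```python
import numpy as np

def empirical_coverage(y, y_hat, half_width):
    """Fraction of observed points falling inside the confidence band.

    half_width: per-point half-width of the interval; in the paper it
    follows from Equation (16), here it is assumed to be precomputed.
    """
    y, y_hat, half_width = map(np.asarray, (y, y_hat, half_width))
    return float(np.mean(np.abs(y - y_hat) <= half_width))

# With 2 of 48 points outside the band, EC = 46/48 ~ 95.83%, matching
# the coverage reported for the 2001-2005 quadratic fit in Figure 4.
```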
In addition to validating the ability of the bathtub-shaped functions to fit the resilience curves using statistical methods, the interval-based metrics described in Section 5 were also predicted. These metrics include the performance preserved (PP, Equation (19)), normalized performance preserved (ANPP, Equation (20)), performance lost (PL, Equation (21)), normalized performance lost (ANPL, Equation (22)), performance preserved from the minimum (PFM, Equation (23)), average performance preserved (APP, Equation (24)), average performance lost (APL, Equation (25)), and weighted average performance preserved before/after the minimum (WAPP, Equation (26)).
Figure 6 shows the relative error of the quadratic and competing risk resilience metrics computed using the 1990–1993 dataset, assuming a weight of $w = 0.5$, which assigns equal weight to the average performance preserved before and after the minimum. For example, the relative error for performance preserved is calculated as follows:
$$\mathrm{RE}_{PP} = \frac{\lvert PP - \widehat{PP} \rvert}{PP},$$
where $PP$ and $\widehat{PP}$ are the empirical performance preserved and the model prediction, respectively.
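As a sketch, the relative error of Equation (27) can be applied to any of the interval-based metrics with a one-line helper; the argument names are illustrative.

```python
def relative_error(empirical, predicted):
    # Equation (27): deviation of the model's prediction from the
    # empirical metric, normalized by the empirical value.
    return abs(empirical - predicted) / abs(empirical)

# e.g., relative_error(pp_empirical, pp_model) for performance preserved
```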
Figure 6 shows that the quadratic model better predicted four out of seven metrics, and the relative error remained small across all metrics for both models, with the exception of the average normalized performance lost (ANPL, Equation (22)), because the system improved its performance to a level higher than the initial performance. As a result, a negative performance loss is experienced in the predictive period, and the respective relative errors for the ANPL of the quadratic and competing risk models are substantially larger than the errors in the other metrics; they are therefore not shown in Figure 6 to enable an easier comparison of the other seven metrics.
6.2. Example II: Mixture Distributions
In this second example, the least squares method (Equation (8)) was applied to estimate the parameters of the mixture distribution models (Equation (3)) with 90% of the seven datasets shown in Figure 3. Four different mixtures of degradation and recovery distributions were adopted based on reliability engineering [60] to model disruptions and recoveries, including the Weibull (Wei) distribution and the exponential (Exp) distribution, which is obtained when the Weibull shape parameter equals one in Equation (28). For simplicity, the transition from degradation is considered to be a constant. Different recovery transition forms were considered, including an increasing function of time to illustrate the increasing trend characteristic of the economic data. The remaining 10% of the data not used for model fitting were then predicted using the models, and the RMSE (Equation (13)), PRMSE (Equation (14)), $R^2$ (Equation (15)), and EC were calculated, as presented in Table 4 for a recovery transition setting that worked well for all the recessions considered in Figure 3.
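The special case noted above, in which the Weibull distribution reduces to the exponential, can be verified numerically. The sketch below uses the standard two-parameter Weibull CDF; the paper's Equation (28) parameterization may differ.

```python
import numpy as np

def weibull_cdf(t, scale, shape):
    # Standard two-parameter Weibull CDF; with shape == 1 it reduces
    # to the exponential CDF with rate 1/scale.
    return 1.0 - np.exp(-(t / scale) ** shape)

t = np.linspace(0.0, 24.0, 25)
assert np.allclose(weibull_cdf(t, scale=6.0, shape=1.0),
                   1.0 - np.exp(-t / 6.0))   # Exp is Weibull with shape 1
```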
Table 4 shows that mixing exponential distributions (Exp–Exp) was not appropriate to fit any of the datasets. In contrast, at least one of the other degradation and recovery combinations resulted in a high $R^2$ for all the datasets, except for the 1980 and 2020–2022 datasets, due to their W-shaped and K-shaped curves, respectively. In a few cases, the bathtub-shaped curves performed slightly better than the mixture distribution models, perhaps because the mixture models include numerous parameters that increase model complexity without significantly improving the fit.
Figure 7 presents the (Wei–Exp) model fitted to the 1990–1993 data, and Figure 8 shows the (Exp–Wei) and (Wei–Wei) models fitted to the 1981–1983 data, since the (Exp–Wei) model performed better with respect to the RMSE and $R^2$, while the (Wei–Wei) model achieved a lower PRMSE and a correspondingly narrower confidence interval. In Figure 7, since one observed datapoint is outside the confidence interval, the empirical coverage is 97.91%. In Figure 8, all but one datapoint are within the (Wei–Wei) confidence interval (light grey), achieving 97.91% empirical coverage. However, three datapoints are outside the (Exp–Wei) confidence interval (dark grey), resulting in 93.75% empirical coverage for this model.
The relative error (Equation (27)) of the predictions for each of the interval-based metrics described in Section 5, from performance preserved (PP, Equation (19)) to weighted average performance preserved before/after the minimum (WAPP, Equation (26)) with $w = 0.5$, is reported in Figure 9 for the mixture distribution combinations fitted to the 1990–1993 data. The average normalized performance lost (ANPL, Equation (22)) exhibited much higher errors for the Exp–Exp, Wei–Exp, Exp–Wei, and Wei–Wei combinations due to the normalization step; these values are not reported in Figure 9 to maintain comparability among the smaller errors in the other seven metrics.
Figure 9 and the numerical values from Equation (22) indicate that the Wei–Exp model achieved the lowest relative error in four of the eight predicted metrics, while the Exp–Exp and Wei–Wei models each achieved the lowest in two metrics. Considering Table 4 and Figure 9 together, the combination of the Weibull distribution for deterioration and the exponential distribution for recovery most frequently predicted performance and metrics accurately for the datasets considered, although several other combinations also exhibited similar accuracy.
6.3. Example III: Covariates
In this third example, the covariate models based on multiple linear regression (Equation (5)), multiple linear regression with interaction (Equation (6)), and polynomial regression (Equation (7)) were applied to all of the datasets shown in Figure 3, and their goodness-of-fit and predictive accuracy were assessed. To this end, covariates were collected from January of the year in which a recession began and subsequently normalized by dividing all values in all intervals by the maximum value observed for that covariate across all of the intervals considered.
Forward stepwise selection [52] was performed to identify a subset of covariates that exhibited a strong relationship to the change in performance. The forward stepwise selection method begins with a model (MLR, MLR with interaction, or polynomial regression) containing no covariates. It then applies maximum likelihood estimation with 90% of a dataset according to Equation (12) to estimate the parameters of the model to predict the changes in performance, predicts the remaining 10% of the data not used for model fitting, and computes the RMSE (Equation (13)), PRMSE (Equation (14)), and $R^2$ (Equation (15)). The addition of each individual covariate is then tested one at a time using the $R^2$ criterion, and the covariate (if any) whose inclusion produces the greatest improvement in model fit according to the $R^2$ value is added. This process is repeated until none of the remaining covariates increase the proportion of the variation in the change in performance explained by the model, as sketched below.
To characterize the degradation and recovery curves of each recession, it was important to investigate, identify, and collect covariates that could be useful to forecast changes in economic systems [61,62] over time. Therefore, eight covariates were considered for each of the datasets, with the exception of the 2020–2022 U.S. recession, for which twenty-one covariates were considered. The eight covariates included in all of the datasets are shown in Table 5, while the 2020–2022 U.S. recession induced by the COVID-19 pandemic also considered another thirteen covariates related to the pandemic and to additional market factors for which data are more readily available in recent years; these are shown in Table 6.
For each dataset,
Table 7 reports the set of covariates selected for inclusion in the model according to the forward stepwise selection procedure, as well as the corresponding goodness-of-fit measures for each of the three types of regressions considered.
Table 7 indicates that at least one of the three alternative regression models achieved a very high $R^2$ on each dataset, with the exception of the 1980 data, where MLR achieved the highest $R^2$; this value was nevertheless much higher than those attained by the bathtub-shaped functions and mixture distributions reported in Table 3 and Table 4.
Table 7 also indicates that both the interaction and polynomial regressions outperformed the simpler MLR on all measures on six of the seven datasets, with fewer or the same number of covariates, and they achieved competitive values on the dataset where MLR obtained the smallest RMSE and highest $R^2$. These results indicate that either MLR with interaction or polynomial regression may also adequately characterize the 1980 dataset, despite these more complex models being penalized for including additional coefficients for interactions between covariates or higher-order terms. Thus, MLR with interaction achieved the best or very competitive values on all seven datasets.
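To clarify how the interaction and polynomial models differ in complexity, the sketch below builds the two design matrices. The feature constructions (pairwise products of covariates; per-covariate powers) are plausible readings of Equations (6) and (7), whose exact forms are defined in the paper.

```python
import numpy as np
from itertools import combinations

def interaction_design(X):
    # Main effects plus pairwise products (assumed form of Equation (6)).
    pairs = [X[:, i] * X[:, j]
             for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([np.ones(X.shape[0]), X] + pairs)

def polynomial_design(X, degree=2):
    # Powers of each covariate up to `degree` (assumed form of Equation (7)).
    powers = [X ** d for d in range(1, degree + 1)]
    return np.column_stack([np.ones(X.shape[0])] + powers)
```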
Since neither bathtub-shaped functions nor mixture distributions were able to characterize the 1980 and 2020–2022 U.S. recessions well, Figure 10 and Figure 11 show these datasets and the fitted covariate models, respectively, as well as their 95% confidence intervals computed using Equation (18). The causes of the 1980 U.S. recession included the Federal Reserve's contractionary monetary policy to combat double-digit inflation and the lingering effects of the energy crisis. Since MLR with interaction performed well on all datasets, exhibited the smallest PRMSE, and was a close second to the MLR with respect to the RMSE and $R^2$ measures given in Table 7 on the 1980 dataset, the results of this model are shown in Figure 10. Five of the eight covariates used to produce the resilience curve were included by the stepwise procedure in the following order: crude oil price, Standard & Poor's 500 stock market index, treasury yield curve, mortgage rate, and federal funds rate. The covariate model tracked the trends well, despite multiple shocks. Moreover, all but one of the predicted datapoints fell within the confidence intervals, demonstrating a high empirical coverage.
For the job losses caused by COVID-19 in 2020, the model fit shown in Figure 11 was computed using five of the twenty-one available covariates selected by the stepwise procedure: durable goods orders, new orders index, unemployment benefits claims, workplace closures, and consumer activity. The covariate model based on multiple linear regression with interaction achieved a very high $R^2$, as well as very low RMSE and PRMSE values, and it characterized the sharp drop in the data very well. A good fit was not possible using the bathtub-shaped functions or any of the mixture distribution models, since the curve was neither U- nor V-shaped, preventing good predictions for those models. Moreover, all but three datapoints were within the confidence interval, achieving an empirical coverage of 93.75% and substantially improved predictive abilities relative to the models without covariates.
For the sake of comparison with the results achieved using the bathtub-shaped functions and mixture distributions shown in Figure 6 and Figure 9, the three types of regression were fit to the 1990–1993 recession data, and the resilience metrics were computed by discretizing the integration in Section 5 into summations over the time intervals. Thus, Figure 12 shows the relative errors (Equation (27)) of the resilience metrics, except for the normalized average performance lost (ANPL, Equation (22)), whose errors for multiple linear regression, multiple linear regression with interaction, and polynomial regression were substantially larger and are omitted in order to preserve visual comparability among the smaller errors in the remaining metrics.
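A minimal sketch of this discretization follows: the integral underlying a metric such as performance preserved (Equation (19)) is replaced by a trapezoidal sum over the monthly intervals. The exact integrand is defined in Section 5, so the function below stands in for the general pattern rather than any one metric.

```python
import numpy as np

def discretized_integral(t, p):
    # Trapezoidal summation standing in for the continuous integral of
    # the performance curve p(t) over the observation window.
    t, p = np.asarray(t, float), np.asarray(p, float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))
```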
Figure 12 and the numerical values determined from Equation (22) indicate that the simple MLR and MLR with interaction were competitive, since each of these regression methods achieved the lowest relative error on four of the eight metrics predicted. Moreover, a comparison of Figure 6, Figure 9, and Figure 12 indicates that the covariate models improved the prediction of the resilience metrics substantially, since all three types of regression achieved low relative errors across all the metrics.
Normality assumption validation: As described in
Section 3.3, in order to make valid inferences from regression models, the normality assumption for a model's residuals should be verified using methods such as the inspection of a histogram, a QQ plot, and scatter plots of the residuals.
Figure 13 and
Figure 14 show these methods of validation for the 1980 and 2020–2022 model residuals, respectively.
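Such diagnostics can be produced with standard tools, as sketched below (histogram, QQ plot via scipy.stats.probplot, and residual scatter plots); the function and variable names are illustrative, not the authors' code.

```python
import matplotlib.pyplot as plt
from scipy import stats

def residual_diagnostics(residuals, fitted):
    """Histogram, QQ plot, and scatter plots used to check the normality,
    independence, and linearity assumptions visually."""
    fig, ax = plt.subplots(2, 2, figsize=(8, 6))
    ax[0, 0].hist(residuals, bins=15)
    ax[0, 0].set_title("Histogram of residuals")
    stats.probplot(residuals, dist="norm", plot=ax[0, 1])   # QQ plot
    ax[1, 0].scatter(range(len(residuals)), residuals)
    ax[1, 0].set_title("Residuals vs. time")                # independence
    ax[1, 1].scatter(fitted, residuals)
    ax[1, 1].set_title("Residuals vs. fitted values")       # linearity
    fig.tight_layout()
    return fig
```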
Figure 13 exhibits a right-skewed bell-shaped histogram, which may be a result of the second, more severe shock, but it does not violate the normality assumption. The line of the QQ plot is also straight, which suggests that the residuals are normally distributed. The scatter plots of the residuals against time and against the fitted values both exhibit constant variance, which supports the independence and linearity assumptions, respectively. The shocks occurring in the 1980 recession were caused by unusual events in the U.S. Therefore, two outliers were observed in each of the plots shown in Figure 13, accurately reflecting the potential surprises inherent to economic systems. Hence, there is no justifiable reason to remove them from the model. Keeping these outliers in the model enables conservative predictions in future intervals, where there is a possibility that extreme events may occur again.
Figure 14 exhibits a left-skewed bell-shaped histogram, which may be a result of the sharp degradation but does not violate the normality assumption. The QQ plot also follows an approximately straight line, suggesting that the residuals are normally distributed. However, the scatter plots of the residuals against time and against the fitted values exhibit non-constant variance, where the variance increases over time or as the fitted values increase, indicating that the independence and linearity assumptions cannot be verified. These observations do not necessarily violate the normality assumption [52], because these patterns may be a result of the small sample size, since the 2020–2022 U.S. recession data contain only 34 intervals. Since the histogram and QQ plot suggest that the residuals are normally distributed, and it is not possible to increase the sample size, the subtle patterns and possible violations observed in the scatter plots are not considered further. The outliers visible in the histogram and QQ plot shown in Figure 14 are legitimate observations that reflect the strong shock experienced during the 2020–2022 U.S. recession. Therefore, these outliers were not excluded due to their extremeness, because doing so could have distorted the results by removing valuable information about the variability inherent in economic systems.
Formal statistical tests [52] such as the Kolmogorov–Smirnov, Shapiro–Wilk, and Anderson–Darling tests can be performed to check whether the null hypothesis that a data sample is normally distributed is rejected. For all tests, the null hypothesis may be rejected at a 95% confidence level when the respective p-values are less than 0.05. Applying these three tests to the residuals of the 1980 model, the respective p-values were 0.00145068, 0.00000006, and 0.00000012, indicating that all the tests rejected the null hypothesis that the data were normally distributed, which disagrees with the visual results inferred from the plots shown in Figure 13. For the 2020–2022 model residuals, the respective p-values for the three tests were 0.251607, 0.051159, and 0.112513, which do not reject the null hypothesis, agreeing with the histogram and QQ plot shown in Figure 14. Formal statistical tests for normality may also possess low power on small sample sizes, and tests can be sensitive to any deviation from an idealized normal bell curve, such as the outliers observed in the data that suggest a lack of normality. As discussed previously, the outliers were not related to typos during data collection; therefore, their removal from the analysis cannot be justified. Hence, a visual assessment of normality is often more valuable than a formal test, especially for data possessing a small sample size.
6.4. Example IV: Comparison of Resilience Models
In this final example, the best bathtub, mixture, and covariate models for each of the datasets from Figure 3 are compared.
Table 8 summarizes the goodness-of-fit measures for the best bathtub, mixture, and covariate models on each dataset. The results indicate that the covariate models performed best across all seven datasets considered for all measures, with the exception of PRMSE on the 1974–1976 and 2001–2005 datasets, where the bathtub-shaped hazard functions and mixture distributions also achieved very good results because of the J-shaped curves exhibited. Moreover, MLR with interaction consistently performed best among the alternative regression models. The bathtub and mixture models are too rigid to model curves exhibiting more than one decrease and increase, and they often also lack the flexibility to fit curves exhibiting a single decrease and increase. In both cases, poor predictions may result, and these models should therefore be used with caution.
For each of the seven datasets, Table 8 shows that the best covariate model achieved a very high $R^2$, with the exception of the 1980 data, where the system experienced two shocks. Nevertheless, the MLR with interaction model still substantially improved the $R^2$ value over those of the competing risk and Weibull–Weibull models. Moreover, the bathtub-shaped hazard functions and mixture distributions did not fit the 2020–2022 data possessing a K-shaped curve well, but the MLR with interaction model again substantially improved the $R^2$ value over those of the competing risk and Weibull–Weibull models. Thus, we conclude that bathtub-shaped functions can predict curves possessing J, L, U, and V shapes, but not K and W shapes. On the other hand, covariate models are able to accurately capture the trends in any curve, as long as an appropriate set of covariates that describe the change in the performance of a system is available. It should also be noted that the RMSE, adjusted $R^2$, and empirical coverage explicitly penalize covariate models for their additional parameters, yet the covariate models consistently achieved higher $R^2$ values and produced more accurate empirical coverage values while simultaneously tracking and predicting performance and resilience metrics much more precisely than the simpler parametric models, which do not explain the underlying factors driving degradation and recovery.
Figure 15 and
Figure 16 compare the best models from each of the three classes of models for the 1990–1993 and 2001–2005 recessions, respectively, since each of the best models fit these two datasets well.
Figure 15 shows that although the competing risk bathtub-shaped hazard function and the Wei–Exp mixture distribution both fit this dataset well and achieve satisfactory goodness-of-fit measures and resilience metrics, their fitted curves are smooth and cannot capture the small variations in the change in performance. In contrast, the covariate model using MLR with interaction both tracks and predicts the observed trends more precisely, thanks to the inclusion of additional information about external processes that degrade and restore system performance. Similarly,
Figure 16 indicates that the covariate model based on MLR with interaction also tracks and predicts the 2001–2005 dataset better than the quadratic bathtub hazard function and Wei–Exp mixture distribution.
To further compare the three classes of the proposed models,
Figure 17 and
Figure 18 illustrate the best-fitting model for each class for the 1980 and 2020–2022 recessions. These figures highlight the shortcomings of the bathtub-shaped hazard functions and mixture distributions, which performed poorly in capturing the complexity of these recessions. This poor performance underscores the importance of the flexibility of the covariate modeling approach.
Figure 17 and Figure 18 reveal that bathtub-shaped functions and mixture distribution models are inadequate for modeling multiple degradation events, such as the W-shaped curve observed in the 1980 recession, or the sharp drops characteristic of the K-shaped curve seen in the 2020–2022 data. In contrast, the covariate model based on multiple linear regression (MLR) with interaction terms effectively tracks these curves, accurately capturing both small variations and rapid changes in performance. Including additional relevant covariates could further reduce the residuals and enhance the adjusted $R^2$ value, indicating even better model performance. These results suggest that when a system experiences multiple disruptive events or sudden performance declines, predictions can be substantially improved by consistently collecting information about external activities that impact system behavior. This approach not only enhances model accuracy but also provides a more comprehensive understanding of the factors influencing system resilience.
To compare the resilience metrics, the relative errors (Equation (27)) of the predictions for each of the interval-based metrics described in Section 5, computed based on the competing risk, Wei–Exp, and MLR with interaction models, are shown in Figure 19 for the 1990–1993 data, since all of these models fit this dataset well. Due to the normalization step, the relative errors of the normalized average performance lost (ANPL, Equation (22)) for the competing risk, Wei–Exp, and interaction models were substantially larger than those of the other metrics, and they are not shown in Figure 19 to preserve comparability among the smaller errors in the other seven metrics. The results suggest that the covariate model predicted the resilience metrics for the 1990–1993 data best.
In general, bathtub-shaped functions and mixture distribution models lack flexibility, imposing strong assumptions about trends that can result in poor model fits, low goodness-of-fit, and inaccurate predictions when the data do not conform to a prescribed shape. Thus, the resilience models based on multiple linear regression appear to be most suitable for a wide variety of trends. Hence, they also outperform traditional models when quantifying interval-based metrics of system resilience. Given these findings, it is recommended that policymakers adopt the covariate modeling approach for its superior flexibility, accuracy, and comprehensive assessment capabilities. Covariate models enable more informed decisions regarding the resilience of systems, helping to identify efficient resource allocations to enhance system resilience, which leads to better preparedness for multiple disruptive events and may reduce the costs associated with emergency responses. It is important to note that the covariate models presented in this study might be enhanced by (i) adding additional relevant covariates to the model to provide a more comprehensive understanding of the factors influencing resilience; (ii) detecting and addressing nonlinear collinearity between covariates to simplify interpretability and reduce model complexity; (iii) continuously updating the models with new data and insights to ensure that the models remain relevant and accurate in changing environments; (iv) integrating sophisticated statistical techniques to improve the predictive capabilities; and (v) engaging with experts from various fields to provide diverse perspectives and innovative solutions to improve resilience modeling.