Figure 1.
Illustrative example: Decision F1 score as a function of normalized aleatoric and epistemic confidence thresholds.
Figure 2.
Air quality monitoring stations in Trondheim, Norway.
Figure 3.
Air quality level over one year of one representative monitoring station in Trondheim, where the air pollutant is commonly at a Very Low level and rarely exceeds Low.
Figure 4.
Persistence forecast of air pollutant over one month in one representative monitoring station.
Figure 5.
PM-value regression using the XGBoost model over one month in one representative monitoring station.
Figure 6.
Feature importance of a trained XGBoost model, indicating how useful each input feature is when making a prediction.
Figure 7.
Predicting the threshold exceedance probability of the air pollutant level using an XGBoost model in one representative monitoring station.
Figure 8.
Air quality prediction interval using quantile regression of a Gradient Tree Boosting model.
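The interval in Figure 8 comes from fitting separate models to a low and a high quantile. The objective behind such quantile regression is the pinball loss; a minimal NumPy sketch (the function name and data here are illustrative, not from the paper):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss, the objective minimized by quantile
    regression. q is the target quantile in (0, 1)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Under-prediction is penalized more for high quantiles, over-prediction
# more for low ones, which pushes the fitted curves apart into a band
# such as a 5%-95% prediction interval.
y = np.array([10.0, 12.0, 15.0])
lo = pinball_loss(y, y - 2.0, q=0.05)   # predictions below the data
hi = pinball_loss(y, y - 2.0, q=0.95)   # same predictions, high quantile
```

For the same under-predicting forecast, the 0.95-quantile loss is much larger than the 0.05-quantile loss, which is exactly the asymmetry that separates the two fitted quantile curves.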
Figure 9.
Learning curve of training a BNN model to forecast PM-values. (Left:) negative log-likelihood loss; (Center:) KL loss estimated using MC sampling; (Right:) learning rate of exponential decay.
Figure 10.
Probabilistic forecasting of multivariate time-series of air quality using a BNN model in one representative monitoring station.
Figure 11.
Predicting threshold exceedance probability of air pollutant level using a BNN model.
Figure 12.
Predicting threshold exceedance probability by transforming PM-value regression into binary predictions.
Figure 13.
Probabilistic forecasting of multivariate time-series air quality using a standard neural network model with MC dropout.
Figure 14.
Predicting the threshold exceedance probability of air pollutants level using a standard neural network with MC dropout.
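The MC-dropout procedure behind Figures 13 and 14 keeps dropout active at prediction time and summarizes repeated stochastic forward passes into a predictive mean and spread. The toy one-layer network below is a sketch of that idea only; the weights and sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer network; weights are illustrative, not from the paper.
W = rng.normal(size=(8, 1))

def forward(x, drop_rate=0.5):
    """One stochastic forward pass: dropout stays ACTIVE at test time,
    with the usual inverted-dropout rescaling."""
    mask = rng.random(W.shape) > drop_rate
    return x @ (W * mask) / (1.0 - drop_rate)

def mc_dropout_predict(x, n_samples=200):
    """Approximate the predictive distribution by sampling dropout masks
    and aggregating the resulting forward passes."""
    samples = np.stack([forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

x = rng.normal(size=(4, 8))          # a small batch of feature vectors
mean, std = mc_dropout_predict(x)    # predictive mean and uncertainty
```

The spread of the sampled outputs is what the shaded uncertainty bands in the MC-dropout figures visualize.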
Figure 15.
Probabilistic forecasting of multivariate time-series air quality using a deep ensemble.
Figure 16.
Predicting threshold exceedance probability of air pollutants level using a deep ensemble.
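A deep ensemble such as in Figures 15 and 16 typically forms its predictive distribution by treating the members' Gaussian outputs as a uniform mixture. A minimal sketch of that combination rule (the member values are made up for illustration):

```python
import numpy as np

def combine_ensemble(means, variances):
    """Merge M Gaussian heads N(mu_m, sigma_m^2) into one predictive
    distribution, as in standard deep ensembles: the mixture variance
    adds the spread of the member means (epistemic uncertainty) to the
    average predicted noise (aleatoric uncertainty)."""
    mu = means.mean(axis=0)
    var = (variances + means**2).mean(axis=0) - mu**2
    return mu, var

# Two members that agree on the noise level but disagree on the mean:
means = np.array([[10.0], [14.0]])
variances = np.array([[1.0], [1.0]])
mu, var = combine_ensemble(means, variances)
# mu = 12.0; var = 5.0 (1.0 average noise + 4.0 from member disagreement)
```

Member disagreement directly widens the predictive interval, which is why ensembles tend to report larger uncertainty on inputs where the members were trained to different solutions.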
Figure 17.
Probabilistic forecasting of multivariate time-series air quality using an LSTM model with MC dropout.
Figure 18.
Predicting threshold exceedance probability of air pollutants level using an LSTM model with MC dropout.
Figure 19.
Probabilistic forecasting of multivariate time-series air quality using a GNN model with MC dropout.
Figure 20.
Predicting threshold exceedance probability of air pollutants level using a GNN model with MC dropout.
Figure 21.
Probabilistic PM-value regression using a SWAG model.
Figure 22.
Probabilistic threshold exceedance classification using a SWAG model.
Figure 23.
Comparison of uncertainty estimation in PM-value regression when training (top) without adversarial training versus (bottom) with adversarial training. Adversarial training leads to a smoother predictive distribution and thus a lower NLL (less overconfident predictions).
Figure 24.
Comparison of uncertainty estimation in threshold exceedance classification when training (top) without adversarial training versus (bottom) with adversarial training. Adversarial training leads to a smoother predictive distribution and thus a lower CE (less overconfident predictions).
Figure 25.
Comparison of empirical performance of the selected probabilistic models in the PM-value regression task. The comparison is according to five performance metrics (left to right): CRPS, NLL, RMSE, PICP, and MPIW. Blue highlights the best performance, while red highlights the worst performance. The arrows alongside the metrics indicate which direction is better for that specific metric.
Figure 26.
Comparison of empirical performance of the selected probabilistic models in the threshold exceedance classification task. The comparison is according to five performance metrics (left to right): Brier score, cross-entropy, F1 score, precision, and recall. Blue highlights the best performance, while red highlights the worst performance.
Figure 27.
Comparison of confidence reliability for the selected probabilistic models in the threshold exceedance task. (Left:) loss versus confidence. (Right:) count versus confidence. The selected models are rational, meaning their loss-vs-confidence curves are monotonically decreasing.
Figure 28.
Comparison of confidence reliability for the selected probabilistic models in the PM-value regression task. (Left:) loss versus confidence. (Right:) count versus confidence.
Figure 29.
Impact of adversarial training on predictive uncertainty in PM-value regression, using deep ensemble as an example. (Left:) loss versus confidence. (Right:) count versus confidence.
Figure 30.
Comparison of decision score in non-probabilistic and probabilistic models. (a) Decision score in a non-probabilistic model as a function of the class probability threshold (corresponding to aleatoric confidence). (b) Decision score in a probabilistic model as a function of both the class probability threshold and the model confidence threshold (corresponding to epistemic uncertainty).
Table 1.
European Common Air Quality Index.
| Index | PM10 (µg/m³) | PM2.5 (µg/m³) |
|---|---|---|
| Very low | 0–25 | 0–15 |
| Low | 25–50 | 15–30 |
| Medium | 50–90 | 30–55 |
| High | 90–180 | 55–110 |
| Very High | >180 | >110 |
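Assuming the two concentration columns are the hourly PM10 and PM2.5 thresholds of the standard CAQI grid, the table reads as a simple lookup where the overall index is driven by the worst sub-index. A sketch in Python (the function name is illustrative):

```python
def caqi_level(pm10=None, pm25=None):
    """Map hourly PM concentrations (ug/m3) to a CAQI level.
    Band upper limits follow the CAQI grid: (level, PM10 max, PM2.5 max)."""
    bands = [
        ("Very low", 25, 15),
        ("Low",      50, 30),
        ("Medium",   90, 55),
        ("High",    180, 110),
    ]

    def level(value, column):
        for name, *limits in bands:
            if value <= limits[column]:
                return name
        return "Very High"

    levels = []
    if pm10 is not None:
        levels.append(level(pm10, 0))
    if pm25 is not None:
        levels.append(level(pm25, 1))
    # The overall index is the worst (highest) of the pollutant sub-indices.
    order = ["Very low", "Low", "Medium", "High", "Very High"]
    return max(levels, key=order.index)
```

For example, a station reporting PM10 of 20 µg/m³ alone sits at "Very low", but the same PM10 together with PM2.5 of 40 µg/m³ is rated "Medium", because the PM2.5 sub-index dominates.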
Table 2.
Summary of performance results when forecasting the PM-value and threshold exceedance using a BNN model.
(Columns RMSE↓–NLL↓: PM-value regression; Brier↓–CE↓: threshold exceedance classification.)

| Station | Particulate | RMSE↓ | PICP↑ | MPIW↓ | CRPS↓ | NLL↓ | Brier↓ | Precision↑ | Recall↑ | F1↑ | CE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bakke kirke | PM2.5 | 4.81 | 0.99 | 17.62 | 0.51 | 1.29 | 0.04 | 1.00 | 0.44 | 0.61 | 0.13 |
| Bakke kirke | PM10 | 5.86 | 0.94 | 26.12 | 0.50 | 1.28 | 0.03 | 1.00 | 0.30 | 0.47 | 0.09 |
| E6-Tiller | PM2.5 | 3.77 | 0.92 | 13.25 | 0.54 | 1.39 | 0.02 | 0.00 | 0.00 | 0.00 | 0.08 |
| E6-Tiller | PM10 | 9.40 | 0.92 | 34.18 | 0.48 | 1.26 | 0.06 | 0.00 | 0.00 | 0.00 | 0.23 |
| Elgeseter | PM2.5 | 3.93 | 0.91 | 12.79 | 0.53 | 1.36 | 0.03 | 0.88 | 0.42 | 0.56 | 0.12 |
| Elgeseter | PM10 | 5.17 | 0.90 | 25.07 | 0.47 | 1.28 | 0.03 | 0.55 | 0.19 | 0.29 | 0.12 |
| Torvet | PM2.5 | 4.07 | 0.90 | 10.83 | 0.48 | 1.30 | 0.03 | 0.75 | 0.46 | 0.57 | 0.13 |
| Torvet | PM10 | 5.25 | 0.93 | 18.47 | 0.43 | 1.17 | 0.03 | 0.50 | 0.23 | 0.32 | 0.10 |
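The interval metrics PICP and MPIW reported throughout Tables 2–7 can be computed directly from the predicted interval bounds; a minimal sketch with toy numbers (not taken from the tables):

```python
import numpy as np

def picp_mpiw(y_true, lower, upper):
    """PICP: fraction of observations falling inside the predicted
    interval (higher is better). MPIW: mean width of that interval
    (lower is better, for a given coverage level)."""
    inside = (y_true >= lower) & (y_true <= upper)
    return inside.mean(), (upper - lower).mean()

y  = np.array([10.0, 20.0, 30.0, 40.0])   # observed PM values
lo = np.array([ 8.0, 18.0, 32.0, 35.0])   # predicted lower bounds
hi = np.array([12.0, 22.0, 36.0, 45.0])   # predicted upper bounds
picp, mpiw = picp_mpiw(y, lo, hi)          # picp = 0.75, mpiw = 5.5
```

The two metrics trade off against each other: arbitrarily wide intervals trivially achieve high PICP, so a good probabilistic forecaster keeps MPIW small at a fixed coverage.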
Table 3.
Summary of performance results when forecasting the PM-value and threshold exceedance using a standard neural network with MC dropout.
(Columns RMSE↓–NLL↓: PM-value regression; Brier↓–CE↓: threshold exceedance classification.)

| Station | Particulate | RMSE↓ | PICP↑ | MPIW↓ | CRPS↓ | NLL↓ | Brier↓ | Precision↑ | Recall↑ | F1↑ | CE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bakke kirke | PM2.5 | 5.34 | 0.69 | 9.40 | 0.60 | 2.30 | 0.04 | 0.65 | 0.51 | 0.57 | 0.31 |
| Bakke kirke | PM10 | 6.42 | 0.66 | 12.45 | 0.59 | 3.35 | 0.03 | 0.67 | 0.48 | 0.56 | 0.10 |
| E6-Tiller | PM2.5 | 3.75 | 0.72 | 7.26 | 0.60 | 2.24 | 0.01 | 0.00 | 0.00 | 0.00 | 0.24 |
| E6-Tiller | PM10 | 9.49 | 0.71 | 16.62 | 0.51 | 2.30 | 0.07 | 0.18 | 0.04 | 0.06 | 0.57 |
| Elgeseter | PM2.5 | 4.43 | 0.70 | 7.29 | 0.57 | 2.12 | 0.05 | 0.57 | 0.38 | 0.45 | 0.43 |
| Elgeseter | PM10 | 5.59 | 0.69 | 12.11 | 0.51 | 2.38 | 0.04 | 0.37 | 0.32 | 0.34 | 0.17 |
| Torvet | PM2.5 | 4.60 | 0.55 | 5.26 | 0.57 | 2.91 | 0.04 | 0.68 | 0.44 | 0.53 | 0.33 |
| Torvet | PM10 | 5.63 | 0.62 | 8.94 | 0.51 | 2.51 | 0.03 | 0.56 | 0.35 | 0.43 | 0.14 |
Table 4.
Summary of performance results when forecasting the PM-value and threshold exceedance using a deep ensemble.
(Columns RMSE↓–NLL↓: PM-value regression; Brier↓–CE↓: threshold exceedance classification.)

| Station | Particulate | RMSE↓ | PICP↑ | MPIW↓ | CRPS↓ | NLL↓ | Brier↓ | Precision↑ | Recall↑ | F1↑ | CE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bakke kirke | PM2.5 | 5.29 | 0.77 | 11.65 | 0.57 | 1.67 | 0.05 | 0.69 | 0.53 | 0.60 | 0.55 |
| Bakke kirke | PM10 | 6.21 | 0.70 | 14.00 | 0.57 | 2.46 | 0.03 | 0.60 | 0.36 | 0.45 | 0.26 |
| E6-Tiller | PM2.5 | 3.78 | 0.77 | 8.46 | 0.58 | 1.84 | 0.01 | 0.00 | 0.00 | 0.00 | 0.34 |
| E6-Tiller | PM10 | 9.44 | 0.72 | 16.07 | 0.50 | 2.14 | 0.07 | 0.31 | 0.08 | 0.12 | 1.16 |
| Elgeseter | PM2.5 | 4.46 | 0.71 | 7.99 | 0.58 | 2.00 | 0.05 | 0.68 | 0.28 | 0.40 | 0.66 |
| Elgeseter | PM10 | 5.53 | 0.69 | 12.47 | 0.52 | 2.48 | 0.04 | 0.45 | 0.32 | 0.38 | 0.35 |
| Torvet | PM2.5 | 4.45 | 0.57 | 5.13 | 0.56 | 2.66 | 0.04 | 0.73 | 0.31 | 0.43 | 0.55 |
| Torvet | PM10 | 5.39 | 0.64 | 8.68 | 0.49 | 2.19 | 0.03 | 0.62 | 0.19 | 0.29 | 0.30 |
Table 5.
Summary of performance results when forecasting PM-value or threshold exceedance using an LSTM model with MC dropout.
(Columns RMSE↓–NLL↓: PM-value regression; Brier↓–CE↓: threshold exceedance classification.)

| Station | Particulate | RMSE↓ | PICP↑ | MPIW↓ | CRPS↓ | NLL↓ | Brier↓ | Precision↑ | Recall↑ | F1↑ | CE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bakke kirke | PM2.5 | 5.01 | 0.88 | 14.01 | 0.53 | 1.47 | 0.05 | 0.66 | 0.53 | 0.58 | 0.25 |
| Bakke kirke | PM10 | 6.25 | 0.82 | 19.28 | 0.54 | 1.78 | 0.03 | 0.59 | 0.48 | 0.53 | 0.14 |
| E6-Tiller | PM2.5 | 3.90 | 0.72 | 7.45 | 0.62 | 2.31 | 0.02 | 0.00 | 0.00 | 0.00 | 0.11 |
| E6-Tiller | PM10 | 9.68 | 0.74 | 18.96 | 0.53 | 2.03 | 0.08 | 0.24 | 0.12 | 0.16 | 0.43 |
| Elgeseter | PM2.5 | 4.32 | 0.72 | 8.91 | 0.59 | 2.10 | 0.05 | 0.58 | 0.40 | 0.47 | 0.28 |
| Elgeseter | PM10 | 5.98 | 0.73 | 15.14 | 0.55 | 2.64 | 0.05 | 0.30 | 0.29 | 0.30 | 0.24 |
| Torvet | PM2.5 | 4.19 | 0.56 | 6.88 | 0.58 | 4.79 | 0.05 | 0.58 | 0.42 | 0.49 | 0.30 |
| Torvet | PM10 | 5.81 | 0.61 | 11.33 | 0.54 | 4.03 | 0.03 | 0.43 | 0.35 | 0.38 | 0.10 |
Table 6.
Summary of performance results when forecasting the PM-value or threshold exceedance using a GNN model with MC dropout.
(Columns RMSE↓–NLL↓: PM-value regression; Brier↓–CE↓: threshold exceedance classification.)

| Station | Particulate | RMSE↓ | PICP↑ | MPIW↓ | CRPS↓ | NLL↓ | Brier↓ | Precision↑ | Recall↑ | F1↑ | CE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bakke kirke | PM2.5 | 4.70 | 0.88 | 12.33 | 0.52 | 1.41 | 0.05 | 0.61 | 0.53 | 0.56 | 0.21 |
| Bakke kirke | PM10 | 6.26 | 0.79 | 16.10 | 0.54 | 1.83 | 0.03 | 0.43 | 0.36 | 0.39 | 0.11 |
| E6-Tiller | PM2.5 | 3.80 | 0.83 | 9.14 | 0.57 | 1.60 | 0.02 | 0.00 | 0.00 | 0.00 | 0.11 |
| E6-Tiller | PM10 | 9.46 | 0.80 | 19.89 | 0.48 | 1.59 | 0.07 | 0.19 | 0.06 | 0.09 | 0.35 |
| Elgeseter | PM2.5 | 3.98 | 0.83 | 9.37 | 0.54 | 1.51 | 0.04 | 0.65 | 0.45 | 0.53 | 0.19 |
| Elgeseter | PM10 | 5.80 | 0.79 | 15.07 | 0.50 | 1.60 | 0.04 | 0.35 | 0.23 | 0.27 | 0.17 |
| Torvet | PM2.5 | 4.27 | 0.68 | 6.19 | 0.50 | 2.04 | 0.05 | 0.55 | 0.46 | 0.50 | 0.22 |
| Torvet | PM10 | 5.55 | 0.70 | 10.39 | 0.47 | 1.83 | 0.03 | 0.36 | 0.35 | 0.35 | 0.11 |
Table 7.
Summary of performance results when using a SWAG model with adversarial training.
(Columns RMSE↓–NLL↓: PM-value regression; Brier↓–CE↓: threshold exceedance classification.)

| Station | Particulate | RMSE↓ | PICP↑ | MPIW↓ | CRPS↓ | NLL↓ | Brier↓ | Precision↑ | Recall↑ | F1↑ | CE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bakke kirke | PM2.5 | 5.51 | 0.79 | 13.13 | 0.58 | 1.64 | 0.04 | 0.66 | 0.64 | 0.65 | 0.20 |
| Bakke kirke | PM10 | 6.66 | 0.78 | 17.95 | 0.57 | 2.03 | 0.04 | 0.49 | 0.61 | 0.54 | 0.12 |
| E6-Tiller | PM2.5 | 3.76 | 0.79 | 9.25 | 0.59 | 1.82 | 0.01 | 0.00 | 0.00 | 0.00 | 0.10 |
| E6-Tiller | PM10 | 9.35 | 0.82 | 21.28 | 0.49 | 1.73 | 0.08 | 0.19 | 0.08 | 0.11 | 0.49 |
| Elgeseter | PM2.5 | 4.53 | 0.73 | 9.33 | 0.59 | 1.97 | 0.04 | 0.60 | 0.45 | 0.52 | 0.21 |
| Elgeseter | PM10 | 5.76 | 0.76 | 16.61 | 0.53 | 1.96 | 0.04 | 0.37 | 0.45 | 0.41 | 0.18 |
| Torvet | PM2.5 | 4.58 | 0.79 | 10.33 | 0.54 | 1.63 | 0.04 | 0.67 | 0.50 | 0.57 | 0.20 |
| Torvet | PM10 | 5.62 | 0.71 | 12.48 | 0.50 | 1.76 | 0.03 | 0.50 | 0.42 | 0.46 | 0.13 |
Table 8.
Comparison of the previous works and the proposed models when quantifying uncertainty in data-driven forecast of air quality.
| Metric | QR [29] | GP [28] | ConvLSTM [26] | BNN | Deep Ensembles | GNN | SWAG |
|---|---|---|---|---|---|---|---|
| RMSE↓ | 6.17 | 6.45 | 6.46 | 5.17 | 5.59 | 5.80 | 5.76 |
| PICP↑ | 0.72 | 0.92 | 0.71 | 0.90 | 0.69 | 0.79 | 0.76 |
| MPIW↓ | 14.79 | 36.96 | 12.18 | 25.07 | 12.11 | 15.07 | 16.61 |
| CRPS↓ | NA | 0.53 | 0.56 | 0.47 | 0.51 | 0.50 | 0.53 |
| NLL↓ | NA | 1.36 | 2.48 | 1.28 | 2.38 | 1.60 | 1.96 |
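For a Gaussian predictive distribution, the CRPS reported above has a closed form; a stdlib-only sketch (illustrative only; the paper's exact evaluation code is not shown here):

```python
import math

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS for a Gaussian predictive distribution
    N(mu, sigma^2) evaluated at observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)       # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))              # Phi(z)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

Unlike NLL, CRPS stays bounded for badly mis-calibrated forecasts, which is one reason both are reported: a perfectly centered forecast scores about 0.234·sigma, and the score grows with both the error and the interval width.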