1. Introduction
Seasonal climate forecasting systems are primary tools for predicting climatic conditions several months in advance and, owing to recent improvements in forecasting, they are gaining relevance in supporting decision-making processes across a wide range of sectors, such as energy, agriculture, water and risk management [1,2]. Several centers worldwide, such as the National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF), provide seasonal climate predictions using fully coupled ocean-atmosphere general circulation models (GCMs). However, the effective spatial resolutions of global models, on the order of 100–300 km, are too coarse to provide suitable information at the regional and local scales generally required by sectoral applications. This scale mismatch can result in systematic errors when model simulations are compared to observations, often preventing their direct use by end users [3,4].
In order to improve the local-scale representativeness of predictions and to provide tailored information supporting decision-making processes, both dynamical and statistical downscaling techniques, as well as bias-adjustment schemes, have been developed. Dynamical downscaling implies the use of regional climate models (RCMs), which run at finer spatial resolutions, generally on the order of 10–20 km, over a limited domain and are initialized and driven at the boundaries by GCM outputs [5]. Dynamical downscaling is computationally demanding, and the downscaled predictions can still be affected by biases and can thus require additional post-processing [6,7]. Statistical downscaling includes a large range of techniques of different complexities that are based on the relationship between large-scale climate predictors and local-scale observed predictands [8]. It is beneficial in a wide range of applications since it is less computationally demanding than dynamical methods and has been found to perform comparably in most cases [2,3]. However, the choice of predictors is crucial and can significantly affect the variability and accuracy of results; the use of large sets of predictors can increase result uncertainty and reduce interpretability [9]. Bias-adjustment methods are post-processing techniques that compare coarse model predictions with reference fields over a calibration period and derive corrections that match the statistical properties of model outputs to those of local climatological values [10]. They include corrections of the mean and more complex adjustments of the distribution. These methods can be used in combination with downscaling or applied directly to GCM outputs [11,12]. Bias adjustment was originally introduced to post-process climate model projections (see, e.g., [13]) and has recently been tested and applied in the context of seasonal forecasts [14,15,16]. It provides the advantage of being easily applicable and adaptable to different types of variables and temporal resolutions. However, most inter-comparison and evaluation studies to date have focused only on temperature or precipitation, while fewer works have discussed the bias adjustment of seasonal predictions for other climate variables, such as wind speed [17,18].
The integration of post-processing schemes that enhance predictions at relatively low computational cost could be beneficial for climate services, together with access to robust climate information for end users [19]. In recent years, international initiatives have been developed to simplify the retrieval and processing of climate information, including seasonal predictions. For instance, the Copernicus Climate Change Service (C3S) provides access through the Climate Data Store (CDS, https://cds.climate.copernicus.eu/#!/home, accessed on 26 October 2021) to a wide archive of global and European climate data, most of which are already targeted to the requirements of sectoral applications. However, efforts are still needed in the context of seasonal forecasts to provide users with technical tools and information on forecast performance and to tailor predictions to application domains. Bias removal remains an essential requirement for users when forecasts are included as input in impact models or when predicted values are used to assess critical threshold exceedances supporting decision-making in specific sectors. However, it is important to select suitable reference datasets for estimating the adjustment in order to avoid potential misrepresentation and errors, as discussed in detail in Ehret et al. (2012) and Maraun (2016) [20,21]. Bias-adjustment schemes require reference datasets of a proper temporal length (at least 10 years) to ensure a robust estimation of the correction factors. In addition, the quality of reference data, either observations or reanalyses, influences the calibrated results and must be carefully ensured. In some cases, post-processing may compromise the physical consistency of climate variables and lead to unrealistic values (e.g., relative humidity above 100% or minimum temperature greater than maximum temperature), and thus requires a final check of the outcomes. Finally, the main hypothesis on which bias-adjustment procedures are based is that the data can always be described by the same distribution and that the biases remain stable. For seasonal forecasts this assumption generally holds, whereas for climate projections more specific approaches are needed [22].
Several international projects have recently been undertaken with the aim of fostering the use of forecast information for the improvement of sectoral applications. In particular, the EU H2020 project SECLI-FIRM (The Added Value of Seasonal Climate Forecasts for Integrated Risk Management Decisions) focused on assessing the impact of improved climate forecasts on operational planning for specific sectors, such as renewable energy production (http://www.secli-firm.eu/, accessed on 26 October 2021). The SECLI-FIRM project was organized around nine case studies in which different applications of seasonal forecasts for energy were developed. The project stakeholders required the definition of a general approach for bias adjusting and downscaling seasonal forecast data. In this framework, we implemented a post-processing method based on the bias adjustment of seasonal forecasts and tested it over Europe. The scheme was chosen as a suitable compromise between calibration accuracy and the flexibility to be adapted and run in several end-user applications. The method was applied and validated on ECMWF's global forecasting system, SEAS5 [23], over Europe for 2-m air temperature, precipitation and 10-m wind speed, using ERA5 as reference. Since most applications require only monthly or longer aggregated predictions, the calibration was applied directly to monthly forecasts. The performance of the presented approach was compared with that of the ADAMONT statistical scheme, which was also included in the SECLI-FIRM framework. ADAMONT was developed by Météo-France and performs a daily bias adjustment of model data conditioned on atmospheric patterns [24]. The intercomparison was conducted to derive further insights into the features of the proposed monthly forecast calibration relative to a bias-adjustment approach applied at a finer temporal resolution and including additional assumptions, such as the potential influence of atmospheric drivers on the distribution of climate variables and model biases.
2. Materials and Methods
2.1. Data
ECMWF SEAS5 seasonal forecasts used for this study were retrieved from CDS C3S. We focused on monthly aggregated reforecasts (or hindcasts) of 2-m mean air temperature, total precipitation and 10-m wind speed over the period 1993–2016. The dataset covered the global surface on a 1° × 1° regular grid and each forecast was composed of 25 members (i.e., independent realizations of the forecast) and 6 lead times (i.e., the predictions provided for the month of initialization and the following 5 months).
The reference data used for the bias adjustment and for the skill assessment of forecasts were derived from the fifth generation of the ECMWF global reanalysis, ERA5 [25]. ERA5 spans the period from 1950 to present at hourly temporal resolution and is provided on a regular 0.25° × 0.25° grid. The monthly aggregates of ERA5 for the three variables were derived from the CDS C3S service for the same period spanned by the forecasts. The ERA5 global fields were cropped over an extended European area (26.5° N–72.5° N, 22° W–45.5° E), which was used as the target domain for the assessment of both raw and calibrated ECMWF SEAS5 seasonal predictions. In this study, the reanalysis was used as the best alternative to observations over the large European domain considered. This allowed a more reliable assessment of ECMWF SEAS5 forecasts and calibration methods, independent of the spatial and temporal heterogeneity of in-situ data availability. Moreover, it improved the replicability of the methodology over other regions where observations are scarce or unavailable.
2.2. The Bias Adjustment
The proposed post-processing scheme is a two-step procedure combining the spatial disaggregation of the 1° × 1° forecast fields to the target ERA5 grid and the empirical quantile mapping (QM). The method will be hereafter called B-QM.
The forecast spatial disaggregation was performed by means of a bilinear interpolation separately applied to each monthly prediction, ensemble member and lead time.
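This disaggregation step can be sketched in a few lines. The following is a minimal illustration on a toy regular grid (array names and sizes are hypothetical; a production version would also handle the spherical geometry and missing values):

```python
import numpy as np

def disaggregate_bilinear(field, src_lat, src_lon, tgt_lat, tgt_lon):
    """Bilinearly interpolate one coarse (e.g., 1 deg) field to a finer
    regular target grid (e.g., the 0.25 deg ERA5 grid)."""
    # fractional indices of the target points on the regular source grid
    fi = (tgt_lat - src_lat[0]) / (src_lat[1] - src_lat[0])
    fj = (tgt_lon - src_lon[0]) / (src_lon[1] - src_lon[0])
    i0 = np.clip(fi.astype(int), 0, src_lat.size - 2)
    j0 = np.clip(fj.astype(int), 0, src_lon.size - 2)
    wi = (fi - i0)[:, None]          # interpolation weights along latitude
    wj = (fj - j0)[None, :]          # interpolation weights along longitude
    f00 = field[np.ix_(i0, j0)]
    f10 = field[np.ix_(i0 + 1, j0)]
    f01 = field[np.ix_(i0, j0 + 1)]
    f11 = field[np.ix_(i0 + 1, j0 + 1)]
    return ((1 - wi) * (1 - wj) * f00 + wi * (1 - wj) * f10
            + (1 - wi) * wj * f01 + wi * wj * f11)

# toy example: a linear field is reproduced exactly by bilinear interpolation
src_lat = np.arange(40.0, 45.0)                  # 1 deg source grid
src_lon = np.arange(10.0, 15.0)
field = src_lat[:, None] + src_lon[None, :]
tgt_lat = np.arange(40.0, 44.01, 0.25)           # 0.25 deg target grid
tgt_lon = np.arange(10.0, 14.01, 0.25)
fine = disaggregate_bilinear(field, src_lat, src_lon, tgt_lat, tgt_lon)
```

In practice, this operation is repeated independently for each monthly prediction, ensemble member and lead time, as described above.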
The QM adopted for the bias adjustment is a widely applied method to post-process climate model simulations and reduce the mismatch between coarser model outputs and the spatial scales of interest [26]. QM adjusts the modeled values to the reference data by matching the cumulative distribution function (CDF) of the simulations at each target location. More specifically, modeled and reference distributions are matched by establishing a quantile-dependent correction function that translates simulated quantiles into their reference counterparts. This function is then used to translate the modeled time series into bias-adjusted values with a distribution representative of the reference data, in this case ERA5. QM was applied separately for each month and lead time. The transfer functions were obtained for each 0.25° grid cell from the entire forecast ensemble over the period 1993–2016, i.e., 25 realizations times the forecast instances for each month, and then applied to each individual member. In order to avoid overfitting due to the small sample size of monthly values included in the calibration, the quantile adjustment was computed on deciles instead of centiles and applied by linearly interpolating the empirical distribution. Negative values of precipitation and wind speed, if any, were set to zero before QM, and a wet-day correction equalizing the fraction of days with precipitation between the observed and the modelled data was applied.
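A decile-based empirical QM with linear interpolation between quantile pairs can be sketched as follows. This is a schematic illustration on synthetic data with hypothetical function names; the wet-day correction and the per-month, per-lead-time stratification are omitted for brevity:

```python
import numpy as np

def fit_decile_mapping(model, ref):
    """Empirical QM transfer function on deciles (plus the distribution
    endpoints), estimated over the full calibration sample."""
    probs = np.linspace(0.0, 1.0, 11)
    return np.quantile(model, probs), np.quantile(ref, probs)

def apply_mapping(values, model_q, ref_q):
    """Translate model values into the reference distribution by linear
    interpolation between decile pairs (np.interp is flat outside the range)."""
    return np.interp(values, model_q, ref_q)

# synthetic example: a forecast affinely biased against the reference
rng = np.random.default_rng(0)
ref = rng.gamma(2.0, 30.0, size=600)       # pseudo reference values
model = 0.7 * ref + 10.0                   # biased pseudo forecast
model_q, ref_q = fit_decile_mapping(model, ref)
adjusted = apply_mapping(model, model_q, ref_q)
```

Because the toy bias is affine and monotonic, the decile mapping recovers the reference values almost exactly; with real forecasts, only the distributions, not the individual value pairs, are matched.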
The QM adjustment was performed under a leave-one-year-out (LOYO) cross-validation scheme in order to avoid artificial skill in the result assessment, which can be particularly relevant for small sample sizes [16]. The implemented QM scheme was based on the R package qmap [27].
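The LOYO protocol can be illustrated as follows. This is a schematic Python sketch on synthetic yearly data, not the actual qmap-based implementation; all names are hypothetical:

```python
import numpy as np

def qm_decile(train_model, train_ref, target):
    """Decile-based empirical QM: fit on the training years, apply to target."""
    probs = np.linspace(0.0, 1.0, 11)
    return np.interp(target,
                     np.quantile(train_model, probs),
                     np.quantile(train_ref, probs))

years = np.arange(1993, 2017)                      # 24 hindcast years
rng = np.random.default_rng(1)
ref = 15.0 + rng.normal(0.0, 2.0, size=years.size)            # one value per year
model = ref[:, None] + 3.0 + rng.normal(0.0, 0.5, size=(years.size, 25))

adjusted = np.empty_like(model)
for i in range(years.size):
    train = np.arange(years.size) != i             # leave the target year out
    # fit on the pooled ensemble of the remaining years, adjust all 25 members
    adjusted[i] = qm_decile(model[train].ravel(), ref[train], model[i])
```

Each year is thus adjusted with a transfer function estimated without that year, so the verification scores are not inflated by in-sample fitting.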
2.3. The Skill Assessment
The skills of ECMWF SEAS5 seasonal predictions over Europe were assessed before and after the bias adjustment using ERA5 as reference. The evaluation was performed over the 1993–2016 period for seasonal aggregates for winter (December to February, DJF), spring (March to May, MAM), summer (June to August, JJA) and autumn (September to November, SON) for a one-month lead time, e.g., the forecasts for JJA were initialized in May. In order to allow a more direct comparison, unadjusted forecasts were verified on the spatially disaggregated fields at 0.25° resolution.
Temperature and precipitation forecasts were assessed over land areas only, while the evaluation of wind speed was extended to sea grid points due to the relevance of offshore wind, especially for the renewable energy sector. Moreover, the obtained skills over Europe for each variable were grouped into sub-regions based on the IPCC European sub-regional classification [28]. This evaluation was designed to better identify the areas most prone to low forecasting system performance and to highlight their seasonal dependencies.
To further assess the robustness of B-QM, its skills were compared to those of SEAS5 predictions corrected using the ADAMONT method [24]. ADAMONT was originally introduced to adjust climate model projections and later adapted to process seasonal forecasts. It performs the forecast adjustment on a daily basis by applying a QM conditioned on weather regimes. The weather regimes are based on the classification of daily large-scale recurrent states of the circulation over a wide box spanning the North Atlantic and Europe, grouping similar daily fields into four clusters. The classification was made by clustering the daily mean sea-level pressure (MSLP) anomalies (with respect to the 1981–2010 monthly climatology) of ERA5. The motivation for using weather regimes is the impact that different circulation regimes can have on the distribution of environmental variables at the surface, as well as on that of forecast biases. After the classification of daily fields, QM was calculated separately for the groups of days attributed to the same weather regime. The bias adjustment in ADAMONT was applied directly on the original forecast grid without any previous spatial interpolation to the target grid. ERA5 was used as reference, making the outcomes comparable with those of B-QM.
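The regime classification underlying this kind of conditioning can be illustrated with a minimal k-means clustering of daily anomaly maps. The sketch below uses synthetic data and a deliberately simple k-means written in NumPy; it is a schematic illustration only, not the Météo-France implementation:

```python
import numpy as np

def kmeans_regimes(anom, k, n_iter=20):
    """Minimal k-means: cluster daily MSLP anomaly maps (days x grid points)
    into k recurrent circulation patterns ("weather regimes")."""
    # deterministic initialization: k maps evenly spaced along the record
    centers = anom[np.linspace(0, anom.shape[0] - 1, k).astype(int)].astype(float)
    for _ in range(n_iter):
        dist = ((anom[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)               # nearest regime for each day
        for j in range(k):
            if np.any(labels == j):                # update non-empty clusters
                centers[j] = anom[labels == j].mean(axis=0)
    return labels, centers

# toy data: two synthetic "regimes" (anomaly patterns of opposite sign)
rng = np.random.default_rng(2)
pattern = np.concatenate([np.ones(50), -np.ones(50)])
days = np.vstack([s * pattern + rng.normal(0.0, 0.2, 100)
                  for s in np.repeat([1.0, -1.0], 60)])
labels, centers = kmeans_regimes(days, k=2)
```

The QM transfer functions would then be fitted separately on the days assigned to each regime label.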
The intercomparison was performed on temperature and precipitation aggregates for DJF and JJA for the one-month lead time over the 1993–2016 period for the European subdomain (35.5° N–59.5° N and 10.5° W–19.5° E) covered by ADAMONT. The analysis did not include wind speed, since the two sets of calibrated forecasts were not directly comparable. ADAMONT uses daily wind speed derived by averaging 6-h values, which are, in turn, the mean of 6-h u and v components, while B-QM uses monthly mean wind speed directly from both seasonal forecasts and ERA5. Such difference in the processing and retrieval of monthly wind speed prevented the equal comparison and validation of the two datasets.
2.4. The Verification Metrics
Deterministic and probabilistic metrics were used to measure both the performance of the forecast ensemble mean and the representativeness of events in the forecast distribution.
Mean error (ME) and mean absolute error (MAE) report the accuracy of the ensemble mean predictions, i.e., the deviation from the reference fields:

\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(\bar{F}_i - O_i\right), \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\bar{F}_i - O_i\right|,

where \bar{F}_i is the ensemble mean prediction for the temporal instance i, O_i is the corresponding ERA5 value and N is the total number of forecasted instances.
The Pearson correlation (CORR) assesses the strength of association between the interannual time series of the ensemble mean forecast and the reference:

\mathrm{CORR} = \frac{\sum_{i=1}^{N}\left(\bar{F}_i - \overline{F}\right)\left(O_i - \overline{O}\right)}{\sqrt{\sum_{i=1}^{N}\left(\bar{F}_i - \overline{F}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(O_i - \overline{O}\right)^2}},

where \overline{F} and \overline{O} denote the temporal means of the ensemble mean forecast and of the reference, respectively.
The spread-to-error ratio (SPR) is a measure of the forecast reliability and quantifies the ability of the ensemble forecast to represent the forecast error in a statistical sense:

\mathrm{SPR} = \frac{\sigma}{\mathrm{RMSE}}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\bar{F}_i - O_i\right)^2},

where \sigma is the intra-ensemble standard deviation and RMSE is the root mean squared error of the ensemble mean forecast.
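The deterministic and reliability metrics above can be computed in a few lines. The sketch below uses synthetic ensemble data and hypothetical names:

```python
import numpy as np

def deterministic_scores(ens, ref):
    """ME, MAE, CORR and spread-to-error ratio (SPR) for an ensemble forecast.
    ens: (n_times, n_members) forecasts, ref: (n_times,) reference series."""
    mean = ens.mean(axis=1)                         # ensemble mean per instance
    me = float(np.mean(mean - ref))                 # mean error
    mae = float(np.mean(np.abs(mean - ref)))        # mean absolute error
    corr = float(np.corrcoef(mean, ref)[0, 1])      # Pearson correlation
    spread = float(ens.std(axis=1, ddof=1).mean())  # intra-ensemble std (sigma)
    rmse = float(np.sqrt(np.mean((mean - ref) ** 2)))
    return me, mae, corr, spread / rmse

# toy data: an ensemble with a +0.5 systematic bias against the reference
rng = np.random.default_rng(3)
ref = rng.normal(10.0, 1.5, size=200)
ens = ref[:, None] + 0.5 + rng.normal(0.0, 1.0, size=(200, 30))
me, mae, corr, spr = deterministic_scores(ens, ref)
```

On this toy example the recovered ME is close to the imposed +0.5 bias, while the correlation remains high because the bias is purely systematic.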
The ranked probability score (RPS) assesses the ability of forecasts to predict the category the reference falls into. Both forecast and reference are separated into M categories, in this case tercile-based categories, and the squared difference between the CDFs of forecast and reference is calculated:

\mathrm{RPS} = \frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}\left(F_{i,m} - O_{i,m}\right)^2,

where F_{i,m} and O_{i,m} are the cumulative probabilities of the forecast and the reference up to category m for the temporal instance i.
The continuous ranked probability score (CRPS) is the continuous version of the RPS and accounts for the integrated squared difference between the cumulative distribution functions of prediction and reference for a continuous variable. For a deterministic forecast, CRPS reduces to the MAE.
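A common sample-based estimator of the CRPS for an m-member ensemble is E|X − y| − ½E|X − X′|. The sketch below (toy data, hypothetical names) also illustrates the equivalence with the MAE for a single-member, i.e., deterministic, forecast:

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Sample-based CRPS: mean over cases of E|X - y| - 0.5 * E|X - X'|.
    ens: (n_cases, n_members) ensemble, obs: (n_cases,) verification."""
    term1 = np.abs(ens - obs[:, None]).mean(axis=1)               # E|X - y|
    term2 = np.abs(ens[:, :, None] - ens[:, None, :]).mean(axis=(1, 2))
    return float(np.mean(term1 - 0.5 * term2))

# single-member "ensemble": the CRPS collapses to the mean absolute error
obs = np.array([1.0, 2.0, 3.0])
det = np.array([[1.5], [2.5], [2.0]])
crps_det = crps_ensemble(det, obs)
mae_det = float(np.mean(np.abs(det[:, 0] - obs)))
```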
Based on these scores, the corresponding skill scores (RPSS and CRPSS) of the calibrated forecast with respect to those of a reference forecast were derived. More specifically, the added value of the forecasting system was estimated with respect to a climatological forecast derived from the reanalysis:

\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_{\mathrm{clim}}}, \qquad \mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}}{\mathrm{CRPS}_{\mathrm{clim}}}.
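The tercile-based RPS and its skill score against climatology can be sketched as follows (synthetic data, hypothetical names):

```python
import numpy as np

def rps_terciles(ens, ref, clim):
    """Mean ranked probability score over tercile categories.
    ens: (n, m) ensemble, ref: (n,) verification, clim: sample defining terciles."""
    edges = np.quantile(clim, [1.0 / 3.0, 2.0 / 3.0])
    # cumulative probabilities of the three categories (last column is always 1)
    f_cdf = np.stack([(ens <= e).mean(axis=1) for e in edges]
                     + [np.ones(ens.shape[0])], axis=1)
    o_cdf = np.stack([(ref <= e).astype(float) for e in edges]
                     + [np.ones(ref.shape[0])], axis=1)
    return float(np.mean(((f_cdf - o_cdf) ** 2).sum(axis=1)))

rng = np.random.default_rng(4)
clim = rng.normal(0.0, 1.0, size=2000)                       # climatological sample
ref = rng.normal(0.0, 1.0, size=300)                         # verification series
sharp = ref[:, None] + rng.normal(0.0, 0.4, size=(300, 25))  # skillful ensemble
rps_fc = rps_terciles(sharp, ref, clim)
# reference forecast: an ensemble statistically independent of the verification
rps_cl = rps_terciles(rng.normal(0.0, 1.0, size=(300, 25)), ref, clim)
rpss = 1.0 - rps_fc / rps_cl                                 # skill vs. climatology
```

A positive RPSS indicates that the forecast places more probability in the verifying tercile than the climatological reference does.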
The significance of the derived RPSS and CRPSS was computed using the standard error of the skill score estimated by error propagation, and a 5% significance level was used to identify statistically significant skill improvements.
In addition, the relative operating characteristic skill score (ROCSS) was computed to verify the ability of tercile-based categorical forecasts to discriminate between alternative outcomes with respect to a climatological forecast.
All metrics are summarized in Table 1, together with the corresponding range of possible values and the main rules for their interpretation.
4. Discussion
The comparison between unadjusted ECMWF SEAS5 seasonal forecasts and ERA5 demonstrated the need to adjust the model bias in order to improve the local representativeness of predictions. Relevant discrepancies in raw forecasts with respect to the reference were observed throughout Europe for all variables and in all seasons. The largest underestimations occurred in spring and winter temperatures (up to 8 °C) and in summer precipitation (up to 500 mm), while wind speed was mostly overestimated, especially over the sea. The B-QM approach allowed an overall improvement in the agreement of seasonal forecasts with the ERA5 reference. The adjustment proved particularly effective at calibrating temperature and wind-speed predictions, while more limited improvements were observed for precipitation, especially over European mountain areas, where the residual MAE could still exceed 300 mm. Even though the scope of this work was to test the benefits and limitations of the standard QM approach, alternative versions of this procedure were considered. More specifically, parametric QM techniques using specific distribution functions, such as gamma, double gamma and generalized Pareto distributions, to calibrate precipitation data from climate models were proposed and tested in the framework of the VALUE initiative, and could represent suitable options for improving the post-processing of precipitation forecasts [8].
For all variables, no relevant added value in terms of skill gain was found for bias-independent scores with respect to the use of unadjusted forecasts. However, significant improvements in CRPSS were obtained for calibrated predictions, even though the skills remained, in most cases, comparable to those provided by a climatological forecast. Positive scores were mainly obtained for spring and summer temperatures, as well as for spring wind-speed predictions. These outcomes are in agreement with previous studies evaluating post-processing techniques for seasonal forecasts and confirm that bias-adjustment schemes do not significantly modify the skills of raw forecasts beyond the adjustment of systematic biases. Other statistical downscaling methods modelling the contribution of atmospheric predictors, such as Perfect Prognosis, were found to have a more relevant impact on forecast skills [3,16,30]. Nevertheless, the choice of the most suitable calibration technique is strictly dependent on the scope and type of application. The effective bias removal of B-QM, without worsening the skills of the original forecasts, still represents a meaningful achievement whenever predictions are integrated in end-user applications focusing on the mean properties of the forecast ensemble. Alternative downscaling procedures should be evaluated and adopted if end users need to focus on the probabilistic skills and tune them to the target scales of the analysis.
The overall agreement shown by the inter-comparison of B-QM and ADAMONT for seasonally aggregated predictions of temperature and precipitation suggests that the B-QM calibration, directly applying standard QM to monthly forecasts, represents a suitable alternative for deriving tailored data for applications requiring monthly or seasonal quantities. The small discrepancies observed in the residual bias distributions highlight the effects of the different calibration settings. In particular, the regional dependency of the ME spatial patterns of the ADAMONT fields could be partly due to the weather-regime-conditioned QM, which applies the same correction to all grid points assigned to the same cluster. Moreover, the application of QM to raw forecasts without any previous spatial interpolation to the target grid of the reanalysis can lead to spatial discontinuities in the resulting fields. However, the unique quantile adjustment based on the same weather regime applied by ADAMONT to all variables is expected to better preserve the consistency between forecasted parameters, which could be particularly relevant when they are used to feed impact models.
The aim of the present work was not to establish the best method for calibrating seasonal forecasts, but rather to verify whether the proposed scheme can provide reasonable outputs, to identify the most critical variables and regions for the adjustment performance, and to provide end users with alternatives for processing seasonal forecasts by choosing the approach that best suits their needs. ADAMONT is planned to be used by Météo France as an operational service to provide seasonal forecasts of daily variables, while B-QM was proposed in the framework of the SECLI-FIRM project as an alternative method for downscaling and bias adjusting seasonal forecasts of monthly quantities. In this study, B-QM was applied and evaluated on hindcasts only; however, the same procedure can be applied to calibrate operational seasonal forecasts without substantial changes to the methodology. Due to the very similar performances of B-QM and ADAMONT, there are no specific reasons to prefer one method to the other when seasonal forecasts are required as monthly or seasonal aggregations. End users and industrial players can benefit from both approaches when managing and extracting calibrated information to add value to their businesses.
The considered schemes were proven to be effective in providing medium spatial resolution data, but they require further testing for finer resolutions, e.g., at the kilometer scale, to better verify their suitability for tailoring meaningful predictions for local applications. The same downscaling approaches can also be applied by replacing reanalysis with observation data in order to improve the representativeness of bias-adjusted fields. However, this evaluation is strongly dependent on the availability of accurate reference datasets at fine spatial scales.
Moreover, the ability of emerging machine learning-based approaches, such as random forests and convolutional neural networks, to downscale seasonal forecasts needs to be further investigated in forthcoming studies, and B-QM could represent a benchmark in the evaluation of their outcomes. Such techniques could also provide effective tools to bridge the gaps among short-term, sub-seasonal and seasonal forecasts and further enhance their integration into innovative climate services.