1. Introduction
The upcoming climate changes will influence ecosystems, all branches of the international economy, and the quality of life. Global Circulation Models (GCMs) have been the most widespread and successful tools for both numerical weather forecasting and climate research since the 1980s [1]. They provide methodologically well-grounded global information on large scales but show considerable deficiencies in accurately estimating meteorological states on finer scales [2]. Relatively coarse-resolution global models do not directly meet the growing demand for accurate and reliable information on regional and sub-regional scales. Thus, a variety of dynamical and statistical downscaling approaches have been proposed in recent decades in an attempt to bridge the gap between future climate projections under specific scenarios and regional responses [3]. Regional climate models (RCMs) greatly enhance the usability of GCM climate simulations for studying past, present, and future climate, its change, and its impacts on a regional scale. Following the methodology of dynamical downscaling [2], GCM outputs can be used as driving fields for nested RCMs running at higher resolution, which allows smaller-scale features of the climate to be captured. The regional climatology of Southeast Europe (SEE) is the subject of many efforts, including large collaborative projects such as ENSEMBLES [4], CECILIA [5], and Med- and EURO-CORDEX [6,7], as well as nationwide initiatives [8,9]. Although dynamical downscaling is a methodologically well-grounded approach, it also has limitations, in particular the huge computational cost of centennial-long simulations, even for relatively small domains [2].
SEE, as part of the Mediterranean region, lies in a transition zone between the arid climate of North Africa and the temperate and rainy climate of Central Europe. It is affected by interactions between mid-latitude and tropical processes. Because of these features, even relatively minor modifications of the regional circulation can lead to substantial changes in the SEE climate [5,10,11]. This makes the region potentially vulnerable to climatic changes. Additionally, the spatial and temporal distribution of surface temperature, and especially of precipitation, is modulated here by the complex orography and the relatively long and fragmented coastline.
In SEE, most climate model validations reveal considerable problems. For instance, both the coarser and the finer versions of the EURO-CORDEX ensemble tend to produce a warm and dry summer bias over this region [12].
All these regional specifics make reliable environmental records of climate data methodologically necessary for historical hindcasts, future regional projections, and other climate change-related studies. NEX-GDDP, a relatively new statistically downscaled climate dataset, offers seamlessly merged CMIP5 GCM hindcasts and projections at regional-to-local scales. Owing to its high spatial and temporal resolution, long-term temporal coverage, and convenient single-point access to the data, this source is an attractive option for the regional climatology of SEE. However, GCM-based parameter estimates must be validated against independent reference data to assess their uncertainties before being used, and this is the main motivation for the present work. The study of recent climate provides a context for future climate change and is important for determining climate sensitivity and the processes that control regional warming [1,10,11]. From a methodological point of view, this study is the necessary first step toward an overall impression of the general applicability of the NEX-GDDP dataset for simulating projected future climate changes over SEE.
This paper assesses the performance of the U.S. National Aeronautics and Space Administration (NASA) Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) multimodel (MM) ensemble in representing the basic spatiotemporal patterns of climatic conditions over SEE between 1950 and 2005. The study aims to evaluate the daily near-surface minimum and maximum air temperature as well as the precipitation sum (hereafter referred to as tn, tx, and rr) at different time scales. It is worth emphasizing, however, that investigating the model skill of single ensemble members is beyond the scope of this work.
An essential merit of the NEX-GDDP product, alongside the basic strengths described concisely in Section 2, is the relatively large number of GCMs included in the collection. This fact is somewhat overlooked in many NEX-GDDP studies, which consequently rely on single models or small subsets of models. In contrast, our primary intent was to use the full capacity of the product by building an ensemble from all member GCMs. Consequently, the present evaluation relies entirely on ensemble characteristics, including an estimate of the intermodel spread. The explicit consideration of the NEX-GDDP intermodel spread, together with the fact that this study is, to our knowledge, the first for the target domain, constitutes the novel and original contribution of this work.
Following this introduction, Section 2 describes the examined NEX-GDDP dataset, the reference data source, and the applied methods. Section 3 presents the construction of the NEX-GDDP MM ensemble, the performed calculations, and the results of the comparison with the reference dataset. Section 4 contains concise concluding remarks as well as an outlook for further work.
2. Data and Methods
The NEX-GDDP dataset consists of statistically downscaled climate scenarios derived from the output of twenty-one GCMs of the Coupled Model Intercomparison Project Phase 5 (CMIP5) under two of the four Representative Concentration Pathway (RCP) emission scenarios, RCP4.5 and RCP8.5 [13]. The global grid spacing is 0.25° × 0.25° (approximately 25 km × 25 km in the mid-latitudes), and the dataset covers the periods 1950–2005 (historical or retrospective run) and 2006–2100 (RCP4.5 and RCP8.5 runs). The historical run and each of the climate projections include daily data for tn, tx, and rr. The bias correction spatial disaggregation (BCSD) algorithm is used to produce the NEX-GDDP datasets [14]. BCSD is a two-step statistical downscaling procedure (bias correction followed by spatial disaggregation), based on in situ and remote sensing measurements [15], that addresses the limitations of global GCM outputs (coarse resolution and biases at regional/local scales) [16,17]. Since the NEX-GDDP product provides climate change information for past and future periods at relatively fine spatial and temporal scales, it can be utilized for climatological studies at regional/local scales [13,14]. Consequently, it has been used in many applications around the globe, such as the quantification of heatwave projections over Pakistan [18], China [19], and three mega-regions (the Eastern United States, Europe, and Eastern Asia) [20]; the assessment of the Indian summer monsoon [15,21,22]; and general investigations of near- and long-term climate projections over China [3] and Southeast Asia [23]. The general conclusion is that the NEX-GDDP product offers considerable improvements over CMIP5 GCM hindcasts and projections at regional-to-local scales, while leaving the global long-term increment unchanged [3].
As an observational reference for evaluating the simulated extreme temperatures and precipitation sum, we use version 23.1e of the daily gridded dataset E-OBS [24] at the same grid spacing as the NEX-GDDP product. E-OBS covers the entire European land surface and is based on the European Climate Assessment and Dataset (ECA&D) station records plus more than 2000 further stations from different archives. It was developed primarily for regional climate model evaluation, but it is also used for various other applications, including the monitoring of climate extremes [25]. E-OBS is the de facto standard reference for evaluating the RCMs participating in EURO-CORDEX [12] and is frequently used for the same purpose in other validation studies [8,9]. Alongside an improved quantification of uncertainty, the newer ensemble versions of E-OBS generally represent temperature extremes better than their predecessors. Recently, comparing E-OBSv19.0e with regional datasets, Bandhauer et al. [26] found that the former reproduces the pattern of the daily precipitation climate well in all three considered regions (the Alps, the Carpathians, and Fennoscandia).
The present study concerns only the MM ensemble statistics rather than the simulation output of the individual models. This type of analysis weights all models equally. Although equal weighting does not incorporate known differences among models in their ability to reproduce various climatic conditions, the MM mean (MMM), which can be regarded as an estimate of the forced climate response, performs better in most cases than individual simulations [27,28,29]. According to [30], the MMM has increased skill, consistency, and reliability. Alongside the MMM, the present study also considers the MM median (MX50). In contrast to the MMM, the MX50 is statistically robust and thus much less sensitive to outliers in the ensemble. Consequently, it is used as the primary MM ensemble statistic in many studies [31,32].
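The difference in outlier sensitivity between the MMM and the MX50 can be illustrated with a minimal Python sketch. The member values below are hypothetical and serve only to show how a single outlying GCM shifts the mean while barely moving the median; the function name is likewise illustrative.

```python
from statistics import mean, median


def ensemble_mean_and_median(member_values):
    """Return (MMM, MX50) of the ensemble members' values at one grid cell and time step."""
    return mean(member_values), median(member_values)


# Hypothetical daily tx values (°C) from six ensemble members; the last member is an outlier.
members = [14.1, 14.3, 14.0, 14.2, 14.4, 19.0]
mmm, mx50 = ensemble_mean_and_median(members)
print(f"MMM = {mmm:.2f} °C, MX50 = {mx50:.2f} °C")
```

Here the single outlier raises the MMM to 15.00 °C, about 0.8 °C above the mean of the other five members (14.2 °C), whereas the MX50 (14.25 °C) stays with the bulk of the ensemble.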
In the present study, we focus on the extreme temperatures tn and tx and the precipitation sum rr, as these are the three climatic elements most often used to characterize regional climate conditions [9,12,29]. The general concept for evaluating model skill is to compare the models' twentieth-century hindcast simulations with the reference dataset(s) [29]. This approach is also followed in the present study, and the NEX-GDDP MM ensemble is evaluated in four distinct aspects:
Assessment of the performance of the MM ensemble using different metrics on a monthly basis;
Comparison of the multiyear seasonal and annual values of tn, tx, and rr with the reference;
Estimation of the inter-annual variability and trends;
Validation of the ability of the MM ensemble to reproduce climate extremes described by key ETCCDI climate indices.
Assessing the performance of the MM ensemble with different metrics is the most traditional way of evaluating RCM skill and is, consequently, widely used in many studies [9,12,29]. Different metrics can be used to assess a model's ability to simulate past and present climatic conditions; we apply the well-known mean bias (BIAS), root mean square error (RMSE), and correlation coefficient (corr), computed on a monthly basis both over the grid space and over time.
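As an illustration only, the three metrics and the two ways of computing them (across grid cells for each month, and along the monthly time series for each grid cell) can be sketched in plain Python. This is not the authors' processing chain, which relies on CDO; the function names are hypothetical, all grid cells are weighted equally, and fields are assumed to be lists of monthly values per cell.

```python
import math


def bias(model, ref):
    """Mean bias: average of (model - reference)."""
    return sum(m - r for m, r in zip(model, ref)) / len(ref)


def rmse(model, ref):
    """Root mean square error between model and reference."""
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(model, ref)) / len(ref))


def corr(model, ref):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(ref)
    mean_m, mean_r = sum(model) / n, sum(ref) / n
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(model, ref))
    ss_m = sum((m - mean_m) ** 2 for m in model)
    ss_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(ss_m * ss_r)


def metrics(model, ref):
    return bias(model, ref), rmse(model, ref), corr(model, ref)


def metrics_over_space(model_fields, ref_fields):
    """One (BIAS, RMSE, corr) triple per month, computed across all grid cells.

    model_fields[t][k] is the monthly value of grid cell k in month t.
    """
    return [metrics(m, r) for m, r in zip(model_fields, ref_fields)]


def metrics_over_time(model_fields, ref_fields):
    """One (BIAS, RMSE, corr) triple per grid cell, computed along the monthly series."""
    triples = []
    for k in range(len(ref_fields[0])):
        model_series = [field[k] for field in model_fields]
        ref_series = [field[k] for field in ref_fields]
        triples.append(metrics(model_series, ref_series))
    return triples


def average_triples(triples):
    """Average a list of (BIAS, RMSE, corr) triples component-wise."""
    n = len(triples)
    return tuple(sum(t[i] for t in triples) / n for i in range(3))


# Hypothetical 3 months x 4 grid cells; the "model" is the reference shifted by +0.5.
ref_fields = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 7.0]]
model_fields = [[v + 0.5 for v in row] for row in ref_fields]
print(average_triples(metrics_over_space(model_fields, ref_fields)))  # approximately (0.5, 0.5, 1.0)
print(average_triples(metrics_over_time(model_fields, ref_fields)))   # approximately (0.5, 0.5, 1.0)
```

Averaging the per-month triples gives time means of metrics computed over the grid space, and averaging the per-cell triples gives spatial means of metrics computed over time. A constant shift leaves corr at 1 while BIAS and RMSE both reveal it, which is why the metrics are used together.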
To reveal in more depth the capacity of the NEX-GDDP MM ensemble to reproduce long-term climatic conditions over SEE, further assessments are needed. For this purpose, the multiyear mean seasonal and annual tn and tx, as well as the multiyear mean seasonal and annual rr, are compared with the reference, similarly to [9,12].
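The construction of multiyear seasonal values can be sketched as follows, assuming monthly records given as (year, month, value) tuples. The grouping conventions here are assumptions of this sketch rather than details stated in the text: December is assigned to the DJF season of the following year, incomplete seasons at the start and end of the record are dropped, temperatures are averaged over the season, and precipitation is summed.

```python
from collections import defaultdict

SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def multiyear_seasonal(monthly, how):
    """Multiyear mean of seasonal values from (year, month, value) records.

    how="mean": seasonal value is the mean of the monthly values (tn, tx);
    how="sum":  seasonal value is the sum of the monthly values (rr).
    """
    if how not in ("mean", "sum"):
        raise ValueError("how must be 'mean' or 'sum'")
    groups = defaultdict(list)
    for year, month, value in monthly:
        season_year = year + 1 if month == 12 else year  # December opens next year's DJF
        groups[(SEASONS[month], season_year)].append(value)

    per_season = defaultdict(list)
    for (season, _), values in groups.items():
        if len(values) < 3:
            continue  # incomplete season, e.g. DJF at the start or end of the record
        agg = sum(values) if how == "sum" else sum(values) / len(values)
        per_season[season].append(agg)

    # Average the seasonal values over all available years.
    return {season: sum(v) / len(v) for season, v in per_season.items()}


# Hypothetical record: two years of monthly precipitation sums of 1 mm.
records = [(y, m, 1.0) for y in (2000, 2001) for m in range(1, 13)]
print(multiyear_seasonal(records, "sum"))  # every complete season sums to 3.0
```

Annual values follow the same pattern with a single group per calendar year.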
The importance of assessing the long-term variability of the analyzed variables is often emphasized [33], primarily in order to estimate the persistence of the detected inter-annual changes. In the present study, the trend magnitude is estimated with the Theil-Sen Estimator (TSE) [34,35], and its statistical significance is analyzed with the Mann–Kendall (MK) test [36,37]. Both procedures are robust and especially suitable for non-normally distributed data and data containing outliers [25]. They are recommended by the World Meteorological Organisation [38] and are frequently applied in the analysis of environmental time series [31].
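Both procedures can be sketched compactly in Python for an evenly spaced annual series. The sketch below is illustrative, not the implementation used in the study: the Mann–Kendall variance omits the correction for tied values, and the p-value relies on the normal approximation, which is reasonable for long records such as the 56 annual values of 1950–2005.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(y):
    """Theil-Sen estimator: median of all pairwise slopes of an evenly spaced series.

    With one value per year, the slope is in units per year.
    """
    return median((y[j] - y[i]) / (j - i) for i, j in combinations(range(len(y)), 2))


def mann_kendall(y):
    """Two-sided Mann-Kendall trend test (normal approximation, no tie correction).

    Returns the statistic S, the standardized statistic Z, and the p-value.
    """
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


# Hypothetical annual series with one outlier in the last year.
annual = [0.0, 1.0, 2.0, 3.0, 4.0, 100.0]
print(theil_sen_slope(annual))  # 1.0: the outlier does not inflate the slope
print(mann_kendall(annual))
```

The example shows the robustness mentioned above: an ordinary least-squares slope would be dominated by the last value, whereas the median of the pairwise slopes is not.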
Since the immediate damage to humans and their property is caused not so much by gradual changes in temperature or precipitation as by extreme climate events, it is essential to validate the ability of the MM ensemble to reproduce climate extremes [32]. There are various methods to characterize extreme events, and the analysis of climate indices (CIs) based on daily temperature and precipitation data is a frequently applied non-parametric approach. The CIs of the Expert Team on Climate Change Detection and Indices (ETCCDI) were constructed to sample a wide variety of climates and were agreed upon within a large collaborative project [33]. In the present study, five key temperature-based and five key precipitation-based ETCCDI CIs are used for the ensemble validation, as in [9,29,32].
The results of the evaluation in all four aspects are presented in the next section.
3. Calculations and Results
First, the NEX-GDDP data of all 21 GCMs (see [14] or [21] for a descriptive list of the GCMs) are downloaded from the NASA data portal (ftp://ftp.nccs.nasa.gov/NEX-GDDP, accessed on 10 November 2021) using the option for spatial subsetting. Next, the MM ensemble mean (MMM), the median (MX50), and the 25th and 75th percentiles (the lower and upper quartiles, hereafter X25 and X75) are computed for the historical period 1950–2005. All netCDF file manipulations are performed with the Climate Data Operators (CDO) [39], embedded in purpose-built Linux bash shell scripts written by the authors. Considering the full set of 21 GCMs differs considerably from the approach in [22], where only five models are used. The study [23], dedicated to the evaluation of the NEX-GDDP dataset over Southeast Asia, also strongly recommends using a larger ensemble of models, which could provide higher confidence in climate projections. In terms of the number of GCMs, the NEX-GDDP ensemble is also considerably larger than the GCM ensembles in our previous studies [10,11,31].
The intermodel spread is quantified with the interquartile range, i.e., the difference X75 − X25 of the ensemble. This measure is frequently used for the same purpose in many climatological studies based on multimodel ensembles [31,32].
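The quartiles and the interquartile spread at a grid cell can be sketched with Python's standard `statistics.quantiles`. The choice of linear interpolation between order statistics (`method="inclusive"`) is an assumption of this sketch; CDO and other tools may use a different percentile definition and therefore return slightly different values. The function names are hypothetical.

```python
from statistics import quantiles


def ensemble_quartiles(member_values):
    """Return (X25, X50, X75) of the ensemble members at one grid cell and time step.

    Uses linear interpolation between order statistics (method="inclusive").
    """
    x25, x50, x75 = quantiles(member_values, n=4, method="inclusive")
    return x25, x50, x75


def interquartile_spread(member_values):
    """Intermodel spread as the interquartile range X75 - X25."""
    x25, _, x75 = ensemble_quartiles(member_values)
    return x75 - x25


def spread_field(members_by_cell):
    """Apply the spread cell by cell; members_by_cell[k] holds the member values of cell k."""
    return [interquartile_spread(values) for values in members_by_cell]


# Hypothetical 21-member ensemble at one grid cell.
members = [float(v) for v in range(1, 22)]
print(ensemble_quartiles(members))    # (6.0, 11.0, 16.0)
print(interquartile_spread(members))  # 10.0
```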
In the next stage, the ensemble MMM, MX25, MX50, and MX75 are aggregated on a monthly basis and, using the previously calculated monthly means of tn and tx and monthly sums of rr from E-OBS, the BIAS, RMSE, and corr are computed over the grid space of the considered domain.
Additionally, the 95% confidence interval of the corr is computed and, for the sake of clarity, only the time series of the 7-month running means of the MX50 are presented in this case.
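The text does not state how the confidence interval or the running mean are constructed. A common choice, assumed in the minimal sketch below, is Fisher's z-transformation for the interval of a Pearson correlation and a centered 7-point window that keeps only full windows; both function names are hypothetical.

```python
import math


def corr_confidence_interval(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval of a Pearson correlation via Fisher's z.

    r: sample correlation; n: number of paired values (n > 3).
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def running_mean(series, window=7):
    """Running mean over full windows; value i belongs to the centre of series[i:i + window]."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


print(corr_confidence_interval(0.9, 100))             # roughly (0.855, 0.932)
print(running_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))  # [4.0, 5.0]
```

The interval is asymmetric around r because the transformation is non-linear; this matters for correlations close to 1, such as those obtained for the extreme temperatures.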
The overall first impression from Figures 1–3 is that the time evolution of the BIAS, RMSE, and corr is practically identical for the MX50 and the MMM. The quantitative aspects are discussed later in this section.
The spatial distribution of the considered metrics is also important. Figure 4 shows the spatial distribution of the biases of the X25, X50, and X75 of the considered parameters with respect to the reference.
First and foremost, the biases of the lower quartile (X25) are predominantly negative for all parameters and, conversely, those of the upper quartile (X75) are predominantly positive. The biases of the medians (X50s) generally have the smallest absolute values compared with those of X25 and X75. Together with the former two findings, this indicates that the reference as a whole lies within the interquartile interval, closest to the median. Such optimal behavior can be regarded as evidence of the ability of the MM ensemble to reproduce the historical climate.
We also present maps of the RMSE and corr computed over time (Figure 5) but, for the sake of brevity, for the MX50 only.
The maps in Figure 5 show no principal differences between tn and tx, either in magnitude or in spatial distribution. The statistical metrics of the extreme temperatures are relatively uniformly distributed across the domain, without clear spatial patterns. The overall picture for the monthly precipitation sum is rather different. Generally, the degree of disagreement between the NEX-GDDP ensemble and the reference, as expressed by the values of the considered measures, is larger. A common conclusion of many evaluation studies is that thermal conditions are generally simulated better than precipitation, both by GCMs [9,30] and by RCMs [8,12,29]. According to the BIAS, the NEX-GDDP ensemble slightly overestimates the monthly precipitation sum over most of the domain. Conversely, a significant underestimation is detected over Asia Minor. It should be kept in mind, however, that E-OBS itself also has considerable issues over this region. The spatial distributions of the RMSE and corr for precipitation are more heterogeneous than the corresponding fields for the extreme temperatures. Some local effects can also be identified, for example, the relatively high RMSE (over 30 mm) and relatively low corr (below 0.4) over the Dinaric Alps.
To conclude this part of the evaluation, we also provide the time means of the considered metrics calculated over the grid space (left section) and the spatial means of the metrics calculated over time. These values, listed in Table 1, give a consolidated overall impression of the magnitude of the disagreement between the NEX-GDDP ensemble XMM and X50 and the reference.
Generally, the correlation coefficients for the extreme temperatures are very high. The RMSE for these two variables is around 2 °C, and the BIAS is practically negligible. There are no significant differences either between the results for tn and tx or between those for XMM and X50. As expected, the statistical metrics for rr indicate a substantially larger disagreement than their counterparts for the extreme temperatures. The mean RMSE is close to 33 mm. Apart from the BIAS, there are again no significant differences between the two ensemble characteristics, XMM and X50.
Traditional methods for evaluating the models' ability to reproduce regional climatic conditions include the statistical analysis of seasonal and annual mean fields and of mean error fields relative to the reference measurements [9]. We present this validation for the mean tn, tx, and rr (Figures 6–8) but, for the sake of brevity, for the MX50 only. To ease the comparison of Figures 6 and 7, the same color scale is used.
Most importantly, the magnitude and spatial distribution of the reference extreme temperatures are very well reproduced by the model ensemble. Another important outcome is the absence of any principal difference in the overall picture between tn and tx. The bias for both temperatures is between −1.5 °C and 1.5 °C over most of the model domain, except for some isolated regions. Nor does the bias show significant inter-seasonal change. The validation of the multiyear mean seasonal and annual precipitation sum reveals, first of all, a spatially prevailing positive bias. It is most prominent in summer and is also present, though less pronounced, in the other seasons. There are also some hot spots of clearly expressed dry bias; the most noticeable, both in magnitude and spatial extent, is over most of Asia Minor. The latter, however, could be partially explained by the problematic representation of precipitation in E-OBS over this region, mentioned above.
Many recent studies agree that the NEX-GDDP ensemble describes the mean climate state very well. Regarding the extreme temperatures, [3] found that NEX-GDDP successfully reproduces the 1980–2005 averaged climatology of the minimum temperature in January, the maximum temperature in July, and the precipitation rate in July over China. The July precipitation sum is more consistent with the observations than that of the CMIP5 GCMs. In [21], it is noted that the NEX-GDDP data show a substantial improvement in precipitation biases over India compared with CORDEX and CMIP5. The key message of [22] is that the NEX-GDDP data simulate the Indian summer monsoon rainfall reasonably well compared with observations.
To examine the extent to which the NEX-GDDP ensemble captures the inter-annual variability and trends of tn, tx, and rr, area-weighted averages (AAs) are first calculated over the whole domain, applying the E-OBS land-sea mask, and then aggregated temporally on an annual basis. Next, the linear trend magnitude and the p-value of its statistical significance are computed.
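The domain averaging step can be sketched as follows, under two stated assumptions: the grid is a regular latitude-longitude grid, so that cell area is taken as proportional to cos(latitude), and the land-sea mask is given as a Boolean array. This is a hypothetical illustration, not necessarily what the authors' CDO workflow computes; the annual aggregation then uses the mean for tn and tx and the sum for rr.

```python
import math


def area_weighted_average(field, lats, land_mask):
    """Area-weighted average of a regular lat-lon field over land cells.

    field[i][j]: value at latitude index i and longitude index j (None = missing).
    lats[i]: latitude of row i in degrees; cell area is taken as proportional to cos(lat).
    land_mask[i][j]: True for cells kept by the land-sea mask.
    """
    num = den = 0.0
    for i, row in enumerate(field):
        weight = math.cos(math.radians(lats[i]))
        for j, value in enumerate(row):
            if land_mask[i][j] and value is not None:
                num += weight * value
                den += weight
    return num / den


def annual_aggregate(daily_by_year, how):
    """Aggregate daily domain averages per year: mean for tn/tx, sum for rr."""
    return {year: (sum(v) if how == "sum" else sum(v) / len(v))
            for year, v in sorted(daily_by_year.items())}


# Hypothetical 2 x 2 field at 0° and 60° N: the northern row counts half as much.
field = [[10.0, 10.0], [20.0, 20.0]]
mask = [[True, True], [True, True]]
print(area_weighted_average(field, [0.0, 60.0], mask))  # about 13.33 rather than 15.0
```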
Figure 9 shows the time series of the AAs of the annual mean tn and tx and of the annual rr from the reference data and from the NEX-GDDP ensemble MX50 and MMM, together with the corresponding trend lines.
Figure 9 demonstrates, first of all, that the modelled inter-annual variability is significantly lower than that estimated from the reference. The multiyear mean precipitation sum is larger than the E-OBS estimate, which is consistent with the results shown in Figure 8. The values of the linear trend magnitude and the p-value of its statistical significance are listed in Table 2, for the sake of brevity for the NEX-GDDP ensemble MX50 only.
Table 2 shows that the NEX-GDDP ensemble correctly reproduces the sign of the general tendency, but neither the trend magnitude nor, for the extreme temperatures, the degree of statistical significance. For all variables, the trend magnitude of the ensemble MX50 is approximately three times larger than that of the reference and, for the extreme temperatures, statistically significant at the 5% level.
Among previous studies, [21] states that the simulation of trends by the CMIP5 models over India is even poorer than that of the climatological means, both in magnitude and in distribution. The same study also finds that the NEX-GDDP ensemble underestimates the all-India AAs of summer mean temperature and yields a significantly larger trend magnitude.
An essential part of evaluating the performance of single models or ensembles is to investigate their ability to reproduce climate extremes, quantified with higher-order statistics, and the tails of the distribution functions of the considered variables [9,29]. Therefore, we selected some key ETCCDI climate indices to study the exceedance of given thresholds and the duration of specific phenomena. To evaluate the extreme thermal conditions, we use the threshold indices annual occurrence of frost days (FD) and annual occurrence of summer days (SU); the percentile-based indices occurrence of cold nights (TN10p) and occurrence of warm days (TX90p); and the duration index growing season length (GSL). The definitions of these indicators can be found in [33]; for the sake of brevity, we omit them here.
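The two threshold indices are simple annual counts, which can be sketched in Python using the standard ETCCDI thresholds (FD: tn < 0 °C; SU: tx > 25 °C). The daily data layout as (year, value) pairs and the function names are assumptions of this sketch; the percentile-based indices and GSL require additional bookkeeping (base-period percentiles, spell detection) and are not sketched here.

```python
from collections import defaultdict


def annual_count(daily, condition):
    """Count, per year, the days whose value satisfies condition.

    daily: iterable of (year, value) pairs with one entry per day.
    """
    counts = defaultdict(int)
    for year, value in daily:
        counts[year] += 1 if condition(value) else 0  # years with no qualifying day get 0
    return dict(counts)


def frost_days(daily_tn):
    """FD: annual count of days with tn < 0 °C."""
    return annual_count(daily_tn, lambda tn: tn < 0.0)


def summer_days(daily_tx):
    """SU: annual count of days with tx > 25 °C."""
    return annual_count(daily_tx, lambda tx: tx > 25.0)


def multiyear_mean(per_year):
    """Multiyear mean of an annual index."""
    return sum(per_year.values()) / len(per_year)


# Hypothetical daily tn values (°C) for two short "years".
tn = [(2000, -2.0), (2000, 0.0), (2000, 1.0), (2001, -0.5), (2001, 3.0)]
print(frost_days(tn))  # {2000: 1, 2001: 1}; 0.0 °C is not a frost day
```

The strict inequalities matter at the thresholds: a day with tn of exactly 0 °C is not a frost day, and a day with tx of exactly 25 °C is not a summer day.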
Figure 10 compares the multiyear means of these indices based on E-OBS with those based on the NEX-GDDP ensemble.
The analysis of Figure 10 indicates that the spatial structures of the temperature index values are, as a whole, reproduced satisfactorily. The ensemble generally overestimates FD and, to a lesser extent, underestimates SU as well as GSL. This result reflects the long-term behavior of the underlying variables, namely tn for FD and tx for SU. Thus, similarities with the multiyear winter bias of tn (Figure 6) can be found: colder conditions, and therefore overestimated FD, over most of the domain. Similarly, the hot spots of positive tx bias over eastern Ukraine and Turkey, as well as the negative bias over the northern Black Sea coast (Figure 7), coincide with regions of clearly noticeable SU bias of the same sign in Figure 10. Spatially, the biases of cold nights (TN10p) and warm days (TX90p) are the most consistent. At first sight, the negative bias of TN10p contradicts the positive bias of FD and, likewise, the positive bias of TX90p contradicts the generally negative bias of SU. It should be kept in mind, however, that the percentile thresholds used to calculate the ensemble-based indices are computed from the time series of the model ensemble rather than from the reference data, so these indices measure exceedances relative to the ensemble's own climatology.
Extreme precipitation is another strong indicator that links regional climate change to people's daily activities, agricultural plans, and disaster prevention [3]. To investigate the daily precipitation extremes, we consider the maximum 1-day precipitation amount (RX1day), the number of heavy precipitation days with at least 10 mm (R10mm), consecutive dry days (CDD), consecutive wet days (CWD), and very wet days (R95p) (see again [33]). RX1day is an absolute index (the maximum daily precipitation within the year), R10mm is a threshold index, and R95p is a percentile-based index. CDD and CWD are duration indices. All of them are frequently used in studies at spatial scales ranging from planetary to continental, regional, and local [11,29]. Analogously to Figure 10, Figure 11 compares the multiyear means of these indices based on E-OBS with those based on the NEX-GDDP ensemble.
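For one year of daily precipitation, these indices can be sketched in Python with the ETCCDI wet-day threshold of 1 mm. This is a simplified illustration with hypothetical function names: spells are counted only within the supplied series (implementations differ in how spells crossing year boundaries are treated), and the R95p threshold is assumed to be precomputed from the wet days of a base period rather than derived here.

```python
WET_DAY_MM = 1.0  # ETCCDI wet-day threshold


def rx1day(rr):
    """RX1day: maximum 1-day precipitation amount."""
    return max(rr)


def r10mm(rr):
    """R10mm: number of days with rr >= 10 mm."""
    return sum(1 for x in rr if x >= 10.0)


def longest_spell(rr, is_member):
    """Length of the longest run of consecutive days for which is_member(x) is True."""
    best = current = 0
    for x in rr:
        current = current + 1 if is_member(x) else 0
        best = max(best, current)
    return best


def cdd(rr):
    """CDD: maximum number of consecutive days with rr < 1 mm."""
    return longest_spell(rr, lambda x: x < WET_DAY_MM)


def cwd(rr):
    """CWD: maximum number of consecutive days with rr >= 1 mm."""
    return longest_spell(rr, lambda x: x >= WET_DAY_MM)


def r95p(rr, p95_threshold):
    """R95p: total precipitation on days with rr above a precomputed 95th-percentile threshold."""
    return sum(x for x in rr if x > p95_threshold)


# Hypothetical ten days of precipitation (mm).
rr = [0.0, 0.5, 12.0, 3.0, 1.0, 0.0, 0.0, 0.2, 25.0, 0.0]
print(rx1day(rr), r10mm(rr), cdd(rr), cwd(rr), r95p(rr, 20.0))  # 25.0 2 3 3 25.0
```

Because RX1day, R10mm, and R95p depend only on the upper tail of the daily distribution, an ensemble statistic that damps individual heavy-rain days lowers all three at once.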
The overall and most important outcome of the analysis of Figure 11 is the significant disagreement, both in magnitude and spatial distribution, between the simulation and the reference. The NEX-GDDP ensemble considerably underestimates R10mm, RX1day, and R95p everywhere, as well as CWD over most of the model domain. In contrast, CDD is substantially overestimated. Evidently, the NEX-GDDP ensemble cannot adequately reproduce the number of high-intensity events and the precipitation amounts during them; in other words, the simulation fails to describe the upper tail of the precipitation distribution. This is illustrated by the histograms of the AAs of the daily precipitation sum over the whole period 1950–2005 from E-OBS and from the NEX-GDDP ensemble, shown in Figure 12.
Figure 12 shows how different the distributions of the daily precipitation sum from E-OBS and from the NEX-GDDP ensemble are. Nearly 95% of the modelled daily precipitation sums fall within the narrow interval 1.5–3.0 mm, and cases with relatively heavy (i.e., >10 mm) precipitation, as also shown in Figure 11, are practically absent. The NEX-GDDP ensemble also shows a very small relative number of cases at the other tail of the distribution, i.e., days with 0.0–0.5 mm.
The latter could be attributed to the special measures taken in modern GCMs to prevent the so-called "drizzling effect".
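Relative frequencies such as those compared in the histograms can be sketched with a simple binning routine. The bin edges below are hypothetical and only echo the intervals mentioned in the text (0.0–0.5 mm, 1.5–3.0 mm, and more than 10 mm); the binning actually used for Figure 12 is not stated. Bins are left-closed, and values outside the outermost edges are ignored.

```python
import math
from bisect import bisect_right


def relative_histogram(values, edges):
    """Relative frequency of values in the left-closed bins [edges[k], edges[k+1]).

    Values outside [edges[0], edges[-1]) are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect_right(edges, v) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Hypothetical bins (mm): 0-0.5, 0.5-1.5, 1.5-3.0, 3.0-10.0, and 10 mm or more.
edges = [0.0, 0.5, 1.5, 3.0, 10.0, math.inf]
daily_rr = [0.2, 1.6, 2.0, 2.9, 12.0]  # hypothetical domain-averaged daily sums
print(relative_histogram(daily_rr, edges))  # [0.2, 0.0, 0.6, 0.0, 0.2]
```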
Studies evaluating the performance of the NEX-GDDP ensemble at daily scales document mixed, partially contradictory results. One of the key messages of [23] is that over South Asia, NEX-GDDP-simulated daily rainfall statistics are not close to observations. Similarly to our findings, [23] emphasizes that NEX-GDDP fails to capture the observed distributions of high-intensity rainfall in almost all considered cities and underestimates the rainfall magnitudes. NEX-GDDP also fails to capture the frequencies of dry days when compared with reference data. In [3], it is stated that NEX-GDDP significantly reduces the biases in the climatology of daily total precipitation across China, in terms of both spatial distribution and extremes. In [21], it is concluded that the occurrence of temperature and precipitation extremes is captured realistically in NEX-GDDP data for the reference period over India, and that the dataset therefore has great potential for studies related to climate extremes and climate change impact assessment.