1. Introduction
Rainfall erosivity concerns the ability of rainfall to precipitate soil loss [
1], as it supplies energy to the mechanical processes of soil erosion. Decertification has been identified as one of the most serious issues facing Mediterranean European countries, including Greece [
2], and a possible increase in future rainfall, due to climate change, will aggravate this process, as soil erosion increases at a greater rate [
3]. Unanticipatedly, a decrease in future rainfall and a possible decrease of biomass production may also lead to higher erosion rates [
4].
Higher erosion rates in conjunction with unsustainable land management and increasing human pressure can lead to soil degradation [
5], and consequently a disrupted ecological balance, a decreasing agricultural production and income [
6] and even the reduction of effectiveness of adaptation options [
7]. Several issues may arise due to accelerated soil losses on achieving of the Sustainable Development Goals of the United Nations [
8], as these goals are dependent on a healthy biophysical environment in which the soil is the base [
9]. In order to predict these soil erosion future changes it is necessary to simulate changes in future rainfall erosivity, land uses and the application of policies on land management [
10].
Universal Soil Loss Equation (USLE) [
11], which is the most widely used soil erosion prediction model in the world [
12], is an empirical equation that estimates the long-term, average, rate of soil loss involving the product of six factors. The USLE erosivity factor,
R, is calculated using high frequency or break-point precipitation data with a duration of over 20 years [
13,
14], as a function of rainfall intensity and depth. In the second revised version of USLE, RUSLE2 [
15], monthly Erosivity Density (
ED) was introduced, as a measure of rainfall erosivity per unit rainfall, which requires shorter precipitation record lengths.
ED is approximately a function of values only related to rainfall intensity and was used in RUSLE2 as an intermediate step, in conjunction with coarser, monthly, precipitation data, to compute R values in the USA.
Precipitation in Greece has been investigated in several studies over the past two decades [
16,
17,
18,
19,
20,
21,
22,
23]. In general, precipitation varies from its maximum values during winter to a minimum during summer. The highest precipitation values are observed on the mountain range of Pindos, and its expansion on Peloponnesus, and the lowest values of precipitation are recorded on the Cyclades Islands, at the center of the Aegean Sea. Due to the fact that most weather systems and prevailing winds are moving over the Ionian Sea perpendicularly to Pindos, a contrast exists between the wetter western parts of the country and the dryer eastern ones. During summer months convective activity over northern Greece produces higher precipitation amounts than over the drier southern parts.
Global Circulation Models (GCMs) are models that represent the atmosphere, land surface, ocean and sea ice and simulate their interactions in three dimensions, to make long-term predictions of climate [
24]. Different scenarios about future concentrations of greenhouse gases (Representative Concentration Pathways, RCPs) are employed to describe a set of different climate futures that drive GCMs [
25]. Due to the coarse grid scale of GCMs (over 80 to ~300 km), Regional Climate Models (RCMs) were developed to downscale them and provide information on a finer scale, more applicable to local scaled phenomena, impact studies and adaptation decisions. RCMs are dependent on GCMs, because GCMs provide the response of global circulation, greenhouse gases concentrations, etc. and RCMs refine them in a spatiotemporal sense, using features such as the topography, coastlines, land cover or mesoscale dynamics [
26]. The climatic models from COordinated Regional Climate Downscaling EXperiment over Europe (EURO-CORDEX) [
27], using high-resolution RCMs for the high greenhouse gases concentration scenario, RCP8.5, project a decrease of precipitation from 1971–2000 to 2071–2100 and for the medium concentration scenario, RCP4.5, project the same trend with a smaller magnitude [
27,
28]. Regarding rainfall intensity, in the form of heavy precipitation that exceeds the intensity at the 95th percentile of daily precipitation, the same models project diverging trends that are not statistically significant in most areas of Greece [
27,
28].
In our previous studies about
ED in Greece [
29,
30] it was proven that, in general,
ED values are robust to the presence of missing values in contrast to
R, which is highly affected, and specifically in Greece: (a) the values of
ED are not significantly correlated with the elevation, (b)
ED annual timeseries are found to be stationary, in contrast to reported precipitation trends for the same time period and (c)
ED can be considered as spatially autocorrelated, as three contiguous areas were identified using clustering analysis, that had distinct temporal patterns. A comparison of our studies with an earlier study about R in Greece by Panagos et al. [
31] revealed that the previously reported R values were underestimated due to the presence of a significant volume of missing data in the precipitation records used in the calculations.
In the Mediterranean region, the annual
R model MedREM was developed using annual precipitation depth, the longitude and annual daily maximum precipitation data [
32]. A recent paper regarding the estimation of future
R values in Europe for 2050 [
10], used one GCM and a single RCP, applied Gaussian Process Regression using monthly variables obtained from the WorldClim dataset [
33] and estimated an increase of
R by 14.8% in Greece. A number of papers in Europe examined the potential increase of rainfall erosivity using temporal trends of high resolution precipitation data in Western Germany [
34], Belgium [
35] and in the Czech Republic [
36]. Other studies in various parts of the world used GCMs in conjunction with empirical equations that predict R using annual precipitation [
37,
38], monthly [
39,
40] and daily rainfall indices [
41,
42]. A different approach estimated projected
R changes, using a weather generator with spatial and temporal downscaled precipitation values coming from various GCMs [
43].
Random Forests [
44] is a data-driven algorithm in the area of supervised learning which tries to fit a model using a set of paired input variables and their associated output response and can be used in classification and regression problems. Quantile Regression Forests (QRF) [
45], is an extension of RF that is able to compute prediction intervals of the output response in regression problems. RF has been used for spatial prediction in various domains [
46,
47,
48,
49,
50] and recently, Hengl et al. [
51] presented RF for a spatial predictions framework, that can make equally accurate predictions as kriging, without the need of statistical assumptions.
The aim of this work is to calculate the current and estimate the future changes of R values in Greece. The latest methodologies developed and presented with RUSLE2 are used, taking into account the presence of missing values in precipitation records. The first objective of the analysis is to create monthly precipitation and ED maps, as intermediate datasets, and to estimate the uncertainty of their predictions and their errors using cross-validation. Consequently, the current R values in Greece are computed from the interpolated surfaces and error propagation is used to estimate approximately their uncertainty. Finally, downscaled precipitation from GCMs-RCMs is validated and used, along with ED, to estimate the potential changes of R in Greece for the years 2040, 2070 and 2100 using two future greenhouse gases concentration trajectories/pathways (i.e., RCPs). This type of analysis is a novel approach and it has not been presented, until now, in the international literature.
2. Materials and Methods
The methodology that was applied in the study, is presented in the flowchart of
Figure 1, where data flows from the left to the right. Blue symbolizes the data that were used as input (pointwise and raster), orange the intermediate raster datasets that were created and green the final results, also in raster format, that were computed using different sets of input and intermediate datasets.
2.1. Data Acquisition and Processing
Point precipitation data from 237 meteorological stations across Greece (
Figure 2), was used in the analysis. The data consisted of:
Pluviograph data for 87 meteorological stations that were taken from the Greek National Bank of Hydrological and Meteorological Information (NBHM) [
52]. The time series comprised a total of 2273 years of 30-min-records and 394 years of five-min-records for the time period from 1953 to 1997, with a mean length of 30 years per station. The timeseries coverage was 62.8% on average.
Mean monthly precipitation data for 150 meteorological stations that were taken from the Hellenic National Meteorological Service (HNMS) [
53]. These are homogenized data and available for the time period from 1971 to 2000. These data were used to overcome the limitations of precipitation data from NBHM.
Five different monthly GCM-RCM raster datasets were downloaded for a number of experiments and time periods (
Table 1). These data were computed in the framework of the EURO-CORDEX [
27,
54], had a horizontal resolution 0.11° × 0.11° and were remapped using bilinear interpolation to a 30″ × 30″ resolution grid using the Climate Data Operator (CDO) software [
55].
The GCM-RCM monthly precipitation timeseries data were:
Historical, for the time period from 1971 to 2000 (like the ones coming from HNMS) as they were driven by the boundary conditions provided by the GCMs.
Future, for the forcing scenarios RCP4.5 (where greenhouse gases emissions peak around 2040 and then decline) and RCP8.5 (where greenhouse gases emissions continue to rise throughout the 21st century), using the control ensemble of the CORDEX climate projection experiments. The dataset period was from 2011 to 2100.
Digital elevation raster data were downloaded from the NASA Shuttle Radar Topography Mission (SRTM) [
56], and aggregated to the same 30″ × 30″ resolution grid as the one used in the GCM-RCMs datasets.
2.2. Monthly Erosivity Density Calculation
The erosivity of a single erosive rainfall event,
(MJ·mm/ha/h), given the product of the kinetic energy of rainfall and its maximum 30 min intensity, was computed using the pluviograph records from NBHM [
15]:
where
is the kinetic energy per unit of rainfall (MJ/ha/mm),
the rainfall depth (mm) for the time interval
of the hyetograph, which has been divided into
time sub-intervals and
is the maximum rainfall intensity for a 30 min duration during that rainfall.
The quantity
was calculated for each time sub-interval,
, applying the kinetic energy equation that was used in RUSLE2 [
57], which was recently evaluated in Italy, a nearby country to Greece, and had the best performance among alternative literature expressions [
58]:
where
is the rainfall intensity (mm/h).
An individual rainfall event was extracted from the continuous pluviograph data, if its cumulative depth for a duration of 6 h at a certain location was less than 1.27 mm. A rainfall event was considered to be erosive if it had a cumulative rainfall depth greater than 12.7 mm. Only the screened events with a return period of less than 50 years were used in the calculations.
On the grounds that the use of coarser fixed time intervals to a finer one can lead to an underestimation of the value of erosivity [
59,
60], monthly conversion factors
were computed using the five-min-time-step timeseries:
where
is the number of storms at month
m,
is the erosivity of a storm using the five min time step and
the erosivity of the same storm when the timeseries was aggregated using a 30 min time step. These conversion factors were applied to the values of erosivity that were estimated from 30-min pluviograph data.
After the computation of
values, the average monthly rainfall erosivity density
(MJ/ha/h) per station was calculated:
where
is the number of storms during the month
,
the erosivity of storm
,
the monthly precipitation height and
the number of years.
2.3. Spatial Quantile Regression Forests
Random Forests (RF) [
44] is one of the most successful methods used in Machine Learning [
61], among other reasons because of: (a) its robustness to outliers [
62] and overfitting [
63], (b) its ability to perform feature selection [
64] and (c) the fact that its default parameters, as implemented in software, give satisfactory results [
61,
65]. Examples of RF open source software are the R’s language packages randomForest [
66] and its faster alternative, ranger [
67].
In summary, RF consists of a number of decision trees [
68]. For each tree, a random set of the dataset is created via bootstrapping [
69] and in each node of the tree a random set of
n input variables from the
p variables of the dataset is considered to pick the best split [
70]. The prediction of the output response in regression problems is the mean value of the estimations of these random decision trees. The estimate of the out-of-sample error is computed using the out-of-bag error [
71], without the need of cross-validation.
Quantile Regression Forests (QRF) is an extension of RF that provides information about the full conditional distribution of the output response and not only about its mean [
45], as is the case in plain RF. In this way, it is possible to provide prediction intervals and measures of uncertainty.
The use of QRF as a framework for the modeling of spatial variables was introduced by Hengl et al. [
51], where the distances among observation locations are used as variables in QRF in order to incorporate geographical proximity effects. In this way, using these buffer distances, spatial QRF imitate kriging’s spatial correlation. Spatial QRF (spQRF) produces comparable results, in terms of accuracy, compared to the state-of-the-art kriging methods, with the advantage of no prior assumptions about the distribution or stationarity of the response variable [
51].
In this work, spQRF were used for the spatial prediction of monthly precipitation and ED:
As a measure of the out-of-sample error, the average Root Mean Squared Error (
RMSE), the coefficient of determination
R2 and Lin’s Concordance Correlation Coefficient (
CCC) were computed using 10-fold cross validation:
where
is the total number of cross-validation locations,
is the predicted value of
at a cross-validation location
(i.e., the coordinates longitude and latitude of location
),
,
,
,
are the means and variances of
and
, respectively, and
is the covariance of
and
.
describes the ratio of variance that is explained by a model and may be negative, among other reasons, if an inappropriate model is used [
73].
CCC combines measures of both precision and accuracy and examines how far
deviate from the line of perfect concordance (the line of 45 degrees on a square scatterplot) and ranges from 0 to ±1 [
74,
75].
The uncertainty of the predictions from the models, in the form of prediction error standard deviation, was computed using a dense level of all the quantiles per 1% of the output response and for every location [
51]:
where
is the
pth percentile of the distribution of the response variable at location
and
the mean value of
for all the percentiles.
The quantity z-score, which quantifies the error of prediction errors, was calculated at cross validation locations [
76]:
where
, ideally, should have a mean equal to zero and variance equal to one. On the contrary:
If variance(z) >> 1, the model underestimates the actual prediction uncertainty.
If variance(z) << 1, the model overestimates the actual prediction uncertainty.
2.4. Regional Climate Models Historical Precipitation Validation
In order to validate and select one of the GCMs-RCMs for the projected erosivity calculations, at first, a multi-layer raster dataset was computed for each of the five models with the overall 30-year-mean-monthly precipitation values, using the historical time period from 1971 to 2000. Then, using the monthly precipitation spQRF models to create raster datasets with the same 30″ × 30″ resolution grid, RMSE, CCC and R2 errors metrics were computed, setting in Equations (5)–(7) as the values estimated by GCMs-RCMs and as the values calculated by the spQRF models.
2.5. Current and Projected Erosivity Calculation
The estimation of current and future monthly erosivity was made under the assumption of future temporal stationarity of
ED values, due to the fact that
ED is related to seasonal rainfall intensity [
15] and EURO-CORDEX GCMs-RCPs models does not project statistically significant trends in most areas of Greece [
27]. Temporal stationarity of
ED values already has been documented in Greece for the historical period [
30].
The current monthly erosivity was calculated as the product of predictions of the trained spQRF models of monthly
and precipitation
for each month
m:
The prediction error standard deviation of monthly
was calculated using error propagation [
77]:
In order for Equation (11) to hold true, and were considered as the true values on locations and that were independent of each other. More specifically, these values were considered, to a good approximation, as not correlated on the basis of the following remarks:
The annual rainfall erosivity and its error standard deviation were calculated, respectively:
The projected values of monthly erosivity were computed using the ratio
of the future 30-years-mean
to the historical 30-years-average-monthly precipitation values
that came from a GCM-RCM:
where
f is the future year in which long term average monthly values refer to and:
The projected annual erosivity was estimated with:
And the ratio of future to current values:
With Equations (14)–(17), the projected annual erosivity values were calculated preserving the relative projected changes of precipitation without the direct use of simulated values coming from a GCM-RCM and the need to apply a bias-correction method. However, the need to use a statistical downscaling technique to apply bias correction to the GCM + RCM output will be tested.