1. Introduction
Rainfall observations are the basis for many scientific studies in various areas, such as climatology, hydrology, geography, and ecology. Rainfall is also a driver for economy in relation to rainfed crops, especially cereals. The investigation of drought occurrence is based on drought indices computed using a monthly rainfall series as well as an evapotranspiration series ([
1,
2,
3]). A drought warning is crucial for cereal crops and food security [
4]. The Standardized Precipitation Index
[
5] was recommended by the World Meteorological Organization (WMO), among other indices [
6]. Another source of data to elaborate drought indices is potential evapotranspiration, giving rise to the Standardized Precipitation Evapotranspiration Index (
) [
7]. High correlations were found between
and
in temperate and continental regions of Iran [
2], while low correlations were found in arid climate regions. Considering this question, we focused on
with a three-month accumulation for quantifying the immediate drought impacts such as agricultural and soil moisture drought. In Tunisia, cereals are mostly rainfed, and the country has a high deficit in cereal trade balance. Thus, agricultural drought is of prime interest in the study region. Recently, a national insurance fund was implemented in 2019 to help farmers facing drought impacts on cereal crops, using
information as support for its decisions. The Standardized Precipitation Index
SPI-3 represents the number of standard deviations by which the normalized, cumulative precipitation deviates from the climatological median for a three-month accumulation period [
5]. In the parametric framework,
SPI is obtained by first adopting an underlying parametric distribution to fit rainfall totals and then by transforming the estimated non-exceeding probability to make them normally distributed. Such a normalization is important for achieving spatial intercomparisons as well as for interpreting the drought index [
1] in terms of drought severity. Several two-parameter and three-parameter candidate distributions were evaluated to assess whether they correctly represented the skewed rainfall totals. After considering various candidate distributions and ranking them based on several durations of accumulation periods (using 0.5° × 0.5° gridded historical data based on ERA-40 reanalysis), researchers found it more accurate to use a simple distribution to estimate
SPI series in order to avoid overfitting and interpretability issues [
1]. The two-parameter gamma distribution was identified as the most likely to give acceptable results across all accumulation periods and regions of Europe [
1]. Its success was explained by the flexibility of its shape parameter. A three-month resolution [
8] found that, for global land grid points, two-parameter gamma distribution performed at a level of 94% accuracy in the category “best performance”.
An issue pointed out by [
1] was the method adopted for testing the adequacy of the candidate distribution. They argued that the lack of rejection of the Kolmogorov–Smirnov test does not ensure the normality of the transformed rainfall. They recommended testing
normality using the Shapiro–Wilk test [
9]. Since normality is a necessary condition for interpretating
SPI in terms of drought categories, as adopted by the WMO’s
SPI user guide [
6], a default in normality will result in over- or underestimating drought severity, especially in the lower tail of the distribution. Another recommendation was to implement a limitation to interpret
SPI estimates to intervals (−3, +3) since the likelihood of exceeding these limits is very low, especially considering that the commonly available lengths of samples are not very high [
1].
Based on a comparison with crop yields, [
10] found that
SPI was overestimated for stations with a low rainfall average. Such an overestimation in the left tail of the rainfall distribution resulted in underestimating drought severity. They mentioned the failure in fulfilling the normality assumption as a reason for this overestimation. The Shapiro–Wilk rejection was 3–13% in the case of the gridded, historical ERA-40 reanalysis data over Europe [
1]. In addition, the normality test of
derived from the gamma model failed for approximately 10% of 264 studied series in Brazil [
11]. Studying rainfall data from the United States, Ref. [
12] suggested that the normality test results were linked to local climate conditions and that rejection was more likely for stations characterized by a high probability of no-rain cases.
According to [
1,
12], one challenge in selecting a candidate distribution is to ensure that it captures the likelihood of zero precipitation. Thus, our aim was to assess the relevance of computing the Standardized Precipitation Index
SPI-3 using a two-parameter gamma distribution in the case of the Medjerda River Basin that, according to Koppen’s climate classification [
13], fits into the following climates:
(temperate, dry, and hot summer),
(arid hot desert), and
(arid hot steppe). The Medjerda River Basin is the most important perennial surface water provider in Tunisia. It flows to the South Mediterranean in Kallat Landlous, with a basin area of nearly 23,000 km². It is a transboundary river with its source in Algeria. In addition, it is one of the most important transboundary rivers in North Africa, feeding the West Mediterranean. To date, there has not been a study with
SPI-3 estimation using the potential rainfall network that is displayed in Tunisia. Conversely, the performance of the Standardized Precipitation Index using gamma distribution has not been estimated so far for this basin. The most recent research for the North Africa region used gamma distributions to estimate
time series in 16 stations of the Wadi Mina basin (4900 km²) in northern Algeria [
14], without elaborating on the conclusions about normality assumption. Additionally, Ref. [
15] studied 194 rainfall series, including the entire domain of northern Algeria using a copula approach for estimating the
SPI-3 drought’s severity, duration, extension area, and return levels (1970–2006). The advantage of the copula approach is to avoid estimating the univariate probability distribution functions (pdfs). However, they did not mention any test for normality assumption. In [
16], the authors studied the period 1960–2010 using 65 rainfall stations in the Cheliff-Zahrez basin (56,227 km
2) of northern Algeria with semi-arid to arid climate conditions. They adopted a non-parametric
estimation without mentioning the performance with respect to the normality of resulting
. Thus, our study was based on a higher rainfall network density and covered the three types of climates of Tunisia in order to draw conclusions about the effectiveness of the normality test for
when using the gamma distribution as the underlying statistical model.
2. Materials and Methods
The data are from rainfall yearbooks published by the Tunisian National Hydrological Service, which is part of the Ministry of Agriculture, Hydraulics, and Fish Resources. There, one can find information about the size and composition of the rainfall network during a given hydrological year, which is assumed to extend from 1 September of one year through 31 August of the next year. The station names and identifiers are provided, as well as the elevation above sea level and the geographical localization for every station. Tunisia is subdivided into 7 main watersheds, with the Medjerda River Basin assigned to the fifth. In rainfall and hydrometric reports, the Medjerda River Basin is known as Basin 5 (). Consequently, the stations of the Medjerda River Basin have identifiers beginning with ‘5…’. It is worth noting that the labelling of the station names may present some small changes from one yearbook to another, due to different methods of translating the name from Arabic to French. This is why the most important reference for any station is its identifier.
The yearbooks report daily rainfall measurements and monthly and annual rainfall totals. They indicate daily and monthly missing data (gaps) using the symbol ‘-’. Gaps may be due to a temporary lack of observation, deterioration of the equipment, or loss of register. It may happen that a station is abandoned, mostly due to the lack of an observer (from death, a change of residence, or other personal reasons) or due to a decrease in database budgets, resulting in reduction of the network size. A closure is officially reported in a specific (not public) database containing the station’s name, location, and elevation above sea level. When this is not reported, one cannot be certain of a station’s closure. In the Medjerda River Basin, the first station was implemented in 1888. In this study, we focused on the analysis of the recent period 1950–2018 when the network was comprised of 144 stations. One single station (code number 53766) out of 144 had no metadata. The spatial locations of the remaining 143 stations are shown in
Figure 1. Two stations are located out of the basin, showing that their coordinates are erroneous in the yearbook (
Figure 1). However, this did not affect our results. The stations are represented within their climate classification, using Koppen’s climate classification system (
Figure 1). A total of 49% are
(temperate, dry, and hot summer), while 31% belong to
(arid hot desert) and 20% to
(arid hot steppe). In the Medjerda River Basin, the driest month is July, while the rainy season is from November to January. The transition season is August through October, and spring expends from March to May.
We first evaluated the network’s spatiotemporal extension every month. The gaps in the monthly rainfall data were not reconstructed. Thus, the number of stations with complete monthly data changes from year to year. However, it cannot exceed 144, which is the size of the studied network. We then identified the 3-month total for every station. Any 3-month period containing a missing month (a gap) was assumed missing and was removed from the time series used to fit the underlying statistical distribution, since the gap meant that the resulting
SPI-3 predictions were missing in those cases. We further selected those stations with sample sizes of 3-month totals greater than 30. We then evaluated the frequency
f0 of occurrence of no rain during every 3-month period. The frequency
f0 was estimated as the ratio of the number of zeros in the series by the series size, as recommended in [
6]. Additionally, data control was performed for the 3-month total using outlier detection. To do so, we used Grubbs’ test [
17] to assess whether the maximum was an outlier. We use a two-sided test and the critical value
Gmax of the t-distribution at a significant level of 5% to evaluate the test statistic
G.
The parametric estimation of
SPI-3 required fitting a statistical distribution to the 3-month totals. As stated before, we adopted two-parameter gamma distribution, with a
α scale and
λ-shaped parameters [
18,
19]. The method of maximum likelihood, which uses the logarithmic transformation of data and is known to be more efficient, was not appropriate because of the existence of zeros in the data. The method of moments was used to estimate the model’s parameters. Furthermore, based on
α and
λ estimators, the standardized gamma variable
Ky associated with any observed 3-month total
y was appraised using the following equation:
Additionally, the Wilson–Hilferty transformation (see [
18], Eq. 4.27, p. 33) was adopted to normalize quantiles. The standardized gamma quantile was related to the qth quantile of the standard normal distribution
Φq by Equation (2), which states that:
where
Cs was the population asymmetry coefficient.
Estimating
α and
λ and inverting (2) helped to calculate
SPI-3, taking into account the presence of zeros in the data and the fact that the gamma was not defined for
y = 0 in [
20].
where
y represents the 3-month rainfall total;
Φ−1() is the inverse of the standard normal cumulative distribution function (
Φq = Φ−1 (
q)) [
21].
Furthermore, the adequacy of the adjusted pdf model was tested using a Kolmogorov–Smirnov test at the 5% significance level. In addition, for those models that were not rejected for adequacy, a Shapiro–Wilk test [
9] was implemented to check whether the
SPI-3 estimates were normally distributed. According to [
22], the Shapiro–Wilk test achieves the best performance when compared with other normality tests.
Finally, the normality results were examined in the light of the following factors: the sample size, station location, Grubbs’ test results, frequency
f0 of the occurrence of no rain, and population asymmetry coefficient
Cs.
Figure 2 shows the inputs of the study (monthly rainfall series of 144 stations), its outputs (
SPI-3 series and the percentage of the series with an accepted normality test), and the steps to bridging them.
4. Discussion
When considering the non-sensitivity of normality results compared with an increase in the sample size, our results are in conformity with the literature [
8]. The rejection of an
SPI-3 normality test was at a rate of 17%, while 3–13% were mentioned in [
1], compared with 10% in [
11]. One may argue that the gridded data used in [
1] contained fewer zeros than that of historical data and that the geographical area studied in [
1] included fewer stations under arid and semi-arid conditions. In fact, 0.01 mm was assumed to be the threshold for zero and had few cells with zero accumulation of precipitation in [
1]. In addition, [
1] fitted daily distributions and computed
averaging on a monthly basis for their use of gridded data. When normalizing zero in the
, the Weibull plotting position is often used for estimating
f0. The Weibull plotting position estimations were averaged using the alternative form
f0 = (
m + 1)/(2
N + 1), where
m was the number of zeros and
N was the sample size in [
1]. In this method, a depletion operated in the high values of f
0. In our case, such a transformation would effectively be in favor of decreasing the number of cases of rejecting normality.
The percentage of
SPI-3 being less than −3 was 0.12% (63 out of 48,835). The total percentage of values out of the interval (−3, +3) was 0.14%, which is very low in accordance with literature recommendations [
1]. The minimum estimated
SPI-3 was −3.803 for station 50535, corresponding to December–February 1982. A study of drought occurrence in the Mediterranean region using
SPI-12,
SPEI-12, and two other indicators showed that 1982 achieved the most severe drought in the North Africa region [
23]. Conversely, only eight (8)
SPI-3 estimates were greater than +3, and the maximum was 3.181, which corresponded to period 3 (September–November) of 1969 at station 58272. The autumn of 1969 is well-known for its dramatic floods in Tunisia. Thus, our results are accurate in regards to
SPI-3 interpretation when considering the fact that we did not remove outliers.
5. Conclusions
The study of the rainfall network pattern and its time evolution in northern Tunisia within the case of the Medjerda River Basin resulted in retaining 98 out of 144 stations, totaling more than 30 years of observations for every 3-month duration during the period of 1950–2018. This represents a high spatial density of 0.4 station/100 km². Based on a massive amount of data (48,835 samples of three-months of rainfall accumulation with sample sizes greater than 30, covering an area of 23,000 km²), our study confirmed the potential of the two-parameter gamma distribution for estimating
SPI-3 in North Africa, as well as its drawbacks. We found that the category
Csa was over-represented in the sample of rejected cases. The most important conclusions are that the presence of outliers as well as the grade of asymmetry coefficient
Cs and the importance of
f0 are all responsible for the poor performance of gamma distribution with respect to the normality test of
SPI-3. In fact,
f0 is the most sensitive factor, followed by
Cs, and then by the presence of the maximum detected outlier. Thus, high
f0 and high skewness coefficients are more often attached to a rejection of a Shapiro–Wilk test for
SPI-3 under the gamma distribution hypothesis. Therefore, our results confirm the importance of rainfall intermittency in explaining the gamma model’s deficiency in representing
SPI-3 in dry climate conditions. The size of the rainfall time series and the variability of climate conditions in the study region were large enough to extend these conclusions to other sites with similar climate conditions. We therefore propose the adoption of the exponentiated Weibull distribution as an alternative [
8] to the gamma in order to estimate
. Prospecting heavy-tailed distributions may be another issue.