3.1. Model Performances for the Calibration Basins
The model in Equation (1), with the four parameters individually calibrated for each site (Step 1A, Figure 6), has shown, over the 53 calibration basins, a high capacity for reproducing observed streamflows, with NSE values ranging from 0.64 to 0.93 and a mean of 0.77 (Table 4a). The NSE values for these basins are shown in the histogram of Figure 7 (upper-left panel) and in the map of Figure 8a. From the latter, as well as from the results summarized in Table 4a, it can be observed that the model performs better, on average, in sub-zone C than in the other two sub-zones.
The assumption that a1 is constant for each sub-zone has provided new sets of optimal values for a2,k, a3,k, and a4,k (Step 1B), whose performance, in terms of NSE, is reported in the same histogram of Figure 7 and in Table 4a. The results are quite similar to those previously discussed (Step 1A), denoting a non-significant decrease in model performance; on average, the NSE is reduced by about 2%. A significant NSE reduction (from 0.73 to 0.44) has been detected only for the ID 66 basin (sub-zone C), which is characterized by the lowest value of MAP and the second highest value of p.
As expected, the use of the regional relationships in Equation (4) to assess the ai,k parameters (Step 2) reduces model performance with respect to the adoption of the optimal parameter sets (Steps 1A and 1B; histogram of Figure 7). This performance reduction is the price to be paid for a procedure able to assess runoff at ungauged sites. Nevertheless, the final model performances after Step 2 (Table 4a) remain in the range of acceptable NSE values, i.e., satisfactory or better according to the performance rating of Moriasi et al. [60], for all the calibration basins; the only two exceptions are the abovementioned ID 66 basin (sub-zone C), which had the lowest starting NSE value (i.e., NSE = 0.44 at Step 1B), and the ID 22 basin (sub-zone A), where the resulting values are, however, positive and equal to 0.33 and 0.35, respectively. For the latter basin, the reduction in model performance after the regionalization procedure is rather substantial, since the NSE after Step 1B was quite high (0.77). This could be attributed to the fact that the parameter values derived for this basin at Step 2 are markedly different from the corresponding values obtained at Step 1B, especially with regard to the parameter a4, whose regionalized value is roughly half the previously obtained optimal value (0.23 versus 0.43).
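The rating applied above can be made concrete with a short sketch. It assumes the standard Nash-Sutcliffe form for the NSE and the monthly-time-step classes usually associated with the guidelines of Moriasi et al. (satisfactory above 0.50, good above 0.65, very good above 0.75). Only the 0.75 threshold is stated explicitly in this section, and the function names are illustrative.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency (standard form, assumed here)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared simulation errors
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / spread


def rating(nse_value: float) -> str:
    """Performance class for a monthly NSE (thresholds are an assumption)."""
    if nse_value > 0.75:
        return "very good"
    if nse_value > 0.65:
        return "good"
    if nse_value > 0.50:
        return "satisfactory"
    return "unsatisfactory"


# Values quoted in the text: mean NSE at Step 2 and the two exceptions
for value in (0.69, 0.35, 0.33):
    print(value, rating(value))
```

Under these assumed thresholds, the mean NSE of 0.69 falls in the "good" class, while the two exceptions (0.33 and 0.35) fall below the satisfactory limit.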
The percent reduction of the NSE after the regionalization, with respect to the NSE computed at Step 1B, ranges from 0.17% to 57%, with the mean NSE (=0.69) over all the calibration basins reduced by almost 9%, still denoting overall good performance. The efficiency reductions after the regionalization with respect to Step 1A are slightly more marked, with the NSE reduced by 11% on average. This comparison is also emphasized in Figure 8, where the NSE obtained with the regionalized parameters (Figure 8b) and with the parameter sets of Step 1A (Figure 8a) are represented using the same color bar. After the regionalization procedure as well, the model exhibits, on average, a higher accuracy for the basins in sub-zone C, with a mean NSE of 0.70 (Table 4a). The ID 47 basin (sub-zone C) has shown the highest NSE at every calibration phase (Table 4a), with a performance reduction after the regionalization (Step 2) of about 3%.
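The percent reductions quoted above are relative changes in NSE between two steps. A minimal sketch, with an illustrative function name and using only values reported in the text, is given below.

```python
def percent_reduction(nse_before: float, nse_after: float) -> float:
    """Relative NSE reduction, in percent, between two calibration steps."""
    if nse_before <= 0:
        raise ValueError("the reference NSE must be positive")
    return 100.0 * (nse_before - nse_after) / nse_before


# ID 66 basin, Step 1A to Step 1B (0.73 -> 0.44, as reported in the text)
print(round(percent_reduction(0.73, 0.44), 1))  # 39.7

# Mean NSE over the calibration basins, Step 1A to Step 2 (0.77 -> 0.69)
print(round(percent_reduction(0.77, 0.69), 1))  # 10.4
```

The reduction of the mean NSE (about 10.4% here) need not coincide with the mean of the per-basin reductions, which is presumably what the 11% figure reports, since the two averages weight the basins differently.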
Model performances at Step 2 have also been assessed using three further statistical criteria measuring the agreement between observed and simulated monthly streamflow series: the mean error, ME (in mm/month); the dimensionless mean error, ME/Q̄, given by the ratio between the ME and the average observed streamflow, Q̄; and the root mean square error, RMSE (in mm/month). For each index, the mean, the standard deviation, the minimum, the maximum, and the best values over all the calibration basins are summarized in Table 4b. The results of this analysis are consistent with those obtained for the performance index used for calibration (i.e., NSE), supporting the validity of the adopted calibration procedure. The model has globally provided satisfactory accuracy in terms of all the analyzed indexes and can be considered essentially unbiased, as indicated by the relatively low values of ME. The absolute values of ME are lower than 3 mm/month for about 80% of the analyzed basins, while the mean ME/Q̄ over the calibration basins is almost null. The RMSE values are of the same order of magnitude as those found through a similar modeling approach by Cutore et al. [47] for the Simeto river sub-basins (Sicily, Italy), with the worst performance (about 45 mm/month) obtained at basins ID 61, 67, and 68, all of which have NSE values lower than the mean value (0.69, 0.63, and 0.58, respectively).
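The three additional criteria can be sketched as follows. The sign convention (simulated minus observed, so that a negative ME indicates underestimation) is an assumption, since it is not stated in this section; the function name and the sample series are hypothetical.

```python
import math
from typing import Dict, Sequence


def error_indexes(observed: Sequence[float],
                  simulated: Sequence[float]) -> Dict[str, float]:
    """Mean error, dimensionless mean error, and RMSE of a monthly series."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    # Assumed convention: residual = simulated - observed
    residuals = [s - o for o, s in zip(observed, simulated)]
    mean_obs = sum(observed) / n
    if mean_obs == 0:
        raise ValueError("mean observed streamflow is zero")
    me = sum(residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return {"ME": me, "ME_rel": me / mean_obs, "RMSE": rmse}


# Hypothetical monthly streamflow values in mm/month
obs = [10.0, 20.0, 30.0]
sim = [12.0, 18.0, 33.0]
print(error_indexes(obs, sim))
```

Because positive and negative residuals cancel in ME but not in RMSE, a basin can show a nearly null ME together with a large RMSE, which is why the two indexes are read jointly.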
The basin with the highest NSE (i.e., ID 47) also provided satisfactory performance in terms of the other indexes (ME = 0.428 mm/month, ME/Q̄ = 0.042, and RMSE = 5.05 mm/month). The basins with the lowest NSE (i.e., ID 22 and 66) also performed poorly with respect to the other statistical criteria (ME = −6.67 and −2.54 mm/month, ME/Q̄ = −0.49 and −0.69, RMSE = 15.37 and 9.97 mm/month, respectively). Only one basin (ID 40; sub-zone C), associated with a satisfactory NSE (0.63), although below the mean over the calibration basins, has provided large values for ME (11 mm/month) and ME/Q̄ (0.35), denoting a weak performance that is, however, comparable to the worst performance obtained in other studies (e.g., [47]).
A comparison between monthly flow duration curves (FDCs) based on observed and simulated runoff values is reported in Figure 9 for different representative basins. Four examples, referring to basins with performance indexes approximately equal to the averages over all the calibration basins, are reported: basin ID 18 is representative of basins with mean NSE, basin ID 42 of basins with mean ME, basin ID 25 of basins with mean ME/Q̄, and basin ID 59 of basins with mean RMSE. Moreover, since FDCs typically depend on the period considered, a comparison for the basins with the shortest (ID 45) and the longest (ID 23) records is also reported. Although some important details of the variations in flows can be obscured when FDCs are computed from monthly data, rather than from daily or finer-resolution data, this analysis has been useful in confirming model accuracy. For all the examined basins, in fact, the model has satisfactorily reproduced the magnitudes associated with the various observed monthly runoff values, from the lowest and more frequent values to the highest and rarer ones, proving to be effective also in assessing the probability that a certain runoff value will be equaled or exceeded.
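A monthly FDC of the kind compared above can be built by ranking the flows and assigning each an empirical exceedance probability. The sketch below uses the Weibull plotting position i/(n + 1), which is an assumption, since the plotting position adopted for Figure 9 is not stated; the function name and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def flow_duration_curve(flows: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (exceedance probability, flow) pairs, highest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position (assumed): rank / (n + 1), rank starting at 1
    return [((rank + 1) / (n + 1), q) for rank, q in enumerate(ranked)]


# Hypothetical monthly runoff values in mm/month
observed = [5.0, 0.0, 20.0, 10.0]
simulated = [6.0, 0.0, 18.0, 11.0]

for (p, q_obs), (_, q_sim) in zip(flow_duration_curve(observed),
                                  flow_duration_curve(simulated)):
    print(f"P(exceedance) = {p:.1f}: observed {q_obs}, simulated {q_sim}")
```

Since each series is ranked independently, the comparison concerns the distribution of flow magnitudes rather than their timing, so it complements the month-by-month indexes.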
3.2. Model Validation
Model validation has been carried out by applying the model to the six validation basins (i.e., ID 12, 24, 43, 46, 49, and 63; Figure 4), previously selected and not considered during the calibration (two basins of different area for each sub-zone). For each of them, the entire available historical monthly streamflow series has been reproduced by the model and compared with the corresponding empirical series, analyzing model performance through the same indexes previously used.
Performance achieved in the validation phase, summarized in Table 5, was similar among the different basins and very close to that measured at the calibration basins, with high NSE values (mean NSE = 0.74) and low ME, ME/Q̄, and RMSE for all six basins. Series reproduction in the larger basins (ID 24, 63, and 46) was slightly more accurate (mean NSE = 0.79) than in the smaller ones (mean NSE = 0.71), but no significant difference can be noticed among model performances in validation over the different sub-zones. The best performance (i.e., best NSE and RMSE) was obtained at the ID 46 basin (sub-zone C), with a sample size of 10 years, while the lowest performance was obtained at the rainiest basins (ID 12 and 49), characterized by the longest series (39 and 18 years, respectively).
Although the validation basins are characterized by marked differences in terms of rainfall–runoff transformation, with mean annual runoff coefficients ranging from 0.14 (ID 43) to 0.34 (ID 49), the model has shown, over the six basins, a comparable ability to capture the different hydrological responses, with performance that can be classified as “good” in all the basins and, in two cases (i.e., basins ID 24 and 46), even as “very good” (i.e., NSE > 0.75).
Figure 10 depicts a comparison between observed and simulated monthly specific streamflow series (mm/month) for the six validation basins, also reporting the corresponding precipitation series. Simulated series are quite close to the observed series, reproducing well most of the peaks and null values. Despite the marked differences, in terms of both observed rainfall and streamflow, that can be noticed among the basins, the model captures rather accurately the monthly streamflow variability for both the smaller (left panels) and larger (right panels) basins in the three sub-zones. For example, the ID 12 basin is characterized by a seasonal streamflow regime, with about five months per year almost dry and frequent winter peaks with streamflow of about 100 mm/month, while the ID 24 basin is characterized by more regular behavior in the observed streamflow, with, on average, less rainfall and streamflow, and with only five months in eight years having streamflow on the order of 100 mm/month; for both basins the model showed similar performance.
3.3. Model Performances at Different Aggregated Time Scales
Model performances have been further evaluated at different temporal aggregations, also considering the seasonal and the annual time scales. This analysis has been performed with regard to both the 53 calibration basins and the six validation basins, considered as a single sample. More specifically, simulated monthly streamflow has been aggregated at the annual scale and also at the seasonal scale, considering the year as divided into two seasons: the dry season, from April to September, and the wet season, comprising the remaining six months of the year.
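The aggregation just described can be sketched as below. The season definition (dry season from April to September) follows the text, while assigning the wet months to the calendar year in which they fall is an assumption, since the text does not say whether calendar or hydrological years are used; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Dry season as defined in the text: April to September
DRY_MONTHS = {4, 5, 6, 7, 8, 9}


def aggregate_streamflow(
    monthly: Iterable[Tuple[int, int, float]],
) -> Tuple[Dict[int, float], Dict[Tuple[int, str], float]]:
    """Sum (year, month, flow) records into annual and seasonal totals."""
    annual: Dict[int, float] = defaultdict(float)
    seasonal: Dict[Tuple[int, str], float] = defaultdict(float)
    for year, month, flow in monthly:
        annual[year] += flow
        # Assumption: wet months are grouped by calendar year
        season = "dry" if month in DRY_MONTHS else "wet"
        seasonal[(year, season)] += flow
    return dict(annual), dict(seasonal)


# Hypothetical year in which month m has a flow of m mm/month
records = [(2000, m, float(m)) for m in range(1, 13)]
annual, seasonal = aggregate_streamflow(records)
print(annual[2000], seasonal[(2000, "dry")], seasonal[(2000, "wet")])
# 78.0 39.0 39.0
```

The same function applies to both observed and simulated series, so the aggregated pairs can be compared with the same indexes used at the monthly scale.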
Figure 11 compares all the estimates of the monthly, annual, and seasonal streamflow obtained from the regional model for all the basins with the corresponding observed streamflow. In the top-left panel of the figure, a total of 12,312 theoretical monthly streamflow estimates are plotted against the corresponding empirical values, while the other scatter diagrams on the top refer to the annual streamflow values (middle panel, 1026 values) and the mean annual totals of streamflow for each basin (right panel, 59 values). In the bottom panels of Figure 11, the empirical and theoretical dry season (left panel) and wet season (middle panel) streamflow values, as well as the mean seasonal totals of streamflow for each basin (right panel), are similarly compared.
The high predictive ability of the model at the different aggregation scales is demonstrated by the fact that, for all the plots, most of the points are rather close to the perfect agreement lines (also reported in all the graphs), with high values of the coefficient of determination R². These values are greater than 0.92 at the monthly level and 0.90 at the annual level. At the seasonal level, the model reproduces the dry season streamflow with an R² of 0.95 and the wet season values with an R² of 0.90. Although the cloud of points appears to be more dispersed than in the other plots, the high value of R² obtained at the monthly scale can be explained by the presence of a considerable number of observed values that are identically reproduced by the model (i.e., 22% of the null values and about 5% of the non-null streamflow values).
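As a sketch of how such a coefficient can be computed, the snippet below takes R² as the squared Pearson correlation between observed and simulated values. This definition is an assumption, since the form of R² used for Figure 11 is not specified in this section; the function name and the sample values are hypothetical.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Coefficient of determination as squared Pearson correlation (assumed)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("series must have equal length and at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_o == 0 or var_s == 0:
        raise ValueError("R2 is undefined for a constant series")
    return cov * cov / (var_o * var_s)


print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))  # approx. 0.64
print(r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))            # approx. 1.0
```

Under this definition, R² is insensitive to a systematic scaling or offset of the simulated values (second call above), so it is best read together with the distance of the points from the perfect agreement line.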
Satisfactory model performance has also been obtained with regard to the reproduction of the mean annual and seasonal totals of streamflow for the different basins (right panels of Figure 11), as demonstrated by the high R² values (i.e., 0.95, 0.89, and 0.92 for the dry season, the wet season, and the annual analysis, respectively) and the low mean percent errors (from 2.2% for the annual and the wet season to 12.5% for the dry season). The best performance in terms of absolute error (AE) was obtained at the basins with ID 42 for the dry season (AE = 0.21 mm/season), ID 37 for the wet season (AE = 0.02 mm/season), and ID 45 for the annual analysis (AE = 0.34 mm/year), while the worst was obtained at the basins with ID 56 for the dry season (AE = 65 mm/season) and ID 40 for the other two aggregation periods (AE = 121 mm/season and 135 mm/year for the wet season and the year, respectively). These values are consistent with the results previously obtained at the monthly scale, where the ID 40 basin showed the worst performance in terms of both ME and ME/Q̄ (Table 4b).
The results shown in Figure 11 have, therefore, demonstrated a high model capacity to reproduce runoff not only at the monthly scale but also at coarser time resolutions, with a satisfactory ability to reproduce the seasonal and interannual variability as well. A noteworthy aspect is that, despite the use of different models for the three sub-zones (i.e., same model structure and different regression parameters for each) and the application under 59 different boundary conditions (i.e., 59 different basins), all the estimates, at all the analyzed aggregation time scales, have shown comparable errors, as can be observed from the left and middle panels of Figure 11.
At coarser time resolutions as well, model performances in the validation basins are comparable with those for the calibration basins, with similar residual errors. The analysis at the annual time scale for the validation basins has been extended by comparing simulated and observed annual streamflow series and computing all the performance indexes previously used at the monthly scale. The results of this analysis, together with a comparison between empirical and theoretical values for the main annual statistics (mean, standard deviation, minimum, and maximum), are summarized in Table 6.
In most cases, the simulated main annual statistics are very close to the observed ones; moreover, it can be noticed that the reproduced variability is essentially never higher than that observed. The annual performance indexes show a good agreement between simulated and observed annual streamflow series: the highest NSE (0.96) was reached for the ID 43 basin, while the NSE values for the other basins are all higher than 0.64, with the only exception being the ID 46 basin, where a relatively low efficiency (NSE = 0.26) was obtained due to a low variance of the observed series that could negatively affect the NSE representativeness (see Equation (3)). Other indexes, in fact, denote good model performance at the ID 46 basin, while the ID 12 basin had the lowest performance, with values for ME and ME/Q̄ slightly outside the ranges obtained for the other basins and also a relatively high value for RMSE.
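The sensitivity of the NSE to a low observed variance can be illustrated with a minimal numerical sketch, assuming the standard Nash-Sutcliffe form for Equation (3). The two observed annual series below are hypothetical and are given identical simulation errors (and hence the same RMSE); only the spread of the observed values differs.

```python
import math
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency (standard form, assumed for Equation (3))."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error of the simulated series."""
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(sse / len(observed))


# Same errors (+10, -10, +10, -10 mm/year) applied to both hypothetical series
errors = [10.0, -10.0, 10.0, -10.0]
obs_high_var = [100.0, 200.0, 300.0, 400.0]  # strongly variable record
obs_low_var = [230.0, 250.0, 270.0, 250.0]   # nearly constant record
sim_high_var = [o + e for o, e in zip(obs_high_var, errors)]
sim_low_var = [o + e for o, e in zip(obs_low_var, errors)]

print(rmse(obs_high_var, sim_high_var), nse(obs_high_var, sim_high_var))
print(rmse(obs_low_var, sim_low_var), nse(obs_low_var, sim_low_var))
```

With an RMSE of 10 mm/year in both cases, the NSE is about 0.99 for the variable record and 0.5 for the nearly constant one, because the error is normalized by the observed spread. A pattern of this kind would be consistent with the behavior reported for the ID 46 basin.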