3.1. Estimation Performance of Models
Following the WCOA-based parameter estimation approach detailed in the previous section, the model parameters of the Muskingum models listed in Table 1 were obtained using the datasets of eight distinct flood events. For each flood event, the parameters and prediction performance of the models used are presented in Table A1. During the optimization process, the WCOA was run 10 times for each flood dataset in order to obtain more reliable model parameters with accurate ranges. For instance, when the model parameters of M1 were determined using dataset 1 (DS1), 10 candidate solutions were retrieved by the WCOA. Although the estimated model parameters could be very close to each other across trials, the parameter set with the smallest cost function value among the candidate solutions was noted. The error metrics for each model were then computed, as given in Table A1, and the same process was repeated to obtain the error metrics for the remaining models.
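The selection of the best candidate among repeated optimization runs can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the WCOA itself: a naive random search (`random_search`) stands in for the metaheuristic, the cost function is the SSE between observed and routed outflow, and `routing_model` and `bounds` are user-supplied placeholders. Only the restart-and-keep-best logic mirrors the procedure described above.

```python
import numpy as np

def sse_cost(params, inflow, observed_outflow, routing_model):
    """Sum of squared errors between observed and routed outflow."""
    predicted = routing_model(params, inflow)
    return float(np.sum((observed_outflow - predicted) ** 2))

def random_search(cost, bounds, n_iter=5000, seed=None):
    """Hypothetical stand-in for the WCOA: naive random search within bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    best_x, best_f = None, np.inf
    for _ in range(n_iter):
        x = rng.uniform(lo, hi)
        f = cost(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

def estimate_parameters(inflow, outflow, routing_model, bounds, n_restarts=10):
    """Run the optimizer n_restarts times and keep the candidate with the smallest cost."""
    candidates = [
        random_search(lambda p: sse_cost(p, inflow, outflow, routing_model),
                      bounds, seed=run)
        for run in range(n_restarts)
    ]
    return min(candidates, key=lambda c: c[1])  # (best parameter set, best SSE)
```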
According to the findings presented in Table A1, models M7, M8, M9, and M10 yielded highly accurate predictions for DS1, the smooth single peak (SSP) dataset, considering the error metrics of (>0.999) and N-RMSE (<0.01). For dataset DS2, which shares similar attributes with DS1, it is worth noting that, in addition to the models mentioned previously, models M2, M3, and M5 also exhibit considerable predictive performance with respect to the aforementioned thresholds of (>0.999) and N-RMSE (<0.01). For the non-smooth single peak (NSP) datasets (DS3 and DS4), models M9 and M10 provide the best results when evaluated with both metrics ( > 0.98 and N-RMSE < 0.31), although the prediction capacity of these models decreases compared with that observed for the SSP datasets. For the DS5 dataset, which is of the multi-peak (MP) type, all models except M4 provided relatively good prediction results. Another dataset of the same type, DS6, showed that models M5 through M10 performed well. However, in these datasets (DS5 and DS6), the performance of models M5, M9, and M10 stands out compared with the others when both error metrics are considered. Finally, for the datasets referred to as irregular in this study (DS7 and DS8), models M5, M9, and M10 were consistently ranked as the top three performers among the 10 models evaluated, in terms of both ( > 0.99) and N-RMSE (< 0.015 for DS7 and < 0.025 for DS8). The same pattern was also observed for the estimation performance of the models on the MP-type datasets.
Figure 3 also depicts the relative percentage error of each model over the flood duration for each dataset. For the SSP-type datasets (DS1 and DS2), all methods achieved good parameter estimation performance within a certain error tolerance. For M4, the error percentages increase on both the rising and falling limbs of each hydrograph, and this error is even more pronounced on the falling limbs, especially for the SSP type. A similar pattern can be observed for the NSP-type datasets (DS3 and DS4). Likewise, the relative errors of the models, except for model M4, were very small for DS5, which belongs to the MP-type dataset category. For DS6, also an MP-type dataset, all methods in general gave reasonable estimation results. Models M1, M3, and M4 were not able to produce model parameters yielding good performance for DS7, while the remaining models estimated the model parameters successfully. For DS8, an irregular dataset type, the model parameters of all methods could be readily identified by the WCOA.
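For reference, the error measures discussed above can be written compactly as in the sketch below. The N-RMSE normalization is an assumption (RMSE divided by the mean observed outflow); the study may normalize differently, so the scaling should be checked against the original definitions. The relative percentage error corresponds to the quantity plotted over the flood duration.

```python
import numpy as np

def sse(obs, pred):
    """Sum of squared errors, here in (m^3/s)^2 for discharge series."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sum((obs - pred) ** 2))

def n_rmse(obs, pred):
    """RMSE normalized by the mean observed outflow (assumed normalization)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)) / np.mean(obs))

def relative_percentage_error(obs, pred):
    """Relative error (%) at each time step of the flood duration."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 100.0 * (pred - obs) / obs
```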
The Wilson River flood dataset (DS1) has been studied by several researchers. The reported SSE values (in (m³/s)²) for this dataset were 39.8 by [44], 17.55 by [45], 9.82 by [11], 7.67 by [2], 5.124 by [46], 4.11 by [47], 4.04 by [25], 1.92 by [31], 1.092 by [48], 0.799 by [49], and 0.65 by [50]. In this study, the SSE value for DS1 obtained with the WCOA was found to be 7.66. Moreover, the proposed model (M10) achieved an SSE value of 4.09. The range of SSE values reported by different researchers suggests that there may be considerable variability in flood modeling results for this dataset, depending on the specific optimization algorithm and routing approach used. For instance, the SSE value of model M7 in this study (with the four-parameter non-linear model proposed by [2]) was calculated as 7.67, while the same metric was found to be 9.82 for model M9 (with the four-parameter non-linear model by [11]). Although these two models have the same number of parameters, the reported SSE values differ because the researchers also used different routing approaches and optimization algorithms. Similar observations can be made for the other datasets used in this study. For the Viessman and Lewis flood dataset (DS5), the reported SSE values were 71,708 in [45], 65,324 in [44], 28,855 in [50], and 8449 in [49]. However, the routing model implemented with the WCOA in this study provided notably smaller SSE values for the examined models, except for M4, as shown in Table A1. It is quite remarkable that the SSE value of model M1 (the two-parameter linear model proposed by [5]) was 14.53, while a model proposed by [49] with 12 parameters yielded an SSE value of 8449. However, comparing results reported in the literature directly, without taking into account the applied routing approaches and optimization methods, can be misleading when interpreting which model exhibited better estimation performance.
In summary, this exercise reveals that the WCOA shows very competitive estimation performance compared with the results reported in the literature. Thus, the WCOA can be regarded as a viable algorithm in combination with the outlined routing approach. These findings also suggest that the examined models are effective in capturing the outflow patterns in these datasets.
To interpret the performance of the models in Table A1, the models were ordered according to their and N-RMSE values. Considering the values, a rank score was assigned to each model such that the ranking starts from 1 for the model with the highest value and continues successively up to a rank value of 10. For the N-RMSE values, the rank score starts from 1 for the smallest N-RMSE value, as shown in Table 3. Furthermore, the averaged rank (AR) of each model is given for the datasets previously categorized as SSP, NSP, MP, and irregular. Based on the overall AR values of and N-RMSE in Table 3, the best two models were found to be M9 and M10, whereas the worst two models were M1 and M4. When the results are analyzed according to hydrograph type, model M4 shows the worst performance, with a rate of 75% (it was assigned a rank score of 10 in six of the eight datasets) for both criteria. Although this model has a physical basis derived from the Manning equation, constraining the model with a constant exponent (m = 3/5) decreases its performance in the calculations. This remark is also supported by the relative error results presented in Figure 3. However, the best model is not the same for all hydrograph types. Although model M10 gives superior results for half of the datasets (more specifically, four datasets for the criterion and five datasets for the N-RMSE criterion), models M8, M5, and M9 are ranked as the best for the SSP, MP, and irregular hydrograph types, respectively.
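The rank-score and averaged-rank (AR) bookkeeping described above can be reproduced with a few lines of pandas. In the sketch below, the table of per-dataset metric values is indexed by model, the column and index labels and numbers are illustrative placeholders (not the values of Table 3), rank 1 goes to the highest value of a higher-is-better metric or to the lowest N-RMSE, and AR is the mean rank over the datasets of one hydrograph category.

```python
import pandas as pd

def rank_scores(metric_table: pd.DataFrame, higher_is_better: bool) -> pd.DataFrame:
    """Rank the models (rows) within each dataset (column): 1 = best, 10 = worst."""
    return metric_table.rank(axis=0, ascending=not higher_is_better, method="min")

def averaged_rank(ranks: pd.DataFrame, category_datasets: list[str]) -> pd.Series:
    """Averaged rank (AR) of each model over the datasets of one hydrograph type."""
    return ranks[category_datasets].mean(axis=1)

# Illustrative usage with made-up N-RMSE values for the two SSP datasets.
nrmse = pd.DataFrame(
    {"DS1": [0.012, 0.008, 0.009], "DS2": [0.011, 0.007, 0.010]},
    index=["M1", "M9", "M10"],
)
nrmse_ranks = rank_scores(nrmse, higher_is_better=False)
print(averaged_rank(nrmse_ranks, ["DS1", "DS2"]))  # mean rank per model over SSP
```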
Considering Table A1 and Table 3, a comparison between the error metrics alone could be insufficient to form a solid judgment. Therefore, a more comprehensive assessment was needed to compare the capabilities of the models. To apply the MCDA methodologies outlined previously, the decision matrix for each hydrograph type was set up as listed in Table 4. The results presented in Table 5 cover the rankings obtained from the MCDA tools, considering four hydrograph types under three scenarios. To this end, the average rank values for and N-RMSE were employed as decisive criteria for each dataset. Since a lower assigned rank value implies better model performance, these two metrics were considered non-beneficiary (minimized) criteria. The number of parameters (NP) of a model was also considered as a decisive criterion. As listed in Table 1, the NP indicates the complexity of a model, and a model containing fewer parameters may be more desirable than one with a larger number of parameters. Thus, NP was likewise a non-beneficiary (minimized) criterion. Finally, the dimensional consistency of a model was labeled as the model background (MB) criterion. A model constructed with physically based parameters was deemed more realistic than a model that employs additional parameters solely to improve the fitting accuracy. For this criterion, a binary value of 1 was assigned to physically sound models, while models containing non-physical parameters used solely for fitting purposes were assigned a value of 0, as shown in Table 4. Therefore, MB was assigned as a beneficiary (maximized) criterion.
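A decision matrix of this form can be assembled as in the sketch below. The criterion directions (minimizing the two averaged ranks and NP, maximizing MB) follow the description above, whereas the column names and numerical entries are placeholders rather than the values of Table 4; `AR_err1` stands for the averaged rank of the first error metric.

```python
import pandas as pd

# Criterion direction: False = non-beneficiary (minimized), True = beneficiary (maximized).
CRITERIA = {"AR_err1": False, "AR_NRMSE": False, "NP": False, "MB": True}

# Illustrative decision matrix for one hydrograph type (entries are placeholders).
decision_matrix = pd.DataFrame(
    {
        "AR_err1":  [7.0, 2.0, 1.5],  # averaged rank of the first error metric
        "AR_NRMSE": [6.5, 2.5, 1.5],  # averaged rank of N-RMSE
        "NP":       [2,   4,   6],    # number of model parameters
        "MB":       [1,   1,   0],    # 1 = physically based model, 0 = fitting parameters only
    },
    index=["M1", "M9", "M10"],
)
```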
3.2. Multi-Criteria Decision Analysis (MCDA) Results
To apply MCDA tools effectively, it is essential to identify the weight of each criterion. To achieve this, three distinct weight scenarios were designed. In scenario 1, the weights assigned to the , N-RMSE, NP, and MB criteria were 0.275, 0.275, 0.225, and 0.225, respectively. This scenario implies that the total weight of the error metric criteria (55%) is slightly higher than that of the remaining criteria (45%). Therefore, the results from scenario 1 were expected to provide a balanced ranking of the employed models, taking all of the implemented criteria into account. In scenario 2, weights of 0.3, 0.3, 0.2, and 0.2 were assigned to the , N-RMSE, NP, and MB criteria, respectively. The objective of scenario 2 was to observe the impact of increasing the weights of the error metrics on the ranking. Finally, scenario 3 was designed to assign even more weight to the error metric criteria (0.35 each), while the remaining NP and MB criteria were given weights of 0.15 each. Taken together, these scenarios provide a better understanding of how different weight distributions across the criteria may affect the ranking of the implemented models.
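A minimal TOPSIS sketch operating on such a decision matrix is given below. It follows the standard formulation (vector normalization, weighted distances to the ideal and anti-ideal solutions, ranking by closeness coefficient) and encodes the three weight scenarios described above; the function is a generic illustration, not the exact implementation used in this study.

```python
import numpy as np
import pandas as pd

SCENARIOS = {  # weights for (first error metric, N-RMSE, NP, MB)
    1: [0.275, 0.275, 0.225, 0.225],
    2: [0.300, 0.300, 0.200, 0.200],
    3: [0.350, 0.350, 0.150, 0.150],
}

def topsis(matrix: pd.DataFrame, weights, beneficiary) -> pd.Series:
    """Closeness coefficients (higher = better), sorted from best to worst alternative."""
    x = matrix.to_numpy(dtype=float)
    w = np.asarray(weights, dtype=float)
    benef = np.asarray(beneficiary, dtype=bool)
    v = w * x / np.linalg.norm(x, axis=0)          # vector-normalized, weighted matrix
    ideal = np.where(benef, v.max(axis=0), v.min(axis=0))
    anti = np.where(benef, v.min(axis=0), v.max(axis=0))
    d_plus = np.linalg.norm(v - ideal, axis=1)     # distance to the ideal solution
    d_minus = np.linalg.norm(v - anti, axis=1)     # distance to the anti-ideal solution
    closeness = d_minus / (d_plus + d_minus)
    return pd.Series(closeness, index=matrix.index).sort_values(ascending=False)

# Example (using the illustrative matrix above): rank the models under scenario 1.
# topsis(decision_matrix, SCENARIOS[1], beneficiary=[False, False, False, True])
```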
Both TOPSIS and PROMETHEE identified model M8 (a non-linear model with five parameters proposed by [10]) as the most suitable model for the SSP hydrograph type in all scenarios. For the NSP hydrograph type, model M5 (a linear model with three physically based parameters) was found by both tools to be the best model according to the aforementioned criteria in scenario 1. However, in scenario 3, where the error performance was dominant, model M10 (a non-linear model with six parameters) was identified as the best alternative in the comparison pool. It should be noted that model M10 was identified as the best-performing model based on the and N-RMSE criteria, as shown in Table 4. However, the ranking order may shift under the influence of the NP and MB criteria, which can move model M5 from a lower position to the top, as observed in scenario 1. This underscores the importance of using a multi-criteria decision analysis approach to ensure a fair and comprehensive evaluation of the capabilities of the examined models. For the MP and irregular hydrograph types, both MCDA tools consistently identified model M5 and its non-linear variant M9 (with four parameters) as the best and second-best models, respectively, across all scenarios. However, the highly competitive model M10 showed superior performance based on its and N-RMSE values, as listed in Table 4 for the MP and irregular types. Notably, the ranking of the models was observed to be sensitive to the weight scenarios utilized in this study, as demonstrated by the varying influence of the NP and MB criteria. As observed from Table 5, M1, the classical linear model with two parameters, showed notable performance when model simplicity and physical background were favored, as intended in scenario 1. Both MCDA tools placed it fourth for the SSP type and third for the MP and irregular types; however, for the NSP-type datasets, TOPSIS placed it fourth, whereas PROMETHEE placed it sixth. All in all, model M1 is still a viable model if complexity is taken into account as a decisive criterion by practitioners. Furthermore, the data characteristics may affect the efficiency of the models: model M8 was found to be the best model for the SSP type, whereas its capability varied dramatically for the MP and irregular hydrograph types under the implemented scenarios. As an additional evaluation, a new decision matrix was created using the overall average AR scores of the eight flood events (shown in the last column of Table 3) to eliminate any potential impact of hydrograph type, as given in Table 6. Once again, model M5 and its non-linear variant M9 were found to be the superior models for scenarios 1 and 2, as indicated in the decision matrix presented in Table 3. In scenario 3, where the error metrics were dominant, TOPSIS ranked model M9 third, while PROMETHEE placed it second.
As mentioned earlier, Table 5 presents the results for the four hydrograph types, while Table 6 displays the outcomes based on the average AR values of the eight distinct flood events. To facilitate the interpretation of the results for each model, the ranking places can be categorized into five main groups: first or second place (denoted as 1||2) in group 1 (G1), third or fourth place (3||4) in group 2 (G2), fifth or sixth place (5||6) in group 3 (G3), seventh or eighth place (7||8) in group 4 (G4), and ninth or tenth place (9||10) in group 5 (G5), as illustrated in Figure 4. In other words, each MCDA tool provides 15 possible cases for each model (four hydrograph types plus one aggregated evaluation over all datasets, each under three scenarios). For instance, in the case of SSP using TOPSIS, model M1 was ranked fourth in scenario 1, sixth in scenario 2, and ninth in scenario 3. This procedure was carried out for each model and each implemented case; the ranking places were recorded for each model and subsequently categorized as explained above. For example, the performance of model M1 was categorized as 0% in G1, 46.6% (7 occurrences out of 15) in G2, 20% (3 out of 15) in G3, 26.6% (4 out of 15) in G4, and 6.6% (1 out of 15) in G5, as shown in Figure 4. As also seen in Figure 4, model M5 produced a superior performance, finishing within the first and second ranking places (G1) in 73.3% of the cases according to TOPSIS and 66.6% according to PROMETHEE. Similarly, model M9 can be regarded as a close competitor given its G1 scores of 66.6% by TOPSIS and 80% by PROMETHEE. Apart from these models, model M10 achieved the highest success rate for the third and fourth places (G2), with 66.6% by TOPSIS and 80% by PROMETHEE. In the G2 category, model M1 also showed notable performance, with equal success scores of 46.6% by TOPSIS and PROMETHEE. For an alternative interpretation of Figure 4, the success rate for each ranking place category is shown cumulatively in Figure 5 in order to identify which models stand out. As a result of the analyses conducted with both MCDA tools, models M5, M9, and M10 emerge as the top performers among the flood routing models in the comparison pool, as illustrated in Figure 5.
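The grouping of ranking places into G1 to G5 and the per-group percentages (e.g., 7 occurrences out of 15 cases giving 46.6%) amount to a simple counting exercise, sketched below; the example list of ranking places is hypothetical, since the actual places come from Table 5 and Table 6.

```python
from collections import Counter

# Map each ranking place (1-10) to its group G1..G5.
GROUPS = {1: "G1", 2: "G1", 3: "G2", 4: "G2", 5: "G3",
          6: "G3", 7: "G4", 8: "G4", 9: "G5", 10: "G5"}

def group_shares(ranking_places):
    """Percentage of the cases falling into each rank group G1..G5."""
    counts = Counter(GROUPS[r] for r in ranking_places)
    n = len(ranking_places)
    return {g: 100.0 * counts.get(g, 0) / n for g in ("G1", "G2", "G3", "G4", "G5")}

# Hypothetical example: 15 ranking places of one model (5 cases x 3 scenarios).
example_places = [4, 6, 9, 3, 4, 4, 3, 3, 4, 5, 6, 7, 7, 8, 8]
print(group_shares(example_places))  # shares per group, summing to 100%
```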
3.3. Comparison of MCDA Tools
In this study, two MCDA tools, TOPSIS and PROMETHEE, were employed to evaluate the performance of the different Muskingum models under the aforementioned criteria. TOPSIS and PROMETHEE are widely used in decision-making processes due to their ability to handle complex decision problems [51]. In general, any MCDA method can produce a different ranking for the same problem, depending on the strategy and mathematical background of the method implemented. TOPSIS is sensitive to the weights assigned to the criteria and ignores the interrelationships between the criteria [52]. PROMETHEE, on the other hand, relies on pairwise comparisons of the alternatives to assess their relative preference on each criterion [53]. However, it is sensitive to the choice of the preference function and the decision thresholds used [54]. Therefore, selecting the appropriate MCDA method depends on the specific problem being addressed and the characteristics of the data being evaluated [55,56,57].
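For completeness, a compact PROMETHEE II sketch is given below. It uses the "usual" preference function (any difference in the preferred direction counts as full preference) and no indifference or preference thresholds, whereas the study may employ other preference functions and thresholds, to which, as noted above, the results are sensitive.

```python
import numpy as np
import pandas as pd

def promethee_ii(matrix: pd.DataFrame, weights, beneficiary) -> pd.Series:
    """Net outranking flows (higher = better) with the 'usual' preference function."""
    x = matrix.to_numpy(dtype=float)
    w = np.asarray(weights, dtype=float)
    benef = np.asarray(beneficiary, dtype=bool)
    x = np.where(benef, x, -x)            # orient all criteria so that larger is preferred
    n = x.shape[0]
    pi = np.zeros((n, n))                 # aggregated preference of alternative a over b
    for a in range(n):
        for b in range(n):
            if a != b:
                pref = (x[a] > x[b]).astype(float)   # usual (step) preference function
                pi[a, b] = np.dot(w, pref)
    phi_plus = pi.sum(axis=1) / (n - 1)   # positive (leaving) flow
    phi_minus = pi.sum(axis=0) / (n - 1)  # negative (entering) flow
    return pd.Series(phi_plus - phi_minus, index=matrix.index).sort_values(ascending=False)

# Example (same inputs as the TOPSIS sketch): rank the models under scenario 1.
# promethee_ii(decision_matrix, SCENARIOS[1], beneficiary=[False, False, False, True])
```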
To analyze the differences in the rankings obtained from TOPSIS and PROMETHEE, a consistency rate (CR) was defined as the percentage of models that receive the same ranking place from both MCDA tools in a given case. As observed from Table 5, both MCDA tools ranked models M8, M1, M10, M2, and M3 in the 1st, 4th, 5th, 9th, and 10th places, respectively, in scenario 1. Thus, the CR of both tools for the SSP type was 50% (5 out of 10) in scenario 1, 100% in scenario 2, and 70% (7 out of 10) in scenario 3, yielding an average of 73.3% overall. For the NSP type, the CR was 40%, 30%, and 50% for scenarios 1, 2, and 3, respectively. The CR is 70% in scenario 1, 60% in scenario 2, and 60% in scenario 3 for the MP type, whereas the CR values of the irregular type were 80%, 50%, and 30% in scenarios 1, 2, and 3, respectively. This analysis is denoted CR(0) and depicted in Figure 6. The plus (+) values in Figure 6 indicate that TOPSIS ranks the model higher than PROMETHEE, while the minus (−) values indicate the opposite. Furthermore, the CR was recomputed such that a rank difference of one place between the MCDA tools was tolerated; this variant is denoted CR(1) and shown in gray in Figure 6. Considering the CR(1) values, both MCDA tools are in high accordance for almost all of the cases implemented in this study. Thus, this analysis highlights the effectiveness of these MCDA tools in providing reliable model rankings.
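The consistency rate reduces to a short counting function, sketched below: CR(t) is computed as the percentage of models whose TOPSIS and PROMETHEE ranking places differ by at most t places, so CR(0) requires identical places and CR(1) tolerates a one-place difference. The example ranking places are partly taken from the SSP scenario 1 case quoted above and partly hypothetical.

```python
def consistency_rate(topsis_ranks: dict, promethee_ranks: dict, tolerance: int = 0) -> float:
    """CR(tolerance): % of models whose ranking places differ by at most `tolerance`."""
    agreed = sum(abs(topsis_ranks[m] - promethee_ranks[m]) <= tolerance
                 for m in topsis_ranks)
    return 100.0 * agreed / len(topsis_ranks)

# Ranking places for one case; M8, M1, M10, M2, and M3 follow the SSP scenario 1 example,
# while the M5 entries are hypothetical placeholders.
topsis_places    = {"M8": 1, "M1": 4, "M10": 5, "M2": 9, "M3": 10, "M5": 2}
promethee_places = {"M8": 1, "M1": 4, "M10": 5, "M2": 9, "M3": 10, "M5": 3}
print(consistency_rate(topsis_places, promethee_places, tolerance=0))  # CR(0)
print(consistency_rate(topsis_places, promethee_places, tolerance=1))  # CR(1)
```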