Combining Multiple Machine Learning Methods Based on CARS Algorithm to Implement Runoff Simulation
Abstract
1. Introduction
2. Methodology
2.1. Runoff Forecasting Models
2.2. Feature Selection Algorithm Based on CARS
2.3. Machine Learning Models
2.3.1. Gradient Boosting Decision Tree (GBDT) Model
2.3.2. Neural Network Models
2.3.3. Shallow Machine Learning Models
3. Study Area and Data Description
3.1. Study Area
3.2. Data Description
4. Data-Driven Runoff Forecasting Model Based on CARS
4.1. Identification of Model Order
4.2. Model Parameter Settings
4.3. Performance Evaluation Metrics
5. Results and Analysis
5.1. Overall Forecasting Performance Evaluation
5.2. Effectiveness of Flood Forecasting for Different Events
5.3. Percentage Analysis of Duration Curves for Low, Medium, and High Flows
5.4. Analysis of Model Generalization Ability
6. Conclusions
- (1) Machine learning runoff models based on the CARS feature selection algorithm demonstrate good applicability in flood forecasting. For a lead time of T = 1 h, the NSE of the seven models ranges from 0.9205 to 0.9965, the R² from 0.8853 to 0.9741, and the RMSE from 6.72 m³/s to 32.01 m³/s. Forecast accuracy decreases to some extent with increasing lead time. Overall, the CARS-based machine learning runoff models achieve satisfactory simulation results, with the SVR model showing relatively high forecasting accuracy;
- (2) Across the studied flood events, the seven models often exhibit fluctuations in the pre-peak stage, with the rise beginning earlier than in the observed flood, but they forecast the timing of the flood peak and the recession stage well. The SVR model predicts flows closest to the observations, performs well in predicting peak flow, and exhibits the most stable overall forecasting performance, indicating good flood forecasting capability. As the lead time increases, the NSE and R² of the seven models decrease to varying extents, reducing forecasting accuracy; the predicted peak flow also tends to decrease with longer lead times, so the models sometimes underestimate the peak;
- (3) Considering the percentage indicators of the flow duration curve (%BiasFHV, %BiasFMS, and %BiasFLV), the SVR model's simulated runoff overestimates high-flow segments, underestimates medium-to-high-flow segments, and is less accurate in low-flow segments than in medium-to-high-flow segments. Overall, it exhibits the smallest deviations among the models and performs best;
- (4) The NRMSE values of the seven models on the testing dataset are in most cases higher than those on the training dataset but lower than those on the validation dataset, indicating good generalization ability. As the lead time increases, the NRMSE values on the training, validation, and testing datasets gradually increase, suggesting a gradual decline in generalization ability.
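The evaluation metrics referenced in these conclusions (NSE, R², RMSE, and NRMSE) follow their standard definitions. As an illustrative sketch only (the paper's own implementation is not shown in this excerpt), they can be computed in plain Python, where `obs` and `sim` are observed and simulated flow series:

```python
from math import sqrt

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    svar = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / svar

def rmse(obs, sim):
    """Root mean squared error, in the units of the flow series (m^3/s)."""
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nrmse(obs, sim):
    """RMSE normalized by the mean observed flow (assumed normalization)."""
    return rmse(obs, sim) / (sum(obs) / len(obs))

def r2(obs, sim):
    """Squared Pearson correlation coefficient between obs and sim."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (vo * vs)
```

Note that the normalization used for NRMSE is an assumption here; dividing by the observed range or standard deviation is also common.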
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Model Name | Hyperparameter Name | Range | Optimal Value |
|---|---|---|---|
| CatBoost | Iterations | [50, 300] | 184 |
| | Learning rate | [0, 1] | 0.15 |
| | Depth | [2, 15] | 8 |
| XGBoost | Iterations | [50, 300] | 280 |
| | Learning rate | [0, 1] | 0.15 |
| | Depth | [2, 15] | 10 |
| LightGBM | Iterations | [50, 300] | 190 |
| | Learning rate | [0, 1] | 0.01 |
| | Depth | [2, 15] | 10 |
| BP neural network | Number of neurons in hidden layer 1 | [50, 300] | 96 |
| | Number of neurons in hidden layer 2 | [50, 300] | 50 |
| | Learning rate | [0, 1] | 0.0001 |
| | Epochs | [50, 300] | 126 |
| LSTM neural network | Number of neurons in unit 1 | [50, 300] | 32 |
| | Number of neurons in unit 2 | [50, 300] | 59 |
| | Learning rate | [0, 1] | 0.006 |
| | Epochs | [50, 300] | 254 |
| SVR | C | [0.1, 100] | 10 |
| | kernel | [‘linear’, ‘rbf’, ‘poly’] | rbf |
| | epsilon | [0.01, 0.2] | 0.01 |
| KNN | K | [1, 9] | 6 |
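The optimal values in the table were obtained with the Bayesian optimization approach cited in the methodology. Purely as an illustration of searching such a space, the sketch below runs a simple random search (a deliberately simpler stand-in for Bayesian optimization) over the SVR ranges; `toy_objective` is a hypothetical placeholder for training the model and scoring it on validation data:

```python
import random

# Search space taken from the SVR row of the hyperparameter table above.
SPACE = {
    "C": (0.1, 100.0),                    # continuous range
    "epsilon": (0.01, 0.2),               # continuous range
    "kernel": ["linear", "rbf", "poly"],  # categorical choices
}

def sample(rng):
    """Draw one candidate SVR configuration from the space."""
    return {
        "C": rng.uniform(*SPACE["C"]),
        "epsilon": rng.uniform(*SPACE["epsilon"]),
        "kernel": rng.choice(SPACE["kernel"]),
    }

def toy_objective(cfg):
    """Hypothetical stand-in for validation error of a trained SVR;
    in the real workflow this would train and evaluate the model."""
    return abs(cfg["C"] - 10.0) / 100.0 + cfg["epsilon"]

def random_search(n_trials=200, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    return min((sample(rng) for _ in range(n_trials)), key=toy_objective)

best = random_search()
```

Bayesian optimization improves on this loop by fitting a surrogate model to past trials and proposing promising configurations rather than sampling uniformly.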
| Forecast Period | Model Name | NSE | R² | RMSE (m³/s) | %BiasFHV | %BiasFMS | %BiasFLV |
|---|---|---|---|---|---|---|---|
| T = 1 h | CatBoost | 0.9703 | 0.9391 | 19.55 | −0.75% | −2.36% | −44.75% |
| | XGBoost | 0.9796 | 0.9741 | 16.19 | −0.07% | 1.05% | −44.38% |
| | LightGBM | 0.9661 | 0.9415 | 20.89 | −1.11% | −0.81% | −48.00% |
| | LSTM | 0.9205 | 0.8853 | 32.01 | −18.63% | −25.80% | −43.43% |
| | BP | 0.9689 | 0.9624 | 19.99 | −7.33% | −8.82% | −85.77% |
| | KNN | 0.9680 | 0.9139 | 20.28 | −2.51% | −1.32% | 0.91% |
| | SVR | 0.9965 | 0.9730 | 6.72 | −1.58% | 2.31% | −7.25% |
| T = 2 h | CatBoost | 0.9311 | 0.9092 | 29.77 | −0.75% | −2.55% | −53.05% |
| | XGBoost | 0.9361 | 0.9374 | 28.66 | 0.16% | 1.54% | −52.73% |
| | LightGBM | 0.9407 | 0.8969 | 27.61 | −2.07% | 0.69% | −54.48% |
| | LSTM | 0.9140 | 0.8802 | 33.27 | −14.17% | −24.35% | −59.03% |
| | BP | 0.9466 | 0.9422 | 26.22 | −6.78% | −8.97% | −65.09% |
| | KNN | 0.9419 | 0.8879 | 27.33 | −2.91% | −2.31% | 0.17% |
| | SVR | 0.9874 | 0.9212 | 12.72 | −3.01% | 17.35% | 45.23% |
| T = 3 h | CatBoost | 0.8789 | 0.8921 | 39.47 | −0.75% | −2.30% | −55.05% |
| | XGBoost | 0.8832 | 0.9033 | 38.75 | −0.27% | 0.49% | −54.69% |
| | LightGBM | 0.9011 | 0.8725 | 35.66 | −2.91% | −1.45% | −65.27% |
| | LSTM | 0.8941 | 0.8780 | 36.93 | −11.86% | −17.70% | −68.20% |
| | BP | 0.9019 | 0.9177 | 35.53 | −6.82% | −7.44% | −58.66% |
| | KNN | 0.8996 | 0.8641 | 35.93 | −3.14% | −3.07% | −0.53% |
| | SVR | 0.9707 | 0.8826 | 19.44 | −4.13% | 11.00% | 24.25% |
| T = 4 h | CatBoost | 0.8185 | 0.8836 | 48.32 | −0.75% | −3.53% | −44.77% |
| | XGBoost | 0.8162 | 0.8646 | 48.63 | −0.17% | 1.27% | −44.19% |
| | LightGBM | 0.8507 | 0.8588 | 43.83 | −3.59% | −0.09% | −65.57% |
| | LSTM | 0.8597 | 0.8715 | 42.55 | −9.85% | −6.85% | −83.26% |
| | BP | 0.8659 | 0.8954 | 41.55 | −8.13% | −9.90% | −48.85% |
| | KNN | 0.8469 | 0.8456 | 44.38 | −3.47% | −3.64% | −1.09% |
| | SVR | 0.9474 | 0.8486 | 26.04 | −7.61% | 9.28% | 32.45% |
| T = 5 h | CatBoost | 0.7563 | 0.8766 | 55.99 | −0.79% | −3.79% | −61.98% |
| | XGBoost | 0.7522 | 0.8369 | 56.45 | −0.44% | 0.51% | −61.01% |
| | LightGBM | 0.7954 | 0.8467 | 51.30 | −4.06% | −3.66% | −65.08% |
| | LSTM | 0.8209 | 0.8586 | 48.13 | −6.92% | 14.62% | 13.90% |
| | BP | 0.8221 | 0.8809 | 47.86 | −7.81% | −9.38% | −65.79% |
| | KNN | 0.7881 | 0.8249 | 52.22 | −3.49% | −2.49% | −1.32% |
| | SVR | 0.9004 | 0.8411 | 35.82 | −8.92% | 11.78% | 19.15% |
| T = 6 h | CatBoost | 0.6912 | 0.8718 | 63.03 | −0.77% | −3.17% | −56.11% |
| | XGBoost | 0.6874 | 0.8319 | 63.41 | −0.65% | 0.68% | −54.42% |
| | LightGBM | 0.7373 | 0.8293 | 58.13 | −3.85% | −1.38% | −60.46% |
| | LSTM | 0.7983 | 0.8471 | 51.04 | −7.38% | 4.83% | −28.25% |
| | BP | 0.7654 | 0.8645 | 54.94 | −8.07% | −9.59% | −64.61% |
| | KNN | 0.7274 | 0.8019 | 59.22 | −3.54% | −2.69% | −1.31% |
| | SVR | 0.8496 | 0.8315 | 44.01 | −10.92% | 12.81% | 21.85% |
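The flow-duration-curve bias metrics in the table follow the diagnostic framework of Yilmaz et al. (2008). As a simplified sketch of the high-flow indicator only (%BiasFMS and %BiasFLV are defined analogously over the mid-segment slope and the low-flow segment), %BiasFHV compares the largest few percent of sorted flows on the observed and simulated curves:

```python
def bias_fhv(obs, sim, frac=0.02):
    """Percent bias of the flow-duration-curve high-flow segment:
    compare the largest `frac` of flows on each sorted curve.
    Simplified sketch; the published definition sorts each series
    by exceedance probability in the same way."""
    h = max(1, int(round(frac * len(obs))))
    obs_hi = sorted(obs, reverse=True)[:h]
    sim_hi = sorted(sim, reverse=True)[:h]
    return 100.0 * (sum(sim_hi) - sum(obs_hi)) / sum(obs_hi)
```

A positive value indicates overestimated high flows; the mostly negative %BiasFHV values in the table indicate slight underestimation of the high-flow segment.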
| Flood Event Number | Model Name | 1 h | 3 h | 6 h |
|---|---|---|---|---|
| 20090828 | SVR | 0.30% | 1.96% | 2.65% |
| | KNN | −4.60% | −2.14% | 7.98% |
| | BP | 0.57% | 3.26% | 5.23% |
| | LightGBM | −4.21% | −1.12% | 7.64% |
| | LSTM | 0.70% | 2.67% | 16.43% |
| | XGBoost | −1.87% | −1.97% | 4.76% |
| | CatBoost | −0.20% | 0.30% | 10.50% |
| 20100820 | SVR | −0.10% | 1.73% | 3.08% |
| | KNN | 0.11% | 2.52% | 8.23% |
| | BP | −1.75% | 2.65% | 3.21% |
| | LightGBM | −2.32% | 0.50% | 3.36% |
| | LSTM | −1.35% | 3.60% | 23.36% |
| | XGBoost | −0.90% | 3.25% | 4.90% |
| | CatBoost | −3.30% | −3.05% | 11.50% |
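The percentages above are relative errors of the simulated flood peak at each lead time (assumed definition; the excerpt does not state the formula). A minimal sketch:

```python
def peak_relative_error(obs, sim):
    """Relative error of the simulated flood peak, in percent.
    Negative values mean the peak flow is underestimated."""
    peak_obs = max(obs)
    peak_sim = max(sim)
    return 100.0 * (peak_sim - peak_obs) / peak_obs
```

Under this reading, the growing positive errors at 6 h are consistent with the conclusion that peak forecasts degrade as the lead time increases.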
| Forecast Period | Model Name | Training NRMSE | Validation NRMSE | Testing NRMSE |
|---|---|---|---|---|
| T = 1 h | CatBoost | 0.2341 | 0.4302 | 0.4021 |
| | XGBoost | 0.2366 | 0.2555 | 0.2099 |
| | LightGBM | 0.3009 | 0.2862 | 0.3153 |
| | LSTM | 0.4323 | 0.6051 | 0.5346 |
| | BP | 0.2699 | 0.4682 | 0.2746 |
| | KNN | 0.1983 | 0.3595 | 0.5575 |
| | SVR | 0.0963 | 0.1182 | 0.0908 |
| T = 2 h | CatBoost | 0.4084 | 0.5407 | 0.4819 |
| | XGBoost | 0.4097 | 0.5568 | 0.3567 |
| | LightGBM | 0.3904 | 0.4405 | 0.4245 |
| | LSTM | 0.4385 | 0.7263 | 0.5439 |
| | BP | 0.3597 | 0.5863 | 0.3523 |
| | KNN | 0.3213 | 0.5329 | 0.6115 |
| | SVR | 0.1816 | 0.2340 | 0.1689 |
| T = 3 h | CatBoost | 0.5589 | 0.7006 | 0.5647 |
| | XGBoost | 0.5598 | 0.6944 | 0.4878 |
| | LightGBM | 0.4942 | 0.6297 | 0.5639 |
| | LSTM | 0.4827 | 0.8591 | 0.5859 |
| | BP | 0.4887 | 0.8054 | 0.4612 |
| | KNN | 0.4554 | 0.7313 | 0.6878 |
| | SVR | 0.2733 | 0.3907 | 0.2584 |
| T = 4 h | CatBoost | 0.6908 | 0.8687 | 0.6487 |
| | XGBoost | 0.6922 | 0.8745 | 0.6687 |
| | LightGBM | 0.5990 | 0.8277 | 0.7015 |
| | LSTM | 0.5552 | 1.0233 | 0.6548 |
| | BP | 0.5767 | 0.9180 | 0.5296 |
| | KNN | 0.5790 | 0.9202 | 0.7797 |
| | SVR | 0.3651 | 0.5269 | 0.3487 |
| T = 5 h | CatBoost | 0.8029 | 1.0217 | 0.7268 |
| | XGBoost | 0.8045 | 0.9835 | 0.7907 |
| | LightGBM | 0.6978 | 0.9894 | 0.8246 |
| | LSTM | 0.6286 | 1.1786 | 0.7220 |
| | BP | 0.6655 | 1.0571 | 0.6036 |
| | KNN | 0.6865 | 1.1190 | 0.8743 |
| | SVR | 0.4993 | 0.7531 | 0.4748 |
| T = 6 h | CatBoost | 0.9025 | 1.1709 | 0.8119 |
| | XGBoost | 0.9048 | 1.0853 | 0.8936 |
| | LightGBM | 0.7976 | 1.1246 | 0.9009 |
| | LSTM | 0.6660 | 1.2568 | 0.7629 |
| | BP | 0.7662 | 1.2017 | 0.6904 |
| | KNN | 0.7811 | 1.2928 | 0.9665 |
| | SVR | 0.6132 | 0.9273 | 0.5832 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fan, Y.; Fu, X.; Kan, G.; Liang, K.; Yu, H. Combining Multiple Machine Learning Methods Based on CARS Algorithm to Implement Runoff Simulation. Water 2024, 16, 2397. https://doi.org/10.3390/w16172397