Incorporating Recursive Feature Elimination and Decomposed Ensemble Modeling for Monthly Runoff Prediction
Abstract
1. Introduction
2. Study Area and Data
3. Research Methods
3.1. Decomposed Ensemble Model for Streamflow Forecasting
3.2. Model Evaluation Criteria
3.3. Singular Spectrum Analysis (SSA) for Data Decomposition
3.4. LSTM for Time Series Forecasting
4. Case Study
4.1. Experimental Setup
4.2. PACF for Optimal Lag Selection
4.3. Data Normalization
4.4. Bayesian Optimization for Hyperparameter Tuning
5. Results Analysis
5.1. Data Decomposition
5.2. Determining Input Variables Using PACF
5.3. Mutual Information Method for Predictor Screening
5.4. Predictor Selection via Recursive Feature Elimination and Cross Validation
5.5. Comparison of Model Performance Across Prediction Schemes
5.6. Comparison of Direct and RFECV Prediction Schemes
6. Discussion
7. Conclusions
- The decomposition levels of SSA and VMD can be determined by checking for central-frequency aliasing in the last subcomponent. Choosing the level this way prevents frequency overlap across subcomponents, minimizes their intercorrelation, and avoids spurious or redundant components.
- With the db45-3 decomposition at the Hanzhong station, RFECV–LSTM achieves higher NSE values and lower NRMSE and PPTS values than the direct LSTM model, and this is the most pronounced performance gap across all cases. This demonstrates that recursively pruning weakly correlated predictors can further enhance predictive accuracy: when predictor dimensionality is high and correlation with the forecast target is weak, RFECV–LSTM outperforms its direct LSTM counterpart. MIR–LSTM performs much worse than the other two schemes because it directly removes every predictor below the average mutual information value, which discards some valuable predictive information.
- Although VMD and DWT yield lower intercorrelation among subcomponents than SSA, their hardest-to-predict subcomponents show higher noise levels in the frequency spectrum than those of SSA. Thus, SSA–LSTM achieves the best predictive performance.
- The proposed RFECV–SSA–LSTM model achieves NSE values greater than 0.9 at all lead times (1, 3, 5, and 7 months), outperforming the RFECV–LSTM, MIR–LSTM, and direct LSTM forecasting models based on different decomposition methods. Thus, RFECV–SSA–LSTM is a mature, reliable, and effective streamflow forecasting scheme.
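The conclusions above rank the forecasting schemes by NSE and NRMSE. As a reading aid, the following minimal Python sketch shows how these two criteria are commonly computed. It is not the authors' code: the function names and sample values are illustrative, and normalizing the RMSE by the observed range is an assumption, since NRMSE is also often normalized by the mean or the standard deviation. PPTS is omitted because its definition is not given in this section.

```python
from math import sqrt


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared forecast errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - residual / variance


def nrmse(observed, simulated):
    """RMSE divided by the observed range (assumed normalization, see lead-in)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    rmse = sqrt(
        sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    )
    spread = max(observed) - min(observed)
    if spread == 0:
        raise ValueError("range-normalized RMSE is undefined for a constant series")
    return rmse / spread


# Hypothetical monthly runoff values, used only to exercise the functions.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 39.0]
print("NSE:", nse(observed, simulated))
print("NRMSE:", nrmse(observed, simulated))
```

An NSE of 1 indicates a perfect match, while 0 means the forecast is no better than predicting the observed mean. This is the scale on which the reported NSE values above 0.9 should be read.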
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Decomposed IMFs | Number of Inputs | Input Variables |
---|---|---|
IMF1 | 7 | |
IMF2 | 4 | |
IMF3 | 5 | |
IMF4 | 6 | |
IMF5 | 5 | |
IMF6 | 4 | |
IMF7 | 5 | |
IMF8 | 5 | |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, W.; Zhang, X.; Shen, Y.; Xie, J.; Zuo, G.; Zhang, X.; Jin, T. Incorporating Recursive Feature Elimination and Decomposed Ensemble Modeling for Monthly Runoff Prediction. Water 2024, 16, 3102. https://doi.org/10.3390/w16213102