Article
Peer-Review Record

Splitting and Length of Years for Improving Tree-Based Models to Predict Reference Crop Evapotranspiration in the Humid Regions of China

Water 2021, 13(23), 3478; https://doi.org/10.3390/w13233478
by Xiaoqiang Liu 1, Lifeng Wu 2,3,*, Fucang Zhang 1,*, Guomin Huang 2, Fulai Yan 1 and Wenqiang Bai 1
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 1 November 2021 / Revised: 23 November 2021 / Accepted: 24 November 2021 / Published: 6 December 2021
(This article belongs to the Section Hydrology)

Round 1

Reviewer 1 Report

The paper needs major modifications before it is processed:

1- The abstract should report some quantitative performance metrics for RF and XGB. The sentences "However, model performance with the 10-year data was worst among different time spans when considering the independent test, while the time span of 30 years had high accuracy and stable performance. Therefore, to improve the daily ET0 predicting performance of the models in humid regions of China, the random forest model with data sets of 30-year and the 9:1 data splitting strategy is recommended." need modification.

2-Adding some applications of the random forest model to the introduction can improve the literature review:

-Flood Risk Mapping by Remote Sensing Data and Random Forest Technique

-A Novel Multiple-Kernel Support Vector Regression Algorithm for Estimation of Water Quality Parameters

3-Why were the RF and XGB techniques used rather than other soft computing models?

4-What are the criteria for selecting the meteorological variables? This section needs major clarification. In another study, "Meteorological variables including maximum temperature (Tmax), minimum temperature (Tmin), average temperature (Tmean), relative humidity (RH), 24-h rainfall (P24) and wind speed (U2) were used." The authors can refer to this work:

-Evaluation of drought events in various climatic conditions using data-driven models and a reliability-based probabilistic model

5-In the "statistical performance analysis", the authors are recommended to use the Scatter Index (SI) and BIAS instead of NSE, because NSE and R2 come from the same origin. The SI and BIAS formulations can be found in the following paper:

More reliable predictions of clear-water scour depth at pile groups by robust artificial intelligence techniques while preserving physical consistency

6-The equation given for R^2 is in fact that of R. Please take the square root of Eq. (11) and convert it to R. The meaning of R^2 (CoD: coefficient of determination) is quite different from that of R. The R^2 in the manuscript is not the coefficient of determination; the authors simply squared R.

7-The reliability and uncertainty of the results need to be added to the results:

Receiving More Accurate Predictions for Longitudinal Dispersion Coefficients in Water Pipelines: Training Group Method of Data Handling Using Extreme Learning Machine Conceptions 

Author Response

Reviewer #1: Summary (Manuscript ID: water-1467728)

 

The paper needs major modifications before it is processed:

 

1- The abstract should report some quantitative performance metrics for RF and XGB. The sentences "However, model performance with the 10-year data was worst among different time spans when considering the independent test, while the time span of 30 years had high accuracy and stable performance. Therefore, to improve the daily ET0 predicting performance of the models in humid regions of China, the random forest model with data sets of 30-year and the 9:1 data splitting strategy is recommended." need modification.

 

Response: Thanks for pointing this out. We have added some quantitative performance metrics for RF and XGB, and the sentences "However, model performance with the 10-year data was worst among different time spans when considering the independent test, while the time span of 30 years had high accuracy and stable performance. Therefore, to improve the daily ET0 predicting performance of the models in humid regions of China, the random forest model with data sets of 30-year and the 9:1 data splitting strategy is recommended." have been modified. The details are as follows:

 

“Results showed that the accuracy of RF (R2=0.867, RMSE=0.517, MAE=0.377, NSE=0.860) was overall better than that of XGB (R2=0.862, RMSE=0.528, MAE=0.383, NSE=0.854) in different input parameters. ”

“Nevertheless, the performance of 10-year data was worst among three time spans when considering the independent test. Therefore, to improve the daily ET0 predicting performance of the tree-based models in humid regions of China, the random forest model with data sets of 30-year and the 9:1 data splitting strategy is recommended. ”

 

2-Adding some applications of the random forest model to the introduction can improve the literature review:

 

-Flood Risk Mapping by Remote Sensing Data and Random Forest Technique

 

Response: Thanks for pointing this out. Following your suggestion, we have added literature on flood risk mapping based on remote sensing data and the random forest technique.

 

“In addition, random forest is also widely used in flood probability mapping (Chen et al., 2020; Avand et al., 2020), and there are relevant reports using remote sensing data (Avand et al., 2021; Najafzadeh et al., 2021a). Meanwhile, the random forest had also been well applied in water quality (Wang et al., 2021)”

 

-A Novel Multiple-Kernel Support Vector Regression Algorithm for Estimation of Water Quality Parameters

 

Response: Thanks for pointing this out. Following your suggestion, we have added literature on the estimation of water quality parameters based on a novel multiple-kernel support vector regression algorithm.

 

“Over the past few decades, machine learning model have been successfully modeled in various fields (pan evaporation, dew point temperature, global solar radiation, streamflow, water quality, drought events et al.), due to their excellence in dealing with complex and nonlinear relationships (Sun et al., 2009; Sing et al., 2011;Kim et al., 2015; Mehdizadeh et al., 2017; Wang et al., 2017a, b; Yaseen et al., 2018a, b; Kaba et al., 2018; Keshtegar et al., 2018; Fan et al., 2018a, b; Wu et al., 2019a; Dong et, al., 2020; Najafzadeh and Niazmardi 2021; Barzkar et al 2021; Movahed et al., 2020; Najafzadeh et al 2021a)”

 

3-Why were the RF and XGB techniques used rather than other soft computing models?

 

Response: Thanks for pointing this out. We also considered other models during data processing, but their results were not ideal, so we used only the RF and XGB models. This also showed that RF and XGB are stable and reliable. Other scholars have also widely used these models for ET0 prediction:

 

“Feng, Y., Cui, N., Gong, D., et al., 2017c. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 193, 163-173.

Wang, S., Lian, J., Peng, Y., et al., 2019. Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China. Agric. Water Manag. 221, 220-230.

Fan, J., Wu, L., Zheng, J., Zhang, F., 2021b. Medium-range forecasting of daily reference evapotranspiration across China using numerical weather prediction outputs downscaled by extreme gradient boosting. J. Hydrol. 601, 126664.

Wu, L., Peng, Y., Fan, J., Wang, Y., 2019b. Machine learning models for the estimation of monthly mean daily reference evapotranspiration based on cross-station and synthetic data. Hydrol. Res. 50 (6), 1730-1750.”

 

4-What are the criteria for selecting the meteorological variables? This section needs major clarification. In another study, "Meteorological variables including maximum temperature (Tmax), minimum temperature (Tmin), average temperature (Tmean), relative humidity (RH), 24-h rainfall (P24) and wind speed (U2) were used." The authors can refer to this work:

 

-Evaluation of drought events in various climatic conditions using data-driven models and a reliability-based probabilistic model

 

Response: Thanks for pointing this out. Our answer is as follows. First, temperature is the most basic element. Second, RH, U2 and Rs are essential for estimating ET0, but these three variables are less readily available than temperature, and high-quality, stable long-term series are difficult to obtain. Combining these meteorological parameters with temperature made it convenient to compare and evaluate their contributions to ET0 and, at the same time, reduced the impact of data interaction. For these reasons, the meteorological variables in this manuscript were selected. We carefully read the paper "Evaluation of drought events in various climatic conditions using data-driven models and a reliability-based probabilistic model". Thank you very much for providing ideas for our future research work. We also hope to learn from you in this field.

 

5-In the "statistical performance analysis", the authors are recommended to use the Scatter Index (SI) and BIAS instead of NSE, because NSE and R2 come from the same origin. The SI and BIAS formulations can be found in the following paper:

 

More reliable predictions of clear-water scour depth at pile groups by robust artificial intelligence techniques while preserving physical consistency

 

Response: Thanks for pointing this out, and for your suggestion to use the Scatter Index and BIAS instead of NSE. We believe that, although NSE and R2 come from the same origin, NSE reflects the accuracy of the simulation from another angle: NSE indicates how close the daily ET0 predictions are to the average level of the FAO-56 PM ET0, while R2 indicates the degree of agreement between the FAO-56 PM ET0 values and the predicted ET0 values. In addition, the statistical indicators in this manuscript are analyzed using average values. Therefore, after comprehensive consideration, we think NSE can be used. Furthermore, NSE is widely used in this research field:

 

“Shiri, J., 2018. Improving the performance of the mass transfer-based reference evapotranspiration estimation approaches through a coupled wavelet-random forest methodology. J. Hydrol. 561, 737-750.

Feng, Y., Cui, N., Gong, D., et al., 2017c. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 193, 163-173.”
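
Editorial illustration (not part of the original exchange): the disagreement in comment 5 can be made concrete with a small numerical sketch. The function names and the toy data below are assumptions introduced only for illustration. The Scatter Index is taken here as RMSE divided by the mean observation, which is one common definition and may differ from the formulation in the paper the reviewer cites. With a constant overestimate of 1 unit, the squared Pearson correlation stays at 1, while NSE drops to 0.5 and BIAS reports the offset directly.

```python
import numpy as np

def pearson_r(obs, pred):
    """Pearson correlation coefficient R between observations and predictions."""
    return float(np.corrcoef(obs, pred)[0, 1])

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rmse(obs, pred):
    """Root mean square error."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def bias(obs, pred):
    """Mean error, defined here as mean(pred - obs)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.mean(pred - obs))

def scatter_index(obs, pred):
    """Scatter Index, assumed here to be RMSE / mean(obs)."""
    return rmse(obs, pred) / float(np.mean(obs))

# Toy data (hypothetical values, not from the manuscript):
# every prediction overestimates the observation by exactly 1 unit.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = obs + 1.0

r_squared = pearson_r(obs, pred) ** 2
print(f"squared Pearson R = {r_squared:.3f}")
print(f"NSE               = {nse(obs, pred):.3f}")
print(f"BIAS              = {bias(obs, pred):.3f}")
print(f"SI                = {scatter_index(obs, pred):.3f}")
```

In this toy case the squared correlation cannot see the systematic offset, which is the practical argument for reporting an error-based index alongside it.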

 

6-The equation given for R^2 is in fact that of R. Please take the square root of Eq. (11) and convert it to R. The meaning of R^2 (CoD: coefficient of determination) is quite different from that of R. The R^2 in the manuscript is not the coefficient of determination; the authors simply squared R.

 

Response: Thanks for pointing this out. We understand the difference between R and R2, but after comprehensive consideration we decided to keep R2, because a large body of literature uses R2 together with other statistical indicators to evaluate models. Examples are as follows:

 

“Fan, J., Ma, X., Wu, L., et al., 2019. Light Gradient Boosting Machine: An efficient soft computing model for estimating daily reference evapotranspiration with local and external meteorological data. Agric. Water Manag. 225, 105758.

Wu, L., Huang, G., Fan, J., et al., 2020. Hybrid extreme learning machine with meta-heuristic algorithms for monthly pan evaporation prediction. Comput. Electron. Agric. 168, 205115.

Lu, X., Ju, Y., Wu, L., et al., 2018. Daily pan evaporation modeling from local and cross-station data using three tree-based machine learning models. J. Hydrol. 566, 668-684.”
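
Editorial illustration (not part of the original exchange): the distinction raised in comment 6 can be written out explicitly. Eq. (11) of the manuscript is not reproduced in this record, so the squared-correlation form below is an assumption based on the reviewer's description. The symbols are likewise assumed notation: O_i is the FAO-56 PM ET0, P_i is the model prediction, and overbars denote means over n days.

```latex
% Squared Pearson correlation (what the reviewer says Eq. (11) computes)
\[
r^2 = \left(
  \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}
       {\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2}\,
        \sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}}
\right)^{2}
\]

% Coefficient of determination (CoD), written in residual form
\[
R^2_{\mathrm{CoD}} = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}
                              {\sum_{i=1}^{n} (O_i - \bar{O})^2}
\]
```

Written this way, the coefficient of determination has the same algebraic form as NSE, which is consistent with the reviewer's remark in comment 5 that the two share an origin. The two expressions above coincide when the predictions come from an ordinary least-squares linear fit with an intercept, evaluated on the data used to fit it; for predictions from RF or XGB on an independent test set they can differ.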

 

7-The reliability and uncertainty of the results need to be added to the results:

 

Receiving More Accurate Predictions for Longitudinal Dispersion Coefficients in Water Pipelines: Training Group Method of Data Handling Using Extreme Learning Machine Conceptions

 

Response: Thanks for pointing this out. This study mainly focuses on the impact of different data splitting strategies and data time spans on the accuracy of the model. Our results show that the data splitting strategy and the data time span used during modeling can affect the accuracy of the model predictions. Therefore, the uncertainty of the results of this study comes from the data splitting strategies, the time spans and the meteorological factors of climate change. In this manuscript, the uncertainty is expressed as the accuracy range of the model.
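
Editorial illustration (not part of the original exchange): one generic way to attach an uncertainty range to a test-set accuracy figure is a percentile bootstrap. The sketch below is not the authors' method and does not use their data; the function name, the synthetic ET0-like series and all parameter values are assumptions made for illustration.

```python
import numpy as np

def bootstrap_rmse_interval(obs, pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE (hypothetical helper)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = obs.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample day indices with replacement and recompute RMSE.
        idx = rng.integers(0, n, size=n)
        stats[b] = np.sqrt(np.mean((obs[idx] - pred[idx]) ** 2))
    low, high = np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(low), float(high)

# Synthetic stand-in for one year of daily values (not real station data).
rng = np.random.default_rng(42)
obs = rng.uniform(1.0, 6.0, size=365)
pred = obs + rng.normal(0.0, 0.5, size=365)

point = float(np.sqrt(np.mean((obs - pred) ** 2)))
low, high = bootstrap_rmse_interval(obs, pred)
print(f"RMSE = {point:.3f}, 95% bootstrap interval = [{low:.3f}, {high:.3f}]")
```

Resampling individual days treats them as independent. Daily ET0 series are usually autocorrelated, so a block bootstrap would likely be a more defensible choice in practice.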

 

The above modifications have been highlighted in yellow in the revised manuscript. Thank you once again for your suggestions on this manuscript.

Reviewer 2 Report

I have carefully reviewed the above manuscript, which falls within the scope of the Journal. Overall, the manuscript is a good paper, well written with proper structure, thorough literature review, and exciting findings. The following are some comments for the authors.

Based upon my comments, the current manuscript could be accepted for publication after minor revisions.

 

Comments:

  • Line 177: Replace “includeing” by “including”.
  • Line 183: Replace “This” by “this”.
  • Line 189: Add “temperature”.

Author Response

Reviewer #2: Summary (Manuscript ID: water-1467728)

 

I have carefully reviewed the above manuscript, which falls within the scope of the Journal. Overall, the manuscript is a good paper, well written with proper structure, thorough literature review, and exciting findings. The following are some comments for the authors.

 

Based upon my comments, the current manuscript could be accepted for publication after minor revisions.

 

Comments:

 

Line 177: Replace “includeing” by “including”.

 

Response: Thanks for pointing this out. We have revised the text as requested and hope it meets your requirements for this manuscript.

 

“This area is rich in water and heat resources, the geographic range including two river basins (the Yangtze River basin and the Pearl River basin).”

Line 183: Replace “This” by “this”.

 

Response: Thanks for pointing this out. We have revised the text as suggested and hope it meets your requirements for this manuscript.

 

“Therefore, this area has become an area of widespread concern for many scholars who study hydrological phenomena and climate (Lu et al., 2018; Wu et al., 2019c; Huang et al., 2019).”

 

Line 189: Add “temperature”.

 

Response: Thanks for pointing this out. We have revised the text as recommended and hope it meets your requirements for this manuscript.

 

“Used temperature data”

 

The above modifications have been highlighted in yellow in the revised manuscript.

Round 2

Reviewer 1 Report

Accept as is.
