Article
Peer-Review Record

Data-Driven Flood Alert System (FAS) Using Extreme Gradient Boosting (XGBoost) to Forecast Flood Stages

Water 2022, 14(5), 747; https://doi.org/10.3390/w14050747
by Will Sanders, Dongfeng Li, Wenzhao Li and Zheng N. Fang *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 January 2022 / Revised: 8 February 2022 / Accepted: 16 February 2022 / Published: 26 February 2022
(This article belongs to the Special Issue Advances in Flood Forecasting and Hydrological Modeling)

Round 1

Reviewer 1 Report

This paper presents a data-driven modeling study for flood stage forecasting. The authors develop a modeling framework with the XGBoost machine learning algorithm, which can predict flood stage levels at a 5-minute interval. In the study, the authors used several extreme events as training/validation datasets and showed good performance of the model. The manuscript is on a topic of interest to the audience of this journal. I only have a few comments that I hope the authors can address in their revision.

Specific comments:

  1. As the paper shows in the results section, most of the simulations have a delay in peak time. It would be helpful if the authors could provide a detailed explanation for this delay. The authors discuss possible methods that can improve the model performance in the conclusions section. I would suggest moving this part to the discussion section. In the conclusions section, the authors can briefly list the improvement directions and future study ideas.
  2. In Table 4, the difference in lead times is significantly larger for the S#2 simulations than for S#1. The S#2 simulations have more input information than S#1, which would be expected to give better performance for a data-driven model. The authors briefly discussed this in the Discussion section. I suggest the authors add a more in-depth discussion of this issue and show the actual effect of different input data (inside or outside the watershed) on the modeling performance.

Author Response

1. As the paper shows in the results section, most of the simulations have a delay in peak time. It would be helpful if the authors could provide a detailed explanation for this delay. The authors discuss possible methods that can improve the model performance in the conclusions section. I would suggest moving this part to the discussion section. In the conclusions section, the authors can briefly list the improvement directions and future study ideas.

 

Response:

We would like to thank the reviewer for the insightful comments on the delay of the peak time shown in Figures 8–13. Our explanation is as follows: the XGBoost-based FAS reports the flood possible (FP) and flood likely (FL) alerts only when at least three models together predict that the stage levels are over the FP or FL levels. This is a relatively conservative engineering approach compared with a more proactive design (e.g., reporting a flood alert whenever one of the models predicts a stage over the FP/FL levels). This conservative approach normally results in a short delay (5-10 minutes), as shown in Table 4. From a warning system/application standpoint, we believe that this conservative engineering design will reduce unnecessary overreactions, and a delay of less than 10 minutes is regarded as acceptable in flood practice considering that the system operates at a 5-minute step. For future testing and other applications, a more aggressive triggering mechanism could be set. For example, the alert could be triggered when the first critical stage is predicted by a short-lead-time model instead of the three-model setup used in the current study. This effort will be reported in a forthcoming paper. Based on the reviewer’s suggestion, we have added further information (lines 462 to 474) to the discussion section that explains this matter from both the XGBoost modeling and flood warning perspectives:

 

“It is noted that some of the S#1 models have 5-10 minute delay in predicting the times of the critical stages mainly due to the mechanism of how the system triggers the flood possible (FP) and flood likely (FL) alerts only when at least three models predict the stage levels are over the FP or FL levels. It is a conservative engineering approach that will likely result in short delay as shown in Table 4. From a flood warning system/application standpoint, this conservative engineering design can reduce unnecessary overreactions and the delay of less than 10 minutes is acceptable in flood practices considering that the system interval is at a 5-minute step based on the temporal resolution of the source data. The delay times can be reduced by utilizing finer temporal resolution (e.g., 1-minute) data, as well as implementing a more aggressive alerting design (e.g., triggering flood alert whenever one of the models firstly predicts higher values than the critical stages). On-going research will be reported in a forthcoming paper for this particular effort.”
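
The triggering rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `alert_level`, the threshold values, and the sample predictions are hypothetical; the sketch assumes that the FL stage lies above the FP stage and that "over" means strictly greater.

```python
from typing import Optional, Sequence


def alert_level(
    predicted_stages: Sequence[float],
    fp_stage: float,
    fl_stage: float,
    min_models: int = 3,
) -> Optional[str]:
    """Return "FL", "FP", or None for one forecast time step.

    predicted_stages holds one predicted stage per lead-time model.
    An alert is raised only when at least `min_models` of them exceed
    the corresponding critical stage (assumed: fl_stage > fp_stage).
    """
    n_over_fl = sum(1 for stage in predicted_stages if stage > fl_stage)
    n_over_fp = sum(1 for stage in predicted_stages if stage > fp_stage)
    if n_over_fl >= min_models:
        return "FL"
    if n_over_fp >= min_models:
        return "FP"
    return None


# Hypothetical stages (m) predicted by five lead-time models.
predictions = [10.2, 10.6, 9.8, 10.7, 11.3]

# Conservative rule: three models must agree.
print(alert_level(predictions, fp_stage=10.5, fl_stage=11.0))

# More aggressive rule: a single model is enough.
print(alert_level(predictions, fp_stage=10.5, fl_stage=11.0, min_models=1))
```

With these made-up numbers, three models exceed the FP stage but only one exceeds the FL stage, so the three-model rule reports FP while the single-model rule already reports FL. This is the trade-off between fewer unnecessary alerts and earlier warnings discussed in the response.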

 

We also thank the reviewer for the suggestion to move the discussion of future improvements to the flood alert system into the discussion section. After careful evaluation, we still believe that the current structure of the discussion section covers all the necessary discussions pertinent to the results, and that the conclusions section is the best place to present the future work plans.

 

2. In Table 4, the difference in lead times is significantly larger for the S#2 simulations than for S#1. The S#2 simulations have more input information than S#1, which would be expected to give better performance for a data-driven model. The authors briefly discussed this in the Discussion section. I suggest the authors add a more in-depth discussion of this issue and show the actual effect of different input data (inside or outside the watershed) on the modeling performance.

Response:

We also thank the reviewer for the insights on improving the discussion of the performance differences between the S#1 and S#2 scenarios. We have added more explanation of the different gauge types inside and outside of the watershed and their impact on stage forecasting. In addition, we have added more information and references regarding improvements to the system through input variable selection and through introducing internal spatial information rather than using external rain gauge data, as shown in lines 479–496:

 

“The authors think that there could be several reasons: firstly, using of more information of the watershed might have weaken the overall performance as shown in S#2 (the S# 2 includes additional rain gauges from outside of the watershed in additional to the same information as the S#1 is configured with). It is noted that stage data from the stream gauges as used in the S#1 serves as more direct information for stage forecasting than the information obtained from the rain gauges. Secondly, it is also likely that the major storms recorded by the gauges outside of the watershed have a negative impact on the stage prediction through less accurate and indirect information. Lastly, the XGBoost models utilize all the input variables directly without preprocessing or selection and a proper selection of input variables is needed to improve the performance of XGBoost models[49]. Overall, the results of this study suggest that including the gauge information only from the inside of the watershed as configured in S#1 is adequate and advantageous for training XGBoost models for flood warning practices. Instead, the internal spatial information of the watershed, such as variables of physical conditions (e.g., soil moisture, land cover and land use, etc.), as well as other available geological, geomorphological, and hazard data can be more applicable when they are integrated into the flood alert system to improve its performance in flood critical areas for civil protection purposes [50].”

Reviewer 2 Report

The manuscript presents a prototype, data-driven, flood alerting system oriented to local flood warning. The paper is well organized and the authors present their method and corresponding results clearly and solidly. The discussion covers all the main issues that may be raised. The supplementary material clarifies some details of the proposed method.

Overall, the manuscript is well prepared and its contribution falls within the scope of the SI. Thus, I suggest publication in its present form.

Author Response

The manuscript presents a prototype, data-driven, flood alerting system oriented to local flood warning. The paper is well organized and the authors present their method and corresponding results clearly and solidly. The discussion covers all the main issues that may be raised. The supplementary material clarifies some details of the proposed method.

Overall, the manuscript is well prepared and its contribution falls within the scope of the SI. Thus, I suggest publication in its present form.

Response:

We would like to thank the reviewer for recognizing the value of the paper. We appreciate the reviewer’s decision to allow us to publish our manuscript in its present form.

Reviewer 3 Report

Dear authors,

thank you for your study titled “Data-Driven Flood Alert System (FAS) Using Extreme Gradient Boosting (XGBoost) to Forecast Flood Stages”. It presents the feasibility of using the eXtreme Gradient Boosting (XGBoost) as a state-of-the-art machine learning model to forecast gauge stage levels

There are some aspects related to the aim and the layout of the manuscript not discussed and/or presented properly. According to these aspects, the paper could be published after minor revisions.

Please note my comments and suggestions in the attached Microsoft WORD document.

Comments for author File: Comments.pdf

Author Response

Dear authors,

thank you for your study titled “Data-Driven Flood Alert System (FAS) Using Extreme Gradient Boosting (XGBoost) to Forecast Flood Stages”. It presents the feasibility of using the eXtreme Gradient Boosting (XGBoost) as a state-of-the-art machine learning model to forecast gauge stage levels

There are some aspects related to the aim and the layout of the manuscript not discussed and/or presented properly. According to these aspects, the paper could be published after minor revisions.

Please note my comments and suggestions in the attached Microsoft WORD document.

In the paper titled “Data-Driven Flood Alert System (FAS) Using Extreme Gradient Boosting (XGBoost) to Forecast Flood Stages” the authors test the feasibility of using the eXtreme Gradient Boosting (XGBoost) as a state-of-the-art machine learning model to forecast gauge stage levels. A flood alert system based on the XGBoost models, which can be used for local flood warning and mitigation practices, is also evaluated.

Concerning the figures’ layout, please note the following suggestions:

- Figure 1: Please revise it by adding (as a subfigure; Figure 1a) a geographical location map in the regional context to directly locate the study area.

Response:

We would like to thank the reviewer for the efforts to improve Figure 1. We have updated Figure 1 with a geographical location map.

Concerning the text, English language and style are fine and only minor corrections, indicated below, are required:

- lines 274-275: Each model with lead time between 5 minutes and 2 hours (24 models in total) are included in each figure […]

The subject of the sentence is “Each model”, the verb should be “is”.

Response:

We have fixed this as required in lines 274-275:

“…..Each model with lead time between 5 minutes and 2 hours (24 models in total) is included in each figure, with the primary axis showing….”

- lines 279-280: Unlike Gauge 520, the S#1 models at Gauge 540 (Figure 4b) shows RMSE […]

The subject of the sentence is “the S#1 models”, the verb should be “show”. Same suggestion at lines 285, 295, 302.

Response:

We have fixed this as required in lines 280-281:

“….. at Gauge 540 (Figure 4b) show RMSE values gradually increase up to around 0.3 m at the 120-minute lead time.”

In lines 284-285:

“…while the S#1 models at Gauge 540 (Figure 4d) show RMSE values gradually increase up to around 0.5 m at the 120-minute lead time.”

In lines 294-295:

“… Gauge 540 (Figure 5b) show RMSE values gradually increase up to around 0.3 m at the 120-minute lead time, which is similar to what is shown in Figure 4b.”

In lines 300-301:

“… at Gauge 540 (Figure 5d) show RMSE values gradually increase up to above 0.4 m at the 120-minute lead time …”

- lines 281-282: KGE values at both gauges show similar patterns that values decrease along with longer lead times with lowest KGE value at around 0.95 […]

The sentence is not very clear. Please consider rephrasing.

Response:

We have rephrased the sentence as required in lines 281-282:

“The KGE values decrease along with longer lead times and the lowest value is around 0.95, as presented in both gauges (Figures 4a and 4b).”

 

- line 335: […] comparison to the observed.

Please consider adding a noun (e.g. ones) to complete the sentence. Same suggestion at lines 417, 418, 454, 482.

Response:

 

Line 334:

“…the prediction of the rising and falling of the stages in comparison to the observed stages…”

Line 416:

“…observed peak stage (Figure 12b, Table 4). For the September 2019 event, the S#1 models…”

Line 417:

“…also achieve accurate forecasted peak stage 5 minutes later than the observed peak stage…”

Line 454:

“…demonstrate a satisfactory consistency with the observed stages in terms of timings and….”

Line 507:

“…of the predicted stage hydrographs match well with the observed stage hydrographs..”

 

- line 437: Besides its good usability of continuous operations […]

Please consider rephrasing.

Response:

We have rephrased the sentence as required in line 437:

“…Besides its unique feature of continuous operations, the XGBoost-based FAS…”

 

- line 446: […] for its less sensitive to the high values […]

“Sensitive” is an adjective. Please consider substituting with a noun such as “sensitivity”.

Response:

We have changed it to “sensitivity” in Line 446:

“…evaluating the performance of hydrologic models for its less sensitivity to the high values…”

 

- line 465: […] diminished performance (than S#1 models (Table 4) […]

There is an extra bracket in this sentence.

Response:

We have fixed it in Line 478:

“…diminished performance than S#1 models (Table 4) and that KGE and RMSE metrics…”

 

- line 504: Introducing spatial information in relative to the watershed […]

Please consider deleting “in”.

Response:

We have fixed it in Line 529:

“….Introducing spatial information relative to the watershed:…”

 

Finally, please consider a wider state of art by citing the following articles:

Response:

We thank the reviewer for suggesting the articles, and we have included the following article in our discussion section as suggested.

Piacentini, T.; Carabella, C.; Boccabella, F.; Ferrante, S.; Gregori, C.; Mancinelli, V.; Pacione, A.; Pagliani, T.; Miccadei, E. Geomorphology-Based Analysis of Flood Critical Areas in Small Hilly Catchments for Civil Protection Purposes and Early Warning Systems: The Case of the Feltrino Stream and the Lanciano Urban Area (Abruzzo, Central Italy). Water 2020, 12, 2228. https://doi.org/10.3390/w12082228
