Article
Peer-Review Record

Machine Learning-Based Water Level Prediction in Lake Erie

Water 2020, 12(10), 2654; https://doi.org/10.3390/w12102654
by Qi Wang 1 and Song Wang 2,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 15 August 2020 / Revised: 12 September 2020 / Accepted: 21 September 2020 / Published: 23 September 2020

Round 1

Reviewer 1 Report

Review “Machine Learning-Based Water Level Prediction in Lake Erie” for MDPI Water

Comments: This paper compares the accuracies of six machine learning (ML) algorithms in predicting water level. The ML algorithms were also compared with a process-based prediction system. The ML algorithms were shown to have smaller forecasting biases than the process-based prediction system. Two of the six ML algorithms were shown to be applicable to predicting the water level in Lake Erie and useful for future water resources.

This is a well-structured, interesting paper with a clear goal of exploring new applicable ML algorithms for water level predictions. The study is important for prediction applications.

One of the key steps in testing ML algorithms’ performance is selecting a complete set of training data, which is not trivial, especially considering the complexity of the physical or dynamical system involved.

 

I have three questions and one suggestion:

(1) p3, lines 102-106: The paper mentions that process-based water level forecasting models take into account the meteorological variable “evaporation” as one factor affecting water level fluctuation. The paper then states: “Similarly, this study also considers these physical processes, …”, yet the processes considered do not include “evaporation”.

Question: Is there a specific reason for the authors to exclude “evaporation”?

 

(2) p5, line 147: “Where k* can be obtained from x and x*, and k** can be obtained from x*….”

Question: Can you include concise derivations of the mathematical expressions for k* and k**? Saying only that they “can be obtained” is not clear.

Other parts of “2.3 Machine learning Algorithms” have similar issues. It would be nice to include concise derivations of the mathematical expressions for all six ML algorithms in the paper.
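For reference, the quantities raised in question (2) have a closed form in the standard textbook formulation of GP regression. The block below is a sketch in that standard notation, which is an assumption here and may differ from the manuscript's: $x_1, \dots, x_n$ are the training inputs, $\mathbf{y}$ the training targets, $x_*$ a test input, $k(\cdot,\cdot)$ the covariance function, and $\sigma_n^2$ the observation-noise variance.

```latex
\begin{aligned}
K_{ij} &= k(x_i, x_j), \qquad
\mathbf{k}_* = \bigl[k(x_1, x_*), \dots, k(x_n, x_*)\bigr]^{\top}, \qquad
k_{**} = k(x_*, x_*), \\
\bar{f}_* &= \mathbf{k}_*^{\top}\bigl(K + \sigma_n^2 I\bigr)^{-1}\mathbf{y}, \\
\operatorname{Var}[f_*] &= k_{**} - \mathbf{k}_*^{\top}\bigl(K + \sigma_n^2 I\bigr)^{-1}\mathbf{k}_* .
\end{aligned}
```

In this form, $\mathbf{k}_*$ depends on both the training inputs and $x_*$, while $k_{**}$ depends on $x_*$ alone, which matches the quoted statement from line 147.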

 

(3) p8, lines 240-244: Were any specific methods applied in selecting the 84% training data set and the 16% testing data set? Was this a one-time selection, or were multiple selections (of 84% and 16% data sets) tested? How might your testing results change with a different selection of the 84% and 16% data sets? In other words, how large is the uncertainty of your testing results?
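One way to quantify the uncertainty asked about in question (3) is to repeat the 84%/16% split many times and report the spread of the test error. The following is a minimal sketch under stated assumptions, not the authors' procedure: the data are synthetic, the model is an ordinary least-squares fit standing in for any of the six ML algorithms, and the function name `repeated_split_rmse` is hypothetical. For a daily time series, random shuffling can leak information between neighbouring days, so a chronological split may be the fairer variant.

```python
import numpy as np

def repeated_split_rmse(X, y, test_fraction=0.16, n_repeats=50, seed=0):
    """Return the test RMSE of a least-squares fit for each random split."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_fraction * n))
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(n)
        test_idx, train_idx = order[:n_test], order[n_test:]
        # Add an intercept column and fit ordinary least squares on the training part.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score the fitted coefficients on the held-out part.
        A_test = np.column_stack([np.ones(n_test), X[test_idx]])
        residual = y[test_idx] - A_test @ coef
        scores.append(float(np.sqrt(np.mean(residual ** 2))))
    return np.array(scores)

# Synthetic stand-in data (assumption): 500 samples, 3 predictors, small noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=500)

scores = repeated_split_rmse(X, y)
print(f"RMSE mean={scores.mean():.4f}, std={scores.std(ddof=1):.4f}")
```

The standard deviation across repeats gives a rough measure of how much a reported test score depends on the particular split chosen.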

 

One suggestion: Enlarge the font sizes in the figure labels and legends (e.g., Figures 3 and 4) and in the tables.

Author Response

Please see the attachment. 

Author Response File: Author Response.docx

Reviewer 2 Report

Overall this paper is well written and presents valuable scientific methods and results in a very satisfactory way. After reading the paper I have a few suggestions the authors may want to consider prior to publication and several questions regarding their results, which, if addressed in the paper might help other readers as well. In no particular order, my comments and suggestions are:

I think it would be good to know if the results (table 3) are robust to the training and evaluation data-sets selected. That is, if I used many different sets would I get similar results, and what are the uncertainties?

Structurally and in general with regards to style, there are many shifts from previous research to this study that, for me anyway, impede readability. I would like to see the structure within section/subsection to go along the lines of what was done, why, and advantages/disadvantages, followed by your study, what you did that was different or new, why, and advantages/disadvantages. More specifically:

The structure of the introduction could be improved, and the description of the science seems incomplete. Currently the end of the sentence from line 49 “so in this paper…” occurs in the middle of a description of previous methods used. Consider a reorg of the intro beginning section where para one describes the impact/importance, para two describes previous methods in order from earliest to latest, or perhaps simplest to the most complex. Para three could then start clearly with “In this paper…” followed by “Overall there are three innovations in this work…” Para 4 could then continue with the paper organization.

A similar comment applies to the subsections of section 2. It would be helpful to consistently organize each subsection by what was used/done in previous studies, then note what is carried over into this study, followed by what is new and different and how that might be better. For example, in section 2.2 data sources, the only data sources for previous studies called out by your paper are precipitation and evaporation. This is followed by the limitations of linear correlation and then the development of MI.

 

How the water levels are used as independent values while they are also the dependent result is a bit confusing as presented. The important phrase/concept “one-day ahead” is contained in the abstract and conclusion, but otherwise is not mentioned. I also recommend that the authors more clearly and concisely define the predicted level and the observation against which that result is compared. Perhaps this should have been obvious to me, but after reading the preceding sections, a sentence or two at the beginning of section 2.4 Model performance evaluation would have been a very helpful reminder. Something along the lines of “The output of each method evaluated is a one-day ahead prediction of the mean water level of all four stations described in section 2. This predicted output is measured against the mean of the actual water levels at the four stations.”

The article begs the question of how well the approaches described would perform at longer-term prediction.

Presumably, you could have trained the model to predict levels at the individual stations. Is the actual water level at a particular station important, or are we only interested in the mean level? How do the results at particular stations differ? On the larger lakes, local dramatic fluctuations in level are of particular concern. Are there instances on Lake Erie? If so, how do the models perform?

The precision in the results is impressive, but it is not clear to me how precise and accurate they need to be.

The provided explanation of each machine learning method may be insufficient for scientists not familiar with ML techniques (except for the very thorough discussion of GP). Given space constraints, a reference for each method to the paper whose approach yours most closely resembles would be helpful. For example, you could add a sentence at the end of the RF paragraphs along the lines of, “For this study we used the method described in (cite), modified to include…”

In figures 4 and 5, would it be more informative to display values based on the difference of the method to the observations as opposed to the absolute water level? Why do some of the methods appear to perform better during different observational time periods? Is there some rhyme or reason regarding how the various models correlate to each other that might provide some insight as to why they perform differently? Why isn’t AHPS shown in the figures?

There are some instances of very long and complicated sentences that impede understanding. For example, line 48: “Also, process-based models are often time-consuming [16], so in this paper, we explore different machine learning methods [17], with fast leaning speed, to forecast daily water level of Lake Erie, and then analyze their advantages in water level prediction based on the comparison with the process-based model (i.e., AHPS).” There are 3 ideas tied into one complicated sentence. Consider breaking it into 3 separate, shorter and simpler sentences.

There are some instances of confusing pronoun-antecedent usage. For example, line 45: “However, the effectiveness of process-based models mainly depends on the model accuracy in representing the ecosystem and its ability to accommodate the observed variabilities at a scale of interest to the scientist or manager [14,15].” The immediate antecedent to “its” is the ecosystem. This would be clearer if “its” were replaced by “the model’s”.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

In this paper, machine learning (ML) algorithms including Gaussian process (GP), multiple linear regression (MLR), multilayer perceptron (MP), M5P model tree, random forest (RF), and k-nearest neighbor (KNN) are applied to predict the water level in Lake Erie.

The authors then compare the performances of six ML models against the performance of the process-based Advanced Hydrologic Prediction System (AHPS). 

 

Although I appreciate the effort of putting together six ML algorithms, it seems like the authors are merely throwing everything at their disposal at the problem without a clear vision of how to solve it.

 

  1. Why is the mutual information not used to filter out some of the 48 variables? 
  2. If you support using MI, should you not use MI instead of “r” in your tables? Did you check whether they follow a Gaussian distribution?
  3. Show the residuals in figure 5 for the best fit.  
  4. The time period of comparison is not fair to the AHPS method. 
  5. Do you believe that the 10 years of training data is sufficient? The conclusion is pretty strong. My suggestion is to try a rolling window method to test out different periods and see if the prediction changes in various conditions.
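The rolling window test suggested in point 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' setup: the water-level series is synthetic with arbitrary values, the model is a simple lag-1 linear regression standing in for any of the six ML algorithms, and the helper names `rolling_windows` and `lag1_rmse` are hypothetical.

```python
import numpy as np

def rolling_windows(n_samples, train_size, test_size, step):
    """Yield (train_slice, test_slice) pairs that move forward in time."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = slice(start, start + train_size)
        test = slice(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

def lag1_rmse(levels, train, test):
    """Fit level[t] = a + b * level[t-1] on the training window, score on the test window."""
    x, y = levels[:-1], levels[1:]  # one-day-ahead (input, target) pairs
    A = np.column_stack([np.ones(train.stop - train.start), x[train]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = coef[0] + coef[1] * x[test]
    return float(np.sqrt(np.mean((y[test] - pred) ** 2)))

# Synthetic daily series (assumption): seasonal cycle plus a slow random walk.
rng = np.random.default_rng(0)
days = np.arange(2000)
levels = (174.0
          + 0.3 * np.sin(2 * np.pi * days / 365.25)
          + np.cumsum(rng.normal(scale=0.005, size=days.size)))

# There are len(levels) - 1 one-day-ahead pairs available for windowing.
windows = list(rolling_windows(len(levels) - 1, train_size=1000, test_size=200, step=200))
scores = [lag1_rmse(levels, tr, te) for tr, te in windows]
for (tr, te), s in zip(windows, scores):
    print(f"train {tr.start}-{tr.stop - 1}, test {te.start}-{te.stop - 1}: RMSE={s:.4f}")
```

Each test window lies strictly after its training window, so the spread of the per-window scores indicates how sensitive the conclusions are to the period chosen for training and evaluation.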

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

The authors have addressed my concerns.

 
