Big Data Analytics from a Wastewater Treatment Plant
Round 1
Reviewer 1 Report
Using data to inform wastewater treatment plant operations is a very timely and relevant topic that is of great interest. The manuscript hints at using data analytics and tools such as predictive models to assist treatment plant operations. Ultimately, it is unclear how the methods and results presented in this manuscript could be used to improve operation. I believe the manuscript could be improved and provide the following comments:
General – The introduction and literature review could be improved. I would suggest having additional discussion on how wastewater treatment plants are currently using data to inform operations and decisions. There are examples of digital twins being used for wastewater treatment plants that should be noted here.
Line 33 – Check numbers. 1.2 million people?
Lines 56-57 – Are these specific data points?
Section 2 – There are many very short subsections that do not add anything to this manuscript. For example, Section 2.5 just states that results were interpreted and new knowledge was evaluated. This section should be re-written and condensed.
Sections 2.7 – 2.10 – Are these sections all referring to citation [7]? If so, they should be subsections or a list as part of Section 2.6.
Section 2.11.3 – Provide citation(s) for hypothesis testing.
Section 3.1.1 – This section should be moved to the beginning of Section 2. This provides background to the wastewater treatment plant and gives the reader some idea of why you are writing this manuscript.
Lines 357-358 – Did you look into potential reasons why effluent BOD was higher during 2014 than in other years? If you are using these values for a prediction model, you should include the higher BOD values to ensure that your prediction model accurately captures them. A prediction model that underestimates effluent pollutant values could potentially harm wastewater treatment plant operations.
Section 4 – What are you trying to say with this section? It is not clear why you are presenting Figure 23 or what the significance of the results presented are. This section requires more discussion of results to illustrate how they are relevant to the overall goal of your paper.
General – It is unclear what the goal of this manuscript is. Specifically, how does the analysis and results shown in this paper assist wastewater treatment plant operations? How is this being used to enhance operational performance of the wastewater treatment plant? Are these results going to be used to develop a predictive model to assist the wastewater treatment plant? Link back to the abstract in the paper and clearly frame your results and discussion.
Author Response
Thank you very much for your suggestions
Using data to inform wastewater treatment plant operations is a very timely and relevant topic that is of great interest. The manuscript hints at using data analytics and tools such as predictive models to assist treatment plant operations. Ultimately, it is unclear how the methods and results presented in this manuscript could be used to improve operation. I believe the manuscript could be improved and provide the following comments:
- General – The introduction and literature review could be improved. I would suggest having additional discussion on how wastewater treatment plants are currently using data to inform operations and decisions. There are examples of digital twins being used for wastewater treatment plants that should be noted here.
Added more information
- Line 33 – Check numbers. 1.2 million people?
I am sorry for my mistake. Yes, 1.2 million people.
- Lines 56-57 – Are these specific data points?
Added more information
- Section 2 – There are many very short subsections that do not add anything to this manuscript. For example, Section 2.5 just states that results were interpreted and new knowledge was evaluated. This section should be re-written and condensed.
Improved the section.
- Sections 2.7 – 2.10 – Are these sections all referring to citation [7]? If so, they should be subsections or a list as part of Section 2.6.
I am sorry. Yes, I corrected them.
- Section 2.11.3 – Provide citation(s) for hypothesis testing.
Provided citation.
- Section 3.1.1 – This section should be moved to the beginning of Section 2. This provides background to the wastewater treatment plant and gives the reader some idea of why you are writing this manuscript.
Moved to Section 2.2.1.
- Lines 357-358 – Did you look into potential reasons why effluent BOD was higher during 2014 than in other years? If you are using these values for a prediction model, you should include the higher BOD values to ensure that your prediction model accurately captures them. A prediction model that underestimates effluent pollutant values could potentially harm wastewater treatment plant operations.
Further investigation showed that the plant was modified in 2014. Therefore, the big data analytics results can show when the data are not normal.
- Section 4 – What are you trying to say with this section? It is not clear why you are presenting Figure 23 or what the significance of the results presented are. This section requires more discussion of results to illustrate how they are relevant to the overall goal of your paper.
I removed some irrelevant information, and the section shows the stability of the dataset after removing the outliers and the evaluation of the selected dataset.
- General – It is unclear what the goal of this manuscript is. Specifically, how does the analysis and results shown in this paper assist wastewater treatment plant operations? How is this being used to enhance operational performance of the wastewater treatment plant? Are these results going to be used to develop a predictive model to assist the wastewater treatment plant? Link back to the abstract in the paper and clearly frame your results and discussion.
Big data analytics can assist WWTPs by finding hidden information, visualizing datasets, and creating the appropriate dataset for the next step, model prediction. (I improved the abstract, introduction, discussion, and conclusion.)
Reviewer 2 Report
The present study presents different tools for big data analysis and visualization for WWTPs in order to improve operational methods. The paper presents interesting results and conveys new messages. The paper is of a satisfactory level, but in my opinion it is not well organized.
Comments
- Abstract: It looks strange to me to use references in the abstract section
- Introduction: Introduction section is limited. More information should be given about the current data analysis in WWTPs. The novelty of the present work should be presented. The aim of the study should be further analyzed.
- Figure 2: The processes described in the text (preliminary treatment, primary clarification, nitrifying activated sludge treatment incorporating biological phosphorus removal, ultraviolet disinfection, and effluent pumping) should be noted in the figure using e.g. a legend.
- Figure 3: The content presented in Figure 3 should be further analyzed in the text.
- Figure 4: Please improve the figure’s quality.
- Data Understanding: The Madison Metropolitan Sewerage District 50-Year Master Plan was reviewed to research the background, goals, and WWTP processes. Explanation about the Madison Metropolitan Sewerage District 50-Year Master Plan should be given.
- 5. Evaluation: The impact of new knowledge was evaluated. How?
- 11. Statistical Analysis: Explanations of Pearson's correlation coefficient, normal distribution, and hypothesis testing are given; however, no explanation is given for the boxplot.
- 3.1.1 to 3.1.10: This information should be given in the Materials and Methods section.
- 2.1. Data visualization: Explanations should be given for the abbreviations T.S.S., T.P., and T.K.N.
Author Response
The present study presents different tools for big data analysis and visualization for WWTPs in order to improve operational methods. The paper presents interesting results and conveys new messages. The paper is of a satisfactory level, but in my opinion it is not well organized.
Thank you very much for your suggestions
Comments
- Abstract: It looks strange to me to use references in the abstract section
Moved the references to the introduction section.
- Introduction: Introduction section is limited. More information should be given about the current data analysis in WWTPs. The novelty of the present work should be presented. The aim of the study should be further analyzed.
Information on the current data management programs in WWTPs was added to the introduction section.
- Figure 2: The processes described in the text (preliminary treatment, primary clarification, nitrifying activated sludge treatment incorporating biological phosphorus removal, ultraviolet disinfection, and effluent pumping) should be noted in the figure using e.g. a legend.
Moved the treatment process information to Section 2.2 (Background and Goals).
- Figure 3: The content presented in Figure 3 should be further analyzed in the text.
Added more explanation
- Figure 4: Please improve the figure’s quality.
Improved
- Data Understanding: The Madison Metropolitan Sewerage District 50-Year Master Plan was reviewed to research the background, goals, and WWTP processes. Explanation about the Madison Metropolitan Sewerage District 50-Year Master Plan should be given.
Improved
- 5. Evaluation: The impact of new knowledge was evaluated. How?
Hypothesis testing
- 11. Statistical Analysis: Explanations of Pearson's correlation coefficient, normal distribution, and hypothesis testing are given; however, no explanation is given for the boxplot.
Box plot explanation was added
- 3.1.1 to 3.1.10: This information should be given in the Materials and Methods section.
Moved
- 2.1. Data visualization: Explanations should be given for the abbreviations T.S.S., T.P., and T.K.N.
Explained the abbreviations
Round 2
Reviewer 1 Report
Thank you for taking the time to respond to the previous comments. While the manuscript has been improved, additional discussion of the results is warranted. Comments are provided below:
General – The manuscript needs to be checked for grammar throughout.
Line 45-46 – Operation status of what? A process in a wastewater treatment plant? This is unclear.
Line 57 – Obtainability and reliability of what? Data? This is unclear.
Figure 1- What is NSWTP? You have not introduced the name of the wastewater treatment plant that you are analyzing. Also – once you introduce the abbreviation NSWTP you should be consistent and use it throughout the manuscript.
Figure 1 – you show this figure twice.
Figure 2 – This figure would benefit from larger building labels, since these names are referenced in Figure 4b.
Figure 3 – What is the goal of showing this? You already mentioned that there are over 1 million data points. If this Figure serves a different purpose please make it clear. Otherwise, this can be deleted.
Figure 5 – This looks very similar to Figure 2. If so, you can delete Figure 5 and just refer to Figure 2.
Section 2.2 – You can condense most of the information regarding processes in WWTP to one section. For your analysis, discussing the processes in detail is somewhat irrelevant.
Line 361-364 – I would reword points 1-4. Use words like “Determine” instead of “Figure out”. Do not use “you” or “your” in formal writing.
Lines 371-376 – There are specifics regarding what data was used that the reader does not need to know. The reader does not need to know what the names of the NSWTP data and effluent data were. They just need to know that they were used and evaluated.
Section 3 – Make sure section headings are numbered properly. It is hard to tell if they are with track changes.
Figure 12 – It is not necessary to show the correlation of Effluent BOD to Effluent BOD.
Table 4 – This can be removed. You have already stated that you are performing normality testing and you do not need to show the code.
Lines 451 – 452 – Please explain what is meant by “heavier” and “lighter” tails.
Lines 473-475 – How does the data from 2014 relate to the normality of the data? Figures 13 and 15 evaluate the normality of the data, but 2014 is specifically mentioned in terms of making the data “not normal”?
Lines 481 – 483 – What system failure are you referring to and how specifically does the information from Figure 17 inform an operator regarding system failure? Also, how does this data reduce operation and maintenance cost? More explanation is needed.
Discussion section – Discuss these results more. Why is it important that the dataset is stationary? What does that mean for creating predictive analytics for WWTPs? If you are trying to demonstrate how this analysis can be used to inform operations and maintenance you need to have a more detailed discussion in this section so your readers understand.
Author Response
Thank you very much for your time and suggestions.
I have attached my point-by-point responses to your comments here.
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The manuscript has been improved by the changes made by the authors.
However, I have to mention that the article is not adequately referenced.
Author Response
Thank you very much for your suggestion. I really appreciate it.
I added more references as you suggested.
Round 3
Reviewer 1 Report
The manuscript is much improved with the most recent rounds of edits. After formatting and a final spelling and grammar review, I believe it will be ready for publication.