OptimalNN: A Neural Network Architecture to Monitor Chemical Contamination in Cancer Alley
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript "OptimalNN: A Neural Network Architecture to Monitor Chemical Contamination in Cancer Alley" by Uchechukwu Udeji and Martin Margala contributes to the field of environmental monitoring and pollution control through the innovative use of neural network models for the segmentation of chemical spills. Below is a detailed assessment of its main contributions, strengths, areas for improvement, and suggestions for enhancement.
Main Paper Contributions:
Innovative Neural Network Models for Chemical Spill Segmentation: The paper introduces UNET and UNETR models designed for image segmentation of chemical and oil spills, addressing a critical environmental issue in the Cancer Alley region. This approach is significant for its potential to improve early monitoring, estimation, and cleanup of spills.
Utilization of Specialized Datasets: The research leverages the CSIRO dataset and the Oil Spill Detection dataset, employing a specialized filtering technique to enhance detection accuracy. The adaptation and effective use of these datasets for training neural network models represent a key contribution to the field.
Integration of Mixed Precision for Efficient Model Training: By incorporating mixed precision, the study optimizes the model training process, resulting in accelerated computation and reduced memory usage. This technique, alongside the proposal to use FPGA architecture for further acceleration, stands out for its potential to enhance computational efficiency.
Positive Aspects:
Addressing a Significant Environmental Challenge: The focus on monitoring chemical contamination in a highly polluted area with severe health implications demonstrates the research's relevance and potential impact on public health and environmental protection.
High Model Performance: Preliminary simulation results indicate that the proposed models achieve substantial improvements in prediction accuracy compared to existing research, highlighting the effectiveness of the neural network architecture and training strategy employed.
Innovative Use of FPGA Architecture: The proposal to use FPGA architecture for model optimization is noteworthy for its promise to significantly reduce power consumption during model training and inference, aligning with the growing emphasis on sustainable AI practices.
Observed Deficiencies:
Dataset and Model Generalizability: While the paper provides compelling results, it could benefit from a broader discussion on the generalizability of the models to different environmental conditions and spill types. Exploring the models' performance on diverse datasets would enhance the understanding of their applicability.
Detailed Methodology on FPGA Implementation: The manuscript mentions the use of FPGA architecture but lacks comprehensive details on the implementation process and the specific gains in terms of power efficiency and processing speed. Expanding on these aspects would strengthen the paper.
Suggestions for Improvement:
Enhance Dataset Description: Provide a more detailed description of the datasets used, including their characteristics, limitations, and the process for generating ground truth labels. This information would help readers assess the robustness of the model training.
Broaden Comparative Analysis: Include a wider comparison with other state-of-the-art models in environmental monitoring and spill detection, particularly those that have been applied to similar challenges. This would contextualize the proposed models' performance within the broader research landscape.
Expand on FPGA Architecture Details: Elaborate on the FPGA architecture, including specific optimizations, the conversion process from the neural network model to FPGA, and quantitative benefits in terms of power efficiency and processing speed.
Further Validation on Diverse Environmental Conditions: Conduct additional validation studies to evaluate the models' performance under various environmental conditions and spill characteristics. This could involve testing on additional datasets or simulated scenarios to demonstrate the models' adaptability.
Comments on the Quality of English Language
None
Author Response
Please see the attachment. Thank you.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear author,
I have read the article "OptimalNN: A Neural Network Architecture to Monitor Chemical Contamination in Cancer Alley" and found it to be well written. However, the following points detract from its quality:
1. The abstract does not contain any numerical or quantitative data.
2. Figure 1 must display and provide an explanation of the RGB colour spaces.
3. Figure 2 illustrates the U-Net design and its representation of the flow of the cells.
4. The technical explanation for figure 4 is lacking, indicating a need for justification.
5. Figure 5 lacks technical details and is not referenced in the text.
6. Table 1 lacks any explanation. Table 1 requires the provision of justification and comparison results.
7. If the author were to compare his results with those of other researchers, it would be ideal.
8. The conclusion has to be restated.
9. It is important to avoid using references that are not needed or are outdated.
Author Response
Please see the attachment. Thank you.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper presents a specialized neural network architecture for monitoring chemical pollution in Cancer Alley. Although the work may be interesting, there are several aspects that need clarification in order to demonstrate the contribution and real application of the proposal, as well as issues of clarity and format.
Content:
In the introduction, the scientific contribution is not clear. Works developed for the specific application proposed by the authors, "the monitoring of chemical pollution in Cancer Alley," should be reviewed and discussed. That is, what other works have been proposed for the same application? What are their disadvantages and/or advantages? From the detected research opportunities, which ones will be addressed in this work?
On the other hand, as emphasis is also placed on FPGA implementation, works proposing implementations in FPGA or other types of hardware should be reviewed and discussed.
The title of section 2 is segmentation, but nothing specific about segmentation is presented. In fact, the equations governing the different color maps are neither given nor described, the advantages of each color map are not explained, and it is not stated why a certain map might be better as input for a segmentation algorithm. In this sense, what segmentation algorithm is used? Why is that one chosen over others?
It is not clear what modification the authors propose in the architectures presented in figures 2 and 3. Both structures should be shown and the difference clearly highlighted.
The work indicates an optimal neural network but does not mention which optimization algorithm is used nor how it is used. In fact, no evidence of the optimization process is shown either.
Table 1: Graphs of how accuracy behaves after each epoch should be included to observe its evolution.
A table comparing quantitatively and qualitatively the works presented in table 1 should be added, including, for example, complexity, computation time, training time, need for specialized software, etc.
In addition to accuracy, other metrics to evaluate the performance of the neural network should be added, including confusion matrices.
The authors never describe and justify why the proposed network is the best for the application they propose. In fact, they also do not indicate what are the characteristics of the problem they want to address, in order to propose a specialized neural network that can solve each of those needs. Thus, the proposed network, in addition to being optimized, will be specialized.
In the FPGA implementation, the FPGA resources that were used in this implementation should be clearly reported.
A photograph of the FPGA card used, as well as the resources it has, should be added.
It is observed that there is no contribution in the FPGA implementation. That is, only an existing tool was used to synthesize the structure of the neural network; besides, this has not been verified as indicated in section 7.1. An improvement that the authors propose in the FPGA architecture could be a notable contribution. Here, different structures should be proposed and compared to reduce resource consumption, energy, or computation time. However, this is not reported in the article.
The conclusions are relatively simple and do not show the qualitative and quantitative findings of the work. Nor is it correct to mention FPGA development as a conclusion when it has not been fully verified.
Format:
The font type and size should be improved, as this would substantially improve the quality and clarity of the figures.
The order of appearance of the references is not ascending, which is a basic error.
Conclusion:
Overall, this reviewer recommends rejecting the work so that all observations can be addressed and a more complete, clear, and functional version can be prepared.
Comments on the Quality of English Language
Minor editing of English language required
Author Response
Please see the attachment. Thank you.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
Udeji and Margala utilized UNET and UNETR neural network models, combined with a mixed precision approach and an FPGA architecture, to enhance the accuracy of detecting chemical and oil spills. Their findings and results are compelling, but some revisions are necessary to address the following issues:
1. Avoid using undefined abbreviations in the abstract.
2. In the abstract, the authors claim that their results show "substantial improvements compared to existing research in this domain." Please provide more specific details.
3. In the third paragraph of the introduction, reduce the use of the word "segmentation" to improve the readability of the text. It may be beneficial to merge the third and fourth paragraphs since they cover similar topics.
4. Review the manuscript to eliminate grammar and spelling errors, such as "deploy a neural network models" (p.2, l.81), among others.
5. Figure 3 appears to be absent from the manuscript discussion.
Comments on the Quality of English Language
Minor editing of the English is required.
Author Response
Please see the attachment. Thank you.
Author Response File: Author Response.pdf
Reviewer 5 Report
Comments and Suggestions for Authors
The authors describe a solution that may be important for health protection in "Cancer Alley" in Louisiana. They aim to improve the possibility of detecting oil contamination.
However, their work has a few significant drawbacks.
The proposed methods are developed to detect oil spills in the sea.
So it may not be evident if they can also be used to detect oil contamination in rivers, which is most important in the "Cancer Alley."
I would suggest https://doi.org/10.3390/app12084016 as a source discussing that issue.
The authors claim (lines 20-21), "The results obtained from our study demonstrate substantial improvements compared to existing research in this domain."
However, that claim does not seem to be supported by the paper.
The authors' main contributions are introducing mixed precision into the neural network used in the research and preparing the FPGA-based accelerator.
The main "hard" results are shown in Table 1. However, it is difficult to state that they prove the "substantial improvement." The only clearly visible result is shortening the training phase (50 epochs instead of the ~500 epochs). However, it is impossible to compare the accuracy.
The works used for comparison ([6] and [20]) report accuracy using the well-defined IoU metric. Those values are reported in both articles (mean IoU of 65.06% in [6] and 71.12% in [20]). This paper does not provide any accuracy measures in the first two rows of Table 1 (related to [6] and [20]). Instead, it gives a very high "testing accuracy" of 94.2% or 87.3% but does not explain how this parameter was defined.
Therefore, it is difficult to say that the paper "demonstrates substantial improvements compared to existing research in this domain."
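To illustrate why an undefined "testing accuracy" cannot be set against a reported IoU, here is a minimal sketch in plain Python. It assumes, purely for illustration, that "testing accuracy" means per-pixel accuracy; the manuscript does not say so. The masks, the function names `pixel_accuracy` and `iou`, and all numbers are hypothetical and are not taken from the paper or from [6] and [20].

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)


def iou(pred, truth, cls=1):
    """Intersection over union for a single class label."""
    inter = sum(p == cls and t == cls for p, t in zip(pred, truth))
    union = sum(p == cls or t == cls for p, t in zip(pred, truth))
    return inter / union if union else 1.0


# Hypothetical flattened 100-pixel mask: 10 spill pixels (1), 90 background (0).
truth = [1] * 10 + [0] * 90
# Hypothetical prediction: finds 5 of the 10 spill pixels, with no false alarms.
pred = [1] * 5 + [0] * 95

acc = pixel_accuracy(pred, truth)         # 95 correct pixels out of 100
spill_iou = iou(pred, truth, cls=1)       # 5 shared spill pixels / 10 in the union
background_iou = iou(pred, truth, cls=0)  # 90 shared background pixels / 95 in the union
mean_iou = (spill_iou + background_iou) / 2

print(f"pixel accuracy: {acc:.2%}")
print(f"spill IoU:      {spill_iou:.2%}")
print(f"mean IoU:       {mean_iou:.2%}")
```

In this made-up case the prediction reaches 95% pixel accuracy while the spill-class IoU is only 50% and the mean IoU is about 72%, because background pixels dominate the image. The two kinds of figures are therefore not interchangeable, which is why the definition of the reported parameter matters.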
Regarding the FPGA-based accelerator, the presented results are very preliminary.
Section 7 describes mainly the properties of the tools used to generate the FPGA implementation, but there are no concrete results (e.g., resource utilization, maximum clock frequency, throughput, and processing latency). There is only a vague statement in section 7.1: "Although we were able to generate an FPGA design for our model, we haven't been able to full verify the design. Currently, resource usage by our FPGA design suggests low power consumption by the Pynq Z1 board."
There is yet another contribution reported in the conclusions: "A noteworthy contribution is the creation of reusable labeled ground truth images for the CSIRO dataset, a task not previously undertaken."
However, it also does not justify claiming "substantial improvement". Are those ground truth images published?
I also have some remarks regarding the editing aspects of the paper.
The References section is prepared inconsistently. The reference [10] is not cited at all (probably should be cited somewhere in lines 74-75).
The references [34]-[36] are also not cited. [36] points to an empty document.
The reference [9] points to an arXiv publication, while the peer-reviewed version is also available as https://doi.org/10.1007/978-3-319-24574-4_28 .
The references are formatted inconsistently. For example, [32] does not provide any information on how it was published, and [33] also includes keywords.
The caption of Figure 2 should indicate that the figure is based on Figure 1 from [9].
Regarding Figure 1, what is the point of comparing an image of a goldfish (RGB from a standard camera) with the oil spill image from SAR?
Should there not rather be a comparison of SAR images of clean water and an oil spill?
There are statements about the energy efficiency of different networks in lines 120-126.
However, it is unclear if the authors have made the energy efficiency comparisons themselves (where is that described?) or if they are quoting results obtained by others. If the latter, there should be an appropriate reference.
Author Response
Please see the attachment. Thank you.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The article "OptimalNN: A Neural Network Architecture to Monitor Chemical Contamination in Cancer Alley" discusses the development of neural network models based on U-Net and U-Net Transformer (UNETR) architectures for the image segmentation of chemical and oil spills. Utilizing datasets such as CSIRO and Oil Spill Detection, the authors apply mixed-precision techniques and propose using FPGA architecture to optimize the model training process and improve streaming latency.
Positive Aspects
Technological Innovation: The study explores the use of advanced neural network architectures and FPGAs, which are relevant for rapidly and effectively detecting environmental contaminations.
High Training Accuracy: The models achieve impressive training accuracies of 95.35% and 91% for the Oil Spill and CSIRO datasets, respectively.
Resource Optimization: Implementing mixed-precision techniques demonstrates efficient use of computational resources, reducing training time and memory consumption.
Suggestions for Minor Review
Improvement of the UNETR Model: Investigate the causes of the lower accuracy of the UNETR model and explore adjustments in architecture or hyperparameters to enhance its performance.
Framework Stability: Explore more stable alternatives to the FINN and HLS4ML frameworks or collaborate with the developers of these frameworks to identify and correct stability issues.
Expansion of Training Data: To improve model generalization for the UNETR model, consider expanding the training dataset or using data augmentation techniques.
Comparison with Other Architectures: Include comparisons with other neural network architectures that have been applied to similar problems to contextualize the effectiveness of the U-Net and UNETR models.
Environmental Impact Analysis: Expand the discussion on the practical impact of implementing these models in real-world detection and mitigation of chemical spills, possibly including case studies or field simulations.
Comments on the Quality of English Language
None
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The article has been slightly improved. However, the main comments have not been analyzed or discussed adequately.
In the response letter, it is not sufficient to state that the comment has been addressed; it is always better to clearly indicate what has been added and why it was decided to add that. A simple example is comment one:
[C1] In the introduction, the scientific contribution is not clear. Works developed for the specific application proposed by the authors, "the monitoring of chemical pollution in Cancer Alley," should be reviewed and discussed. That is, what other works have been proposed for the same application? What are their disadvantages and/or advantages? From the detected research opportunities, which ones will be addressed in this work?
[R1] Thanks for this observation. We have made several changes to the introduction to include answers to this comment, and to reflect the problems being addressed in this study
The authors have answered in R1, but looking at the introduction of the article, it is not apparent which articles were added and why. Only the enumeration was changed, and two lines of text were added that are irrelevant to the question at hand.
The same goes for comment two. A discussion of other works that have presented FPGA architectures was requested, and the authors only added a simple table in section 7 where no comparison with other literature is made. Additionally, it was recommended to add a photograph of the experimental setup (PC, software, FPGA card, etc.), and this was completely omitted.
Moreover, even the format was not improved. For example, the figures are still of poor quality: the text in some figures is not legible, and in others it is excessively large. Could the authors standardize the figure text relative to the text size of the article?
As with these comments, there are many more. Therefore, this reviewer recommends clearly addressing the comments made in the previous review.
Comments on the Quality of English Language
Minor editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The authors responded to all concerns and revised the manuscript accordingly. My recommendation is to accept.
Comments on the Quality of English Language
Minor editing of the English may be required.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 5 Report
Comments and Suggestions for Authors
The authors have considered my remarks, and their answers and the modifications introduced are mostly acceptable.
A few remarks still persist:
1. In lines 24-25, the authors write: "The results obtained from our study demonstrate improvements in streaming latency on FPGA when compared to existing research in this domain."
a) The term "streaming latency" does not seem to describe the properties of the proposed network correctly. Later on, the authors use the better term: "inference latency" (line 425).
b) The text of the article does not further support the above statement. The inference latency is only shown in Table 2 (by the way, it is not mentioned in the caption). However, it is just a single number. There are no latencies obtained in other solutions listed for comparison. How can the claim of improvement in latency be justified?
c) The above improvement is mentioned in the abstract but not in the conclusions. I would expect that conclusions list all achievements stated in the abstract and summarize how the article's content proves their truthfulness.
2. Regarding response [R9], the link to the peer-reviewed version is "https://doi.org/10.1007/978-3-319-24574-4_28" (without any space inside; please note that text-to-PDF conversion sometimes breaks links). It points to:
Ronneberger, O., Fischer, P., Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham.
A few editorial remarks:
1. Line 21 - "50epochs", a space should be inserted.
2. Figure 1 - legends at vertical axes of 3D charts are partially clipped.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
Figures still need formatting. Please use the same font and size for all figures.
Comments on the Quality of English Language
Minor editing of English language required