An Efficient and Automated Image Preprocessing Using Semantic Segmentation for Improving the 3D Reconstruction of Soybean Plants at the Vegetative Stage
Round 1
Reviewer 1 Report
The authors present the manuscript entitled “An Efficient and Automated Image Preprocessing Method for Three-Dimensional Reconstruction of Soybean Plants at the Vegetative Stage Using Semantic Segmentation”, a study on the application of semantic segmentation models to improve image preprocessing for 3D reconstruction of soybean plants. While the information is presented in detail, the main issue is that the paper is very dense and repetitive; in its current state it is hard to read and difficult to understand. Several improvements in clarity, structure, and specificity would make it more comprehensible.
The title is too lengthy and could be more concise. For example, consider using “3D” instead of “Three-Dimensional”.
The abstract and introduction sections provide a comprehensive overview of the challenges in 3D plant reconstruction and the approaches taken in the present study to address these challenges. However, there are areas where clarity, brevity, and structure could be improved. While detailed and thorough, the introduction’s length might be reduced for clarity and to avoid redundancy.
The Materials and Methods section is comprehensive and outlines the methodological process undertaken in the study. The details provided give a clear idea of the steps involved, the tools used, and the specific procedures followed. However, the text could be shortened and would benefit from improved clarity and organization. In general, the description of the methods is very dense. Consider using bullet points or tables in the main paper for better visualization. Furthermore, some redundancies and potential inconsistencies need to be addressed. Figure 1 should be more informative.
The results and discussion sections are good, although they could be more closely connected to the methodology. Moreover, in Figure 9, there are clear differences between models. How do the authors explain these differences given the outstanding metrics obtained? Furthermore, consider adding potential future implications or applications at the end of the discussion or conclusions sections to provide context for the study’s significance.
Finally, figure and table captions should be self-explanatory. Each figure and table should have a clear description and interpretation, allowing the reader to understand the significance of the data without referring to the text. Therefore, all captions in the paper should be expanded to explain what the figure or table contains.
Specific comments
11-12: Consider rephrasing “crucial area” to “significant field” to emphasize its importance.
19-20: Define what is meant by “model matching accuracy calculation” for clarity.
21: Specify what “vegetative stage (V)” means to avoid ambiguity.
25-26: The phrase “problems of difficult image preprocessing” sounds awkward. Recommend: “challenges of image preprocessing.”
28-29: The abstract ends very conclusively; consider adding potential future implications or applications to provide context for the study’s significance.
46-49: Consider using bullet points for the different methods to improve readability.
60-62: This sentence is repetitive. Consider merging it with the previous lines for brevity.
75-77: Highlight the primary reasons for choosing the image-based multi-view stereo vision method more explicitly.
80-82: The advantages listed here are already hinted at in lines 60-62. This repetition could be avoided.
85-86: Be explicit about the origin of the noise.
92-99: The description of the methods is dense. Consider using bullet points or tables in the main paper for better visualization.
110-112: Specify why it’s beneficial to perform denoising on 2D image data versus after generating the 3D point cloud.
115-117: The sentence is verbose. Suggest: “The study aimed to improve image preprocessing efficiency by using semantic segmentation on raw soybean plant images.”
125-128: While comprehensive, the concluding statement might benefit from a more concise summary of the key findings.
141: Specify the country or region for readers who might not be familiar with Northeast Agricultural University’s location.
146: A clarification might be needed as to why the experiment materials were placed 20 cm underground.
148: “multi-view stereo vision as the foundational technology” – Consider rephrasing for clarity.
150: A little more detail or context about the significance or reason for the multi-view stereo vision method might be beneficial.
160-167: These steps are clear, but a detailed diagram or flowchart might help visualize the procedure.
175-178: It would be clearer if the specific techniques or tools used for filtering, smoothing, and segmentation were mentioned here.
188: “labeled as a whole and marked as ‘soybean’” – Does this mean the plants and the calibration pad were both labeled ‘soybean’? Clarification might be needed.
196-200: These links can be formatted in a more organized manner, perhaps as a list or table, to improve readability.
201-218: The explanation of the DeepLabv3+ method is comprehensive. However, consider summarizing or streamlining for brevity, as not all readers might be interested in the intricate details of the method.
220-234: Similar to the DeepLabv3+ method, the explanation for Unet is detailed. Consider a more concise description, focusing on its relevance to the study.
236-252: Again, while the PSPnet method’s description is thorough, a more concise overview might be beneficial.
254-273: The HRnet method’s details are extensive. Aim for a more streamlined explanation, focusing on its unique features and relevance to the study.
276-278: The term “mask processing” is introduced without any prior explanation here. The significance of this process and its steps need to be elaborated for clarity.
285-288: The description of “camera calibration” and its purpose within the context of the 3D reconstruction process could be more concise and clear.
293-298: These steps describe various processes in the modeling but could benefit from clearer subsections or bullet points for better readability and comprehension.
331-345: Redundancies are present here. For instance, the definition and purpose of point cloud registration are reiterated multiple times. It would be clearer to state its significance once and then move into the specific methods used.
347-349: The concept of ‘coarse registration’ is introduced, but it’s repetitive since it’s defined again in line 351.
353-366: The description of the RANSAC algorithm would benefit from bullet points or a numbered list for each step. This format can aid in clarity and ease of understanding.
370-381: Like the RANSAC, the ICP algorithm is another area where step-by-step formatting would enhance comprehension.
414: The title “Evaluation index” is ambiguous. A more descriptive title might better convey the content of the section.
417-418: The formula for “mean Pixel Accuracy” is mentioned as “mAP” when it should be “mPA.”
425: This is a clear example of unnecessary information. Do the authors employ RMSE? If not, why is it explained here? The manuscript is unnecessarily long.
427: Specify why RMS was chosen as the criterion rather than other measures. Provide references.
434-438: Consider explaining the significance of calculating both the mean distance and the standard deviation.
448-450: Equations (23) and (24) introduce the criteria for αi without detailing the significance of αi or its role in the evaluation process.
506-514: The use of “C0” in describing threshold levels is repeated but not adequately explained here or in the material and methods. Explain the rationale behind these thresholds.
532-537: While the importance of semantic segmentation is stressed, a clearer linkage between the identified problem (tedious data preprocessing) and the proposed solution (semantic segmentation) would be helpful.
539-562: The authors mention various other studies and methodologies. It might be beneficial to highlight more specifically how their study differs from or improves upon these mentioned methods.
584-599: This section provides information on the variability of soybean plants. However, it is very long and could be shortened.
604-618: The conclusion is comprehensive but could be enhanced with a forward-looking statement, discussing the broader implications of the study and potential future research avenues.
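Regarding the comment on lines 417-418, a minimal sketch of how mean Pixel Accuracy (mPA) is commonly computed may help distinguish it from mAP: mPA averages the per-class pixel accuracy over the classes present in the ground truth. The function name and the two-class pixel counts below are hypothetical, chosen only for illustration, and are not taken from the manuscript.

```python
def mean_pixel_accuracy(confusion):
    """Mean Pixel Accuracy (mPA) from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. Classes with no ground-truth pixels are
    skipped so they do not distort the average.
    """
    per_class = []
    for i, row in enumerate(confusion):
        total = sum(row)  # all pixels that truly belong to class i
        if total > 0:
            per_class.append(row[i] / total)  # pixel accuracy of class i
    if not per_class:
        raise ValueError("confusion matrix contains no pixels")
    return sum(per_class) / len(per_class)


# Hypothetical counts for two classes (background, soybean); illustration only.
example_counts = [
    [90, 10],  # true background: 90 correct, 10 predicted as soybean
    [5, 95],   # true soybean: 5 predicted as background, 95 correct
]
mpa = mean_pixel_accuracy(example_counts)
```

With these illustrative counts the per-class accuracies are 0.90 and 0.95, so the average is 0.925.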
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
"An Efficient and Automated Image Preprocessing Method for Three-Dimensional Reconstruction of Soybean Plants at the Vegetative Stage Using Semantic Segmentation" is an interesting research work. The following modifications are needed to improve the paper:
1- Change the title to "An Efficient and Automated Image Preprocessing using Semantic Segmentation for Improving the Three-Dimensional Reconstruction of Soybean Plants at the Vegetative Stage"
2- When I read the abstract, I became confused because it is very hard to understand the authors' objectives. The authors first indicate that their objective is to improve image preprocessing to speed up 3D image construction by using semantic deep learning. Later, their objective is to compare three different models.
Finally, they indicate that semantic segmentation plays an important role in improving image preprocessing, although preprocessing is a step that precedes the segmentation task in image processing!
Please rewrite the abstract.
3- Try to reduce the keywords from three words to a maximum of two representative words.
4- The authors should concentrate on the literature that tackles image preprocessing before building the 3D models rather than on methods of 3D construction.
5- Please place the links in subsection 2.4.2 in the references. It is recommended to add references to the literature that introduces these semantic segmentation models, and not to the code as you did in the following subsections.
There is no need to explain each model in a separate subsection, the references are enough for the user if more information is needed.
6- Looking at the graphs in Figure 5, one can easily notice that the models converge very fast to high accuracy (within the first two epochs). This means that the solutions converge to local optima instead of the global ones.
There are many possible reasons; it could be because of the data or the parameter settings of the models.
7- There is no need for the confusion matrices if you have only two classes.
The paper requires moderate English language modification.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
In this revised manuscript, the authors have addressed the concerns previously raised. They have enhanced the readability and clarity of the study.
While the English is acceptable, certain passages could be phrased more smoothly. For example, line 74, which reads "representation of a plant by utilizing a multi-view image sequence as the basis," could become "representation of a plant using a multi-view image sequence." Hence, the manuscript requires thorough English revision for enhanced clarity.
Reviewer 2 Report
The paper "An Efficient and Automated Image Preprocessing Method for Three-Dimensional Reconstruction of Soybean Plants at the Vegetative Stage Using Semantic Segmentation" has been modified and improved by the authors according to the reviewers' remarks.
Minor English modifications