YOLOLens: A Deep Learning Model Based on Super-Resolution to Enhance the Crater Detection of the Planetary Surfaces
Round 1
Reviewer 1 Report
This manuscript presents a deep learning model for crater detection on lunar images, which combines a “Generator” for enhancing the image resolution and the YOLOv5 for crater detection. The described work is interesting. Further clarifications and improvements in terms of methodology and experimental analysis are required.
Below are some specific comments for the authors to consider:
- “craters detection” should be changed to “crater detection” throughout the manuscript.
- Line 151: You Look Only Once (YOLO) -> You Only Look Once (YOLO).
- Line 200: The manually derived catalogue [43] may not necessarily be 100% accurate, as many craters in poorly illuminated images are difficult to recognize even for the human eye. Please make sure your ground-truth data are further checked and accurate. Figures showing examples of the detected craters alongside the corresponding ground-truth data would be helpful.
- Page 4, Figures 1 and 2: Apart from the network models used for the “Generator” and YOLOv5, please elaborate on the novel aspects of your proposed network model to justify the statement “A new deep-learning model …” in the title.
- Page 7: In terms of methodology, how do you deal with craters that cross the boundary of an image tile in the prediction step? (A sketch of one common approach appears after this list.)
- Page 9: Please also add the ground truth data for each case in Figure 4 for comparison.
- Page 10: Please add some discussion about the minimum size of craters that can be detected using your proposed algorithm. What is the detection rate for craters with diameters of several hundred meters?
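Regarding the tile-boundary question above: a common approach, which may or may not be what the authors do, is to run predictions on overlapping tiles and merge the duplicate detections across tiles with non-maximum suppression. A minimal sketch (the tile overlap assumption and the IoU threshold are illustrative values, not taken from the paper):

```python
def iou(a, b):
    # a, b: boxes (x1, y1, x2, y2) in global mosaic coordinates
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def merge_tile_detections(detections, iou_thresh=0.5):
    # detections: (box, score) pairs already shifted into global coordinates
    # using each tile's offset. With overlapping tiles, a crater cut by one
    # tile edge is seen whole by a neighbouring tile; greedy NMS keeps the
    # highest-scoring copy and drops the duplicates.
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```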
Author Response
Our reply is attached.
Thanks again.
Author Response File: Author Response.pdf
Reviewer 2 Report
The purpose of this paper is to present and test a new deep-learning model that seems to identify km-scale craters automatically with improved reliability. The model combines a deep-learning algorithm trained on the manual Robbins lunar catalogue with an algorithm that improves the image resolution by roughly 2x. The model is then applied to regions not used for training and verified against the Robbins catalogue. The testing demonstrates that the combination of image-resolution improvement and the deep-learning algorithm produces more reliable results than previous automated tools, in both identification and diameter precision. The authors further test the method by artificially reducing the image resolution, to potentially demonstrate that it could work on lower-resolution imaging.
Overall, I think the topic of this paper is worthy of publication, but the paper requires additional information before I can recommend publication. From my point of view, the current manuscript depends too much on previous publications and on prior knowledge of the reader. As a scientist who studies craters but does not use AI, I found that I could not follow some of the details, since they were glossed over in this write-up. I detail these below, but I think the most important is how the imaging is improved in resolution, together with some non-annotated examples of that improvement. Furthermore, there seems to be some circularity in the argument for use on lower-resolution imaging, which I detail more below.
Detailed comments:
Line 22: There are many references for the collisional history of the Moon and Solar System, thus if just one is cited then an “e.g.,” should be added to the citation.
Lines 22-27: These two sentences repeat one another. I suggest deleting the first one (starting with “Since the impact…”) and combining all of the citations into the remaining sentence (starting “The crater-based dating…”).
Line 80: I would disagree with the statement that “computer-assisted methods provide more accuracy than manual annotations”. Even in this paper the AI results are compared to a manual database, which is proposed to be the “ground truth”. Moreover, it is unclear what is claimed to be more accurate: identification, size, location, etc.? Please either specify with a rewrite or delete.
Line 97: Recommend replacing “planets” with “bodies” to be more general (i.e., include moons, asteroids, etc).
Lines 106-107: I think some of this paragraph overstates the current state/usefulness of AI (otherwise why would researchers still be developing/testing), but this sentence in particular needs a caveat. It is one example, and it is not cratering, thus it cannot yet be fully generalized to cratering. “with equivalent or better performance than manual experts” should be transformed to something like: “with performance continuing to improve in comparison to manual expert annotations”.
Lines 133-136: I don’t remember seeing an application to Mercury, only reduced-resolution Moon images? Please update this to accurately reflect what the paper presents.
Lines 142-149: This section, devoted to the image-resolution improvement, seems too short to me and uses terminology that not everyone may follow. There seem to be a lot of steps in Figure 1 that are not explained in any detail. I am not suggesting a full rewrite of references 59 & 60, but a more detailed summary than what is provided here. E.g., what type of interpolation is done? How does the RLNet help with the interpolation? And many more. Also, I would like to see at least one unannotated example of how the imaging is improved, including a “before” and “after” (and maybe even sub-steps if they exist). To keep the paper itself a reasonable length, this (and even some of the additional summary text I would like) could be in supplemental material.
Line 241: From the current text I am not able to determine how the positives and negatives are determined, since some YOLOLens identifications appear below the 1 km cutoff of the Robbins database. Are those ignored in the testing? If not, what criteria are used?
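For reference, object-detection evaluations typically declare a prediction a true positive when its intersection-over-union (IoU) with an unmatched ground-truth box exceeds a threshold. A minimal sketch of what I assume is meant; the 0.5 threshold and the dropping of sub-cutoff detections are my assumptions, and the latter is exactly the ambiguity raised above:

```python
def iou(a, b):
    # a, b: boxes (x1, y1, x2, y2)
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def match_predictions(preds, truths, iou_thresh=0.5, min_diam_km=1.0):
    # preds, truths: lists of (box, diameter_km); each truth matches at most once.
    # Detections below the catalogue cutoff are dropped here, but they could
    # equally be counted as false positives -- the criterion should be stated.
    preds = [p for p in preds if p[1] >= min_diam_km]
    matched, tp = set(), 0
    for box, _ in preds:
        candidates = [i for i in range(len(truths)) if i not in matched]
        best = max(candidates, key=lambda i: iou(box, truths[i][0]), default=None)
        if best is not None and iou(box, truths[best][0]) >= iou_thresh:
            matched.add(best)
            tp += 1
    return tp, len(preds) - tp, len(truths) - tp  # TP, FP, FN
```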
Lines 251-255: Furthermore, the training data set is different from, and likely better than, what previous groups used, which should be discussed. The discussion of manual vs. automated also seems a bit contradictory: the manual data set (the Robbins database) is assumed to be 100% accurate for training and evaluation, but it is then stated that humans are only 75% correct. This contradiction needs to be discussed. Also, Robbins et al. (2014) compared multiple human counters, and their results could be included too.
Table 1: I have a recollection of mAP50 and mAP95 being discussed somewhere in the article, but not close enough to where this table is introduced to be useful. Please define these where “P” and “R” are defined in Line 254.
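For the general reader, the standard definitions are roughly as follows (my summary of common usage, not the authors' text):

```latex
% Precision and recall over matched detections:
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}
% AP is the area under the precision--recall curve:
AP = \int_0^1 P(R)\,\mathrm{d}R
% mAP50 averages AP over classes at an IoU threshold of 0.50;
% mAP50--95 (presumably what the table calls mAP95) further averages
% over IoU thresholds 0.50, 0.55, ..., 0.95:
\mathrm{mAP}_{50\text{-}95} = \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} AP_t
```

With a single “crater” class, mAP reduces to AP, which would be worth stating explicitly in the text.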
Lines 274-275: Larger craters are rarer than small ones, but more large craters can be found in the lunar highlands. Are any lunar highlands sites included? Overall, it would be very helpful to have a map of where the training and test sites are located in supplemental material. Also, it would be interesting to have data like those in Fig. 4 for a highlands terrain.
Section 5.1.2: I would be interested to know if the authors have any thoughts on why YOLOLens appears to underestimate diameters more often than overestimating them.
Figure 5: As plotted, I don’t find the histograms very informative. I would suggest normalizing them to the same count (x-axis) value so that differences in count are more easily visible. Another approach could be plotting the outline of the baseline histogram on top of the YOLOLens histogram.
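To illustrate the second suggestion, an outline of the baseline histogram could be overlaid on the YOLOLens histogram; the arrays and bin edges below are placeholders, not the paper's data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
diam_yolo = rng.lognormal(0.5, 0.6, 5000)  # placeholder diameters (km)
diam_base = rng.lognormal(0.5, 0.6, 4000)  # placeholder diameters (km)

bins = np.logspace(np.log10(1.0), np.log10(30.0), 30)
plt.hist(diam_yolo, bins=bins, color="lightgray", label="YOLOLens")
plt.hist(diam_base, bins=bins, histtype="step", color="k", label="baseline")
plt.xscale("log")
plt.xlabel("Diameter (km)")
plt.ylabel("Count")
plt.legend()
plt.show()
```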
Line 298: I don’t think I would call a variance of 40-50% “very low” as this is about 2x higher than the variation in human measurement from Robbins et al. (2014), if I understand the calculations here.
Lines 305-306: I am confused by a report of confidence on a “true positive”: how can a true positive have low (to no) confidence? That is, the neural network found a crater that Robbins identified, so what makes it low confidence? Also, I am intrigued by the “U-shape” of Fig. 6. Why are there so few moderate-confidence larger craters (i.e., why are these either “yes or no” in confidence)?
Section 5.1.3: This feels like a very circular argument and does not necessarily prove to me that the system can work on lower-resolution data. The human would have differences too (e.g., the matches are inconsistent because a human could not measure the small craters on a lower-resolution image either). Also, YOLOLens re-upscales, which I understand is the benefit, but does re-upscaling a purposefully downscaled image really reproduce what the real effort would be like (i.e., how much does the upscaling process know about the downscaling)? Maybe the better test would be to take larger craters in a natively poorer-resolution data set (e.g., Mars THEMIS, or the Mercury work described in the Introduction)?
Sections 6 & 7: Much of Section 6 reads like a restatement of the results, which is a conclusion. The processing time seems new and useful though. It seems like all of this could be added to Section 7 to make a conclusion section with a bit more detail.
Author Response
Our reply is attached.
Thanks again.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
I thank the authors for considering my review and making several suggested changes that improved the manuscript. I have some minor comments I hope the authors can address before final publication.
Detailed comments:
Line 47: 5-10% is actually the best case for the agreement – I suggest replacing with 20-30%, which was the average.
Lines 57-58: Since there is only one reference cited, I suggest “research is” instead of “researchers are”.
Line 104: I suggest starting a new paragraph with “In the last decades,…”
Line 104: The parenthetical with “e.g.,” appears to be missing additional text?
Unnumbered lines at top of page 5 (in new text): From the current text I don’t follow why the source resolution needs to be reduced first in order to increase the resolution again. There is a statement about reducing the error, but I don’t follow how this step reduces the error. Please add text to clarify.
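For context, my understanding of the usual reasoning, which the authors should confirm in the text: aligned low/high-resolution training pairs can only be obtained by degrading the high-resolution original, so that the network's upscaled output can be compared against exact ground truth and the reconstruction error measured. A minimal PyTorch-style sketch under that assumption (the 2x factor and bicubic degradation are assumed, not taken from the paper):

```python
import torch.nn.functional as F

def training_step(model, hr_batch):
    # hr_batch: high-resolution image tiles, shape (N, 1, H, W).
    # Synthesize the low-resolution input by downscaling the HR original;
    # the HR tile then serves as exact ground truth for the upscaled output,
    # which is what makes the reconstruction error measurable at all.
    lr_batch = F.interpolate(hr_batch, scale_factor=0.5,
                             mode="bicubic", align_corners=False)
    sr_batch = model(lr_batch)            # network upscales back to HR size
    return F.l1_loss(sr_batch, hr_batch)  # pixel-wise reconstruction error
```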
Figure 3: I don’t see the obvious use of this figure (and don’t even remember seeing it cited), thus I suggest it could be removed.
Lines 207-208: The clause “, and finally, the WAC, TC…” could be deleted as repetitive of the previous part of the sentence.
Lines 226-227: To make the sentence a bit less wordy, I suggest deleting “All coordinates are converted into a YOLO coordinates system” and connecting the previous sentence to “useful to run our model...”.
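For readers unfamiliar with the format mentioned here: a YOLO label stores each box as a class index plus centre coordinates and size, all normalized to the tile dimensions. A sketch of the standard conversion (the function name is illustrative):

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h, cls=0):
    # Convert a pixel-space bounding box to the YOLO label format:
    # "class x_center y_center width height", each value in [0, 1].
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```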
Lines 229-230: Actually, it would be more useful to state the minimum number of pixels across that was considered, rather than saying “all craters less than 416 pixels”, because that is ultimately not true: there had to be a minimum pixel cutoff.
Section 4.1: The summary of steps at the end is a very nice recap, but the steps aren’t actually presented in that order in the preceding paragraphs. I suggest some re-ordering of the paragraphs/sentences in this section to match the order of the summary.
Figure 5 caption: Remind the reader of what the colored boxes represent in the caption, so they don’t have to go back to the main text: model predictions in red and Robbins ground truth in blue.
Lines 315-316: I do not follow what “balanced crater sample techniques” refers to. A balance of what (size, number?), and how would it be achieved?
Line 322: Please add a brief descriptor of the Wang dataset here – specifically that it is another neural network based automated detection.
Line 324: Actually, it looks like detection is starting to drop off for D < 0.7 km; if full detection were continuing, the histogram would continue to rise. I assume this is a resolution effect, even given the improved resolution of YOLOLens (it will still have a resolution cutoff). Please add a sentence or two discussing the identification cutoff in terms of the pixel scale of the images (for instance, at 100 m/pixel a 0.7 km crater spans only ~7 pixels).
Lines 336-340: The authors gave a nice response to my question about the confidence of true positives and I would like to see that added here for the general reader.
Appendix A: I am very happy to see this additional detail about the training and testing of the model. However, I was wondering if any of the image panels were examples of the higher-resolution output of YOLOLens? If so, this should be noted in the captions. If not, is that because the code does not output this intermediate product, i.e., the image is only used internally and not produced as a product for the user to view? I would argue that producing the improved-resolution image for the user to view would be an interesting output, if the code doesn’t already do so.
Author Response
Please see the attachment. Thanks a lot again.
Author Response File: Author Response.pdf