Two-Step Approach toward Alignment of Spatiotemporal Wide-Area Unmanned Aerial Vehicle Imageries
Round 1
Reviewer 1 Report
In this manuscript, the authors present an approach for the automatic alignment of UAV image mosaics using a two-step procedure: first a global and then a local alignment, both based on homography matrices (2D projective transformations) estimated from keypoint matches between the two mosaics.
Comments for the authors:
· During the local alignment stage, the mosaics are split into patches to eliminate any misalignments remaining after global alignment. To prevent the generation of new artefacts at patch boundaries, a boundary-aware scheme is introduced that leaves the areas close to the boundaries of all patches unaffected. If I am not mistaken, misalignments remaining in these boundary areas will not be corrected by local alignment. If this is accurate:
- it should be clearly stated in the manuscript
- the authors should propose ways to address it
· On line 95, it is not clear what the authors mean by “Spatial data-based approaches use either a digital-surface model (DSM) or a GCP with expensive computation.” The statement does not seem very accurate.
· The explanation of how the coefficients in the homography matrix represent various transformations (lines 202-205) is not accurate and needs to be corrected.
· On line 321, it should be stated whether the mosaicking is simple image stitching or an orthorectification following a photogrammetric workflow.
· In the evaluation process, does the baseline method represent a pure translation? If so, it should be disregarded, as it can confuse the reader.
· The accuracy improvement should be reported relative to the state-of-the-art methods (MDLT [60] and RANSAC-Flow [24]) and not to the baseline (pure translation).
· It is not clear how the different algorithms ([60], [24]) were applied for the comparison of results. This should be presented in more detail.
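As a point of reference for the comment on the homography coefficients, the sketch below shows the standard reading of a 3×3 homography in plain Python. It is not the manuscript's notation or implementation: the function name `apply_homography`, the matrices, and the sample point are hypothetical and chosen only for illustration. The upper-left 2×2 block carries rotation, scale and shear; the last column carries translation; the first two entries of the bottom row carry the projective terms.

```python
import math


def apply_homography(H, x, y):
    """Map the point (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Division by w is what makes the mapping projective rather than affine.
    return u / w, v / w


# Hypothetical example: 90-degree rotation plus a translation of (10, 5).
theta = math.pi / 2
H_affine = [
    [math.cos(theta), -math.sin(theta), 10.0],  # h11, h12 | h13 (x translation)
    [math.sin(theta), math.cos(theta), 5.0],    # h21, h22 | h23 (y translation)
    [0.0, 0.0, 1.0],                            # h31 = h32 = 0: affine case
]

# Same matrix with a non-zero h31: the divisor w now depends on x,
# so points are rescaled by an amount that varies across the image.
H_proj = [row[:] for row in H_affine]
H_proj[2][0] = 0.001

p_affine = apply_homography(H_affine, 100.0, 0.0)
p_proj = apply_homography(H_proj, 100.0, 0.0)
print(p_affine, p_proj)
```

With a zero bottom row (apart from the final 1), the point is only rotated and shifted; once `h31` is non-zero, the same point is additionally divided by a position-dependent factor, which is the perspective effect.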
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Two-Step Approach Towards Alignment of Spatiotemporal Wide-Area Unmanned Aerial Vehicle Imageries
A summary
This study proposes a method to align wide-area UAV images through a two-step alignment procedure at the global and local levels. The global alignment employs a single homography using keypoint pairs created from a DNN-based extractor, and the local one improves the alignment by removing the local misalignments within small patches.
General concept comments
The manuscript reports an interesting methodology, described with its mathematical specifications. The major limitation concerns the overall structure of the manuscript. As highlighted in the specific comments section, there are parts of the materials and methods section that belong in the introduction, and parts of the results and discussion section that belong in materials and methods. Details such as the specific software employed during the experiments and comprehensive captions below the images are missing. The only plot reported in the manuscript is not self-explanatory and needs improvement to allow readers to understand the obtained results. I suggest that the authors simplify the numerous terms used to define methods, inputs, and outputs in the manuscript; a legend at the beginning of the work could help the reader.
Review
The work reports an improved approach for UAV remote sensing, trying to solve a specific problem with an innovative methodology. I suggest that the authors consider extending the results and discussion part, providing more details about how the methodology could be improved and integrated at the farm level. Is it a tool that every SfM software package should include to make a difference in agricultural scenarios? As highlighted in the specific comments section, some references are too old or inappropriate. Try to provide more recent ones and highlight how your method helps advance the UAV image alignment procedure.
Specific comments
Abstract
· Lines 3-4: It is not clear what the authors mean by “that are geometrically aligned with individual transformations because of inconsistent UAV operations over target site”. Can you explain this and make the sentence comprehensible?
· Line 11: I suggest that the authors avoid using “we” in academic writing; please find a more appropriate phrasing.
· Line 14: If the authors are referring to accuracy improvements, it would be better to express them as percentage (%) values to make the results easier to understand and to compare with other authors’ results.
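One common way to express an accuracy improvement as a percentage is the relative reduction in error with respect to a reference method. The sketch below assumes that convention; the function name and the error values are hypothetical and are not taken from the manuscript.

```python
def relative_improvement(baseline_error, method_error):
    """Percentage reduction in error relative to a baseline (lower error is better)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - method_error) / baseline_error


# Hypothetical alignment errors in pixels, for illustration only.
example = relative_improvement(8.0, 6.0)
print(f"{example:.1f}% lower error than the baseline")
```

Reporting the reference method alongside the percentage keeps the figure comparable across papers.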
Introduction
· Lines 22-24: I suggest that the authors explain why UAV images are generally narrow compared to airborne and satellite images. Try to connect this sentence to the previous one and specify the heights above ground level at which UAVs commonly fly.
· Lines 27-28: Most of the works cited by the authors refer to machine/deep learning approaches. Is there a connection between the machine learning approach and the need for spatiotemporal data to enhance crop growth analysis and yield estimation?
· Lines 33-34: Are the low-quality images related to airborne remote sensing and the high-quality ones to UAV sensing? If so, can the authors explain this statement and provide appropriate references?
· Lines 34-36: Generally, image alignment is one of the first steps in producing an orthomosaic. Can you provide more details about your statement?
· Line 42: It is preferable to avoid image references in the introduction part if not essential, especially if discussed later in the manuscript.
· Lines 54-56: Can the authors provide more recent works on keypoint matching? Is there something specific to UAV remote sensing? This addition would give readers a comprehensive overview of recent improvements.
· Lines 62-75: Even though what the authors have written is correct and well explained, I suggest revising this part. Try to be concise and avoid repeating concepts highlighted earlier in the introduction; since this is the last part of the introduction section, you should only define the goals of your work. Other sections exist for reporting and discussing the obtained results. Avoid bullet points and phrasing such as “our contributions are”, “We propose a”, “We address artifacts that”, etc.
Related work
· Line 76: Although correcting and formatting manuscripts is not the reviewers’ duty, I recommend that the authors follow the submission guidelines and organize the manuscript to guarantee a standard presentation and make the article comprehensible to readers. The manuscript preparation section reads as follows: “Research manuscript sections: Introduction, Materials and Methods, Results, Discussion, Conclusions (optional).” The “related work” part could be integrated into the introduction section.
· Line 82: citation no. 27 is old; can the authors provide a more recent one? Do the same for all old references.
· Line 112: Do the authors think the term “agnostic” fits the discussed topic?
· Line 117: DTM and DSM are orthomosaics generated from the dense cloud, reporting the altitude of each pixel above sea level or another reference point. Their use can lead to specific orthomosaics in which only the terrain surface is represented (DTM-derived ones) or the entire surveyed surface (DSM-derived ones). What the authors probably wanted to describe is the general process of obtaining an orthomosaic, which can pass through a Digital Model (DM) to obtain a high-resolution orthomosaic. In some cases, it is also possible to use the dense cloud and not the DM.
· Line 122: The required computational power is a limitation of the described process, but there are already multiple user-friendly software packages able to automate the process, albeit with some limits in GCP recognition and coordinate attribution.
· Lines 123-127: this part is a bit confusing; I suggest that the authors leave the methodology explanation for the materials and methods part, and be more schematic and clear about the already known orthomosaic creation methodology.
· Lines 146-147: Please provide some UAV applications.
· Line 184: Which software and programming language did you use to run this and the other algorithms?
· Figure 1: its best location is probably after section 3.2, closer to the related part that discusses the methodology.
· Lines 195-198: To create a more fluent materials and methods section, the authors should consider avoiding the repetition of the scope of the work or part of it, providing a general overview of the procedure first and then the global and local alignment details, following a top-down approach.
· Lines 262-263: this is crucial information for understanding the Figure 4 heat map; please provide it in the Figure 4 caption and next to the graduated scale inside the image.
· Figure 6: If possible, try to use a different colour for the black X inside the image; it is hard to distinguish from the background. Why did the authors use different colours to illustrate the reference points? This should be defined/described in the caption.
· Line 312: there is a typo in this sentence: “therefore accurate prediction of its prediction is essential”. The word “prediction” is repeated.
· Line 317: there is a typo in this sentence: “it is necessary to ensure that images are continuously acquired images from a single flight of UAV”. The word “images” is repeated.
· Line 329: why did the authors decide to use 20 RTK-based reference points only in dataset DD?
· Line 331: how did the authors apply the DNN-based method? Did they use specific software? Please be more specific.
· Line 343: what is the measurement unit of the 2000×2000 value and of the others described in the same section?
· Lines 363-371: this part aims at comparing different methods. For this reason, I suggest that the authors move it to the discussion section.
Results and discussions
· Line 372: the “Performance Comparisons” section reports materials and methods information; specifically, it explains how the accuracy was calculated.
· Table 1: please specify what Da, Db, Dc, and Dd stand for.
· Lines 400-402: this sentence is not clear.
· Lines 411-413: can the authors explain why the DNN-based alignment performed better?
· Figure 8: the plots are not easy to interpret, and it is not clear what the blue columns stand for. Also, placing each legend entry for the threshold-based timing variation (e.g., “Matching time”) above a single plot gives a misleading interpretation of plots a, b, and c: it seems as though the first plot belongs to “matching time”, the second to “local Alignment time”, and so on.
· Lines 447-450: how did the authors evaluate the low level, or near absence, of side effects? Please indicate the section where this issue is addressed.
· Line 459: what do the numbers stand for? Please provide an explanation.
· Figures 9 and 10: what do the three colours in the images stand for?
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Accept in present form
Reviewer 2 Report
Important improvements have been made. I thank the authors for justifying each correction and providing appropriate explanations.
I have no changes or corrections to suggest.