FPGA Implementation of Shack–Hartmann Wavefront Sensing Using Stream-Based Center of Gravity Method for Centroid Estimation
Round 1
Reviewer 1 Report
Good paper. I would appreciate a short reference to a similar method by Talmi and Ribak, JOSA A 21, 632-9 (2004) (Appendix B), which similarly multiplies incoming pixels by corresponding filter values and adds them.
Author Response
We are very encouraged by the reviewer's comments and interest in this work.
Response:
We acknowledge this work by adding the following text at lines 61-68, and by adding the rightly requested paper to the bibliography as ref [21]:
“Talmi and Ribak [21] showed that gradient calculation over the whole aperture is possible by direct demodulation on the grid, without reverting to Fourier Transforms. This method is especially suited to very large arrays due to the saving of computation by removing the two inverse Fourier Transforms. Importantly they considered that incomplete spots, for example, at the edge of the aperture, would create bias on the complete reconstruction, and showed that processing these in a sensible way in the image domain could have less effect on the whole reconstruction.”
Reviewer 2 Report
The manuscript presents an implementation of a classical Shack-Hartmann wavefront sensor on an FPGA, using a different centroiding algorithm, to lower the computational lag and improve the overall performance of an adaptive optical system. The manuscript is well written and is definitely interesting for the growing Adaptive Optics community.
Author Response
We are very encouraged by the reviewer's comments and interest. Thank you.
Reviewer 3 Report
The authors made excellent research by implementing a fast and reconfigurable architecture concerning the Shack-Hartmann wavefront sensing. Realtime is becoming more and more important for various systems and FPGA provides a promising solution with low latency, low cost, etc. The research provides valuable references for the subject.
The manuscript is well-organized and written with clear logic and expressions. The experimental results also show the efficiency of the research. I have not found obvious problems concerning the research contents.
However, there are some minor problems that need to be improved:
1. Please check the contents on line 366: “if” maybe should be “is”.
2. Please check the details on figures such as Figure 8: it is hard to understand why there are two histograms for notes like “Slice LUT”.
Author Response
We thank the reviewer for the encouraging comments and address the specific concerns below:
- Please check the contents on line 366: “if” maybe should be “is”.
Response: corrected
- Please check the details on figures such as Figure 8: it is hard to understand why there are two histograms for notes like “Slice LUT”.
Response:
The “two histograms” are the default legend style of the LaTeX package (pgfplots) that we used to draw the figures. We use different resources in the FPGA for each filter size (e.g. 3x3, 5x5, etc.), and so we separate these contributions in the graph. We believe FPGA designers will interpret these correctly, but to make it clearer we have changed the caption of Figure 8 to the following:
“Figure 8: Resource usage for the SCoG module with four different filter sizes. Contributing resources for each filter size include Slice LUT, Block RAM, and Slice Registers.”
Reviewer 4 Report
The paper is well written, and clearly introduces the FPGA implementation of the system. However, I have just a few amendments.
- Some numerical results about the hardware implementation can be added in the abstract.
- How was the system described? Hand-written VHDL/Verilog, HLS, other automatic design methods?
- There is no data about the power dissipation of the system.
- Which is the maximum achievable clock frequency on the considered device?
- A comparison table between the previous work of the authors and this one should be added to highlight their performance.
Author Response
We thank the reviewer for the positive encouragement. Here are our responses to specific questions.
- Some numerical results about the hardware implementation can be added in the abstract.
Response:
We describe a scalable architecture that works over a range of input data rates and filter sizes. Quoting absolute numbers in the abstract would therefore seem counterproductive.
- How was the system described? Hand-written VHDL/Verilog, HLS, other automatic design methods?
Response:
We mention the use of VHDL and common IP blocks on lines 343-344.
- There is no data about the power dissipation of the system.
Response:
As this is a scalable architecture, the actual power consumption will depend on scale, but it is reasonable to note that the FPGA implementation consumes much less power than competing technologies. Our FPGA system consumes no more than 4 W, but we do not consider this worth an involved discussion in an already long paper.
- Which is the maximum achievable clock frequency on the considered device?
Response:
Our answer is similar to the one above, as the architecture is scalable. We use a megapixel sensor with a pixel clock of 50 MHz, with which the system is synchronous. The maximum achievable clock frequency depends on the kernel size and ultimately on timing closure. We report timing closure at 50 MHz for a 30x30 pixel kernel on lines 403-404.
However, using a faster sensor or raising the master clock to 350-500 MHz is also possible, as are parallel streams of pixel data. It is important to note that the sensor must operate at a speed commensurate with the light available to it. This is application specific, and FPGA designers will readily see the possibilities for enhanced clocking, etc. While we would love to expand on all the possibilities, we do not consider this worth an involved discussion in an already long paper.
- A comparison table between the previous work of the authors and this one should be added to highlight their performance.
Our previous work, ref [33], focused on introducing the Stream-based CoG concept and on a detailed performance comparison against other centroiding algorithms, as outlined on lines 79-88. This manuscript details the FPGA implementation of a complete Shack-Hartmann WFS using the Stream-based CoG algorithm for centroid estimation. It includes not only the FPGA implementation of SCoG itself, but also the treatment of missing or multiple centroids in each sub-aperture, a parallel implementation of the slope calculation, and a common modal wavefront reconstructor.
Reviewer 5 Report
This paper presented a complete hardware implementation of the Shack-Hartmann wavefront sensor in a modular design. This work is well-written and organized. Comments to the authors:
1) Rewrite the sentences by removing the word "we".
2) Explain the reason for selecting the Stream-based CoG algorithm over other algorithms reported in the literature.
3) Support the equations with suitable explanations or references.
4) The concept of "stream of centroids" requires more details.
5) How the G-tilt is implemented in this work.
6) The proposed methodology of this paper must be explained in a flowchart format.
7) Provide more details of FIFO.
8) There are some grammatical errors. Thorough proofreading is required.
Author Response
We thank the reviewer for a detailed set of questions. Here are our responses.
1) Rewrite the sentences by removing the word "we".
Response: We believe it is acceptable to use “we” in journal publication.
2) Explain the reason for selecting the Stream-based CoG algorithm over other algorithms reported in the literature.
Response: We do this in detail in the Introduction and describe the advantages in section 2, lines 149-164. A detailed performance comparison of Stream-based CoG against other algorithms is covered in our previous work, ref [33], and is mentioned on lines 79-88 of this manuscript.
3) Support the equations with suitable explanations or references.
Response: The mathematics of centroiding is commonplace but is explained in sections 2.1 and 2.2. Its optical use in wavefront reconstruction, on page 4, is referenced at the start of section 2.4 for those who are not familiar with the application area. This work specifically examines the implementation in an FPGA.
4) The concept of "stream of centroids" requires more details.
We thank the reviewer for pointing out this confusion. Section 2.1 explains traditional CoG and Section 2.2 explains the extension of the traditional CoG calculation to each pixel, which leads to the SCoG.
Centroid estimates are calculated for every new pixel arriving from the sensor, and true candidates are selected by the methods given. This results in a delayed but synchronous set of centroids tagged against the stream of incoming pixels, which we term the stream of centroids. These centroids are processed further as they arrive by a matrix-based modal wavefront reconstruction. To make this clearer, we have added the following at line 149.
“In this work, an estimate of centroid is made based on each and every pixel streamed from the image sensor, for which best centroids are tagged resulting in a stream of centroids or SCoG synchronous with the stream of pixels.”
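The idea of a stream of centroids can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the VHDL design in the manuscript: the function name `stream_of_centroids`, the window size `k`, the `threshold` parameter, and the selection rule (keep a window only if its centre of gravity falls within half a pixel of the window centre) are all hypothetical choices made for illustration.

```python
import numpy as np

def stream_of_centroids(image, k=3, threshold=0.0):
    """Software model: one CoG estimate per pixel position, visited in raster order.

    Assumption (not taken from the manuscript): an estimate is kept as a
    candidate only when the CoG lies within half a pixel of the window centre.
    """
    h, w = image.shape
    r = k // 2
    offsets = np.arange(-r, r + 1)          # pixel offsets relative to the window centre
    candidates = []
    for y in range(r, h - r):               # raster order mimics the pixel stream
        for x in range(r, w - r):
            win = image[y - r:y + r + 1, x - r:x + r + 1].astype(float)
            mass = win.sum()
            if mass <= threshold:           # skip empty or too-dark windows
                continue
            dy = (win.sum(axis=1) * offsets).sum() / mass
            dx = (win.sum(axis=0) * offsets).sum() / mass
            if abs(dx) < 0.5 and abs(dy) < 0.5:
                candidates.append((float(y + dy), float(x + dx)))
    return candidates

# A single symmetric spot centred on pixel (3, 3) of a 7x7 frame.
frame = np.zeros((7, 7))
frame[3, 3] = 10
frame[2, 3] = frame[4, 3] = frame[3, 2] = frame[3, 4] = 2
spots = stream_of_centroids(frame, k=3)
```

In this model every window position produces an estimate, but only the window centred on the spot survives the selection step, which is the software analogue of tagging the best centroids against the pixel stream.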
5) How the G-tilt is implemented in this work.
We use the modal reconstruction based on centroids and any discrepancy where it matters is well known and well handled in adaptive optics research. To discuss this in depth is difficult in an already long paper.
6) The proposed methodology of this paper must be explained in a flowchart format.
We show the process diagrammatically in Figs. 3, 4 and 5, and where the algorithm is more complex, for centroid sorting, we define it in pseudo-code in Algorithm 1. We believe this conveys the methodology, particularly now that we have addressed the reviewer's comment above about the "stream of centroids".
7) Provide more details of FIFO.
FPGA designers will be aware of the term First-In, First-Out or FIFO. In our case these scale with the filter size, both in lines of image data for the vertical filters and in pixels for the horizontal filters. They are implemented in block RAM, as per Figure 8, and are shown in Figure 4 as one FIFO per line of the filter, and in Figure 5 for the centroids.
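For readers less familiar with line-buffer FIFOs, the following behavioural sketch may help. It is a Python model under our own assumptions, not the block-RAM implementation in the manuscript: the name `column_stream` and its parameters are hypothetical, and the FIFOs are assumed to start filled with zeros. Each FIFO delays the pixel stream by exactly one image line, so a chain of `k - 1` FIFOs presents a vertical column of `k` pixels for every incoming pixel.

```python
from collections import deque

def column_stream(pixels, width, k):
    """Yield, for each incoming pixel, the k vertically aligned pixels (oldest line first)."""
    # One FIFO per delayed line; each holds exactly one image line of `width` pixels.
    fifos = [deque([0] * width, maxlen=width) for _ in range(k - 1)]
    for p in pixels:
        column = [p]
        carry = p
        for fifo in fifos:
            delayed = fifo[0]     # same column, one line earlier than `carry`
            fifo.append(carry)    # maxlen discards the oldest entry automatically
            column.append(delayed)
            carry = delayed       # feed the delayed pixel into the next FIFO
        yield column[::-1]

# A 3x3 image streamed in raster order: rows [1, 2, 3], [4, 5, 6], [7, 8, 9].
cols = list(column_stream(range(1, 10), width=3, k=3))
```

Once three lines have arrived, the columns seen by a vertical filter are complete. The memory cost grows with both the line width and the filter height, which is consistent with the FIFOs scaling with filter size.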
8) There are some grammatical errors. Thorough proofreading is required.
Thank you. Done.