Construction of Edge Computing Platform Using 3D LiDAR and Camera Heterogeneous Sensing Fusion for Front Obstacle Recognition and Distance Measurement System
Round 1
Reviewer 1 Report
The authors fused 3D Light Detection and Ranging (LiDAR) and cameras, combined with YOLOv4-Tiny as the detection network, into a target recognition and ranging system, building an edge computing platform that enables offline real-time computing in vehicles equipped with the device. At the same time, the two methods, ‘minimum point in box’ and ‘median point in box’, are compared for measuring the detected distance. This system incorporates heterogeneous sensors for superior detection speed and accuracy. Here are several comments on this manuscript.
1. This paper uses only the simple YOLOv4-Tiny model and does not compare its advantages and disadvantages with other target detection networks; it merely asserts in prose that this algorithm is superior, which is not convincing.
2. It is not clear what role the edge devices play or how the algorithms are deployed to them. Furthermore, the data transmission is not clear.
3. There is no physical representation or picture of the edge equipment.
4. The section on heterogeneous sensors is too lengthy and should be shortened.
5. The distinction between the two meanings of CPU average usage in Table 3 is not obvious.
6. The camera matrix should be drawn as a diagram, and the borders should be removed from the screenshots.
7. The comparative experiments with two calibration plates and three calibration plates in Figures 11 and 12 are not convincing, because the conclusions are drawn at different distances between the sensor and the calibration plates.
8. The actual distances are not stated in Figures 13 and 14, and the measured distances are not compared with the actual distances to determine which result is better. In addition, the measurement description in Figure 15 is vague, and it is not clear exactly how the ‘minimum point within the bounding box’ or the ‘median point within the bounding box’ method is applied.
Author Response
The authors fused 3D Light Detection and Ranging (LiDAR) and cameras, combined with YOLOv4-Tiny as the detection network, into a target recognition and ranging system, building an edge computing platform that enables offline real-time computing in vehicles equipped with the device. At the same time, the two methods, ‘minimum point in box’ and ‘median point in box’, are compared for measuring the detected distance. This system incorporates heterogeneous sensors for superior detection speed and accuracy. Here are several comments on this manuscript.
- This paper uses only the simple YOLOv4-Tiny model and does not compare its advantages and disadvantages with other target detection networks; it merely asserts in prose that this algorithm is superior, which is not convincing.
Author Reply: Thank you for your valuable comment. YOLOv4-Tiny is used as the obstacle recognition network because its lightweight architecture and high computing speed suit hardware with limited performance, such as edge computing platforms. Experimental results show that even on an edge computing platform with limited computational capacity, the system reaches 60 fps with above 70% detection accuracy and a margin of error of less than 3 cm using the YOLOv4-Tiny network.
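As an illustration of how such an FPS figure could be reproduced, the following is a minimal benchmark sketch using OpenCV's DNN module on a Jetson-class device; the cfg/weights file names, camera index, frame count, and thresholds are assumptions for illustration, not values taken from the manuscript.

```python
# Minimal FPS benchmark sketch for YOLOv4-Tiny via OpenCV's DNN module (assumed file names and settings).
import time
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4-tiny.cfg", "yolov4-tiny.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)   # use the Jetson GPU if OpenCV was built with CUDA
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

cap = cv2.VideoCapture(0)                            # assumed camera index
frames, t0 = 0, time.time()
while frames < 300:                                  # benchmark over 300 frames (assumed)
    ok, frame = cap.read()
    if not ok:
        break
    classes, scores, boxes = model.detect(frame, confThreshold=0.4, nmsThreshold=0.4)
    frames += 1
print(f"average FPS: {frames / (time.time() - t0):.1f}")
```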
- It is not clear what role the edge devices play or how the algorithms are deployed to them. Furthermore, the data transmission is not clear.
Author Reply: Thank you for your comments. In this article, the edge device is a Jetson AGX Xavier, which runs Ubuntu and supports languages such as Python, in which our algorithm is implemented. ROS (Robot Operating System) serves as the communication bridge between the hardware and the algorithm software.
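To make the deployment and data flow more concrete, below is a minimal sketch of a ROS 1 (rospy) node that subscribes to a camera topic, runs the detector on each frame, and republishes a summary; the topic names and the run_detector() helper are hypothetical placeholders, not the actual package used in the paper.

```python
#!/usr/bin/env python3
# Minimal ROS 1 deployment sketch; topic names and run_detector() are hypothetical placeholders.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge

bridge = CvBridge()

def run_detector(frame):
    """Hypothetical stand-in for YOLOv4-Tiny inference; returns a list of detections."""
    return []

def on_image(msg, pub):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")   # ROS image message -> OpenCV BGR array
    detections = run_detector(frame)
    pub.publish(String(data=f"{len(detections)} obstacles"))

if __name__ == "__main__":
    rospy.init_node("obstacle_detector")
    pub = rospy.Publisher("/detector/summary", String, queue_size=1)            # assumed output topic
    rospy.Subscriber("/camera/image_raw", Image, on_image, callback_args=pub,   # assumed camera topic
                     queue_size=1)
    rospy.spin()
```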
- There is no physical representation or picture of the edge equipment.
Author Reply: Thank you for your comment. Figure 18 shows a physical picture of the edge equipment.
- The section on heterogeneous sensors is too lengthy and should be shortened.
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
- The distinction between the two meanings of CPU average usage in Table 3 is not obvious.
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
- The camera matrix should be drawn as a diagram, and the borders should be removed from the screenshots.
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
- The comparative experiments with two calibration plates and three calibration plates in Figures 11 and 12 are not convincing, because the conclusions are drawn at different distances between the sensor and the calibration plates.
Author Reply: When using three calibration plates, the distance between the sensor and the calibration plates must be increased so that the camera's and the LiDAR's fields of view can accommodate all three plates simultaneously.
- The actual distances are not stated in Figures 13 and 14, and the measured distances are not compared with the actual distances to determine which result is better. In addition, the measurement description in Figure 15 is vague, and it is not clear exactly how the ‘minimum point within the bounding box’ or the ‘median point within the bounding box’ method is applied.
Author Reply: Thank you for your comment. Figure 15 uses the minimum point within 100% of the bounding box. This method ensures that we find the shortest distance between the sensor and the object, but it is more susceptible to environmental interference; in Figure 15, the chair interfered with the human body, resulting in an inaccurate distance. We therefore also use the median point within the bounding box to measure distance. With the median point, the size of the bounding box can be ignored, ensuring that the measured distance is focused on the correct object.
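As a hedged illustration of the two strategies, the sketch below computes both the minimum-point and the median-point distance from LiDAR points that project inside a YOLOv4-Tiny bounding box; the array layout and the function name are assumptions made for illustration only.

```python
# Illustrative sketch of the two ranging strategies; the (u, v, distance) array layout is assumed.
import numpy as np

def distance_in_box(points_uvd, box):
    """points_uvd: N x 3 array of (u, v, distance) for LiDAR points projected onto the image.
    box: (x1, y1, x2, y2) bounding box from YOLOv4-Tiny."""
    x1, y1, x2, y2 = box
    u, v, d = points_uvd[:, 0], points_uvd[:, 1], points_uvd[:, 2]
    inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    d_in = d[inside]
    if d_in.size == 0:
        return None, None
    # 'minimum point in box': shortest range, but sensitive to foreground clutter such as the chair.
    # 'median point in box': insensitive to box size and to stray points from other objects.
    return float(d_in.min()), float(np.median(d_in))
```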
Author Response File: Author Response.docx
Reviewer 2 Report
1) The structure of the article does not comply with the MDPI standard and needs to be reworked.
I would recommend: 1 - Introduction; 2 - Literature Review; 2.1 - Robot Operating Systems; 2.2 - Sensors; 2.3 - Heterogeneous Sensor Fusion; 2.4 - Object Detection; 3 - Materials and Methods; 3.1 - Heterogeneous Sensor Fusion, etc.; A) Camera Calibration: …; B) LiDAR-Camera Calibration: …; 4 - Results; 4.1 - Steps and Goals of the Experiment …; 5 - Discussion and Perspectives; 6 - Conclusion.
(For some reason, the Conclusion section is missing; it is necessary to add it and formulate the results obtained.)
2) At the end of section 2 (Literature review), it is necessary to formulate the purpose of the work and the tasks that the Conclusions will correspond to.
3) Figures 1 and 2 are repeated; that is, after Figure 2, Figure 1 appears again. This needs to be corrected.
4) The methodology should be improved by adding mindmaps/graphs and further clarifications. The article should be read as a single logical sequence, and not a set of separate fragments.
5) Fig. 6 - Please explain the meaning of the results, where they come from, and where they are used. Comment on the meaning of the numbers - are they good or bad? When viewing a figure containing the numerical output of the program, it should be clear to the reader what it shows in a practical sense.
6) It is necessary to explain what the numbers in the rows and columns of the matrices represent (in Figures 9 and 10). Describe how the calibration results are used. The figure should be clear to the reader: what are the input data, what algorithm is applied, and how are the results used?
7) Lines 365-368. Figure 10 shows the calibration results of using three calibration plates. The Root Mean Square Error of the final conversion (… - specify a value) is approximately three times that obtained when using two calibration plates (… - specify a value), and the deviation in the actual final projection is also more obvious.
8) In the "Discussion and Perspectives" section, please describe the practical applications of your system: in what areas can it be applied?
9) Table 3. What are the units of measurement in columns 3 and 5? Please label them. Why are columns 2 and 3 named the same?
10) Improve the quality of the drawings; make them larger, with good resolution, so that they are easy to read. The font in the figures should be commensurate with the font of the text.
Author Response
1) The structure of the article does not comply with the MDPI standard and needs to be reworked.
I would recommend: 1 - Introduction; 2 - Literature Review; 2.1 - Robot Operating Systems; 2.2 - Sensors; 2.3 - Heterogeneous Sensor Fusion; 2.4 - Object Detection; 3 - Materials and Methods; 3.1 - Heterogeneous Sensor Fusion, etc.; A) Camera Calibration: …; B) LiDAR-Camera Calibration: …; 4 - Results; 4.1 - Steps and Goals of the Experiment …; 5 - Discussion and Perspectives; 6 - Conclusion. (For some reason, the Conclusion section is missing; it is necessary to add it and formulate the results obtained.)
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
2) At the end of section 2 (Literature review), it is necessary to formulate the purpose of the work and the tasks that the Conclusions will correspond to.
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
3) Figures 1 and 2 are repeated; that is, after Figure 2, Figure 1 appears again. This needs to be corrected.
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
4) The methodology should be improved by adding mindmaps/graphs and further clarifications. The article should be read as a single logical sequence, and not a set of separate fragments.
Author Reply: Thank you for your comment. The flow chart can be found in Figure 1 (architectural diagram) and Figure 7 (experiment steps and goals).
5) Fig. 6 - Please explain the meaning of the results, where they come from, and where they are used. Comment on the meaning of the numbers - are they good or bad? When viewing a figure containing the numerical output of the program, it should be clear to the reader what it shows in a practical sense.
Author Reply: Thank you for your comment. This figure mainly illustrates how we obtain the intrinsic parameters and the projection matrix of the camera used in our system.
6) It is necessary to explain what the numbers in the rows and columns of the matrices represent (in Figures 9 and 10). Describe how the calibration results are used. The figure should be clear to the reader: what are the input data, what algorithm is applied, and how are the results used?
Author Reply: Using the LiDAR-Camera Calibration method proposed by A. Dhall et al., a calibration plate carrying Augmented Reality (AR) markers is used so that the 2D image also provides 3D information, and the four edges of the plate are marked on the 3D point cloud map through manual box selection. The Iterative Closest Point (ICP) algorithm is then used to find the correspondence between the two point clouds; the resulting transformation matrix aligns them by minimizing the Euclidean distance between corresponding points. After this conversion relationship is obtained, projecting the 3D point cloud onto the image additionally requires the camera's intrinsic parameters and its projection matrix.
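For readers unfamiliar with the ICP step, the following is a minimal sketch using Open3D's point-to-point ICP to estimate the rigid transform between the plate edges seen by the camera and those marked in the LiDAR cloud; the file names and the 5 cm correspondence threshold are assumptions, and the actual calibration was performed with the tool of Dhall et al. rather than this snippet.

```python
# Minimal Open3D point-to-point ICP sketch; file names and the 5 cm threshold are assumptions.
import numpy as np
import open3d as o3d

src = o3d.io.read_point_cloud("camera_plate_edges.pcd")   # plate edges recovered from the ArUco poses
tgt = o3d.io.read_point_cloud("lidar_plate_edges.pcd")    # plate edges marked in the LiDAR point cloud

result = o3d.pipelines.registration.registration_icp(
    src, tgt,
    0.05,        # maximum correspondence distance in metres (assumed)
    np.eye(4),   # initial guess: identity transform
    o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
T_lidar_from_camera = result.transformation   # 4x4 rigid transform minimizing corresponding-point distance
print("alignment RMSE:", result.inlier_rmse)
```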
7) Lines 365-368. Figure 10 shows the calibration results of using three calibration plates. The Root Mean Square Error of the final conversion (… - specify a value) is approximately three times that obtained when using two calibration plates (… - specify a value), and the deviation in the actual final projection is also more obvious.
Author Reply: After the LiDAR and the camera are jointly calibrated, the transformation matrix between the two can be obtained, and the 3D point cloud of the LiDAR can be projected in real time onto the 2D image of the camera using the ROS package designed in our research. Figures 13 and 14 show the projection results obtained with the transformation matrix from the joint calibration. The color of the points in the figures represents distance: the nearer a point, the redder it is; the farther away, the greener. In Figure 14, three calibration plates are used to obtain the transformation matrix; compared with Figure 13, the deviation of the projection in Figure 14 is more obvious, so the points cannot be accurately projected onto the correct positions.
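The projection and distance-based coloring described above can be sketched as follows; this is an illustrative stand-in for the ROS package mentioned in the reply, with the transformation matrix T_cam_lidar and the intrinsic matrix K assumed to come from the joint calibration.

```python
# Illustrative projection sketch; T_cam_lidar (4x4) and K (3x3) are assumed calibration outputs.
import numpy as np
import cv2

def draw_projected_points(points_lidar, T_cam_lidar, K, image):
    """Project N x 3 LiDAR points onto the camera image, colored red (near) to green (far)."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])   # homogeneous coordinates
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                           # LiDAR frame -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]                                 # keep points in front of the camera
    if len(pts_cam) == 0:
        return image
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                                          # perspective division
    dist = np.linalg.norm(pts_cam, axis=1)
    t = np.clip(dist / dist.max(), 0.0, 1.0)                             # 0 = nearest, 1 = farthest
    for (u, v), s in zip(uv, t):
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < image.shape[1] and 0 <= vi < image.shape[0]:
            cv2.circle(image, (ui, vi), 2, (0, int(255 * s), int(255 * (1 - s))), -1)  # BGR: red -> green
    return image
```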
8) In the "Discussion and Perspectives" section, please describe the practical applications of your system: in what areas can it be applied?
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
9) Table 3. What are the units of measurement in columns 3 and 5? Please label them. Why are columns 2 and 3 named the same?
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
Table 3. Jetson AGX tests for each power mode.

Power Mode      | CPU average usage | Number of CPU working cores | GPU average usage | YOLOv4-Tiny FPS
0 (MAXN)        | 53.3% (2.3 GHz)   | 8                           | >77% (1.4 GHz)    | 60
1 (10W)         | ×                 | ×                           | ×                 | ×
2 (15W)         | 100% (1.2 GHz)    | 4                           | >95% (675 MHz)    | 22
3 (30W ALL)     | 72.5% (1.2 GHz)   | 8                           | >80% (905 MHz)    | 40
4 (30W 6core)   | 82% (1.4 GHz)     | 6                           | >80% (905 MHz)    | 35
5 (30W 4core)   | 95.8% (1.8 GHz)   | 4                           | >88% (905 MHz)    | 32
6 (30W 2core)   | 100% (2.1 GHz)    | 2                           | >50% (905 MHz)    | 20
7 (15W DESKTOP) | 98% (2.2 GHz)     | 4                           | >95% (675 MHz)    | 24
10) Improve the quality of the drawings; make them larger, with good resolution, so that they are easy to read. The font in the figures should be commensurate with the font of the text.
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
Reviewer 3 Report
See attached PDF file
Comments for author File: Comments.pdf
Author Response
The paper describes a way to use LiDAR, the bounding-box detection algorithm YOLOv4-Tiny, and camera calibration to determine the distances of objects in an offline computing mode. The paper itself is well written, has nearly zero typos, and is detailed. The work appears novel and has wide-ranging and immediate applications. I recommend this manuscript be published after some minor questions and comments are addressed: many of the figures can be improved significantly to improve readability and presentation quality.
Need more detail in most Figure captions
Author Reply: Thank you for your comment. More detailed descriptions were added for Figures 15-16.
Line 191, remove “caused by a lens”
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
Table 3 has "CPU average usage" twice. In Table 3, is one of the columns the GPU frequency? If so, why is it 905 GHz? Should this be MHz?
Author Reply: Thank you for your comment. Table 3 has been fixed in the revised manuscript.
Table 3. Jetson AGX tests for each power mode.

Power Mode      | CPU average usage | Number of CPU working cores | GPU average usage | YOLOv4-Tiny FPS
0 (MAXN)        | 53.3% (2.3 GHz)   | 8                           | >77% (1.4 GHz)    | 60
1 (10W)         | ×                 | ×                           | ×                 | ×
2 (15W)         | 100% (1.2 GHz)    | 4                           | >95% (675 MHz)    | 22
3 (30W ALL)     | 72.5% (1.2 GHz)   | 8                           | >80% (905 MHz)    | 40
4 (30W 6core)   | 82% (1.4 GHz)     | 6                           | >80% (905 MHz)    | 35
5 (30W 4core)   | 95.8% (1.8 GHz)   | 4                           | >88% (905 MHz)    | 32
6 (30W 2core)   | 100% (2.1 GHz)    | 2                           | >50% (905 MHz)    | 20
7 (15W DESKTOP) | 98% (2.2 GHz)     | 4                           | >95% (675 MHz)    | 24
Why were the ArUco markers placed at the same distance? Is there value in putting another one at a different distance? Why wasn't a wider-angle camera used, as would be expected in self-driving cars? Is this a limitation of the LiDAR field of view? It seems like this technique would be more important for wider-angle cameras due to the distortion of the image. Also, would a better correction be obtained if the ArUco markers were farther apart? You mention that the average obstacle detection accuracy of YOLOv4-Tiny dynamic detection is 70% within 10 m. What does this mean? Perhaps state a standard deviation and average error in metres? Please discuss what the LiDAR distance errors are.
Author Reply: The farther an ArUco marker is placed, the fewer points the LiDAR detects on it. During the experiment, we therefore placed the markers so that they filled as much of the camera's field of view as possible, so that the LiDAR would return as many points as possible on each marker. Figure 17 shows the results of the actual dynamic test on the road. The effective detection distance of the LiDAR is 100 m, and its detection error is less than 3 cm. Within 10 m, the average obstacle detection accuracy of YOLOv4-Tiny dynamic detection surpasses 70%.
Author Response File: Author Response.docx
Reviewer 4 Report
In this paper, the authors present a front obstacle recognition and distance measurement system using 3D LiDAR and camera heterogeneous sensing fusion. However, several problems were identified during this review round.
1. General workflow
The authors propose an intuitive workflow, which consists of calibration between 2D optical camera images and 3D LiDAR data, object detection using the optical camera, and distance measurement using calibrated 3D LiDAR data. The proposed solution also involves many uncertainties.
1.1 The transformation matrix of the calibration is not self-adaptive. A non-adaptive parameter will introduce uncertainties in real-world applications.
1.2 Compared to image segmentation, object detection algorithms produce bounding boxes that usually cover a lot of background objects. This has a great impact on the accuracy of distance measurements. Personally, I encourage the authors to replace object detection with segmentation algorithms.
1.3 The robustness and scalability of this workflow. The authors are encouraged to present more analysis of the robustness and scalability of the proposed workflow
2. The presentations of this research
This manuscript needs to be reorganized and should include more experiments on real-world scenarios. For example, regarding Lines 400-403, the authors should provide more experimental information and evidence before drawing a conclusion (such as a description of the dynamic test dataset, the evaluation metric for obstacle detection, and so on).
Besides, the authors should alleviate the confusion regarding the overall workflow within this manuscript. For example, the authors introduce two strategies of distance measurement in Lines 385-397, the minimum point and the median point, both of which seem unsuitable on their own in this research. Although the authors indicate that they use a hybrid plan, the reviewer can hardly find any description of how the two are combined in the final workflow.
The reviewer also suggests that the writing should be further improved.
Author Response
- General workflow
The authors propose an intuitive workflow, which consists of calibration between 2D optical camera images and 3D LiDAR data, object detection using the optical camera, and distance measurement using calibrated 3D LiDAR data. The proposed solution also involves many uncertainties.
- The transformation matrix of the calibration is not self-adaptive. A non-adaptive parameter will introduce uncertainties in real-world applications.
Author Reply: Thank you for your helpful comments; we will continue to think about this interesting topic in future work.
- Compared to image segmentation, object detection algorithms produce bounding boxes that usually cover a lot of background objects. This has a great impact on the accuracy of distance measurements. Personally, I encourage the authors to replace object detection with segmentation algorithms.
Author Reply: Thank you for your helpful comments; we will continue to consider segmentation algorithms in future work.
- The robustness and scalability of this workflow. The authors are encouraged to present more analysis of the robustness and scalability of the proposed workflow
Author Reply: Thank you for your helpful comments; we will continue to think about this interesting topic in future work.
- The presentations of this research
This manuscript needs to be reorganized and should include more experiments on real-world scenarios. For example, regarding Lines 400-403, the authors should provide more experimental information and evidence before drawing a conclusion (such as a description of the dynamic test dataset, the evaluation metric for obstacle detection, and so on).
Author Reply: Thank you for your helpful comments; we will continue to think about this interesting topic in future work.
Besides, the authors should alleviate the confusion regarding the overall workflow within this manuscript. For example, the authors introduce two strategies of distance measurement in Lines 385-397, the minimum point and the median point, both of which seem unsuitable on their own in this research. Although the authors indicate that they use a hybrid plan, the reviewer can hardly find any description of how the two are combined in the final workflow.
Author Reply: Thank you for your comment. Figure 15 uses the minimum point within 100% of the bounding box. This method ensures that we find the shortest distance between the sensor and the object, but it is more susceptible to environmental interference; in Figure 15, the chair interfered with the human body, resulting in an inaccurate distance. We therefore also use the median point within the bounding box to measure distance. With the median point, the size of the bounding box can be ignored, ensuring that the measured distance is focused on the correct object.
The reviewer also suggests that the writing should be further improved.
Author Reply: This paper has been edited and revised by a professional English editing agency (Enago), and all of the comments and suggestions have been addressed carefully.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
The comments have been addressed, and I would recommend "publish as it is".
Author Response
Response to Reviewer 1 Comments
Comments and Suggestions for Authors
The comments have been addressed, and I would recommend "publish as it is".
Author Reply: Thank you for the reviewer's comments.
Reviewer 2 Report
1) Please paste this in the text of the article:
Author Reply: Using the LiDAR-Camera Calibration method proposed by A. Dhall et al., a calibration plate carrying Augmented Reality (AR) markers is used so that the 2D image also provides 3D information, and the four edges of the plate are marked on the 3D point cloud map through manual box selection. The Iterative Closest Point (ICP) algorithm is then used to find the correspondence between the two point clouds; the resulting transformation matrix aligns them by minimizing the Euclidean distance between corresponding points. After this conversion relationship is obtained, projecting the 3D point cloud onto the image additionally requires the camera's intrinsic parameters and its projection matrix.
2) Author Reply: Thank you for your comment. This figure mainly illustrates how we obtain the intrinsic parameters and the projection matrix of the camera used in our system.
It still remains unclear to the reader what the numbers in the matrix mean (Fig. 8 according to the new numbering) and what conclusion follows from these results.
3) Author Reply: Thank you for your comment. The flow chart can be found in Figure 1 (architectural diagram) and Figure 7 (experiment steps and goals).
I meant a diagram like the one in the link:
https://www.matchware.com/examples/mind-map/the-scientific-method/641
4) Some of the pictures still have small text (especially Fig. 3); please also enlarge Fig. 4.
Author Response
Response to Reviewer 2 Comments
Comments and Suggestions for Authors
1) Please paste this in the text of the article:
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
2) It still remains unclear to the reader what the numbers in the matrix mean (Fig. 8 according to the new numbering) and what conclusion follows from these results.
Author Reply: Thank you for your comment. In order to fuse the image and 3D point cloud data, the camera and the 3D LiDAR must be calibrated to determine the conversion relationship between the two kinds of data. The matrix in Figure 8 corresponds to the camera intrinsic matrix and camera extrinsic matrix in Equation (1), in preparation for the joint calibration.
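Assuming Equation (1) in the manuscript follows the standard pinhole camera model, the intrinsic and extrinsic matrices referred to here take the familiar form:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = \underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{\text{camera intrinsic matrix } K}
    \underbrace{\begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix}}_{\text{camera extrinsic matrix}}
    \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
```

Here (u, v) is a pixel coordinate, (X_w, Y_w, Z_w) a 3D point, and s a scale factor; the joint calibration estimates the extrinsic part [R | t] between the LiDAR frame and the camera frame.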
3) Author Reply: Thank you for your comment. The flow chart can be found in Figure 1 (architectural diagram) and Figure 7 (experiment steps and goals).
I meant a diagram like the one in the link:
https://www.matchware.com/examples/mind-map/the-scientific-method/641
Author Reply: Thank you for your comment. The flow chart can be found in Figure 2 (Mind Map)
4) Some of the pictures still have small text (especially Fig. 3); please also enlarge Fig. 4.
Author Reply: Thank you for your comment. This problem has been fixed in the revised manuscript.
Author Response File: Author Response.pdf
Reviewer 4 Report
Although I have no other concerns about the methods section, several problems were identified during this review round, especially in the results and discussion sections. As I pointed out in the last round, this method still lacks experiments in a larger number of real-world scenarios and a detailed analysis of the experimental results. For example, the authors should provide more real-world experimental results and analyze the accuracy and efficiency under different occlusion types, such as pedestrians, walking animals, vehicles, traffic settings, and so on.
Author Response
We are very grateful for your meaningful and valuable comments. We have added some real-world experiments, as shown in Figures 18, 20, and 21 in the revised manuscript. Due to the current limitations of funding and human resources, this study cannot complete more real-world experiments; as a result, we cannot provide further analysis and results on this topic. We hope to reach a higher level in our next project in the near future. Thanks again for your attention to this article and for your suggestions.
Author Response File: Author Response.pdf