Neural Network-Based Learning from Demonstration of an Autonomous Ground Robot
Round 1
Reviewer 1 Report
The authors present an implementation of deep learning for autonomous robot navigation in cluttered environments. That being said, the idea of combining a ConvNet and an LSTM is not new. Here is, for example, a study from 2015 using the same combination of methods to classify EEG readings:
Bashivan, Pouya, et al. "Learning representations from EEG with deep recurrent-convolutional neural networks." arXiv preprint arXiv:1511.06448 (2015).
Overall, I find the paper to be well written and with a good structure. However, there are a few problems:
The review of the state of the art is poor; I have seen conference papers that analyze previous work more thoroughly. In my opinion, a journal paper in the effervescent field of neural networks should have at least 40 references.
Considering that you use the front LIDAR data as the only input, I do not understand why you also need the 9-DOF inertial measurement unit (IMU) and the onboard RGB camera.
I have inspected some of your JSON files. There are places where the front LIDAR readings are 0. Doesn't that mean that the robot crashed into a wall? Looking at the construction of the robot, I am not even sure how the reading can be 0, since I do not think the LIDAR sensors can actually touch a wall.
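For concreteness, below is a minimal sketch of the kind of check I ran; the key name `front_lidar` and the flat list-of-samples layout are my own assumptions, not the authors' actual schema. A reading of exactly 0 often indicates a dropped or invalid return rather than a true zero-distance (contact) measurement, which the authors should clarify.

```python
import glob
import json

# Hypothetical key name "front_lidar" and a flat list-of-samples layout;
# the actual schema of the authors' JSON logs may differ.
zero_count = total = 0
for path in glob.glob("logs/*.json"):
    with open(path) as f:
        samples = json.load(f)
    for sample in samples:
        total += 1
        if sample.get("front_lidar") == 0:
            zero_count += 1

print(f"{zero_count} of {total} front readings are exactly 0")
```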
Line 156: the sentence "Using , data were collected from 9" appears to be incomplete (a word or reference seems to be missing).
Line 173: the sentence "in this dataset, which provides the trajectory of ." is likewise incomplete.
Author Response
Dear reviewers:
Thank you for your kind suggestions. We fully believe that they are of great importance for improving our paper. Please find our point-by-point response in the enclosed PDF file.
Best regards
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper presents an end-to-end, learning-from-demonstration approach for autonomous steering of a ground robot using LiDAR sensor inputs. Training data were collected from a human driver (via remote control) on different tracks, and the proposed LSTM+CNN architecture outperforms several other supervised ML models on this task.
The paper is, for the most part, easy to follow, and I like the comparison between different ML models. My main concern is the evaluation of performance:
The accuracy and RMSE make less sense if the input sensor data are distributed differently during training and testing (covariate shift). Since this is a sequential decision-making problem, the steering output at each time step affects future sensor inputs, so for each run and each method the resulting set of sensor inputs will differ even on the same track. If you evaluate each learned model offline on the training/testing sets, the results become less convincing, because higher accuracy or lower RMSE does not imply better steering performance (i.e., on-the-fly performance). In addition, offline evaluation on a fixed dataset treats each sensor input as independent, which contradicts what the authors claim in this paper, namely that the LSTM incorporates temporal features during driving.
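To make the distinction concrete, below is a minimal sketch of the kind of closed-loop (on-the-fly) evaluation I have in mind; the interface (`env.reset`, `env.step`, `model.predict`) is a hypothetical placeholder, not the authors' code. The key point is that the policy's own past actions determine the sensor inputs it sees, which offline RMSE against the human driver's log never captures.

```python
def closed_loop_success_rate(model, env, n_runs=20, max_steps=2000):
    """Roll the learned steering policy out on the track and report the
    fraction of runs completed without a collision (an on-the-fly metric)."""
    successes = 0
    for _ in range(n_runs):
        obs = env.reset()                    # initial LiDAR scan
        crashed, finished = False, False
        for _ in range(max_steps):
            steering = model.predict(obs)              # policy acts...
            obs, crashed, finished = env.step(steering)  # ...and shapes its own future inputs
            if crashed or finished:
                break
        if finished and not crashed:
            successes += 1
    return successes / n_runs
```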
Author Response
Dear reviewers:
Thank you for your kind suggestions. We fully believe that they are of great importance for improving our paper. Please find our point-by-point response in the enclosed PDF file.
Best regards
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Thanks for the response and revision. However, my main concern has not been addressed.
First, the newly added Section 2.3 and the rest of the paper are not coherent. At a minimum, the authors should point out where the proposed method stands (i.e., behavioral cloning).
Second, instead of citing work on forecasting stock prices and air pollution, or unrelated work on imitation learning, the authors should focus on end-to-end learning approaches to the control of autonomous vehicles. Plenty of closely related work is ignored, for example:
"Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues" by Lu et al.
"End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies" by Eraqi et al.
"Agile Autonomous Driving using End-to-End Deep Imitation Learning" by Pan et al.
The authors should discuss the relationship between their method and these closely related approaches.
Third, given the authors' response "That is why a video is included in the paper, which...",
I am not convinced that a video can be used to quantify the performance of any technical approach. If accuracy and RMSE are not good metrics, the authors may consider other metrics, e.g., the success rate, to quantify on-the-fly performance. Currently, the reported results only show offline performance.
Author Response
Dear reviewers:
Thank you for your kind suggestions. We fully believe that they are of great importance for improving our paper. Please find our point-by-point response in the enclosed PDF file.
Best regards
Author Response File: Author Response.pdf