Reducing the Reality Gap Using Hybrid Data for Real-Time Autonomous Operations
Abstract
1. Introduction
- Can synthetic image data generation achieve effective results on real-world problems with the help of domain randomisation?
- Which parameters of domain randomisation affect the results most?
- How much does the augmentation improve the accuracy of the neural network?
- Which layers of the neural network affect the accuracy most?
- What are the main benefits of using hybrid image data to train the neural network?
- Can the synthetic image data be fully relied on to train the neural network?
2. Literature Review
3. Methodology
3.1. Dataset Development
- A sturdy mounting system: To capture clear and stable images of the refuelling adaptor, it is crucial to have a sturdy mounting system that securely holds the Intel® RealSense™ D435 depth camera. The system should be designed to minimise movement and vibration, which would otherwise result in blurry or distorted images. Additionally, the mounting system should be adjustable so that the camera can be positioned at an optimal distance and angle to capture the refuelling adaptor and the surrounding area.
- A high-resolution camera: The camera used for this application should offer sufficient resolution to capture clear, detailed images of the refuelling adaptor, including its fine features.
- Appropriate distance and angle: The Intel® RealSense™ D435 depth camera has been positioned at a distance and angle suitable for capturing the refuelling adaptor and the surrounding area. The ideal position depends on the specific aircraft and refuelling setup and may require some experimentation; in general, the camera should be positioned so that its field of view covers the entire refuelling adaptor and any relevant surrounding components (a minimal capture sketch using the camera's Python SDK follows this list).
- Consistent lighting conditions: To ensure consistent lighting across all captured images, an array of high-intensity LED lights could be positioned around the camera rig. The lights should be placed and angled to illuminate the refuelling adaptor evenly, without creating harsh shadows or over-exposed areas. This is important so that the images are clear and easy to interpret and that any potential issues or anomalies during the process remain clearly visible.
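The capture side of this setup can be driven through Intel's pyrealsense2 SDK. The snippet below is a minimal sketch of such a capture loop; the stream resolution, frame rate, frame count, and output filenames are illustrative assumptions rather than the exact settings used in this work.

```python
# Minimal capture loop for the Intel RealSense D435 using the
# official pyrealsense2 SDK. Resolution, frame rate, and output
# paths are assumed values for illustration only.
import numpy as np
import pyrealsense2 as rs
import cv2

pipeline = rs.pipeline()
config = rs.config()
# Enable a colour stream; 1280x720 @ 30 fps is an assumed setting.
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
pipeline.start(config)

try:
    for i in range(100):  # capture 100 frames of the adaptor
        frames = pipeline.wait_for_frames()
        color_frame = frames.get_color_frame()
        if not color_frame:
            continue  # skip incomplete framesets
        image = np.asanyarray(color_frame.get_data())
        cv2.imwrite(f"capture_{i:04d}.png", image)
finally:
    pipeline.stop()
```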
- Model the refuelling adaptor: This has been accomplished by importing the 3D model of the adaptor. The model must be accurate and to scale for the final renders to be realistic.
- Texture the model: To make the model look more realistic, textures and materials need to be added to it in Blender. It is important to consider factors such as the paint and the material of the refuelling adaptor.
- Set up lighting and virtual camera: The lighting and virtual camera setup are crucial to creating realistic images. Lighting needs to be arranged in the 3D environment to produce shadows and reflections similar to those found in real life; this involves adding lights to the scene in Blender and using ZPy to vary the lighting programmatically so that every potential scenario is covered. A virtual camera also needs to be set up to capture the images, which was done by positioning a camera in Blender and programming its position with ZPy.
- Render the images: The final step is to render the images. This has been done using Blender’s built-in rendering engine, which offers a variety of settings for producing high-quality images; factors such as image resolution should be chosen to maximise output quality. A condensed sketch of the lighting, camera, and render steps follows this list.
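The lighting, camera, and render steps above can be condensed into a short script against Blender's Python API (bpy), which ZPy wraps with higher-level randomisation helpers. The sketch below assumes a scene that already contains a camera and the adaptor model; the sampling ranges, frame count, and output path are illustrative assumptions.

```python
# Sketch of the lighting, camera, and render steps inside Blender's
# Python API (bpy). The sampling ranges, frame count, and output
# path are assumptions, not the values used in the paper.
import math
import random
import bpy

scene = bpy.context.scene

# Create one key light whose position and intensity are resampled
# per render.
light_data = bpy.data.lights.new(name="KeyLight", type='POINT')
light_obj = bpy.data.objects.new(name="KeyLight", object_data=light_data)
scene.collection.objects.link(light_obj)

camera = scene.camera  # assumes a camera already exists in the scene

for i in range(10):
    light_obj.location = (random.uniform(-2, 2),
                          random.uniform(-2, 2),
                          random.uniform(1, 3))
    light_data.energy = random.uniform(200, 1500)  # watts

    # Orbit the camera around the adaptor at a random angle and radius.
    angle = random.uniform(0, 2 * math.pi)
    radius = random.uniform(0.5, 1.5)
    camera.location = (radius * math.cos(angle),
                       radius * math.sin(angle),
                       random.uniform(0.3, 1.0))

    # Render with Blender's built-in engine to a numbered still image.
    scene.render.filepath = f"//renders/adaptor_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```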
3.2. Domain Randomisation
- The process of collecting and annotating thousands of images is time-consuming;
- Even though many datasets are freely available, a dataset for custom objects still needs to be collected and annotated;
- Annotations are generally created by humans, and humans tend to make mistakes;
- The dataset may contain images assigned to the wrong classes;
- Real datasets may only include basic annotations such as bounding boxes, segmentation masks, or labels.
- While synthetic data can replicate many of the properties of real data, it may not accurately reproduce every aspect of the original content, which can reduce the accuracy of the model.
- The quality of the generated data is heavily dependent on the quality of the 3D model. A sketch of the randomisation step itself follows this list.
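As a concrete illustration of domain randomisation, the sketch below resamples the adaptor's material colour, the HDRI world background, and the object pose before each render, again through Blender's bpy API. The object name "Adaptor", the node names, and the HDRI file list are assumptions; a scene set up with a Principled BSDF material and an Environment Texture world node is presumed.

```python
# Per-render domain randomisation in bpy: material base colour,
# HDRI world background, and object pose are resampled each call.
# Object/node names and the HDRI file list are assumptions.
import random
import bpy

obj = bpy.data.objects["Adaptor"]          # assumed object name
mat = obj.active_material
bsdf = mat.node_tree.nodes["Principled BSDF"]

hdris = ["hangar.hdr", "tarmac.hdr", "overcast.hdr"]  # assumed files

def randomise_scene():
    # Perturb the adaptor's base colour around its nominal paint.
    bsdf.inputs["Base Color"].default_value = (
        random.uniform(0.1, 0.9),
        random.uniform(0.1, 0.9),
        random.uniform(0.1, 0.9),
        1.0,
    )
    # Swap the HDRI environment texture to vary the background;
    # assumes the world uses an "Environment Texture" node.
    env = bpy.context.scene.world.node_tree.nodes["Environment Texture"]
    env.image = bpy.data.images.load(random.choice(hdris))
    # Random object rotation so every pose is covered.
    obj.rotation_euler = [random.uniform(0, 2 * 3.14159) for _ in range(3)]
```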
3.3. Training Neural Networks
3.4. Ablation Study
3.5. Experimental Setup
4. Results
- Task-specific design: The custom neural network is designed and optimised specifically for the refuelling adaptor detection task, whereas pre-trained models are built to be versatile and adaptable across a wide range of tasks. As a result, the custom network is able to outperform pre-trained models on this task.
- Hyper-parameter optimisation: Pre-trained models are generally trained with a fixed set of hyper-parameters, whereas the custom neural network is fine-tuned using hyper-parameter optimisation to find the best configuration for the refuelling adaptor detection task. This helps improve the performance of the custom network (a sketch of such a sweep follows this list).
- Dataset characteristics: Pre-trained models are trained on large datasets whose attributes may differ from the dataset used in this research. The custom neural network is trained specifically on the hybrid dataset, which resulted in better performance.
- Architectural differences: Pre-trained models have a fixed architecture that may not be optimal for every task. The custom neural network is designed with an architecture better suited to the refuelling adaptor detection task, which resulted in improved performance.
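The hyper-parameter sweep referred to above can be sketched as a plain grid search over the optimiser functions and learning rates reported in the results table. The helpers build_model, train, and evaluate stand in for project-specific code and are assumptions; the learning-rate grid is illustrative.

```python
# Sketch of a grid search over optimiser function and learning rate,
# keeping the configuration with the best validation accuracy.
# build_model, train, and evaluate are assumed project helpers.
import itertools
import torch

def sweep(build_model, train, evaluate, train_loader, val_loader):
    grid = itertools.product(
        ["sgd", "adam", "rmsprop"],   # optimiser functions compared
        [1e-1, 1e-2, 1e-3],           # learning rates (assumed grid)
    )
    makers = {
        "sgd": lambda p, lr: torch.optim.SGD(p, lr=lr, momentum=0.9),
        "adam": lambda p, lr: torch.optim.Adam(p, lr=lr),
        "rmsprop": lambda p, lr: torch.optim.RMSprop(p, lr=lr),
    }
    best_config, best_acc = None, -1.0
    for name, lr in grid:
        model = build_model()  # fresh model per configuration
        optimiser = makers[name](model.parameters(), lr)
        train(model, optimiser, train_loader)
        acc = evaluate(model, val_loader)
        if acc > best_acc:
            best_config, best_acc = (name, lr), acc
    return best_config, best_acc
```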
- Increased diversity: A hybrid dataset offers greater variety than a purely real or purely synthetic dataset. This is especially useful because the refuelling adaptor detection task requires the model to generalise to a wide range of conditions while the real dataset is limited in size and diversity.
- Improved annotation quality: The synthetic dataset has precise, accurate annotations, while the real dataset may have less accurate or incomplete annotations. Combining real and synthetic data exploits the precise synthetic annotations while retaining the complexity and variability of real data.
- Reduced cost and ethical concerns: A synthetic dataset can be generated at lower cost and with fewer ethical concerns than a real dataset. A hybrid approach reduces the amount of real data that must be collected while still incorporating the benefits of real data (a minimal sketch of combining the two sources follows this list).
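One minimal way to realise such a hybrid dataset in PyTorch is to concatenate the real and synthetic image folders, as sketched below. The directory layout, image size, and loader settings are assumptions, not the exact configuration used in this work.

```python
# Minimal sketch of forming a hybrid dataset in PyTorch by
# concatenating a real and a synthetic dataset. Paths and the
# ImageFolder layout are assumptions about the data on disk.
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

real = datasets.ImageFolder("data/real", transform=transform)
synthetic = datasets.ImageFolder("data/synthetic", transform=transform)

# The hybrid dataset interleaves both sources; a WeightedRandomSampler
# could replace shuffle=True to control the real-to-synthetic ratio.
hybrid = ConcatDataset([real, synthetic])
loader = DataLoader(hybrid, batch_size=32, shuffle=True, num_workers=4)
```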
5. Discussion
6. Conclusions
7. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| SGD | Stochastic Gradient Descent |
| CNN | Convolutional Neural Network |
| NAS | Neural Architecture Search |
| VGG | Visual Geometry Group |
| HDRI | High Dynamic Range Imaging |
References
- Blender—A 3D Modelling and Rendering Package. Available online: https://www.blender.org (accessed on 20 May 2022).
- Unreal Engine—The Most Powerful Real-Time 3D Creation Tool. Available online: https://www.unrealengine.com (accessed on 28 May 2022).
- Unity—Real-Time Development Platform. Available online: https://unity.com (accessed on 16 May 2022).
- Butler, D.J.; Wulff, J.; Stanley, G.B.; Black, M.J. A naturalistic open source movie for optical flow evaluation. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 611–625.
- Dosovitskiy, A.; Fischer, P.; Ilg, E.; Hausser, P.; Hazirbas, C.; Golkov, V.; Van Der Smagt, P.; Cremers, D.; Brox, T. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2758–2766.
- Gaidon, A.; Wang, Q.; Cabon, Y.; Vig, E. Virtual worlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4340–4349.
- Handa, A.; Patraucean, V.; Badrinarayanan, V.; Stent, S.; Cipolla, R. Understanding real world indoor scenes with synthetic data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4077–4085.
- Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4040–4048.
- McCormac, J.; Handa, A.; Leutenegger, S.; Davison, A.J. SceneNet RGB-D: 5M photorealistic images of synthetic indoor trajectories with ground truth. arXiv 2016, arXiv:1612.05079.
- Müller, M.; Casser, V.; Lahoud, J.; Smith, N.; Ghanem, B. Sim4CV: A photo-realistic simulator for computer vision applications. Int. J. Comput. Vis. 2018, 126, 902–919.
- Qiu, W.; Yuille, A. UnrealCV: Connecting computer vision to Unreal Engine. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 909–916.
- Richter, S.R.; Vineet, V.; Roth, S.; Koltun, V. Playing for data: Ground truth from computer games. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 102–118.
- Ros, G.; Sellart, L.; Materzynska, J.; Vazquez, D.; Lopez, A.M. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3234–3243.
- Tsirikoglou, A.; Kronander, J.; Wrenninge, M.; Unger, J. Procedural modeling and physically based rendering for synthetic data generation in automotive applications. arXiv 2017, arXiv:1710.06270.
- Zhang, Y.; Qiu, W.; Chen, Q.; Hu, X.; Yuille, A. UnrealStereo: Controlling hazardous factors to analyze stereo vision. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 228–237.
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30.
- Sadeghi, F.; Levine, S. CAD2RL: Real single-image flight without a single real image. arXiv 2016, arXiv:1611.04201.
- Hinterstoisser, S.; Lepetit, V.; Wohlhart, P.; Konolige, K. On pre-trained image features and synthetic images for deep learning. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
- James, S.; Davison, A.J.; Johns, E. Transferring end-to-end visuomotor control from simulation to real world for a multi-stage task. In Proceedings of the Conference on Robot Learning (PMLR), Mountain View, CA, USA, 13–15 November 2017; pp. 334–343.
- Zhang, F.; Leitner, J.; Ge, Z.; Milford, M.; Corke, P. Adversarial discriminative sim-to-real transfer of visuo-motor policies. Int. J. Robot. Res. 2019, 38, 1229–1245.
- James, S.; Johns, E. 3D simulation for robot arm control with deep Q-learning. arXiv 2016, arXiv:1609.03759.
- Peng, X.; Sun, B.; Ali, K.; Saenko, K. Learning deep object detectors from 3D models. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1278–1286.
- Dwibedi, D.; Misra, I.; Hebert, M. Cut, paste and learn: Surprisingly easy synthesis for instance detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1301–1310.
- CLA-VAL 340AF, Pressure Fuel Servicing Adapter. Available online: https://cla-val-europe.com/en/product/cla-val-340af-pressure-fuel-servicing-adapter/ (accessed on 11 February 2021).
- Department of Defense, Defense Standardization Program. Available online: https://www.dsp.dla.mil/Policy-Guidance/ (accessed on 4 April 2021).
- Department of Defense, Defense Standardization Program Procedures. Available online: https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodm/412024m.pdf (accessed on 4 April 2021).
- Google, The Size and Quality of a Data Set. Available online: https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality (accessed on 18 October 2022).
- Altexsoft, Preparing Your Dataset for Machine Learning. Available online: https://www.altexsoft.com/blog/datascience/preparing-your-dataset-for-machine-learning-8-basic-techniques-that-make-your-data-better/ (accessed on 12 December 2021).
- SkyGeek, Military Standard Adapter. Available online: https://skygeek.com/military-standard-ms24484-5-pressure-adapter.html (accessed on 10 May 2022).
- Yildirim, S. Autonomous Ground Refuelling Approach for Civil Aircrafts using Computer Vision and Robotics. In Proceedings of the IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, 3–7 October 2021.
- Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014.
- NVIDIA, What Is Synthetic Data? Available online: https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data (accessed on 19 October 2021).
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
- Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Meyes, R.; Lu, M.; de Puiseau, C.W.; Meisen, T. Ablation Studies in Artificial Neural Networks. arXiv 2019, arXiv:1901.08644.
- Ponte, H.; Ponte, N.; Crowder, S. Synthetic data for Blender. Available online: https://github.com/ZumoLabs/zpy (accessed on 2 August 2022).
| Neural Network Techniques | Data Type | Learning Rate | Optimiser Function | Accuracy (%) | Validation Loss | Precision (%) | Recall (%) | (%) |
|---|---|---|---|---|---|---|---|---|
| Compound Scaling + Neural Architecture Search + Inverted Residual Block | Real Dataset | | Adam | 46.36 | 23.981 | 38.30 | 37.63 | 41.48 |
| | | | SGD | 47.22 | 22.585 | 39.31 | 38.46 | 28.49 |
| | | | RMSprop | 42.87 | 26.458 | 36.12 | 35.22 | 29.21 |
| Compound Scaling + Neural Architecture Search + Inverted Residual Block | Hybrid Dataset | | Adam | 52.19 | 19.329 | 41.27 | 46.38 | 38.58 |
| | | | SGD | 56.44 | 15.832 | 43.83 | 48.37 | 40.37 |
| | | | RMSprop | 47.67 | 20.832 | 42.14 | 42.57 | 39.47 |
| Compound Scaling + Neural Architecture Search + Inverted Residual Block | Real Dataset | | Adam | 72.8 | 14.254 | 61.74 | 62.19 | 53.85 |
| | | | SGD | 77.42 | 12.832 | 62.18 | 63.28 | 54.32 |
| | | | RMSprop | 64.13 | 16.239 | 60.49 | 60.43 | 51.32 |
| Compound Scaling + Neural Architecture Search + Inverted Residual Block | Hybrid Dataset | | Adam | 73.5 | 10.329 | 63.21 | 64.20 | 57.47 |
| | | | SGD | 78.43 | 9.848 | 64.37 | 65.33 | 58.38 |
| | | | RMSprop | 68.79 | 12.328 | 62.47 | 63.27 | 56.47 |
| Compound Scaling + Neural Architecture Search + Inverted Residual Block | Real Dataset | | Adam | 91.87 | 0.325 | 92.22 | 94.67 | 92.57 |
| | | | SGD | 94.31 | 0.209 | 93.83 | 95.88 | 93.24 |
| | | | RMSprop | 86.99 | 0.465 | 90.62 | 94.19 | 91.42 |
| Compound Scaling + Neural Architecture Search + Inverted Residual Block | Hybrid Dataset | | Adam | 98.37 | 0.044 | 97.03 | 98.34 | 97.14 |
| | | | SGD | 99.19 | 0.023 | 98.26 | 99.58 | 97.92 |
| | | | RMSprop | 97.56 | 0.078 | 95.47 | 96.01 | 96.67 |
| EfficientNet-B0 | Real Dataset | | Adam | 72.82 | 6.449 | 61.40 | 60.14 | 56.16 |
| | | | SGD | 76.11 | 6.214 | 62.89 | 60.98 | 55.37 |
| | | | RMSprop | 75.37 | 8.823 | 60.95 | 59.16 | 54.56 |
| EfficientNet-B0 | Hybrid Dataset | | Adam | 78.09 | 5.382 | 64.31 | 65.35 | 58.43 |
| | | | SGD | 83.11 | 4.974 | 66.49 | 67.15 | 59.47 |
| | | | RMSprop | 82.47 | 6.238 | 62.58 | 64.75 | 56.48 |
| VGG-16 | Hybrid Dataset | | Adam | 80.88 | 2.374 | 84.74 | 83.19 | 82.48 |
| | | | SGD | 85.56 | 2.249 | 86.38 | 86.17 | 83.29 |
| | | | RMSprop | 81.74 | 3.958 | 81.36 | 80.12 | 80.44 |
| ResNet-18 | Hybrid Dataset | | Adam | 87.44 | 0.402 | 90.11 | 92.89 | 91.23 |
| | | | SGD | 90.89 | 0.388 | 92.18 | 93.77 | 91.88 |
| | | | RMSprop | 88.32 | 0.627 | 88.76 | 91.03 | 90.58 |