Efficient and Scalable Object Localization in 3D on Mobile Device
Abstract
1. Introduction
- Our work enables 2D object detection on a mobile device using a pretrained CNN model.
- Once the 2D bounding box of the detected object in the image scene is obtained, a 3D cuboid for the object is estimated using the 2D bounding box coordinates and vanishing point sampling. ARCore is used to determine the camera pose and rotation matrix for the vanishing point computations.
- Overall processing time is lowered by reducing the number of generated 3D cuboid proposals using additional information from the horizontal planes detected by ARCore. The proposed framework works well with everyday objects.
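The vanishing point computation mentioned in the second contribution can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes a pinhole intrinsic matrix `K` and a world-to-camera rotation `R`, and relies on the standard projective result that a world direction `d` maps to the homogeneous image point `K R d`. The function names, the example intrinsics, and the yaw-only rotation are hypothetical.

```python
import math

def matmul3(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def yaw_rotation(theta_rad):
    """Hypothetical helper: rotation about the y axis by theta (radians)."""
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def vanishing_points(K, R, eps=1e-9):
    """Return the image-space vanishing points of the three world axes.

    Column i of K @ R is the homogeneous projection of world axis i.
    If its last component is (near) zero, the axis is parallel to the
    image plane and the vanishing point lies at infinity (None).
    """
    H = matmul3(K, R)
    vps = []
    for axis in range(3):
        x, y, w = H[0][axis], H[1][axis], H[2][axis]
        vps.append(None if abs(w) < eps else (x / w, y / w))
    return vps

# Assumed intrinsics: focal length 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vps_identity = vanishing_points(K, identity)
vps_yaw45 = vanishing_points(K, yaw_rotation(math.radians(45.0)))
```

With the identity rotation, only the axis along the optical axis has a finite vanishing point, and it coincides with the principal point. Once the camera is rotated, a second axis acquires a finite vanishing point. If the rotation comes from a camera pose expressed as camera-to-world, it would need to be transposed before being passed as `R`.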
2. Related Work
3. Proposed Method
3.1. 2D Object Detection
3.2. 3D Cuboid Computation
3.3. Optimization
- Two of the costs in the cost function are applied in the 2D image space. Therefore, before applying these two costs, we reduce the number of cuboid proposals by evaluating the angle made by the screen normal with the x axis in the 2D image space.
- We further reduce the number of cuboid proposals using the normal of the plane detected using ARCore. The direction angles computed for this plane normal are the angles it forms with the positive x, y and z axes, respectively.
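The two angle computations described above can be sketched as follows. This is a hedged illustration built from standard geometry rather than from the paper's own equations: the 2D angle is taken with `atan2`, and the direction angles follow from the direction cosines, where the cosine of each angle is the corresponding vector component divided by the vector's length. The function names and the example vectors are assumptions. How the resulting angles are used to discard cuboid proposals is not specified here, so no filtering rule is shown.

```python
import math

def angle_with_x_axis_2d(nx, ny):
    """Angle in degrees, in (-180, 180], between a 2D vector and the +x axis."""
    return math.degrees(math.atan2(ny, nx))

def direction_angles(n):
    """Angles in degrees between a 3D vector n and the +x, +y, +z axes.

    Uses the direction cosines: cos(angle_i) = n_i / ||n||.
    """
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("a zero-length normal has no direction")
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    return tuple(
        math.degrees(math.acos(max(-1.0, min(1.0, c / norm)))) for c in n
    )

# Hypothetical inputs: a 2D normal pointing diagonally, and the normal of
# a horizontal plane in a coordinate frame whose y axis points up.
screen_angle = angle_with_x_axis_2d(1.0, 1.0)
plane_angles = direction_angles((0.0, 1.0, 0.0))
tilted_angles = direction_angles((1.0, 1.0, 0.0))
```

For a horizontal plane in a y-up frame, the normal is aligned with the y axis, so its direction angle with y is zero and the other two are right angles.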
4. Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- ARCore, Build the Future. Available online: https://developers.google.com/ar (accessed on 20 March 2022).
- Jian, M.; Wang, J.; Yu, H.; Wang, G.G. Integrating object proposal with attention networks for video saliency detection. Inf. Sci. 2021, 576, 819–830.
- Han, S.; Shen, H.; Philipose, M.; Agarwal, S.; Wolman, A.; Krishnamurthy, A. Mcdnn: An approximation-based execution framework for deep stream processing under resource constraints. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, Singapore, 26–30 June 2016; pp. 123–136.
- Ran, X.; Chen, H.; Zhu, X.; Liu, Z.; Chen, J. Deepdecision: A mobile deep learning framework for edge video analytics. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 15–19 April 2018; pp. 1421–1429.
- Liu, L.; Li, H.; Gruteser, M. Edge assisted real-time object detection for mobile augmented reality. In Proceedings of the 25th Annual International Conference on Mobile Computing and Networking, Los Cabos, Mexico, 21–25 October 2019; pp. 1–16.
- Chen, W.; Wilson, J.; Tyree, S.; Weinberger, K.; Chen, Y. Compressing neural networks with the hashing trick. In Proceedings of the International Conference on Machine Learning. PMLR, Lille, France, 7–9 July 2015; pp. 2285–2294.
- Wu, J.; Leng, C.; Wang, Y.; Hu, Q.; Cheng, J. Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4820–4828.
- Apicharttrisorn, K.; Ran, X.; Chen, J.; Krishnamurthy, S.V.; Roy-Chowdhury, A.K. Frugal following: Power thrifty object detection and tracking for mobile augmented reality. In Proceedings of the 17th Conference on Embedded Networked Sensor Systems, New York, NY, USA, 10–13 November 2019; pp. 96–109.
- Liu, M.; Ding, X.; Du, W. Continuous, Real-Time Object Detection on Mobile Devices without Offloading. In Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), Singapore, 29 November–1 December 2020; pp. 976–986.
- Lane, N.D.; Bhattacharya, S.; Georgiev, P.; Forlivesi, C.; Jiao, L.; Qendro, L.; Kawsar, F. Deepx: A software accelerator for low-power deep learning inference on mobile devices. In Proceedings of the 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), Vienna, Austria, 11–14 April 2016; pp. 1–12.
- Huynh, L.N.; Lee, Y.; Balan, R.K. Deepmon: Mobile gpu-based deep learning framework for continuous vision applications. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, Niagara Falls, NY, USA, 19–23 June 2017; pp. 82–95.
- Mathur, A.; Lane, N.D.; Bhattacharya, S.; Boran, A.; Forlivesi, C.; Kawsar, F. Deepeye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, Niagara Falls, NY, USA, 19–23 June 2017; pp. 68–81.
- Naderiparizi, S.; Zhang, P.; Philipose, M.; Priyantha, B.; Liu, J.; Ganesan, D. Glimpse: A programmable early-discard camera architecture for continuous mobile vision. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, Niagara Falls, NY, USA, 19–23 June 2017; pp. 292–305.
- Xu, M.; Zhu, M.; Liu, Y.; Lin, F.X.; Liu, X. Deepcache: Principled cache for mobile deep vision. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, New Delhi, India, 29 October–2 November 2018; pp. 129–144.
- Fang, B.; Zeng, X.; Zhang, M. Nestdnn: Resource-aware multi-tenant on-device deep learning for continuous mobile vision. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, New Delhi, India, 29 October–2 November 2018; pp. 115–127.
- Liu, S.; Lin, Y.; Zhou, Z.; Nan, K.; Liu, H.; Du, J. On-demand deep model compression for mobile devices: A usage-driven model selection framework. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, Munich, Germany, 10–15 June 2018; pp. 389–400.
- Engineering at Meta. Delivering Real-Time AI in the Palm of Your Hand. 2016. Available online: https://code.facebook.com/posts/196146247499076/delivering-real-time-ai-in-the-palm-of-your-hand/ (accessed on 31 March 2022).
- TensorFlow. TensorFlow Lite. Available online: https://www.tensorflow.org/lite/guide (accessed on 31 March 2022).
- Chen, Q.; Sun, L.; Wang, Z.; Jia, K.; Yuille, A. Object as hotspots: An anchor-free 3d object detection approach via firing of hotspots. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 68–84.
- Shi, W.; Rajkumar, R. Point-gnn: Graph neural network for 3d object detection in a point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1711–1719.
- Pang, S.; Morris, D.; Radha, H. CLOCs: Camera-LiDAR object candidates fusion for 3D object detection. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25 October 2020–24 January 2021; pp. 10386–10393.
- Shi, S.; Wang, Z.; Shi, J.; Wang, X.; Li, H. From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2647–2664.
- Google AI Blog. Real-Time 3D Object Detection on Mobile Devices with MediaPipe, 2020. Available online: https://ai.googleblog.com/2020/03/real-time-3d-object-detection-on-mobile.html (accessed on 15 February 2022).
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Yang, S.; Scherer, S. Cubeslam: Monocular 3-d object slam. IEEE Trans. Robot. 2019, 35, 925–938.
- Luo, X.; Tan, Z.; Ding, Y. Accurate line reconstruction for point and line-based stereo visual odometry. IEEE Access 2019, 7, 185108–185120.
- Ravi, N.; Reizenstein, J.; Novotny, D.; Gordon, T.; Lo, W.Y.; Johnson, J.; Gkioxari, G. Accelerating 3D Deep Learning with PyTorch3D. arXiv 2020, arXiv:2007.08501.
- Lepetit, V.; Moreno-Noguer, F.; Fua, P. Epnp: An accurate O(n) solution to the pnp problem. Int. J. Comput. Vis. 2009, 81, 155–166.
[Table: four images (Image No.1 to Image No.4) for each object category: Book, Cellphone, Chair, Dog, Laptop, Mug, Potted_Plant, Table, Tennis_Racket. Image cells are not reproduced.]
Object Category | Object Predicted | Yang and Scherer [26] | Ours |
---|---|---|---|
Book | TV 56% (wrong prediction) | (image) | (image) |
Chair | Chair 56% (correct prediction) | (image) | (image) |
Dog | Dog 76% (correct prediction) | (image) | (image) |
Potted_Plant | Potted Plant 53% (correct prediction) | (image) | (image) |
[Table: 2D bounding box and 3D cuboid images for the object categories Mug, Table and Tennis_Racket, shown under two column groups, SSD-MobileNetV1 and Manual. Image cells are not reproduced.]
Object Category | Book | Cellphone | Chair | Dog | Laptop | Mug | Potted Plant | Table | Tennis Racket |
---|---|---|---|---|---|---|---|---|---|
Yang and Scherer [26] | 0.0903 | 0.0036 | 0.2804 | 0.0303 | 0.1199 | 0.0248 | 0.0993 | 0.1934 | 0.0555 |
Ours | 0.0989 | 0.0037 | 0.2806 | 0.0354 | 0.1135 | 0.0238 | 0.1188 | 0.1847 | 0.0529 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gupta, N.; Khan, N.M. Efficient and Scalable Object Localization in 3D on Mobile Device. J. Imaging 2022, 8, 188. https://doi.org/10.3390/jimaging8070188