Deep Instance Segmentation and Visual Servoing to Play Jenga with a Cost-Effective Robotic System
Abstract
1. Introduction
- A DL-based instance segmentation model, fine-tuned on a custom Jenga tower dataset generated in simulation, which allows the system to segment and select individual blocks;
- An eye-in-hand visual control strategy that precisely guides the block extraction;
- A 1D force sensor to evaluate the removability of a specific block; a minimal sketch of this check follows the list.
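As a rough illustration of the third point, the sketch below classifies a block as removable from the peak of its 1D reaction-force trace. The threshold and the traces are made-up values; the paper calibrates the actual threshold experimentally (Section 4.1).

```python
import numpy as np

# Illustrative threshold only; the real value is calibrated
# experimentally (see Section 4.1, Reaction Force Threshold).
FORCE_THRESHOLD_N = 2.0


def is_removable(force_trace_n: np.ndarray) -> bool:
    """Classify a block as mobile from the 1D reaction-force trace
    recorded while gently pushing it: a load-bearing block resists
    the push, so its peak reaction force exceeds the threshold."""
    return float(np.max(force_trace_n)) < FORCE_THRESHOLD_N


# Toy traces (made-up values): a mobile block slides with little
# resistance, while a jammed block pushes back on the sensor.
mobile = np.array([0.10, 0.30, 0.40, 0.35])
jammed = np.array([0.20, 1.10, 2.80, 3.50])
print(is_removable(mobile), is_removable(jammed))  # True False
```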
2. Related Works
2.1. Playing Jenga with a Robot
2.2. Deep Learning for Object Recognition
2.3. Visual Servoing in Robotics
3. Methodology
- The identification and pose estimation of each block of the tower are the first step towards selecting a suitable piece and approaching it;
- A removable block cannot be identified by visual analysis alone: the integration of a tactile perception system is needed;
- A sufficiently precise alignment between the end-effector and the center of a target block allows the arm to extract it by simply pushing (a servoing sketch follows this list).
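To make the last point concrete, here is a minimal sketch of a proportional position-based servoing law of the kind typically used for such an alignment. The gain, control period, and poses are assumptions; the paper's actual eye-in-hand controller is not reproduced here.

```python
import numpy as np

LAMBDA = 0.5  # proportional gain (assumed value)
DT = 0.05     # control period [s] (assumed value)


def servo_step(t_cur: np.ndarray, t_goal: np.ndarray) -> np.ndarray:
    """One step of a proportional servoing law: the commanded linear
    velocity is proportional to the remaining translation error.
    Orientation control is omitted for brevity."""
    return LAMBDA * (t_goal - t_cur)


# Toy run (made-up poses): drive the pusher towards the block center
# until the residual error drops below 1 mm.
t = np.array([0.10, -0.05, 0.30])    # current tool position [m]
goal = np.array([0.00, 0.00, 0.25])  # estimated block center [m]
while np.linalg.norm(goal - t) > 1e-3:
    t = t + servo_step(t, goal) * DT
print("aligned at", t)
```

Under this law the error decays exponentially, which is why visual-servo convergence is typically reported in terms of settling time, as in Section 4.4.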
3.1. Deep Instance Segmentation and Pose Estimation
Model Architecture
3.2. Block Selection Policy
3.3. Block Pose Estimation
3.4. Tracking and Visual Servoing
3.5. Tactile Perception
4. Experiments
4.1. Reaction Force Threshold
4.2. Instance Segmentation Experiments
4.2.1. Training Setup
- Materials are randomly sampled and assigned to all 48 cuboids to vary their appearance;
- A total of 6 to 24 cuboids are randomly displaced along their x and z axes by a distance between −4 mm and 4 mm;
- A point light source is positioned by randomly sampling a height between 0.1 and 0.5 m on a circular arc centered on the tower, with a random radius between 0.4 and 0.7 m;
- A total of 2 to 9 random blocks are removed from the tower to create holes;
- The camera position is sampled on a circular arc centered on the tower, with a height between 0.05 and 0.2 m and a radius between 0.25 and 0.45 m (a parameter-sampling sketch follows this list).
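The sampling logic of this recipe is easy to write down. The sketch below draws one set of randomization parameters with numpy; the material-pool size and the arc angles are assumptions (the arc extents are not specified above), and the rendering step itself is left out.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N_BLOCKS = 48
N_MATERIALS = 10  # size of the material pool (assumed value)


def sample_scene() -> dict:
    """Draw one randomized tower configuration following the recipe
    above; rendering and annotation export are omitted."""
    n_jitter = rng.integers(6, 25)  # 6 to 24 displaced cuboids
    n_holes = rng.integers(2, 10)   # 2 to 9 removed blocks
    return {
        "materials": rng.integers(0, N_MATERIALS, size=N_BLOCKS),
        "jitter_ids": rng.choice(N_BLOCKS, size=n_jitter, replace=False),
        "jitter_xz_mm": rng.uniform(-4.0, 4.0, size=(n_jitter, 2)),
        "light_height_m": rng.uniform(0.1, 0.5),
        "light_radius_m": rng.uniform(0.4, 0.7),
        "light_angle_rad": rng.uniform(0.0, 2 * np.pi),  # assumed full arc
        "hole_ids": rng.choice(N_BLOCKS, size=n_holes, replace=False),
        "cam_height_m": rng.uniform(0.05, 0.2),
        "cam_radius_m": rng.uniform(0.25, 0.45),
        "cam_angle_rad": rng.uniform(0.0, 2 * np.pi),  # assumed full arc
    }


print(sample_scene())
```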
4.2.2. Instance Segmentation Results
4.3. Tracking Robustness
4.4. Visual Servo Convergence and Accuracy
4.4.1. Timing Convergence
4.4.2. Spatial Accuracy
4.5. Consecutive Block Extractions
- A mobile block is correctly pushed without perturbing the stability of the tower;
- The binary classification model correctly identifies the block as immobile, and the arm retracts without causing the tower to fall.

Failed extraction attempts are instead traced back to one of the following causes:

- The robot reaches a singular configuration when the target block level is at the edge of the workspace;
- Tracking is lost because either the tower is poorly lit or the target block lies outside the field of view;
- The initial pose estimation is not precise enough because the segmentation model does not correctly detect all the corners of the block (a pose-from-corners sketch follows this list).
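The last failure mode can be made concrete with a small sketch: a block face is planar, so its pose can be recovered from four detected corners with a plane-based PnP solver such as OpenCV's IPPE, and a single mis-detected corner directly degrades the recovered pose. All numeric values below (face size, intrinsics, pixel coordinates) are made up for illustration.

```python
import numpy as np
import cv2

# Visible short face of a block. Standard Jenga blocks measure roughly
# 75 x 25 x 15 mm; the exact face size used here is an assumption.
W, H = 0.025, 0.015  # face width and height [m]

# 3D corners of the planar face in the object frame (z = 0 plane).
object_pts = np.array([[-W / 2, -H / 2, 0.0],
                       [ W / 2, -H / 2, 0.0],
                       [ W / 2,  H / 2, 0.0],
                       [-W / 2,  H / 2, 0.0]])

# 2D corners extracted from a segmentation mask (made-up pixels).
image_pts = np.array([[310.0, 242.0], [352.0, 240.0],
                      [353.0, 266.0], [311.0, 268.0]])

# Pinhole intrinsics and zero distortion (assumed calibration).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Plane-based pose estimation with OpenCV's IPPE solver.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist,
                              flags=cv2.SOLVEPNP_IPPE)
print(ok, tvec.ravel())  # block-face center in the camera frame [m]
```

Shifting one of the image corners by a few pixels in this example visibly moves `tvec`, mirroring the sensitivity to corner detection errors described above.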
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Instance segmentation average precision (AP) on the synthetic and real test sets:

| Dataset | AP | AP | AP | Mean |
|---|---|---|---|---|
| Synth | 90.08 | 87.76 | 63.20 | 78.37 |
| Real | 75.98 | 53.40 | 11.53 | 53.09 |
Per-level tracking results at different end-effector velocities, for single-block and group tracking:

| Level | Vel [] | Single [%] | Group [%] |
|---|---|---|---|
| 5 | 36.9 | 100 | |
| 5 | 17.0 | 100 | |
| 6 | 13.1 | 100 | |
| 6 | 8.0 | 100 | |
| 8 | 28.9 | 100 | |
| 8 | 10.6 | 100 | |
| 9 | 19.4 | 100 | |
| 9 | 29.5 | 100 | |
| 10 | 18.4 | 100 | |
| 10 | 8.1 | 100 | |
| 11 | 24.7 | 100 | |
| 11 | 14.8 | 100 | |