Beyond Binary Dialogues: Research and Development of a Linguistically Nuanced Conversation Design for Social Robots in Group–Robot Interactions
Abstract
1. Motivation and Related Work
1.1. Natural Language Understanding in Human–Robot Interaction
1.2. The Importance of Groups in HRI
1.3. Computer Vision in Group–Robot Interactions
2. Materials
2.1. The Use Case
2.2. The Social Robot
- Robot front end: This subsystem serves as the user interface and handles most of the necessary interaction functions, including automatic speech recognition (ASR), person detection, and text-to-speech (TTS) output. It also takes over parts of the dialogue management, such as initiating and concluding conversations and handling speaker turns. The front end is connected to a tablet with flag buttons that allow users to select the system language. In this work, we used a Furhat robot, which provides essential components such as ASR, person detection, and TTS. However, the Furhat robot’s built-in person detection had limitations, particularly when individuals were not facing the robot directly or were looking in different directions, which often disrupted the dialogue. To address this issue and improve the robot’s ability to detect multiple users simultaneously, we integrated a YOLO model for more robust multi-party detection (see the detection sketch after this list).
- Intent-based back end: This core subsystem comprises an NLU model and an associated knowledge database. The database stores predefined subject areas (intents), each defined by example user queries, along with fixed, preformulated responses. These intents address specific user inquiries, such as identifying the right contact for an issue, providing directions to the nearest restrooms, engaging in small talk, and integrating APIs such as a weather forecast.
- LLM back end: This subsystem is engaged whenever a query does not match any of the predefined intents. The intent-based back end forwards the user’s statement, the dialogue history, the user count, and the language to the LLM server, which then constructs an appropriate prompt and generates a free-form response to the current user query (see the routing sketch after this list). For this, LLaMA 2 (Meta, USA) [38] in its quantized 13-billion-parameter version, designed to run on consumer-grade graphics cards [22], was integrated as a fallback. LLaMA 2 was selected based on the Open LLM Leaderboard [39], where it was the top-ranked model in summer 2023, just prior to the field assessments of the system.
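To illustrate the front end’s multi-party detection, the following is a minimal sketch of how a YOLO-based person counter could supply the user count to the front end. It assumes the pretrained Ultralytics YOLOv5 model [40] loaded via torch.hub and an OpenCV camera stream; the camera index and confidence threshold are illustrative placeholders rather than the values used in our deployment.

```python
# Minimal sketch: counting people in view with a pretrained YOLOv5 model.
# Camera index and confidence threshold are illustrative placeholders.
import cv2
import torch

# Load a small pretrained YOLOv5 model (COCO classes; class 0 is "person").
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.5  # assumed detection confidence threshold

def count_persons(frame) -> int:
    """Return the number of people detected in a single BGR camera frame."""
    results = model(frame[..., ::-1])          # YOLOv5 expects RGB input
    detections = results.xyxy[0]               # columns: x1, y1, x2, y2, conf, class
    return int((detections[:, 5] == 0).sum())  # keep only the "person" class

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)                  # placeholder camera index
    ok, frame = cap.read()
    if ok:
        print(f"Detected {count_persons(frame)} person(s) in view")
    cap.release()
```

The resulting count can be passed to the dialogue manager alongside the recognized utterance, so that responses can be adapted to single users or groups.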
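Likewise, the hand-over between the intent-based back end and the LLM back end can be sketched as a simple confidence-based routing step. The intent table, confidence threshold, NLU stub, and LLM server endpoint below are illustrative placeholders, not the actual implementation; they only show how the user’s statement, dialogue history, user count, and language are forwarded when no intent matches.

```python
# Minimal sketch of the two-tier dialogue routing described above: a canned intent
# answer when the NLU is confident, otherwise a request to a separate LLM server.
# All names, values, and the endpoint are illustrative placeholders.
import requests

CONFIDENCE_THRESHOLD = 0.7                          # assumed cut-off for intent matching
LLM_SERVER_URL = "http://localhost:8080/generate"   # placeholder LLaMA 2 server address

# Toy stand-in for the knowledge database of intents and preformulated responses.
INTENTS = {
    "where are the restrooms": "The nearest restrooms are down the hall to your left.",
    "what is the weather": "Let me check the weather forecast for you.",
}

def query_nlu(utterance: str) -> tuple[float, str]:
    """Toy NLU stub: return a confidence score and canned response for the closest intent."""
    key = utterance.lower().strip("?!. ")
    if key in INTENTS:
        return 1.0, INTENTS[key]
    return 0.0, ""

def answer(utterance: str, history: list[str], user_count: int, language: str) -> str:
    """Route a user query to the intent-based back end or the LLM fallback."""
    confidence, canned_response = query_nlu(utterance)
    if confidence >= CONFIDENCE_THRESHOLD:
        return canned_response                      # intent-based back end
    # LLM fallback: forward statement, dialogue history, user count, and language;
    # the server builds the prompt for the quantized LLaMA 2 model.
    payload = {"utterance": utterance, "history": history,
               "user_count": user_count, "language": language}
    reply = requests.post(LLM_SERVER_URL, json=payload, timeout=30)
    reply.raise_for_status()
    return reply.json()["response"]
```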
3. Development and Results of the Multi-Party Dialogue Management
3.1. Integration of Multi-Party Person Detection
3.2. Development of the Knowledge Expander
3.3. Development of Linguistically Nuanced Intent-Based Responses
3.4. Development of Linguistically Nuanced LLM Responses
4. Conclusions, Limitations, and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
| --- | --- |
| AI | Artificial intelligence |
| API | Application programming interface |
| ASR | Automatic speech recognition |
| CNN | Convolutional neural network |
| CV | Computer vision |
| GDPR | General Data Protection Regulation |
| GPU | Graphics processing unit |
| GRI | Group–robot interaction |
| HRI | Human–robot interaction |
| LLM | Large language model |
| MPI | Multi-party interaction |
| NLP | Natural language processing |
| NLU | Natural language understanding |
| R&D | Research and development |
| STT | Speech-to-text |
| TTS | Text-to-speech |
| YOLO | You Only Look Once |
References
- Breazeal, C. Social robots for health applications. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 5368–5371. [Google Scholar] [CrossRef]
- Cerrato, L.; Campbell, N. Engagement in dialogue with social robots. In Dialogues with Social Robots: Enablements, Analyses, and Evaluation; Springer: Singapore, 2017; pp. 313–319. [Google Scholar]
- Jayaraman, S.; Phillips, E.K.; Church, D.; Riek, L.D. Social Robots in Healthcare: Characterizing Privacy Considerations. In Proceedings of the Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Boulder, CO, USA, 11–15 March 2024; HRI ’24. pp. 568–572. [Google Scholar] [CrossRef]
- Webster, C.; Ivanov, S. Robots in Travel, Tourism and Hospitality: Key Findings from a Global Study; Zangador: Varna, Bulgaria, 2020. [Google Scholar]
- Hameed, I.A.; Tan, Z.; Thomsen, N.B.; Duan, X. User Acceptance of Social Robots: A Case Study. In Proceedings of the International Conference on Advances in Computer-Human Interaction, Venice, Italy, 24–28 April 2016. [Google Scholar]
- Williams, M.A. Robot Social Intelligence. In Proceedings of the 4th International Conference on Social Robotics (ICSR 2012), Chengdu, China, 29–31 October 2012; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2012; Volume 7621. [Google Scholar] [CrossRef]
- Correia, F.; Melo, F.S.; Paiva, A. Group Intelligence on Social Robots. In Proceedings of the 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Daegu, Republic of Korea, 11–14 March 2019; pp. 703–705. [Google Scholar]
- Sabanovic, S.; Michalowski, M.P.; Simmons, R. Robots in the wild: Observing human-robot social interaction outside the lab. In Proceedings of the 9th IEEE International Workshop on Advanced Motion Control, Istanbul, Turkey, 27–29 March 2006; pp. 596–601. [Google Scholar]
- Šabanović, S.; Reeder, S.M.; Kechavarzi, B. Designing robots in the wild: In situ prototype evaluation for a break management robot. J. Hum. Robot. Interact. 2014, 3, 70–88. [Google Scholar] [CrossRef]
- Oliveira, R.; Arriaga, P.; Paiva, A. Human-robot interaction in groups: Methodological and research practices. Multimodal Technol. Interact. 2021, 5, 59. [Google Scholar] [CrossRef]
- Adam, M.; Wessel, M.; Benlian, A. AI-based chatbots in customer service and their effects on user compliance. Electron. Mark. 2021, 31, 427–445. [Google Scholar] [CrossRef]
- Addlesee, A.; Cherakara, N.; Nelson, N.; Hernández García, D.; Gunson, N.; Sieińska, W.; Romeo, M.; Dondrup, C.; Lemon, O. A Multi-party Conversational Social Robot Using LLMs. In Proceedings of the Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Boulder, CO, USA, 11–15 March 2024; HRI ’24. pp. 1273–1275. [Google Scholar] [CrossRef]
- Porcheron, M.; Fischer, J.E.; Reeves, S.; Sharples, S. Voice interfaces in everyday life. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; pp. 1–12. [Google Scholar]
- Lin, C.W.; Auvray, V.; Elkind, D.; Biswas, A.; Fazel-Zarandi, M.; Belgamwar, N.; Chandra, S.; Zhao, M.; Metallinou, A.; Chung, T.; et al. Dialog Simulation with Realistic Variations for Training Goal-Oriented Conversational Systems. arXiv 2020, arXiv:2011.08243. [Google Scholar]
- Bapat, R.; Kucherbaev, P.; Bozzon, A. Effective crowdsourced generation of training data for chatbots natural language understanding. In Proceedings of the Web Engineering: 18th International Conference, ICWE 2018, Proceedings 18, Cáceres, Spain, 5–8 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 114–128. [Google Scholar]
- Parrish, A.; Huang, W.; Agha, O.; Lee, S.H.; Nangia, N.; Warstadt, A.; Aggarwal, K.; Allaway, E.; Linzen, T.; Bowman, S.R. Does Putting a Linguist in the Loop Improve NLU Data Collection? arXiv 2021, arXiv:2104.07179. [Google Scholar]
- Monta, M.; Androulakis, S. Intent-Utterance-Expander. 2017. Available online: https://github.com/miguelmota/intent-utterance-expander (accessed on 22 August 2024).
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774. [Google Scholar]
- Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Yang, A.; Fan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783. [Google Scholar]
- Kwon, M.; Hu, H.; Myers, V.; Karamcheti, S.; Dragan, A.; Sadigh, D. Toward grounded social reasoning. arXiv 2023, arXiv:2306.08651. [Google Scholar]
- Rosenbaum, A.; Soltan, S.; Hamza, W. Using Large Language Models (Llms) to Synthesize Training Data. 2024. Available online: https://www.amazon.science/blog/using-large-language-models-llms-to-synthesize-training-data (accessed on 17 August 2024).
- Frantar, E.; Ashkboos, S.; Hoefler, T.; Alistarh, D. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. arXiv 2023, arXiv:2210.17323. [Google Scholar]
- Paetzel-Prüsmann, M.; Kennedy, J. Improving a Robot’s Turn-Taking Behavior in Dynamic Multiparty Interactions. In Proceedings of the Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, Stockholm, Sweden, 13–16 March 2023; HRI ’23. pp. 411–415. [Google Scholar] [CrossRef]
- Müller, A.; Richert, A. No One is an Island-Investigating the Need for Social Robots (and Researchers) to Handle Multi-Party Interactions in Public Spaces. In Proceedings of the 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Busan, Republic of Korea, 28–31 August 2023; pp. 1772–1777. [Google Scholar]
- Fraune, M.R.; Nishiwaki, Y.; Sabanović, S.; Smith, E.R.; Okada, M. Threatening Flocks and Mindful Snowflakes: How Group Entitativity Affects Perceptions of Robots. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, Vienna, Austria, 6–9 March 2017; HRI ’17. pp. 205–213. [Google Scholar] [CrossRef]
- Šabanović, S. We’re in This Together: Social Robots in Group, Organizational, and Community Interactions. In Proceedings of the 8th International Conference on Human-Agent Interaction, Virtual Event, 10–13 November 2020; pp. 3–4. [Google Scholar]
- Abrams, A.M.; Rosenthal-von der Pütten, A.M. I–C–E Framework: Concepts for Group Dynamics Research in Human-Robot Interaction: Revisiting Theory from Social Psychology on Ingroup Identification (I), Cohesion (C) and Entitativity (E). Int. J. Soc. Robot. 2020, 12, 1213–1229. [Google Scholar] [CrossRef]
- Reig, S.; Luria, M.; Wang, J.Z.; Oltman, D.; Carter, E.J.; Steinfeld, A. Not Some Random Agent: Multi-person interaction with a personalizing service robot. In Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, Cambridge, UK, 23–26 March 2020; pp. 289–297. [Google Scholar]
- Faria, M.; Melo, F.S.; Paiva, A. Understanding robots: Making robots more legible in multi-party interactions. In Proceedings of the 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), Vancouver, BC, Canada, 8–12 August 2021; pp. 1031–1036. [Google Scholar]
- Fraune, M.R.; Šabanović, S.; Kanda, T. Human group presence, group characteristics, and group norms affect human-robot interaction in naturalistic settings. Front. Robot. AI 2019, 6, 48. [Google Scholar] [CrossRef] [PubMed]
- Taylor, A.; Chan, D.M.; Riek, L.D. Robot-Centric Perception of Human Groups. J. Hum.-Robot Interact. 2020, 9, 15. [Google Scholar] [CrossRef]
- Pathi, S.K.; Kiselev, A.; Loutfi, A. Detecting Groups and Estimating F-Formations for Social Human–Robot Interactions. Multimodal Technol. Interact. 2022, 6, 18. [Google Scholar] [CrossRef]
- Luber, M.; Spinello, L.; Silva, J.; Arras, K.O. Socially-aware robot navigation: A learning approach. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal, 7–12 October 2012; pp. 902–907. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Dinh, M.C.; Tien, N.T.; Tuyen, T.M.; Xuan, N.V.; Anh, P.T.Q.; Bay, H.V.; Truong, X.T. Socially Aware Robot Navigation Framework: Automatic Detecting and Autonomously Approaching People in Unknown Dynamic Social Environments. In Proceedings of the 2023 12th International Conference on Control, Automation and Information Sciences (ICCAIS), Hanoi, Vietnam, 27–29 November 2023; pp. 751–756. [Google Scholar]
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv 2023, arXiv:2307.09288. [Google Scholar]
- Sutawika, L.; Gao, L.; Schoelkopf, H.; Biderman, S.; Tow, J.; Abbasi, B.; Fattori, B.; Lovering, C.; Phang, J.; Thite, A.; et al. EleutherAI/lm-Evaluation-Harness: Major Refactor. 2023. Available online: https://zenodo.org/records/10256836 (accessed on 18 August 2024).
- Jocher, G.; Stoken, A.; Borovec, J.; Liu, C.; Hogan, A.; Diaconu, L.; Ingham, F.; Fang, J.; Wang, M.; Gupta, N.; et al. Ultralytics/yolov5: V3.1—Bug Fixes and Performance Improvements. 2020. Available online: https://zenodo.org/records/4154370 (accessed on 7 August 2024).
- Sabharwal, N.; Agrawal, A. Introduction to Google Dialogflow. In Cognitive Virtual Assistants Using Google Dialogflow: Develop Complex Cognitive Bots Using the Google Dialogflow Platform; Apress: Berkeley, CA, USA, 2020; pp. 13–54. [Google Scholar] [CrossRef]