Article

Intent-Bert and Universal Context Encoders: A Framework for Workload and Sensor Agnostic Human Intention Prediction

Electrical and Computer Engineering Department, University of Florida, Larsen Hall, 968 Center Drive, Gainesville, FL 32611, USA
* Author to whom correspondence should be addressed.
Technologies 2025, 13(2), 61; https://doi.org/10.3390/technologies13020061
Submission received: 25 November 2024 / Revised: 18 January 2025 / Accepted: 23 January 2025 / Published: 2 February 2025

Abstract

Determining human intention is a challenging task. Many existing techniques seek to address it by combining multiple forms of data, such as images, point clouds, and poses, into multi-modal models. However, these techniques often still require significant foreknowledge: the set of potential activities and the objects in the environment must be known in advance, and specific types of data must be collected. To address these limitations, we propose Intent-BERT and Universal Context Encoders, which combine to form a workload-agnostic framework that predicts, as an open-vocabulary problem, the next activity a human will perform, along with the time until that switch and the time at which the current activity ends. Universal Context Encoders use the distances between word embeddings to extract relationships between human-readable English descriptions of the current task and of the origins of the various multi-modal inputs, and use these relationships to determine how to weight the input values themselves. We evaluate this approach by building a multi-modal model around it and training it on the InHARD dataset. From a single time point, the model returns a completely accurate description of the next action performed by a human working alongside a robot in a manufacturing task in ∼42% of test cases and achieves 95% top-3 accuracy, outperforming multi-modal GPT-4o by about 50% on a token-by-token basis.
Keywords: HRC; HRI; multi-modal human prediction; open vocabulary; cross-attention
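As a rough illustration of the description-similarity weighting idea outlined in the abstract, the Python sketch below embeds human-readable descriptions of the current task and of each sensor's origin with an off-the-shelf BERT encoder, then weights each modality's feature vector by the cosine similarity between its description and the task description. All specifics here (the bert-base-uncased model, the example descriptions, and the embed/fusion helpers) are illustrative assumptions, not the authors' Universal Context Encoder implementation.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    # Mean-pooled BERT embedding of a description string.
    tokens = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**tokens).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)               # (768,)

# Hypothetical descriptions of the current task and each sensor's origin.
task_desc = "assembling a chair frame at a shared workbench"
sensor_descs = [
    "skeletal joint positions from a motion-capture suit",
    "RGB video from an overhead camera",
]

# Similarity between each sensor description and the task description
# becomes a softmax relevance weight for that modality.
task_vec = embed(task_desc)
sims = torch.stack(
    [F.cosine_similarity(task_vec, embed(d), dim=0) for d in sensor_descs]
)
weights = F.softmax(sims, dim=0)

# Random stand-ins for per-modality feature vectors; a real model would
# derive these from the sensor streams themselves.
sensor_feats = [torch.randn(768) for _ in sensor_descs]
fused = sum(w * f for w, f in zip(weights, sensor_feats))
print(weights, fused.shape)

A trained encoder would learn this weighting (for example, via cross-attention over the description embeddings) rather than fixing it to raw cosine similarity, but the sketch shows how textual descriptions of sensors and tasks can gate multi-modal inputs without a closed activity vocabulary.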

Share and Cite

MDPI and ACS Style

Panoff, M.; Acevedo, J.; Yu, H.; Forcha, P.; Wang, S.; Bobda, C. Intent-Bert and Universal Context Encoders: A Framework for Workload and Sensor Agnostic Human Intention Prediction. Technologies 2025, 13, 61. https://doi.org/10.3390/technologies13020061

AMA Style

Panoff M, Acevedo J, Yu H, Forcha P, Wang S, Bobda C. Intent-Bert and Universal Context Encoders: A Framework for Workload and Sensor Agnostic Human Intention Prediction. Technologies. 2025; 13(2):61. https://doi.org/10.3390/technologies13020061

Chicago/Turabian Style

Panoff, Maximillian, Joshua Acevedo, Honggang Yu, Peter Forcha, Shuo Wang, and Christophe Bobda. 2025. "Intent-Bert and Universal Context Encoders: A Framework for Workload and Sensor Agnostic Human Intention Prediction" Technologies 13, no. 2: 61. https://doi.org/10.3390/technologies13020061

APA Style

Panoff, M., Acevedo, J., Yu, H., Forcha, P., Wang, S., & Bobda, C. (2025). Intent-Bert and Universal Context Encoders: A Framework for Workload and Sensor Agnostic Human Intention Prediction. Technologies, 13(2), 61. https://doi.org/10.3390/technologies13020061

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
