1. Introduction
Augmented Reality (AR) technology has transitioned remarkably from a conceptual novelty to a transformative force across a spectrum of industries, including entertainment, healthcare, transportation, and education [1]. It distinguishes itself through the innovative overlay of digital information onto our physical reality, crafting immersive experiences that have been empirically shown to enrich learning, amplify motivation, and bolster interaction among participants [1,2,3]. These advancements underscore AR’s unique capability to evolve traditional activities into interactive, dynamic experiences tailored to meet the diverse preferences and needs of its users.
Within the educational sector, AR’s potential has been particularly noteworthy, evidenced by its successful integration into a variety of learning activities [1]. These AR-enabled educational interventions have not only enhanced learning outcomes but have also invigorated the learning environment, making it more interactive and engaging [4,5]. Additionally, the deployment of AR in special education has seen a remarkable increase, attributed to its versatility in facilitating personalized learning experiences. This adaptability has proven effective in improving comfort levels and skill sets and in promoting social interactions among students with special needs [6]. However, despite AR’s extensive applicability, its development has predominantly focused on individuals with autism spectrum disorders and mental disabilities, often overlooking the creation of AR solutions tailored for people with visual impairments [6,7].
The primary challenge in making AR technologies accessible to people with visual impairments lies in their inherent reliance on visual overlays—such as texts, images, and AR models—for user interaction [4]. This reliance constitutes a substantial barrier for users with visual impairments, who may struggle to grasp the spatial context of these virtual elements [4,8]. With over 2.2 billion people globally experiencing visual impairments, a statistic that includes 19 million children under the age of 15, the imperative for accessible AR technologies has never been more critical [9,10]. This demographic includes approximately 1.4 million children facing irreversible blindness, underscoring the dire need for innovative solutions that render AR technologies accessible and beneficial for this particular group [10].
In addressing these challenges, our research endeavors to expand the application scope of AR through the development and exploration of non-visual interfaces, focusing primarily on audio feedback and sound effects. This initiative seeks to democratize AR experiences, making them accessible and enjoyable for an inclusive audience. By harnessing audio cues and eliminating reliance on haptic feedback, our prototype allows users to interact with AR content and digital environments through the auditory sense. This design not only caters to those with visual impairments but also enhances spatial awareness and mental imagery for all users, leveraging AR technology’s potential to transcend visual limitations and offer immersive, multi-sensory experiences.
Our prototype illustrates the feasibility of integrating audio descriptions and sound effects, which adjust in volume to signify the proximity of digital objects, enabling users to navigate AR settings with heightened orientation and depth perception. Furthermore, it facilitates direct interactions with virtual objects, providing continuous audio feedback and sound effects to enrich user engagement and intuitiveness within AR explorations.
This paper provides a comprehensive examination of our AR prototype’s concept, development, and practical application, highlighting its significance in fostering an inclusive and accessible AR landscape. Through a detailed exploration of the prototype’s features and their implementation, we contribute insightful perspectives to the AR technology’s evolution. Our research underscores the critical need for developing versatile and inclusive AR applications that address the preferences and requirements of a diverse user base, including those with visual impairments, thus paving the way towards a more inclusive future in AR technology.
2. Related Work
The utilization of AR technology in enhancing the daily experiences of individuals, particularly those with visual impairments, represents a significant paradigm shift in how we perceive and interact with our surroundings [11]. AR’s potential has been extensively demonstrated through its application in various domains such as navigation systems [11], text and image readers [12], and object detection mechanisms [13]. These advancements have showcased AR’s capacity to seamlessly integrate digital information into the physical world, thereby offering enriched, accessible experiences to users with visual impairments. Despite this progress, the broader scope of AR application continues to only marginally address the specific accessibility needs of this demographic, primarily due to a predominant focus on visual-centric interfaces [4,7]. This oversight not only underscores a critical gap in the inclusive development of AR technologies but also highlights the pressing need for interfaces that embrace a wider spectrum of sensory inputs, thereby fostering a more inclusive user experience.
In response to the growing awareness of these accessibility challenges, recent academic efforts have begun to pivot towards enhancing the inclusivity of AR applications. Research conducted by Naranjo-Puentes et al. [8] and Herskovitz et al. [4] provides a thorough examination of the accessibility features within widely utilized mobile applications, revealing a significant deficiency in AR content descriptive enough for users with visual impairments to engage with it meaningfully. These studies have catalyzed the development of AR prototypes engineered to transcend traditional visual barriers by substituting visual cues with auditory information and Voice-Over commands. Such innovations mark a commendable stride towards broadening AR’s accessibility. However, these adaptations, while progressive, fall short of addressing the entirety of challenges faced by users with visual impairments, indicating a need for more comprehensive solutions.
The educational sector, in particular, provides a poignant example of how visual dependencies can restrict access to learning materials for students with visual impairments [14]. Emerging solutions have sought to bridge this gap, employing interactive tactile graphics and real-world objects augmented with audio feedback, leveraging technologies such as laser cutting, swell paper, and 3D printing to enhance tactile interactions and learning through auditory feedback [14,15]. This approach is complemented by pioneering efforts such as the audio-tactile experience developed by Agnano et al. [16] for cultural heritage exploration and the machine-vision-based tactile graphics system by Fusco and Morash [17], aimed at facilitating STEM education. These initiatives underscore the potential of integrating tactile and auditory feedback to provide a more accessible educational experience for students with visual impairments.
Further innovations in this domain include the strategic use of QR codes to link tactile graphics to audio descriptions [18], enabling students with visual impairments to gain detailed information about graphical content through their mobile devices. Additionally, projects such as that of Thévin et al. [15] and the MapSense initiative [19] have showcased the effectiveness of augmenting physical objects and interactive maps with audio feedback and tactile inputs, thus supporting the development of social skills and spatial awareness in children with visual impairments. Despite these significant advancements, the inherent complexity and cognitive demands associated with interpreting tactile graphics highlight the ongoing challenges and the need for methodologies that simplify the process of modifying these graphics [20].
While these foundational efforts have paved the way for more accessible AR applications, there remains an underexplored potential in the domain of interaction with 3D virtual objects and the comprehension of spatial relationships within AR environments. The exploration of spatial audio as a means to create immersive soundscapes [21,22] and the application of haptic feedback technologies to simulate the physical properties of virtual objects [23,24] represent promising avenues for research. These technologies stand at the forefront of efforts to make AR experiences not only more inclusive but also deeply immersive and accessible to a broader audience, including those with visual impairments.
The journey towards developing AR technologies that surpass traditional visual-centric interfaces to include audio and other non-visual cues is pivotal in unlocking new realms of interaction and engagement within augmented environments. The emphasis on developing inclusive, multi-sensory interfaces will be crucial in realizing AR’s full potential as a universally accessible and immersive technology. This endeavor not only aligns with the overarching goal of enhancing the quality of life for individuals with visual impairments but also contributes significantly to the collective advancement of our digital and physical worlds.
3. Materials and Methods
3.1. Proposed AR System
3.1.1. System Architecture
The architecture of the proposed AR system is designed to facilitate a dynamic and responsive user experience by integrating real-time image and hand recognition with auditory feedback. This integration is achieved through a multi-tiered system architecture, which comprises the following key components (see Figure 1):
Capture and Detection: The process starts with the capture of the physical environment through the mobile device’s camera, which is tasked with identifying images and recognizing user gestures. For image and gesture recognition, we employed standard AR marker tracking techniques using AR Foundation (https://docs.unity3d.com/Packages/[email protected]/manual/index.html, accessed on 23 April 2024).
Data Retrieval and Integration: Upon image capture, the data are used to query a pre-existing database, from which the corresponding 3D models and associated auditory files are fetched to provide real-time feedback to the user.
Interface and Interaction—User Interaction and Feedback: The user interacts with the AR system via a mobile device, using either touchscreen interaction or natural interaction; the device acts as the intermediary between the virtual and real worlds. This interface allows users to initiate and control their interactions with the augmented environment.
Visualization and Auditory Feedback: The system’s emphasis on auditory feedback enables a non-visual interaction paradigm, making AR experiences more inclusive, particularly for users with visual impairments. By prioritizing audio over visual feedback, our system opens up new possibilities for AR applications beyond traditional visual-based interfaces. The audio feedback in our experiments was delivered through the mobile device’s built-in speakers, providing clear and contextually relevant cues directly to the user.
The architecture is designed to be flexible, scalable, and adaptable to a variety of scenarios, ensuring the development of AR applications that are not only technologically advanced but also socially inclusive and engaging for a diverse user base.
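To make the flow above concrete, the following minimal Unity sketch shows how a recognized image marker can trigger retrieval and playback of its audio description. It is a sketch only, assuming AR Foundation's image-tracking events and an in-memory clip lookup that stands in for the database described above; the class, field, and mapping names are our own and not taken from the prototype's implementation.

```csharp
// Minimal sketch of the capture -> retrieval -> audio feedback loop.
// Assumes an ARTrackedImageManager (AR Foundation) is present in the scene;
// the dictionary below stands in for the pre-existing database of auditory files.
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.ARFoundation;

public class AudioMarkerFeedback : MonoBehaviour
{
    [SerializeField] private ARTrackedImageManager imageManager;
    [SerializeField] private AudioSource audioSource;

    // Hypothetical content: reference image names paired with their audio descriptions.
    [SerializeField] private List<string> imageNames;
    [SerializeField] private List<AudioClip> descriptionClips;

    private Dictionary<string, AudioClip> clipsByImage;

    private void Awake()
    {
        clipsByImage = new Dictionary<string, AudioClip>();
        for (int i = 0; i < imageNames.Count && i < descriptionClips.Count; i++)
            clipsByImage[imageNames[i]] = descriptionClips[i];
    }

    private void OnEnable()  => imageManager.trackedImagesChanged += OnChanged;
    private void OnDisable() => imageManager.trackedImagesChanged -= OnChanged;

    private void OnChanged(ARTrackedImagesChangedEventArgs args)
    {
        // When a marker is recognized, retrieve and play its audio description.
        foreach (ARTrackedImage image in args.added)
        {
            if (clipsByImage.TryGetValue(image.referenceImage.name, out AudioClip clip))
            {
                audioSource.clip = clip;
                audioSource.Play();
            }
        }
    }
}
```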
We aim to explore the integration of advanced AI techniques to further enhance the responsiveness and adaptability of AR systems. Specifically, we intend to:
Implement Pre-trained Models: Utilize pre-trained convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for accurate and efficient image and gesture recognition. These models will be fine-tuned to recognize a wide range of gestures and objects within the AR environment.
Develop Custom AI Algorithms: Create custom algorithms that analyze user interactions in real-time, adapting audio feedback dynamically to enhance the user experience. These algorithms will consider user preferences, interaction patterns, and contextual information to provide personalized and context-aware feedback.
Enhance Context-Aware Interactions: Integrate AI to make the AR system more responsive to environmental changes and user behavior. For instance, the system could adjust audio cues based on the user’s location, actions, and even emotional state, thereby creating a more immersive and intuitive experience.
User Personalization: Use machine learning techniques to learn from individual user interactions over time. This personalization will allow the AR system to adapt to each user’s unique needs and preferences, making the experience more engaging and effective.
Evaluation and Iteration: Continuously evaluate the effectiveness of the AI-enhanced AR system through user feedback and iterative testing. This will ensure that the system remains user-centered and meets the accessibility needs of users with visual impairments.
By integrating these AI techniques, we aim to develop an AR system that not only enhances user interaction and immersion but also provides a highly personalized and context-aware experience for users, particularly those with visual impairments.
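As a purely illustrative example of the kind of real-time adaptation we intend to build, the sketch below shortens the audio cue for objects a user has already explored several times. The class name, threshold, and simple counting rule are hypothetical placeholders; a learned model of user preferences, as described above, could replace the rule without changing the calling code.

```csharp
// Illustrative-only sketch of context-aware audio adaptation: cues become briefer
// as a user shows familiarity with an object. All names and thresholds are assumptions.
using System.Collections.Generic;
using UnityEngine;

public class AdaptiveAudioFeedback : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;
    [SerializeField] private AudioClip fullDescription;   // long, detailed cue
    [SerializeField] private AudioClip shortEarcon;       // brief confirmation sound

    private readonly Dictionary<string, int> touchCounts = new Dictionary<string, int>();

    // Called whenever the user touches a virtual object (objectId is hypothetical).
    public void OnObjectTouched(string objectId)
    {
        touchCounts.TryGetValue(objectId, out int count);
        touchCounts[objectId] = count + 1;

        // The first couple of touches play the full description; later touches
        // play only a brief cue, approximating the personalized feedback outlined above.
        audioSource.clip = count < 2 ? fullDescription : shortEarcon;
        audioSource.Play();
    }
}
```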
3.1.2. Interface Design
Our prototype was crafted to enhance the accessibility and engagement of AR experiences by integrating audio feedback within an image-based application. This approach addresses the need for inclusive AR solutions in domains such as education and entertainment, where the superimposition of 3D objects onto real-world imagery is prevalent [25,26].
Common AR applications often display virtual objects atop real-world elements, such as books or image markers, or on flat surfaces like tables or walls [1]. Sighted users typically interact with this digital overlay through AR-enabled devices like iPads and smartphones using touchscreen interaction (e.g., [26]). For individuals with visual impairments, however, discerning what is shown on the screen poses a significant challenge. Our system addresses this by presenting AR virtual objects overlaid on images, which users can interact with directly on the AR device’s screen. Contact with specific objects triggers auditory information describing the item being touched—for instance, touching a virtual tree could produce the sound of rustling leaves, which stops when the user ceases contact.
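A minimal Unity sketch of this touch-to-hear behaviour is given below. It assumes each virtual object carries a collider and an AudioSource holding its descriptive sound, and it uses Unity's legacy touch input API; the script and field names are our own illustration rather than the prototype's exact implementation.

```csharp
// Sketch of the screen-touch interaction: while the user's finger rests on a virtual
// object, its descriptive audio (e.g., rustling leaves for a tree) plays; the sound
// stops when contact ends.
using UnityEngine;

public class TouchToHear : MonoBehaviour
{
    [SerializeField] private Camera arCamera;   // the AR camera rendering the scene
    private AudioSource currentSource;

    private void Update()
    {
        if (Input.touchCount > 0)
        {
            Touch touch = Input.GetTouch(0);
            Ray ray = arCamera.ScreenPointToRay(touch.position);

            // Each virtual object is assumed to carry a collider and an AudioSource
            // with its descriptive sound effect.
            if (Physics.Raycast(ray, out RaycastHit hit) &&
                hit.collider.TryGetComponent(out AudioSource source))
            {
                if (source != currentSource)
                {
                    StopCurrent();
                    currentSource = source;
                    currentSource.loop = true;
                    currentSource.Play();
                }
                return;
            }
        }
        StopCurrent();   // finger lifted, or not over any object
    }

    private void StopCurrent()
    {
        if (currentSource != null)
        {
            currentSource.Stop();
            currentSource = null;
        }
    }
}
```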
Moreover, virtual objects within AR are typically positioned in a 3D space, a concept that may be challenging to grasp via a 2D interface for some individuals, for example children with visual impairments. Previous studies have highlighted the necessity for tools and methods that facilitate the acquisition of spatial skills by children with visual impairments [19]. Our solution to this limitation involves audio modulation related to the proximity of the virtual objects; the sound volume increases as the object approaches the AR camera and decreases as it moves away. This method allows users to not only identify the virtual objects but also to comprehend their spatial positioning by correlating their location on the screen with the volume of the sound.
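This volume modulation can be sketched as follows; the near and far distance bounds are assumed tuning parameters for illustration, not values taken from the prototype.

```csharp
// Sketch of the proximity cue: the closer a virtual object is to the AR camera,
// the louder its sound; it fades to silence at and beyond the far distance.
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class ProximityVolume : MonoBehaviour
{
    [SerializeField] private Transform arCamera;          // AR camera transform
    [SerializeField] private float nearDistance = 0.2f;   // metres: full volume
    [SerializeField] private float farDistance  = 2.0f;   // metres: silent

    private AudioSource audioSource;

    private void Awake() => audioSource = GetComponent<AudioSource>();

    private void Update()
    {
        float distance = Vector3.Distance(arCamera.position, transform.position);

        // Map distance to volume: 1 at nearDistance (or closer), 0 at farDistance.
        audioSource.volume = Mathf.InverseLerp(farDistance, nearDistance, distance);
    }
}
```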
This AR feature was developed using Unity 5.x and the Vuforia SDK 10.22.
Figure 2 illustrates the configuration of the AR device and how users interact with virtual objects on the screen.
The advent of advanced AR Head-Mounted Displays (HMDs), such as Microsoft HoloLens and Apple Vision Pro, has introduced a user interface that supports natural and intuitive interaction, allowing users to interact with digital objects directly, without the need for touchscreen interactions [27]. The growing use of AR content viewed through HMDs prompts a valuable exploration into how children with visual impairments can comprehend virtual objects without needing a tangible interface. Our approach enables users to physically touch the virtual objects, at which point they receive corresponding audio feedback and sound effects. For instance, when a user touches a representation of a house, they might hear sounds of people conversing within, thereby supporting the formation of a mental model of a spatial 3D world.
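A simplified sketch of this behaviour is shown below. It assumes the hand-tracking SDK drives a fingertip proxy object with a small trigger collider and a kinematic Rigidbody (so trigger events fire); the tag name and component layout are our assumptions, and ManoMotion-specific calls are intentionally omitted.

```csharp
// Sketch of the natural (hand-based) interaction: when a tracked fingertip proxy
// overlaps a virtual object, that object's audio plays (e.g., voices inside a house)
// and stops when the hand moves away.
using UnityEngine;

[RequireComponent(typeof(AudioSource), typeof(Collider))]
public class HandTouchAudio : MonoBehaviour
{
    private AudioSource audioSource;

    private void Awake()
    {
        audioSource = GetComponent<AudioSource>();
        audioSource.loop = true;
    }

    private void OnTriggerEnter(Collider other)
    {
        // The fingertip proxy is assumed to be tagged "Fingertip".
        if (other.CompareTag("Fingertip") && !audioSource.isPlaying)
            audioSource.Play();
    }

    private void OnTriggerExit(Collider other)
    {
        if (other.CompareTag("Fingertip"))
            audioSource.Stop();
    }
}
```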
For the implementation of this AR functionality, we utilized the Vuforia SDK and Unity, coupled with ManoMotion SDK 2.0 for advanced hand-tracking capabilities.
Figure 3 illustrates the configuration of the AR device setup and a user interacting with virtual objects through direct hand movements.
3.2. User Evaluation
3.2.1. Participants
Twenty-eight university students from China were invited to participate in an experimental evaluation of the two proposed AR interfaces. Participants were between 18 and 24 years old; the cohort comprised 16 female and 12 male participants.
Each participant underwent a standardized introduction to the respective modality they were evaluating to ensure familiarity with the system’s operation. The primary goal was to evaluate the user experience in terms of effectiveness, satisfaction, and the potential for non-visual AR interfaces to enhance user engagement. This evaluation aimed to discern the impact of interaction modality on user engagement and to identify potential areas for the further enhancement of AR interfaces, particularly in the educational context.
The demographic breakdown of the participants is depicted in Table 1, detailing their age, gender distribution, and prior exposure to AR technology.
3.2.2. Questionnaires
In our evaluation of the AR system, we aimed to assess the usability and immersive experience offered by two interfaces: touchscreen interaction and natural interaction. To achieve this, we utilized the System Usability Scale (SUS) [28] and the User Engagement Scale (UES) [29], administering these instruments to 28 participants to gather feedback on both AR interfaces.
The SUS, a reliable tool for assessing the usability of various systems, consists of ten items that participants respond to on a five-point Likert scale ranging from ‘strongly disagree (1)’ to ‘strongly agree (5)’. This scale enabled us to capture participants’ perceptions of the usability of each interface, providing insights into how intuitive and user-friendly they found the touchscreen and natural interaction modalities.
To complement the SUS and gain a deeper understanding of the engagement and immersion facilitated by each interface, we introduced the UES, which measures participants’ level of engagement with each AR interface. We used the 30-question version of the questionnaire proposed by O’Brien et al. [29], answered by participants on a five-point scale from ‘strongly disagree (1)’ to ‘strongly agree (5)’. We adapted the wording of the questions to fit our context. Through the questions, user engagement was measured in the following three dimensions: “focused attention” (FA), “perceived usability” (PU), and “reward factor” (RW). We randomized the order of the questions and hid the dimension labels to avoid potential biases that might influence participants’ responses [29].
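Because the questions were presented in randomized order without dimension labels, responses must be mapped back to their dimensions before averaging. The sketch below shows one way to do this; the item-to-dimension assignment is supplied by the analyst, and any reverse-scored items would need recoding first (both are assumptions for illustration, not the study's analysis script).

```csharp
// Sketch: group randomized Likert responses (1-5) by UES dimension and average them.
using System.Collections.Generic;
using System.Linq;

public static class UesScoring
{
    // dimensionByItem maps each item id to its dimension ("FA", "PU", "RW", ...);
    // responseByItem holds one participant's Likert responses keyed by item id.
    public static Dictionary<string, double> DimensionMeans(
        Dictionary<string, string> dimensionByItem,
        Dictionary<string, int> responseByItem)
    {
        return responseByItem
            .GroupBy(kv => dimensionByItem[kv.Key])
            .ToDictionary(g => g.Key, g => g.Average(kv => kv.Value));
    }
}
```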
Both questionnaires were made available in English and Chinese, ensuring accessibility and minimizing potential misunderstandings due to language barriers.
In addition to these structured questionnaires, qualitative feedback was collected during the evaluation sessions. Participants’ verbal comments were meticulously recorded and subsequently coded to distill valuable insights into their experiences and perceptions of the AR interfaces. These qualitative findings, alongside the quantitative data from the SUS and UES, equipped us with a holistic understanding of the user experience across both interface modalities.
3.2.3. Procedures
To investigate and explore the design opportunities presented by audio and non-visual AR interfaces, our study engaged participants in a unique evaluation process. The primary aim was to understand how users interact with and perceive AR environments through audio cues as non-visual interactions, focusing on the exploration of interface designs rather than the comparison of their efficacies. Given this focus, the evaluation was carefully designed to ensure that participants relied solely on auditory feedback without visual cues from the AR visualization.
Upon arrival, all participants were briefed on the study’s objectives, with an emphasis on our interest in non-visual experiences within AR. Each participant signed a consent form acknowledging the voluntary nature of their participation and the confidentiality of their responses.
The evaluation sessions were conducted in a controlled environment specifically tailored to minimize visual distractions and highlight auditory interactions. Participants were comfortably seated in a quiet, dimly lit room to prevent visual influences on their experience of the AR interfaces.
Rather than being divided into two distinct groups, participants each experienced both versions of the AR interface in a randomized order to mitigate order effects:
In one session, participants interacted with the screen-touch interaction-based audio interface (TAI), relying solely on audio feedback. Wearing blindfolds to eliminate visual input, they explored the interface’s functionalities, such as identifying objects or navigating through a virtual space using touch gestures on the device, while the AR system provided corresponding audio feedback.
In the other session, participants explored the natural interaction-based audio interface (NAI), likewise without visual cues. Also wearing blindfolds, they were introduced to tasks that required performing physical gestures in the air, with the system delivering audio feedback based on their actions. This session assessed the intuitiveness and immersion of interacting with AR content through movement and sound, without visual confirmation.
Following their interaction with both interface versions, participants completed the SUS and UES to provide quantitative data on usability and overall user experience. Additionally, in-depth interviews were conducted to gather qualitative feedback on their experience, focusing on the perceived effectiveness of audio feedback, challenges encountered during the tasks, and suggestions for enhancing the non-visual AR experience.
The combination of quantitative and qualitative data allows for a comprehensive analysis of participants’ experiences with audio and non-visual AR interfaces. This revised approach facilitates an in-depth exploration of design opportunities within non-visual AR, aiming to uncover innovative ways to enhance accessibility and user engagement through auditory and tactile interactions.
4. Results
4.1. System Usability Scale
Figure 4 shows the comparison of each participant’s SUS score between the TAI and NAI. The mean of the overall SUS score for the TAI was 2.65 (SD = 0.21), and for the NAI it was 3.95 (SD = 0.35).
To determine the applicability of the paired sample t-test for our analysis, we conducted a Shapiro–Wilk test to assess the normality of the SUS scores. The Shapiro–Wilk test results confirmed that the SUS scores were normally distributed (p > 0.05). Based on this verification, we proceeded with the paired sample t-test. There was a statistically significant difference in the scores for TAI (M = 2.65, SD = 0.21) and NAI (M = 3.95, SD = 0.35); t(27) = −5.378, p < 0.001. This suggests that the experience of natural touch (NAI) is rated significantly higher than screen touch (TAI) by the subjects in this sample; see Table 2.
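For reference, the paired-sample t statistic used here is the mean of the per-participant score differences divided by its standard error, as in the following sketch (an illustration of the computation only, not our original analysis script).

```csharp
// Sketch of the paired-sample t statistic: per-participant differences between the
// two interface scores are tested against zero. Degrees of freedom are n - 1
// (i.e., 27 for 28 participants).
using System;

public static class PairedTTest
{
    public static (double t, int df) Compute(double[] scoresA, double[] scoresB)
    {
        int n = scoresA.Length;

        // Per-participant differences and their mean.
        var diffs = new double[n];
        double meanDiff = 0;
        for (int i = 0; i < n; i++)
        {
            diffs[i] = scoresA[i] - scoresB[i];
            meanDiff += diffs[i];
        }
        meanDiff /= n;

        // Sample standard deviation of the differences.
        double ss = 0;
        for (int i = 0; i < n; i++)
            ss += (diffs[i] - meanDiff) * (diffs[i] - meanDiff);
        double sd = Math.Sqrt(ss / (n - 1));

        // t = mean difference divided by its standard error.
        double t = meanDiff / (sd / Math.Sqrt(n));
        return (t, n - 1);
    }
}
```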
4.2. User Engagement Scale
Similar to the SUS scores, we assessed the normality of the UES scores using the Shapiro–Wilk test. The test confirmed that the UES scores were normally distributed (p > 0.05), justifying the use of a paired sample t-test. The mean and standard deviation of participants’ UES in the three dimensions (“focused attention: FA”, “perceived usability: PU”, and “reward factor: RW”) are displayed in Figure 5.
FA represented the participants’ level of concentration and comprised seven items such as “I was so involved in this experience that I lost track of time”; the TAI (M = 3.89, SD = 0.31) was rated higher than the NAI (M = 2.65, SD = 0.30); t(27) = 16.15, p < 0.001.
PU asked about aspects of perceived usability, and the analysis revealed a significant difference in the mean scores between the TAI (M = 2.60, SD = 0.28) and the NAI (M = 4.11, SD = 0.23); t(27) = −23.02, p < 0.001.
RW, which reflects positive experiential outcomes (e.g., willingness to recommend the AR interface to others and having fun with the interaction), also scored a higher mean for the TAI (M = 3.96, SD = 0.55) than for the NAI (M = 3.93, SD = 0.76); t(27) = −23.54, p < 0.001.
4.3. Qualitative Results
In addition to the quantitative analysis, qualitative feedback from participants offers deeper insights into their experiences with the TAI and the NAI. This feedback was instrumental in understanding users’ subjective perceptions and personal narratives concerning their engagement with the AR interfaces.
Participants’ responses to the TAI were mixed, with several highlighting its familiarity and ease of use, given the widespread prevalence of touch interfaces in daily technology interactions. However, a common theme among the feedback was a desire for more immersive and interactive experiences. Participants often described the TAI as functional but somewhat detached, lacking the depth of engagement they anticipated from AR technology. Some noted that while the TAI was straightforward, it did not significantly enhance their experience beyond traditional screen-based interactions.
Feedback for the NAI was positive, with participants frequently commenting on the intuitiveness and immersive quality of the interaction. The ability to manipulate virtual objects directly through gestures was cited as a significant factor in the enhanced engagement and satisfaction with the AR experience. Participants described the NAI as “innovative” and “engaging”, emphasizing how natural movements contributed to a more seamless integration of digital content into the physical world. The sense of presence and spatial awareness afforded by the NAI was highlighted as particularly impactful, with many expressing a preference for this interaction style for future AR applications.
The qualitative insights reveal a distinct preference for the NAI over the TAI. Participants valued the NAI’s capacity to provide a more engaging and life-like experience, transcending traditional interaction paradigms. It was not just about replacing the sense of touch with audio; it was about augmenting the sensory experience and enriching the user’s mental model of the AR environment. Particularly for applications that require spatial manipulation or benefit greatly from an enhanced sense of presence, the NAI was seen as a key to unlocking a more profound and meaningful interaction.
Furthermore, participants offered constructive feedback for improving both interfaces. For the NAI, suggestions included incorporating more granular and diverse audio feedback to simulate a wider array of textures and interactions, potentially enriching the realism of the AR experience. For the TAI, enhancements in touch sensitivity and response consistency were proposed to streamline the interface’s functionality and responsiveness.
The qualitative feedback, when considered alongside the quantitative findings, paints a comprehensive picture of user preferences and experiences with AR interfaces. It underscores the importance not only of usability, but also the emotional and experiential aspects of interaction in the design of AR systems. By continually refining and innovating natural user interfaces, developers can create AR experiences that are not just functionally robust but also deeply engaging, memorable, and accessible to all users, including those with visual impairments. This study illuminates the path forward, advocating for a user-centric approach that places immersive experience and inclusivity at the heart of AR technology development.
5. Discussion
Our study explores the usability and immersive experience of two AR interfaces, focusing particularly on the potential of audio and non-visual interfaces to enhance accessibility for users.
5.1. Natural and Touchscreen Interactions
Our study found that user satisfaction with natural, gesture-based audio interaction in AR is higher than with touchscreen-based interaction. While this finding aligns with established knowledge about the immersive benefits of HMDs, we further contextualized our results by comparing them with existing literature on HMD-based audio interactions in AR.
Existing studies, such as those by Azuma et al. [30] and Martinez and Fernandez [22], have extensively documented the immersive audio experiences provided by HMDs. These studies highlight how HMDs can enhance spatial audio perception and user engagement in virtual environments. However, our study contributes to this body of knowledge by focusing specifically on the accessibility of AR for users with visual impairments, an area that has been less explored.
In comparison with other advanced AR interfaces, our audio-based non-visual interface offers the following unique advantages:
Spatial Audio Techniques: While existing studies often emphasize visual enhancements in AR, our research underscores the importance of sophisticated audio rendering techniques, such as varying sound volumes to indicate the proximity and orientation of virtual objects. This method provides users with an intuitive understanding of spatial relationships within the AR environment, which is crucial for users with visual impairments.
User Engagement and Satisfaction: Our findings indicate a significant preference for audio-based natural interaction over traditional touchscreen interfaces. This preference suggests that users find the audio cues more intuitive and engaging, aligning with results from studies like those by Jones and Smith [21], which also emphasize the benefits of spatial audio in enhancing user experience.
By comparing our study with existing literature, we highlight the unique contributions of our approach in creating a more inclusive and immersive AR experience. Our discussion demonstrates that while the benefits of HMDs are well documented, our specific focus on audio-based non-visual interfaces for accessibility sets our research apart and advances the field of AR technology.
5.2. Audio and Non-Visual AR Interfaces
The audio feedback features in our proposed prototype demonstrate considerable potential for broad application scenarios. Compared to traditional solutions like braille and tactile graphics, our audio feedback mechanisms offer an easier, more intuitive way for users to perceive and interact with content without the need for extensive learning. These features can transform how common books or textbooks convey images, making them accessible through audio descriptions of visual elements and spatial layouts.
The implementation of audio feedback for spatial virtual objects not only enhances the AR experience for users by providing them with an understanding of objects and their spatial relationships but also addresses common challenges in conveying spatial information in AR applications. This is achieved through varying sound volumes to simulate the proximity and orientation of virtual objects within the user’s environment.
In the findings from our study, we see a marked preference for the Natural Interaction-Based Audio Interface (NAI) over the Screen-Touch Interaction-Based Audio Interface (TAI), which offers significant insights into the future of AR interface design.
The quantitative data derived from the SUS and UES unequivocally favored the NAI. This preference underscores the assertion that users are seeking more intuitive and natural ways to interact with technology—a trend that is particularly relevant in the context of AR. Our results indicate that when users engage with interfaces that leverage natural gestures coupled with audio feedback, they report a significantly enhanced user experience. This resonates with the broader discourse on human–computer interaction, where the move towards more ‘natural’ interfaces has been identified as a key driver for user satisfaction.
Qualitatively, the insights gathered point towards an immersive interaction experience that users perceive as more intuitive and lifelike when using the NAI. Participants described the NAI as an innovation that seamlessly blends the digital with the physical, providing a richer, more engaging experience. This feedback is invaluable as it highlights the immersive potential of AR when it is freed from the constraints of traditional touch-based interactions.
Furthermore, our study contributes to the conversation on inclusivity within AR. By focusing on audio-based, non-visual interfaces, we recognize the necessity of creating AR experiences accessible to users with visual impairments. The qualitative feedback particularly emphasized the enhanced spatial awareness provided by the NAI, an aspect critically important for users who rely on auditory cues to interact with their environment.
6. Limitations and Future Research Directions
Our research is not without its limitations. While our participant pool provided a range of insights, it was limited in demographic diversity, potentially impacting the generalizability of the findings. Future research could address a more diverse population to validate the universal applicability of our results. Additionally, our study’s scope was limited to comparing two specific types of interaction modalities, and further research could explore additional interfaces, such as voice control or gesture recognition, which may offer other avenues for engaging with AR.
6.1. Justification for Participant Selection
Participants with normal vision were chosen to optimize the audio interface and identify any usability issues before conducting targeted studies with users with visual impairments. By first ensuring the system’s effectiveness and ease of use with sighted participants, we can make necessary adjustments and improvements to better cater to the specific needs of users with visual impairments in subsequent studies. The current study was a preliminary investigation to establish the feasibility of the proposed interfaces; this initial phase with sighted participants was necessary to identify and resolve potential usability issues before extending the study to the intended user group. We will conduct a follow-up study specifically involving participants with visual impairments.
6.2. Clarification of Technologies Used
Our method qualifies as AR because it emphasizes interactions with real-world elements that are enhanced by audio cues. Participants interacted with physical objects and environments augmented by audio feedback, rather than being fully immersed in a virtual space. This approach allows for a more accessible and contextually relevant user experience, particularly for individuals with visual impairments.
The audio feedback in our experiments was delivered through the mobile device’s built-in speakers. This choice was made to provide isolated and precise feedback directly to the user, enhancing the immersive experience by ensuring that the audio cues were clear and contextually relevant.
6.3. Future Work
The exploratory nature of this study indicates a journey towards an AR that transcends traditional visual interfaces, suggesting a sensory-rich future in which AR experiences become more tangible and multi-dimensional.
Voice control integration stands as a frontier for user interactions within AR, potentially simplifying the complexity of tasks and making the technology more accessible. This natural mode of communication could prove especially beneficial for users with motor disabilities or those in situations where manual interaction is hindered.
Advancements in AI offer a promising trajectory for enhancing AR systems. Image recognition algorithms could allow the AR environment to be more responsive and context-aware, tailoring experiences that are both adaptive and personalized. These intelligent systems could assist in recognizing user intentions, environmental factors, and even emotional responses to create a more intuitive user interface.
The iterative design and evaluation process is a critical step towards refining the inclusivity of AR technologies, particularly for users who have visual impairments. As we expand our understanding of diverse user experiences and preferences, the development of design guidelines for accessible AR applications becomes a foundational aspect of inclusive technology design.
This study illuminates the need for a paradigm shift in AR development, one that prioritizes sensory diversity and inclusivity. The positive reception of the NAI by participants demonstrates a clear market for AR experiences that engage senses beyond sight. For individuals with visual impairments, such developments are not just an enhancement but a necessity, opening up new avenues for education, entertainment, and assistance.
The implications of expanding AR into non-visual domains are profound. In education, it can democratize learning materials, making them more accessible for students with visual impairments and potentially enhancing the learning experience for all students through multisensory engagement. In the realm of gaming, it can introduce novel gameplay mechanics that offer unique and immersive experiences. For navigation, it can offer intuitive guidance to users, relying on audio cues and spatial awareness to aid those unable to depend on visual information.
7. Conclusions
In this study, our primary aim was not to juxtapose two distinct versions of AR interfaces but to validate the feasibility and efficacy of the proposed audio and non-visual AR interfaces designed for enhancing accessibility, particularly for users with visual impairments. Through comprehensive user evaluations involving university students, we sought to assess how these innovative interface modalities could foster user engagement and improve usability within AR environments.
The feedback and data gathered from the SUS and the UES, supplemented by qualitative insights, underscored the significant potential of audio-enhanced and non-visual interactions in AR. Participants’ experiences with the NAI demonstrated a clear appreciation for the immersive and intuitive nature of the interaction, highlighting the value of integrating auditory feedback and natural interaction techniques to convey spatial and contextual information effectively.
The findings from this investigation offer several actionable insights for developers and researchers aiming to advance AR technology. Firstly, the positive reception of the audio and non-visual interfaces among users reinforces the importance of accessibility in AR design, suggesting that future developments should prioritize inclusivity. Secondly, the study illustrates the utility of natural user interfaces in creating more engaging and intuitive AR experiences, advocating for a user-centered approach in the design process. Lastly, the research highlights the need for ongoing exploration into multisensory feedback mechanisms, including not just audio but potentially haptic and olfactory cues, to further enhance the immersiveness of AR applications.
In conclusion, this research affirms the viability of audio and non-visual AR interfaces as effective means to extend the accessibility and appeal of AR technologies. By demonstrating the positive impact of these interfaces on user engagement and usability, the study provides a foundational basis for future work in this area. It is our hope that the insights gleaned will inspire other developers and researchers to pursue innovative approaches in creating AR experiences that are not only accessible to a wider audience, including those with visual impairments, but also more immersive and intuitive for all users.