1. Introduction
VR and AR techniques can render a virtual environment or superimpose a virtual object onto a physical target; thus, the interaction experience with VR/AR differs from that with conventional displays such as monitors, tablets, and projectors. VR/AR devices are used for sports training, entertainment, tourism, manufacturing, warehouse work, and medical assistance [1,2,3]. They have the potential to change how we learn, recreate, and work [4,5]. The modes of communication between humans and VR/AR systems have a large impact on their implementation, use, and acceptance. The human interface requirements for VR/AR devices differ from those of other human–computer systems: the visual demands are greater, the command set is different, and the user may be sitting, standing, or walking. Therefore, researchers are designing tracking, interaction, and display techniques, particularly to improve comfort and efficiency [6], and then applying those techniques to the implementation of AR systems.
Integrating a human factors approach to the design of gestures with gesture recognition that optimizes latency and accuracy is vital to facilitating effective interactions between humans and VR/AR systems [7,8,9,10,11]. Users are familiar with touch gestures and voice control due to their pervasive use on smart devices, and these input methods have been evaluated by users interacting with VR/AR systems. FaceTouch (FT), a touch-input interface mounted on the backside of a head-mounted display (HMD) [12], was accepted by participants because of its low error rate, short selection time, and high usability. However, it could only be used for short utilitarian purposes because users' arms tend to fatigue easily. The use of pre-programmed voice commands to manipulate objects in VR demonstrated the usefulness of voice control as a practical interface between users and systems [13,14]. However, voice commands can reveal user actions and disturb others in public. Office workers are accustomed to using a keyboard and mouse. A virtual keyboard to execute commands has been evaluated in an immersive virtual environment by office workers [15,16]; however, accurately detecting fingertips and providing feedback to users while typing on the virtual keyboard were difficult, which limited usability and interaction efficiency.
Another common user interface approach when interacting with VR head-mounted displays (HMDs) is the hand-held controller, similar to a game controller, which provides accurate input and feedback while minimizing latency (Controller, HTC VIVE, HTC Inc., New Taipei City, Taiwan). However, the visual demands of identifying buttons on a controller can distract users and negatively impact their performance. Additionally, the number of buttons that can reasonably fit onto a controller is limited, which constrains input efficiency. Furthermore, the use of hand controllers has been shown to increase motion sickness during VR HMD use [17], and the sustained grasp of a hand-held controller may increase the muscle load on the upper extremities. Interaction with VR or AR systems that relies on extra hardware, such as a controller or touch screen, prevents users from performing other tasks with their hands, such as manipulating physical objects.
Due to these constraints, mid-air, non-contacting, 3D hand gestures have emerged as an alternative to a controller, touch screen, or voice control for interacting with high-resolution displays, computers, and robots [18,19,20,21]. Importantly, the human proprioception system allows users to perceive the spatial position of the hands relative to the trunk with high precision. The use of VR/AR HMDs is becoming widespread, and the user experience with HMDs differs from that with conventional smart devices such as phones, tablets, and TVs. In response, the design and use of hand gestures for interacting with VR/AR applications have been investigated [22,23,24]. Navigation, object selection, and manipulation commands are commonly used in VR/AR systems and have been evaluated by researchers. The conversion of fingertip or hand position movement into control of the velocity of self-movement in the virtual environment (VE) has been demonstrated [25,26], as has the use of gestures for pointing at and selecting virtual objects [27]. For example, to select virtual objects, users preferred the index finger thrust and index finger click gestures [27]. However, the lack of feedback can have a negative impact on user input confidence and efficiency while performing gestures [28,29]. Therefore, researchers have designed hardware, such as a flexible nozzle [30] and ultrasound haptics [31], to provide feedback while using gestures in different application scenarios. Physical surfaces like tables and walls, being ubiquitous, are an additional way to provide feedback in VR/AR systems by mimicking interactions with a touch screen. Researchers developed the MRTouch system, which affixed virtual interfaces to physical planes [32], and demonstrated that feedback from physical surfaces can improve input accuracy.
Gestures can provide an unconstrained, natural form of interaction with VR/AR systems. However, gestures can differ widely in how they are performed. For example, large gestures that require whole-arm movement can lead to shoulder fatigue and are difficult to use for extended durations [33]. Sign language interpreters who perform mid-air gestures for prolonged periods experience pain and disorders of the upper extremities [34]. To avoid fatigue, studies suggest that repeated gestures should not involve large arm movements and should avoid overextension or over-flexion of the hand, wrist, or finger joints [34]. Microgestures, defined as small, continuous hand or finger motions, have been developed to reduce the muscle load on the upper extremities associated with large and/or mid-air gestures. Microgestures can be inconspicuous and less obvious to nearby people and are more acceptable in public settings compared to large hand or arm motions [35,36]. Single-hand microgesture sets have been proposed for interacting with miniaturized technologies [37] and for use while grasping hand-held objects of different shapes and sizes [38]. However, some of the microgestures studied could not be easily performed. For example, the gesture of using the thumb tip to draw a circle on the palm of the same hand was found to be very difficult and uncomfortable to perform. Additionally, the proposed microgesture set was designed with the forearm resting on the tabletop in a fully supinated (palm up) position, a posture that is associated with discomfort and should be avoided if adopted repeatedly [34].
Although the use of microgestures for VR/AR is appealing, there is as yet no universally accepted microgesture set for VR/AR systems designed with a human-centered approach [37,38,39,40,41,42,43,44,45]. Therefore, the design of a microgesture set for common VR/AR commands that is intuitive, easily recalled, and minimizes fatigue and pain is warranted. The primary purpose of this study was to design a microgesture set for VR/AR following human factors and ergonomics principles that consider user interaction habits and gesture designs that minimize hand fatigue and discomfort. Additionally, to improve wide acceptance and usability, participants from different cultural backgrounds were recruited to build mappings between microgestures and commands for the VR/AR system.
The paper is organized as follows: We first describe the methodology of the study, including the selection of VR/AR commands, the design of the microgestures, the pre-assignment of microgestures to commands by experts, and the design of software for participants to assign microgestures to commands and rate the microgesture–command sets. Next, we describe the data analyses used. Then, we present the study findings, first the proposed microgesture set for VR/AR commands based on popularity, and then user preferences for microgesture characteristics. Finally, in the discussion, we compare the proposed microgestures to prior studies and discuss the broader implications and limitations of this work.
3. Data Analysis
The data were processed with MATLAB 9.4, and statistical analysis was conducted with the R language. The rating scores for the gesture–command combinations were normalized to a mean of 10 and a standard deviation of 1 across participants to adjust for differences in rating scales.
The ultimate assignment of a gesture to a given command, to build the proposed gesture-command set, was primarily determined by its popularity among participants.
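As a minimal sketch of this normalization in R (assuming the rescaling was applied within each participant's ratings; the data frame layout and names are illustrative, not taken from the study's code):

```r
# Rescale each participant's raw rating scores to a mean of 10 and a standard
# deviation of 1, removing individual differences in how the rating scale was used.
# 'ratings' is assumed to have columns: participant, combination, score.
normalize_ratings <- function(ratings) {
  ratings$norm_score <- ave(ratings$score, ratings$participant,
                            FUN = function(x) 10 + (x - mean(x)) / sd(x))
  ratings
}
```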
The agreement score reflects the consensus among users in selecting the same microgesture for a given command. For this study, a modified agreement Equation (1) [47] was used to calculate the agreement score:

$$ A_r = \sum_{i=1}^{P} \left( \frac{n_i}{\sum_{j=1}^{P} n_j} \right)^2 \qquad (1) $$

where P is the number of different gestures selected for command r, and n_i is the number of participants who selected identical gesture i for command r. As an example of an agreement score calculation, the command shrink/enlarge had seven different gestures selected by participants, and the numbers of participants selecting each of the seven gestures were 37, 34, 32, 3, 2, 1, and 1. Therefore, the agreement score for the command shrink/enlarge was:

$$ A_{shrink/enlarge} = \left(\frac{37}{110}\right)^2 + \left(\frac{34}{110}\right)^2 + \left(\frac{32}{110}\right)^2 + \left(\frac{3}{110}\right)^2 + \left(\frac{2}{110}\right)^2 + \left(\frac{1}{110}\right)^2 + \left(\frac{1}{110}\right)^2 \approx 0.29 $$
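Following the reconstruction of Equation (1) above, the example calculation for shrink/enlarge can be expressed as a short R sketch (the counts are those reported in the text):

```r
# Numbers of participants who selected each of the seven different gestures
# proposed for the shrink/enlarge command.
counts <- c(37, 34, 32, 3, 2, 1, 1)

# Agreement score: sum of squared proportions of selections per gesture.
agreement <- sum((counts / sum(counts))^2)
round(agreement, 2)  # approximately 0.29
```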
Differences in comfort and preference between microgestures were analyzed using a repeated-measures ANOVA. For example, comfort ratings were compared between microgestures performed with a pronated forearm (palm down) and those performed with a neutral forearm (thumb up), as well as between familiar gestures (e.g., gestures p and q, form the okay posture) and unfamiliar gestures (e.g., gesture f, thumb tip slides on the index finger), identified by a "*" in Figure 1.
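A minimal sketch of such a comparison in R, assuming a long-format data frame df with one comfort rating per participant per gesture and a posture factor coding pronated versus neutral forearm (names are illustrative, not the study's actual code):

```r
# Repeated-measures ANOVA: comfort as a function of forearm posture,
# with participant as the repeated-measures (error) stratum.
df$participant <- factor(df$participant)
fit <- aov(comfort ~ posture + Error(participant / posture), data = df)
summary(fit)
```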
The preference, match, comfort, and privacy ratings for each gesture–command combination reflect different attributes, or dimensions, of the combination. The agreement scores indicate the consensus among users on the grouping of gestures and commands. The importance score was the rating by VR/AR developers of how important each command is to VR/AR systems (Table 1). The extent to which the different dimensions were correlated was evaluated using Pearson's correlation coefficient. The preference for interaction methods for VR/AR systems was evaluated using the Skillings–Mack test.
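As a sketch of the correlation analysis, assuming the per-combination ratings are collected in a data frame dims with one numeric column per dimension (column names are illustrative); the Skillings–Mack test for the interaction-method preferences is available in contributed R packages and is not shown here:

```r
# Pearson correlation between two rating dimensions, e.g., comfort and privacy.
cor(dims$comfort, dims$privacy, method = "pearson")

# Full pairwise correlation matrix across all dimensions (preference, match,
# comfort, privacy, agreement, importance) over gesture-command combinations.
round(cor(dims, method = "pearson"), 2)
```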
5. Discussion
Based on the results of this study, a set of 19 3D microgestures is proposed for 20 commands used in a VR or AR system, determined primarily by popularity with adjustments. The microgestures proposed in this study were drawn from a set of microgestures designed by ergonomists with experience in designing gestures and tools for human–computer interaction that minimize discomfort and fatigue and optimize interaction efficiency. Users performed the 3D microgestures with the forearm and hand resting on a table to reduce the load on the neck and shoulder muscles [48]. Therefore, users should be able to perform these gestures repeatedly while interacting with VR/AR displays. Furthermore, the gesture–command combinations were selected by participants based on their prior experience and familiarity with gestures used for touch screens, making the findings more likely to be acceptable to other users. Additionally, the assignment of microgestures to commands was completed by 40 participants with multinational backgrounds, improving acceptability across cultures. There was a significant difference in comfort and privacy ratings of the proposed microgesture set between participants from China and those from the US and Europe (p < 0.05), which demonstrates that nationality has some effect on user preference for microgestures. Therefore, the development and evaluation of universal gesture sets should include participants of different nationalities.
The commands assigned to microgestures will vary from application to application. It is likely that a particular application will use only a subset of the proposed microgesture–command set. For example, watching an interactive movie with a VR HMD is likely to require microgestures for just a few commands, such as volume up/down and pause. A larger set of commands and microgestures is likely to be needed by CAD designers using VR/AR HMDs, who may perform gestures for prolonged periods while conceptualizing their ideas. More research may be needed to optimize the microgesture–command set for specific applications. However, the generalized microgesture–command set developed here can be a starting point for such research.
For the proposed gesture–command set, the correlation between preference and match was high, indicating that assigning gestures to commands based on popularity is a reasonable approach. Interestingly, comfort had a strong correlation with privacy (R = 0.75). This may be because comfort includes an assessment of both physical and emotional comfort; gestures involving very small finger or hand motions are both less physically demanding and not easily noticed by others. The agreement score had a weak positive relationship with preference and comfort, which may have been influenced by the way the experiment was conducted. The agreement score, calculated based on the multiple gestures selected for a given command in this study, was lower than observed in prior studies [37,42,43,44]. Recalculating the agreement score based on only the most preferred microgesture for each command increased the score. In addition, relative to other studies, subjects had to select two or more gestures from a larger number of candidate gestures for a given command, and this reduced the agreement score.
Comfort ratings did not differ between gestures performed with a pronated forearm and those performed with a neutral forearm (p > 0.10), indicating that users had no preference between the two forearm postures. It may be difficult to notice fatigue or discomfort when gestures are performed very briefly, and during the matching task, subjects were not required to perform the gestures repeatedly. However, future designs of 3D microgestures should avoid fully supinated (palm up) postures [34]. The learnability of familiar microgestures should be high, and, therefore, it would be expected that users could incorporate these gestures into VR/AR use with minimal training. It was surprising to find that there was little difference in comfort and preference between familiar and unfamiliar gestures (p > 0.10, Table 5). During training, participants performed all of the microgestures properly, and the learnability of unfamiliar microgestures was similar to that of familiar microgestures. In addition, participants were open to performing unfamiliar microgestures to replace existing gestures as long as the gesture matched the task and was easy and comfortable to perform. For example, gesture e (index finger circles CW/CCW) is familiar and intuitive to interpret as a command to adjust volume up or down and was selected by 10 participants. Surprisingly, 24 participants preferred gesture f (thumb slides on the side of the index finger) for the volume command, a gesture that is not widely used.
The comparison of the number of fingers moved while performing a microgesture (e.g., index; index and middle; all fingers) did not reveal a strong preference. This finding may be useful for gesture designers, as any of the three types of finger movements can be used. A prior study found that, for individual digit movements, the thumb was most popular, while the little finger was least popular for performing microgestures [37].
Current interfaces usually require users to type in parameters along with commands, which may cost extra time and increase workload. For example, users may want to change the mapping weight between the translation distance of the hand in physical 3D space and the distance moved by the virtual cursor in the VE based on their actual demand [25], similar to adjusting the dpi (dots per inch) of a conventional computer mouse; both settings affect the sensitivity of the input device. Therefore, the number of fingers used in a microgesture can serve as an independent input parameter for some commands, skipping the extra step of typing on the interface. Moreover, using a different number of fingers as a parameter for a command is intuitive to understand, thereby reducing memory load, improving interaction efficiency, and making interaction more natural. The conversion of the number of fingers into a parameter can be applied in various scenarios. Commands such as acceleration, slow down, and fast forward are difficult for users to execute while gaming or wandering in a VE. Users could perform a gesture with different numbers of fingers pointing forward to control the magnitude of navigation speed. For example, a hand pointing forward with the index finger, the index and middle fingers, or the index to small fingers extended could represent 1×, 2×, or 4× speed, respectively. The use of VR and AR devices can help designers conceptualize their ideas [22,49], but repeatedly creating components can be tedious. Based on the findings of this study, performing the tapping gesture with the index finger, the index and middle fingers, or the index to small fingers could duplicate a component at different speeds, which can accelerate the process of idea conceptualization.
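As an illustration, a hypothetical mapping from the number of extended fingers to the navigation speed multipliers mentioned above could look like the following R sketch (the function, values, and error handling are assumptions, not part of the study's implementation):

```r
# Hypothetical mapping from the number of extended fingers reported by a hand
# tracker to a navigation speed multiplier (1 finger -> 1x, 2 -> 2x, 4 -> 4x).
speed_multiplier <- function(n_extended_fingers) {
  multipliers <- c("1" = 1, "2" = 2, "4" = 4)
  m <- multipliers[as.character(n_extended_fingers)]
  if (is.na(m)) stop("unsupported finger count")
  unname(m)
}

speed_multiplier(2)  # returns 2, i.e., move at 2x speed
```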
In contrast to prior studies, participants with multicultural backgrounds were recruited to build mappings between 3D microgestures and commands; they were not required to design gestures for a given command using a 'think aloud' strategy [41]. Thus, a comparison with the microgestures selected for similar commands in other elicitation studies is vital to support the feasibility of our experimental design. Gesture q (form the okay posture) was assigned by 29 participants to activate the interface to accept hand gestures as input. The same gesture was preferred by users from China, while the gesture thumb up was preferred by users from the US to commit a command [42,44]. Based on our study, the gesture form the okay posture can be a replacement for the gesture thumb up to execute commands like confirmation, accept, and activation for users from the US and Europe.
Prior studies [25,26,27] pointed to a consensus on the importance of designing gestures for the commonly used commands navigation and selection. The gesture index finger points forward with the forearm in a neutral position was adopted to control the 3D virtual cursor, while the gesture index finger points forward with the forearm in a pronated position was used to translate the virtual target, the same as in prior studies [25,27,41]. Moving the cursor is usually required before selecting a virtual object, so the choice of gesture combinations for controlling the virtual cursor and for object selection is important so that users can execute these sequential commands continuously with fluid movements and transitions. The microgestures assigned to the cursor and selection commands were similar to those of a prior study in which users preferred the gestures index finger points forward with the forearm in a neutral position and thumb taps on the side of the middle finger with the index finger pointing forward and the forearm in a neutral position [27]. Similarly, the gestures index finger points forward with the palm down and index finger taps on the table with the palm down were selected for the two commands, respectively, by more than 20 participants. Therefore, gesture developers may provide the above two combinations as choices for users to control the cursor and select an object.
The commands confirm, reject, and undo are commonly used in human–computer interaction. In a prior study [37], the microgesture index fingertip taps thumb tip was performed to complete the select, play, next, and accept commands, while the gesture middle fingertip taps thumb tip was assigned to complete the pause, previous, and reject commands. However, such connections between gestures and commands may not be intuitive. In our study, participants preferred performing the gestures palm scrolls repeatedly with palm down and index finger scrolls repeatedly with palm down to reject a call and delete an object, respectively. The same gesture was proposed to undo an operation while interacting through a touch screen [41]. For the object rotation command, it has been shown that users preferred to match the rotation of the hand/wrist to the rotation of the object as a metaphor [42]. However, rotating the object based on such a mapping is limited by the dexterity of the hand/wrist, and extreme hand/wrist postures pose a health risk [34]. Importantly, accurately detecting the hand/wrist angle for rotating an object is a challenge, and hand tremor is inevitable while performing mid-air gestures. To address these problems, we converted the distance of hand translation into the amount of object rotation. Users could rotate an object with high accuracy while avoiding the negative consequences of hand tremor in mid-air.
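A minimal sketch of this translation-to-rotation mapping (the gain value and names are illustrative assumptions, not the study's implementation):

```r
# Convert the hand's translation distance (in cm) into an object rotation angle
# (in degrees) using a fixed gain, so that small, steady hand movements produce
# controllable rotations independent of the wrist's range of motion.
translation_to_rotation <- function(translation_cm, gain_deg_per_cm = 15) {
  translation_cm * gain_deg_per_cm
}

translation_to_rotation(6)  # a 6 cm hand translation yields a 90 degree rotation
```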
In the future, users may work or engage in recreational activities with VR or AR HMDs for long periods of time. Watching a movie on a large, high-resolution virtual display is a potential replacement for cinemas. Thus, gestures to browse movies and play/pause a video are desired. Gesture x (index and middle fingers scroll left/right with the pronated forearm) was assigned to show the previous or next item. For the command play/pause, gesture j (index finger taps on the table with the forearm in a pronated position) was the most popular gesture among users. The same gesture was proposed in another study to play the selected channel while watching TV [39].
From the proposed microgesture–command set, tapping and swiping microgestures were popular among participants, a pattern also observed when designing gestures to control mobile devices [43,50]. The preference for tapping and swiping gestures may stem from users' past experience with touch screens. These preferences reveal that understanding users' interaction habits is vital to implementing an interface based on microgestures for VR or AR systems.
The proposed 3D microgestures are not limited to the commands investigated in this study (Table 1). The microgestures designed to manipulate an object could be used to set parameters for the virtual scene. For example, microgestures intended to enlarge or shrink an object could be performed to zoom in or out of a virtual scene when no object is selected. Similarly, microgestures used to translate or rotate an object could control the coordinate systems in a VE. Although the proposed 3D microgesture set is designed specifically for VR/AR systems, its use can be extended to other platforms with similar context-based commands. Interfaces based on hand gestures have been developed to complete secondary tasks while driving a car so that drivers can keep their attention on the road rather than being distracted by manipulating a control panel through a touchscreen or keypad [44,51]. However, another study [52] found that an interface based on mid-air gestures takes longer to complete a task and imposes a higher workload compared to a touch-based system. Perhaps performing 3D microgestures with the hands resting on the steering wheel or one forearm resting on an armrest could allow drivers to reduce muscle load, cognitive workload, and task completion time when performing secondary tasks. Similar to the feedback users receive when resting their arms on a table, drivers may receive feedback from the steering wheel when resting their hands on it, unlike with mid-air gestures.