1. Introduction
Dietary choices are linked to several chronic diseases including cardiovascular disease [
1], some types of cancer [
2], type 2 diabetes [
3], and obesity [
4]. It has been estimated that, each year, 11 million premature deaths and 255 million disability-adjusted life years are attributable to a poor diet [
5]. Reducing the amount of energy, saturated fat, trans fat, sodium, and added sugar in the diet while increasing the intake of beneficial foods and nutrients, such as whole grains, fiber, fruits, and vegetables, is a key public health goal. In 2020, food purchased for consumption at home (e.g., from supermarkets, grocery stores, or convenience stores) accounted for 51.9% of all food expenditures [
6]. Consequently, changing food-purchasing behavior in supermarkets may be a key target to improve dietary choices. A recent scoping review highlighted a number of interventions that have been used in a retail environment to improve consumer knowledge of food or to promote healthier food choices [
7].
According to the socio-ecological model of health, individuals do not act in isolation but interact with their environments and communities, which influences their behavior [
8]. This model may be useful to identify potential targets (modifiable intrapersonal, interpersonal, organizational, community, and/or policy level factors). Policies could be introduced to change the price of foods through subsidies or taxation or to provide new front-of-package information or warnings [
9]. Another approach is to increase self-efficacy by educating consumers so that they feel able to make improved food choices [
10]. For instance, consumers could be educated about the use of nutrition labels, ingredient lists, or front-of-package labels. Moreover, instruction can be given about alternatives to foods that they commonly consume or ways to incorporate healthier options into the diet. While there are many methods to educate consumers, a dietitian-led grocery store tour (GST) may offer advantages over alternative education approaches.
A GST has been defined as “the dissemination of nutrition information and/or shopping strategies by an educator to a small group of individuals while moving from aisle to aisle within a market that sells a wide variety of food products” [11]. The individuals can receive nutrition education in the context in which they typically buy food, ask questions, or be shown alternative purchases that are more nutritious or cost-effective. A GST can also be customized to meet the needs of various populations, for example, individuals with conditions such as diabetes, heart disease, or food allergies, or groups such as parents/caregivers, pregnant individuals, or athletes. At this time, little is known about the effectiveness of a GST, as only a limited number of studies have been conducted. A meta-analysis of GST studies found that supermarket tours improve knowledge and encourage positive behavior change [
11]. However, the authors concluded that the quality of studies was low and no long-term studies to demonstrate sustained behavior change have been conducted. They also noted that only one of the studies was rooted in behavior change theory. Consequently, further research is required to understand the effect of a GST on sustained food choices and health. Moreover, the attributes of a successful grocery store tour have yet to be established.
There are potentially significant barriers to conducting a GST, which may limit its reach as a method to improve diet. First, gaining access to grocery stores may be difficult, as these establishments may not be willing to allow facilitators to conduct educational tours. Second, it may be difficult to arrange a time that is convenient for the facilitator and the client (or multiple clients) to attend. In addition, it may be difficult for clients to spend up to 90 min (in addition to time spent traveling to and from the grocery store) to attend a grocery store tour. Third, some people may have difficulties remaining standing for a grocery store tour that may last up to 90 min. Fourth, due to the logistics of organizing a tour, it may be most efficient to conduct tours that are longer in duration to minimize the number of visits to a grocery store. A GST may therefore be too long for participants to concentrate on and absorb all of the information presented, and participants may suffer attention decline. For example, in a study of chemistry students, it was found that students do not pay attention continuously during a 50 min lecture [
12] and would cycle through periods of attention and inattention. Students became engaged and non-engaged in shorter cycles as the lecture progressed. Little is known about participant engagement in a GST, as the environment is different from a lecture hall and a higher degree of interaction may raise the level of attention. Fifth, in a physical GST there are many distractions, such as poor acoustics, store music, store announcements, and other customers moving throughout the store. This may impair the learning experience. In addition, participants in the tour may feel a lack of privacy when taking part in a GST. In a virtual GST, background noise can be removed and there are no other customers in the scene that may distract the participant and impair the learning experience. A virtual GST can also offer privacy or anonymity, which may encourage the participants to take a GST or ask questions while they are taking part in the tour.
Digital technologies have been used to provide nutrition education and may offer a new avenue for reaching consumers. Current approaches have included the use of social media [
13], computer kiosks [
14], and podcasts [
15] to educate consumers. An alternative approach to a physical GST is to use digital technology to create virtual worlds that could be used to conduct a virtual GST. For instance, a virtual GST could be created that can be viewed by anyone with a cell phone, tablet, PC, or immersive virtual reality headset. This would potentially widen access to grocery store tours as they could be accessed and viewed at any time from any location by anyone who has the required technology and an internet connection. As the virtual grocery store is computer generated, it can be supplemented with additional material such as videos or computer animations that aid engagement or give further explanation of concepts. In addition, gamification approaches could be incorporated to promote learning [
16]. Moreover, the clients can view the virtual tour over multiple occasions, allowing them to revisit information or view specific sections based on their specific interests or as their time or concentration allows.
While the use of virtual worlds as a medium for learning has gained increased attention in many areas [
17], little is currently known about the use of virtual worlds to provide nutrition education. Due to the increasing requirement for remote interaction between individuals, it is likely that there will be significant innovation in digital approaches to facilitate interaction between dietitians and their clients. A virtual GST may be one innovation that aids the communication of nutrition information. It is essential that dietitians are involved in the development of a virtual GST due to their domain-specific knowledge. The objective of this paper was to provide a description of the development of a virtual GST that can be experienced on a tablet, PC, or virtual reality (VR) headset. A full discussion of the factors that influenced the design process is included.
2. Methods
For this present study, a virtual supermarket was built that can be experienced using a tablet, PC, or immersive virtual reality headset. A 3D model of a supermarket was purchased from Turbosquid (www.turbosquid.com, accessed on 1 March 2021) (Figure 1) and imported into the Unity game engine (Unity Technologies, San Francisco, CA, USA). The Unity game engine provides a platform to create 3D virtual worlds that can be deployed across multiple platforms.
The supermarket was populated with 3D models of foods purchased from Turbosquid or the Unity Asset Store (
https://assetstore.unity.com/, accessed on 1 March 2021). Turbosquid and the Unity Asset Store are excellent sources of premade models that can be easily incorporated into scenes created using Unity. Many of the models are free or reasonably priced. However, attention must be paid to the license agreement before using models in commercial products. This prototype was developed to run on a tablet (
Figure 2a) or PC (
Figure 2b). The PC version can be experienced using a standard monitor or a VR headset.
For our prototype, we created an avatar using Unity Multipurpose Avatar version 2 that is available from the Unity Asset Store (
https://assetstore.unity.com/packages/3d/characters/uma-2-unity-multipurpose-avatar-35611, accessed on 1 March 2021). This software allows for the creation and customization of 3D avatars for use in virtual worlds and is currently available free of charge. The gender, body proportions, hair, clothes, and facial features of the avatar can be modified to create avatars with a wide range of characteristics. In our prototype, the avatar was a young female (
Figure 3) that represented the person who recorded the voiceovers. The MP3 that was recorded was linked to the avatar by using Salsa Lip Synch Suite (Crazy Minnow Studio,
www.crazyminnowstudio.com, accessed on 1 March 2021), which is available from the Unity Asset Store. Salsa Lip Synch Suite is a Unity asset that is used to provide lip synch approximation with a recorded soundtrack. In addition, Salsa Lip Synch software (Crazy Minnow Studio,
www.crazyminnowstudio.com, accessed on 1 March 2021) comes with other modules (EmoteR and Eyes) that allow for random, emphasis-timed expressions or manually programmed expressions. The EmoteR can be linked with SALSA to add emphasis emotes with audio-based timing while the Eyes module can be used to animate eyes, eyelids, and head movements so that the avatar can look around randomly or can be configured to focus on a target object. These modules can be used to add emphasis to certain parts of the tour by modeling different movements to draw the user’s attention to specific pieces of information.
In this present study, users ‘stop’ at various sections of the virtual supermarket including the grains section, canned goods, snacks, dairy, meat, and produce. The tour covers concepts such as reading nutrition labels, ingredient lists, and per-unit food costs. The tour uses graphics of nutrition labels, pictures to illustrate differences between lean and fatty cuts of meat, and a video to explain per-unit food costs. A video of a PC-based prototype of this tour is provided in
Supplemental Video 1.
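The per-unit cost comparison covered in the tour amounts to a simple division of price by package size. The following is a minimal sketch of that arithmetic in Python; the function names, product sizes, and prices are hypothetical illustrations and are not taken from the prototype.

```python
from typing import Dict, Tuple


def unit_price(price: float, quantity: float) -> float:
    """Return the cost per unit of quantity (e.g., dollars per ounce)."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return price / quantity


def cheapest_per_unit(options: Dict[str, Tuple[float, float]]) -> str:
    """Return the name of the option with the lowest per-unit cost.

    `options` maps a label to a (price, quantity) pair in the same units.
    """
    return min(options, key=lambda name: unit_price(*options[name]))


# Hypothetical example: two package sizes of the same product.
oats = {
    "small (18 oz)": (3.60, 18.0),   # 0.20 per oz
    "large (42 oz)": (6.30, 42.0),   # 0.15 per oz
}
best_value = cheapest_per_unit(oats)
```

In this illustration the larger package has the higher shelf price but the lower per-unit cost, which is the point the tour video is intended to convey.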
A registered dietitian recorded voiceovers for the tour using a studio condenser microphone (Samson Technologies, New York, NY, USA). The voiceover was recorded using Windows Sound Recorder and converted to an MP3 file that was uploaded into the Unity game engine. A separate recording was created for each aspect of the tour so that the recording was associated with a specific action. (For example, clicking on dairy moved the user to the dairy section and started the voiceover. When the participants were instructed to click on ‘nutrition information’ and they performed this task, the voiceover linked to that action was started).
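The link between user actions and individual recordings can be thought of as a lookup from an action name to an audio clip. The sketch below illustrates this idea in Python rather than in the Unity scripting environment used for the prototype; the class name, action names, and file names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VoiceoverPlayer:
    """Maps a user action to the recording associated with it."""

    clips: Dict[str, str]                      # action name -> audio file (hypothetical)
    now_playing: Optional[str] = None
    played: List[str] = field(default_factory=list)

    def on_action(self, action: str) -> Optional[str]:
        """Start the clip linked to `action`; return None if no clip is linked."""
        clip = self.clips.get(action)
        if clip is None:
            return None                        # unknown action: leave current audio alone
        self.now_playing = clip                # the new clip replaces the one playing
        self.played.append(clip)
        return clip


player = VoiceoverPlayer(
    clips={
        "dairy": "dairy_intro.mp3",
        "nutrition_information": "dairy_label.mp3",
    }
)
```

Keeping one recording per action, as described above, means that a single section can be re-recorded or replaced without editing the rest of the tour.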
In our prototype, users interacted with the app by clicking on virtual user interface buttons (
Figure 3). In the tablet version, users select an option by pressing the button on the screen. In the PC version, the users use a mouse to click on the button on the screen. In the immersive VR version, the users use the hand controller to emit a laser pointer. They point this at the user interface button and click the trigger button to select. A brief video demonstrating this process in immersive VR can be viewed as
Supplemental Video S2.
In our prototype, the users clicked on a user interface button that transported them to a new location. In the tablet version, the participant could not move their virtual ‘head’ to view other parts of the store. In the PC version, the user could use the cursor keys on the keyboard to move their virtual ‘head’ and look around the store. In the VR version, the participant’s actions in the real world were mirrored in the virtual world. That is, if the participants moved their head, their viewpoint in VR would change accordingly. Moreover, if they walked around in real life, they would also walk around the virtual store.
4. Discussion
To date, it is not clear how effective dietitian-led grocery store tours, conducted in real-life settings, are in changing food-purchasing habits to improve markers of health [
11]. Moreover, it is not clear that findings from real-world studies would hold in the virtual world (or vice versa), and research is warranted to determine the best practices for providing grocery store tours in virtual worlds. Randomized, controlled trials of sufficient duration to determine the effectiveness of this approach are required. While many questions remain about the effectiveness of using virtual grocery stores to provide nutrition education, there is significant potential for this approach, and further research to develop it is warranted.
Before developing a virtual GST, it is important to set clear goals for what the GST is trying to achieve. Ultimately, a virtual GST should aim to change food-purchasing behavior so that there is an improvement in markers of health. At this time, best practices for the development of a GST have not been established and further research is required. It is important to base a GST on theories of learning or behavioral change such as the theory of planned behavior [
18], the Transtheoretical Model [
19], or self-efficacy theory [
20]. For a virtual GST, while lessons can be learned from a ‘real-life’ GST, the use of technology changes the interaction between the client and the facilitator. Moreover, there are many things that are possible in virtual worlds that are not possible in ‘real life’, which could be exploited to improve the learning experience. For instance, animations, videos, and the ability to revisit specific topics repeatedly may be useful.
Game engines such as Unity or Unreal can be used to create virtual grocery stores. These applications are free to use for many users (however, a license needs to be purchased once a company has a certain financial turnover) and have drastically lowered the barriers to entry for application development. These game engines simplify the process of developing 3D worlds for multiple platforms (e.g., mobile devices, PCs, gaming consoles, or VR headsets). Consequently, people with limited programming or development experience can create applications that can be deployed on mobile devices, PCs, or VR headsets. This means that a wider range of individuals, from different backgrounds, can create digital content, which potentially facilitates innovation. However, the relative ease of creating applications with these game engines enables the creation of poorly designed applications that frustrate the user and potentially limit the development of this approach. While these game engines can be used to create commercial applications (for instance, Fortnite was created with Unreal Engine), these should be developed with software developers and rigorously tested to produce a quality product. However, these game engines are extremely useful to create rapid prototypes or research applications that aim to develop virtual approaches to nutrition education.
In this present study, a generic supermarket was used. An alternative approach is to collaborate with supermarkets to create a virtual store that mirrors the layout and product selection of that store that could be used by store dietitians as a learning tool. There would be significant advantages to this approach. For instance, users could learn the location of specific products that may enable them to learn a route through the store that avoids foods that contribute to poor dietary habits. Moreover, the dietitian could draw attention to price reductions or new products that are being offered. The creation of a replica of a large grocery store would be a substantial undertaking and the cost effectiveness of this would need to be determined. However, the emergence of the ‘metaverse’ may lead to shopping being conducted in 3D virtual environments that replicate real stores rather than the current approach of 2D websites. This may provide new and exciting opportunities to provide nutrition education.
At this time, there are several platforms (mobile devices, PCs, or VR headsets) that could be used to provide a virtual GST. It is not clear that each platform will be equally effective. Moreover, users from different age groups or educational, cultural, or socio-economic backgrounds may interact with virtual worlds differently and prefer different platforms. It should also be considered that if one platform is more effective than the others, but more expensive or less widely available, it may exacerbate current health inequalities.
Cell phones and tablets are widely available and could be used to experience a GST. However, they have limitations that might detract from the experience. Cell phones and tablets have less processing power, which limits the quality of the graphical representation of the grocery store. As the emotion, enjoyment, fun, and memorability associated with an application may be influenced by its visual appeal [
21], this limitation should not be underestimated. Moreover, cell phones have small screens, which may influence the user experience. Studies indicate that viewing media on a larger screen increases emotional arousal and enjoyment of the media and leads to an overall positive media experience [
22,
23,
24]. In another study, the memory of a 6-second exposure to pictures was significantly different when viewed on screens of different sizes [
25]. However, in a study of advertising, it was found that screen size had no effect on recall or purchase intentions [
26]. Tablets offer larger screens and may be easier to interact with (e.g., the user interface (UI) may be easier to read and to select), but they are also limited by processing power.
PCs offer more processing power and higher graphical quality. Moreover, it is likely that a large proportion of the population has used PCs and will be familiar with their use. However, fewer people have PCs at their home, limiting access to some degree. PCs that are used in the workplace (or owned by a company) may not be used to download software or applications, potentially further limiting access to a virtual GST.
Immersive VR headsets are currently the least widespread technology, although this may change as newer headsets become available and with the rise of the metaverse. A key advantage to using VR headsets is that they create a sense of immersion and presence [
27,
28,
29]. Whereas experiencing a scene using a phone, tablet, or PC provides limited immersion or presence, VR can provide fully immersive experiences where individuals feel they are present in the scene [
30]. This may have implications for the success of these approaches as data indicate that the sense of presence experienced using immersive VR could aid learning [
31,
32]. However, there are several limitations with VR such as issues with cybersickness and the usability of VR applications [
33]. Cybersickness refers to a constellation of symptoms experienced by some users of VR, such as nausea, disorientation, oculomotor disturbances, and drowsiness. Moreover, VR headsets can be costly and, until recently, required gaming PCs to run the hardware. The release of standalone headsets such as the Oculus Quest 2 has reduced the cost of VR, although the costs may still be prohibitive.
The application should be engaging. User engagement is tightly associated with the user’s enjoyment [
34]. With regards to gaming, approaches have been proposed to increase the likelihood that players enjoy games [
35]. This information may be useful when developing a virtual GST application.
A virtual GST will likely involve the interaction with an avatar. An avatar can be an animated representation of a human (or other character such as a cartoon character, animal, or inanimate object) that repeats a prerecorded script. Alternatively, an avatar may be the representation of another human being (i.e., the avatar is the embodiment of a real-life human in the virtual world who is controlling the speech) and, therefore, is merely a conduit for human–human interaction. While the presence of an avatar is not necessary to communicate information in a virtual world (information can be provided merely by a voiceover), the use of an avatar may have positive effects on the motivation to learn [
36,
37]. Indeed, a recent meta-analysis found that learning with avatars was more effective than learning without these agents [
38]. The characteristics of the avatar should be considered when designing a virtual grocery store tour as it may influence the client’s emotional response or the effectiveness of the program [
39].
One factor to consider when designing avatars is the “uncanny valley”. The uncanny valley is a concept that was introduced to describe observations that as robots become more human-like they become more appealing [
40]. However, there comes a point where the robot tries, but fails, to mimic a realistic human and this causes a sense of unease, weirdness, or causes people to ‘freak out’ [
41]. While considerable research has been conducted, the support for this hypothesis remains inconsistent, and it is likely that, while the uncanny effect does occur, it is not generalizable across different individuals, situations, times, or tasks [
42]. For example, in an e-commerce study of a non-interactive, talking avatar, it was found that the presence of an avatar increased website trust and patronage intention among male participants in the study but had a negative effect on these outcomes in females [
43]. Consequently, it is likely that different approaches may be necessary to communicate effectively with different groups.
The psychological effects of talking avatars on clients can be ascribed to the theory of Computers are Social Actors (CASA) [
44]. CASA states that humans mindlessly apply the same social heuristics used for human interactions to computer interactions [
45]. That is, while adult humans recognize that computers are not humans, they often behave as if the computer is a real person. Consequently, factors such as the age, gender, race, size, or dress of the avatar may influence the credibility of the information to different audiences [
46,
47]. If the avatar does not reflect the identity of the person providing the information, the user may regard the interaction as deceitful [
48]. It is possible that the avatar characteristics could be used to deceive individuals regarding the credibility of the information [
49], and this could be used to manipulate users towards certain outcomes (e.g., the purchase of specific foods or types of foods). While this could be used to nudge users towards ‘healthy’ food choices, the ethics of deceiving people should be considered. Still, making the avatar seem approachable, for instance, making the avatar smile, may lead to a more positive interaction with the avatar [
50]. Successful programs to change behavior to manage chronic disease rely on a trustful collaboration between the health care provider and the patient [
51]. A recent study found that a conversational agent, which communicated with patients via email, mobile chat app, or text messaging, was accepted by patients [
52].
A user interface (UI) is the space where interactions between the user and the computer take place. A key characteristic of a good UI is that it facilitates effective and intuitive interaction with a computer. It is noteworthy that different groups (e.g., different ages, educational backgrounds) may interact with a UI differently and have different requirements [
53]. Consequently, the UI should be designed with the target audience in mind. There are several methods available to interact with computers. On cell phones and tablets, touch screens provide a useful method for interaction. The user can select an option by pressing the UI on the screen. On PCs, physical input hardware such as keyboards, mice, or joysticks can be used. For immersive VR, there are multiple possibilities. The user can select UI interfaces using hand-held controllers. This can be done by interacting with a button, as a person would do in the physical world (e.g., the user’s hands are modeled in virtual space, and they can press buttons or toggle switches as they would in the physical world). For this approach, the interaction must be robust; if it takes multiple attempts to correctly interact with a virtual element, the user will become frustrated. Another approach is to use a ‘raycast’. In this approach, a laser is emitted from the controller (which is modeled in the virtual world) and can be pointed at UI elements so they can be selected. VR also allows the user to interact with eye movements and gestures. For instance, the user can look at a UI object (typically, a reticule is used to show where the user is looking) and make a hand gesture to select the UI object. It is also possible to use voice interaction. That is, the user speaks commands, which are interpreted by the computer to complete an action.
Movement through the GST should be considered. A GST could be set up so that the user clicks a user interface button in the program and is instantly transported to the next stop in the tour. This is a straightforward approach and requires little thought from the user, although the user’s movement through the store is sequential and predetermined. Alternatively, the users may move freely around the grocery store scene to view information that is relevant to them (by choosing different modules such as the dairy section, produce, or meats) or may be led through the store by a dietitian to create a bespoke tour. If the tour is being provided on a personal computer, then the user can navigate the store using a joystick or keyboard and use a mouse to interact with the software user interface. Moreover, in a previous project, we used speech recognition technology to allow users to interact with the grocery store application using voice commands [
30]. When using tablets or cell phones, users can interact with the application’s user interfaces using a touch screen. The user could navigate through the store using a touch-screen joystick.
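The two navigation styles discussed above, a predetermined sequence of stops and free selection of modules, can be captured by a small state object. The Python sketch below is illustrative only; the class and method names are assumptions, and the stop names simply reuse the store sections listed in the Methods.

```python
from typing import List


class Tour:
    """Tracks the user's current stop in a virtual grocery store tour."""

    def __init__(self, stops: List[str]) -> None:
        if not stops:
            raise ValueError("a tour needs at least one stop")
        self.stops = list(stops)
        self.index = 0

    @property
    def current(self) -> str:
        return self.stops[self.index]

    def next_stop(self) -> str:
        """Sequential mode: advance one stop, remaining on the last stop at the end."""
        self.index = min(self.index + 1, len(self.stops) - 1)
        return self.current

    def jump_to(self, stop: str) -> str:
        """Free mode: move directly to a stop chosen by the user."""
        self.index = self.stops.index(stop)   # raises ValueError for an unknown stop
        return self.current


tour = Tour(["grains", "canned goods", "snacks", "dairy", "meat", "produce"])
```

Supporting both methods in one object would allow the same tour content to be offered as a guided sequence or as user-selected modules.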
Immersive VR provides different challenges when navigating the store due to the possibility of cybersickness. Cybersickness could limit the usefulness of immersive VR as 60–95% of users experience some level of cybersickness, with 6–12.9% of users prematurely ending their exposure due to the severity of the experience [
54,
55,
56,
57,
58]. These symptoms can last for hours or days after exposure to virtual reality [
59,
60,
61]. Cybersickness may appear after only 10–15 min of immersion [
62]. However, recent advances in headset technology have led to significantly reduced cybersickness [
34]. Good VR scene design is essential to avoid cybersickness [
63]. In particular, locomotion methods can lead to cybersickness [
64]. It is thought that navigating a scene in VR can cause cybersickness due to sensory conflict where visual self-motion information conflicts with information provided by the stationary user’s other senses [
65]. For instance, the visual system indicates the user is moving while the vestibular system indicates the user is stationary. For immersive VR, locomotion techniques can broadly be split into two types: physical and artificial locomotion.
Physical locomotion is when movement in the virtual world is controlled by the user’s movement in the real world. That is, when users walk and turn in the real world, they walk and turn in the virtual world. There are considerable strengths to this approach. First, movement is intuitive. The users ‘feel’ like they are walking through the scene. Second, as the participant’s movement matches the movement in VR, the sense of cybersickness is reduced. However, there are drawbacks. Space is required to allow the user to walk freely without bumping into objects or walls. The users may also feel nervous walking with a headset on, as they cannot see the real world. There are also design questions. What happens if the users bump into a ‘solid’ object in the virtual world? Do they just pass through this object? This may break the sense of presence or cause discomfort for the user (e.g., the camera is halfway between a solid object and an open space, leading to visual disturbances and disorientation).
Artificial locomotion is when movement through the virtual world does not correspond with movement in the physical world. For instance, the movement through the virtual world may be controlled by a controller (e.g., joypad thumbstick), so that when the users move or turn in VR, they do not move in the physical world. This method of locomotion can lead to feelings of cybersickness [
65]. An alternative is the use of teleportation. In this method, the users point to a position in the store that they want to move to (typically a laser pointer is emitted in the scene) and then press a button to move instantaneously to that point. A possible downside to this approach is that it breaks the sense of immersion and presence felt by the user [
65].
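One design decision with teleportation is which target positions to accept, so that users cannot land inside shelving or outside the store. The following Python sketch shows one possible rule under simplifying assumptions: the floor and the shelves are axis-aligned rectangles on the floor plane, and an invalid target leaves the user where they are. The names and dimensions are hypothetical and do not describe the prototype.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]   # (x, z) position on the floor plane, in meters


@dataclass(frozen=True)
class Rect:
    min_x: float
    min_z: float
    max_x: float
    max_z: float

    def contains(self, point: Point) -> bool:
        x, z = point
        return self.min_x <= x <= self.max_x and self.min_z <= z <= self.max_z


def teleport(position: Point, target: Point, floor: Rect, obstacles: Sequence[Rect]) -> Point:
    """Return the new position: the target if it is walkable, else the old position."""
    if not floor.contains(target):
        return position                       # target is outside the store
    if any(obstacle.contains(target) for obstacle in obstacles):
        return position                       # target is inside a shelf
    return target                             # instantaneous move, no intermediate motion


# Hypothetical store: a 10 m x 20 m floor with one central shelf unit.
store_floor = Rect(0.0, 0.0, 10.0, 20.0)
shelves = [Rect(4.0, 2.0, 6.0, 18.0)]
```

Because the move is instantaneous, no visual self-motion is displayed, which is the property that distinguishes teleportation from thumbstick-controlled movement.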
In this present study, the GST followed a predetermined order that can be viewed asynchronously. An advantage of this approach is that, once the tour is created, it can be viewed by a large number of people. However, this approach also has limitations. The users cannot ask questions or seek clarification on aspects of the material they do not understand. Moreover, the tour will be generic and not deal with the nutrition needs of individuals. However, multiple tours could be created to appeal to people of different ages, disease profiles, etc.
An alternative approach is to create a virtual world that can be simultaneously inhabited by the GST facilitator and user (or multiple users). The GST facilitator can lead a tour through the store and respond to users’ questions immediately. The tour can then be tailored to the individuals attending. However, while this approach may widen access to a GST (i.e., a facilitator does not need a physical store to conduct a tour and users do not need to attend a physical store), it still requires users and facilitators to agree on timings.
With a synchronous GST there may also be other limitations. Providing a virtual GST on a PC is currently the optimum method for this approach. Multiplayer games, such as Fortnite, are played for several hours with minimal breaks. However, while VR headsets would provide an appropriate medium for a virtual GST, there are limitations on the amount of time they can be worn without causing fatigue, cybersickness, or general discomfort. It is unlikely that a GST facilitator could spend significant periods of time each day, even if this is spread over multiple sessions, in immersive VR without discomfort. Consequently, the widespread use of VR headsets to provide a synchronous GST is likely limited until advances in hardware (e.g., smaller and lighter headsets) are made. Even then, the wisdom of spending significant time in immersive VR each day is debatable and further research is required to understand how long-term use of VR influences a user’s health [
66].
Some supermarket chains in the USA are providing virtual nutrition services. The approach described in this manuscript could potentially offer new ways to deliver virtual nutrition education. For instance, as the store is computer generated, other teaching methods could be introduced such as serious games (games designed with the intent of educating consumers rather than entertaining them). In addition, ‘what-if’ scenarios can be created to see how the consumer responds so that the dietitian can then discuss the consumers’ choices with them. The use of virtual worlds provides numerous opportunities for education, and further research is required to develop this approach.
In conclusion, it is likely that there will be an increased use of virtual worlds. The potential to use virtual worlds to provide nutrition education requires further study so that effective approaches can be developed.