1. Introduction
Mixed Reality (MR) is closely associated, if not synonymous, with digital applications in which visual representations are based on the overlaying of virtual objects on the real world [1]. As such, MR is considered a highly demanding experience in terms of the size of the content involved, the processing capacity of the equipment, and the capabilities required of telecommunication media and relevant infrastructure. For example, the digital representation of an area along with virtual reconstructions and sophisticated visualizations involves heavy multimedia content. For this content to be transmitted from the content provider to the content consumer and be executed, high-speed networks and devices with high processing capabilities are needed. However, several technological advancements have emerged and are being applied, promising to satisfy the above demands. First and foremost among these is the boost in wireless data transmission brought by the fifth-generation technology standard, 5G, which introduces data transmission speeds of up to 1 Gbps [2]. Practically, this means that a rich Digital Surface/Terrain Model (DSM/DTM) can potentially be downloaded to a mobile device in a few seconds. Second, and equally important, is the ever-increasing processing capacity, combined with the continuous cost reduction, of affordable mobile smart devices [3]. Last but not least, the penetration of cloud services into organizations [4] offers a convenient setting, releasing them from the obligation of developing and maintaining demanding hardware and software infrastructure for the provision of their services. One can thus imagine how easy it has become today for an authority or a content provider to share high-resolution digital content without the need for on-site telecommunication infrastructure and/or special equipment for visitors: just a mobile phone with normal capabilities.
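For a rough, idealized illustration of the "few seconds" claim (nominal throughput, ignoring protocol overhead and radio conditions): a 500 MB DSM/DTM transferred at 1 Gbps takes approximately (500 × 8) Mb / 1000 Mb/s = 4 s; real-world 5G throughput is typically lower, so a somewhat longer transfer time is a safer expectation.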
All this technological progress unlocks potential synergies between MR and other disciplines involved in digital content development, such as geoinformatics, which, among others, provides a set of tools and technologies for digitizing real-world objects including surfaces, borderlines, and areas, as well as spatial entities of the natural and anthropogenic environment such as trees, buildings, etc. Although these two disciplines have no previously established and clearly defined synergies as such, we can mention some critical paradigms that prove how strongly they are connected. Firstly, the DSM/DTM, as well as the digital aspect of the spatial entities of the physical environment in an MR experience, can be developed via topographic surveying and mapping methods such as LiDAR and Photogrammetry. In addition, the spatial reference of a Virtual/Augmented Reality (VR/AR) world may be achieved via georeferencing procedures [5], while many other functionalities related to Geographical Information Systems (GIS) are applicable to the digital content, such as the association of spatial features with descriptive attributes. Furthermore, by employing Javascript geospatial libraries it is possible to develop virtual Geospatial Worlds. A Geospatial World practically includes the development of a 3D terrain upon which sophisticated visualizations and animation and motion effects on spatial objects take place [6]. As noted above, heavy digital content can now be easily transmitted by a provider and received on a smart device; therefore, a Geospatial World containing a DTM/DSM and other georeferenced 3D models can exist on an end-user device and can be overlaid on the real world captured by the camera.
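As a minimal illustration of such a Geospatial World, the following sketch builds a terrain surface in a Three.js scene (the library this paper employs later). The elevation function and all scene parameters are placeholders for illustration; in practice the vertex heights would come from a DSM/DTM raster.

```javascript
// Minimal Geospatial World sketch (assumes a browser with WebGL and three.js as a module).
import * as THREE from 'three';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 5000);
camera.position.set(0, 150, 300);
camera.lookAt(0, 0, 0);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

// Terrain: a gridded plane whose vertices are displaced by elevation values.
const size = 1000, segments = 128;
const terrain = new THREE.PlaneGeometry(size, size, segments, segments);
terrain.rotateX(-Math.PI / 2); // lay the grid flat so that Y becomes elevation
const pos = terrain.attributes.position;
for (let i = 0; i < pos.count; i++) {
  const x = pos.getX(i), z = pos.getZ(i);
  pos.setY(i, 20 * Math.sin(x / 90) * Math.cos(z / 90)); // placeholder for DTM heights
}
terrain.computeVertexNormals();
scene.add(new THREE.Mesh(terrain, new THREE.MeshStandardMaterial({ color: 0x6b8e23 })));
scene.add(new THREE.HemisphereLight(0xffffff, 0x444444, 1.0));

// Georeferenced 3D models (buildings, monuments, etc.) would be added on this surface.
renderer.setAnimationLoop(() => renderer.render(scene, camera));
```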
In the present study, we attempt to reconsider the concept of MR focusing on the interaction between virtual and real objects, stressing that this is the key issue that differentiates MR from VR and AR. We also highlight the contribution of geospatial technologies in achieving such interactions, since they not only provide contemporary tools for the deployment of the DSM/DTM of an area but, most importantly, implement the spatial reference of the involved 3D models, thereby enabling their exact positioning. Our approach adopts the Mixed Object and the terminology of the mixed interaction model introduced by Coutrix and Nigay [7], which focuses on the linkage between the physical (real) and digital (virtual) worlds. Practically, a Mixed Object is a real object processed appropriately via geospatial technologies so that its 3D digital form is created. That way, it becomes possible for it to interact with other real or virtual objects, to be partially or totally occluded, and to interact with end-users immersed in an MR experience. The generated process model seems to fit Microsoft's approach to MR ideally [8]. Having combined the above and further enhanced them with what we call Geospatial Linking Modalities, we finally reconsider the MR concept, and we demonstrate an MR experience. The innovation of this study relates to the inclusion of geoinformatics and, most importantly, of Geospatial Linking Modalities, thereby enabling the employment of Mixed Objects and thus functionalities such as occlusion between real and virtual objects.
2. What Is MR?
MR is typically linked to applications in which virtual objects are projected on a real-world background [9], thus equating MR with Augmented Reality. Coutrix and Nigay [7] point out that: “Historically, mixed reality systems have been dominated by superimposing visual information on the physical environment”. According to the classical definition of Milgram and Kishino [9], MR is the in-between area bridging reality and virtual representations, encompassing augmented reality and augmented virtuality alike. Nevertheless, more recent formulations tend towards a conceptualization of MR in which real and virtual objects and space alike are in fact interrelated and interact in several ways [10].
As Schnabel explains [11], the given perception of MR needs to be more inclusive and expansive as new technological developments allow for a more holistic conceptualization of MR. It is quite characteristic of the plural manifestations that MR acquires that Schnabel refers to Mixed Realities rather than using the term in its singular form. He reiterates the hitherto standard definition of MR only to open a discussion of the new formulations that lie ahead. He explains that “according to Milgram and Colquhoun (1999) the realms, Augmented Reality (AR) and Augmented Virtuality (AV) are the two major subsets lying within the MR range of the Reality-Virtuality (RV) Continuum”. Nevertheless, as Schnabel contends, “with today’s possibilities to influence the RV Continuum, a simple classification such as that presented by Milgram and Colquhoun (1999) is no longer sufficient”. More specifically, he claims that in view of technological advancements and increasingly diversified uses of MR (Figure 1) it is “necessary to incorporate finer subdivisions of the various MR applications and to enlarge the scale whilst differentiating between them”. He offers as an example a classification of MR technologies [12,13]:
Mediated reality is at the center of the spectrum and thus at the heart of this categorization of MR concepts. It refers to novel applications in which the user sees visual information selectively added or removed in dynamic ways, altering, for example, the background, or presenting a nonexistent building in a “real scene”, thus incorporating data relevant to both physical and virtual worlds. Such applications are of particular value for architects or researchers reconstructing historical sites or monuments. Other equally innovative concepts, e.g., Virtualized Reality, describe MR applications that are in fact based on recordings of actual environments which, by means of VR-related technology, allow the users to see the scenes rendered from various angles, a concept of multiple viewpoints that can foster other MR applications as well. Nevertheless, within the scope of this paper, emphasis is given to conceptualizations that lie closer to the core of MR as a fusion of the Real and the Virtual. In this vein, of particular interest is also the term Mixed Environment, which illustrates and underlines the proliferation of technological approaches that combine real and virtual objects and spaces in meaningful ways as they interrelate on many levels and through different modalities: “The intersection of real and virtual environments is defined as a Mixed Environment (ME), within which physical and digital elements coexist, and interact and intermingle in a more expansive form” [11].
This adds more complexity and depth to the established model according to which MR is merely about forming an environment in which real-world and virtual-world objects are presented together on a single display [14]. More specifically, new conceptualizations of the term have emerged in which other modalities of digital/physical interfaces are addressed, beyond the strictly visual aspects, involving other parameters and input/output elements. Coutrix and Nigay [7] present a more inclusive approach to MR as an “interaction paradigm”. In their words, “Mixed Reality is an interaction paradigm that seeks to smoothly link the physical and data processing (digital) environments. Although Mixed Reality systems are becoming more prevalent, we still do not have a clear understanding of this interaction paradigm”.
What Coutrix and Nigay suggest is a paradigm shift in the perception of MR that moves beyond the visual and the representational, into a truly mixed reality where “real” and “virtual” objects acquire mixed properties; that is to say, they become sensitive to, e.g., physical input while translating such input through a symbolic system into digitized, virtual output, thereby blurring the boundary between the virtual and the real (objects). They explain that “the design and realization of the fusion of the physical and data processing environments … may also rely on the use of other interaction modalities than the visual ones” [7]. At this point, they effectively introduce the concept of the mixed object, one that “behaves” neither as a purely physical entity nor as a purely virtual object since, by design, it has incorporated elements of both the physical and the virtual world. They contend that “the design of mixed reality systems gives rise to new challenges due to the novel roles that physical objects can play in an interactive system; … interacting within such mixed environments composed of physical, mixed and digital objects, involves novel interaction modalities and forms of multimodalities that require new interaction models” [7]. One such set of novel interaction modalities that this paper puts forward is the geospatial one, taking advantage of the advent of pertinent technologies.
So even though, as late as 1999, authors such as Billinghurst and Kato describe MR as the “overlaying of virtual objects on the real world” [1], they do so within the context of describing a collaborative project (a webspace) in the article’s abstract, to frame a specific application rather than provide a theoretical elaboration on the matter of MR definition. This case is indicative of the confusion that often confounds the very term MR, given the prevalence of descriptions of what in fact is augmented reality (AR) that encroach into MR by falsely conflating the two terms, rendering them interchangeable.
This definitional mishap adds to the prevailing delimitation of MR as a matter of appearances, framing the term as one that describes the superimposition of virtual artifacts onto “real” backdrops, or at best the showcasing of effects such as occlusion and lighting/shading, in which virtual objects may be covered by real objects and vice versa, and/or can “sense” actual light sources and conditions via specifically designed sensors that feed such data in real-time into imaging devices. An example is an HDR light probe that senses the ambient light and, through this information, allows the altering of the shading of the virtual narrator embedded in the MR experience, as is the case with the relevant example of the Asinou church application in Cyprus [15].
Coutrix and Nigay [7] draw heavily on the work of Ishii and Ullmer [16], which offers a hinging point for the formulation of their schematic description of the new interaction model for the mixed reality systems that they present. Ishii and Ullmer [16] created an interactive application which can be described as a concatenation of physical input on actual objects that double as devices, which in turn functions (or, for that matter, is translated) as a command eliciting a respective process (through digital means) that results in visual output rendered via an imaging device such as a screen. This blending of physical/digital properties regarding actual objects that resemble, e.g., prisms/magnifying glasses (as in the case of Ishii and Ullmer’s research) sets a precedent and inspires the analysis of “mixed objects” within MR systems of interaction that may well extend beyond the remit of the visual by including a plethora of data input/output regarding, e.g., climatic factors such as temperature/airspeed, GIS-related data such as geolocation, or aural/other sensorial elements, parameters, and factors. Therefore, an MR paradigm is heralded that relates to multifaceted and complex functionalities and “sensory” interaction modalities including, most notably, geospatial ones. In this sense, even the boundaries between input devices, sensors, virtual/actual objects and, quite importantly, physical and virtual space are blurred, leading to MR proper that could be termed fused reality. The term is used in this paper to illustrate the point of new-generation MR applications in which the data from diverse input/sensor devices fuse actual and virtual realities. The term has been coined and already used by Ed Bachelder [17], the inventor of a cutting-edge immersive MR system bearing the same name, used by NASA. The phrase in the context of this paper simply aims at conveying the unlimited potentialities of data-fusing MR. This serendipitous coincidence of terminology is, in turn, indicative of MR’s global momentum towards a “quantum leap”. The very categorization between actual and virtual objects, given that mixed objects embody properties of both, renders the distinction between the virtual/digital and the physical all but irrelevant. As noted, the same applies to the spatial environment where such (mixed) objects are included, which, under a set of new and multiple modalities afforded by the inclusion of diverse input/output provisions, e.g., through geospatial processes, as well as new interaction models, diffuses the boundaries between physical and virtual space.
In this paper, therefore, we present the parameters involved as well as schematic models to provide a detailed framework of a holistic MR that includes geoinformatics techniques (e.g., photogrammetry, GIS), thereby incorporating Geospatial Linking Modalities. Environmental data input into computers, in accordance with Microsoft’s approach to MR, enriches and enhances the experience of virtual and physical spaces and (mixed) objects alike, forming a fused reality in the users’ perception. Geospatial technologies are tentatively incorporated in cutting-edge MR immersive systems in instances such as flight simulation. This paper, nevertheless, investigates the potential for the inclusion of geospatial data in an expanded paradigm relating to MR that will benefit culture. MR is increasingly used in cultural tourism and virtual museums [18,19,20,21,22,23,24,25,26], thus setting the scene for a new step including the use of geoinformatics under an innovative, multimodal paradigm underpinned by the inclusion of varied data input and enriched sensuous experience. The use of accurate and realistically rendered visual data acquired by means of photogrammetry, along with other geospatial data, enables the presented application to generate immersive and meaningful experiences that bring cultural heritage sites to life.
3. MR: A Reconsideration Based on Mixed Objects and Geospatial Processes
MR was initially introduced by Milgram and Kishino [9] and Milgram et al. [14] in an attempt to classify display equipment visualizing mixed environments, i.e., environments containing both real and virtual objects. The aim of their work was not to analyze the term MR, nor to provide exact specifications or scientific documentation of it. On the contrary, they widened the term considerably, considering it the space between reality and virtuality and naming it the “RV Continuum”, well-known to this day and shown in Figure 2.
Since then, MR and the RV Continuum have gained broad recognition in the research community; however, there are still misunderstandings and even confusion with respect to the exact differentiations between MR and AR or VR [27].
Microsoft [8] extended the application of MR beyond displays to include environmental input to computers, forming the so-called perception. Perception, according to Microsoft, encompasses critical environmental features of conventional reality that may be computerized via sensors and input devices. Such features are the position of a person and the borderlines of real objects and surfaces (object recognition), captured in real-time. These are then blended with virtual objects and the real world to enhance human–computer interaction (HCI) in an MR experience, as a result of the interactions between computers, humans, and environments, as shown in Figure 3 [8].
Based on the above, the authors pay special attention to geospatial technologies able to implement the perception of critical environmental features and subsequently enable human–computer interaction, and structure their efforts and contribution around the MR ecosystem accordingly. Specifically, we realize the importance of digitally recognizing the real objects of a scene, their position in three dimensions, as well as their coexistence and interaction with other real or virtual objects. It is likewise important to digitally recognize the terrain by constructing the DSM/DTM upon which interactions will take place. The principal component to achieve MR, according to our proposal presented herein, is the Mixed Object.
The Mixed Object was initially introduced in 2006 by Coutrix and Nigay in their proposed model of mixed interaction [7], which focuses on the linkage between the physical (real) and digital (virtual) worlds. As a result, a Mixed Object is composed of a set of physical properties linked with a set of digital properties. For the digital properties to be generated, input and output linking modalities are required. The term linking modality corresponds to an interaction technique, which was defined in Bernsen’s modality theory [28] and was applied by the aforementioned in multifeature systems such as multimodal and multimedia user interfaces [29]. A linking modality m is a pair (d, l), expressed as the coupling of (a) a physical device d that acquires or delivers information with (b) an interaction language l that defines a set of well-formed expressions that convey meaning.
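The pair structure can be made concrete with a small sketch (all names are hypothetical; the stubs merely stand in for a real device and a real interpretation step):

```javascript
// A linking modality m = (d, l): a device d coupled with an interaction language l.
function makeModality(device, language) {
  return { device, language, apply: (input) => language(device(input)) };
}

// Object input modality: a camera "acquires" physical properties (d_in),
// and a photogrammetric pipeline "interprets" them as digital properties (l_in).
const captureFrame = () => ({ pixels: [/* raw image data */] });          // stub d_in
const photogrammetry = (raw) => ({ mesh: 'DSM', crs: 'EPSG:2100', raw }); // stub l_in
const objectInput = makeModality(captureFrame, photogrammetry);

console.log(objectInput.apply()); // digital properties derived from physical ones
```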
Figure 4 illustrates the creation of a Mixed Object as the result of the linkage between a set of physical properties and a set of digital properties. An object input device (d_in) acquires a subset of physical properties and an object input language (l_in) interprets them in terms of digital properties. Based on the latter, the object output language (l_out) generates physical data, which are translated into perceivable physical properties thanks to the object output device (d_out) [7].
A faithful application of Coutrix and Nigay’s considerations, employing contemporary geospatial technologies, is presented in Figure 5. We identify two input linking modalities that lead to the development of a digital object, herein defined as the Mapped Object. A Mapped Object is the digital form of a Real Object. Ideally, it is a 3D model produced with high-resolution capturing equipment (object input device), such as Unmanned Aerial Vehicles (UAV) properly equipped for a high-precision mapping process, in conjunction with advanced surveying techniques such as photogrammetry or 3D scanning (object input language). In addition, a Mapped Object is spatially referenced (object input language) according to a coordinate reference system (georeferenced) and is therefore identical to the existing real object in terms of location and orientation, but also shape and texture.
After being created, a Mapped Object may be further processed and rendered to the end-user visual display (object output device) through geovisualization frameworks (object output language) such as those built on WebGL. As a result, the Real Object is still perceived by the end-user as a Real Object; however, it has technically been transformed into a Mixed Object. In fact, the Real Object is “merged” with an identical invisible digital object created by applying typical image processing techniques (blackening and additive blending). The physical properties of the Real Object are now connected with digital properties, therefore forming the Mixed Object.
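A sketch of the mapping-and-georeferencing step follows; the file name, coordinates, and orientation are illustrative assumptions, not the actual survey values used in the demonstration:

```javascript
// Loading a photogrammetric mesh as a Mapped Object in a three.js scene.
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

const scene = new THREE.Scene();
// Local scene origin expressed in a projected (metric) coordinate reference system.
const ORIGIN = { e: 410500, n: 4496700, h: 30 }; // hypothetical easting/northing/height

new GLTFLoader().load('whiteTower_photogrammetry.glb', (gltf) => {
  const mapped = gltf.scene;
  // Georeferencing: place the model at its surveyed coordinates relative to the origin
  // (one common axis convention: x = east, y = up, z = negative north).
  mapped.position.set(410540.2 - ORIGIN.e, 31.5 - ORIGIN.h, -(4496712.8 - ORIGIN.n));
  mapped.rotation.y = THREE.MathUtils.degToRad(184.0); // surveyed azimuth (illustrative)
  scene.add(mapped);
});
```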
We are now ready to form our proposal by putting together Microsoft’s approach to MR on the one side (Figure 3) and Coutrix and Nigay’s approach to the Mixed Object, as applied with our proposed Geospatial Linking Modalities, on the other side (Figure 5). The result is shown in Figure 6, and the first impression is that there is a clear and explicit correspondence between the two approaches. One can tell that the Mixed Object approach is verified by Microsoft’s MR representation.
We have enhanced the whole schema by introducing Geospatial Input and Output Linking Modalities as two major groups of technologies capable of digitizing environmental input on the one side and enabling human–computer interaction on the other side. Object input devices are employed to capture environmental input. As such, we consider input that is related to real objects and is captured via high-resolution cameras; and also input related to the position of the end-user experiencing MR, which is captured via geospatial positioning technologies. The object input language may include 3D mapping methods, interpreting environmental input data, to provide DSM/DTM or 3D models of real objects; also, methods providing RGB-D data for object recognition. Object output languages such as geovisualization frameworks are employed to process acquired 3D mapped areas and 3D digital objects so that they may be visualized and rendered on object output devices such as simple visual displays.
Based on the terms used in the generated diagram, we specify all involved types of Objects and the resulting Mixed Object, as well as the contribution of the Geospatial Linking Modalities to the above (a structural sketch follows the list):
A Real Object containing physical properties is sourced from conventional reality, which is the common ground between humans and computers.
A Mapped Object is a 3D model of an acquired Real Object possessing shape, geo-location, texture, orientation and attributes, and is rendered in conjunction with the end-user positioning data and behavior (geolocation, direction of view, rotation and velocity). It is the result of the application of fundamental 3D mapping processes, such as photogrammetry, over Real Objects and represents the perception of the environment by computers.
A Rendered-Transparent Object is a Mapped Object participating in a 3D Scene by overlaying the Real one while being invisible. As a result, a Rendered-Transparent Object acts as a 3D transparent mask of a Mapped Object and is expected to be blended with its Real Object during human–computer interaction. This is achieved by employing geovisualization frameworks and applying traditional blending techniques.
A Mixed Object is the result of “merging” a Real Object and a Rendered-Transparent Object and the essential component that represents the common space between humans, the environment and computers. It enables interactivity between Real and Virtual Objects and the end-users in the field, thus implementing MR experience.
Geospatial Input Linking Modalities contain any input device and any input geospatial technology employed to transform the physical environment (conventional reality) so that it becomes perceptible by computers.
Geospatial Output Linking Modalities contain any output geovisualization and image processing technique employed to enable human–computer interaction and any output device to provide the MR experience to the end-user.
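The following sketch (hypothetical field names and approximate values, for illustration only) summarizes these object types as data structures:

```javascript
// The object types of the proposed model, as plain data (illustrative only).
const realObject = { kind: 'real' }; // exists only in conventional reality

const mappedObject = {               // 3D model acquired from the Real Object
  kind: 'mapped',
  mesh: 'whiteTower.glb',            // e.g., a photogrammetric mesh
  geolocation: { lat: 40.6264, lon: 22.9484, h: 32 }, // approximate
  orientationDeg: 184.0,
  attributes: { name: 'White Tower' },
};

const renderedTransparent = {        // the invisible 3D mask of the Mapped Object
  ...mappedObject,
  kind: 'rendered-transparent',
  material: { color: 0x000000, blending: 'additive' }, // black + additive = invisible
};

// The Mixed Object: the Real Object "merged" with its transparent digital twin.
const mixedObject = { kind: 'mixed', real: realObject, mask: renderedTransparent };
console.log(mixedObject.mask.geolocation);
```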
4. MR Demonstration: Mixed Objects and Geospatial Modalities
We demonstrate every single component of our approach and implement an MR experience by employing Geospatial Modalities. The scenario takes place in Thessaloniki, Northern Greece, around the White Tower, a famous monument. A visitor scans a visual tag, placed at a specific position in front of the tower, with their smartphone camera and experiences MR: a virtual helicopter and virtual men move around the monument, demonstrating occlusion with the real objects of the environment.
Figure 7 demonstrates the occlusion of the virtual helicopter by the tower, for which a Mixed Object exists. A YouTube video of the whole demonstration is available at:
https://youtu.be/DyBLPymyEXI (accessed on 28 January 2021).
The description in the next paragraphs follows the discrete components of
Figure 6, beginning from conventional reality and moving clockwise to reach MR. The components of the diagram are grouped into two major modality categories: the Geospatial Input and the Geospatial Output Linking Modalities.
4.1. Geospatial Input Linking Modalities
Geospatial Input Linking Modalities contain any input device and any input geospatial technology employed to transform the physical environment (conventional reality) so that it is perceived by computers. At every step of this procedure, the acquired physical data are presented. To minimize cost, minimal resources are used:
A simple smartphone is used to capture the physical environment on a real-time basis.
A printed visual tag with a QR code is used to position the end-user on the field.
Free, photogrammetrically mapped areas captured by UAVs are used to demonstrate Mapped Objects.
4.1.1. Object Input Devices
Smartphone Camera. The simplest input device for real-time capturing of the real-world scene while an observer is on-site experiencing MR is a smartphone of normal capabilities: a Xiaomi Redmi Note 7 with an octa-core CPU (4 × 2.2 GHz Kryo 260 Gold and 4 × 1.8 GHz Kryo 260 Silver), an Adreno 512 GPU, and 3 GB RAM. The acquired data are a real-time 2D capture of the real world by the camera sensor.
Visual Tag with QR Code. Typical smartphone devices have an Inertial Navigation System (INS) with gyroscope, accelerometer, and magnetometer [30] to detect changes in rotation, velocity, and azimuth (direction of view), respectively. A major issue is the smartphone initialization in terms of position and azimuth. Global Positioning Systems (GPS) through GNSS antenna receivers may yield poor accuracy, which worsens in bad weather conditions, and their use is limited to outdoor environments. Further, accuracy in azimuth may vary on different devices. To handle the peculiarities of any potential mixed environment (indoor, outdoor, or lack of sensor calibration), we have selected a marker-based positioning and azimuth system using QR Codes, as shown in Figure 8. These QR Codes contain stringified JSON data that describe the perceived digital world’s ID, the observation point ID, the position in x, y, z coordinates, and the azimuth.
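For illustration, a tag payload of this kind might be decoded as follows (field names and values are assumptions; the paper specifies only the content categories):

```javascript
// Decoding an observation-point tag; qrText would come from any QR scanning library.
const qrText =
  '{"worldId":"wt-demo","pointId":"OP-01","x":410540.2,"y":4496712.8,"z":31.5,"azimuth":184.0}';
const p = JSON.parse(qrText);

// Initialize the viewer pose from the tag; the INS then tracks subsequent
// changes in rotation, velocity, and direction of view.
console.log(`world ${p.worldId}, point ${p.pointId} @ (${p.x}, ${p.y}, ${p.z}), azimuth ${p.azimuth}°`);
```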
UAVs properly equipped for capturing the area. UAVs have become the technology of choice for capturing an area and acquiring the high-resolution images used to create its 3D mapping via methods such as photogrammetry. For the purposes of the demonstration, we used free 3D photogrammetrically mapped models of the area, as shown below.
4.1.2. Object Input Languages
4.2. Geospatial Output Linking Modalities
Geospatial Output Linking Modalities contain any output geospatial and image processing technique employed to enable human–computer interaction and any output device to provide the MR experience to the end-user. The modalities used for demonstration purposes include:
The open Javascript geovisualization framework of Three.js.
Traditional image processing techniques.
The smartphone used for capturing the real world is used to display the MR environment.
4.2.1. Object Output Languages
Geovisualization Frameworks (Three.js). Geovisualization frameworks reside on top of 3D graphics APIs and undertake the task of creating 3D digital worlds (scenes) with spatially referenced objects [6]. The Three.js Javascript library, built on top of WebGL, is employed for rendering the 3D Scene that hosts the digital objects, mixed or virtual.
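One way to wire this up, sketched below, is to render the scene to a transparent canvas layered over the live camera video (an assumption about the compositing detail; the paper specifies only that the rendered scene is blended with the camera feed):

```javascript
// Transparent three.js canvas over the smartphone camera feed (browser sketch).
import * as THREE from 'three';

const video = document.createElement('video');
video.autoplay = true;
video.playsInline = true;
Object.assign(video.style, { position: 'fixed', inset: '0', width: '100%', height: '100%', objectFit: 'cover' });
document.body.appendChild(video);
navigator.mediaDevices
  .getUserMedia({ video: { facingMode: 'environment' } }) // rear camera
  .then((stream) => { video.srcObject = stream; });

const renderer = new THREE.WebGLRenderer({ alpha: true }); // transparent clear color
renderer.setSize(innerWidth, innerHeight);
Object.assign(renderer.domElement.style, { position: 'fixed', inset: '0' });
document.body.appendChild(renderer.domElement); // canvas paints above the video
```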
Image processing (additive blending, blackening). As already analyzed, a Rendered-Transparent Object acts as a 3D transparent mask of a Mapped Object. This is achieved by blackening the Mapped Object and applying the linear dodge blend mode [31], also known as Additive Blending. These simple techniques implement the Mixed Object and achieve occlusion, as shown in Figure 10, where the real building partially covers the virtual helicopter.
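In three.js terms, the technique can be sketched as follows (the model file and scene wiring are illustrative): the mapped mesh is given a pure-black material with additive blending, so it adds nothing visible over the camera feed yet still writes to the depth buffer and therefore occludes virtual objects behind it.

```javascript
// Rendered-Transparent Object: blackening plus additive blending.
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

const scene = new THREE.Scene();

const occluderMaterial = new THREE.MeshBasicMaterial({
  color: 0x000000,                  // "blackening": the mesh renders pure black
  blending: THREE.AdditiveBlending, // linear dodge: adding black changes nothing
});

new GLTFLoader().load('tower.glb', (gltf) => { // hypothetical mapped model
  gltf.scene.traverse((node) => {
    if (node.isMesh) node.material = occluderMaterial;
  });
  scene.add(gltf.scene); // aligned with the real tower, it masks virtual objects behind it
});
```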
4.2.2. Object Output Devices
Smartphone display. The smartphone employed to capture the real-world environment is now employed to provide the MR experience via its visual display.
4.3. Experiencing MR
All data transformed from the physical environment in order to be perceived by computers, and then appropriately processed to be provided to humans, are concentrated in a 3D Scene rendered on the observer’s visual display. The result is then blended with the real environment captured by the smartphone camera in the field to form an MR experience.
Figure 11 provides the stages of MR implementation and its final demonstration.
4.4. Contribution and Innovative Features
The innovation of the study presented in this article relates to the inclusion of geoinformatics and, most importantly, Geospatial Linking Modalities, which in turn enable the rendering of features of the environment, such as historical buildings, as Mixed Objects. This enhances the MR experience significantly by allowing new interaction affordances between real and virtual objects (such as mutual occlusion of moving objects in real-time). The use of geoinformatics allows the visualization of, e.g., early construction phases of a heritage building [6], as the 3D modelling of the existing edifice is merged with that of its appearance in the past. Such projects already exist, such as Okura et al. [32] in CH, but the Mixed Object principle is not developed in these existing instances, delimiting the scope of the MR application, which does not allow, e.g., Cultural Heritage (CH) buildings (as mixed objects) to interact with virtual objects. The integration of mixed object capabilities is an innovative feature that has not hitherto appeared in CH-related research projects such as those carefully categorized in the recent and extensive survey of Bekele et al. [33]. Likewise, in [34], the articles presented within Part 4 addressing MR efforts in CH that closely relate to geospatial data (under the title “Geospatial”) often incorporate innovative approaches, such as the concept of 5D modelling [35] that enables visualization of consecutive construction phases of a historical church. Nevertheless, they do not incorporate the geospatial modalities present in this study that allow for the full exploitation of the mixed object concept. In instances such as Reitmayr and Schmalstieg [36], although static virtual objects (pathway 3D/2D markers) interact with the built environment in CH sites, the modelled buildings are not accurately mapped, nor can previous states (e.g., the initial monument condition) be overlaid. Moreover, the application cannot accommodate dynamic/moving virtual objects, such as virtual crowds interacting with buildings rendered as mixed objects (i.e., by means of occlusion). In the same vein, none of the instances of pertinent research referred to within this article integrates a comprehensive Geospatial Linking Modalities framework that allows real-time interactions between environmental features rendered as mixed objects, virtual and actual objects, and actors.
This article draws heavily on previous research such as that conducted by Coutrix and Nigay [7] and other work in the field that provides a framework for the key term Mixed Object and the pertinent linking modalities. However, the instances of practice that provide the basis for this research did not integrate geoinformatics into an MR model in a way that combines both multilayered and accurate visualization and real-time interaction between actual, mixed, and virtual objects (perhaps due to technological constraints). This integrated and comprehensive use of Geospatial Linking Modalities and mixed objects proper in the MR ecosystem described in Figure 6 is the innovation this article presents. Last but not least, since the 2010s, augmented reality with the use of geospatial technologies, e.g., for 3D building feature tracking [37,38], has proliferated in sectors such as construction and environmental tracking [39]. However, the Geospatial Linking Modalities in these applications are not oriented towards providing an MR experience of past states, or the full, real-time interaction of mapped/mixed objects fusing real and virtual worlds. Such interaction is needed to bring the past to life through activated spectatorship and visitor engagement, via a user-friendly interface addressed not to experts/professionals but to people who wish to savor the experience of merging a site’s past spatial and temporal frameworks under a new MR model.
Occlusions, or interpositions, between real and virtual objects constitute one of the challenges of MR [40]. Many solutions have been proposed for real-time occlusion, including stereo vision-based techniques [41], the 3D reconstruction of real objects using depth estimation techniques [42], and visibility-based blending methods and semantic segmentation [43]. Fukiage et al. [44] have proposed a blending algorithm based on the behavior of human transparency perception, which can achieve occlusion effects by applying a foreground mask. However, the foreground mask has to be obtained using other techniques, such as a depth map, to create a probability map used by the blending mode. In [44], the authors focused on realizing the occlusion between real and virtual objects when perceived through an HMD by using depth maps and mask patterns. The common objective of the aforementioned is spatial mapping, and their success depends on accuracy and performance. In [45], a study that demonstrates the ability to perform in situ simulations of dynamic spatial phenomena using geospatial mobile augmented reality, the authors mention that “a clear, cost-effective resolution for MR occlusion in such contexts remains elusive”.
In our development, we took for granted that, in the short term, the geometry of most landscapes and cityscapes will not be significantly altered and can therefore be obtained in advance. Our design provides a solution for interposition and perceptual cogency in mixed realities by merging the real environment, recreated and reified by geospatial techniques (photogrammetry, DEMs), with the virtual one. This is not an approach commonly used by the experts involved, and it can provide a solution to (a) the interposition problem (by applying an appropriate blending mode) and (b) improving performance (computationally and power-intensive tasks, such as 3D mapping, do not take place in real-time). Thus, its minimum equipment specifications include a smartphone of modern capabilities. However, with the appropriate hardware, a transition from static to dynamic 3D mapping could take place in real-time, by using relevant 3D geospatial techniques such as LiDAR.
5. Discussion
This paper presents both an argument for a more expansive, inclusive, and holistic reconceptualization of the term MR, and a tangible example as an instance of its practical realization. Thus, this paper contributes in three main ways. First, an explanation is provided of the concept of the Mixed Object in theoretical as well as practical terms. Furthermore, Geospatial Linking Modalities are introduced as a means to enrich the potential of a more comprehensive and meaningful MR experience; geospatial data input is crucial, and provisions (such as QR code tags) are incorporated. Moreover, a new approach with respect to the very term MR is delineated, with an emphasis on the potential of mixed objects and enhanced input/output provisions and linking modalities.
MR has been foregrounded as effectively a tripartite affair, in accordance with Microsoft’s approach, as it involves human–computer–environment interaction, with physical data acquired and translated to generate digital properties via Geospatial Linking Modalities. In this model, physical objects in the environment become perceived not only by humans but also, as mapped objects, by computers, in a close approximation of their spatial/physical properties, allowing for interrelation with virtual, computer-generated objects, mainly in forms of interaction such as occlusion. In a sense, the model presented here is based on the ability of all three “input” and “output” factors engaged (namely, users, digital devices, and the actual world) to effectively exchange data and, all the more, to become aligned in the sense that computers can perceive the physical environment (e.g., buildings) in a way akin to that of humans, thereby “sharing views” and fusing them under an MR paradigm proper.
A pivotal contribution of this paper is the schematic description of the interrelations of the factors involved in an MR ecosystem based on the concept of the Mixed Object and conditioned by the use of geoinformation technologies, as illustrated in Figure 6. It delineates the input and output modalities involved, which in turn rely on certain devices, technologies, and languages with which data become translated or perceived within processes that regulate human–computer interfacing in general. Reading this scheme clockwise, from the top, object input devices such as (handheld) device cameras or GPS devices capture physical data. Moreover, contemporary 3D mapping equipment, such as UAVs with high-resolution cameras and photogrammetric software, deploys high-quality DSM/DTM information. The physical data are then translated into a form that the computer can “understand” and further process. Making raw data perceptible by a computer (thus enabling computers to make meaning from physical data) takes place through a variety of conversion processes that the scheme refers to under the description “object input language”. It is possible to use 3D mapping methods, for example, to turn physical objects, such as buildings, into digital entities that the MR system can process, understand, and interrelate with purely virtual objects. At the heart of the space where the computer and environment overlap in the scheme is the notion of perception (of the environment by the machine) as well as the Mapped Object, which is a crucial factor for achieving this perceptibility (by the digital device). This perception then enables and triggers the output linking modalities in the opposite direction, starting with the object output language, which translates the way the computer perceives the actual world into (mostly) imagery through, e.g., 3D graphics procedures, blend modes, and the use of game engines.
The computer-generated imagery (as well as other sensuous and spatial data) is described as generated physical data that will take the form of perceptible (and meaningful) visual/sensuous data for end-users. The generated physical data, in fact, are blended as physical, virtual, and mixed objects (“mixed” being mapped real objects that can now “interact” with virtual objects) appear on displays, interrelated in an MR experience. Visual display units are in fact the “object output devices” on which human–computer interaction is premised. Therefore, in this MR ecosystem, the physical data captured by the devices in the first step are translated into information that PCs understand and then render into a (mostly) visual output, generating an MR experience based on the real-time perception of the environment, interconnecting the actual and the virtual in fascinating ways.
The concept of the Mixed Object becomes central in this approach given that physical objects acquire a virtual “doppelganger” or a double, in the form of a “blackened”, invisible twin object that is linked to them. This allows purely digital objects, even moving ones, to interact with physical objects, that now, given their coinciding with digital twins, become Mixed Objects, in effect having both actual and virtual presence as well as being almost identically perceived by humans and machines alike. In addition, another important factor involved is that of the field end-users’ position and angle of view, being perceived by (and relevant data fed to) the machine and subsequently reflected on the displayed mixed world. Geospatial modalities allow for the real-time merging of humans’ perceived reality with that of machines, as well as the rendering of mixed objects interrelations in space and time as they change (relevant) positions amongst themselves and in connection with the user(s). In a nutshell, the common space between humans, computers, and the environment is a space of common sensibilities (sensory abilities and perceptions), real-time communication/gathering of data and most importantly merging of data, sensory and geospatial information in a shared MR, where properties, perceptions (by humans and PCs) and input become linked, and to an extent, fused.
MR is, therefore, a space that interrelates or even merges the appearances of the actual and digital worlds (e.g., through the use of visual data acquired by photogrammetry that allow for the creation of mapped objects), perceptibilities (those of humans and machines) and, finally, the location and even movement of the actual and virtual objects or humans. MR applications that involve mixed objects and geoinformatics allow users to savor immensely enhanced immersive experiences. They can significantly foster human understanding of the surrounding physical or built environment by introducing new tools, functionalities, and modalities. They can allow users to revisit their relationship with the world that surrounds them and gain insights on, e.g., environmental, cultural, historical, or scientific issues. The advent of the new forms of enhanced mixed realities, which allow merging reality and virtuality, serves real and pressing needs: they herald a new paradigm of a tripartite connection between people, technology, and the environment.
6. What’s Next? Prospects and Limitations of This Study
With the conclusion of this study, the prospects, as well as the limitations, become more evident. Having established a basic framework to implement an MR experience, the next step is to focus on and resolve the issues related to misalignments between real (mixed) and virtual objects during the syncing of the two worlds. Misalignments may be due to configuration errors and lack of accuracy of the device sensors, but they may also be due to the use of low-quality imagery data during the 3D mapping process. An MR system could also take advantage of the 3D Tiles specification for tilesets, an Open Geospatial Consortium standard [46] for streaming and rendering massive 3D geospatial content, served asynchronously, in tiles and at different levels of detail, depending on the position of the end-user in the field. Today’s MR experiences rely on near real-time 3D scanning methods when the device used for immersion supports such technologies (LiDAR, Kinect). For the mixed object approach presented in the present study, the prerequisite is the prior 3D mapping of an area and its real objects so that interaction with other virtual objects can take place. The combination of the prescanned 3D mapping data of an area with real-time data is expected to contribute to the optimization of immersive MR experiences with the appropriate smart equipment (e.g., Microsoft HoloLens).
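As an indication of how such streaming could be wired into the rendering loop, the following sketch uses the open-source 3d-tiles-renderer package for three.js (one possible tool among several; the tileset URL is a placeholder):

```javascript
// Streaming an OGC 3D Tiles tileset into a three.js scene.
import * as THREE from 'three';
import { TilesRenderer } from '3d-tiles-renderer';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 1, 10000);
const renderer = new THREE.WebGLRenderer();
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

const tiles = new TilesRenderer('https://example.org/tileset.json'); // placeholder URL
tiles.setCamera(camera);
tiles.setResolutionFromRenderer(camera, renderer);
scene.add(tiles.group);

renderer.setAnimationLoop(() => {
  camera.updateMatrixWorld();
  tiles.update();               // loads/unloads tiles by view frustum and level of detail
  renderer.render(scene, camera);
});
```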
The inclusion of interactivity provisions, such as gesture or voice commands, could enhance the information-related interactivity interfaces, e.g., multimodal pop-up widgets or narratives (especially as virtual narrators [47] and digital storytelling become increasingly embedded in MR [48]). The existing linking modalities as described in this paper lay the foundations for a fully interactive experience, which will be richer in terms of the information and knowledge provided, as well as more appealing with respect to the modalities employed in delivering this content to users who navigate, e.g., a cityscape or a historical site. In other words, the modalities built into the system, which could support a seamless, adaptable, and customized framework that provides information about features of the environment, have not been adequately developed at this stage. This would be invaluable in information-rich spaces, such as heritage sites. The next step for this ongoing research is to move from the innovative linking modalities presented in the MR ecosystem delineated above to an application that enables its users to enjoy true interactivity modalities that provide not only an impressive but also a meaningful, insightful, and innovative relation with the environment. This study presents an MR application and its underpinning principles, as well as the conceptual model it introduces, based on more comprehensive linking modalities through geoinformatics, contributing to a more holistic conceptualization of what MR is, and what it can achieve.