1. Introduction
For more than a millennium, the Byzantine Empire (4th–mid-15th c. AD) stood as a prime cultural center [1]. Traces of the empire’s culture survive across an extensive geographical area spanning from the Mediterranean, the Middle East, North Africa, and the Balkans to the far reaches of the Russian Empire, owing to historical ties with the Byzantine Court [2]. The most common monument category is churches, some of which still function as places of worship while others have been transformed into museums [3], the most famous being the Hagia Sophia [4]. Taking the Hagia Sophia as an example, the use of immersive technologies such as virtual worlds was proposed as early as 2002 [5], and the topic remains a high priority for scholars to this day, e.g., [6,7,8]. Numerous other such monuments (mainly churches) attract visitors interested in this particular era, who admire their architecture and their distinctive art, a precursor to the Renaissance, e.g., [9,10]. One of the most influential works in Post-Byzantine painting is the treatise of hieromonk Dionysios of Fourna, referred to as Hermeneia [11], which is in fact a painter’s manual written between 1729 and 1732 [12]. In this work, Dionysios gathers all previous instructions, advice, and information about Byzantine painting; since then, it has become a basic reference point for painting religious themes in the Post-Byzantine world, connecting Byzantine art to modern times. Today, research on the instructions of Hermeneia is increasing owing to the interest in Byzantine culture (e.g., [13]). The focus here is primarily on churches influenced by Byzantine culture (i.e., Byzantine monuments, some of which have been transformed into museums), which may belong to the Byzantine era (4th–mid-15th c.), the post-Byzantine era (late 15th–18th c.), or the modern era (19th c.–today). For simplicity, hereafter the term “Byzantine monument” refers to any church, monument, or museum influenced by Byzantine culture.
A visitor to any of these monuments can be assisted by Virtual Reality (VR), Augmented Reality (AR), or Mixed Reality (MR) applications running on a portable device (e.g., a smartphone or a tablet), which exploit the iconographic tradition transmitted mainly through the Hermeneia and the fact that Byzantine interiors are lavishly painted (e.g., [14]). The literature offers numerous works in Digital Culture demonstrating that advancements in immersive technologies have led to a large number of applications targeting Cultural Heritage. Reference [15] provides a review of AR for Cultural Heritage; among the reviewed works are the Virtual Hagia Sophia, the Ancient Malacca Project, the Virtual Campeche, the Virtual Pompeii, the Ancient Pompeii, and the ArcheoGuide. All of these projects are well known in the research community because they introduced novel approaches, including VR and AR, in Cultural Heritage. Of special interest to this work are the Ancient Pompeii and the ArcheoGuide, as they employ AR on site. Ancient Pompeii [16] aims to create AR simulations of events and people depicted in Ancient Pompeii’s fresco paintings and to present them on site. ArcheoGuide [17] is a mobile AR application that allows visitors to see a virtual reconstruction of monuments while exploring the site.
The proposed framework takes advantage of the proliferation of powerful smartphones, the new horizons envisaged by the Fifth Generation of Telecommunications (5G) [18,19], and the availability of cloud/fog computing capabilities [20,21] close to the end user, including artificial intelligence and Machine Learning (ML) [22,23]. Together, these allow us to envision Digital Culture scenarios in which a user enters a Byzantine monument and receives essential information about it by capturing its interior with his/her smart device. For example, a photograph captured with an ordinary smartphone could be sent to a cloud/fog computing service that detects each captured theme and its location inside the particular monument. Subsequently, depending on the requirements set by the user, relevant information is returned (e.g., concerning a particular image, or a recommended full, short, or theme-based tour of the monument). An interesting case arises in collaboration-based applications (e.g., crowdsourcing [24]), where images from previous visitors are stored, possibly edited by people who are aware of specific details of the particular monument, and then processed to provide improved results to the end user.
The goal of this paper is to present the latest advancements in AR/MR software libraries, technologies, and Human–Computer Interaction (HCI) approaches, in image recognition, and in cloud/fog architecture schemes, so that they can be combined to promote and document Byzantine monuments and paintings. The challenge is to provide a clear Digital Culture framework describing how visitors to these monuments could improve their experience using standard devices such as smartphones or tablets. For this purpose, a brief description of the leading painting themes, which are also influenced by the Hermeneia, is given in a way that non-experts can exploit. A case study of a particular painting theme, the Crucifixion, is introduced, since it is considered one of the most iconic in the Byzantine world. Based on this case study, the required immersive technologies are described along with the tools needed to provide the required services, e.g., painting/theme recognition. Finally, a corresponding architecture is proposed covering the computation and customization required to enhance a visitor’s experience; it could also potentially serve as a repository for academic purposes, accompanied by appropriate interfaces to other platforms (e.g., Europeana [25]).
Related past work concerning immersive technologies, cloud/fog computing for AR environments, and image recognition approaches is presented in Section 2. A description of the basic painting themes within a church and their categorization is presented in Section 3. Section 4 describes in detail the technologies that should be considered to assist a visitor in a Byzantine monument and maximize his/her experience. The case study is presented in Section 5, and the framework for immersive applications in this area is proposed in Section 6. A brief discussion stressing the contribution of this work is presented in Section 7. Challenges and future steps are described in Section 8, and the conclusions are drawn in Section 9. In Appendix A, well-known Byzantine painting themes are presented to contextualize common aspects, whereas Appendix B and Appendix C present the architectural plan of the interior of a particular monument (the church of Prophet Elijah in Paidonia, Ioannina, Greece [14]) along with the descriptions of each label, to give an idea of how lavishly painted the interior can be, even for small monuments such as this one.
2. Literature Review
A systematic review of the types of Digital Culture technology used to enhance visitor experience in museums is presented in [26]. The authors’ search yielded 22 articles, published between 2013 and 2017, that study digital technology in museums. According to their analysis, 10 of the 22 selected works use mobile devices for various types of applications, including VR and AR. The analysis of these works reveals a set of common issues to be addressed, including marker-less tracking and the quality of the visual augmentations.
The authors of [16] use real-time marker-less camera tracking, since the user’s position and orientation must be estimated while preserving the site without leaving traces (i.e., markers, position trackers, etc.). For this reason, marker-less tracking is considered the most appropriate solution. Real-time marker-less camera tracking is nonetheless challenging, as the scene may lack easily recognizable objects and the overall process is computationally demanding; the authors therefore explicitly state that “careful attention was given to algorithm design and implementation in order to ensure an efficient tracking system running at high frame rates”. The authors of [17] present ArcheoGuide, a mobile AR application that allows visitors to see a virtual reconstruction of monuments while exploring the site. Its functionality is supported by accurate position and orientation tracking algorithms and image processing techniques [27].
Recent technological advancements drive the evolution of applications towards greater mobility, e.g., lightweight devices, wearables, and Head Mounted Displays (HMDs) [28,29], offering increased performance, improved usability, new interaction techniques (hand tracking, speech recognition, etc.), and more realistic and immersive experiences. The authors of [29] present an MR application developed on Microsoft HoloLens for discovering cities. Their work provides two scenarios. In the first scenario, visitors follow a tour led by a human guide while wearing an HMD. Through the HMD, visitors observe augmentations that highlight the peculiarities of the visited monument, while their guide manipulates the three-dimensional (3D) models that all visitors observe at the same time. This functionality relies on the low latency provided by real-time networking such as 5G. In the second scenario, visitors navigate autonomously (without a guide), and the application automatically recognizes the framed monument and provides the relevant information. Recognition is performed by a visual search engine running on a 5G-ready cloud infrastructure. The application is developed with the Unity game engine and the MixedRealityToolkit Software Development Kit (SDK). The HoloLens devices are connected to a Java Web Service through a 5G module, and image recognition is performed with Compact Descriptors for Visual Search (CDVS).
The authors of [30] present the design and implementation of an AR smartphone application that serves as a guided tour for visitors, employing image recognition and a location-based approach. Point of Interest (POI) recognition is performed using fiducial markers that visitors scan with their smartphones. Once a target is recognized, visitors can obtain information about the monument, and in particular cases they can observe an augmented recreation of settlements. In the latter case, the 3D models used to project the restored settlements are aligned with the real environment through location-based functionality relying on Global Positioning System (GPS) coordinates. The application is developed using the Unity game engine and AR Foundation, with target recognition performed by AR Foundation.
An additional aspect is the delivery of AR experiences. The work in [31] introduces a delivery model for Cultural Heritage services in smart city environments. The proposed model is based on the cloud computing paradigm and requires the advanced network infrastructures that exist in smart cities. According to the article, their approach uses cloud services for the communication, diffusion, monitoring, and management of Cultural Heritage, offered as a Service. The model combines Software as a Service, Platform as a Service, and Infrastructure as a Service functionalities to provide a fully virtualized environment. In addition, it is designed to use fog computing and Software Defined Network (SDN) controllers to cache cultural services: a request for a service is answered by the fog when the service is cached there; otherwise, the fog retrieves the service from the cloud and provides it to the client. Furthermore, cloud and fog computing can be used for computational offloading of resource-demanding applications executed on mobile devices, which typically lack processing power. The literature offers works that exploit cloud resources for immersive applications, e.g., [32], and works that take advantage of networking advancements and the concept of Edge Computing, e.g., [33], while novel approaches for enhancing the Quality of Experience (QoE) of cloud/fog-powered solutions, e.g., [34], are also emerging.
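The cache-or-fetch behavior attributed to the fog layer in [31] can be illustrated with a small sketch. It assumes a least-recently-used (LRU) eviction policy and a hypothetical `fetch_from_cloud` callable standing in for the cloud service; it does not model the SDN controllers or the actual delivery model of [31].

```python
from collections import OrderedDict
from typing import Callable


class FogNode:
    """Hypothetical fog node that caches cultural services retrieved from the cloud."""

    def __init__(self, fetch_from_cloud: Callable[[str], bytes], capacity: int = 2) -> None:
        self._fetch_from_cloud = fetch_from_cloud  # assumed stand-in for the cloud service
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self._capacity = capacity  # assumed LRU capacity, not taken from [31]
        self.cloud_requests = 0  # counts cache misses forwarded to the cloud

    def request(self, service_id: str) -> bytes:
        # Cache hit: answer from the fog and mark the entry as recently used.
        if service_id in self._cache:
            self._cache.move_to_end(service_id)
            return self._cache[service_id]
        # Cache miss: retrieve the service from the cloud, cache it, then serve it.
        self.cloud_requests += 1
        payload = self._fetch_from_cloud(service_id)
        self._cache[service_id] = payload
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return payload


node = FogNode(lambda sid: f"content for {sid}".encode(), capacity=2)
node.request("crucifixion-guide")  # miss: fetched from the cloud
node.request("crucifixion-guide")  # hit: served by the fog
print(node.cloud_requests)  # 1
```

A real deployment would also need cache invalidation when a service is updated in the cloud; the LRU choice here simply keeps the example short.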
In the present work, the focus is primarily on Byzantine monuments where paintings are abundant and visitors should be provided with descriptive and explanatory information. In order to achieve this, the recognition of the paintings in a marker-less environment is considered to be of critical importance.
The author of [35] discusses a number of computer vision approaches for the analysis of paintings and drawings, as well as their suitability for specific tasks. Point-based (or pixel-based) approaches process the image pixel by pixel, based on the color value of each pixel. A common application of this approach is to adjust pixel color values in order to reveal details that are otherwise imperceptible to the human eye. Other approaches are termed area-based because the processing algorithms split the image into areas or regions on which the analysis is performed, rather than processing each pixel on its own. Area-based approaches can be quite effective in edge detection or in quantifying the similarity between different shapes.
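To make the distinction concrete, the following sketch contrasts a point-based operation (a global linear contrast stretch, in which each pixel is remapped independently) with an area-based one (summarizing fixed square regions by their mean intensity). It is illustrative only, assumes a grayscale image stored as nested lists, and is not taken from [35].

```python
from typing import List

Image = List[List[int]]  # grayscale intensities in [0, 255], one inner list per row


def contrast_stretch(img: Image) -> Image:
    """Point-based: each pixel is remapped on its own, using only the global min/max."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]  # flat image: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def block_means(img: Image, block: int) -> List[List[float]]:
    """Area-based: split the image into block x block regions and summarize each by its mean."""
    h, w = len(img), len(img[0])
    means = []
    for r0 in range(0, h, block):
        row_means = []
        for c0 in range(0, w, block):
            region = [img[r][c]
                      for r in range(r0, min(r0 + block, h))
                      for c in range(c0, min(c0 + block, w))]
            row_means.append(sum(region) / len(region))
        means.append(row_means)
    return means


faded = [[100, 110], [120, 130]]
print(contrast_stretch(faded))  # [[0, 85], [170, 255]]
print(block_means([[0, 0, 10, 10], [0, 0, 10, 10]], 2))  # [[0.0, 10.0]]
```

The stretch spreads a narrow, faded intensity range over the full scale, which is the kind of adjustment that can expose barely visible details; the block means are a minimal region descriptor on which area-based comparisons could build.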
The work in [36] presents an approach for artistic movement recognition and classification of portrait paintings. As the authors report, automatically identifying the style of a painting is challenging because stylistic behavior differs among artists and styles are not defined by fixed rules. Their approach describes the art movements of portrait paintings using color features. It is reported to improve classification accuracy, to be effective with smaller datasets than those needed by other approaches, and to be computationally inexpensive.
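As a rough illustration of classifying paintings by color features, the sketch below computes a coarse, normalized RGB histogram and assigns a query image to the nearest class centroid. This is a generic stand-in chosen for brevity, not the feature set or classifier of [36]; the class labels and pixel values are toy data.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]


def color_histogram(pixels: Sequence[RGB], bins: int = 4) -> List[float]:
    """Coarse joint RGB histogram with bins**3 cells, normalized to sum to 1."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = (r * bins // 256, g * bins // 256, b * bins // 256)
        hist[(ri * bins + gi) * bins + bi] += 1
    return [count / len(pixels) for count in hist]


def nearest_centroid(feature: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid: List[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(feature, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


warm = [(212, 175, 55)] * 10  # toy "warm palette" training pixels
cool = [(30, 60, 200)] * 10   # toy "cool palette" training pixels
centroids = {"warm": color_histogram(warm), "cool": color_histogram(cool)}
query = [(220, 180, 60)] * 5 + [(30, 60, 200)]
print(nearest_centroid(color_histogram(query), centroids))  # warm
```

Histogram features are cheap to compute, which is consistent with the low computational cost reported for color-based approaches, but they ignore spatial layout entirely.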
The authors of [37] investigate the use of deep Convolutional Neural Networks (CNNs) for artistic movement recognition, testing their approach on a dataset of paintings from movements ranging from Byzantine iconography to modern pop art. Their results show that the approach performs well in classifying paintings of the Byzantine artistic style. The authors of [38] investigate the use of Support Vector Machines (SVMs) for automatic artistic movement recognition, with promising results; their dataset contains paintings of numerous artistic movements, including Byzantine, and the results show that the system accurately recognizes paintings belonging to the Byzantine movement. Similarly, [39] investigates the use of Multi-Layer Perceptron (MLP) classified data with SVMs for automatic artistic movement recognition. As in [37,38], the approach is tested on a dataset comprising paintings of various movements, including Byzantine icons. The results confirm the findings of [37,38] and show that, for Byzantine paintings, the approach provides high detection rates and reduced errors.
The reviewed works indicate that the system proposed in this article, although challenging, is technologically feasible.
3. The Most Common Themes in Byzantine Monuments
Some paintings in Byzantine monuments are more common and more easily identifiable than others. For example, the Crucifixion is a well-known theme and is therefore the focus of the case study later in this paper. Another well-known theme is that of the Virgin holding the Christ Child. However, each Byzantine monument has its own iconographic program, influenced by numerous factors such as the architecture of the building and the funding of the frescoes. A church donor, for instance, frequently requests specific paintings and locations (e.g., his/her favorite saint painted in a prominent position). Still, there are some rules regarding what is painted, and where, inside a Byzantine monument.
For instance, in each theme the leading figures are typically depicted with a halo around their heads to emphasize their holiness, and in most cases the name of the figure appears next to the halo. Detecting a halo is therefore helpful for the subsequent processing phase proposed in this paper. Note, however, that not all figures have a halo, since figures may also represent important personalities (e.g., ancient Greek philosophers) or even ordinary people. Moreover, there are some patterns regarding the presence of paintings. Thus, a visitor is expected to see the Bema Frescoes (paintings in the sanctuary at the east side of the monument), the Christological Cycle (paintings about the life of Christ), the Mariological Cycle (paintings about the Virgin Mary), the Saints’ Cycle (most common saints), Martyrdoms, and Old Testament scenes. In the following, some of the basic painting cycles are listed and further subdivided for reference.
3.1. Common Painting Cycles
The Christological Cycle includes the Dodecaorton sub-cycle, which corresponds to twelve events in the life of Christ (e.g., the Nativity, the Crucifixion, the Resurrection), as reported in Table 1. Two further important sub-cycles, the Public Life and the Passion, are also given in Table 1. The other three painting cycles most commonly seen in a Byzantine monument (i.e., the Holy Bema, Mariological, and Saints’ Cycles) are given in Table 2. As already mentioned, their exact location in the monument varies. The architectural plan in Appendix B and Appendix C corresponds to the example of the Prophet Elijah Byzantine monument [14]. Note that names such as Plato do appear, even though Plato was not a Christian.
3.2. Description of Dodecaorton
Focusing on the Dodecaorton, twelve themes belong to it, although in many cases they are “shared” with other cycles or sub-cycles; for example, the Descent can also be part of the Passion sub-cycle. For a brief description of each theme of the Dodecaorton (as presented in Table 1), the reader may consult Appendix A. For the remainder of the paper, however, our focus is on the “Crucifixion” theme as a use case, since it is one of the most widely recognized themes across the Christian world. A brief overview of its common depictions follows.
3.3. The “Crucifixion” Theme
As a rule, the Crucifixion is a sparse, austere composition developed symmetrically around the Cross at its center, which is affixed atop an angular rock beneath which lies a cave containing a human skull. Christ wears a short loincloth. Mary stands on the left together with two women, while John, and perhaps Longinus, look on in grief from the right. The two thieves on either side of Christ are tied to smaller crosses. The scene may also include the grieving angels, the soldiers bearing the lance and the sponge, the dividing of Christ’s clothes, the usual crowd of Judaeans and soldiers observing the Crucifixion and the deaths of the condemned men, the fainting of Mary, and the mounted Longinus with his escort. Regardless of these additions, the symmetrical arrangement of the figures on either side of the Cross is a characteristic element of the scene. The central figure, the crucified Christ, is depicted with His body curved. His head, bearing the crown of thorns, is turned slightly to the right (or left) and rests on His chest, while His body is turned to the other side. The two thieves are depicted suffering, with their hands above the horizontal axis of the Cross and their legs tied with rope. The individual figures of the scene display reserved grief, without emotional outbursts; John appears austere and composed. On the other side of the central axis, the Virgin turns towards the viewer in a three-quarter stance, extending her right hand towards Christ while (typically) touching her face with her left hand as an indication of sorrow. The composition concludes on the right (or left) with the figure of the centurion Longinus. The centurion, who recognizes the divine nature of Christ, represents the external witness and for this reason commonly occupies a prominent position in the scene, gesturing in surprise at the events.
4. Immersive Technologies in Byzantine Monuments
The analysis of the paintings found in Byzantine monuments shows that: (i) most paintings follow a particular pattern that in most cases conforms to the Hermeneia, (ii) each painting has a specific theme, and (iii) paintings of the same theme share many common, easily recognizable characteristics.
Given the above, it is assumed that automatic image recognition and processing approaches are an appropriate solution for providing a set of intelligent, automated services: (i) classification of a photographed painting into a theme, (ii) finding and demonstrating similarities and differences with other paintings of the same theme, (iii) virtual restoration of discolored or partially destroyed paintings, (iv) perspective analysis and 3D transformation of paintings, (v) analysis of brush strokes, and (vi) analysis of composition. These services are hereinafter considered the basic interpretative options for the observed paintings.
To enhance the visitors’ experience, the visual representation of these services has to be provided in a visitor-friendly manner; consequently, immersive technologies are considered an appropriate solution. Specifically, AR and MR are preferable, as the provided visuals blend with the surroundings and do not isolate visitors.
4.1. Application Mode, User Requirements and Provided Functionality
For the design of the provided functionality, a basic use case is considered: a visitor enters a Byzantine monument and uses his/her mobile device to explore the site. When mapping information for the monument is available, the application can enter the “Visitor Mode”; the “Contributor Mode” is enabled when mapping information is absent or when visitors are willing to contribute. These modes are then used to extract the basic user requirements and the basic functional requirements.
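The mode-selection rule above can be restated as a small function. The names `Mode` and `available_modes` are hypothetical; how the mapping status is actually determined (e.g., a lookup in the system's database) is left out of this sketch.

```python
from enum import Enum
from typing import Set


class Mode(Enum):
    VISITOR = "visitor"
    CONTRIBUTOR = "contributor"


def available_modes(monument_is_mapped: bool, wants_to_contribute: bool) -> Set[Mode]:
    """Modes the application may enter (hypothetical helper restating the rule in the text)."""
    modes: Set[Mode] = set()
    if monument_is_mapped:
        modes.add(Mode.VISITOR)  # mapping information is available
    if not monument_is_mapped or wants_to_contribute:
        modes.add(Mode.CONTRIBUTOR)  # mapping absent, or visitor willing to contribute
    return modes


print(sorted(m.value for m in available_modes(True, False)))   # ['visitor']
print(sorted(m.value for m in available_modes(False, False)))  # ['contributor']
```

Note that the two modes are not mutually exclusive: a visitor in a mapped monument who wishes to contribute may use both.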
4.1.1. Visitor Mode
Once the monument is recognized, visitor positioning is performed to help users find the existing items of interest, e.g., paintings. Note that the interpretative options are available only for monuments that have already been mapped and registered in the system’s database. When the visitor points the device towards an area of a known monument or a specific item of interest, the application captures an image and uses it to recognize the item, so that relevant information can be retrieved.
4.1.2. Contributor Mode
A fundamental purpose of the presented framework is to turn visitors into “explorers” and allow them to contribute collaboratively to its knowledge base. To achieve this goal, the system should not depend solely on mapping performed by a small number of specialists. It has to be open and friendly to hobbyists and practitioners who are willing to participate in such knowledge-based projects, without sacrificing the integrity and scientific rigor of these efforts. In this usage scenario, visitors are encouraged to take photographs of the monument. Multiple images taken by many visitors can then be combined to build a model of the monument in which paintings appearing in different photographs are identified and anchored to specific areas of the monument. In addition, visitors can provide supplementary information to assist the framework. For example, a visitor takes a photograph of a painting depicting a Crucifixion scene. As stated in Section 3.3, the dominant feature of this scene is the presence of the Cross. If automatic classification succeeds, the application can suggest that the painting is a Crucifixion scene, and the user can tag it as such. In this way, an unaware visitor is given extra information about the observed painting, while an aware visitor can confirm its theme. Moreover, partially destroyed paintings may be photographed. In such cases, automatic recognition may fail to provide reliable results, but an educated visitor may provide valuable information. In addition, automatic reconstructions based on the preserved parts and on well-known patterns of the estimated theme can be proposed and augmented onto the preserved part. Visitors can then evaluate each reconstruction as potentially correct or as false. If a reconstruction is considered correct, it can be used in future visits to provide a visual reconstruction.
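One way to decide when a suggested theme tag or a proposed reconstruction is "considered correct" is to accumulate visitor votes and accept the suggestion once enough of them agree. The sketch below assumes exactly that; the `Suggestion` class and its thresholds (at least 5 votes, at least 80% confirmations) are hypothetical and not part of the framework as described.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical record of a machine-proposed label (theme tag or reconstruction) under review."""
    label: str
    confirmations: int = 0
    rejections: int = 0

    def vote(self, correct: bool) -> None:
        # Each visitor evaluates the suggestion as correct or false.
        if correct:
            self.confirmations += 1
        else:
            self.rejections += 1

    def is_accepted(self, min_votes: int = 5, min_ratio: float = 0.8) -> bool:
        # Assumed policy: require enough votes and a clear majority of confirmations.
        total = self.confirmations + self.rejections
        if total == 0 or total < min_votes:
            return False
        return self.confirmations / total >= min_ratio


tag = Suggestion("Crucifixion")
for _ in range(5):
    tag.vote(True)
print(tag.is_accepted())  # True
```

Weighting votes by contributor expertise, so that educated visitors count more for partially destroyed paintings, would be a natural refinement of this simple count.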
4.1.3. User Requirements
Based on the described use case and the designed application modes the following user requirements are extracted:
Retrieval of general information about the currently visited monument, e.g., its name, location, important dates and historical data.
Attainment of navigational instructions for specific items of interest, e.g., particular paintings.
Retrieval of interpretative information for the selected items of interest.
Addition of information for the selected items of interest:
- (a) Addition of textual information.
- (b) Annotation of specific parts of items of interest.
- (c) Drawing on items of interest (in order to show persons, objects, or other details).
4.1.4. Basic Functional Requirements
Based on the described use case and the designed application modes the following basic functional requirements are extracted:
Identification of the visited monument.
Accurate position tracking.
Accurate spatial mapping.
Identification of the items of interest in each monument.
Accurate placement of augmentations and interpretative information.
Methods for user input.
4.2. Mobile Augmented Reality Experiences
The proposed framework aims to provide visual and audio augmentations while maintaining a high degree of interaction. According to relevant studies [26,40], mobile devices are a valuable computing platform for delivering immersive experiences. Modern mobile devices offer sufficient computing power, memory, and sensors for image, position, and orientation tracking, as well as a screen that can be used both for projecting visualizations and for (touch) interaction.
Technological Aspects in Mobile Augmented Reality
Position tracking in outdoor environments is easily performed with GPS, although deviations may occur. Outdoors, position tracking is important when an application needs the current position to trigger geographically dependent actions (e.g., loading a 3D model to be used as a visual augmentation), while orientation tracking can be used to align the view of this model with the user’s orientation. In the case of well-preserved Byzantine monuments, however, the experience takes place indoors, where this combination of GPS-based positioning and orientation tracking does not apply. Under certain conditions, GPS values can be used to recognize the current Byzantine monument (provided that the locations of the monuments are known to the system and that GPS values are retrieved before the signal is lost), but orientation tracking inside the monument cannot rely solely on inertial sensors. Moreover, artificial, visible markers for marker-based tracking are not acceptable in protected cultural sites. In this case, Natural Feature Tracking (NFT), a method that uses natural images instead of markers, can be employed. NFT methods track interest points in images and use them to estimate the position of the camera with respect to the image. Creating the NFT markers is Computationally Intensive (CI), so offloading this process to a resourceful computational infrastructure is promoted. To perform position and orientation tracking, the application has to detect and track natural features in the camera images in combination with inertial measurements, a process known as Simultaneous Localization and Mapping (SLAM). SLAM functionality for mobile devices is provided by modern AR libraries such as Google’s ARCore and Apple’s ARKit.
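Recognizing the monument from the last GPS fix obtained before entering can be sketched as a nearest-neighbor lookup over a registry of known monument locations, using the haversine great-circle distance. The registry, the monument names, their coordinates, and the 50 m acceptance radius are all hypothetical placeholders; real GPS error varies with conditions.

```python
import math
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def identify_monument(last_fix: Optional[LatLon],
                      monuments: Dict[str, LatLon],
                      max_distance_m: float = 50.0) -> Optional[str]:
    """Nearest registered monument to the last GPS fix, if within max_distance_m (assumed radius)."""
    if last_fix is None:
        return None  # no fix was obtained before the signal was lost
    best_name, best_dist = None, math.inf
    for name, position in monuments.items():
        d = haversine_m(last_fix, position)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance_m else None


registry = {"monument_a": (39.000, 20.000), "monument_b": (39.010, 20.000)}  # hypothetical
print(identify_monument((39.0001, 20.0), registry))  # monument_a
```

When the lookup returns `None`, the application would fall back to image recognition or user selection.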
Given that the paintings lie on specific areas of the monument, it is imperative that the application not only recognizes relative changes in position and orientation but also keeps track of them in relation to one or more fixed points, called anchors, which are placed at reliable feature points. Anchors are also used as fixed points to which virtual objects are attached. Multiple objects can be attached to one anchor while remaining distinguishable, since each resides at its own position, defined relative to the anchor’s position. For the application to operate as described, the monument must have been mapped beforehand and the visual and audio augmentations must have already been anchored.
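The anchor arithmetic described above (several objects attached to one anchor, each at its own offset) can be sketched as follows. ARCore and ARKit provide their own anchor types; this sketch does not use those APIs and only illustrates the underlying pose math, assuming the anchor's orientation is a unit quaternion in (w, x, y, z) order. The `Anchor` class and object identifiers are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed unit quaternion (w, x, y, z)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q, using v' = v + w*t + u x t with t = 2 * (u x v)."""
    w, x, y, z = q
    vx, vy, vz = v
    tx, ty, tz = 2 * (y * vz - z * vy), 2 * (z * vx - x * vz), 2 * (x * vy - y * vx)
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))


@dataclass
class Anchor:
    """Hypothetical anchor: a fixed pose to which several augmentations are attached."""
    position: Vec3
    rotation: Quat
    children: Dict[str, Vec3] = field(default_factory=dict)  # object id -> offset in anchor frame

    def attach(self, object_id: str, local_offset: Vec3) -> None:
        self.children[object_id] = local_offset

    def world_position(self, object_id: str) -> Vec3:
        # Offsets are rotated into the world frame, then translated by the anchor position.
        ox, oy, oz = rotate(self.rotation, self.children[object_id])
        px, py, pz = self.position
        return (px + ox, py + oy, pz + oz)


s = math.sqrt(0.5)
anchor = Anchor(position=(1.0, 0.0, 0.0), rotation=(s, 0.0, s, 0.0))  # 90 degrees about +y
anchor.attach("label_christ", (1.0, 0.0, 0.0))
anchor.attach("audio_hotspot", (0.0, 1.0, 0.0))
print(anchor.world_position("audio_hotspot"))  # (1.0, 1.0, 0.0)
print(anchor.world_position("label_christ"))   # approximately (1.0, 0.0, -1.0), up to rounding
```

Because every child is stored relative to the anchor, a correction of the anchor pose by the tracking system moves all attached augmentations consistently.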
ARCore and ARKit can be imported and used in custom-developed applications, and both are also available as software packages for the Unity 3D game engine. Furthermore, Unity is one of the most common development platforms for Cultural Heritage applications [40]. Beyond these libraries for native mobile applications, recent advancements in web AR frameworks and JavaScript libraries (e.g., A-Frame, AR.js) allow developers to build interactive web AR applications. Still, web AR solutions face several challenges: browser applications are usually outperformed by native ones, and the available libraries have not yet reached the quality level of the well-established industry frameworks.
4.3. Mixed Reality Experiences
Now that the AR paradigm is well established and numerous Cultural Heritage applications have already been realized, AR can be considered the minimum acceptable approach. The next step in enhancing the visiting experience in Byzantine museums is the employment of MR. MR has become feasible with the recent availability of MR headsets [41], which offer spatial-awareness-based experiences in which virtual elements are precisely placed in the environment so that they blend realistically with the real world: they can occlude real-world objects and can also be occluded by them. In addition, MR provides natural interactions, so that the technology and the equipment remain mostly invisible [40].
MR headsets provide important advantages: users do not have to hold their mobile device, their experience is not limited to the device’s screen, and natural interaction methods replace point-and-touch interaction through the screen. HoloLens 2 [41] is used in this study as the point of reference for MR headsets.
Technological Aspects in Mixed Reality Experiences
As mentioned above (Section 4.1), the application needs to scan part of the monument through the device’s camera in order to recognize the paintings and provide position and orientation tracking within it. With a mobile device, the user has to point the device towards the desired location and hold it steady until the scan has ended. When MR headsets are used, in contrast, the headset cameras scan the environment as users move, look around, and explore the monument, so users need to do nothing more than look around and enjoy the experience. Additionally, the holographic display provides more space for projecting visual elements and increases the interaction space. An important functionality provided by MR headsets such as Microsoft HoloLens 2 is hand tracking and gesture recognition, which enables users to engage with their applications through hand gestures.
Currently available MR headsets do not have a GPS sensor, and it is questionable whether manufacturers will integrate one, since holographic displays do not perform well in sunlight and GPS does not provide the level of accuracy required for MR experiences. Consequently, recognition of the monument depends on image recognition or user selection, while Bluetooth beacons can also be used where available.
4.4. User Interaction and Interfaces
The proposed framework is intended for on-site use, so the user interaction methods and the respective interfaces must be designed accordingly. Given that users will be using mobile devices, intuitive, easy-to-use, and efficient interaction methods should be employed. As already discussed, users must be able to draw marks on the paintings in order to highlight regions of interest, objects, persons, or words that appear in them. Two main interaction mechanisms are proposed to support this functionality: (i) point and touch; and (ii) pointing recognition with raycasting.
4.4.1. Point and Touch Interactions
The point-and-touch interaction paradigm is a well-known and widely used approach in mobile applications: users tap on items shown on a touch display in order to carry out the intended activities. In the discussed framework, the camera input is displayed on the touch display, and users can tap on it in order to interact with an augmentation or to point to an area of the projected environment (Figure 1).
In addition, as related works show [42], the location of a touch can be mapped accurately into 3D space, enabling the application to register actions in the real environment, e.g., in order to select a real object.
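To make the mapping of a touch into 3D space more concrete, the following minimal Python sketch back-projects a tap through a pinhole camera model and intersects the resulting ray with a planar surface, such as a painted wall. The intrinsics (`fx`, `fy`, `cx`, `cy`), the wall plane, and the function names are hypothetical; in practice, AR SDKs such as ARCore and ARKit provide their own hit-testing facilities, and the result would still have to be transformed from camera to world coordinates using the tracked camera pose.

```python
import math


def tap_to_ray(u, v, fx, fy, cx, cy):
    """Back-project a screen tap (u, v), in pixels, into a unit ray direction
    in camera coordinates using a pinhole model (focal lengths fx, fy and
    principal point cx, cy). Camera frame: x right, y down, z forward."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def intersect_plane(origin, direction, plane_point, plane_normal):
    """Return the point where the ray meets the plane, or None when the ray is
    parallel to the plane or the plane lies behind the ray origin."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t < 0:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical example: 640x480 preview, wall plane 3 m in front of the camera.
ray = tap_to_ray(320, 240, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
hit = intersect_plane((0.0, 0.0, 0.0), ray, (0.0, 0.0, 3.0), (0.0, 0.0, -1.0))
print(hit)  # (0.0, 0.0, 3.0)
```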
4.4.2. Pointing Recognition with Raycasting
Hand gesture recognition offers new interaction capabilities in AR applications [
43,
44]. In this light, the pointing gesture can be used to interact with virtual elements or to target parts of the paintings.
Pointing actions have to be precise, and users should feel confident about them. To this end, a virtual laser [45] beamed from the fingertip could be embedded, so that users do not need to place their fingers directly over the target in front of the camera, which would occlude details of distant objects (Figure 2).
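The virtual laser can be sketched, under simplifying assumptions, as a ray that starts at the index fingertip and follows the knuckle-to-fingertip direction; intersecting it with a painting modelled as a rectangle yields normalized coordinates on the painting that could be used to place a mark. The joint positions are assumed to come from the headset’s hand-tracking API, and all names below are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pointing_ray(knuckle, fingertip):
    """Ray starting at the fingertip and continuing along the index finger,
    i.e. in the direction knuckle -> fingertip (both 3D joint positions)."""
    d = _sub(fingertip, knuckle)
    n = math.sqrt(_dot(d, d))
    if n == 0:
        raise ValueError("knuckle and fingertip coincide")
    return tuple(fingertip), tuple(c / n for c in d)


def hit_on_painting(origin, direction, corner, right, up, width, height):
    """Intersect the ray with a rectangular painting defined by its lower-left
    corner, unit 'right' and 'up' axes, and size (same units as positions).
    Returns normalized painting coordinates (s, t) in [0, 1], or None."""
    normal = _cross(right, up)
    denom = _dot(direction, normal)
    if abs(denom) < 1e-9:
        return None  # ray parallel to the painting
    dist = _dot(_sub(corner, origin), normal) / denom
    if dist < 0:
        return None  # painting is behind the hand
    hit = tuple(o + dist * d for o, d in zip(origin, direction))
    rel = _sub(hit, corner)
    s, t = _dot(rel, right) / width, _dot(rel, up) / height
    return (s, t) if 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0 else None


# Hypothetical 2 m x 1 m painting whose lower-left corner is 2 m ahead.
origin, direction = pointing_ray((0.0, 1.5, 0.0), (0.0, 1.5, 0.1))
print(hit_on_painting(origin, direction, (-1.0, 1.0, 2.0),
                      (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 2.0, 1.0))  # (0.5, 0.5)
```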
4.4.3. User Interface Layers
The requirements analysis indicates a need for separate User Interface (UI) layers. Accordingly, two UI layers are designed: (i) the inner UI layer; and (ii) the outer UI layer (
Figure 3).
The purpose of the inner UI layer is to provide control over the application and bring information close to the user in order to facilitate its perception. The inner UI layer resides close to the user and it follows him/her as he/she moves inside the monument (
Figure 4).
In contrast, the purpose of the outer UI layer is to host interaction elements that are closely tied with the areas of interest of the monument, and thus its presence is not affected by the user’s position. The augmentations of the outer UI layer are anchored to the monument and appear to be engraved in the areas of interest (
Figure 5).
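A minimal sketch of how the two UI layers could be positioned, assuming a y-up coordinate frame in metres: the inner layer lazily follows a point in front of the user, while the outer layer is computed only from the monument’s registered origin and orientation. The follow distance, smoothing factor, and function names are hypothetical; MR toolkits typically offer their own placement mechanisms for this purpose.

```python
import math


def follow_user(panel_pos, user_pos, user_forward, distance=0.6, smoothing=0.2):
    """Inner UI layer: move the panel a fraction `smoothing` of the way towards
    a point `distance` metres in front of the user (lazy follow)."""
    target = tuple(p + distance * f for p, f in zip(user_pos, user_forward))
    return tuple(c + smoothing * (t - c) for c, t in zip(panel_pos, target))


def anchor_to_monument(monument_origin, monument_yaw, local_offset):
    """Outer UI layer: convert a position defined in the monument's frame
    (rotated by `monument_yaw` radians about the vertical y axis) into the
    tracking frame. The result does not depend on where the user is."""
    c, s = math.cos(monument_yaw), math.sin(monument_yaw)
    x, y, z = local_offset
    ox, oy, oz = monument_origin
    return (ox + c * x + s * z, oy + y, oz - s * x + c * z)


# Called once per frame, the inner panel converges towards the user's view,
# while the outer element stays engraved at the same monument location.
panel = (0.0, 0.0, 0.0)
for _ in range(3):
    panel = follow_user(panel, (0.0, 1.6, 0.0), (0.0, 0.0, 1.0))
print(panel)
print(anchor_to_monument((10.0, 0.0, 5.0), 0.0, (1.0, 2.0, 3.0)))
```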
5. A Case Study: The Crucifixion
Since the existing themes are numerous and each painter is an artist and has a personal style (even if the instructions of
Hermeneia [
11] are followed), we focus here on one particular theme, i.e., The Crucifixion, as a case study. This is a central theme in the Christian world and there is hardly a Byzantine monument without a depiction of it. Moreover, the Cross dominates the scene, so it is easily identifiable even by non-experts in Byzantine art.
Nevertheless, not all Crucifixion paintings are the same. Four such paintings (frescoes, actually) from four different churches in the area of Epirus, Greece, are depicted in Figure 6. These frescoes were photographed using a smartphone (so no correction has been applied regarding angles, cropping contours, etc.). They are shown here not in their original form but after edge detection and conversion to black and white using GIMP [46]. The purpose of this transformation is to focus on the critical aspects of the painted theme.
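For readers who wish to reproduce a similar preprocessing step programmatically, the following Python sketch computes a binary edge map with the Sobel operator followed by a fixed threshold. It is a stand-in for illustration only: it is not the GIMP filter used for Figure 6, it assumes the photograph has already been converted to grayscale, and the threshold value is hypothetical.

```python
import math


def sobel_edges(gray, threshold):
    """Return a binary edge map (1 = edge) for a grayscale image given as a
    list of rows of 0-255 values, using the 3x3 Sobel operator. Border
    pixels, which lack a full neighbourhood, are left at 0."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical intensity gradients.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if math.hypot(gx, gy) >= threshold:
                edges[y][x] = 1
    return edges


# Toy 5x5 image: dark left part, bright right part (one vertical edge).
img = [[0, 0, 255, 255, 255] for _ in range(5)]
for row in sobel_edges(img, threshold=100):
    print(row)
```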
As clearly depicted in Figure 6a, the Cross and the Body of Christ are central and dominate the scene. Any framework, like the one proposed here, should be able to identify these two key elements. There are holy figures around them (those with a halo around their heads). In Figure 6b, the theme is depicted similarly. However, in Figure 6c, even though the theme is clearly a Crucifixion, it is painted in a very limited space (such limitations are frequent due to building constraints, etc.) and, therefore, the Cross is not at the center, even though it still dominates the composition.
The greatest challenge arises when identifying scenes in which not all information is available, as is the case in Figure 6d. This particular scene is hard to identify because its upper part is damaged; moreover, it is painted in a limited space within the monument. Still, if an ML approach fails to identify it, an expert’s opinion, obtained for instance through crowdsourcing, will probably help resolve the issue.
Relying on the Cross as the main symbol may lead to erroneous identifications, since the Cross is also central in other themes, such as the Descent from the Cross depicted in Figure 7a. This theme may easily be confused with the Crucifixion if the Cross is assumed to be the dominant element of the latter. Hence, careful steps need to be taken to mitigate this risk, e.g., by jointly considering the other key element, the Body of Christ, and its relation to the Cross, rather than the Cross alone. In another scene, the Lamentation for the Dead Christ (Figure 7b), the Cross appears once more, but in the background, playing a secondary role.
Ultimately, there are no certainties, given the personal style of the painters, the location within the monument, and the condition of the painting. These are challenges that the proposed framework needs to address.
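One way to handle such uncertainty is to let the system defer instead of guessing: a classifier’s scores are accepted only when the best theme is both sufficiently likely and clearly ahead of the runner-up; otherwise, the painting is routed to experts or the community. The sketch below assumes per-theme scores in [0, 1] from some classifier; the thresholds are hypothetical and would need to be tuned on real data.

```python
def decide_theme(scores, min_confidence=0.5, min_margin=0.2):
    """Pick the best-scoring theme from a classifier's output, or defer.

    scores: mapping theme name -> score in [0, 1] (e.g. softmax output).
    Returns (theme, None) when the decision is confident, otherwise
    (None, reason) so that the painting can be routed to experts or the
    community for identification."""
    if not scores:
        return None, "no scores"
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_theme, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best < min_confidence:
        return None, "low confidence"   # e.g. a damaged scene
    if best - runner_up < min_margin:
        return None, "ambiguous"        # e.g. Crucifixion vs. Descent
    return best_theme, None


print(decide_theme({"Crucifixion": 0.55,
                    "Descent from the Cross": 0.40,
                    "Lamentation": 0.05}))  # (None, 'ambiguous')
```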
6. A Framework for Immersive Applications
This section presents the design of the proposed framework for Digital Culture including: (i) its overall architecture, (ii) the information model and the database structure, (iii) the designed functionality of the application when a monument is completely mapped by domain experts, (iv) the designed functionality when a monument is partially mapped by the community (crowdsourcing), (v) the functionality when visitors visit a partially mapped monument, (vi) aspects related to image recognition functionality, and (vii) issues related to the computing and networking infrastructure that will be used by the proposed framework.
6.1. Overall Architecture
To provide the considered services, an application executed on users’ devices is required. Additionally, since the framework is designed to host large volumes of data on which CI services are performed, a server-side implementation is also needed. The server side can be hosted on cloud computing infrastructure and consists of the framework services and the framework’s database (Figure 8). Indicative CI services are: (i) painting recognition; (ii) theme identification; (iii) person/object identification; and (iv) Natural Feature Tracking marker creation. The users’ application submits the appropriate service requests in order to provide its functionality.
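As an illustration of the client-server interaction, the following sketch defines a request message for the four indicative CI services and a server-side dispatcher that routes each request to a handler. The JSON field names, the `image_ref` identifier, and the handler mechanism are assumptions for illustration, not a specification of the framework’s API.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional


class Service(Enum):
    """The four indicative CI services."""
    PAINTING_RECOGNITION = "painting_recognition"
    THEME_IDENTIFICATION = "theme_identification"
    PERSON_OBJECT_IDENTIFICATION = "person_object_identification"
    NFT_MARKER_CREATION = "nft_marker_creation"


@dataclass
class ServiceRequest:
    service: Service
    image_ref: str                     # hypothetical upload ID or URL
    monument_id: Optional[int] = None  # unknown until the monument is identified
    params: dict = field(default_factory=dict)

    def to_json(self) -> str:
        body = asdict(self)
        body["service"] = self.service.value  # serialize the enum as its string
        return json.dumps(body, sort_keys=True)


def dispatch(raw: str, handlers: Dict[Service, Callable[[dict], object]]):
    """Server side: route a JSON request to the handler of its service."""
    body = json.loads(raw)
    service = Service(body["service"])  # raises ValueError for unknown services
    if service not in handlers:
        raise ValueError(f"no handler for {service.value}")
    return handlers[service](body)


req = ServiceRequest(Service.THEME_IDENTIFICATION, "upload/123", monument_id=7)
print(req.to_json())
```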
6.2. Information Model and Database Architecture
To support the user and functional requirements, an information model able to store and handle the related information is necessary. In its general form (Figure 9), a Monument comprises information about its position (GPS coordinates), its 3D reconstruction (3D model), supplementary information (Information), its paintings (Painting), and the objects used for user interaction (Interaction Element).
The information model is served by an appropriately designed database of which the basic Entity-Relationship (ER) diagram is presented in
Figure 10. In this ER diagram four basic entities are depicted: (i) the entity Monument; (ii) the entity Painting; (iii) the entity Theme; and (iv) the entity Interaction Element.
The entity Monument is used to store information about the monuments. The GPS coordinates are considered important for monument identification when a GPS signal is present (e.g., when approaching a monument). Alternative data, e.g., Bluetooth beacon IDs, can also be used where such solutions are feasible. Moreover, information about the monument itself is stored in this entity; it can be text or even multimedia content.
The entity Painting is used to store information about the available paintings. The relation of each painting with the monument in which it resides is mandatory. Additional information, such as the photographs of the painting, its position in the monument (3D coordinates with respect to the monument’s 3D model when it is fully mapped), and textual information used for its description is supported. Moreover, each painting has to be related to specific themes.
The entity Theme is crucial for the framework, as the themes into which the paintings are categorized are important for understanding them.
The entity Interaction Element is used for assigning interactions within monuments. An interaction element can be a selectable area of a painting, an object, or a place in the monument. Each interaction element is placed in a specific position and has a type, e.g., a selectable element for retrieving information or a navigational element triggered when visitors approach it, as well as the actions that are executed when interacting with it, e.g., loading an image, playing an audio file, or sending information to be stored in the database.
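A minimal relational sketch of these entities, using Python’s built-in `sqlite3` module, is given below. The table and column names, the separate photo table, and the many-to-many link between paintings and themes are assumptions made for illustration; the authoritative structure is the ER diagram of Figure 10. Note that SQLite enforces foreign keys only when `PRAGMA foreign_keys` is enabled on the connection, which is what makes the mandatory painting-monument relation checkable here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE monument (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    latitude REAL, longitude REAL,      -- GPS coordinates, if known
    beacon_id TEXT,                     -- optional Bluetooth beacon ID
    information TEXT                    -- text or a link to multimedia
);
CREATE TABLE theme (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE painting (
    id INTEGER PRIMARY KEY,
    monument_id INTEGER NOT NULL REFERENCES monument(id),  -- mandatory
    x REAL, y REAL, z REAL,             -- position, when fully mapped
    description TEXT
);
CREATE TABLE painting_photo (
    id INTEGER PRIMARY KEY,
    painting_id INTEGER NOT NULL REFERENCES painting(id),
    uri TEXT NOT NULL,
    taken_at TEXT
);
CREATE TABLE painting_theme (
    painting_id INTEGER NOT NULL REFERENCES painting(id),
    theme_id INTEGER NOT NULL REFERENCES theme(id),
    PRIMARY KEY (painting_id, theme_id)
);
CREATE TABLE interaction_element (
    id INTEGER PRIMARY KEY,
    monument_id INTEGER NOT NULL REFERENCES monument(id),
    painting_id INTEGER REFERENCES painting(id),  -- NULL for places/objects
    x REAL, y REAL, z REAL,
    kind TEXT NOT NULL,                 -- e.g. 'selectable', 'navigational'
    action TEXT NOT NULL                -- e.g. 'play_audio:intro.mp3'
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn
```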
6.3. Complete Mapping and Tracking
To support the available interpretative options in a monument, the monument must already be mapped and registered in the system’s database. The literature provides numerous examples of precise and complete mapping of monuments [
16,
17,
47] in which various approaches can be used, e.g., Computer-Aided Design (CAD), 3D scanning, photogrammetry, and combinations of these methods [
48]. For a completely mapped monument (Figure 11), a 3D model of its appearance along with its textures is available, while additional information, including specific areas of interest, objects, and elements triggering actions, can be added for interactivity purposes. When such complete mapping information is provided, once the initial position of a user in the monument is registered (e.g., at the entrance), and if Six Degrees of Freedom (6 DoF) tracking is available, the application can perform position updates and rotation tracking so as to identify where the user is and what he/she is looking at, as well as to enable interaction with specific objects. 6 DoF tracking can be implemented by SLAM approaches, which are available both in mobile AR SDKs, such as Google’s ARCore and Apple’s ARKit, and in MR headsets, such as Microsoft’s HoloLens.
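Once 6 DoF tracking provides the user’s position and head orientation in the monument’s frame, determining what the user is looking at can be approximated by comparing the gaze direction with the direction towards each mapped painting. The sketch below assumes a y-up frame, yaw and pitch in radians already expressed relative to the monument (i.e., after registration at the entrance), painting centres taken from the 3D model, and a hypothetical 10-degree acceptance cone; the painting IDs are illustrative.

```python
import math


def gaze_direction(yaw, pitch):
    """Unit forward vector for a y-up frame; yaw = 0 looks along +z,
    positive yaw turns towards +x, and positive pitch looks up."""
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


def looked_at_painting(position, yaw, pitch, paintings, max_angle_deg=10.0):
    """Return the ID of the painting whose centre is angularly closest to the
    gaze direction within max_angle_deg, or None.
    paintings: dict painting_id -> (x, y, z) centre in the monument frame."""
    forward = gaze_direction(yaw, pitch)
    best_id, best_angle = None, math.radians(max_angle_deg)
    for pid, centre in paintings.items():
        to_centre = [c - p for c, p in zip(centre, position)]
        dist = math.sqrt(sum(v * v for v in to_centre))
        if dist == 0:
            continue
        cos_a = sum(f * v for f, v in zip(forward, to_centre)) / dist
        angle = math.acos(max(-1.0, min(1.0, cos_a)))  # clamp rounding errors
        if angle <= best_angle:
            best_id, best_angle = pid, angle
    return best_id


paintings = {"crucifixion": (0.0, 2.0, 5.0), "nativity": (5.0, 2.0, 0.0)}
print(looked_at_painting((0.0, 1.6, 0.0), yaw=0.0, pitch=0.0,
                         paintings=paintings))  # crucifixion
```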
6.4. Partial Mapping
As already stated, one of the purposes of the proposed concept is to engage users in collecting and documenting information related to Byzantine monuments. It has also been noted that mapping a Cultural Heritage site [47] is a difficult, time-consuming, and costly task and, thus, crowdsourcing can be a valuable resource in this direction. Visitors entering a monument and using the proposed application are not trained to perform complete monument mapping and lack the required equipment and time to do so; nevertheless, they can contribute by taking photos of the monument, a task that is often performed willingly.
In order to perform the mapping, the application needs to associate the photographs with the monument. Users must be able to register a new monument or select it if it is already on the list. Additional information, e.g., the GPS coordinates, should also be collected, as it can be used for monument identification. When a user takes a photograph, the framework searches its database for similar images. If the photographed painting does not exist in the database, the image is uploaded and the user is encouraged to provide additional information (theme, title, painter, etc.). Conversely, if the painting already exists in the database, the user is encouraged to provide or review the additional information, and the new photograph is added to the pool of images of that painting (Figure 12). The existence of multiple images of the same painting is not considered undesired redundancy: it makes it possible to choose the images with the most suitable characteristics for each purpose and to offer different views of the painting (while the real painting is always there to be observed). Interestingly, since each visitor may photograph the same painting from a different viewing angle, the timestamped photographs can also serve over time as a witness of time-induced changes, and the likelihood of successful image identification increases as the dataset grows and more appropriate photos become available.
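The matching step described above can be sketched as a nearest-neighbour search over image feature vectors, restricted to the identified monument: if the most similar known painting exceeds a similarity threshold, the photo joins its pool; otherwise, a new painting entry is created and the user can be asked for additional information. The feature vectors (assumed to come from some image descriptor or CNN embedding), the in-memory dictionary standing in for the database, and the threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def register_photo(db, monument_id, features, photo_uri, threshold=0.9):
    """Attach a new photo to the most similar known painting of the same
    monument, or create a new painting entry if none is similar enough.

    db: dict painting_id -> {"monument": id, "features": list, "photos": list}
    (for simplicity, each painting keeps the features of its first photo).
    Returns (painting_id, created_new)."""
    best_id, best_sim = None, -1.0
    for pid, entry in db.items():
        if entry["monument"] != monument_id:
            continue
        sim = cosine(features, entry["features"])
        if sim > best_sim:
            best_id, best_sim = pid, sim
    if best_id is not None and best_sim >= threshold:
        db[best_id]["photos"].append(photo_uri)  # pool the new view
        return best_id, False
    new_id = max(db, default=0) + 1  # new painting: prompt user for details
    db[new_id] = {"monument": monument_id, "features": list(features),
                  "photos": [photo_uri]}
    return new_id, True
```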
6.5. Visiting a Partially Mapped Monument
When a monument is partially mapped, there is no spatial information that can be used for 6 DoF tracking of the user within the monument’s interior. When a visitor enters a monument, the monument can be identified based on the GPS coordinates, or the visitor can select it from a list. Alternative options, such as Bluetooth beacons, can also be used for this purpose, but such solutions are not considered standard for the framework.
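GPS-based identification can be sketched as picking the registered monument closest to the device, using the haversine great-circle distance on a spherical Earth, and falling back to manual selection when no monument lies within a given radius. The 50 m radius, the monument list format, and the coordinates in the example are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def identify_monument(lat, lon, monuments, max_distance_m=50.0):
    """Return the ID of the nearest monument within max_distance_m, or None,
    in which case the user selects the monument from a list.
    monuments: iterable of (monument_id, latitude, longitude)."""
    best_id, best_d = None, max_distance_m
    for mid, mlat, mlon in monuments:
        d = haversine_m(lat, lon, mlat, mlon)
        if d <= best_d:
            best_id, best_d = mid, d
    return best_id


monuments = [(1, 39.6650, 20.8537), (2, 39.6700, 20.8600)]
print(identify_monument(39.6651, 20.8537, monuments))  # 1 (about 11 m away)
```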
Once the application knows which monument is being visited, information about it can be retrieved. When the visitor scans one or more paintings with his/her device, the image recognition service performs the identification task so that each painting in the captured frame is recognized and highlighted; as indicated in Figure 5, the scanned paintings must have been scanned and documented at a prior stage by other users. If the monument or the painting is not yet registered, the partial mapping process is initiated; when the visitor selects one of the recognized paintings, the available information and services can be loaded upon request.
6.6. Image Recognition
It has already been mentioned that visual markers are not considered a suitable option, so the application should be equipped with image recognition functionality in order to identify the paintings that visitors are looking at. In addition, other image-based services, e.g., identification of a painting’s theme or recognition of the personalities depicted in it, demand fast image recognition and other image processing functionality. To this end, various approaches to automatic image recognition exist, e.g., k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Back Propagation Neural Networks (BPNN), and Convolutional Neural Networks (CNN), all of which are considered applicable options [49].
Regarding the classification of paintings according to their theme, the works in [37,38,39] show that the employed approaches perform surprisingly well in classifying Byzantine paintings by style. Given the significant style differences between Byzantine paintings and paintings of other styles, distinguishing Byzantine paintings among various movements is a relatively easy task for these approaches; however, this does not guarantee that recognizing specific themes among the available paintings, where the differences may be less pronounced, will be as straightforward. Regarding the recognition of the specific painting that a user is looking at, provided that a sufficient number of photographs of that painting is available, the available approaches (KNN, SVM, BPNN, CNN) [49] promise accurate results.
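As an illustration of the simplest of the listed approaches, the following sketch implements k-Nearest Neighbors over feature vectors extracted from the pooled photographs. How the feature vectors are obtained (e.g., hand-crafted descriptors or CNN embeddings) is left open and assumed; the toy two-dimensional features and painting labels below are purely illustrative.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs, e.g. features extracted from
    pooled photographs labelled with painting IDs. Ties in the vote are broken
    in favour of the label encountered first, i.e. the nearer neighbour."""
    neighbours = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train = [((0.1, 0.9), "painting_1"), ((0.2, 0.8), "painting_1"),
         ((0.9, 0.1), "painting_2"), ((0.8, 0.3), "painting_2")]
print(knn_predict(train, (0.15, 0.85), k=3))  # painting_1
```

KNN needs no training phase, which suits a dataset that grows continuously through crowdsourcing, at the cost of a search over all stored samples at query time.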
When identifying personalities in paintings, the regions depicting them have to be isolated and annotated in order to create an appropriate dataset (Figure 13). To create this dataset, automatic edge detection algorithms can be used to isolate the depicted persons, and the results can then be checked and, if necessary, corrected by specialists, who are also responsible for annotating them accordingly. Where complete mapping is not available, users will be able to perform this task themselves by pointing at the edges of the depicted persons.
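When users outline a depicted person by pointing at its edges, the pointed positions can be stored as a polygon in painting coordinates, and later selections can then be tested against it. The sketch below uses the standard even-odd ray-casting point-in-polygon test; the polygon representation, the normalized coordinates, and the function name are assumptions for illustration.

```python
def point_in_polygon(pt, polygon):
    """Even-odd ray-casting test. `polygon` is a list of (x, y) vertices in
    painting coordinates, e.g. the points a user pointed at along the edges
    of a depicted person; `pt` is a later selection in the same coordinates."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of pt
                inside = not inside
    return inside


outline = [(0.40, 0.20), (0.60, 0.20), (0.60, 0.80), (0.40, 0.80)]
print(point_in_polygon((0.5, 0.5), outline))  # True
```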
6.7. Cloud/Fog Computing and Real-Time Networked Environments
Cloud computing is considered a suitable solution for providing seemingly unlimited data storage and processing power many times greater than that of the devices typically available to users. Researchers have exploited cloud computing for various purposes, including ML [50] and 3D rendering algorithms [51].
The proposed framework makes use of datasets and services that cannot be hosted on mobile devices due to storage, computational, or battery constraints, and hence they are placed on the server-side infrastructure. Additionally, processes that users’ devices can execute may also be offloaded for increased performance. According to the overall architecture design (Section 6.1), the CI services are placed on the server side, which can be hosted on cloud computing infrastructure. This design will serve as a basis for further research on approaches that are deemed promising for improving performance and the provided QoE.
Despite the important advantages, computational offloading of time-sensitive operations to cloud infrastructure is not a silver bullet as network delays pose a serious threat to the overall performance and the users’ experience [
52]. Fog computing, and its counterpart, edge computing, are used in order to minimize the effect of network delays with positive results in various cases including 3D rendering [
34], SLAM [
53], and MR applications [
33,
54,
55]. In addition, 5G networking is designed to support highly interactive applications with low latency and high throughput requirements [
56], and will enable mobile VR/AR applications [
57]. SDNs are also designed to provide end-to-end Quality of Service (QoS) support [58], while other approaches, e.g., virtual backbone networks [59] or distributed cluster-based load balancing methods [60], aim at provisioning network services tailored specifically to the requirements of each application setting, regardless of the number of users or workload fluctuations.
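The trade-off between local execution, cloud offloading, and edge offloading can be illustrated with a simple latency model: local time is the required CPU cycles divided by the device’s clock rate, while remote time adds the upload time and the round-trip network delay to the server’s processing time. The sketch below chooses the fastest option that meets a deadline. It deliberately ignores result download time, queueing, and energy consumption, and all numbers in the example are hypothetical.

```python
def local_time_s(cycles, local_hz):
    """Execution time on the user's device."""
    return cycles / local_hz


def remote_time_s(cycles, server_hz, payload_bits, uplink_bps, rtt_s):
    """Upload time + round-trip delay + execution time on the server."""
    return payload_bits / uplink_bps + rtt_s + cycles / server_hz


def choose_placement(cycles, payload_bits, local_hz, targets, deadline_s):
    """targets: dict name -> (server_hz, uplink_bps, rtt_s).
    Returns (name, estimated_latency_s) for the fastest option, where name is
    'local' or a target name, or (None, latency) if even the fastest option
    misses the deadline."""
    options = {"local": local_time_s(cycles, local_hz)}
    for name, (hz, bps, rtt) in targets.items():
        options[name] = remote_time_s(cycles, hz, payload_bits, bps, rtt)
    name, t = min(options.items(), key=lambda kv: kv[1])
    return (name, t) if t <= deadline_s else (None, t)


# 1 Gcycle recognition task, 2 Mbit image, 2 GHz phone; a powerful but distant
# cloud versus a closer, less powerful edge node.
targets = {"cloud": (20e9, 100e6, 0.100), "edge": (10e9, 100e6, 0.010)}
print(choose_placement(1e9, 2e6, 2e9, targets, deadline_s=0.2))  # edge wins, about 0.13 s
```

In this toy setting, the edge node wins despite its lower processing power because the network delay dominates, which is the effect that motivates fog/edge computing above.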
7. Discussion
The contribution of this work lies in the synthesis of different perspectives towards the described purpose. According to the presented literature review, there is no previous work aiming to provide the discussed services in Byzantine monuments and this paper takes into consideration various aspects including the particularities of the Byzantine monuments and paintings, the use of immersive technologies, issues of HCI, and image recognition approaches with respect to the characteristics of Byzantine paintings. The particularities of Byzantine monuments and their paintings with respect to augmented/mixed reality experiences have not been studied by previous works. Specifically, the latest reviewed works that use AR/MR for Cultural Heritage [
29,
30] do not discuss the images that are identified or the potential challenges. Moreover, [30] employs marker-based tracking, which cannot be applied in Byzantine monuments and paintings. In addition, image recognition techniques are poorly tested on Byzantine paintings: previous works have focused on classifying images according to their style, but no previous work has focused on classifying Byzantine paintings according to their specific theme (as described in
Appendix A). Moreover, while providing valuable cultural communication services, the latest reviewed works [
29,
30] are not intended to provide cultural documentation services. Additionally, the presented framework combines knowledge and know-how from the networking and cloud computing perspectives in order to provide improved QoE. The proposed framework is still at the theoretical design stage and has not yet been fully implemented or tested in a monument.
8. Challenges and Future Steps
According to the previous sections, there are still many ongoing challenges. This is not a surprise as mobile immersive applications are relatively new and constantly evolve in order to provide more user-friendly, realistic, and immersive experiences. The same applies to the complementary technologies that are used in order to support and enhance the proposed framework, i.e., image processing approaches, cloud/fog computing, networking.
At the time of writing, ARCore, ARKit, and Vuforia are considered to be among the top AR SDKs [61,62], with the first two emerging as the more appropriate choices for marker-less tracking [63]. AR SDKs do not currently support hand tracking and gesture recognition, but researchers and practitioners have come up with solutions based on ML libraries that can be executed on mobile devices, e.g., TensorFlow [64,65]. When MR headsets, e.g., HoloLens, are used, both 6 DoF tracking and gesture recognition are performed by the corresponding SDK, e.g., the Mixed Reality Toolkit (MRTK).
Given that little or no research has been conducted on classifying Byzantine paintings according to their theme, the creation and annotation of an appropriate dataset (including Byzantine paintings of numerous themes) are considered necessary. Image recognition algorithms and approaches have been thoroughly tested for various purposes, but their ability to identify specific themes in paintings of Byzantine style has not yet been tested. Once the dataset is available, the ability of the considered approaches (KNN, SVM, BPNN, CNN) to identify the painting theme should be evaluated. Additionally, the dataset should be extended to include the persons and objects depicted in these paintings, which therefore have to be isolated and annotated in order to verify the efficacy of the aforementioned approaches on such tasks.
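For the evaluation step, overall accuracy alone may hide systematic confusions between similar themes, such as the Crucifixion and the Descent from the Cross discussed in Section 5. The following sketch computes accuracy, per-theme recall, and a confusion matrix from true and predicted labels; the labels in the example are hypothetical.

```python
from collections import defaultdict


def evaluate(y_true, y_pred):
    """Return overall accuracy, per-theme recall, and a confusion matrix
    (dict: true theme -> dict: predicted theme -> count)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    confusion = defaultdict(lambda: defaultdict(int))
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Recall per theme: correctly predicted / all samples of that theme.
    recall = {t: row.get(t, 0) / sum(row.values()) for t, row in confusion.items()}
    return accuracy, recall, {t: dict(row) for t, row in confusion.items()}


acc, rec, conf = evaluate(
    ["Crucifixion", "Crucifixion", "Descent from the Cross", "Lamentation"],
    ["Crucifixion", "Descent from the Cross", "Descent from the Cross", "Lamentation"])
print(acc)                  # 0.75
print(conf["Crucifixion"])  # {'Crucifixion': 1, 'Descent from the Cross': 1}
```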
Regarding the interpretative services, future work has to focus on the automatic image processing methods that will enable them. A common issue of old paintings that have not been restored is discoloration. In such circumstances, visitors may be unable to distinguish shapes of different colors, and thus the efficacy of pixel-based approaches [35] in virtually restoring the colors of these paintings needs to be investigated. Another interesting perspective on the paintings is brush stroke analysis, a domain in which interesting approaches have been proposed lately [66], but none of them has been tested on Byzantine paintings.
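As a very simple example of a pixel-based operation, the sketch below applies a per-channel linear contrast stretch, mapping a chosen percentile range of each colour channel to the full 0 to 255 range. It is not the method of [35]; it only illustrates the kind of processing that could improve the distinguishability of faded colours, and it cannot recover pigment information that has been lost. The percentile defaults are hypothetical.

```python
def stretch_channel(values, low_pct=0.01, high_pct=0.99):
    """Linearly map the [low_pct, high_pct] percentile range of one colour
    channel (0-255 values) onto 0-255, clipping values outside that range.
    Percentiles use a simple index into the sorted values."""
    ordered = sorted(values)
    lo = ordered[int(low_pct * (len(ordered) - 1))]
    hi = ordered[int(high_pct * (len(ordered) - 1))]
    if hi == lo:
        return list(values)  # flat channel: nothing to stretch
    return [min(255, max(0, round((v - lo) * 255 / (hi - lo)))) for v in values]


def stretch_rgb(pixels, low_pct=0.01, high_pct=0.99):
    """Stretch each channel of a list of (r, g, b) pixels independently."""
    channels = zip(*pixels)
    stretched = [stretch_channel(list(c), low_pct, high_pct) for c in channels]
    return list(zip(*stretched))


faded = [(100, 50, 80), (120, 55, 84), (200, 75, 100)]
print(stretch_rgb(faded, low_pct=0.0, high_pct=1.0))
# [(0, 0, 0), (51, 51, 51), (255, 255, 255)]
```

Stretching the channels independently also rebalances the colours, which may or may not be desirable for a given fresco; stretching a luminance channel only would preserve the hue balance instead.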
Regarding the computing and networking infrastructure, 5G networking promises increased throughput and reduced latency, while complementary approaches, such as edge computing, remain key enablers; hence, more research on the requirements of the proposed framework with respect to edge computing approaches needs to be conducted.
9. Conclusions
In this paper, a framework that consolidates recent technological advances such as AR/MR, 5G, and ML to improve visitors’ experience of monuments (e.g., churches, museums) influenced by Byzantine culture is outlined. The lavishly painted interiors of these monuments and their wide distribution across large geographic areas draw increased interest from tourists and scholars alike. Although Byzantine paintings differ significantly from other painting styles, they frequently follow common rules. Thus, by exploiting the guidelines followed by the majority of Byzantine-influenced painters and assisted by crowdsourcing approaches, targeted immersive services can be implemented that exploit common patterns to facilitate faster image recognition, processing, annotation, augmentation, etc. To this end, an extensive analysis of existing and future interactive technologies and applications is also reported, focusing on key characteristics and operations. Nevertheless, many of the embedded services are computationally intensive, generate vast volumes of data, and place high demands on resource provisioning with low delay tolerance. The proposed framework addresses these aspects by additionally incorporating recent cloud/fog advances for real-time computational offloading and data storage, based on the technical requirements derived from the functionality analysis. In this context, the presented architecture provides clear indications of its possible uses, which include, among others, the provision of a unified repository through which academia can interconnect with other cultural platforms.
To contextualize the various dimensions of the framework, a use case regarding a popular Byzantine painting theme is also reported, demonstrating its potential to offer high-end immersive experiences and thus to transform visitors from passive spectators into proactive culture caretakers and content creators. In this way, the framework is expected to bring together two distinct scientific areas: Byzantine painting and immersive technologies in real-time networked environments. However, it is not restricted to these scenarios. Instead, the framework is envisioned to play a fundamental role as a guide for similar implementations in the general domain of Digital Culture, and the authors therefore intend to evaluate it under actual deployment scenarios in monuments, where quantitative and qualitative analyses of its dynamics, conformation, visual performance, and overall QoE efficiency will be extensively conducted, documented, and standardized.