1. Introduction
The use of artificial intelligence (AI) represents a new phase and offers numerous opportunities for studies of cultural heritage [
1,
2,
3,
4]. An important aspect of this emerging research field focuses on the application of AI models in the visual inspection of historical buildings [
3]. In this context, the study by Silva and Oliveira [
4] highlights that AI techniques can be employed in conjunction with photography research for documentation, conservation, restoration, and education regarding cultural heritage.
In line with this, deep learning models are increasingly being adopted in research involving computer vision and cultural heritage [
5,
6]. In the literature, recent examples of applications include: monitoring of historic places using convolutional neural networks [
7], semantic segmentation and photogrammetry for monitoring historic facades [
8], remote sensing of historical architecture [
9], prediction and measurement of damage to architectural heritage facades [
10], detection of disaster-affected cultural heritage sites from social media images [
11], and statistics and location estimation of missing components in routine inspections of historic buildings [
12]. Furthermore, Mishra and Lourenço [
3] describe that applications of artificial intelligence in visual inspection for cultural heritage can be divided into several groups, such as detection of surface deterioration in cultural heritage, recognition of facade damages, crack detection, and degradation monitoring of stone cultural heritage. Another relevant research area described by Mishra and Lourenço [
3] focuses on the detection of components and key architectural elements in cultural heritage using AI. Indeed, the recognition of building elements in historical monuments is essential in the processes of conservation and digital documentation of cultural heritage [
3].
Recently, several studies in the literature have explored the classification of historical architectures and building elements using deep learning [
13,
14,
15]. The paper by Llamas et al. [
13] highlights that image classification is a relevant task for the digital documentation of cultural heritage, given that manual classification is a costly process that can involve many errors. Moreover, techniques to automate the digital documentation process can improve the management of cultural heritage archives, making searches more efficient and assisting in the study and interpretation of heritage [
13]. Based on this, Llamas et al. [
13] propose the Architectural Heritage Elements Dataset, which includes 10 types of architectural elements from heritage buildings, the majority of which are churches and religious temples. In another study, Siountri and Anagnostopoulos [
15] propose the classification of cultural heritage buildings in Athens using deep learning. For this, the authors introduce a dataset with images collected from photographs, Google Street View, and online sources. Additionally, they conduct simulations in two phases to identify four building elements (doors, windows, balconies, and corbels) and the architecture style (neoclassical, neoclassical-eclectic, interwar-eclectic, interwar, apartment building).
However, the literature still lacks image databases for the classification of Brazilian cultural heritage, particularly regarding the historic town of Ouro Preto. Ouro Preto has been a UNESCO World Heritage Site since 1980 and features well-preserved historical buildings from the Brazilian colonial period, such as mansions, public buildings, and museums. In this context, this World Heritage Site is also notable for its large number of religious monuments, primarily reflecting Baroque and Rococo architectural styles. Therefore, the objective of this paper is to propose The Image Dataset With Religious Buildings in the World Heritage Town of Ouro Preto for Deep Learning Classification (ImageOP). This new dataset comprises 1613 images of 32 Catholic religious temples in the city of Ouro Preto, with the data divided into five classes: fronton, door, window, tower, and church. The simulations were conducted on the Edge Impulse platform using two deep learning models: MobileNet V2 [
16] and EfficientNet B0 [
17]. These artificial intelligence structures are convolutional neural network architectures widely discussed in the literature for computer vision applications using mobile devices. The experiments demonstrate that the dataset is suitable for deep learning experiments, as the F-score values were generally above
for the detection of all classes in simulated experiments. Additionally, accuracies exceeding
were achieved in real-time detection of building elements in experiments conducted in real-world environments using computer vision and mobile devices. Thus, this study aims to contribute to important areas of the literature, such as digital documentation of cultural heritage [
3], classification of building components [
13], and the creation of new datasets for deep learning applications [
18,
19]. Accordingly, the main contributions of this work are:
A new image dataset featuring religious buildings in the World Heritage Town of Ouro Preto;
Deep learning for the recognition of five classes pertaining to historical temples of Brazilian religious architecture: fronton, door, window, tower, and church;
Simulated experiments and real-world applications using computer vision and mobile devices (smartphones) for the detection of components in historical religious buildings.
This paper is organized into five sections.
Section 2 provides a description and comparison with related work.
Section 3 presents a detailed description of the proposed new dataset.
Section 4 discusses the experiments with deep learning and their results. Finally,
Section 5 presents the conclusions and directions for future research.
2. Related Work
This section presents a comparison of the current proposal with related work in the literature. To this end, four papers from the field of machine learning for cultural heritage classification were selected: I—Llamas et al. [
13], II—Janković [
14], III—Siountri and Anagnostopoulos [
15], and IV—Lamas et al. [
20]. These articles were selected for their relevance to the research field and their direct relationship with the theme of this paper, according to four criteria. First, the related work had to evaluate cultural heritage buildings. Second, it had to adopt deep learning methods for image classification. Third, the datasets proposed or used had to date from the last ten years. Finally, relevance in the research field was assessed by the number of citations, with all selected studies cited at least ten times (Google Scholar, accessed on 17 November 2024).
Table 1 presents this comparison.
Initially, it is noteworthy that three selected studies [
13,
15,
20] propose new image datasets for tasks related to cultural heritage classification. Llamas et al. [
13] presented the Architectural Heritage Elements Dataset, which includes ten types of architectural elements from heritage buildings. The study by Siountri and Anagnostopoulos [
15] introduces a new image database of cultural heritage buildings in Athens, and the work of Lamas et al. [
20] presents MonuMAI (Monument with Mathematics and Artificial Intelligence) with the aim of identifying facade styles of monuments. However, only the ImageOP dataset is specifically intended for the classification of elements in religious buildings of Brazilian cultural heritage. This point highlights one of the main contributions of this paper: the development of a dataset specifically aimed at enhancing the application of artificial intelligence in the identification and digitization of Brazilian religious heritage, particularly from the 18th century, a period characterized by the development of Baroque and Rococo architectural styles.
The second criterion addressed pertains to the analysis of the image categories in the datasets. In this regard, it is important to note that the ImageOP dataset includes five classes (fronton, church, window, door, and tower). These labels are significant for identifying the characteristics of the facades of religious buildings in the historic town of Ouro Preto. The other studies utilize combinations of different classes, which would be insufficient for the classification of the elements present in the architectural styles of the monuments evaluated in this paper. For example, Janković [
14] applies machine learning models to classify five other elements of cultural heritage buildings: altars, gargoyles, domes, columns, and vaults.
Finally, it is important to highlight the characteristics of the experiments in this paper compared to the related work. Most studies conduct experiments solely in simulated environments using deep learning. In contrast, the present proposal addresses the application of artificial intelligence models in real-world situations, utilizing computer vision with smartphones for the classification of building components. Therefore, it is worth emphasizing the importance of proposing a robust dataset for practical applications of built heritage detection using mobile devices. This approach allows for the evaluation of the proposed dataset’s performance in both simulated and practical real-time tests.
In summary, the following limitations of related studies that are improved in this paper are highlighted:
Dataset: the literature lacks new datasets of religious buildings, especially with images of Brazilian Baroque and Rococo architecture;
Classes: in general, previous work does not address the set of common building elements on the facades of Brazilian churches from the 18th century: fronton, window, door, and tower;
Experiments: the literature also needs new deep learning approaches for practical experiments in real environments using smartphones.
3. ImageOP: Our Contribution
This section presents and describes the new dataset proposed in this paper: ImageOP—The Image Dataset With Religious Buildings in the World Heritage Town of Ouro Preto for Deep Learning Classification.
Figure 1 summarizes the methodology used for the development of this new image database.
Initially, the historic town of Ouro Preto was selected for the development of a new dataset featuring images of religious buildings because it was the first World Heritage Site in Brazil to be included on the UNESCO list in 1980. Subsequently, 32 religious monuments located in Ouro Preto were identified. The third stage involved data collection of building components, resulting in five classes for experiments with deep learning: pediment, tower, window, door, and church. Thus, the ImageOP dataset comprises 1613 varied images of religious buildings from the historic town of Ouro Preto. The following subsections provide a more detailed description of the development process and characteristics of the ImageOP dataset.
3.1. Scientific Motivation
The scientific motivation for the development of this study focuses on the improvement of artificial intelligence techniques for the processes of digital documentation [
4] and visual inspection of cultural heritage [
3]. Digital documentation using intelligent systems can contribute significantly to the automatic management of archives and the interpretation and study of heritage, and can avoid manual classification errors [
13]. Following this line, the recognition of building elements in historical monuments is essential in the processes of visual inspection of cultural heritage. The correct detection of key architectural elements in cultural heritage using AI can contribute to the process of quantifying these components and evaluating conservation [
3].
Another point to highlight is that, for the improvement and precision of approaches with artificial intelligence, the availability of data is necessary to carry out effective training of deep learning models. More specifically, it is very important that the available dataset reflects the desired practical application. In this sense, the development of AI systems for digital documentation and visual inspection of cultural heritage with datasets that do not match the architectural reality of the usage environment can lead to serious interpretation errors. However, the literature still lacks datasets aimed at applying deep learning to classify components of Brazilian religious buildings. It is worth emphasizing that the state of Minas Gerais in Brazil has a significant number of religious temples built in the 18th and 19th centuries, with the historic town of Ouro Preto being one of these landmarks in the preservation and conservation of cultural heritage.
In this way, the objective of the proposed methodology is scientifically motivated by the importance of developing a new dataset of images for training deep learning models, and subsequent application in digital documentation and visual inspection of cultural heritage in Brazil, based on images of the historic town of Ouro Preto.
3.2. Historic Town of Ouro Preto
The historic town of Ouro Preto was the first city in Brazil added to the World Heritage List of UNESCO (United Nations Educational, Scientific and Cultural Organization) in 1980 [
21,
22]. Founded in the early 18th century, the city of Ouro Preto (meaning “Black Gold”) was an important center for gold mining and served as the capital of the state of Minas Gerais until 1897. The development of the cities in Minas Gerais was greatly driven by the quest for gold during Brazil’s period as a Portuguese colony (from 1500 to 1822) [
21,
23]. This phenomenon facilitated and financed the construction of significant monuments during this time, particularly in the Baroque architectural style, including squares, public buildings, residences, fountains, bridges, and churches [
22].
Figure 2 presents photographs of the historic town of Ouro Preto.
Currently, the town of Ouro Preto preserves several examples of religious and civic buildings in terms of design and materials used in the 18th and 19th centuries [
22]. As a result, the Brazilian government submitted the designation of the historic town of Ouro Preto as a World Heritage Site to UNESCO, which was approved on 5 September 1980. On the UNESCO World Heritage Convention page, several relevant criteria are highlighted for including Ouro Preto as a World Heritage Site, such as the aesthetic quality of the architecture represented by the religious monuments and administrative buildings in a remote and rugged landscape. Another criterion emphasizes the heritage constructed under Portuguese colonial rule during the mining period, leading to the construction of churches and chapels characterized by splendor, quality, and originality, blending European and Brazilian cultural traditions [
22].
The historical and cultural significance of Ouro Preto has stimulated scientific production in various fields of knowledge, such as architecture [
24], biodiversity [
25], microbiology [
26], geotourism [
27], and mining [
23]. Another recent area of study involves the application of technologies in the analysis and documentation of religious monuments in the town of Ouro Preto [
28,
29]. However, the literature still lacks an image database intended for the classification of elements from the religious monuments of the historic town of Ouro Preto, which is the main motivation for this paper. This new dataset is described in the following subsections.
3.3. Religious Buildings
In this paper, 32 Catholic religious buildings from the historic town of Ouro Preto were selected to comprise the proposed dataset. These buildings were chosen for their historical, artistic, and architectural significance. Furthermore, six regions (districts) of the city of Ouro Preto were included: Amarantina, Cachoeira do Campo, Glaura, Santo Antônio do Leite, São Bartolomeu, and the central district (referred to in this study as the Center of Ouro Preto). It is noteworthy that the historic center of Ouro Preto was home to the majority of the analyzed buildings (15 monuments), as this area features much construction in the Baroque and Rococo styles from the 18th century.
Table 2 shows the number of photographed buildings by region of the city of Ouro Preto, and
Figure 3 illustrates the locations of the visited regions for the development of the ImageOP dataset.
Table 3 presents the complete list of the 32 photographed religious buildings (names in English and Portuguese).
Figure 4 (small churches—chapels) and
Figure 5 (churches) show examples of images of each of the photographed monuments. More information about these historical buildings can be obtained from the website of the Municipal Government of Ouro Preto
1,2.
3.4. Data Collection of Building Components
In this paper, the data collection process for the development of the new ImageOP dataset was carried out in the following stages: (1) definition of the camera; (2) definition of the building components to be photographed; (3) photography conducted in the historic town of Ouro Preto; (4) data annotation.
Figure 6 shows examples of the authors’ engagement in the image collection process in the town of Ouro Preto.
Initially, the Kodak
® PIXPRO AZ255 (Eastman Kodak Company, Rochester, New York, USA) digital camera was chosen for the image collection process. This equipment features a 25× optical zoom, which is suitable for capturing close-up photographs of the elements of historical buildings. Other characteristics of this digital camera model include a 16 MP CMOS sensor, optical image stabilization, 1080p full HD video, a 24 mm wide-angle lens, and a 3″ LCD (460K pixels).
Figure 7 displays images of the Kodak
® PIXPRO used.
Next, the facade elements of the historical buildings were defined for data collection. In this regard, the characteristics of the religious constructions present in the historic town of Ouro Preto were observed. From the images in
Figure 8, two common typologies of buildings can be identified: church (
Figure 8a) and chapel (
Figure 8b). Generally, the larger religious buildings in Ouro Preto, as exemplified in
Figure 8a, are characterized by having a fronton with a cross (central upper part), doors (front and side), windows (front and side), and towers. In contrast, the smaller monuments, such as the one shown in
Figure 8b, typically feature a central door and a fronton with a cross. Thus, the following building components were defined for photography: frontons, doors, windows, and towers. Additionally, images of complete religious monuments (containing more than one element in the photograph) were also collected, resulting in a total of five classes for the ImageOP dataset.
Subsequently, the image collection process was conducted in the historic town of Ouro Preto. To this end, the six regions (described in
Table 2) and the 32 religious monuments (listed in
Table 3) were visited. For each of the analyzed buildings, photographs of the facades were taken to gather data for each of the defined building components. In this context, the optical zoom feature of up to 25× was utilized to get close to the desired element of the building for photography. Finally, after recording images of the 32 religious buildings, the data annotation phase was conducted. For this purpose, each photograph was analyzed and organized into digital folders according to the building component present in the image. The next subsection provides a summary of the resulting dataset.
3.5. Dataset Result
In the data collection process, 1613 images of the analyzed religious buildings were recorded, as shown in
Table 4. Furthermore, the final dataset obtained in this paper contains annotated images in five classes:
1. Fronton (Pediment): the upper part of religious buildings, usually accompanied by a cross;
2. Church: class containing photographs of religious monuments (normal and small size) featuring multiple building components in the same image;
3. Window: images of windows (front or side) of religious buildings;
4. Door: images of doors (front or side) of religious buildings;
5. Tower: photographs of towers of religious buildings, typically featuring a bell.
Figure 9,
Figure 10,
Figure 11,
Figure 12 and
Figure 13 show example images available in the ImageOP dataset for each of the five classes (fronton, church, window, door, and tower). The ImageOP dataset, containing all 1613 images, is publicly available on Mendeley Data repository
3.
4. Dataset Benchmarking for Deep Learning Classification
In this section, a benchmark for deep learning classification using the ImageOP dataset is proposed.
Figure 14 illustrates the proposed method for the simulations and experiments conducted in this stage of the paper.
The benchmarking was divided into two main phases: (i) simulations for training and testing deep learning models on the Edge Impulse platform using data from ImageOP (images of the historic town of Ouro Preto, as described in
Section 3); (ii) application of the best model in a practical computer vision application with a mobile device for classifying building components of monuments from three other historic cities in the state of Minas Gerais, Brazil (São João del-Rei, Lagoa Dourada, and São Brás do Suaçuí). These phases are described in the following subsections.
4.1. Simulation Experiments
In this phase, simulated experiments were conducted for training, validation, and testing of deep learning classification using the ImageOP dataset. The Edge Impulse platform
4 was selected for this purpose. Edge Impulse is an online environment designed for training machine learning models, particularly for embedded systems or mobile devices. Additionally, the Edge Impulse platform features an interface for using deep learning models in computer vision applications on smartphones, which is one of the applications utilized in this paper. Subsequently, the ImageOP dataset was uploaded to the Edge Impulse software (Community plan version), and the 1613 images were divided into training (≈
), validation (≈
), and testing (≈
) sets, as shown in
Table 5.
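A division into training, validation, and testing sets of this kind can be reproduced outside the platform. The following is a minimal sketch of a per-class (stratified) split, assuming the images are already grouped by label; it is not the procedure used by Edge Impulse, and the function name and the fractions passed by the caller are hypothetical, since the proportions used in the paper are given in Table 5.

```python
import random


def split_by_class(items_by_class, train_frac, val_frac, seed=0):
    """Shuffle each class separately and cut it into train/val/test lists.

    The test share is whatever remains after the train and val fractions.
    """
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave a non-empty share for testing")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = int(len(shuffled) * train_frac)
        n_val = int(len(shuffled) * val_frac)
        splits["train"] += [(x, label) for x in shuffled[:n_train]]
        splits["val"] += [(x, label) for x in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in shuffled[n_train + n_val:]]
    return splits
```

Splitting class by class keeps the class proportions roughly equal across the three sets, which matters when some components (for example, towers) are photographed less often than others.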
Subsequently, two convolutional neural network architectures available on Edge Impulse were selected: MobileNet V2 [
16] and EfficientNet B0 [
17]. These architectures are extensively discussed in the literature [
30,
31,
32,
33,
34], particularly for their performance in applications that require low computational cost, such as those using mobile devices. The hyperparameter settings were maintained according to the default configurations of the Edge Impulse software: number of training cycles = 20, learning rate = 0.0005, and batch size = 32.
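To make the scale of this training configuration concrete, the sketch below records the three stated hyperparameters and estimates the number of mini-batch weight updates in a run. It assumes that one training cycle corresponds to one full pass over the training images; that interpretation, the class and function names, and the training-set size passed by the caller are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    training_cycles: int = 20      # value stated in the text
    learning_rate: float = 0.0005  # value stated in the text
    batch_size: int = 32           # value stated in the text


def total_weight_updates(n_train_images, cfg=TrainingConfig()):
    """Mini-batch updates over the whole run, assuming one pass per cycle."""
    batches_per_cycle = math.ceil(n_train_images / cfg.batch_size)
    return batches_per_cycle * cfg.training_cycles
```

For example, a hypothetical training set of 1000 images would give 32 batches per cycle and 640 updates over 20 cycles.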
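The four performance metrics evaluated (accuracy, recall, precision, and F-score) are presented in Equations (1)–(4):
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{2} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{3} \]
\[ \text{F-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \]
where TP denotes true positives, TN denotes true negatives, FP denotes false positives, and FN denotes false negatives.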
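The four metrics can be computed directly from the TP, TN, FP, and FN counts. The following is a minimal sketch using the standard definitions, with the F-score taken as the harmonic mean of precision and recall; returning 0.0 when a denominator is zero is an assumed convention, and the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, recall, precision, and F-score from raw counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    # Harmonic mean of precision and recall; 0.0 if both are zero (assumed convention).
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
    }
```

For a multi-class problem such as this one, the counts for a given class are obtained by treating that class as positive and all others as negative.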
4.2. Simulation Results
This section presents the results of classifying historical monuments in Ouro Preto using convolutional neural networks (CNN). Two architectures were compared: MobileNet and EfficientNet. The evaluation was conducted using the dataset developed by this work, ImageOP, and the results were analyzed in terms of accuracy and F-score. Detailed comparisons between the two architectures are presented to identify which offers superior performance in the classification task.
Table 6 and
Table 7 present a summary of the results for the two analyzed neural architectures. It can be observed that the EfficientNet architecture achieved the best performance across all four evaluated metrics. Notably, the F-score attained by the EfficientNet model (
) was
higher than the results from the MobileNet architecture (
). Additionally, EfficientNet achieved the highest accuracy for the four evaluated classes: fronton (
), church (
), window (
), and tower (
).
For the MobileNet architecture,
Figure 15 displays the results from the training and testing history for (a) accuracy and (b) loss.
Figure 16 presents the confusion matrix for the experiments using the MobileNet architecture. As observed, the architecture achieved good performance and generalized well on the ImageOP dataset.
Similarly, for the EfficientNet architecture,
Figure 17 presents the training and testing history results for (a) accuracy and (b) loss. EfficientNet also performed well, showing consistent improvement in both metrics throughout the training process.
Figure 18 shows the confusion matrix for the EfficientNet model. The EfficientNet architecture achieved an F-score of about 92% across all classes, slightly outperforming MobileNet. This indicates EfficientNet’s superior capability in accurately classifying the historical monuments in the ImageOP dataset.
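Per-class scores of the kind discussed here can be derived from a confusion matrix by treating each class in turn as the positive class. The following is a minimal sketch, assuming rows hold the true class and columns the predicted class; that orientation and the function names are assumptions, and any matrix passed in for illustration would be toy data rather than the values in Figures 16 and 18.

```python
def per_class_scores(matrix, labels):
    """matrix[i][j] = number of images of true class i predicted as class j."""
    scores = {}
    n = len(labels)
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # rest of the row
        fp = sum(matrix[i][k] for i in range(n)) - tp  # rest of the column
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f_score": f_score}
    return scores


def macro_f_score(scores):
    """Unweighted mean of the per-class F-scores."""
    return sum(s["f_score"] for s in scores.values()) / len(scores)
```

The macro average weights every class equally, so a weak result on a small class lowers it as much as a weak result on a large one.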
Table 8 also presents examples of inference results for ten test images. It can be observed that the model using EfficientNet achieved a performance of
in correctly identifying the category of the building component in the photographs. Thus, the deep learning model trained with EfficientNet was selected for the practical experiments with a mobile device, as described in the next subsection.
4.3. Computer Vision Using Mobile Device
In this phase, real-world experiments were conducted to validate the proposed dataset for deep learning classification. The best model (with EfficientNet architecture) obtained during the training and testing phase was utilized. Additionally, a mobile device (Samsung A14 smartphone) was employed for real-time detection of building elements using computer vision. Access to the trained model on the Edge Impulse software was achieved through the smartphone. It is noteworthy that in this phase, nine religious buildings from three historical cities in the state of Minas Gerais, Brazil (São João del-Rei, Lagoa Dourada, and São Brás do Suaçuí), were selected. These cities feature historical monuments with architecture and construction periods (18th and 19th centuries) similar to the churches in Ouro Preto. The objective was to evaluate the performance of the proposed dataset and trained model in real-world environments not included in the ImageOP dataset images.
Figure 19 illustrates all the steps of the proposed process for applying computer vision on a smartphone to recognize the elements of religious buildings. As described in the previous sections, photographs were initially collected in the historic town of Ouro Preto for the development of the ImageOP dataset with 1613 images. Subsequently, the deep learning models were trained on the Edge Impulse platform. The Edge Impulse software then makes the trained model available on mobile devices through a QR code. On the smartphone, the application opens in a web browser. The user can then point the device’s camera at the church whose components are to be detected, and the real-time recognition results are shown on the smartphone interface.
Figure 20 and
Figure 21 present examples of the computer vision process using a mobile device in the historic town of São João del-Rei. It can be observed (
Figure 21) that when the smartphone camera is pointed at the element of the religious building, the inference is displayed on the Edge Impulse web interface. In this case, the classification was performed correctly, identifying the element as window (
janela in Portuguese).
Figure 22 displays examples of real-time classification through screenshots of the Edge Impulse graphical interface accessed on the mobile device. Five examples of correct inferences for the five classes discussed in this paper can be observed.
Table 9 presents the real-time classification results for each of the nine religious monuments evaluated during the practical experiments using computer vision and a mobile device. It can be observed that an accuracy of up to
was achieved for the detection of building components in the case of the Church of Saint Francis of Assisi (São João del-Rei). Furthermore, the accuracy values in this phase were at least
for all the religious buildings assessed. Finally, it is noteworthy that the average accuracy reached
, indicating the effectiveness of the proposed dataset and models in classifying elements of religious buildings using computer vision.
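The per-building and average accuracies reported for the field tests can be tabulated from counts of correct and total inferences. The following is a minimal sketch with hypothetical inputs; the text does not state whether the average is taken over buildings or over all inferences, so both are computed, and the function name is an assumption.

```python
def accuracy_summary(results):
    """results maps a building name to a (correct, total) pair of inference counts."""
    per_building = {name: c / t for name, (c, t) in results.items()}
    # Mean of the per-building accuracies: every building weighs the same.
    mean_over_buildings = sum(per_building.values()) / len(per_building)
    # Pooled accuracy: every inference weighs the same.
    pooled = sum(c for c, _ in results.values()) / sum(t for _, t in results.values())
    return per_building, mean_over_buildings, pooled
```

The two averages coincide only when every building contributes the same number of inferences, so it is worth stating which one is reported.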
5. Conclusions
The main contribution of this work is the proposal of ImageOP: The Image Dataset With Religious Buildings in the World Heritage Town of Ouro Preto for Deep Learning Classification. This dataset comprises 1613 images of religious buildings from the historic town of Ouro Preto (State of Minas Gerais, Brazil). Ouro Preto was the first Brazilian World Heritage site recognized by UNESCO and houses numerous well-preserved historical monuments from the 18th and 19th centuries. Thus, this paper contributes to the field of heritage digitization through images for the application of artificial intelligence in recognizing five classes: fronton, door, window, tower, and church. Additionally, this study contributes a new methodology for visual inspection using deep learning for the recognition of architectural elements in religious buildings. For this purpose, experiments were conducted in simulation and practical inspection stages using computer vision and mobile devices for the detection of components in cultural heritage.
The proposed experiments evaluated two traditional convolutional neural network architectures from the literature: MobileNet V2 and EfficientNet B0. The results from the simulations showed that the model trained with the EfficientNet architecture achieved the best performance, with accuracy = , precision = , recall = , and F-score = . Furthermore, this model achieved an accuracy greater than in detecting the five classes: fronton (), church (), window (), door (), and tower (). In general, the simulated results showed that the MobileNet architecture performed worse than EfficientNet, although its metrics were also superior to : accuracy = , precision = , recall = , F-score = , fronton (), church (), window (), door (), and tower (). Thus, the simulated results show the capacity of the proposed method to detect components in religious buildings using artificial intelligence.
The results from real-world experiments with the EfficientNet architecture using computer vision and smartphones also confirmed the effectiveness of ImageOP and the deep learning models in classifying building components. In this phase, accuracies exceeding were obtained for detection of elements in nine religious monuments. It is noteworthy that the buildings evaluated in this final phase are from three other historic cities in the state of Minas Gerais, Brazil, meaning they do not have data stored in ImageOP, thereby reinforcing the dataset’s potential for generalization and inference of building components.
In future work, it is anticipated that the proposed dataset will be utilized in various applications. The adoption of ImageOP is suggested for the development of an automatic categorization and digitization system for Brazil’s historic religious heritage. Additionally, the development of a computer vision system for detecting pathologies in the elements (fronton, window, door, and tower) of historical buildings is expected. Other technological devices using the ImageOP dataset will also be proposed, such as a mobile robot prototype that recognizes and describes religious buildings for tourists with visual impairments. Furthermore, improvements to deep learning architectures for recognizing historical religious monuments are expected. To this end, we intend to apply hyperparameter tuning methods to increase the accuracy of convolutional neural networks and thus enhance the performance of architectures aimed at computer vision tasks on mobile devices, such as MobileNet and EfficientNet. Finally, the data collection process can be improved in future datasets by using unmanned aerial vehicles to photograph and visually inspect the roofs of religious buildings.