1. Introduction
Archaeological remains, such as ceramics, can be found either below the ground or on the surface. These remains are evidence of historic and prehistoric activities [1]. As stated by Orengo and Garcia-Molsosa (2019) [2], the dispersion analysis of surface remains provides researchers with information related to potential changes in land use or the destruction of sites.
The surface survey is a straightforward method for discerning settlement patterns and forms of past human behaviour in the landscape. In addition, this method can be used to study the interactions between past populations and their natural environment and to discover archaeological heritage for protection and management purposes in the rapidly developing and changing modern landscape. Nevertheless, traditional ground surface surveys have several limitations, including the following: (a) they are time-consuming, (b) they require training, (c) they are based on sampling, mainly conducted using grids, (d) only the parts of the archaeological record that are exposed on the land surface can be detected, (e) methodological decisions may not be sufficient to reach the goals of the survey, and (f) certain areas cannot be surveyed due to their surface conditions, accessibility and other environmental conditions (lighting, weather, flora, fauna, etc.) [2].
In recent years, remote sensing science has been increasingly applied to support archaeological research [3,4]. The ever-increasing use of space-based remote sensing applications has been supported by the technological development and improvement of space-based sensors, spatial and spectral resolution, and the implementation of open access and the free distribution of satellite datasets (Landsat and Sentinel products) [5]. However, traditional pattern recognition methods such as photo interpretation may prove inapplicable in archaeological research that covers large areas or searches an extensive archival dataset. A crucial factor determining the success of surface research is the research methodology, which may need to be revised or made more reliable. Consequently, it is difficult to accurately evaluate the results and the validity of their interpretation, which affects whether the research objectives can be considered to have been met.
The development of remote sensing over the last 20 years has incentivised the exploration of new possibilities in archaeological research [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. Archaeological research using remote sensing approaches has been prompted to exploit geospatial data systematically. In addition, low-altitude systems have been democratised, and relatively low-cost drones have been broadly adopted in archaeological research in the last decade, primarily for documentation purposes [24]. Concurrently, archaeological computational approaches and advanced artificial intelligence (AI) algorithms are increasingly applied in cloud-based systems rather than in desktop-based environments [2]. AI is attracting widespread interest across various scientific disciplines due to its increasingly powerful predictive capabilities [1]. Therefore, archaeologists can more fully exploit the knowledge gathered from extensive archaeological data through AI [25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]. This enables them to make informed decisions about conservation and protection procedures for archaeological elements. Moreover, AI helps determine the most suitable excavation points in a complex cultural landscape.
The analytical tools used to support archaeological research have evolved during the last decade. This evolution includes techniques like machine learning (ML) combined with geometric morphometry. Machine learning can make the detection of archaeological remains more accurate without requiring explicit programming. Lately, artificial intelligence has also been used through deep learning (DL) [49], which processes archaeological data using artificial neural networks with representation learning. A recent study (2002) [51] indicated that most of the ML and DL algorithms used in archaeology are for object classification and detection. Nevertheless, the detection of archaeological structures using DL algorithms still needs to be improved, specifically when employing aerial/drone imaging. One could argue that we are at the beginning of a new era of so-called “remote sensing archaeology” if we consider that all the changes mentioned above occurred in a relatively short period.
Overall, the findings of Agapiou et al. [26], together with the results presented by Orengo and Garcia-Molsosa [2], showed that the application of deep learning algorithms to Unmanned Aerial Vehicle (UAV) images can be a ground-breaking innovation in the field of archaeological research, supporting future archaeological field projects. Additionally, it offers a cost-effective option that provides faster results when applied under favourable conditions, mainly in cases where the research time is restricted. However, its success and accuracy are influenced by multiple factors. To improve both the survey design and the results, additional complementary procedures like observation methods, remote sensing and AI techniques can be combined. Consequently, archaeological remains can be detected more accurately by combining remote sensing, machine learning, and deep learning techniques. This will lead to a better understanding of the close relationship and interaction between man and the environment. By studying the environment of the past, we can better approach the study of man and culture and their potential interactions with the landscape in the past.
Our study aimed to investigate the feasibility of developing a semi-automatic method for detecting archaeological features using artificial intelligence in UAV images (multispectral and RGB). The research work of this study was implemented in a simulated field where low-altitude flights were carried out using UAV sensors. The simulated field was an area where no indication of archaeological remains existed. It was given the appearance of a real archaeological field, investigating synthetic elements with known properties like rocks, crops, slopes, soil, and ceramics. We used RGB and multispectral images in the developed methodology, applying artificial intelligence techniques to identify surface archaeological ceramics. The methodology initially included using supervised machine learning classifiers like Random Forest, Support Vector Machines, etc. Then, in a second step, improvement techniques for both data and classifiers were applied. Finally, various evaluation metrics were implemented to assess the classification performance and guide the classifier modelling. The initial results proved the existence of the “accuracy paradox” in the dataset, with an imbalanced class distribution between the archaeological ceramics and the field.
Furthermore, we aimed to answer research questions more efficiently in terms of time and accuracy of the process, compared to traditional archaeological fieldwork. The overall objective of this study was to evaluate whether using low-altitude and relatively low-cost remote sensing sensors can be efficient in detecting surface ceramics through artificial intelligence and image post-processing techniques. It is important to note that the method presented in this paper does not intend to replace archaeological surface surveys but rather to ensure that more time and resources can be allocated to automated or semi-automated technical procedures necessary for the survey.
2. Case Study
A simulation was carried out over a plot of approximately 90 m² in Alambra village in the Lefkosia District of the Republic of Cyprus (Figure 1). The survey was conducted in May 2022, during a good period of visibility for archaeological material, as the fields in Cyprus had recently been ploughed.
The field selected for the pilot study was chosen as it represented ideal field conditions during a fieldwork period in Cyprus. The area had recently been ploughed, which increased the visibility of ceramics compared to fields with extensive flora, rocks, and shadows, which can reduce the detection efficiency and cause false identifications, as soil shades resemble those of ceramics. The periodically cultivated plot corresponded to scenarios with appropriate soil visibility and offered an ideal ground for detecting ceramics (Figure 2a). This approach allowed for the evaluation of the technique’s performance under the best conditions.
The field was almost flat, with a 2% slope; no ceramic was present. For the simulation research, 365 pieces of ceramics were scattered in the field. The size of the pottery fragments ranged from 3 cm to 6 cm. The colour of the ceramics varied, from reddish-orange to brown, depending on their firing (Figure 2b). The selected area contained no ceramic remains other than those that were placed by us explicitly for this simulation.
3. Materials and Methods
A combination of several recent independent technological developments was applied to the workflow upon which the research was based:
Low-altitude and low-cost UAS have significantly improved their features and have become considerably more affordable to researchers, offering autonomy in flight time for surveying.
Digital photogrammetry is now more user-friendly and accessible thanks to semi-automated workflows that have been integrated into many archaeological workflows [2].
Machine learning (ML) is an element of artificial intelligence that allows software applications to predict outcomes more accurately without requiring explicit programming. Machine learning applications have significantly increased in recent years and have become a usable choice for data mining, analysis, and object detection in archaeological research [1].
Deep learning (DL), a subset of ML, allows computers to simulate human behaviour by processing data using artificial neural networks that incorporate representation learning. Significant growth in this research has also occurred in recent years [1].
Finally, various evaluation metrics were implemented to assess the performance of the classification and guide the classifier modelling.
As previously mentioned, the simulation study presented here aimed to investigate whether we could develop a semi-automatic ceramic detection methodology to answer the research questions. These questions relate to the time required for data processing and to the detection accuracy under typical field conditions. To this end, the workflow incorporated low-altitude and low-cost drone imaging for the detailed recording of the surveyed fields, as well as photogrammetry to merge all these images into one orthoimage. Finally, AI techniques like machine learning and deep learning algorithms were tested to detect and classify ceramic fragments in the photomosaic. In the following paragraphs, this workflow is presented in detail (Figure 3).
3.1. UAV Image Acquisition
We used two drones to acquire images of the selected area of interest. Two flight campaigns were performed on the same day. The first used the DJI Phantom 4 Pro system (spectral bands: Blue (B): 468 nm ± 47 nm; Green (G): 532 nm ± 58 nm; Red (R): 594 nm ± 32.5 nm), while the second used the DJI P4 Multi-spectral system (spectral bands: Blue (B): 450 nm ± 16 nm; Green (G): 560 nm ± 16 nm; Red (R): 650 nm ± 16 nm; Red edge (RE): 730 nm ± 16 nm and Near-infrared (NIR): 840 nm ± 26 nm). For both flights, the height was 20 m above ground level (AGL). The selected height provided orthophotos with a ground sample distance of approximately 2 cm/px, considered sufficient to detect ceramics on the field under survey. The flight time for each campaign was about 20 min.
3.2. Photogrammetric Processing and Computational Processing
The final step included computational processing (AI techniques) to identify and isolate ceramic fragments using the orthophoto mosaic of the captured images. The photogrammetric processing of the photos involved the orthorectification of all photographs and combining them into a single orthophoto mosaic using the Terra software. Orthorectifying the image involves ensuring that the images are geometrically accurate and corrected from lens distortion, camera tilt, perspective, and topographic relief. Therefore, the images were orthorectified and merged into an orthomosaic map using the photos’ metadata, which contained information like drone model, types of camera sensor and lens, and GPS coordinates. After the mosaics were produced (Figure 4), image-processing techniques were applied to detect surface ceramics. The same approach was followed for both UAV flights.
ArcGIS Image Analyst tools of the ArcGIS Pro software were used for computational processing. Within the ArcGIS Pro environment, a training model was created using the Training Samples Manager in the Classification Tools, consisting of three classes: ‘ceramics’ (class 1), ‘soil’ (class 2), and ‘crops’ (class 3). The training sample file included a class name for each class category and a class value containing the corresponding integer (class 1 = 1, class 2 = 2 and class 3 = 3). The initial training data were selected by drawing polygons on top of visible ceramic fragments, bare soil, and crops. The training data were created by assigning to each class the values of the pixels delimited by the polygons in each composite band. Four supervised classifiers (K-Nearest Neighbour (KNN), Random Forest (RF), Support Vector Machine (SVM), and the Maximum Likelihood algorithm) were applied. For the SVM, RF, and KNN classifiers, we set the maximum number of samples per class to 500, considering this a high enough number to ensure optimal results. The composite images were then classified using the trained classifiers, producing the first classification output. The classification was compared to the orthomosaic to evaluate how well it fitted. This step included the creation of randomly sampled points for the post-classification accuracy assessment. The Accuracy Assessment Points tool of the Image Analyst tools was then applied to all classification results. Randomly distributed samples were created in each class, with an equal number of samples per class. These samples were then compared with the classification results. Based on the confusion matrix per classifier, we then calculated the user’s and producer’s accuracy for each class, as well as the overall kappa index. This procedure was performed for the images of both drones (RGB and multispectral), and all results were extracted and evaluated on a local computer.
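The accuracy measures named above can be illustrated with a minimal sketch. It assumes a confusion matrix whose rows are the reference classes and whose columns are the classified classes; the function name `accuracy_report` and the counts in `example_matrix` are hypothetical and are not results of this study, nor a reproduction of the ArcGIS Pro tools.

```python
def accuracy_report(matrix):
    """Derive accuracy measures from a square confusion matrix.

    matrix[i][j] is the number of reference samples of class i that the
    classifier assigned to class j (rows = reference, columns = classified).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # reference samples per class
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # classified per class
    diagonal = [matrix[i][i] for i in range(n)]  # correctly classified samples

    overall = sum(diagonal) / total
    # Producer's accuracy: share of reference samples of a class that were found.
    producers = [diagonal[i] / row_totals[i] if row_totals[i] else 0.0 for i in range(n)]
    # User's accuracy: share of samples labelled as a class that truly belong to it.
    users = [diagonal[i] / col_totals[i] if col_totals[i] else 0.0 for i in range(n)]
    # Kappa compares the observed agreement with the agreement expected by chance.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected) if expected != 1 else 0.0
    return {"overall": overall, "producers": producers, "users": users, "kappa": kappa}


# Hypothetical counts for the classes ceramics, soil and crops (50 points each).
example_matrix = [
    [30, 15, 5],
    [2, 45, 3],
    [1, 4, 45],
]
report = accuracy_report(example_matrix)
```

In this invented example the producer's accuracy of the first class (0.6) is well below the overall accuracy (0.8), which is the kind of gap that per-class measures are meant to expose.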
3.3. Supervised Machine Learning Classifiers
This section briefly introduces the most well-developed supervised machine-learning classifiers for detecting archaeological ceramics.
The Random Forest algorithm is a popular supervised machine learning algorithm used in many archaeological classification cases. It is based on ensemble learning and consists of a set of individual decision trees. Each tree is built from a different sample and subset of the training data [52].
The Maximum Likelihood classifier is used for image classification. Its technique is based on two principles: the normal distribution of the pixels in each class sample in the multidimensional space and decision making using Bayes’ theorem. Assuming a normal distribution of the class sample, each class can be described by a mean vector and a covariance matrix. Considering these two characteristics for each cell value, the statistical probability for each class can be assessed to define the cell’s membership in the class [53].
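The decision rule described above can be sketched as follows. This is a simplified illustration that assumes equal prior probabilities for all classes and uses NumPy; the function names and the randomly generated three-band training pixels are hypothetical and do not correspond to the data or the software used in this study.

```python
import numpy as np


def fit_gaussian_classes(samples):
    """Estimate a mean vector and a covariance matrix for every class.

    samples maps a class name to an array of shape (n_pixels, n_bands).
    """
    params = {}
    for name, pixels in samples.items():
        pixels = np.asarray(pixels, dtype=float)
        mean = pixels.mean(axis=0)
        cov = np.cov(pixels, rowvar=False)  # bands are columns
        params[name] = (mean, cov)
    return params


def log_likelihood(x, mean, cov):
    """Gaussian log-likelihood of pixel x, without the class-independent constant."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (logdet + mahalanobis)


def classify(x, params):
    """Assign x to the class with the highest likelihood (equal priors assumed)."""
    x = np.asarray(x, dtype=float)
    return max(params, key=lambda name: log_likelihood(x, *params[name]))


# Hypothetical training pixels (R, G, B) drawn around invented class centres.
rng = np.random.default_rng(0)
training = {
    "ceramics": rng.normal([180, 110, 80], 10, size=(50, 3)),
    "soil": rng.normal([140, 120, 100], 10, size=(50, 3)),
}
params = fit_gaussian_classes(training)
label = classify([178, 112, 82], params)
```

With unequal priors, the logarithm of each class prior would be added to the corresponding log-likelihood before taking the maximum.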
Another supervised classifier, the Support Vector Machine (SVM), is a powerful classification method that can process either a standard image or a segmented raster input. This classification method is widely used among researchers. It seeks the decision boundary that separates the classes with the largest possible margin while minimising the classification error [54].
Finally, the K-Nearest Neighbour is another supervised classifier that classifies a pixel or segment using a plurality vote of its K neighbours. The Euclidean distance from the pixel to the training samples is calculated, the K nearest samples are selected, and the number of these neighbours belonging to each category is counted; the pixel is assigned to the category with the most neighbours [55].
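The plurality vote can be written in a few lines. The sketch below uses only the Python standard library; the function name `knn_classify` and the labelled RGB values are hypothetical illustrations, not samples from the surveyed field.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Classify query by a plurality vote among its k nearest training samples.

    training is a list of (feature_vector, label) pairs; distances are Euclidean.
    """
    by_distance = sorted(training, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Hypothetical labelled pixels (R, G, B).
training = [
    ((180, 110, 80), "ceramics"),
    ((175, 105, 85), "ceramics"),
    ((140, 120, 100), "soil"),
    ((138, 122, 98), "soil"),
    ((90, 140, 70), "crops"),
    ((95, 135, 75), "crops"),
]
label = knn_classify((178, 108, 82), training, k=3)
```

Choosing an odd K reduces the chance of tied votes in a two-class setting, although ties remain possible when three classes are present.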
The final result was compared with the number of scattered ceramics placed at the beginning of the archaeological campaign. An evaluation of the classification was also made for all classes. The results are presented in the next Section.
5. Discussion
Previous results indicated that low-altitude sensors can provide significant detection results but also point out existing research limitations for detecting surface ceramics. These limitations restrict the accuracy of the detection of the minority class of ceramics. To overcome this ‘accuracy paradox’, future studies need to (re)consider ceramic surface detection as an ‘imbalanced data distribution’ problem.
Indeed, in previous studies, a problem with the misclassification of minority classes (i.e., archaeological ceramics) was found. Therefore, despite the high accuracy level, the actual detection rate for the ceramic class remained low. Classifiers tend to predict with higher accuracy classes with extensive data compared to those with few data.
Most classifiers assume a relatively balanced normal class distribution and equal misclassification costs. But when these classifiers are used to classify data with an imbalanced class distribution (skewed class proportions), their performance encounters significant drawbacks (Figure 8). In these datasets, classes with a large proportion of the dataset are called majority classes. In contrast, those with a smaller proportion are minority classes. Sun et al. indicated in 2009 [57] that the modelling can be influenced by factors besides skewed data, like a small sample size, separability, and sub-concepts within a class.
Similarly, widely used accuracy assessments need to be reconsidered. Traditionally, the most widespread metric to evaluate the performance of a classification model is accuracy. In the remote sensing community, the kappa coefficient has been considered an advanced evaluation metric in comparison to overall accuracy (Congalton et al., 1983 [58]; Fitzgerald and Lees, 1994 [59]). Nevertheless, Foody [60] explained that the kappa coefficient is unsuitable for assessing and comparing the accuracy of thematic maps obtained by image classification. This suggests that researchers should abandon the use of the kappa coefficient in accuracy assessments. In addition, the author encouraged them to use a set of simple evaluation metrics and associated outputs, like the accuracy per class and a confusion matrix, to evaluate and compare classification accuracy.
As presented in all the above case studies, these metrics are widely adopted but are not reliable for imbalanced data classification. Joshi et al. in 2001 [61] and Weiss in 2004 [62] reported that accuracy is no longer a proper evaluation metric for classification cases with imbalanced data, since the minority class has an insignificant impact on accuracy compared to the majority class. The preliminary results of accuracy presented in this study confirmed the 2009 study of Prati et al. [63], which stated that it is easy to achieve an accuracy of 99.9% in a domain where the majority class has a 99.9% prevalence. All these observations indicate that archaeological ceramics detection is characterised by imbalanced data related to surface ceramics, soil, and crops, where ceramics represent the minority class, and soil and crops represent the majority classes.
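The 99.9% example can be reproduced with a small numerical sketch. The labels below are invented: one ceramic pixel among 999 soil pixels, and a trivial classifier that always predicts the majority class. The function name `accuracy_and_recall` is hypothetical.

```python
def accuracy_and_recall(y_true, y_pred, positive):
    """Return the overall accuracy and the detection rate (recall) of one class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    # Predictions made for the samples that truly belong to the positive class.
    predictions_for_positive = [p for t, p in zip(y_true, y_pred) if t == positive]
    recall = sum(p == positive for p in predictions_for_positive) / len(predictions_for_positive)
    return accuracy, recall


# Invented labels: the majority class has a prevalence of 99.9%.
y_true = ["ceramics"] * 1 + ["soil"] * 999
# A trivial classifier that labels every pixel as the majority class.
y_pred = ["soil"] * 1000

accuracy, recall = accuracy_and_recall(y_true, y_pred, "ceramics")
```

Here the accuracy is 0.999 while the detection rate for the ceramics class is 0, so a high overall accuracy says nothing about whether any ceramics were found.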
Improved classification results would be valuable for further analyses and the development of tools and a workflow to treat imbalanced data or to re-design learning algorithms. At the data level, a possible solution would be rebalancing the class distribution by resampling the data space. Meanwhile, at the algorithm level, a solution would be to adapt existing classifier learning algorithms to strengthen learning regarding the small ceramics class. Furthermore, boosting algorithms are considered for future work facing the problem of imbalanced data.
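One data-level option can be sketched as follows. The example uses random oversampling, which is only one of several possible resampling strategies and is chosen here purely for illustration; the text above does not commit to a specific technique. The function name `random_oversample` and the sample values are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        # Draw with replacement; k is 0 for classes already at the target size.
        extra = rng.choices(pool, k=target - count)
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Invented data: 2 ceramic samples against 10 soil samples.
samples = list(range(12))
labels = ["ceramics"] * 2 + ["soil"] * 10
balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

Because oversampling only repeats existing minority samples, it should be applied to the training data alone, so that the accuracy assessment is not performed on duplicated points.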
6. Conclusions
Our study aimed to investigate whether it is possible to detect archaeological ceramics in an automated way by applying artificial intelligence techniques to high-resolution images captured with UAVs. In addition, we aimed to provide answers regarding the development of a methodology that will perform efficiently in terms of time and accuracy compared to traditional archaeological field surveys. Thus, supervised machine learning algorithms were implemented using RGB and multispectral UAV images.
The overall findings of this study in a simulated environment, utilising the methodology presented by Orengo and Garcia-Molsosa [2], showed that low-altitude remote sensing sensors can be innovative in archaeological research. The classifiers tend to predict majority classes with high accuracy, while they perform poorly when predicting minority classes. In our study, a methodology was proposed to overcome this problem and detect surface ceramics using RGB and multispectral drone images.
In this paper, the detection of ceramics was limited to a single cluster of ceramics (one type), as this was the current archaeological record. Nevertheless, the authors expect to investigate the detection of different clusters of ceramics in the same area, i.e., archaeological findings of different chronological periods with different typologies and spectral behaviours. Of course, the detection of various classes of ceramics during the same flight requires a (statistically) significant spectral separability of the different types of ceramics. Controlled and laboratory spectral measurements may provide further insights into this direction (e.g., spectral windows to optimise and enhance the separability of the ceramics).
Future work will include new drone survey campaigns with surface ceramics in the same simulated and known archaeological area. These campaigns will increase the data available for training the algorithms and apply all the methodologies to evaluate and compare the results. Further applications include flights at different heights and further analyses using deep learning algorithms. Other classification improvements will include eliminating random noise, filtering noise, and separability or a combination of all of them, obtaining a combination of new data, modifications of the supervised classifiers that were used, and implementing other boosting algorithms. Evaluating imbalanced ceramics data will also assess the sensitivity of such data using other evaluation measures like F-measure, G-mean, and ROC analysis. These types of measures are ideal evaluation measures because they consider only the positive classes in the performance (True Positive Rate (TPrate) and Positive Predictive Value (PPvalue)). The basic steps of the future research methodology are illustrated in Figure 9.
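As a sketch of how such measures could be computed for the ceramics class, the snippet below derives the TPrate, the PPvalue, the F-measure and a G-mean from the counts of true positives, false positives and false negatives. The G-mean is taken here as the geometric mean of TPrate and PPvalue, which is an assumption made to match the positive-class focus described above; another common definition uses the true negative rate instead. The function name and the counts are hypothetical.

```python
import math


def positive_class_metrics(tp, fp, fn, beta=1.0):
    """Evaluation measures that depend only on the positive (minority) class."""
    tp_rate = tp / (tp + fn) if tp + fn else 0.0   # True Positive Rate (recall)
    pp_value = tp / (tp + fp) if tp + fp else 0.0  # Positive Predictive Value (precision)
    denominator = beta ** 2 * pp_value + tp_rate
    # F-measure: weighted harmonic mean of PPvalue and TPrate (beta = 1 gives F1).
    f_measure = (1 + beta ** 2) * pp_value * tp_rate / denominator if denominator else 0.0
    # Assumed G-mean variant: geometric mean of TPrate and PPvalue.
    g_mean = math.sqrt(tp_rate * pp_value)
    return {"tp_rate": tp_rate, "pp_value": pp_value, "f_measure": f_measure, "g_mean": g_mean}


# Invented counts: 30 ceramics detected, 10 false detections, 20 ceramics missed.
metrics = positive_class_metrics(tp=30, fp=10, fn=20)
```

None of these values changes when more correctly classified soil or crop pixels are added, which is why such measures are less affected by the size of the majority classes than overall accuracy.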