1. Introduction
Cultural assets are all movable and immovable assets on the ground, underground or underwater that are related to science, culture, religion and fine arts from historical periods or that have been the subject of social life in prehistoric or historical periods and have scientific and cultural original value. Immovable cultural assets consist of rock cemeteries, inscribed, illustrated and embossed rocks, illustrated caves, mounds, tumuli, archaeological sites, acropolises and necropolises; castles, churches, synagogues, mosques, basilicas, monasteries, social complexes, old monuments and wall ruins; and frescoes, embossments, mosaics, fairy chimneys and similar immovable assets. Movable cultural assets consist of tiles, ceramics, sculptures, figurines, tablets, papyrus, parchment or documents written or depicted on metal, ornaments or jewelry, coins, stamped or inscribed tablets, manuscript or illuminated books, miniatures, engravings, oil or watercolor paintings of artistic value, fabrics and similar items.
Today, these assets are usually investigated by characterization methods that cause minimal destruction to the cultural asset, such as Raman Spectroscopy (RS), X-Ray Diffraction (XRD), X-Ray Fluorescence Spectroscopy (XRF), Fourier-Transform Infrared Spectroscopy (FTIR), Laser-Induced Breakdown Spectroscopy (LIBS), Inductively Coupled Plasma Mass Spectrometry (ICP-MS), Scanning Electron Microscopy Energy-Dispersive X-Ray Spectrometry (SEM-EDX) and Atomic Force Microscopy (AFM), and by non-destructive methods such as Portable X-Ray Fluorescence Spectroscopy (pXRF), Macro X-Ray Fluorescence (MA-XRF), Portable Raman Spectrometry, Portable Fourier-Transform Infrared Spectroscopy (Portable FTIR), Reflectance Spectroscopy and Multispectral/Hyperspectral Imaging [1,2,3,4,5,6].
Since the analyses performed on cultural assets enable a period, culture or artist to be recognized and distinguished, they provide professionals such as art historians and archaeologists with information about the originality of a work. For the conservator-restorer, knowing the material chemistry of the work is also important for the conservation-restoration work to be carried out. Nevertheless, the necessity of employing destructive methods to obtain samples from an artwork, which is subject to legal permission, causes hesitation among the individuals and organizations tasked with its preservation. Destructive characterization methods are generally employed on samples obtained from areas that do not impact the aesthetic value of the cultural asset, on shed fragments and on amorphous groups; as a result, they offer information that reflects only a limited part of the entire work. Non-destructive methods provide ease of analysis for portable cultural assets, but since material analyses of immovable cultural assets are limited to the area the analyst can reach, works located on domes and transition elements, such as mosaics, tiles, wall paintings and hand-drawn decorations, cannot be analyzed. Even when the work can be reached, a power source is required for the operation of some devices, and working directly on the work is not always reliable with some instruments, such as a Raman spectrometer, whose laser tends to burn dark colors; preliminary research is required. For these reasons, the analyses conducted to obtain information about a period or artist become a problem for art historians, archaeologists, conservators and restorers.
In addition to the cost, there is a need for a system that can be used easily by everyone to document period materials in archaeology, art history, conservation and restoration science, where analysis and interpretation are also lacking. These drawbacks make it impossible to obtain information about many cultural assets that are the common heritage of humanity, a situation that has recently been addressed in cultural heritage rights and human rights law [7]. This study, prepared to find solutions to these problems in the national and international arenas, aims to develop a non-destructive, cost-minimizing and easy-to-use analytical method for cultural assets. For this purpose, it aims to classify digital (RGB) images of pigments used in all areas of cultural heritage. Harth investigated how pigments in cultural heritage could be studied using machine learning and found that a recent trend in the field is toward spectral imaging techniques for the chemical mapping of paint surfaces. Although the tendency toward this type of examination in the literature is due to its low cost compared to other non-destructive spectroscopic methods, it still requires budgeting. Although classification and identification studies in the pigment and color science literature are carried out with deep learning architectures as a non-destructive method, these studies remain costly because they rely on data from multispectral/hyperspectral cameras and reflectance spectrometers [8,9,10,11,12,13].
In a study by Andronache and colleagues, spectral data of 45 pigments painted on canvas and wood were analyzed using a Mu.S.I.S. NIR camera at 30 wavelengths equally spaced over the range of 400–1000 nm. These data were processed with statistical hierarchical methods, fractal algorithms and complexity measurements. PCA combined with clustering methods allowed the spectral data to be referenced with the Mahalanobis connection distance and highlighted clusters directly related to the intensity differences in the NIR range for the segmented spectral cubes of each panel.
Thanks to this research, it was found that the spectral cube of a painting in the spectral range of 420–1000 nm could be matched with the closest example (plain or overpainted) in the painting-surface database, and the combination of colors or pigments making up a color could be identified. However, this method, which allows for the non-destructive identification of pigments, is again costly, as it requires a multispectral camera [14].
In a study conducted by Mandal et al., pigments imaged hyperspectrally in the near infrared were classified using a CNN as well as the Spectral Angle Mapper (SAM), Spectral Correlation Mapper (SCM), Spectral Information Divergence (SID) and Spectral Similarity Scale (SSS) algorithms, together with the hybrid combinations SID-SAM and SID-SCM. The study determined that the CNN performed better than the other machine learning algorithms [15].
In a study conducted by Pouyet et al., material characterization was achieved by examining the hyperspectral reflectance of historical pigments in the shortwave infrared (SWIR). Within the scope of the study, a new spectral database was developed using a deep neural network (DNN) to eliminate the complexity in the pigment reference data. When a historical image was examined with this database, the model showed good performance in identifying and mapping pigments (spectrum matching) in complex materials, such as unknown mixtures or multi-layered systems [16].
In a study conducted by Chen et al., the compositions of pure pigments (those not containing a mixture of two dyes) in images were determined by the XRF method, and their reflectance was recorded with a hyperspectral camera. In the study, which analyzed a hyperspectral image using a combination of convolutional neural networks and the SCM spectral metric function, image segmentation was performed based on a database of pure elemental reflectance. The segmentation produced accurate results that were verified by analytical techniques, namely XRF analysis [17].
Beyond historical pigments, the literature on pigment identification in the food field shows that similar classification methods have been used. In a study conducted by Prilianti et al., digital images of the three main photosynthetic pigments found in plant leaves (anthocyanin, chlorophyll and carotenoid) were taken with a multispectral camera. A convolutional neural network (CNN) model was developed to provide a real-time analytical system: the input is a multispectral digital image of a plant leaf and the output is an estimate of the pigment content. From experiments conducted with three different CNN models (ShallowNet, AlexNet and VGGNet), the ShallowNet-based architecture proved to be the best for photosynthetic pigment estimation, achieving a satisfactory in-sample MSE of 0.0037 and an out-of-sample MSE of 0.0060 over a real data range of −0.1 to 2.2 [18].
In a classification of black teas of two different qualities by Kazdal, a CNN was used along with algorithms such as SVM and Naive Bayes. For the SVM and Naive Bayes classifiers, classification was performed on datasets of features extracted from the RGB, HSV and YCbCr color spaces of the images. The SVM algorithm achieved a high accuracy of 99% on the test set using features from the YCbCr color space. In addition, owing to its structure, the CNN classified the teas with 98.52% accuracy on the training images and 98.56% accuracy on the validation images without any feature extraction [19].
In a classification of fruits by Büyükarıkan and Ülker, a CNN was again used. They used the ALOI-COL dataset, which consists of 1000 classes imaged at 12 different color temperatures, and classified the fruit images in its 29 fruit classes using the CNN architectures AlexNet, VGG16 and VGG19. The images in the dataset were enriched with image processing techniques, yielding 51 images per class. The study examined two training-test splits, 80–20% and 60–40%. After 50 epochs, the test data were classified with 100% accuracy by the AlexNet (80–20%) and VGG16 (60–40%) architectures and 86.49% accuracy by the VGG19 (80–20%) architecture [20]. Flachot, who investigated the effect of light on color, worked on identifying the correct Munsell chip used as the surface reflectance of an object in scenes of three-dimensional objects in a room, rendered under 278 different natural illuminations, using the ResNet, ConvNet and DeepCC deep neural networks (DNNs). He found that ResNet and ConvNet performed well, while DeepCC represented colors along the three color dimensions of human color vision [21].
In a study on color constancy and illumination estimation by Bianco et al., a CNN was used, and in a study by Choi et al., a deep convolutional neural network (DCNN) and the ResNet18 architecture were used [22,23]. In a study conducted by Huang et al. to estimate pigment mixtures in watercolors, a CNN with a loss function designed to minimize perceptual color differences achieved an average ΔELab of 2.29, with 88.7% of predictions having ΔELab < 5 [24].
Color, an important criterion in image classification, was used by Viscaino et al. in a diagnostic system for middle and outer ear diseases in the field of health. Viscaino et al. developed a computer-aided diagnosis (CAD) system with an F1 score of 96% by training a CNN on the RGB color channels of eardrum images for each ear disease [25]. A study by Sáez-Hernández and colleagues used a chemometric support vector classifier to estimate CIELAB values from the RGB values of images taken with a colorimetrically characterized smartphone, in order to distinguish between different historical inorganic pigments used in murals. The study showed that RGB images taken with a smartphone could be used for color classification [26].
In research conducted by Al-Omaisi Asia et al., a CNN, a deep learning approach, was applied to fundus photographs to distinguish the stages of diabetic retinopathy (DR). To detect DR, the power of the CNN was exploited with different residual neural network (ResNet) structures, namely ResNet-101, ResNet-50 and VGGNet-16 [27]. In a study conducted by Kwiek and Jakubowska, vitamin C was detected from images by creating a color standardization in chemical solutions using deep learning [28]. Singh, who empirically investigated the importance of color in object recognition for CNNs, examined five different datasets with different architectures (MobileNet-v2, DenseNet-121, ResNet-50, BagNet-91, BagNet-9). He found that the different architectures exhibited similar behavior in terms of color importance across the datasets, providing empirical evidence of the high impact of color for CNNs [29]. In a study conducted by Rachmadi and Purnama, a CNN architecture was used to develop a vehicle color recognition system, trained on data from the study by Chen et al. The dataset contained 15,601 vehicle images in eight vehicle color classes, namely black, blue, cyan, gray, green, red, white and yellow. Each sample was resized to a resolution of 256 × 256 with three channels, and four different color spaces, namely RGB, CIE Lab, CIE XYZ and HSV, were used. The best accuracy in the experiments was obtained with the RGB color space, and the resulting convolutional neural network system recognized vehicle color with 94.47% accuracy [30].
Based on the findings of these researchers, deep learning and convolutional neural networks were used in this study because of their success in image classification [31,32].
3. Methods
A dataset was created by taking photographs of the samples. An image is a visual representation of something and is formed as a result of light-related events [40]. As the wavelength in the light spectrum changes, the pixel values in each color channel of an image captured by a camera change, because different wavelengths of light affect the light reflected from the object's surface [41].
Light is energy from a source that reaches our eyes in the form of electromagnetic waves. According to Lambert's law, when light reaches the surface of an object, some portion of it is reflected, some is transmitted and the rest is absorbed. Since the amounts of reflected, transmitted and absorbed light vary with the properties of the object's surface, its bulk structure and the wavelength of the light, the perception of color also changes [40,42,43]. The different colors in electromagnetic waves are due to the different wavelengths (frequencies) and vibrations of these waves; in other words, each color reaches us as vibrations of a different wavelength. Wavelength values for the light colors are given in Table 3 [43]. The whiteness of a light is characterized by the correlated color temperature (CCT) of a lamp and is expressed in Kelvin. White light sources illuminating an object are divided into three groups according to their color temperatures; this grouping is given in Table 4. The table shows that as the color temperature decreases, the image becomes more reddish, and as the color temperature increases, the image becomes more bluish [41].
In natural light, the color temperature changes with the weather and the movement of the sun, so the perception of color changes. The color temperature of sunlight filtered by the atmosphere is 5600 K; this is the color temperature at noon in cloudless weather. In the morning and evening hours, this value drops below 4000 K, while under a clear blue sky it rises to 10,000 K or above. The color temperature of the tungsten incandescent lamps used in homes is 2700 K. Although the amount of light emitted by a lamp is not related to its color, it does affect a viewer's ability to see the object. Another factor that affects the viewer is the background of the surface on which the paint is placed, because light is not absorbed or reflected by the paint alone. For example, since a black background reflects almost no light, the colors on it appear clearer, more vivid and lighter, while a white background reflects almost all light and the colors on it appear darker [43].
For this reason, the images of the samples in the dataset were taken on black and white backgrounds using different light sources and intensities. A Nikon D7100 digital camera (Nikon, Tokyo, Japan) was used for the photography. In natural light shots, the ISO setting was 100, the aperture f/8 and the shutter speed 1/60 s. In artificial light shots, the ISO was set to 200 to make greater use of the light, with the same aperture of f/8 and shutter speed of 1/60 s. The light intensity during the shoots was measured with an Illuminance UV Recorder TR-74Ui lux meter. The photographs were 6000 × 4000 pixels and were taken at the camera's large size setting. No flash or color calibration card was used, because the artificial intelligence would be trained on the pigment type, not on the brightness of the light. In other words, if the artificial intelligence were asked to estimate the color tone of a pigment or the light used, a calibration card could have served as a reference while creating the data; however, the artificial intelligence was asked to recognize the pigment. For this reason, the pigments were photographed under different light sources and light intensities, and a lux meter was used as a more objective means of tracking the color change in the pigment. The photo shoots included the fresco-secco, tempera, tempera grassa, oil paint, watercolor and tone scale samples, the pigments used in the production of the paints, and a further sample prepared to observe the color of a pigment in light and shadow. This sample was prepared on watercolor paper with a mixture of gum arabic and pigment and was folded in four, creating areas of light and shadow, because when painters create their works, they determine the colors of objects according to the light source in the work.
For this reason, while a pigment appears in its own color under daylight in some pictures, it appears darker or lighter where the light varies. This depends on the dominant light source in the picture and how this source illuminates the scene. The artist achieves this change in the pigment by darkening the paint with black or lightening it with white, or by shifting the color toward green, red or orange. Within the scope of this project, only light was used to produce this change in the pigment. The pigments used in the photo shoots and the sample created to observe the color tone under light and shadow are given in Figure 12 and Figure 13.
Natural Light: Since the appearance of the samples in natural light changes according to the angle of incidence of the sun's rays, photographs were taken outdoors at sunrise, noon and sunset on black and white backgrounds. According to the lux meter data, readings of 3415 K were obtained at sunrise; 1390, 8185 and 7591 K at noon; and 5962, 5317, 4651 and 3821 K toward sunset. The photo shooting environment is shown in Figure 14 and Figure 15.
Artificial Light: Artificial light shots were taken under white and yellow light. LEDs were used as the white light source and an incandescent lamp as the yellow light source. The LED shots were taken in a Life of Photo product shooting tent, the upper part of which held 120 LED lamps with a power of approximately 50 W. In this tent, the samples were photographed under two conditions: first, adjusting the LED lights from the lowest to the highest level, and second, keeping the LED lights constant at the highest intensity while selecting different color temperatures on the camera. Under both conditions, photos were taken on black and white backgrounds. An example of the photo shoots is given in Figure 16.
The light readings for the photographs taken under the first condition were 1029 K, 1239 K, 1325 K, 1895 K, 2214 K, 3031 K, 3546 K, 4024 K, 5015 K and 6512 K; these photographs are given in Table 5. Under the second condition, the LED was set to 5000 K to match daylight, and the photographs were taken at the different Kelvin values offered by the color temperature options in the camera's white balance setting; these photographs are given in Table 6. The camera was set to 2700 K for the warm white image, 4000 K for the natural white image and 5600 K for the cool white image. Since a blue tone was observed in the pigments in the photographs taken at 2700 K for the warm white image, they were not used in the dataset.
The second artificial light source, the incandescent lamp, was used to observe the colors of the pigments in a dim yellow-light environment. The light reading for the incandescent lamp was 2888 K on the lux meter. The photo shooting environment is shown in Figure 17.
In addition to natural and artificial light, photographs of the pigments in their wet and dry states were included in the dataset, because pigments can be observed to lighten or darken in color during the wetting and drying processes. These data, obtained while creating the samples, are given in Figure 18.
3.1. Dataset Creation
The photographed samples were cropped to a square shape using the Image Cropper Pro application, and the images of each pigment were numbered and saved in their own folder. The name K1 was used for the red ochre pigment, MM1 for Egyptian blue, SO1 for yellow ochre, YTP1 for green earth and UM1 for ultramarine blue. An example of the pigment folders is given in Figure 19. A total of 8332 images were created: 1643 of Egyptian blue, 1620 of red ochre, 1691 of yellow ochre, 1682 of ultramarine and 1696 of the green earth pigment. In creating the images, care was taken to keep the number of images per pigment close and balanced. The images were then converted to 256 × 256 pixels in the FastStone Photo Resizer application and prepared for the working environment.
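As an illustration of how such a folder-per-pigment dataset can be loaded, the following is a minimal TensorFlow sketch (the root path 'dataset/' is an assumption; the folder names follow the scheme described above):

```python
import tensorflow as tf

# Load the cropped 256x256 pigment images; each subfolder of 'dataset/'
# (K1, MM1, SO1, UM1, YTP1) becomes one class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    'dataset/',
    labels='inferred',       # class labels are taken from the folder names
    image_size=(256, 256),   # resize on load, matching the prepared data
    batch_size=32,
)
print(train_ds.class_names)  # e.g. ['K1', 'MM1', 'SO1', 'UM1', 'YTP1']
```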
3.2. Working Environment
Visual Studio Code 1.87.2 was used as the working environment and Python (version 3.9) as the programming language. Training and testing of the artificial intelligence algorithms were performed on an AMD Ryzen 5 3600 6-core processor and an NVIDIA GeForce GTX 1080 Ti (11 GB) graphics card. Unlike the deep learning models, the SVM machine learning algorithm was run on the processor using Python's scikit-learn library. The TensorFlow library developed by Google was used for the deep learning algorithms, and neural network modelling was performed with Keras. Keras was preferred because it works with the TensorFlow library and makes neural network modelling more practical. Since artificial intelligence models built with libraries such as TensorFlow run more efficiently on graphics cards, the models developed and tested with this structure were processed on the graphics card; processing the data on a graphics card is less costly than on a processor.
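As a quick sanity check of such a setup (not taken from the paper), TensorFlow can be asked whether it sees the graphics card before training:

```python
import tensorflow as tf

# An empty list here means training would silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
```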
3.3. Classification Models
A support vector machine (SVM), a convolutional neural network (CNN), DenseNet, VGG19 and ResNet50 were used to classify the data. The difference between the CNN and the other models was that its architecture was specially designed for the classification of historical pigments, whereas the other models were pre-trained. The reason for using more than one model was to compare the performance of the CNN against the pre-trained models and to find the model that gave the best result in the classification of historical pigments.
In the context of color classification, CNN, SVM, ResNet50, DenseNet and VGG19 exhibit unique advantages and limitations. CNNs are naturally well suited for capturing local patterns like color gradients and texture, making them effective in color-based tasks. However, their performance can degrade on more complex color spaces due to the reliance on fixed filter sizes, which may miss subtle color variations. SVMs, although highly efficient for smaller, well-defined feature spaces, often struggle with the intricate relationships in color data, especially when subtle hues or overlapping shades must be distinguished. SVMs excel in linear separability but require extensive feature engineering to match the flexibility of CNNs.
ResNet50, with its residual connections, is particularly adept at capturing color relationships in deeper networks, allowing it to excel in tasks that require understanding finer color distinctions. The added depth, however, can make ResNet50 overcomplicated for simple color classification tasks, leading to unnecessary computational overhead. DenseNet’s dense connections, which aggregate color features from multiple layers, offer a more compact solution, potentially improving accuracy by reusing learned color features. This reduces the need for redundant computations, but the network’s structure demands higher memory, which can be limiting in resource-constrained environments.
VGG19, while effective in basic color classification, tends to suffer from inefficiency due to its deep, sequential architecture and large parametric size. Though it can handle the hierarchical structure of colors, it lacks the innovations seen in more modern architectures like ResNet50 and DenseNet, making it slower and more prone to overfitting, particularly on smaller color datasets. Therefore, while all the models have their utility in color classification, deeper architectures like ResNet50 and DenseNet offer a more nuanced handling of complex color data, while an SVM and a CNN may suffice for simpler color tasks with fewer computational demands.
3.3.1. Support Vector Machine
Support vector machines are a machine learning method developed by Vladimir Vapnik and Alexey Chervonenkis in the 1960s on the basis of statistical learning theory. The method is used in data mining for classification problems in datasets where the patterns between variables are unknown [44]. An SVM separates the data with a line in two-dimensional space, a plane in three-dimensional space and a hyperplane in multi-dimensional space. The best hyperplane for an SVM is the one with the largest margin between the two classes, and the points closest to the hyperplane are called support vectors. In support vector machines, the training data are used to find the most suitable hyperplane, and test samples are classified according to which side of the hyperplane they fall on [45].
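For reference, the maximal-margin idea described above can be written in its standard textbook form (a sketch, not quoted from [44,45]): for training samples $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, the separating hyperplane $w^{\top}x + b = 0$ is found by solving

$$\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_i\,(w^{\top}x_i + b) \ge 1,$$

which maximizes the margin $2/\lVert w \rVert$ between the two classes; the training points satisfying the constraint with equality are the support vectors.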
3.3.2. Convolutional Neural Network
A convolutional neural network is a supervised machine learning algorithm and a type of artificial neural network used to analyze high-dimensional data. It is a multi-layer feedforward neural network formed by stacking many hidden layers sequentially [46]. A CNN basically consists of three parts: an input layer, hidden layers and an output layer. The hidden layers comprise convolutional layers, pooling layers and fully connected layers [47].
CNNs are designed to learn spatial hierarchies of features automatically and adaptively through backpropagation, using multiple building blocks such as those in the hidden layers [46]. They are used to extract the features of an image and produce a result in line with the purpose for which they are employed [47]. The CNN designed for the classification of the historical pigments had four convolutional layers, four pooling layers, one flattening layer and one fully connected layer. Summary information on the CNN model is shown in Table 7 and its architecture in Figure 20.
The convolutional layers of this architecture, which combine linear and nonlinear operations, apply filters to the image to perform feature extraction and detect colors. The pooling layers reduce the dimensions of the input and make the feature extraction more accurate: they provide a downsampling operation that shrinks the in-plane dimensionality of the feature maps, adds translational invariance to small distortions and reduces the number of subsequent learnable parameters [47]. The third type of layer, the fully connected layer, provides the output needed to classify the extracted features; the output feature maps of the last convolutional or pooling layer are typically flattened and connected to one or more fully connected layers, also known as dense layers [46].
In all convolutional layers of the CNN model designed for the classification of the historical pigments, the ReLU activation function was used; 32 filters were applied in the first convolutional layer, 64 in the second and 128 in the third and fourth. A pooling layer followed each convolutional layer. Classification was performed using a flattening layer, which turned the outputs into a one-dimensional vector, and a fully connected layer. The model used a total of 833,733 parameters. The pigment images were cropped with the Image Cropper Pro application and, after being resized to 256 × 256 with the FastStone Photo Resizer application, were fed to the CNN model for classification without any feature extraction.
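A minimal Keras sketch of the architecture as described follows (kernel and pooling sizes are assumptions, so the exact parameter count will differ from the 833,733 reported above):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # red ochre, yellow ochre, green earth, Egyptian blue, ultramarine

model = models.Sequential([
    tf.keras.Input(shape=(256, 256, 3)),
    # Four ReLU convolutional layers with 32, 64, 128 and 128 filters,
    # each followed by a pooling layer, as stated in the text.
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Flattening layer and fully connected classification layer.
    layers.Flatten(),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.summary()  # prints per-layer output shapes and parameter counts
```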
3.3.3. Densely Connected Convolutional Network
DenseNet is a type of CNN in which all layers (with matching feature map sizes) are directly connected to each other through dense blocks. In this network, each layer receives additional inputs from all preceding layers and passes its own feature maps to all subsequent layers [48]; in other words, each layer is connected to the others in a feedforward manner [49]. In general, using a structure similar to the ResNet architecture, the features produced by a layer in DenseNet are given as input to all subsequent layers [50].
3.3.4. Residual Network 50
ResNet is a type of neural network introduced in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun in the article “Deep Residual Learning for Image Recognition” to facilitate the training of significantly deeper networks. Designed with reference to the VGG-19 architecture, the baseline is a 34-layer plain network with fewer filters and lower complexity than VGG networks. The innovation of ResNet is its block-based solution to the vanishing gradient problem that arises with increasing depth, one of the fundamental problems of deep learning architectures: as the network deepens, the convolutional blocks in the upper layers are connected at intervals to the outputs of convolutional blocks in the lower layers, preventing the gradient from vanishing. The network takes input images of 256 × 256 and has approximately 25 million parameters. With fewer trainable parameters than the VGG architecture, it is widely used in image classification due to its high performance [50].
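To illustrate the skip-connection idea behind this design, the following is a minimal sketch of a residual block (a simplified identity block, not ResNet50's exact bottleneck block; it assumes the input already has `filters` channels):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    # Add the block's input to its convolutional output, giving gradients
    # a direct path back to earlier layers.
    shortcut = x
    y = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = layers.Add()([y, shortcut])   # the identity (skip) connection
    return layers.Activation('relu')(y)
```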
3.3.5. Visual Geometry Group 19
VGG is a convolutional neural network model proposed by K. Simonyan and A. Zisserman of Oxford University in 2014; it achieved successful results in the ILSVRC-2014 competition on ImageNet, a dataset containing more than 14 million images in 1000 classes. The network was created by improving on the AlexNet convolutional neural network model and takes inputs of 224 × 224. The architecture has approximately 143 million parameters and, in the VGG19 variant, sixteen convolutional layers and three fully connected layers [50].
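Since ResNet50, DenseNet and VGG19 were all used as pre-trained models, the sketch below shows one common way such a backbone can be adapted to the five pigment classes (an assumed adaptation; the paper does not state whether the full network or only the head was trained):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A pre-trained backbone without its 1000-class ImageNet head;
# DenseNet121 or VGG19 can be swapped in the same way.
base = tf.keras.applications.ResNet50(
    include_top=False,
    weights='imagenet',
    input_shape=(256, 256, 3),
)

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation='softmax'),  # one output per pigment class
])
# In practice, the backbone's matching preprocessing, e.g.
# tf.keras.applications.resnet50.preprocess_input, is applied to the images.
```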
4. Experiments and Results
In this section, the training experiments and the results obtained from the five models used in this study are discussed in detail.
4.1. Training
The CNN, DenseNet, VGG19, ResNet50 and SVM models were applied for the detection of the historical pigments. Before applying the models, the dataset, consisting of 8332 images of 256 × 256 pixels in total, was divided into 70% training, 20% testing and 10% validation. A total of 1665 images were used for testing across the five models.
For deep learning algorithms, it is generally recommended to decrease the learning rate as training progresses. In the four deep learning models, the initial learning rate was 0.001 and the rate was decreased every 10 epochs. The momentum value was set to 0.9 and the batch size to 32. Each model was run for 20 epochs and the weights with the highest validation value were recorded.
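A sketch of this training configuration in Keras (the SGD optimizer, the decay factor of 0.1 and the checkpoint filename are assumptions; the text only specifies the initial rate, the 10-epoch decay interval, the momentum and the batch size):

```python
import tensorflow as tf

def step_decay(epoch, lr):
    # Start at 0.001 and reduce the learning rate every 10 epochs.
    return lr * 0.1 if epoch > 0 and epoch % 10 == 0 else lr

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(step_decay),
    # Keep only the weights with the highest validation accuracy.
    tf.keras.callbacks.ModelCheckpoint(
        'best_weights.h5', monitor='val_accuracy', save_best_only=True),
]

# The batch size of 32 is set when the tf.data datasets are built.
history = model.fit(train_ds, validation_data=val_ds,
                    epochs=20, callbacks=callbacks)
```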
For the SVM, a classifier with a linear kernel was used to address the classification task. The linear kernel was chosen for its effectiveness in linearly separating data and its computational efficiency in high-dimensional feature spaces. Specifically, the SVM model was implemented using the SVC class from the scikit-learn library, with the hyperparameter C set to 1.0 and a fixed random state of 42 to ensure reproducibility of the results. The regularization parameter C controls the trade-off between maximizing the margin and minimizing the classification error; a value of 1.0 was selected to balance these objectives. The classification performance of the models was evaluated with the accuracy, precision, recall and F1 score measures. The results of the models are given below.
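A minimal scikit-learn sketch of this configuration (the placeholder features are stand-ins; in the study each row would be derived from a 256 × 256 pigment image):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.random((100, 768)).astype(np.float32)  # dummy stand-in feature vectors
y = rng.integers(0, 5, size=100)               # five pigment classes

clf = SVC(kernel='linear', C=1.0, random_state=42)
clf.fit(X, y)

# Reports per-class precision, recall and F1 score, plus overall accuracy.
print(classification_report(y, clf.predict(X)))
```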
4.2. Results
The SVM, CNN, DenseNet, ResNet50 and VGG19 classification results are presented below in Table 8, Table 9, Table 10, Table 11 and Table 12, respectively. According to the SVM result report, the accuracy over all pigments was 97%, and the F1 scores were 97% for green earth, 100% for red ochre, 96% for ultramarine, 98% for yellow ochre and 95% for Egyptian blue. According to the CNN classification report, the accuracy over all pigments was 97%, and the F1 scores were 98% for green earth, 100% for red ochre, 96% for ultramarine, 98% for yellow ochre and 95% for Egyptian blue.
According to the DenseNet classification report, the accuracy over all pigments was 100%, and the F1 scores were 100% for green earth, 100% for red ochre, 99% for ultramarine, 100% for yellow ochre and 99% for Egyptian blue. According to the ResNet50 classification report, the accuracy over all pigments was 100%, and the F1 scores were 100% for each of green earth, red ochre, ultramarine, yellow ochre and Egyptian blue. According to the VGG19 classification report, the accuracy over all pigments was 99%, and the F1 scores were 99% for green earth, 100% for red ochre, 99% for ultramarine, 100% for yellow ochre and 99% for Egyptian blue.
A summary of the results of the models used in the classification of the historical pigments is given in Table 13. McNemar's test was performed to assess the relative performance of the models.
4.3. McNemar Test Results
The McNemar test is a variant of the χ² test and is a nonparametric test used to analyze matched data pairs. It is used to determine the strengths and weaknesses of each learning algorithm in machine learning [51]. This test was conducted to compare the models used in the classification of the historical pigments and to find the most successful model. In the test, the PA value of each of the 1665 test images was calculated, and then the models' PA values for each image were compared with each other. During this comparison, the model with the higher PA value was assigned 1 and the model with the lower value 0; equal PA values were excluded from the calculation. The z value was then computed according to the calculation formula, allowing the PA values over the 1665 test images to be compared [47]. Sample values for comparison are shown in Figure 21.
All PA values were compared as presented in the examples above. For each image, the A(1,0) value was set to 1 when the first model's PA value exceeded the second model's and to 0 otherwise, while the A(0,1) value was set to 1 when the second model's PA value exceeded the first's and to 0 otherwise. The 1 values were then summed over all comparisons and the z values were calculated [47].
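With $n_{10}$ denoting the number of images on which the first model had the higher PA value and $n_{01}$ the number on which the second did, the textbook McNemar statistic with continuity correction takes the following form (a reference sketch; the exact formula applied in [47] may differ in detail):

$$z = \frac{\lvert n_{01} - n_{10} \rvert - 1}{\sqrt{n_{01} + n_{10}}}$$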
The calculated z values are shown in Figure 21. While the directions of the arrows (←, ↑) used in the model comparisons indicate the more successful model, the gray areas show that the comparison was made with the previous model. According to Table 14, the DenseNet, ResNet50 and VGG19 models were more successful than the CNN, and the CNN was more successful than the SVM. DenseNet was more successful than the SVM and VGG19, while ResNet50 was more successful than DenseNet. ResNet50 was also more successful than VGG19 and the SVM, and VGG19 was more successful than the SVM.
When all the models were evaluated, the most successful model was ResNet50, followed in order by DenseNet, VGG19, the SVM and the CNN. The McNemar test results were consistent with the classification results obtained from the code.
4.4. Field Work on the Created Artificial Intelligence System
The aim of this research was to develop a new analytical method that does not require taking samples from painted historical artifacts and has minimal cost. In line with this purpose, we examined whether the trained artificial intelligence could correctly predict pigments from images of cultural assets whose pigment chemistry had been determined by instrumental methods in the literature. Within the scope of this study, images were taken from areas with no paint mixture, where a single tone was dominant. The sample area is described in Figure 22.
In selecting the examples, no attention was paid to the lighting conditions under which the photograph was taken, because the artificial intelligence had been trained on pigment photographs taken under different light sources and intensities. For this reason, it did not matter whether the photograph of the work was taken in daylight (morning, noon or evening) or under artificial light; the dataset was created with these light variables in mind. The photographs of the artworks to be tested were cropped to a square shape in the Image Cropper Pro application and then converted to 256 × 256 pixels in the FastStone Photo Resizer application; this process is shown in Figure 23. The test dataset created for each pigment type included images of artworks as well as images of paints from different companies and with different tonal contents. The real-class versus predicted-class test of the pigments was carried out with the VGG19 model, which had a 99% success rate. The data used in the pigment test and the VGG19 test results are given below.
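A hypothetical sketch of this test step (the model filename, image filename and class order are assumptions):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('vgg19_pigments.h5')
class_names = ['K1', 'MM1', 'SO1', 'UM1', 'YTP1']  # folder names from Section 3.1

# Load one cropped 256x256 photograph of an artwork and add a batch dimension.
img = tf.keras.utils.load_img('artwork_crop.jpg', target_size=(256, 256))
x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]

probs = model.predict(x)[0]
print(class_names[int(np.argmax(probs))], float(probs.max()))
```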
The model did not know which pigment was used in the images loaded into it for testing; which pigment each image contained was known to the data owner from the analyses made with instrumental methods to date. In this way, whether the VGG19 model, trained on the pigments, could correctly classify the pigments used in archaeological and artistic works was tested. The test data are given in Table 15, and visuals of the artworks in Table 15 are given in Figure 24.
The VGG19 model predicted nine yellow ochre pigments with 9/9 accuracy, 11 green earth pigments with 11/11 accuracy, five red ochre pigments with 5/5 accuracy, 11 Egyptian blue pigments with 11/11 accuracy and seven ultramarine pigments with 4/7 accuracy. The model correctly classified the pigment used in the artwork as ultramarine but classified the ultramarine colors produced by different paint companies as Egyptian blue. When this result was examined, the VGG19 prediction proved understandable, because no color of a similar tone had been included in the model's ultramarine training data, and the ultramarine colors produced by different paint companies were closer to the Egyptian blue the model had learned. The findings of this examination are given in Table 16. For this reason, had the model been trained with the ultramarine colors produced by different paint companies, the prediction would likely have reached 100% success like the other pigments.
This was clearly shown by the model's handling of the Egyptian blue and ultramarine colors, which are close to each other in works of art: Egyptian blue is the first synthetic pigment, produced by the Ancient Egyptians to obtain a color similar to lapis lazuli rock, while ultramarine is a pigment obtained from lapis lazuli rock itself. The fact that the model distinguished these two pigments, which have different chemical structures but similar colors, showed that deep learning models are extremely successful. The findings obtained in the test are in line with the model results and the McNemar test, and the paints in the works of art could be classified non-destructively through photographs with a 99% success rate.
5. Conclusions
Cultural assets carry value belonging to humanity's past. Today, destructive and non-destructive analytical methods are used to understand the technology behind this value. Since destructive methods require examining a cultural asset under laboratory conditions, they require taking samples from the work, and since these analyses are subject to legal permission, they create difficulties for art historians and archaeologists.
Non-destructive analytical methods provide the opportunity to work in the field. However, since these analyses are limited to the area the analyst can reach, they mainly offer ease of analysis for movable cultural assets. When both approaches are weighed with their advantages and disadvantages, cost remains a drawback for cultural heritage professionals. This study, prepared to find solutions to these problems in the national and international arenas, aimed to develop a non-destructive, cost-minimizing and easy-to-use analytical method for cultural assets.
For this purpose, historical paints were produced using different painting techniques, and artificial intelligence was trained on photographs of the produced paints. In this way, the paints used in historical artifacts can be examined easily and non-destructively in the created system, at no analytical cost, simply by taking photographs. Since the aim of this article was to develop a new non-destructive method with artificial intelligence, the materials were prepared according to a preliminary study and were therefore limited to four main colors: red and yellow ochre, green earth, Egyptian blue and ultramarine blue, as these are the most commonly used paints in history. Two blue pigments were included to see how a CNN performed in pigment classification. The pigments were applied using the fresco-secco, tempera, tempera grassa, oil painting and watercolor techniques; different binders were used in the paints because they reflect light differently. The produced paints were photographed under natural and artificial light at different light intensities and converted to 256 × 256 pixels, and then the SVM, CNN, ResNet50, DenseNet and VGG19 models were trained. Of the trained models, the SVM and CNN showed 97% accuracy, VGG19 99% accuracy, and ResNet50 and DenseNet 100% accuracy. The performance of the models was examined with the McNemar test, which identified ResNet50 as the most successful model, followed by DenseNet, VGG19, the SVM and the CNN; the McNemar results were consistent with the classification results. The trained VGG19 model was then asked whether it could classify, with their real identities, the paints used in archaeological and artistic works analyzed with instrumental methods in the literature. In the test, VGG19 classified the paints in the photographed works of art with a 99% success rate.
Also, during the VGG19 test, ultramarine blue produced by different companies was predicted as Egyptian blue. The ultramarine blue in the training set did not have the same blue tone as the paints from different companies, and the closest color to them was Egyptian blue. Although it is normal for a model to predict a blue as Egyptian blue when that exact color is absent from the training set, this result clearly showed that the model could be further trained with paints produced by different companies, because companies produce paints with different raw materials or different chemical processes. For example, although the celadonite and glauconite minerals are found in green earth, the percentages of the major elements vary regionally; therefore, the same pigment found in France and in Russia differs in tone.
This article investigated whether pigment photographs could be classified using artificial intelligence. Therefore, this paper is a preliminary study. However, the 99% success rate of the system is promising as it provides a new non-destructive analytical method for the cultural heritage field. The authors believe that it would be useful to conduct the following studies in the future to achieve 100% results and aim to carry their work further:
Using pigments from different regions;
Using pigments produced by different companies;
Using paints produced by different companies and their variations (dark, light, etc.);
Increasing the variety of binders (for example, boiled linseed oil, etc.);
Using varnish varieties in samples;
Creating a tone scale with different chemical white and black pigment colors (including white and black paints produced by different companies);
Aging produced paints in climate chambers.
With this research, it was understood that information about the paints used in cultural assets can be obtained using artificial intelligence. Since the model training was conducted on photographs, each paint in a work of art can be evaluated quickly and practically without analytical costs. This evaluation is valid for paints in which a single tone is dominant; mixtures of more than one paint were not covered. For example, the color transitions a painter uses in facial tones were not included in the scope of the research.
In addition, the model does not provide information about the chemical content of the paint; it only estimates the paint used in the picture, with a 99% accuracy rate. For this reason, it is important to note that spectroscopic methods are still needed for information such as trace elements, which show the region a paint comes from or where the artist obtained it. However, if the system is trained with data (photographs, spectra, etc.) from different civilizations, artists and craftsmen, it has the potential to be used in originality studies.
In the field of paint science, the system can be trained with different pigment types and turned into an application with an interface. Such an application could be used in the field by conservator-restorers, archaeologists, art historians, museum curators, archaeometrists and materials scientists. For this reason, in future studies, the dataset used here will be expanded with greater material diversity, and its continuity will be ensured in different projects.