1. Introduction
The rapid growth in the number of digital images available from different sources has created a strong demand for efficient object localization and detection methods in fields such as medicine, the military, manufacturing, law enforcement, and forensics. Much of the scientific literature focuses on detecting diseases by analyzing images with object detection or localization algorithms. For example, Jucevičius et al. [1] and Verbukaitė et al. [2] analyzed medical images of glaucoma and prostate cancer. Gupta et al. [3] analyzed image data obtained from an uncrewed aerial vehicle, with the main aim of detecting military vehicles. The Industry 4.0 context has also led to many intelligent or automated solutions based on object detection. Usamentiaga et al. [4] used deep learning algorithms to detect product defects in manufacturing, and Li et al. [5] analyzed metal surface defects using object detection methods, with results that can be applied to improve manufacturing lines. Object detection and localization are also applied in other solutions, for example, for the detection of construction details [6,7] or as a component of travel direction recommendation systems [8].
In law enforcement and forensics, different types of hard biometric data, such as the iris pattern, facial features, or fingerprints, can be used to identify people in an image. Despite significant advances in hard biometric identification, a single biometric characteristic cannot guarantee the desired identification accuracy. Characteristics that are less unique than traditional biometric data are called soft biometric data. As one of the soft biometric features, tattoos are valuable in helping to identify associations, groups, members, gangs, criminals, or victims. Tattoos are considered soft biometrics because, unlike hard biometric characteristics such as fingerprints or the iris, a tattoo on the human body may change over time [9]. However, the accuracy of automatic tattoo identification and detection is challenged by a wide range of artistic compositions, colors, shapes, textures, imaging conditions, and image quality [10], which makes it more difficult to choose the right model. Tattoo detection and identification using deep learning combines deep learning techniques with object detection methods to identify and locate tattoos on the human body. This is a relatively new area of research, as tattoos have traditionally been difficult to analyze and classify automatically due to their complex and highly variable visual appearance. Deep learning techniques can be used to analyze images of the human body and identify the presence and location of tattoos, which can be useful in a variety of applications, including law enforcement, forensic analysis, and medical research. In general, deep learning has the potential to substantially change the way tattoos are analyzed and understood.
The main aim of this paper is to detect tattoos on a person’s body and then link them with the data available in a database to identify the person to whom they belong. An experimental investigation has been performed to determine the influence of various YOLOv5 hyperparameters, data augmentation, and similarity distances on tattoo detection and identification. The main contributions of the paper are as follows:
- (1) A total of 135 models have been trained to determine which YOLOv5 model size (n, s, m, l, x) gives the best tattoo detection results. Different combinations of hyperparameters, namely learning rate, momentum, and weight decay, were investigated;
- (2) The influence of data augmentation parameters on the final tattoo detection results has been investigated. There is a lack of this kind of research, especially in the context of tattoo datasets;
- (3) The efficiency of combining the YOLOv5 algorithm with similarity distances has been experimentally investigated to detect tattoos on a person’s body and link them to a database of tattoos.
The results of this research may be useful for law enforcement and the field of forensics, as well as for other researchers who focus on object detection tasks. During the research, a large number of parameter combinations were used, and five YOLOv5 model sizes (n, s, m, l, x) were thoroughly investigated.
The remainder of the paper is organized as follows. In Section 2, related works are reviewed. In Section 3, the background of the experimental investigation is presented: the YOLOv5 algorithm is introduced for tattoo detection, and the similarity distances used for person identification are described. Section 4 describes the main steps of the tattoo detection and identification process and presents the experimental investigation of data augmentation and hyperparameter selection for YOLOv5. The limitations of the research are discussed in Section 5. Section 6 concludes the paper.
2. Related Works
The literature analysis has shown that one of the first forensic tattoo identification approaches was keyword-based matching. Law enforcement authorities followed the ANSI/NIST-ITL 1-2011 standard, which defines eight major classes (human, animal, plant, flag, object, abstract, symbol, and other) and a total of 70 subclasses (including male face, cat, narcotics, American flag, fire, figure, national symbols, and wording) to categorize tattoos [11] and to assign a single keyword to each tattoo image in the database. However, as Jain et al. [12] explain, searching for tattoo images based on keywords has several limitations in practice: (1) the ANSI/NIST classes define a limited vocabulary that is not sufficient to describe different tattoo patterns; (2) several keywords may be required to accurately describe a tattoo image; (3) human annotation is subjective, meaning that different people can give quite different labels to the same tattoo. These deficiencies of keyword-based tattoo image search have led to the development of content-based image retrieval (CBIR) systems to improve the efficiency and accuracy of tattoo search. To overcome the limitations of keyword-based tattoo matching, Jain et al. [13] proposed a CBIR system called Tattoo-ID that matches tattoos using an image-to-image method. Tattoo-ID extracts key points from tattoo images with the scale-invariant feature transform (SIFT) (Lowe) and uses an unsupervised ensemble ranking algorithm to measure the visual similarity between two tattoos [14].
A brief review of the literature on tattoo detection and identification is presented in Table 1. Research whose main aim was tattoo detection was usually motivated by forensic applications, with the goal of building tattoo content-based image search systems to help law enforcement. Han and Jain [15] proposed a system in which a cropped tattoo is segmented, represented by color, shape, and texture characteristics, and matched against the database. Duangphasuk and Kurutach [16] proposed an approach to the detection and segmentation of tattooed skin that uses negative-image methods in pre-processing to improve the retrieval and matching of tattoo images. The first step in this process was skin detection: the authors used various skin patches to separate human skin color using the HSV (hue, saturation, and value, or brightness) model. In the second step, the negative image method was used to obtain clear graphical images of the tattoo. In the third step, the tattoo segment was extracted from the skin area of the negative image; the resulting negative images of the tattoos can be used for further identification.
The Bag-of-Words (BoW) model based on SIFT features was probably the most popular approach in early CBIR systems for tattoo search [19]. In addition to SIFT features, local binary patterns (LBP) and histograms of oriented gradients (HoG) were used by Wilber et al. [17] and Heflin et al. [24] with support vector machine (SVM) and random forest classifiers for tattoo classification. Although these CBIR systems have been reported to provide quite high accuracy on various benchmarks, they require careful tuning of feature descriptors, vocabulary sizes, and indexing algorithms. The success of deep learning has shifted CBIR methods from handcrafted features and models toward deep learning methods. AlexNet has been successfully used for tattoo vs. non-tattoo classification in the work of Di and Patel [18].
Sun et al. [10] also focused on tattoo image detection and localization. The authors developed TATT-RBDL, a tattoo detector that can classify images with one or more tattoos. A region-based deep learning method (Faster R-CNN) was applied to the domain-specific data, and a tattoo detector was trained using two datasets, one with tattoo images and one with non-tattoo images. Xu and Kong [22] presented another approach, based on decision trees, which achieved only 53.38% accuracy on their own dataset. Recently, Han et al. [20] have also presented a detection model using Faster R-CNN. This model treated detection as part of an image retrieval system in which learning and detection were performed simultaneously. Another basic tattoo detection method [23] uses a GraphCut-based approach, with an accuracy of 70.5%. Silva and Lopes [9] presented a deep learning model based on transfer learning for tattoo detection.
Due to the wide diversity of tattoo types and the lack of image capture standards, the datasets may be quite different, so it is difficult or even unreasonable to compare results across studies. Since the datasets are real-world samples, it is especially important for the efficiency of machine learning methods that the datasets used reflect the same diversity. In addition, for multiclass datasets, class balance is another prominent issue, as classifiers trained on imbalanced data tend to be strongly biased. Unfortunately, many real-world datasets do not follow this principle. For example, the non-tattoo class of the Tatt-C dataset (Table 1), which is widely used in the literature, consists of face images. This may bias a classifier trained on this dataset, which may learn the false concept that images without tattoos are face-type images. Nevertheless, this has been the most used database in tattoo studies to date [9]. The first results on the Tatt-C dataset were published in response to the challenge issued by NIST [26]. According to the report by Ngan et al. [26], four institutions participated in this challenge, with MorphoTrak achieving the best performance, with 96.3% accuracy. Unfortunately, the algorithms developed by the participants in the NIST challenge have not been published. This was criticized by Qingyong Xu et al. [21], who pointed out that it was impossible to perform external validation tests.
Although the concept of tattoo detection and identification is theoretically uncomplicated, the process is not simple and depends on various factors. There are no defined standards for what tattoos look like in terms of shape, color, size, proportion relative to the body, or location on the body. A single image may contain several tattoos, and the background of an image can introduce significant noise into the detection process because a complex background can be confused with the tattoo itself. It is also difficult to compare different studies due to differences in the test procedures, metrics, and datasets used. There are relatively few publications dealing with tattoo detection and identification using deep learning. It should also be noted that most previous tattoo studies were based on the NIST Tatt-C dataset, which was discontinued and is no longer available for download. The lack of standardized datasets for tattoo detection is therefore one of the problems in this field of research. Detection methods such as Faster R-CNN, RetinaNet, YOLO, and SSD, coupled with feature extraction models such as VGG, ResNet, Siamese networks, and triplet networks, have all been applied to tattoo detection and identification [27]. In addition, a diverse set of similarity measures, including Euclidean distance and cosine similarity, is available for evaluating similarities across various data types.
3. Background of the Experimental Investigation
The review of related works has shown (Table 1) that various deep learning-based algorithms can be used for tattoo detection or localization tasks. In our research, the focus is on a real-time object detection task, so CNN and R-CNN methods are not entirely appropriate. When comparing the real-time object detection algorithms YOLO, SSD, and RetinaNet, it should be noted that SSD has low accuracy compared to the alternatives. RetinaNet exhibits better accuracy than YOLO or SSD, but it is less efficient for real-time object detection due to its high computational cost. YOLOv5, in turn, continues to be a widely used real-time object detection algorithm that offers improved accuracy over its predecessors and retains the ability to identify even small objects. Therefore, based on related works, the YOLOv5 object detection algorithm was used in the experimental investigation for tattoo detection. The pre-trained YOLOv5 models were used as a base [28] (Table 2).
During the experimental investigation, all models were trained in an environment with the following specifications: an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20 GHz (10 cores, 20 threads), 32 GB of DDR4 RAM, a Tesla P100 PCIe 12 GB GPU, and a Linux operating system.
After the tattoo has been detected, the person’s identification in the database based on similarity distance has been analyzed. In this investigation, two similarity distances were analyzed: cosine and Euclidean distances. Suppose that we have two images: the tattoo detected in the image, $X = (x_1, x_2, \dots, x_n)$, and the tattoo from the database, $Y = (y_1, y_2, \dots, y_n)$, where $n$ is the dimensionality of the vector corresponding to each datum. In this case, the Euclidean similarity distance can be calculated (1).
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \quad (1)$$
The Euclidean distance is the distance between two data points in Euclidean space. In the context of data analysis, it is often used to find the dissimilarity or similarity between data. Smaller values of the Euclidean distance indicate greater similarity. The cosine similarity distance (2) is useful in the analysis of high-dimensional data, such as in the detection of pairwise similarities in images. The value of cosine similarity indicates the cosine of the angle between two vectors in a multidimensional space.
$$\cos(X, Y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}} \quad (2)$$
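The two measures described above can be illustrated with a minimal Python sketch. The function names and the toy vectors are assumptions made for illustration only; in the investigation itself, the vectors are image feature vectors of much higher dimensionality.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors (0 means identical)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1 means the same direction)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical toy vectors: same direction, different magnitude.
detected = [1.0, 2.0, 2.0]
reference = [2.0, 4.0, 4.0]

print(euclidean_distance(detected, reference))  # 3.0
print(cosine_similarity(detected, reference))   # 1.0
```

The toy pair shows why the two measures can disagree: the vectors point in the same direction, so the cosine similarity is maximal, while the Euclidean distance is still non-zero because the magnitudes differ.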
4. The Experimental Investigation
The idea of automated person identification based on their tattoo is divided into five steps:
Preparation of the tattoo dataset;
The experiments performed on the influence of data augmentation parameters;
The experiments performed on the influence of hyperparameters of YOLOv5;
Preparation of the identification dataset;
Estimation of similarity threshold.
All five steps and their principles are presented in Figure 1 and are detailed in the following sections.
4.1. Preparation of Tattoo Dataset
The analysis of related works has shown that few tattoo datasets are publicly available. In some studies, the authors created their own datasets but did not make them publicly available. Additionally, as mentioned, the Tatt-C dataset was frequently used in related studies, but it has been discontinued and can no longer be downloaded. Other popular datasets, such as the NTU Tattoo dataset and WebTattoo, require a special agreement for research use, which has not been obtained. Therefore, in this paper, only one publicly available dataset, deMSI (Hrkać et al., 2016 [27]), was chosen for model training. A sample of the dataset is presented in Figure 2.
To facilitate the development and testing of their proposed method, the authors of deMSI assembled the dataset by collecting and manually labeling 450 tattoo images from the ImageNet database. Each of the collected images contains one or more tattoos. The authors of this dataset used a ConvNet model in their research and, therefore, annotated each tattoo using a series of connected line segments. Such an annotation is not suitable for this study because the object detection model chosen for tattoo detection is YOLOv5. This model accepts bounding box annotations, where each object in an image is surrounded by a rectangular box that can be described by the coordinates of its top-left corner and its width and height. Therefore, the images in the dataset have been manually annotated with bounding boxes. In this study, a dataset was created that contained a total of 1000 images.
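To make the difference between the two annotation styles concrete, the following sketch derives an axis-aligned bounding box from a polygon outline. It is an illustration only: the images in this study were annotated manually, and the function name and sample coordinates are hypothetical. The output follows the label convention commonly used with YOLOv5, in which a box is stored as a class index followed by the normalized center coordinates, width, and height.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def polygon_to_yolo_box(
    points: List[Point], img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Return (x_center, y_center, width, height), each normalized to [0, 1]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    width = x_max - x_min
    height = y_max - y_min
    return (
        (x_min + width / 2) / img_w,
        (y_min + height / 2) / img_h,
        width / img_w,
        height / img_h,
    )


# Hypothetical tattoo outline (connected line segments) in a 320 x 320 image.
outline = [(40, 80), (120, 60), (200, 100), (160, 220), (60, 180)]
box = polygon_to_yolo_box(outline, img_w=320, img_h=320)

# One label line: class index 0 (tattoo) followed by the normalized box.
print("0 " + " ".join(f"{v:.4f}" for v in box))  # 0 0.3750 0.4375 0.5000 0.5000
```

A box obtained this way is the tightest rectangle around the outline, which may differ slightly from a box drawn by a human annotator.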
4.2. The Experiments Performed on the Influence of Data Augmentation Parameters
As mentioned, the results of tattoo detection can depend on various parameters. First, the influence of data augmentation parameters has been analyzed using the pre-trained YOLOv5l model because, according to the analysis of related work, it is the most suitable model for this type of task. The results of the experiment are presented in Table 3.
All augmentation variations were trained on the deMSI dataset for 300 epochs. On the basis of the experimental results, several observations were made. Crop augmentation gave the highest mAP@0.5 (0.82) together with good precision and recall scores, so it would be an effective option in this study. Hue augmentation showed a good balance between precision and recall, with a decent mAP@0.5 score (0.794), so this augmentation should also be considered. Computational efficiency must also be considered: flip, 90-degree rotation, and blur showed a good balance between precision and recall with moderate mAP@0.5 scores, and they may be computationally efficient. It is also good practice to avoid low-performing augmentations, which in this case were grayscale and rotation; these augmentations had lower mAP@0.5 scores. Depending on priorities, it may be worth excluding them from the final augmentation strategy. To achieve even higher results, it was decided to combine multiple augmentations (Table 4).
Taking into account the mAP@0.5 and mAP@0.5:0.95 metrics, group 4 (90° rotation, shear, and blur) appears to have the best overall performance, with the highest values of both. Therefore, group 4 was chosen as the best-performing augmentation strategy. Based on the results of the experiments, the following pre-processing and augmentation steps were applied in this study.
Resize. The images were resized to a uniform dimension of 320 × 320 pixels, which is a common choice for the YOLOv5 models. This step not only standardizes the input size but also enhances training efficiency;
90° Rotate. The images were rotated 90 degrees during the augmentation process. This rotation can help the model become more robust to object orientations in the training data. It introduces variations in the orientation of objects, making the model more versatile;
Shear. Shearing involves shifting one part of an image in a certain direction, creating a “tilted” effect. In this study, shear was applied horizontally and vertically in a range of ±15°. Shearing introduces distortions that can improve the model’s ability to recognize objects from different perspectives;
Blur. A blur filter was applied to the images, with a maximum blur of up to 2 pixels. Blur helps simulate real-world conditions where images may not be perfectly sharp. It can prevent the model from relying too heavily on fine details and encourage a more generalized understanding of the objects in the images.
Collectively, these pre-processing and augmentation techniques aim to increase the robustness and ability of the model to handle tattoo detection in a wide range of real-world scenarios.
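One practical detail of geometric augmentation is that the bounding box labels have to be transformed together with the pixels. The sketch below illustrates this for the 90° rotation, using a small binary grid in place of a real image. It is not the augmentation tooling used in this study (augmentation tools usually perform this step automatically), and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, width, height), normalized


def rotate90_clockwise(image: List[List[int]]) -> List[List[int]]:
    """Rotate a row-major 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate90_clockwise_box(box: Box) -> Box:
    """Apply the same rotation to a normalized box: (x, y) -> (1 - y, x), w and h swap."""
    x_c, y_c, w, h = box
    return (1.0 - y_c, x_c, h, w)


def mask_to_box(mask: List[List[int]]) -> Box:
    """Tightest normalized box around the non-zero cells of a binary mask."""
    img_h, img_w = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(img_w) if any(row[c] for row in mask)]
    x_min, x_max = min(cols), max(cols) + 1
    y_min, y_max = min(rows), max(rows) + 1
    return (
        (x_min + x_max) / 2 / img_w,
        (y_min + y_max) / 2 / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    )


# Hypothetical 3 x 4 "image" whose non-zero cells mark the tattoo region.
mask = [
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

rotated_box = rotate90_clockwise_box(mask_to_box(mask))
expected_box = mask_to_box(rotate90_clockwise(mask))
print(rotated_box)
print(expected_box)
```

Both printed boxes describe the same region, which confirms that rotating the label gives the same result as re-annotating the rotated image. Shear and blur differ in this respect: shear changes the box shape, while blur leaves the labels untouched.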
4.3. The Experiments Performed on the Influence of Hyperparameters of YOLOv5
To find out which size of the YOLOv5 model (n, s, m, l, x) and which hyperparameters give the best tattoo detection results, an additional experiment has been performed using the data augmentation options selected in the previous experiment. Related works have shown that, in various investigations, usually only three hyperparameters are changed to improve the results: learning rate, momentum, and weight decay. Therefore, in this investigation, the combination of five YOLOv5 models and three hyperparameters has been analyzed. The hyperparameters were varied as follows:
- learning rate: 0.01; 0.001; 0.0001;
- momentum: 0.9; 0.935; 0.95;
- weight decay: 0.0001; 0.0005; 0.0007.
In this way, a total of 135 models were trained and tested (27 models for each size of YOLOv5). The other parameters have been chosen on the basis of the preliminary research performed and have not been changed during the training of all 135 models. This ensured the same conditions throughout the experimental investigation. The fixed parameters of YOLOv5 are as follows:
- image size: 320 × 320;
- batch size: 32;
- number of epochs: 300;
- optimizer: SGD.
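The experimental grid described above can be enumerated with a short sketch, which also confirms the counts: three values for each of the three hyperparameters give 3 × 3 × 3 = 27 combinations per model size and 135 runs in total. The dictionary keys are assumptions chosen for illustration (the names `lr0`, `momentum`, and `weight_decay` follow the naming commonly used in YOLOv5 hyperparameter files), and the sketch only lists configurations without training anything.

```python
from itertools import product
from typing import Dict, List, Union

MODEL_SIZES = ["n", "s", "m", "l", "x"]
LEARNING_RATES = [0.01, 0.001, 0.0001]
MOMENTA = [0.9, 0.935, 0.95]
WEIGHT_DECAYS = [0.0001, 0.0005, 0.0007]

# Parameters kept fixed for every run (values taken from the text).
FIXED: Dict[str, Union[int, str]] = {
    "image_size": 320,
    "batch_size": 32,
    "epochs": 300,
    "optimizer": "SGD",
}


def build_runs() -> List[Dict[str, Union[int, float, str]]]:
    """Enumerate every (model size, learning rate, momentum, weight decay) combination."""
    runs = []
    for size, lr, momentum, decay in product(
        MODEL_SIZES, LEARNING_RATES, MOMENTA, WEIGHT_DECAYS
    ):
        runs.append(
            {
                "weights": f"yolov5{size}.pt",  # pre-trained checkpoint name (assumed)
                "lr0": lr,
                "momentum": momentum,
                "weight_decay": decay,
                **FIXED,
            }
        )
    return runs


runs = build_runs()
print(len(runs))                      # 135
print(len(runs) // len(MODEL_SIZES))  # 27
```

Keeping the fixed parameters in a single dictionary makes it harder for one run to deviate accidentally from the shared conditions.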
In Table 5, the average results for each size of the YOLOv5 model are presented. As can be seen (Table 5), the highest averaged precision is obtained by YOLOv5m (0.87). Slightly lower results are obtained by YOLOv5l (0.86), YOLOv5s (0.85), and YOLOv5x (0.81). The lowest averaged precision, 0.67, is obtained by YOLOv5n, and all evaluation measures are significantly lower for YOLOv5n than for the other model sizes. The highest average recall is obtained by the YOLOv5l model (0.70). In terms of mAP, slightly better results are also obtained using the YOLOv5l model, where mAP@0.5 is equal to 0.79 and mAP@0.5:0.95 is equal to 0.60. In Table 6, the standard deviation for each size of the YOLOv5 model is presented. The deviation is not high, which means that the choice of hyperparameters does not strongly influence the results of the tattoo detection model.
Summarizing the results of the experimental investigation on the influence of hyperparameters, the YOLOv5l model has been chosen as the basis for tattoo detection. The highest values of mAP@0.5 (0.82) and mAP@0.5:0.95 (0.63) have been obtained using the following YOLOv5l hyperparameters: a learning rate of 0.001, a momentum of 0.95, and a weight decay of 0.0001. In this case, the precision is equal to 0.87, and the recall is 0.75. In addition, the graphs of precision, recall, mAP@0.5, and mAP@0.5:0.95 are presented in Figure 3.
4.4. Preparation of Identification Dataset
To correspond to real conditions, an additional dataset was constructed for identity estimation. Twelve persons were asked to provide at least 6 photos of each of their tattoos. The tattoos had to be photographed both under good conditions, with the full tattoo visible, and in lower quality, with only part of the tattoo visible, the tattoo partially covered, etc. An ID was assigned to each tattoo. The best photo of each tattoo was selected as the reference, while the other photos of the tattoo were added to the suspect dataset. Additionally, 10 random unused photos from the deMSI dataset were added to the reference dataset and 44 to the suspect dataset. This makes it possible to reflect situations in which no reference tattoo exists for the analyzed one. In total, there were 43 reference photos and 209 suspect photos.
Using the best model for tattoo detection, both datasets were processed to obtain only the cropped version of the localized tattoo in each photo. After the detection, the reference dataset contained 49 photos, while the suspect dataset contained 245 photos (167 of the tattoos in the reference dataset and 78 not listed in the reference dataset). The increase was affected by the fact that in some photos multiple areas were detected. Sometimes, one tattoo was divided into several parts. In other cases, non-tattoo areas were localized as tattoos.
The suspect dataset was left as it was because this part will have to be performed completely automatically. Meanwhile, the reference dataset was manually revised to keep only photos of good quality. During the revision, redundant or incomplete photos were eliminated, leaving only 39 reference tattoos, with 1 photo for each tattoo.
4.5. Estimation of the Similarity Threshold
For further analysis, each photo was resized to 224 × 224 px and pre-processed for feature extraction with ResNet50. Each photo is represented as an output of dimension (7, 7, 2048), which, after flattening, contains 100,352 values for comparison. This vector was used to estimate the similarity between each suspect and each reference photo. For similarity estimation, the two most commonly used measures were applied: cosine similarity and Euclidean distance. Usually, F-score and accuracy metrics are used to define the best model. Using cosine similarity, the threshold value should be 0.45–0.5 to achieve a 48% F-score and 99% accuracy (Figure 4, left chart). For Euclidean distance, the optimal threshold in terms of F-score and accuracy would be 525, which would allow us to achieve a 46% F-score and 99% accuracy (Figure 4, right chart).
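The gap between a very high accuracy and a moderate F-score can be illustrated with a small sketch. When every suspect crop is compared with every reference tattoo, almost all pairs are non-matches, so the True-negative count dominates the accuracy. The function name and the toy scores below are hypothetical and are not the data from this study. For Euclidean distance, the comparison would be reversed, since smaller values indicate greater similarity.

```python
from typing import Dict, List


def evaluate_threshold(
    scores: List[float], is_match: List[bool], threshold: float
) -> Dict[str, float]:
    """Treat a pair as linked when its similarity score reaches the threshold."""
    tp = fp = fn = tn = 0
    for score, match in zip(scores, is_match):
        predicted = score >= threshold
        if predicted and match:
            tp += 1
        elif predicted and not match:
            fp += 1
        elif not predicted and match:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / len(scores)
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Hypothetical toy data: 20 suspect-reference pairs, only 2 of them true matches.
scores = [0.60, 0.30, 0.50] + [0.10] * 17
is_match = [True, True, False] + [False] * 17

result = evaluate_threshold(scores, is_match, threshold=0.45)
print(result)
```

In this toy case, one match is found, one is missed, and one non-match is linked by mistake, which gives an F-score of 0.5 while the accuracy is still 0.9 because of the 17 correctly rejected pairs.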
However, for the task of linking a suspect photo to a reference photo, accuracy and F-score are not the best measures, as the automation should work as decision support and workload reduction rather than as a replacement for a human. The final results will have to be verified by a person in any case to ensure that no false results are provided. Therefore, False-positive (FP) and False-negative (FN) values are important. Table 7 presents the threshold values under which tattoo identification achieves 100% recall or precision. Accordingly, it provides numbers that indicate how much the workload could be reduced in terms of suspect photos and comparisons. The suspect photo reduction covers the suspect photos that do not require any revision because all of their similarity scores to the reference tattoos fall on the rejecting side of the threshold and, therefore, they will certainly not be linked to any of the reference photos. Meanwhile, the comparison reduction covers the individual comparisons whose similarity values do not meet the threshold; therefore, the suspect photo has to be compared not to all but just to some reference tattoos. The results indicate that cosine similarity is better when recall must be assured, as it allows a reduction of 2% of photos and 20% of comparisons. Meanwhile, if the task is oriented toward assuring precision, it is better to use the Euclidean distance. The difference from cosine similarity is not very large, but at the same 100% precision, it provides a lower False-negative, a higher True-positive, and the same True-negative rate.
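One possible reading of the two workload measures is sketched below for cosine similarity. The recall-safe threshold is taken as the lowest score observed among the true suspect-reference pairs, so that no true pair is rejected. Comparisons below this threshold can be skipped, and a suspect photo is skipped entirely when all of its scores are below it. The function names, the score matrix, and the ground truth are hypothetical, and the exact definitions behind Table 7 may differ.

```python
from typing import List, Optional, Tuple


def recall_safe_threshold(
    scores: List[List[float]], truth: List[Optional[int]]
) -> float:
    """Largest threshold that still keeps every true pair (100% recall)."""
    true_scores = [row[t] for row, t in zip(scores, truth) if t is not None]
    return min(true_scores)


def workload_reduction(
    scores: List[List[float]], threshold: float
) -> Tuple[float, float]:
    """Return (share of suspect photos skipped, share of comparisons skipped)."""
    total_pairs = sum(len(row) for row in scores)
    skipped_pairs = sum(1 for row in scores for s in row if s < threshold)
    skipped_photos = sum(1 for row in scores if all(s < threshold for s in row))
    return skipped_photos / len(scores), skipped_pairs / total_pairs


# Hypothetical cosine similarities: rows are suspect photos, columns are references.
scores = [
    [0.70, 0.10, 0.05],
    [0.20, 0.15, 0.30],
    [0.05, 0.10, 0.08],  # suspect without any reference tattoo
    [0.12, 0.40, 0.25],
]
truth = [0, 2, None, 1]  # index of the true reference, or None

threshold = recall_safe_threshold(scores, truth)
photo_reduction, comparison_reduction = workload_reduction(scores, threshold)
print(threshold)             # 0.3
print(photo_reduction)       # 0.25
print(comparison_reduction)  # 0.75
```

In this toy example, one of the four suspect photos needs no manual review at all, and nine of the twelve comparisons can be skipped without losing a true match.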
The zero reduction in comparisons for the 100% precision-oriented task indicates that the similarity of the suspect photo was the highest among all reference tattoos; therefore, only the True-positive values were left as candidates. If, instead of using a threshold value that allows multiple candidates, each photo were linked to the reference tattoo with the highest similarity (or lowest distance), the False-positive ratio would automatically increase, as not all suspect photos have a reference. For the cosine similarity case with the highest F-score, the F-score increases from 48% to 52%, while the precision decreases from 56% to 42%, the recall increases from 42% to 65%, and the accuracy remains 98%. For Euclidean distance, the accuracy decreases from 99% to 98%, the precision decreases from 82% to 41%, and the recall increases from 32% to 60%, while the F-score remains 46% (Figure 5).
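The effect of linking every suspect photo to its best-scoring reference can be shown with a minimal sketch. The function names and the toy score matrix are hypothetical. The `higher_is_better` flag is an assumption added so that the same function covers cosine similarity (highest score wins) and Euclidean distance (lowest distance wins).

```python
from typing import List, Optional


def link_top1(scores: List[List[float]], higher_is_better: bool = True) -> List[int]:
    """Link each suspect photo to the single best-scoring reference tattoo."""
    pick = max if higher_is_better else min
    return [pick(range(len(row)), key=row.__getitem__) for row in scores]


def count_false_links(links: List[int], truth: List[Optional[int]]) -> int:
    """Count links that point to a wrong reference or to one that should not exist."""
    return sum(1 for link, t in zip(links, truth) if link != t)


# Hypothetical cosine similarities: rows are suspect photos, columns are references.
scores = [
    [0.70, 0.10, 0.05],
    [0.20, 0.15, 0.30],
    [0.05, 0.10, 0.08],  # suspect without any reference tattoo
    [0.12, 0.40, 0.25],
]
truth = [0, 2, None, 1]  # index of the true reference, or None

links = link_top1(scores)
print(links)                            # [0, 2, 1, 1]
print(count_false_links(links, truth))  # 1
```

The third suspect has no reference tattoo, yet it is still linked to the reference with the highest score, which is exactly the kind of False-positive case that the top-1 strategy cannot avoid.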
5. Discussion
In this paper, a solution for automated tattoo detection and identification was implemented. The experimental investigation has focused on different sizes (n, s, m, l, x) of YOLOv5 models, YOLOv5 hyperparameters, data augmentation parameters, and the similarity distances used in the identification stage. The newest versions of YOLO have not been analyzed because these algorithms have not been officially released and can be found in public repositories without any accompanying scientific investigation. Therefore, in this research, YOLOv5, currently the most popular version, has been used. In line with the research performed by other authors, some parameters have not been investigated due to the time cost of training each model. A total of 135 models have been trained to find the influence of three main hyperparameters on the results of tattoo detection. It is necessary to admit that a complete investigation using more combinations has not been performed. Even with these limitations, the results have shown that, for this type of object detection task, the most suitable model is YOLOv5l.
The similarity distances chosen for the identification stage are the most widely used distances in clustering algorithms, similarity detection tasks, recommendation systems, etc. Related works have shown that other similarity distances, such as Jaccard, Spearman correlation, and Manhattan, can also be used for identification, but preliminary research has shown that, in our case, they were not suitable.
6. Conclusions
In this paper, a solution for automated tattoo detection and identification was implemented. This task is multi-stage, as it requires tattoo detection, estimation of the tattoo’s bounds, and comparison to the reference tattoo. Such a solution, in which the second stage is based on similarity rather than on a trained model, has the advantage that the reference dataset is easy to extend: there is no need to retrain the model when reference tattoos are added.
After investigating photo augmentation and the impact of the YOLOv5 hyperparameters on tattoo detection, the highest values of mAP@0.5 (0.82) and mAP@0.5:0.95 (0.63) have been obtained. These were obtained using the YOLOv5l model with a learning rate of 0.001, a momentum of 0.95, and a weight decay of 0.0001, while the photos were augmented using the 90° rotation, shear, and blur options. This model not only achieved the best detection but also led to the highest recall. This is important, as it is better to pass a larger set of instances to the next stage than to miss some tattoos or their elements.
In the similarity estimation between tattoos, the highest accuracy in linking the tattoo photo with one reference tattoo with the highest similarity score reached 98%, while the F-score is up to 52%. This would not be an acceptable accuracy for criminal identification or similar tasks. However, the similarity score can be used for the reduction of manual work revising the possible candidates. By applying cosine similarity, all cases where similarity is less than the threshold value of 0.15 can be ignored. This would decrease the workload by 20% while no False-negative cases would be skipped.
The results obtained during the experimental investigation have shown that tattoo detection and identification tasks require larger models than YOLOv5n. Additionally, the learning rate, momentum, and weight decay parameters have not influenced the results much. Considering the possible deployment of the obtained models in a real environment, such as a real-time detection system, the most suitable are YOLOv5m and YOLOv5l. The training time of these models is lower than that of YOLOv5x; therefore, it would be easy to retrain them using more tattoo images and improve the quality of tattoo detection. In the future, the newest versions of YOLO could be trained and tested under the same conditions to see how they influence the results of tattoo detection and identification.