1. Introduction
The demand for livestock products, including meat and dairy, is experiencing nearly exponential growth due to the expanding global population and increased affordability of these commodities. To enhance productivity, minimize resource wastage, promote animal welfare, and facilitate sustainable and efficient livestock farming practices, the precision livestock farming approach has been developed (
https://geopard.tech/blog/precision-livestock-farming-technologies-benefits-and-risks/ (accessed on 16 November 2023)). In the context of precision livestock farming management, the identification of individual cattle assumes a pivotal role, encompassing tasks such as health monitoring [
1], reproduction management [
2], behavior research [
3,
4,
5], and individual cattle’s performance tracking [
6]. The widely adopted cattle identification technology, Radio Frequency Identification (RFID) [
7,
8], faces various challenges in practical use, including a limited recognition area, tag collisions, and the potential for tag duplication and loss. Consequently, the labor costs associated with applying RFID technology are notably high, and its use is gradually revealing inefficiencies and cost-related drawbacks. In recent times, deep-learning-based approaches for cattle identification tasks [
9,
10,
11] have led to notable advancements, driven by their low cost, non-invasive and stress-free nature, and efficacy.
Nonetheless, the application of deep-learning-based methods for automated cattle's face recognition encounters numerous challenges in real-world farm settings, such as varying illumination, weather conditions, overlapping objects, and fluctuating face orientations. While data are collected during feeding, multiple cattle often share the same feeding trough, resulting in overlapping faces, and as the cattle move their heads while eating, only partial images of their faces are captured. Moreover, cattle assume various postures and orientations during feeding, leading to significant variations in viewing angle. Furthermore, since the data were obtained during three distinct feeding times, the same cattle may feed at different troughs, causing variations in the distance between the cattle and the cameras and, consequently, in image characteristics such as apparent head size. These factors present substantial challenges for accurately identifying cattle's faces; examples are illustrated in the dataset collection subsection. In addition, existing cattle recognition methods primarily operate under the closed-set assumption, which implies that the source (training) and target (testing) datasets share the same (known) classes. This assumption, while suitable for some scenarios, encounters limitations in real-world farm applications, particularly concerning the emergence and re-identification of new (unknown) cattle within the herd.
To tackle these challenges and overcome the limitations of the closed-set assumption, we embraced a more-adaptable open-set [
12] perspective for recognizing known and unknown individual cattle based on their highly distinguishable facial features, a task we term Cattle's Face Open-Set Recognition (CFOSR). At the same time, we enhanced the performance of the closed-set classification task by integrating open-set techniques to obtain a good classifier. In Open-Set Recognition (OSR), the training phase involved only known individuals, while the test phase accommodated the presence of new (unknown) individuals. Specifically, the model, having been trained solely on known individuals, was expected to accurately identify recognized individual cattle and to effectively flag previously unseen individuals during the testing stage. Consider, for instance, the scenario where a new individual is introduced to the barn: under closed-set identification, this unknown individual might erroneously be categorized as one of the known individuals, thereby impeding the accurate monitoring and tracking of individual cattle within real-world farm settings. This issue becomes particularly pronounced when unidentified cattle carry infectious diseases; the failure to promptly recognize them and institute appropriate measures can potentially inflict significant harm on the entire herd.
A prominent challenge in OSR pertains to the absence of unknown individuals during the training phase, thereby confining the model’s learning solely to known individuals’ information [
13]. Moreover, the complexity of CFOSR surpasses that of conventional OSR in the realm of computer vision, predominantly due to the subtle disparities observed among most cattle’s faces. In contrast, prevalent computer vision datasets utilized to evaluate OSR tasks [
12,
14,
15], such as TinyImageNet and CIFAR10, encompass semantically distinct known and unknown classes, such as cats and dogs. In the context of CFOSR, where Hanwoo cattle's faces have a highly similar appearance, a prevailing approach to obtaining a good open-set classifier involves cultivating a more-compact feature space for known individuals, thereby affording more room to unknown individuals [
16,
17]. The whole architecture is shown in
Figure 1.
To address the challenges in CFOSR, on the one hand, we opted for a straightforward, yet impactful loss function known as the Additive Margin Softmax loss (AM-Softmax) [
18]. This choice amplifies the separability between distinct individuals' features while compacting the distances among features of the same individual, which in turn frees additional feature space for accommodating unknown individuals. On the other hand, we harnessed the distance-based Adversarial Reciprocal Point Learning (ARPL) loss to curtail the overlap between the known and unknown distributions. Specifically, the reciprocal point for each known class was derived within the extra-class space, followed by the imposition of an adversarial margin constraint, which confined the extent of the latent open space established by these reciprocal points [
17].
Prior investigations [
15] posited that a good closed-set classifier can offer valuable support for open-set recognition tasks. Within this research, we harnessed transfer learning to acquire an adept closed-set classifier, a strategic move that notably enhanced the performance in the subsequent tasks [
19,
20]. We opted for a Vision Transformer (ViT) model, pretrained on the ImageNet21K dataset, instead of the more-conventionally employed ImageNet1K dataset. Moreover, we harnessed a ViT model that was pretrained on the plant-relevant dataset PlantCLEF2022 [
21]. Interestingly, we observed that a plant-focused dataset also contributed to enhancing the accuracy of the cattle's face recognition. Additionally, we employed transfer learning with a ResNet50 model pretrained on the ImageNet1K dataset.
The remainder of this paper is organized as follows: In
Section 2, we formally define the cattle’s face open-set recognition, introduce our dataset, and provide detailed insights into the proposed method.
Section 3 presents the implementation details and experimental results, showcasing the performance of our model and highlighting the significance of our findings. In
Section 4, we outline the limitations and highlight key contributions. Finally, in
Section 5, we conclude the paper by summarizing the key techniques and suggesting avenues for future research.
2. Materials and Methods
2.1. Problem Definition
In this research, our goal was to develop a robust classifier for Cattle's Face Open-Set Recognition (CFOSR) using a real-world farm dataset. This subsection is devoted to formally defining open-set recognition, specifically CFOSR. Let $\mathcal{D}_{tr} = \{(x_i, y_i)\}_{i=1}^{N} \subset \mathcal{X} \times \mathcal{Y}$ denote the training dataset, where $\mathcal{X}$ represents the input image space and $\mathcal{Y} = \{1, \ldots, K\}$ signifies the corresponding label space. Here, $N$ corresponds to the total number of inputs in the training dataset. Similarly, let $\mathcal{D}_{te} = \{(x_j, y_j)\}_{j=1}^{M}$ characterize the test dataset, with $M$ denoting its overall number of inputs. Operating under the Closed-Set Assumption (CSA), both the training and test datasets share a common label space, $\mathcal{Y}_{tr} = \mathcal{Y}_{te} = \mathcal{Y}$. However, in real-world testing scenarios, novel individual cattle may emerge, a situation that poses a risk when current methods classify the new cattle among the known cattle in $\mathcal{Y}$. Consequently, there arises a desire to extend the closed-set paradigm to embrace the open-set realm.
Mathematically, the test label space in CFOSR is expressed as $\mathcal{Y}_{te} = \mathcal{Y} \cup \{K{+}1\}$, where the additional class $K{+}1$ pertains to the domain of unknown or new individual cattle. A well-trained model is mandated to adeptly categorize a testing image, assigning it either to one of the known individuals in $\mathcal{Y}$ or to the unknown class $K{+}1$.
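To make the open-set decision concrete, the following is a minimal sketch (our illustration; the names `scores` and `tau` are hypothetical) of the thresholding rule that turns per-class confidence scores into either a known-class label or an unknown flag:

```python
import numpy as np

def open_set_predict(scores: np.ndarray, tau: float):
    """Assign a test sample to one of the K known individuals or to 'unknown'.

    scores: shape (K,), per-class confidence for one sample (for ARPL in
            Section 2.3.1, the distance between the sample's features and
            each reciprocal point plays this role).
    tau:    decision threshold tuned on held-out data.
    """
    k = int(np.argmax(scores))   # best-matching known individual
    if scores[k] < tau:          # no known class is confident enough
        return "unknown"         # a new individual (class K + 1)
    return k
```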
2.2. Dataset Collection and Preprocessing
Dataset collection: This study solely utilized video data; no physical experiments or intrusive devices that could disrupt the animals' normal conditions were employed in our research. The dataset comprises video recordings obtained from the “Baksagol” private Hanwoo cattle farm located in Imsil, South Korea. The farm experiences a temperate climate with distinct seasons: cold, dry winters and hot, humid summers, with relatively brief spring and autumn seasons offering mild and generally pleasant temperatures. The animal housing facility was designed with semi-open compartments, allowing for external air ventilation, and each compartment is equipped with indoor ventilators. The floor is covered with sawdust on a basic concrete foundation, while the ceiling consists of opaque Styrofoam steel sheets in some areas and transparent polycarbonate in others.
The experimental barn measured 30 × 12 m and housed 21 cattle ranging in age from 1 to 7 years, including 3 calves. To capture continuous video data for face recognition, three Hikvision (HIK) surveillance cameras with 4K resolution (3840 × 2160) were installed facing the longest side of the barn, with a clear view of the animals' faces.
Figure 2 illustrates the corresponding camera setup. It is important to highlight that we deployed these three cameras at various locations to capture the data during three distinct feeding periods: morning, noon, and night, as illustrated in
Figure 3. To standardize the data, we extracted image frames from the three video cameras at a rate of 15 frames per second (fps); a sketch of this step is given below. We labeled the cattle's face data with the upper-left and lower-right coordinates of each bounding box, distinctly indicating the face's position, as depicted in the lowermost row of
Figure 3. Furthermore, we tracked and annotated each of the cattle based on video data to determine their exact location.
Preprocessing: To build the cattle's face datasets, precise bounding-box annotations were applied to accurately pinpoint the cattle's faces within the original images, and the face images were then obtained by cropping based on these annotations (see the sketch after this paragraph). It is important to note that this aspect of the work was not our primary focus; we extend our gratitude to our research collaborator for providing the cropped cattle-face image datasets. Our dataset spans three days, during which we captured face images across three distinct feeding instances. To enhance the model training, we merged the images from all three feeding times of a given day, as illustrated in the representative examples displayed in
Figure 4. As the figure shows, each of the cattle's faces exhibits multiple angles and orientations, and some images were taken in foggy conditions, which added complexity to the dataset. Notably, the figure also illustrates that distinct cattle often share remarkably similar facial features, significantly augmenting the challenge of accurate cattle identification. In our experimental setup, the images from one day were allocated as the training dataset, while the images from the other two days formed the testing datasets (Testing Dataset 1 and Testing Dataset 2). Testing Dataset 2 contains relatively few overlapping images of two cattle and only a few images showing a tiny portion of a cattle's face. Further details regarding the number of images for each individual in both the training and testing datasets can be found in
Table 1.
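For completeness, cropping a face from an annotated frame is straightforward; the sketch below assumes the annotation stores the upper-left and lower-right corner coordinates, as described above, and the example coordinates are made up.

```python
from PIL import Image

def crop_face(image_path: str, box: tuple) -> Image.Image:
    """Crop a cattle's face given its (left, upper, right, lower) bounding box,
    i.e., the upper-left and lower-right corner coordinates of the annotation."""
    with Image.open(image_path) as im:
        return im.crop(box)

# e.g., crop_face("frame_000123.jpg", (850, 420, 1310, 900)).save("cattle07_000123.jpg")
```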
2.3. Proposed Method
To achieve a well-trained open-set classifier within the CFOSR context, we devised a unified algorithm by incorporating two strategies into a state-of-the-art open-set recognition method.
2.3.1. Cattle’s Face Open-Set Recognition
In this section, we introduce a baseline approach for CFOSR. We leveraged Adversarial Reciprocal Points Learning (ARPL) [17], one of the state-of-the-art techniques in the domain of open-set recognition. The architectural layout, as depicted in Figure 1, encompasses the learning of $K$ reciprocal points during the training phase, where $K$ signifies the number of known individuals. At its core, each input image $x$ traverses the feature extractor $f$ to yield a feature representation $f(x)$. The learned features of the known individuals are strategically positioned to exhibit a notable separation from their corresponding reciprocal points. During the evaluation phase, a given sample is allocated to either the known or the unknown category based on the calculated distances between its features and the reciprocal points: if the distances to all reciprocal points fall below a predefined threshold $\tau$, the model designates the sample as unknown; otherwise, the model assigns the sample to the known class whose reciprocal point is farthest from its features. The ARPL loss combines the cross-entropy loss with the Adversarial Margin Constraint (AMC) loss. Notably, given a sample $x$ and the reciprocal point $\mathcal{P}^k$ of class $k$, the distance $d(f(x), \mathcal{P}^k)$ is calculated by combining the Euclidean distance and the dot product when computing the cross-entropy loss, whereas the AMC loss uses only the Euclidean distance $d_e(f(x), \mathcal{P}^k)$ to mitigate the overlap between the distributions of known and unknown individuals. It is given by:

$$\mathcal{L}_{AMC}\left(x; \mathcal{P}^k, R\right) = \max\left(d_e\left(f(x), \mathcal{P}^k\right) - R,\ 0\right),$$

where $R$ is a learnable margin.
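To make this concrete, here is a minimal PyTorch sketch of the ARPL logits and the AMC term, written from the description above rather than taken from the official ARPL implementation; the tensor shapes, the loss weight `gamma`, and the mean reduction are our assumptions.

```python
import torch
import torch.nn.functional as F

def arpl_loss(feats, recip_points, labels, R, gamma=0.1):
    """feats:        (B, D) features f(x) from the backbone
       recip_points: (K, D) learnable reciprocal points, one per known class
       labels:       (B,)   ground-truth known-class indices
       R:            scalar learnable margin"""
    # Euclidean part: mean squared distance to every reciprocal point -> (B, K)
    d_e = ((feats.unsqueeze(1) - recip_points.unsqueeze(0)) ** 2).mean(dim=2)
    # Dot-product part -> (B, K)
    d_d = feats @ recip_points.t()
    # Larger distance from a class's reciprocal point => more likely that class.
    logits = d_e - d_d

    ce = F.cross_entropy(logits, labels)  # classification over known individuals
    # AMC: bound each sample's Euclidean distance to its own class's
    # reciprocal point by the learnable margin R, limiting the open space.
    d_e_true = d_e[torch.arange(feats.size(0)), labels]
    amc = torch.clamp(d_e_true - R, min=0).mean()
    return ce + gamma * amc
```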
2.3.2. Additive Margin Softmax to Enhance Compactness within the Known Feature Space
Our first strategy was to employ the additive margin softmax loss rather than the cross-entropy loss to obtain a compact feature space. Unlike closed-set classification, OSR models are tasked with generating an unknown score to indicate the likelihood of an input sample belonging to the unknown category, and fine-tuning a threshold on this score yields the final decision. In this case, the advantage of a compact feature space for known classes becomes apparent, since it allocates more space to unknown ones. Inspired by this idea, the Additive Margin Softmax loss (AM-Softmax) [18] was employed, which can be formalized as

$$\mathcal{L}_{AMS} = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{e^{s\left(W_{y_i}^{\top} f_i - m\right)}}{e^{s\left(W_{y_i}^{\top} f_i - m\right)} + \sum_{j=1,\, j \neq y_i}^{C} e^{s\, W_j^{\top} f_i}},$$

where $n$ is the total number of samples, $C$ is the total number of known classes, and $y_i$ is the correct class of sample $i$. The hyperparameter margin $m$ is applied only to the correct class, and $s$ is a scaling hyperparameter. In the equation, $f_i$ denotes the extracted features of an input sample and $W_j$ denotes the $j$-th weight vector of the classifier layer; notably, both the feature vectors $f_i$ and the weight vectors $W_j$ are normalized. With the margin $m$ added to the decision boundary, the AM-Softmax increases the separability between classes and makes the features of the same class more compact in CFOSR, as shown in Figure 5.
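For reference, here is a minimal PyTorch sketch of this loss, directly following the formula above; the default values of $s$ and $m$ are common choices from the AM-Softmax paper, not necessarily those used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxLoss(nn.Module):
    """Additive Margin Softmax: subtract margin m from the target-class cosine
    logit and scale by s; features and class weights are L2-normalized."""

    def __init__(self, feat_dim: int, num_classes: int,
                 s: float = 30.0, m: float = 0.35):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine logits W_j^T f_i with both vectors normalized -> (B, C)
        cos = F.linear(F.normalize(feats), F.normalize(self.W))
        # Subtract m only at the true-class position, then scale by s.
        margin = self.m * F.one_hot(labels, cos.size(1)).to(cos.dtype)
        return F.cross_entropy(self.s * (cos - margin), labels)
```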
2.3.3. Transfer Learning
Our second strategy was to utilize transfer learning to boost the classification performance. Transfer learning leverages knowledge learned from source tasks in different domains to adapt to target tasks, so the model does not need to learn from scratch with large amounts of data [
22,
23,
24]. Benefiting from transfer learning, the model can attain enhanced performance within a relatively short training period. In this study, we leveraged a large ViT model pretrained on the ImageNet21K dataset, which contains a greater diversity of classes and images than the ImageNet1K dataset. Better performance in the target task is often observed when the source dataset comprises a wider array of classes and a substantial number of images [
25]. Moreover, we opted for a ViT model that was pretrained on the plant-relevant dataset PlantCLEF2022 [
21]. Interestingly, we observed that the plant-relevant dataset also contributed to enhancing the performance of the cattle's face identification. In the case of the CNN-based model, utilizing a model pretrained on ImageNet1K led to improved accuracy. As such, we employed a ResNet50 model, initially pretrained on the ImageNet1K dataset, and subsequently fine-tuned it using the cattle's face dataset.
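As an illustration of this setup, the sketch below instantiates ImageNet21K- and ImageNet1K-pretrained backbones with the timm library and re-heads them for the 21 known individuals; the exact checkpoint tag is an assumption (any equivalent ImageNet21K ViT would do), and a PlantCLEF2022 checkpoint would be loaded the same way from local weights.

```python
import timm
import torch

NUM_KNOWN = 21  # known individuals in our barn

# ViT pretrained on ImageNet21K (one such checkpoint tag in timm; the exact
# variant we used may differ), with a fresh head for our classes.
vit = timm.create_model("vit_base_patch16_224.augreg_in21k",
                        pretrained=True, num_classes=NUM_KNOWN)

# CNN baseline: ResNet50 pretrained on ImageNet1K.
resnet = timm.create_model("resnet50", pretrained=True, num_classes=NUM_KNOWN)

# Fine-tune the whole network at a small learning rate.
optimizer = torch.optim.AdamW(vit.parameters(), lr=1e-4, weight_decay=0.05)

x = torch.randn(8, 3, 224, 224)  # dummy batch of cropped face images
logits = vit(x)                  # (8, 21) class scores
```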
4. Discussion
Accurate identification of individual cattle holds significant importance in farm management, facilitating the monitoring of cattle behavior, disease prevention, and the improvement of animal welfare. It allows managers to promptly understand the condition of individual cattle and respond quickly to identified issues. Therefore, it can greatly enhance the efficiency, production performance, and health of livestock management, contributing to the sustainability and profitability of the livestock industry.
In contrast to prior research employing wearable devices, the current study introduced a non-invasive approach utilizing image data. This innovative method involved the placement of multiple cameras on a real-world closed farm, capturing data during feeding times, and promoting the use of non-invasive information for individual cattle’s identification. The gathered data were subsequently processed by the proposed deep-learning-based architecture to accurately identify individual cattle and, at the same time, recognize unknown individuals.
The qualitative and quantitative results obtained in both the closed-set and open-set scenarios validated the effectiveness of the proposed techniques. Furthermore, employing state-of-the-art classifiers and metrics allowed for a comparative analysis, revealing significant potential for further improvements in future research. However, a significant limitation of the current model is that it grouped all unknown classes into a single unknown class and could not differentiate among them. Addressing this limitation requires additional techniques, for instance clustering, to distinguish among the unknown classes. Additionally, there is room for improvement in recognition accuracy on more-complex datasets. Moreover, obtaining image annotations from video data proved to be a complex and time-consuming task. Therefore, additional research and technology are needed to replicate the proposed framework in a more-versatile system that can operate across multiple farms. Our future research efforts will be dedicated to addressing these challenges.
5. Conclusions
In this study, we proposed a method, named CFOSR, to achieve cattle's face recognition in the open-set scenario. Meanwhile, from a novel perspective, we showed that an effective open-set classifier has the potential to significantly enhance the classification performance in closed-set scenarios. To obtain an effective classifier in the CFOSR context, two strategies were incorporated into the state-of-the-art OSR method ARPL. To be more specific, the AM-Softmax was employed to obtain a compact intra-class feature space, which is beneficial for detecting unknown individuals, and a ViT-based model pretrained on the large-scale ImageNet21K dataset, rather than the commonly used ImageNet1K dataset, was transferred to the downstream tasks. Furthermore, we observed that the plant-relevant dataset PlantCLEF2022 also contributed to enhancing the performance of the cattle's face identification. Our strategies were evaluated on real-world cattle's face datasets, and the experimental results validated their effectiveness. More precisely, our method achieved an AUROC of 91.84 and an OSCR of 87.85 for open-set recognition on a complex dataset, while simultaneously demonstrating an accuracy of 94.46 for closed-set recognition. Notably, we achieved an AUROC of 95.12 and an OSCR of 94.33 on a less-challenging dataset. In spite of the decent performance and a basic understanding of CFOSR, we aim to further improve our model for real-world applications. We hope our work will contribute to the community, encourage further research, and offer a novel visual approach to enhancing closed-set classification accuracy.