1. Introduction
Classical approaches to monitoring animal behaviour rely on real-time manual observation or the manual analysis of recorded video. These methods are labour-intensive, and the video from a single experiment may take several months to analyse. Although standardized protocols and analysis programs exist, most of the data must still be collected manually; automatic monitoring is therefore desirable and urgently needed. Several methods using 2D cameras for detection and tracking have been investigated [1,2,3]. Like other image analysis techniques, these methods suffer from the same problem: visual cues are unreliable, and similar objects can be difficult to differentiate. With the rapid growth of machine learning/deep learning (ML/DL) research on images in recent years, significant improvements have been made in animal shape detection [4,5] and behavioural sequence detection [6]. Individual pigs can be identified on the basis of their inherent dimensions and colour [7].
Several sensor modalities are now available for the automatic monitoring of behaviour. For instance, deviations in drinking and feeding and the frequency of coughs and vocalisations have been registered with such systems [8]. Deviations from behavioural synchrony in groups of pigs are important, as pigs tend to show very synchronous activity patterns, and individuals deviating from this pattern could be suffering health or welfare issues. However, to monitor several pens with many pigs in each and gain insight into their welfare status, cameras must be mounted above the pens at a slight side angle. This makes it easier to recognize pigs by their shapes rather than their faces, which are usually oriented towards the ground and are therefore less visible than body posture, shape, tail or ears. Detecting individual pigs and their body parts using deep learning-based computer vision has great potential as a welfare assessment tool for defining a positive/negative affective state in individual pigs and the interactions between them (head-to-tail/ear proximity to define ear/tail biting). A two-dimensional imaging system supported by deep learning can successfully detect the standing and lying (belly and side) postures of pigs under commercial farming conditions [4]. Data from different commercial farms were used for the training and validation of the proposed models; the R-FCN ResNet101 network, for instance, detected lying and standing postures with a mean precision of more than 93%. This is highly relevant, as both positive behaviours, such as play and exploration, and negative behaviours, such as aggressive conflicts, are associated with particular postures that can most likely be recognized from images. Deep learning has also been used for the automatic recognition of sows’ nursing behaviours in 2D images, with a precision of 97.6% [5]. Faster R-CNN and ZFnet were applied to recognize the individual feeding behaviours of pigs [9], where each pig in the barn was labelled with a letter; the proposed method recognised pigs’ feeding behaviours with a precision of 99.6%. Image analysis techniques using fully convolutional networks (FCNs) appear to be among the most promising methods for the automatic recognition of sow behaviours from video sequences. In a study of lactating sows [6], features describing the temporal motion of the animals were extracted, and these spatial and temporal features were fed into a hierarchical classifier for behavioural recognition. Based on 468,000 frames of three sows, the accuracies of behavioural classification compared with manual scoring were 98% for drinking, 95% for feeding and 88% for nursing.
The most reliable and preventive way of ensuring a positive welfare state in pigs is to understand how species-specific signals can serve as immediate, non-invasive indicators of an individual’s affective state (i.e., its mental and physical condition). Pigs react (behave) differently to signals in their environment, for example, in harmful (suffering) or rewarding (pleasurable) situations [10]. Behavioural expressions can be used to describe the affective states of domestic pigs [10,11]. Pig behaviour may therefore be the most powerful and efficient early warning tool for monitoring welfare at the individual level, as behavioural changes can predict more serious welfare and health problems that would otherwise emerge at a later stage. Pig postures are honest signals and responses to the physical and social environment as well as to the caretaker. Implementing a camera-based monitoring system can thus serve as an important tool for on-farm preventive animal welfare work, as behaviours represent early warning signs of a positive (good) vs. negative (poor) welfare state in pigs.
In such automated monitoring of pig behaviour, a focus should be on individual recognition while the pig is in a lying or standing position. As pigs are social animals, they spend most of their time lying in close proximity to, or even on top of, pen-mates, which makes them harder to detect. It therefore remains a challenge to detect and recognise individual pigs at every point in their life span. Furthermore, it is of great importance to monitor pig body parts such as the head with ears and the tail. In a barren environment, pigs are likely to manipulate the ears and tails of pen-mates, a precursor to injurious ear and tail biting [12]. Tail and ear injuries can become sources of infection, resulting in further suffering and weight loss, and can potentially lead to carcass condemnation at slaughter [13]. Therefore, the monitoring and identification of individual pigs’ tails and heads/ears are of great importance for the future detection of individual pigs and biting outbreaks on farms. In addition, tail posture (straight down vs. curled) is associated with affective state in pigs: a straight tail in an individual pig is linked to a negative affective state, whereas a curled tail is linked to a neutral-to-positive state [12]. Thus, it is important to develop a robust automated monitoring system for individual pig body parts (head with ears and tail), which could lead to a better understanding of pigs’ needs in their environment, help prevent tail/ear biting and determine the welfare (negative vs. positive) status of pigs on the farm.
The goal of the present paper was to develop an automated monitoring system for pig body, head and tail detection for future applications in behavioural studies. In the first part of this study, the aim was to recognize individual pigs in groups (in lying or standing positions) and their body parts (head/ears and tail) using machine learning algorithms for object detection based on the feature pyramid network (FPN) architecture. In the second part, the goal was to improve the detection of tail posture (straight vs. curled) using a YOLOv4 neural network.
3. Results and Discussion
3.1. Experiment 1—Pig, Head and Tail Labelling and Detection
Out of 23,202 detected objects, 3307 showed pigs in a lying posture (pig lying–good visibility, n = 1030; pig lying–bad visibility, n = 2277), 4436 in a standing posture (pig standing–good visibility, n = 1518; pig standing–bad visibility, n = 2918), 7717 heads (face–good visibility, n = 1649; face–bad visibility, n = 6068) and 7742 tails (tail–curled–good visibility, n = 3651; tail–straight, n = 273; tail–uncertain, n = 3818).
The Mask R-CNN Matterport implementation was able to recognize pigs with a precision of 96%, tails with 77% and heads with 66%, thereby already achieving human-level precision when compared to three independent human observers (Figure 7, Table 3).
The stated precision and recall values were in fact the average precision and average recall calculated over all positive detections in the test set, using intersection over union (IoU) thresholds between 0.5 and 0.95. IoU is a term mainly used in object detection applications, where a model is trained to output a bounding box that fits around an object of interest [18]. The IoU describes the extent of overlap between two bounding boxes: one marking the ground truth (the actual labelled object), the other the bounding box predicted by the model. The greater the region of overlap, the greater the IoU. Positive detections or true positives (TPs) are instances in which the object detection algorithm correctly identified pigs, heads or tails in the image. False detections or false positives (FPs) are objects wrongly identified by the algorithm as pigs, heads or tails. Finally, missed detections or false negatives (FNs) are instances in which pigs, heads or tails were labelled in the image but were not detected by the algorithm. Precision is the proportion of detections that are correct, calculated as the number of TPs divided by the sum of TPs and FPs. Recall (also known as sensitivity) is the proportion of actual labelled objects that were detected, calculated as the number of TPs divided by the sum of TPs and FNs.
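For illustration, a minimal Python sketch of these quantities follows; the box format (x_min, y_min, x_max, y_max) and the example values are our own assumptions for demonstration, not the evaluation code used in this study.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned bounding boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# A predicted box counts as a true positive when its IoU with a
# ground-truth box meets the chosen threshold (here, 0.5 to 0.95).
print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 3))  # 0.143
print(precision_recall(tp=77, fp=23, fn=20))          # (0.77, ~0.79)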
While the precision of individual pig detection in groups was the highest, head and tail detection precision was the lowest. The most important result of our first experiment was that we were able to distinguish individuals in groups of 12 to 15 pigs, the most common group size on Norwegian and other European farms. While previous studies documented a detection precision of 93% for standing/lying postures with the R-FCN ResNet101 network, our study showed that the feature pyramid network (FPN) architecture can achieve almost human-level precision. Our model thus outperformed previous ones for pigs in standing/lying postures, but it was not the best for head and tail detection. While tail posture (straight vs. curled) was detected correctly on 77% of occasions, the head was detected correctly only 66% of the time. As both ears and tails are relatively small compared to the rest of the body, their detection is sometimes problematic even for human observers.
Pigs prefer to lie/sleep in close proximity to pen-mates, or even more frequently, lying on top of them [19], which makes it harder to identify tails or heads. Out of the 7717 pig heads labelled, the eyes, snout, mouth and ears were visible in only 21%. Even though tail posture detection is crucial not only for identifying negative welfare (tail biting) but also for grading positive status (affective state), we were not able to obtain enough variation in our dataset: 47% of labelled tails were curled, while only 3.6% were straight down. One possible strategy to improve model performance would be to use a separate script to increase the size of the bounding boxes (a sketch of such a script is given below), as some head annotations do not cover the entire head with the face and ears, as seen in Figure 7; this can adversely impact network performance. Another strategy would be to exclude “bad_visibility” annotations from the training batch, especially when those head/tail annotations are hidden behind an annotation belonging to another individual. Bad visibility mostly occurs while pigs are lying down. From our data, we conclude that we should not focus on the head/tail while the pig is lying, but only while it is standing/moving. Even though pigs spend 80% of their time resting/lying in traditional barren environments [20], a constantly running automated monitoring system is not necessary. It is more efficient to have on-demand monitoring that scans activities (lying/standing) with high precision and then focuses only on the time intervals in which the pigs are most active in order to detect the body parts (head and tail). This would improve the precision of the model, as the body parts would be easier to detect; ear and tail biting only appear during active periods [12]. As the feature pyramid network (FPN) architecture is time consuming and demands a high number of labelled images, we decided to adopt a new method in the second experiment.
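As a minimal illustration of the bounding-box enlargement strategy mentioned above, the following Python sketch grows each annotated box by a fixed fraction of its size and clips it to the image; the margin value and box format are illustrative assumptions, not the script used in this study.

def expand_box(box, margin_frac, img_w, img_h):
    """Grow a box (x_min, y_min, x_max, y_max) by margin_frac per side, clipped to the image."""
    x_min, y_min, x_max, y_max = box
    dx = (x_max - x_min) * margin_frac
    dy = (y_max - y_min) * margin_frac
    return (max(0.0, x_min - dx), max(0.0, y_min - dy),
            min(float(img_w), x_max + dx), min(float(img_h), y_max + dy))

# e.g., enlarge every head annotation by 15% per side before training
head_boxes = [(120, 80, 180, 140)]
expanded = [expand_box(b, 0.15, 1920, 1080) for b in head_boxes]
print(expanded)  # [(111.0, 71.0, 189.0, 149.0)]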
3.2. Experiment 2—Tail Detection during Active Phase with YOLOv4
In the second part of this study, the goal was to improve the detection of tail posture (straight or curled) during the active phase using a YOLOv4 neural network. We tested tail detection, curled or straight, with YOLOv4 as an alternative to Mask R-CNN, as YOLO often beats Mask R-CNN in object detection performance. YOLOv4 was initialized via a standard framework protocol (see Figure 6) and trained on a custom dataset of 30 images containing on average 6 pigs with visible tails per image. The algorithm detected straight or curled tails with an average precision of 90% (Figure 8).
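To give a concrete picture of how such a trained detector can be applied, the Python sketch below runs a darknet-format YOLOv4 model with OpenCV's dnn module; the file names, input size and class labels are placeholders we have assumed for illustration, not the exact configuration of this study.

import cv2

# Load a darknet-format YOLOv4 model (placeholder file names).
net = cv2.dnn.readNetFromDarknet("yolov4-tails.cfg", "yolov4-tails.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

classes = ["tail_curled", "tail_straight"]  # assumed label set
frame = cv2.imread("pen_frame.jpg")         # one video frame from the pen camera

# Detect tails; each returned box is (x, y, width, height) in pixels.
class_ids, confidences, boxes = model.detect(frame, confThreshold=0.25, nmsThreshold=0.4)
for cid, conf, box in zip(class_ids, confidences, boxes):
    print(classes[int(cid)], round(float(conf), 2), box)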
With this new method, using only 30 annotated images and focusing on tail detection during the active phase, we were able to improve the precision from 77% to 90%. This allowed us to recognize the pigs’ affective state at the farm level more precisely, showing the importance of properly defining a golden standard (i.e., tail posture during the active phase). In the future, we will look into achieving even higher precision by retraining the model on additional images in an iterative fashion, potentially with the model assisting in the labelling of new images (Figure 9), or perhaps by using semi-supervised learning techniques [21,22] to reduce the workload even further (a conceptual sketch of such a loop is given below).
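The iterative, model-assisted labelling we envisage could follow a cycle like the conceptual Python sketch below; every helper here (human_review, retrain, next_batch, predict) is a hypothetical placeholder rather than an existing implementation.

def assisted_labelling_rounds(model, unlabelled_pool, labelled_set, rounds=3):
    """Hypothetical human-in-the-loop retraining cycle (all helpers are placeholders)."""
    for _ in range(rounds):
        batch = unlabelled_pool.next_batch()               # new, unseen frames
        proposals = [model.predict(img) for img in batch]  # model pre-labels the frames
        corrected = human_review(batch, proposals)         # annotator only fixes errors
        labelled_set.extend(corrected)
        model = retrain(model, labelled_set)               # retrain on the grown set
    return model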
Developing an automated monitoring system for body, head and tail detection in pigs is a new and time-consuming way of working: it requires extensive preparation in testing annotation tools, preparing and annotating images, choosing the most appropriate model (based on, e.g., Mask R-CNN or YOLOv4 neural networks), training the chosen models and running them to reach a precision as high as possible, preferably at the human level. However, if the model is developed with a detailed focus on solving problems such as bad visibility, or on the active rather than the passive phase, a high-precision model can result, as in our case. In the methodological part of this study, it was therefore crucial to develop the best-performing models for “golden standard” traits detectable under different circumstances, in order to obtain the best-quality data on the traits one wishes to monitor. In the next stage, we can thus begin to address welfare assessment problems such as tail/ear biting using the YOLOv4 neural network with greater certainty and confidence.
As this is a novel approach to the systematic assessment of pig welfare, considerable work remains before it can be used at the farm level. In addition, digital solutions are still too expensive to be deployed on all farms, so farmers may avoid implementing them. However, once developed, a model such as ours would make welfare assessment easier, quicker and cheaper than the current classical approaches of gathering data through manual observation in real time or the manual analysis of recorded animal behaviours. Developing novel digital models that lead to welfare improvement is therefore of great importance. Furthermore, with equipment prices decreasing and operational efficiency increasing, and with our future goal of defining a complete digital concept that works under farm conditions, such a system could be implemented relatively soon. Farmers need better control over their pigs, so bringing this to their attention is crucial: they currently have no possibility of 24/7 pig monitoring. Our novel digital solutions, with their thorough methodological grounding and future implementation in welfare assessment, have the potential to be valuable tools for reducing farmers’ workload and costs. The main problem farmers face today is that they must produce more and more pigs every year simply to survive, which means that control over, and contact with, each individual pig is being reduced; only a proper digital monitoring concept can counteract this trend and consequently improve both farmer and pig welfare.
4. Conclusions
In conclusion, out of the three annotation programs tested (Labelbox, Imglabel and Supervisely), we chose Labelbox, since it allowed the creation of the right training data, managed the process data in one place and was easy to use thanks to a better interface. Using machine learning algorithms for object detection based on the feature pyramid network (FPN) architecture, the precision of individual pig detection in groups was almost at human level. However, the method was time consuming and not optimal for head and tail detection during the passive (lying) and active (standing/moving) phases. Using YOLOv4 neural network analysis, we were able to reduce the human workload and improve tail posture (straight vs. curled) detection precision during the active phase, which is most crucial for reducing the incidence of tail biting and for evaluating the welfare state of pigs on the farm. Our new method can be further explored for detecting behavioural sequences and group synchrony, as well as for quantifying positive welfare (play, exploration, tail curled and wagging). Most of these behaviours are associated with particular body postures, and by defining such golden standards with high, human-level precision, we could improve the welfare status of pigs. The current classical approaches of gathering data through manual observation in real time or the manual analysis of recorded animal behaviours are time- and labour-expensive and will be replaced by cheaper, real-time digital monitoring systems. Furthermore, with the implementation of a digital system, we will be able to gather more information about pig behaviour (positive and negative), and thus have better control over the animals and be able to provide optimal conditions in real time at the farm level.