An Integrated Gather-and-Distribute Mechanism and Attention-Enhanced Deformable Convolution Model for Pig Behavior Recognition
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.1.1. Data Sources
2.1.2. Data Preprocessing
- Image acquisition. Python scripts were used to extract JPG-format images from the collected videos at an interval of every 12 frames (a sketch of this step appears after this list).
- Screening and filtering. After excluding images with significant pig occlusion, 11,999 images were retained. We then applied smoothing filters to blur noise spots in the images and reduce their interference with model training.
- Labeling. We utilized the labelImg software (https://github.com/HumanSignal/labelImg (accessed on 17 March 2024)) to manually label the pig behaviors in the images. Throughout the process, pigs with occluded areas exceeding 30% or with occluded heads were not labeled. Adhering to this criterion, we produced standard txt-format annotation files following the COCO dataset format. After labeling was completed, we divided the 11,999 images and their annotation files into a training set, a validation set, and a test set at a ratio of 7:2:1. Since an image may contain multiple behavior instances, the number of instances of each behavior in the dataset is shown in Table 2.
- Data augmentation. Data augmentation techniques increase the diversity of samples and improve the robustness of the model. In this study, the Mosaic Data Augmentation (MDA) [25] method in the YOLO model was deployed to augment the data through five operations, including image stitching, mirroring, cropping, random rotation, and HSV tone enhancement, such that the scale of the dataset used for model training was expanded fivefold.
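As an illustration of the acquisition and splitting steps above, here is a minimal Python sketch. The paper's actual scripts are not published, so the OpenCV-based implementation, function names, and file-naming scheme are assumptions; only the 12-frame interval and the 7:2:1 split come from the text.

```python
import random
from pathlib import Path

import cv2  # OpenCV is an assumption; the paper only states "Python scripts"


def extract_frames(video_path: str, out_dir: str, every_n: int = 12) -> int:
    """Save every 12th frame of a video as a JPG image, matching the
    sampling interval described in the image-acquisition step."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(str(out / f"{Path(video_path).stem}_{saved:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved


def split_dataset(items: list, seed: int = 0):
    """Shuffle image/annotation pairs and split them 7:2:1 into
    training, validation, and test sets."""
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```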
2.2. Our Model
2.2.1. DCN-MPCA
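As background for the deformable-convolution building block this module is named after (Dai et al. [15]), the following is a minimal PyTorch sketch; it is not the paper's DCN-MPCA implementation, and the use of torchvision's DeformConv2d and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformConvBlock(nn.Module):
    """Deformable 3x3 convolution: a plain conv predicts per-position
    sampling offsets, which DeformConv2d uses to sample the input at
    learned, irregular locations (useful for non-rigid pig postures)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offsets (dx, dy) per 3x3 kernel position -> 18 channels
        self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))


x = torch.randn(1, 64, 80, 80)
print(DeformConvBlock(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```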
2.2.2. Gather-and-Distribute (GD) Mechanism
- FAM. FAM’s primary role is aligning input features of different scales to produce features of a homogeneous scale. It involves selecting a standard size for feature alignment, then upscaling smaller features via bilinear interpolation and downscaling larger features via average pooling. This uniform scaling ensures consistent spatial dimensions across all features. The final step is concatenation along the channel dimension to generate the aligned feature $F_{\mathrm{align}}$.
- IFM. IFM takes on the responsibility of fusing the features aligned by FAM, creating global information. It does this by feeding the aligned feature $F_{\mathrm{align}}$ from FAM’s output into multi-layer re-parameterized convolutional blocks (RepBlocks) or Transformer blocks. From these operations, the global fusion feature $F_{\mathrm{fuse}}$ is derived. $F_{\mathrm{fuse}}$ is split along the channel dimension into features of different scales, $F_{\mathrm{inj}}$, which are then fused with the features of the corresponding scales.
- IIM. IIM fuses the global information split from the IFM output with the input feature of the corresponding scale from FAM. The fused result is injected into the network model to ensure the efficient distribution of global context information through a self-attention approach. As shown in Figure 4, the inputs are the feature of the current scale ($F_{\mathrm{local}}$, i.e., $B_i$ or $P_i$) and the global feature derived from the IFM output ($F_{\mathrm{inj}}$). Here, $i$ is an integer ranging from 3 to 5, representing the various levels. Two types of Convs are applied to $F_{\mathrm{inj}}$, yielding $F_{\mathrm{global\_embed}}$ and, after a Sigmoid, the attention map $F_{\mathrm{global\_act}}$, respectively, while $F_{\mathrm{local\_embed}}$ is calculated by applying a Conv to $F_{\mathrm{local}}$. Should there be any inconsistency in size during the fusion process, it is rectified using average pooling or bilinear interpolation (denoted resize below). We then multiply $F_{\mathrm{local\_embed}}$ by the attention map $F_{\mathrm{global\_act}}$ and add $F_{\mathrm{global\_embed}}$ to generate the fused feature $F_{\mathrm{att\_fuse}}$. The resultant information undergoes further screening and merging via RepBlock processing, producing the output feature $P_i$. The corresponding equations used in this process are as follows:

$$F_{\mathrm{global\_act}} = \mathrm{resize}\left(\mathrm{Sigmoid}\left(\mathrm{Conv}_{\mathrm{act}}(F_{\mathrm{inj}})\right)\right)$$
$$F_{\mathrm{global\_embed}} = \mathrm{resize}\left(\mathrm{Conv}_{\mathrm{global\_embed}}(F_{\mathrm{inj}})\right)$$
$$F_{\mathrm{att\_fuse}} = \mathrm{Conv}_{\mathrm{local\_embed}}(F_{\mathrm{local}}) \cdot F_{\mathrm{global\_act}} + F_{\mathrm{global\_embed}}$$
$$P_i = \mathrm{RepBlock}(F_{\mathrm{att\_fuse}})$$
- Low-GD. The Low-GD component comprises Low-FAM, Low-IFM, and IIM; Figure 5 provides a detailed representation of this layout. The B2, B3, B4, and B5 feature maps from the backbone output are chosen as the input of Low-FAM, with B4 determining the target feature size. After feature alignment, the combined feature $F_{\mathrm{align}}$ is derived and fed into Low-IFM, where it is run through a RepBlock to generate the global features $F_{\mathrm{inj\_P3}}$ and $F_{\mathrm{inj\_P4}}$. B3 and B4 are combined with these in IIM, yielding P3 and P4, while B5 is retained as P5. Ultimately, the output features {P3, P4, P5} of Low-GD are produced in this manner.
- High-GD. Similarly, High-GD comprises High-FAM, High-IFM, and IIM, as illustrated in Figure 6. The inputs of High-FAM are the output features {P3, P4, P5} of Low-GD. We take P5 as the target size, downsample P3 and P4, and generate $F_{\mathrm{align}}$ following feature alignment. Unlike Low-IFM, High-IFM feeds $F_{\mathrm{align}}$ into the Transformer block, whose output is subsequently partitioned to yield $F_{\mathrm{inj\_N4}}$ and $F_{\mathrm{inj\_N5}}$. These features are injected into P4 and P5, respectively, producing N4 and N5, while P3 is retained as N3, giving us the output features {N3, N4, N5} (a PyTorch sketch of the alignment and injection steps follows this list).
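To make the alignment and injection steps concrete, below is a minimal PyTorch sketch of FAM-style alignment and the IIM equations above, following the Gold-YOLO formulation [21]. Channel sizes and module names are assumptions, and the trailing RepBlock is replaced with a plain convolution for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def feature_align(feats, target_hw):
    """FAM-style alignment: bilinear up-sampling for maps smaller than the
    target size, average pooling for larger ones, then channel-wise
    concatenation to form F_align."""
    aligned = []
    for f in feats:
        if tuple(f.shape[-2:]) == tuple(target_hw):
            aligned.append(f)
        elif f.shape[-2] < target_hw[0]:
            aligned.append(F.interpolate(f, size=target_hw, mode="bilinear",
                                         align_corners=False))
        else:
            aligned.append(F.adaptive_avg_pool2d(f, target_hw))
    return torch.cat(aligned, dim=1)


class InformationInjection(nn.Module):
    """IIM: implements the injection equations above; a 3x3 conv stands in
    for the RepBlock."""

    def __init__(self, local_ch, global_ch, out_ch):
        super().__init__()
        self.local_embed = nn.Conv2d(local_ch, out_ch, 1)
        self.global_embed = nn.Conv2d(global_ch, out_ch, 1)
        self.global_act = nn.Conv2d(global_ch, out_ch, 1)
        self.rep = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # RepBlock stand-in

    def forward(self, f_local, f_inj):
        hw = f_local.shape[-2:]
        # F_global_act = resize(Sigmoid(Conv_act(F_inj)))
        act = F.interpolate(torch.sigmoid(self.global_act(f_inj)), size=hw,
                            mode="bilinear", align_corners=False)
        # F_global_embed = resize(Conv_global_embed(F_inj))
        embed = F.interpolate(self.global_embed(f_inj), size=hw,
                              mode="bilinear", align_corners=False)
        # F_att_fuse = Conv_local_embed(F_local) * F_global_act + F_global_embed
        fused = self.local_embed(f_local) * act + embed
        return self.rep(fused)  # P_i = RepBlock(F_att_fuse)
```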
2.2.3. DM-GD-YOLO
3. Results and Analysis
3.1. Experiment Environment
3.2. Evaluation Metrics
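The results below are reported in terms of precision, recall, and mAP50. Assuming the standard object-detection definitions (with true positives, false positives, and false negatives counted at an IoU threshold of 0.5 for mAP50), these metrics are:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R, \qquad \mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$

where $N$ is the number of behavior classes (seven in this study).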
3.3. Ablation Experiment
3.4. Comparative Analysis of Model Performance
3.5. Model Visualization Analysis
3.6. Comparative Analysis with Other Models
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- García-Gudiño, J.; Blanco-Penedo, I.; Gispert, M.; Brun, A.; Perea, J.; Font-i Furnols, M. Understanding consumers’ perceptions towards Iberian pig production and animal welfare. Meat Sci. 2021, 172, 108317.
- Buller, H.; Blokhuis, H.; Lokhorst, K.; Silberberg, M.; Veissier, I. Animal Welfare Management in a Digital World. Animals 2020, 10, 1779.
- Smulders, D.; Verbeke, G.; Mormède, P.; Geers, R. Validation of a behavioral observation tool to assess pig welfare. Physiol. Behav. 2006, 89, 438–447.
- Matthews, S.G.; Miller, A.L.; Clapp, J.; Plötz, T.; Kyriazakis, I. Early detection of health and welfare compromises through automated detection of behavioural changes in pigs. Vet. J. 2016, 217, 43–51.
- Cronin, G.M.; Rault, J.L.; Glatz, P.C. Lessons learned from past experience with intensive livestock management systems. Rev. Sci. Tech. 2014, 33, 139–151.
- Kashiha, M.A.; Bahr, C.; Ott, S.; Moons, C.P.; Niewold, T.A.; Tuyttens, F.; Berckmans, D. Automatic monitoring of pig locomotion using image analysis. Livest. Sci. 2014, 159, 141–148.
- Küster, S.; Kardel, M.; Ammer, S.; Brünger, J.; Koch, R.; Traulsen, I. Usage of computer vision analysis for automatic detection of activity changes in sows during final gestation. Comput. Electron. Agric. 2020, 169, 105177.
- Vranken, E.; Berckmans, D. Precision livestock farming for pigs. Anim. Front. 2017, 7, 32–37.
- Boissy, A.; Manteuffel, G.; Jensen, M.B.; Moe, R.O.; Spruijt, B.; Keeling, L.J.; Winckler, C.; Forkman, B.; Dimitrov, I.; Langbein, J.; et al. Assessment of positive emotions in animals to improve their welfare. Physiol. Behav. 2007, 92, 375–397.
- Chen, C.; Zhu, W.; Steibel, J.; Siegford, J.; Han, J.; Norton, T. Recognition of feeding behaviour of pigs and determination of feeding time of each pig by a video-based deep learning method. Comput. Electron. Agric. 2020, 176, 105642.
- Liu, D.; Oczak, M.; Maschat, K.; Baumgartner, J.; Pletzer, B.; He, D.; Norton, T. A computer vision-based method for spatial-temporal action recognition of tail-biting behaviour in group-housed pigs. Biosyst. Eng. 2020, 195, 27–41.
- Riekert, M.; Klein, A.; Adrion, F.; Hoffmann, C.; Gallmann, E. Automatically detecting pig position and posture by 2D camera imaging and deep learning. Comput. Electron. Agric. 2020, 174, 105391.
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773.
- Alameer, A.; Kyriazakis, I.; Bacardit, J. Automated recognition of postures and drinking behaviour for the detection of compromised health in pigs. Sci. Rep. 2020, 10, 13665.
- Zhang, Y.; Cai, J.; Xiao, D.; Li, Z.; Xiong, B. Real-time sow behavior detection based on deep learning. Comput. Electron. Agric. 2019, 163, 104884.
- Li, Y.; Li, J.; Na, T.; Yang, H. Detection of attack behaviour of pig based on deep learning. Syst. Sci. Control Eng. 2023, 11, 2249934.
- Odo, A.; Muns, R.; Boyle, L.; Kyriazakis, I. Video Analysis using Deep Learning for Automatic Quantification of Ear Biting in Pigs. IEEE Access 2023, 11, 59744–59757.
- Luo, Y.; Zeng, Z.; Lu, H.; Lv, E. Posture detection of individual pigs based on lightweight convolution neural networks and efficient channel-wise attention. Sensors 2021, 21, 8369.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. Adv. Neural Inf. Process. Syst. 2023, 36, 51094–51112.
- Yang, M.; Zhang, H.; Yang, G. Common Behavior Terms and Definitions of Pigs. J. Liaoning Univ. Tradit. Chin. Med. 2016, 18, 77–83.
- GB 5749-2022; China National Standard of Drinking Water. China Quality and Standards Publishing & Media Co., Ltd.: Beijing, China, 2022.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Lin, M.; Chen, Q.; Yan, S. Network In Network. arXiv 2014, arXiv:1312.4400.
- Zhang, W.; Huang, Z.; Luo, G.; Chen, T.; Wang, X.; Liu, W.; Yu, G.; Shen, C. TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12083–12093.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Turner, S.P.; Farnworth, M.J.; White, I.M.; Brotherstone, S.; Mendl, M.; Knap, P.; Penny, P.; Lawrence, A.B. The accumulation of skin lesions and their use as a predictor of individual aggressiveness in pigs. Appl. Anim. Behav. Sci. 2006, 96, 245–259.
- Zhang, Z.; Zhang, H.; He, Y.; Liu, T. A Review in the Automatic Detection of Pigs Behavior with Sensors. J. Sens. 2022, 2022, 4519539.
- Chen, C.; Zhu, W.; Liu, D.; Steibel, J.; Siegford, J.; Wurtz, K.; Han, J.; Norton, T. Detection of aggressive behaviours in pigs using a RealSense depth sensor. Comput. Electron. Agric. 2019, 166, 105003.
- Wei, J.; Tang, X.; Liu, J.; Zhang, Z. Detection of Pig Movement and Aggression Using Deep Learning Approaches. Animals 2023, 13, 3074.
- Yang, Q.; Xiao, D.; Lin, S. Feeding behavior recognition for group-housed pigs with the Faster R-CNN. Comput. Electron. Agric. 2018, 155, 453–460.
Typical Behaviors | Description
---|---
Sniffing | Pigs use their snouts to approach or touch objects.
Lying | Pigs lie on the ground with the sternum and udder touching the ground.
Walking | Pigs move by alternately lifting and landing the front and back legs while using all four legs for support.
Kneeling | Pigs are supported by their hips and extended front legs, with the hips making contact with the ground.
Fighting | Pigs interact by swiftly pushing each other’s neck, head, or ears with their heads.
Fence climbing | Pigs place their front legs on the fence, tilting their bodies or positioning them perpendicularly to the ground.
Mounting | Pigs place their two front legs on their partner’s front or back, with or without pelvic insertion.
Behavior | Instances of the Training Set | Instances of the Validation Set | Instances of the Test Set | Total |
---|---|---|---|---|
Sniffing | 3692 | 1033 | 536 | 5261 |
Lying | 4803 | 1348 | 709 | 6860 |
Walking | 1884 | 557 | 264 | 2705 |
Kneeling | 2126 | 612 | 320 | 3058 |
Fighting | 482 | 129 | 55 | 666 |
Fence climbing | 1138 | 351 | 173 | 1662 |
Mounting | 1269 | 351 | 168 | 1788 |
Hyperparameters | Value |
---|---|
Optimizer | SGD |
Learning rate | 0.01 |
Momentum | 0.937 |
Weight decay | 0.0005 |
Batch size | 64 |
Warm-up epochs | 3 |
Warm-up momentum | 0.8 |
Warm-up bias learning rate | 0.1 |
IoU threshold | 0.7 |
Training epochs | 200 |
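A hypothetical mapping of these hyperparameters onto the Ultralytics YOLOv8 training API is sketched below; the model variant and dataset config name are placeholders, and the paper does not state that this exact API was used.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")   # placeholder model variant
model.train(
    data="pig_behavior.yaml",  # placeholder dataset config
    epochs=200,                # training epochs
    batch=64,                  # batch size
    optimizer="SGD",
    lr0=0.01,                  # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3,
    warmup_momentum=0.8,
    warmup_bias_lr=0.1,        # assumed to be the warm-up bias learning rate in Table 3
    iou=0.7,                   # IoU threshold
)
```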
Model | C2f-DM | GD Mechanism | Precision (%) | Recall (%) | mAP50 (%)
---|---|---|---|---|---
YOLOv8 | | | 87.0 | 87.0 | 92.7
YOLOv8+C2f-DM | ✔ | | 91.3 | 88.3 | 94.8
YOLOv8+GD | | ✔ | 87.3 | 92.6 | 94.8
DM-GD-YOLO | ✔ | ✔ | 88.2 | 92.2 | 95.3
Model | Precision (%) | Recall (%) | mAP50 (%) | Parameters (M) | FLOPs (G) |
---|---|---|---|---|---|
EfficientDet | 90.5 | 88.1 | 93.5 | 6.6 | 11.6 |
Faster R-CNN | 68.5 | 92.6 | 90.4 | 136.8 | 401.8 |
YOLOv7 | 85.9 | 91.0 | 93.3 | 36.6 | 103.3 |
YOLOv8 | 87.0 | 87.0 | 92.7 | 3.0 | 8.1 |
DM-GD-YOLO | 88.2 | 92.2 | 95.3 | 6.0 | 10.0 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).