1. Introduction
Eggs are nutrient-rich and indispensable agricultural and livestock products in people’s daily lives [1]. With the application of mechanical and control technology, feed supply, water supply, egg collection, manure removal, and environmental management have been automated on large-scale laying hen farms. However, detecting and removing dead chickens still relies heavily on manual labor [2]. Owing to visual fatigue during long inspections and the difficulty of observing cages above the worker’s eye level, some dead chickens may not be found in time, leading to cross-infection, disease transmission, and economic loss. In addition, prolonged work on farms with high concentrations of harmful gases and dust harms workers’ health. Therefore, using machines instead of manual inspection to detect dead chickens is increasingly urgent on large-scale commercial farms.
With the development of digital image processing and machine learning, many scholars have researched dead chicken detection using chicken images. Lu et al. [3] converted chicken RGB images into the L*a*b* color space, extracted a* as a feature, and segmented the red chicken comb; they determined whether a dead chicken was in the cage by judging the presence or absence of a stationary red comb. Zhu et al. [4] and Peng [5] extracted five features of the chicken comb in the L*a*b* color space and used the variation of these features as input to a support vector machine (SVM) to determine whether there was a dead chicken in the image; the accuracy of this method in recognizing dead chickens was over 92%. Lu [6] used center-radiating vectors to represent chicken contours in images and took the absolute difference between the corresponding vectors of two contours over a period of time as the input of an SVM classifier to identify whether an image contained a dead chicken; the classification accuracy reached 95%. Li [7] collected chicken leg images and counted the shanks to determine the number of chickens in the image and whether a dead chicken was present; the accuracy was strongly affected by chicken density and activity and reached 90% in experiments. However, these methods could not detect dead chickens well on commercial farms when the head was obscured or lighting conditions were poor.
Recently, with the development of deep learning, convolutional neural networks (CNNs) have been increasingly applied to dead chicken detection, leveraging their powerful feature extraction capabilities. Zhuang et al. [8] proposed a diseased broiler detection network named IFSSD based on the single shot multibox detector (SSD), which identified the health status of broilers while detecting them; the mAP (IoU = 0.5) of the network reached 99.7%. Xue [9] established a dead broiler detection model based on the Faster R-CNN algorithm after registering and fusing thermal infrared and visible light images; the model’s precision and recall were over 98%. Liu et al. [10] designed a small removal system for dead broilers that used the YOLOv4 algorithm to detect dead chickens, with an mAP (IoU = 0.5) of 95.24%. Hao et al. [11] built a dead broiler detection system for a stacked cage environment, including an autonomous detection platform and a detection model based on an improved YOLOv3; the experimental results showed that the mAP (IoU = 0.5) of the model was 98.6%.
Despite the significant progress made in previous research, several issues still need to be addressed before dead chicken detection can be applied in commercial farming. First, the light intensity on commercial farms is low, usually below 20 lux in stacked cage systems. Although visible light images contain rich texture information, their imaging efficiency and quality are poor under low-light conditions, which can lead to target loss [12,13]. Increasing the visible illumination to improve image quality may stress the chickens, reducing production performance or even causing death. Second, chickens on commercial farms often occlude one another, and a single-source image, constrained by the limitations of its sensor, does not contain enough information about dead chickens, leading to missed, false, repeated, and inaccurate detections [14,15]. Using multi-source images for dead chicken detection is a promising way to address these issues. Multi-source images generally refer to images from multiple sensors or cameras, including RGB, thermal infrared (TIR), depth, and near-infrared (NIR) images. The NIR image lacks color information compared with the RGB image, but it is not affected by ambient light and still provides fine details and distinct edges [16]. The TIR image contains the thermal radiation information emitted by targets, but it lacks detailed descriptions of the targets and has a low signal-to-noise ratio and low contrast [17]. The depth image contains distance information, which can be used for image registration and 3D reconstruction; nevertheless, its lack of detailed object features makes it difficult to accurately distinguish objects with similar heights and shapes [18].
In the livestock industry, previous research has applied multi-source images to various tasks. Liu et al. [19,20,21] developed a registration and fusion algorithm for thermal infrared and visible light images of pigs and successfully detected the pig ear root region using the fused image and an improved active shape model. Zhu et al. [18] proposed an end-to-end pose recognition algorithm for lactating sows using RGB-D images; their algorithm employed two CNNs to extract features from the RGB and depth images separately, used a region proposal network and a feature fusion layer to generate regions of interest and merge the RGB-D features, and fed the fused features into a Faster R-CNN network to detect the posture of lactating sows. He et al. [22] introduced a Lightweight High-Resolution Network (LiteHRNet) for estimating sheep weight from RGB-D images, in which the LiteHRNet backbone extracted features from the sheep RGB-D images and the LEHead module combined these features to estimate the weight. Lamping et al. [23] proposed ChickenNet, a network based on the Mask R-CNN architecture that adds depth images to the visible light input and adds plumage condition assessment to the laying hen detection output. Zhang et al. [24] presented a method for monitoring feather damage using RGB, thermal infrared, and depth images; the method used these images to reconstruct a three-dimensional model of a chicken and established an automated algorithm for assessing the depth of feather damage. Because a multi-source image combines complementary types of information, it can in theory achieve better detection results than a single-source image.
In this study, we replaced visible light images with NIR images to avoid ambient light interference and proposed a dead laying hen detection method using TIR-NIR-depth images. First, we proposed a pixel-level registration method that projects the NIR and depth images into the coordinate system of the TIR image, yielding registered images. Then, we used the registered single-source, dual-source, and multi-source images as inputs to object detection networks, namely YOLOv8n, Deformable DETR, Cascade R-CNN, and TOOD, to analyze the detection performance of each source image. The flow diagram is shown in Figure 1. Overall, this study proposed and evaluated a dead chicken detection method based on multi-source images and deep learning, which can help researchers select and use multi-source images for hen monitoring on commercial farms.
2. Materials and Methods
2.1. Image Data Collection
2.1.1. Animals and Farm Environment
All images were collected at a commercial laying hen farm of Beijing Huadu Yukou Poultry Industry Co., Ltd., Beijing, China. The chicken breed was Jingfen 6, about 500 days old. The farm had four rows of chicken cages; each row was divided into upper, middle, and lower tiers arranged in an A-shaped (stair-stepped) configuration. Each tier had 300 cages, with four chickens per cage on the upper and middle tiers and three chickens per cage on the lower tier. Each cage was equipped with nipple drinkers. An automatic feeding device fed the chickens at 7:00, 11:00, 15:00, and 18:30. The environmental control system regulated the temperature and humidity inside the house by controlling the fans. Incandescent lamps were used as the light source, with an illumination period of 2:30–18:30 and an illumination intensity of 5–20 lux.
2.1.2. Image Acquisition Device
The image acquisition device included two cameras, an industrial computer, a self-developed mobile chassis, and connectors. All images were captured using a depth camera and a TIR camera with fixed relative positions, as shown in Figure 2. The depth camera (RealSense L515, Intel Corporation, Santa Clara, CA, USA) was used to acquire NIR and depth images, and the TIR camera (IRay P2, IRay Technology Co., Ltd., Yantai, China) was used to acquire TIR images. The two cameras were connected to an industrial computer (ARK-3531, Advantech Co., Ltd., Kunshan, China) via USB, and image acquisition and storage were controlled with the Python programming language.
The cameras were fixed on a self-developed mobile chassis via a ball head, a square hollow, a clamp, and a steel pipe. The height of the clamp was adjustable to capture images of chickens on different tiers, and the length of the square hollow and the angle of the ball head were adjustable to change the linear field of view of the cameras. The self-developed mobile chassis was controlled based on the robot operating system (ROS) with a combination of Python and C++ programming languages. The chassis was guided by magnetic tape to stop at fixed locations, and then the industrial computer sent a command to the cameras to collect images. Due to the small number of dead chickens, the dead chicken images were collected manually using the cameras and a tripod after locating the dead chickens by manual inspection.
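As a rough illustration of how such dual-camera acquisition can be scripted in Python, the sketch below grabs synchronized depth and NIR frames from the RealSense L515 with the pyrealsense2 library and a TIR frame with OpenCV. The stream settings, the assumption that the IRay P2 enumerates as a standard video device, and the output file names are illustrative only; the actual acquisition code on the industrial computer is not reproduced here.

```python
import numpy as np
import cv2
import pyrealsense2 as rs

# Configure the RealSense L515 for depth and NIR (infrared) streams at 640 x 480
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.infrared, 640, 480, rs.format.y8, 30)
pipeline.start(config)

# Assumption: the TIR camera is exposed as a UVC/video device; the IRay P2 may
# instead require the vendor SDK.
tir_cap = cv2.VideoCapture(0)

try:
    frames = pipeline.wait_for_frames()
    depth = np.asanyarray(frames.get_depth_frame().get_data())   # uint16 depth map
    nir = np.asanyarray(frames.get_infrared_frame().get_data())  # 8-bit NIR image
    ok, tir = tir_cap.read()
    if ok:
        cv2.imwrite("tir.png", tir)
    cv2.imwrite("nir.png", nir)
    np.save("depth.npy", depth)   # keep the full 16-bit depth for registration
finally:
    pipeline.stop()
    tir_cap.release()
```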
2.1.3. Image Data
On 10 and 11 November 2022, a total of 2052 sets of live chicken images (containing no dead chickens) were collected from the upper, middle, and lower tiers. From 11 November to 4 December 2022, 81 dead chickens were located by manual inspection, and a total of 1937 sets of dead chicken images (containing dead chickens) were collected from different angles and distances. Each set of images included one TIR image, one NIR image, and one depth image, with resolutions of 256 × 192, 640 × 480, and 640 × 480, respectively, as shown in Figure 3.
2.2. Image Registration Method
Because of the differences in perspective, field of view, and resolution between the TIR and depth cameras, image registration must be performed to align the images so that temperature, texture, and depth information are correctly matched. This paper proposed a pixel-level registration method for TIR, NIR, and depth images based on coordinate transformation, using the depth information obtained from the depth camera.
2.2.1. Image Registration Principle
Because the TIR camera had a smaller field of view than the depth camera, the NIR and depth image coordinates should be projected onto the TIR image coordinates to retain more information after registration. Since the relative positions of the TIR and depth cameras were fixed, the intrinsic and extrinsic parameters of the two cameras could be calculated under the same world coordinate system. After incorporating the depth information, the NIR and depth image coordinates were projected onto the TIR image coordinates. The registration process is illustrated in Figure 4, using the projection of the NIR image coordinates onto the TIR image coordinates as an example.
- (1) Transform the NIR image coordinates to the NIR camera coordinate system.
Based on the pinhole imaging model and the principle of similar triangles, the camera model with four intrinsic parameters ($f_x$, $f_y$, $c_x$, $c_y$) was obtained, as shown in Equation (1):

$$ Z_{NIR} \begin{bmatrix} u_{NIR} \\ v_{NIR} \\ 1 \end{bmatrix} = K_{NIR} \begin{bmatrix} X_{NIR} \\ Y_{NIR} \\ Z_{NIR} \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_{NIR} \\ Y_{NIR} \\ Z_{NIR} \end{bmatrix} \quad (1) $$

where $(u_{NIR}, v_{NIR})$ is the NIR image coordinates, $K_{NIR}$ is the intrinsic parameter matrix of the NIR camera, and $(X_{NIR}, Y_{NIR}, Z_{NIR})$ is the coordinate in the NIR camera coordinate system.

Equations (2) and (3), derived from Equation (1), express the coordinates in the NIR camera coordinate system:

$$ X_{NIR} = \frac{(u_{NIR} - c_x)\, d}{f_x} \quad (2) $$

$$ Y_{NIR} = \frac{(v_{NIR} - c_y)\, d}{f_y} \quad (3) $$

where $d = Z_{NIR}$ is the depth value corresponding to the depth image.
- (2) Transform the coordinates in the NIR camera coordinate system to the TIR camera coordinate system.
The transformation between the NIR camera coordinate system and the world coordinate system is shown in Equation (4), and the transformation between the TIR camera coordinate system and the world coordinate system is shown in Equation (5):

$$ P_{NIR} = R_{NIR} P_{W} + T_{NIR} \quad (4) $$

$$ P_{TIR} = R_{TIR} P_{W} + T_{TIR} \quad (5) $$

where $P_{NIR}$ is the coordinate in the NIR camera coordinate system, $P_{W}$ is the coordinate in the world coordinate system, $P_{TIR}$ is the coordinate in the TIR camera coordinate system, $R_{NIR}$ and $R_{TIR}$ are the rotation matrices of the NIR and TIR cameras, and $T_{NIR}$ and $T_{TIR}$ are the translation vectors of the NIR and TIR cameras. Under the same world coordinate system, $P_{W}$ is identical for both cameras.

Equation (6) was derived from Equations (4) and (5); the coordinates in the NIR camera coordinate system were transformed into the TIR camera coordinate system using Equation (6):

$$ P_{TIR} = R_{TIR} R_{NIR}^{-1} \left( P_{NIR} - T_{NIR} \right) + T_{TIR} \quad (6) $$
- (3) Transform the coordinates in the TIR camera coordinate system to the TIR image coordinates.
Similar to step (1), the coordinates in the TIR camera coordinate system were transformed into the TIR image coordinates using Equation (7):

$$ Z_{TIR} \begin{bmatrix} u_{TIR} \\ v_{TIR} \\ 1 \end{bmatrix} = K_{TIR} \begin{bmatrix} X_{TIR} \\ Y_{TIR} \\ Z_{TIR} \end{bmatrix} \quad (7) $$

where $(u_{TIR}, v_{TIR})$ is the TIR image coordinates, $K_{TIR}$ is the intrinsic parameter matrix of the TIR camera, and $(X_{TIR}, Y_{TIR}, Z_{TIR})$ is the coordinate in the TIR camera coordinate system.
The transformation from the NIR image coordinates to the TIR image coordinates was achieved. The same steps could be used to transform the depth image coordinates to the TIR image coordinates. The registration process described above was implemented using Python and the OpenCV-Python library.
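To make the coordinate-transformation chain of Equations (1)–(7) concrete, the following sketch shows one possible NumPy implementation that back-projects each NIR pixel with its depth value, transforms it into the TIR camera frame, and projects it onto the TIR image grid. The function and variable names, the nearest-pixel scattering, and the omission of depth-unit conversion and hole filling are simplifying assumptions, not the authors' exact OpenCV-Python code.

```python
import numpy as np

def register_nir_to_tir(nir, depth, K_nir, K_tir, R_nir, T_nir, R_tir, T_tir, tir_shape):
    """Project NIR pixels into the TIR image grid using per-pixel depth (Eqs. (1)-(7)).

    nir, depth : (H, W) NIR image and depth map from the depth camera
    K_*        : 3 x 3 intrinsic matrices; R_*, T_* : extrinsics w.r.t. one world frame
    tir_shape  : (height, width) of the TIR image, e.g. (192, 256)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    u, v, z = u.ravel(), v.ravel(), depth.ravel().astype(np.float64)
    vals = nir.ravel()

    keep = z > 0                                   # drop pixels without a depth reading
    u, v, z, vals = u[keep], v[keep], z[keep], vals[keep]

    # Eqs. (2)-(3): back-project NIR pixels to the NIR camera coordinate system
    x = (u - K_nir[0, 2]) * z / K_nir[0, 0]
    y = (v - K_nir[1, 2]) * z / K_nir[1, 1]
    p_nir = np.vstack([x, y, z])                   # 3 x N

    # Eq. (6): NIR camera frame -> TIR camera frame (shared world coordinate system)
    p_tir = R_tir @ np.linalg.inv(R_nir) @ (p_nir - T_nir.reshape(3, 1)) + T_tir.reshape(3, 1)

    # Eq. (7): project onto the TIR image plane
    front = p_tir[2] > 0
    p_tir, vals = p_tir[:, front], vals[front]
    uv = K_tir @ p_tir
    u_t = np.round(uv[0] / uv[2]).astype(int)
    v_t = np.round(uv[1] / uv[2]).astype(int)

    # Scatter NIR values into the TIR grid (nearest pixel, no interpolation or hole filling)
    out = np.zeros(tir_shape, dtype=nir.dtype)
    ok = (u_t >= 0) & (u_t < tir_shape[1]) & (v_t >= 0) & (v_t < tir_shape[0])
    out[v_t[ok], u_t[ok]] = vals[ok]
    return out
```

The same function applies unchanged to the depth image by passing the depth map itself as the value image.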
2.2.2. Camera Intrinsic and Extrinsic Parameters Calibration
To obtain the intrinsic and extrinsic parameters of both cameras in the same world coordinate system, we made a calibration board that worked for both the TIR and depth cameras, as shown in Figure 5a. The calibration board was made of a 1 mm thick 7075 aluminum plate with six 5 cm square grids and was painted white. A heating plate covered with black tape was placed underneath the calibration board to form a checkerboard pattern visible to both cameras. Twenty-seven sets of calibration board images were captured from different angles and distances using both cameras, as shown in Figure 5b,c. Zhang’s method [25] was used for calibration with MATLAB 2016a (MathWorks, Natick, MA, USA) and the calibration toolbox developed by Bouguet, J.-Y. [26].
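For readers working in Python rather than MATLAB, the sketch below outlines an equivalent Zhang-style calibration with OpenCV: detect checkerboard corners in paired NIR and TIR views, calibrate each camera's intrinsics, and then estimate the rotation and translation between the cameras. The inner-corner pattern size, the file paths, and the assumption that cv2.findChessboardCorners works directly on the TIR images (which in practice may need contrast inversion or thresholding) are all illustrative; the study itself used the MATLAB toolbox cited above.

```python
import glob
import cv2
import numpy as np

PATTERN = (3, 2)    # inner-corner grid; placeholder, set to match the actual board layout
SQUARE_M = 0.05     # 5 cm squares

# 3D corner positions of the board in its own (world) coordinate system
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_M

obj_pts, nir_pts, tir_pts = [], [], []
for nir_path, tir_path in zip(sorted(glob.glob("calib/nir/*.png")),
                              sorted(glob.glob("calib/tir/*.png"))):
    nir = cv2.imread(nir_path, cv2.IMREAD_GRAYSCALE)
    tir = cv2.imread(tir_path, cv2.IMREAD_GRAYSCALE)
    ok_n, c_n = cv2.findChessboardCorners(nir, PATTERN)
    ok_t, c_t = cv2.findChessboardCorners(tir, PATTERN)  # TIR may need preprocessing
    if ok_n and ok_t:
        obj_pts.append(objp); nir_pts.append(c_n); tir_pts.append(c_t)

# Intrinsics of each camera (Zhang's method)
_, K_nir, d_nir, _, _ = cv2.calibrateCamera(obj_pts, nir_pts, nir.shape[::-1], None, None)
_, K_tir, d_tir, _, _ = cv2.calibrateCamera(obj_pts, tir_pts, tir.shape[::-1], None, None)

# Rotation R and translation T between the two cameras under the same world frame
_, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, nir_pts, tir_pts, K_nir, d_nir, K_tir, d_tir, nir.shape[::-1],
    flags=cv2.CALIB_FIX_INTRINSIC)
```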
2.3. Dataset
The registered TIR, NIR, and depth images were stacked into a TND image following the RGB color space, where the R channel was the TIR image, the G channel was the NIR image, and the B channel was the depth image. Single-source and dual-source images were derived by combining different channels of the TND images. A sample set of 1250 TND images was manually selected, including 950 dead chicken images and 300 live chicken images. The image annotation tool LabelImg was used to outline the chickens and label them as dead or live. We randomly selected 60% of the TND images for model training, 20% for validating model performance and tuning the training parameters, and 20% for evaluating the model’s generalization ability.
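A minimal sketch of this channel stacking is shown below. It assumes that each registered image is normalized to 8-bit before being placed in its channel and that unused channels are zeroed to form single- and dual-source inputs; the paper does not specify these preprocessing details, so they are illustrative choices.

```python
import numpy as np

def to_uint8(img):
    """Normalize one channel to 8-bit (assumed preprocessing; the paper does not
    state how TIR temperature and depth values were scaled to channel range)."""
    img = img.astype(np.float32)
    img = (img - img.min()) / max(float(img.max() - img.min()), 1e-6)
    return (img * 255).astype(np.uint8)

def build_tnd(tir, nir, depth):
    """Stack registered images into a TND image: R = TIR, G = NIR, B = depth."""
    return np.dstack([to_uint8(tir), to_uint8(nir), to_uint8(depth)])

def select_channels(tnd, keep):
    """Form single-/dual-source inputs by zeroing unused channels, e.g. keep = (0, 1)
    for TIR-NIR (one plausible scheme; the paper only states that the inputs were
    combinations of different TND channels)."""
    out = np.zeros_like(tnd)
    out[..., list(keep)] = tnd[..., list(keep)]
    return out
```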
2.4. Dead Chicken Detection Network
YOLOv8n, Deformable DETR, Cascade R-CNN, and TOOD were selected as the dead chicken detection networks. The detection performance with single-source, dual-source, and multi-source images was explored by changing the channels of the TND image used as input.
2.4.1. YOLOv8n
The YOLO series is a typical family of one-stage object detection algorithms. YOLOv8 is the latest YOLO algorithm developed by the YOLOv5 team. It replaces the C3 module in the YOLOv5 backbone with the C2f module and introduces a new anchor-free detection head and a new loss function. According to the official test results on the COCO val2017 dataset, although YOLOv8 has larger parameter counts and FLOPs than YOLOv5, its accuracy is significantly improved, making it a state-of-the-art model. YOLOv8n is the variant of YOLOv8 with the fewest network layers.
2.4.2. Deformable DETR
Deformable DETR [27] is a representative algorithm that applies the Transformer architecture to object detection. To address the slow convergence and limited spatial feature resolution of DETR [28], Deformable DETR introduces a deformable attention mechanism, which accelerates convergence and improves detection performance for small objects.
2.4.3. Cascade R-CNN
Cascade R-CNN (Cai et al., 2017) [29] is a representative two-stage object detection algorithm. Most candidate boxes produced by a region proposal network are of low quality at a low IoU threshold, while simply raising the IoU threshold can lead to overfitting and to a mismatch between training and inference. To address these issues, Cascade R-CNN proposes a cascaded multi-detector structure: the output of each stage’s detector is used to train the detector of the next stage, and progressively higher IoU thresholds are adopted at each stage to generate higher-quality bounding boxes.
2.4.4. TOOD
TOOD (Feng et al., 2021) [30] is a task-aligned one-stage object detection algorithm. Traditional one-stage detectors use two parallel branches for the classification and localization tasks, which can cause spatial misalignment between the predictions of the two tasks. TOOD designs a task-aligned head (T-Head) to increase the interaction between the two tasks and proposes task alignment learning (TAL) to explicitly pull the optimal anchors of the two tasks closer together, or even unify them.
We implemented YOLOv8n based on the code provided by the authors on GitHub (https://github.com/ultralytics/ultralytics, accessed on 1 March 2023). We implemented Deformable DETR, Cascade R-CNN, and TOOD with the MMDetection object detection toolkit, using ResNet50 as the backbone network. All other parameters were kept at their defaults, except that the training epochs for Deformable DETR, Cascade R-CNN, and TOOD were set to 50, 36, and 72, respectively.
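As an illustration of the YOLOv8n side of this setup, the snippet below shows how training and validation can be launched with the Ultralytics API. The dataset configuration file name ("tnd.yaml") and the choice to keep default hyperparameters are assumptions, since the exact training configuration is not reproduced here.

```python
from ultralytics import YOLO

# "tnd.yaml" is a placeholder dataset config listing the train/val/test image
# folders and the two classes (dead, live) of the TND dataset.
model = YOLO("yolov8n.pt")               # start from COCO-pretrained YOLOv8n weights
model.train(data="tnd.yaml", imgsz=640)  # other hyperparameters left at their defaults
metrics = model.val()                    # reports AP50, AP75, and AP@50:5:95 on the val split
```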
2.5. Computing Resource
The training, validation, and testing of the dead chicken detection networks were based on the PyTorch deep learning framework, using the Python programming language with GPU acceleration. The experiments were conducted on a server running Ubuntu 18.04 with an Intel(R) Xeon(R) Gold 6133 CPU @ 2.5 GHz, 40 GB of RAM, and a Tesla V100 SXM2 32 GB GPU. To facilitate reproduction of the experiments, a GPU with at least 12 GB of memory is recommended.
2.6. Evaluation Metrics of Object Detection Networks
To evaluate the dead chicken detection performance with different types of images, recall (R), AP50, AP75, and AP@50:5:95 were used as the evaluation metrics. The precision (P), R, and AP formulas are shown in Equations (8)–(10):

$$ P = \frac{TP}{TP + FP} \quad (8) $$

$$ R = \frac{TP}{TP + FN} \quad (9) $$

$$ AP = \int_{0}^{1} P(R)\, dR \quad (10) $$

where TP (true positives) is the number of positive samples correctly predicted as positive, FP (false positives) is the number of negative samples incorrectly predicted as positive, and FN (false negatives) is the number of positive samples incorrectly predicted as negative. AP is the detection accuracy for a single category, computed as the area under the precision-recall curve. AP50 and AP75 are the detection accuracy at IoU = 0.5 and IoU = 0.75, respectively. AP@50:5:95 is the mean detection accuracy over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (IoU = 0.5:0.95).
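The sketch below shows one common way to turn ranked detections into precision, recall, and AP values via all-point interpolation of the precision-recall curve. It assumes that matching detections to ground truth at a given IoU threshold has already been done, and it is a simplified stand-in for the COCO-style evaluation used by the detection toolkits, which averages precision over fixed recall points.

```python
import numpy as np

def precision_recall_ap(scores, is_tp, num_gt):
    """Compute precision, recall, and AP (area under the PR curve) for one class
    at a fixed IoU threshold. is_tp[i] marks whether the i-th detection matched
    a previously unmatched ground-truth box; the matching step is done upstream."""
    order = np.argsort(-np.asarray(scores))             # rank detections by confidence
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    recall = tp / max(num_gt, 1)                        # Eq. (9)
    precision = tp / np.maximum(tp + fp, 1e-9)          # Eq. (8)

    # Eq. (10): all-point interpolation of the precision-recall curve
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]      # enforce a monotonic envelope
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    ap = np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])
    return precision, recall, ap
```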
4. Conclusions
In this study, we first proposed a pixel-level registration method that uses depth information to align TIR, NIR, and depth images, projecting the NIR and depth images into the TIR image coordinate system. To evaluate the performance of different source images, we trained dead chicken detection models on the registered single-source, dual-source, and multi-source images using representative object detection networks, including YOLOv8n, Cascade R-CNN, TOOD, and Deformable DETR. At an IoU threshold of 0.5, the detection performance of the different images was not identical. As the IoU threshold increased, the detection results showed a consistent trend: among the single-source images, the model with the NIR image performed best, and the models with dual-source images performed better than those with single-source images. The models with the TIR-NIR or NIR-depth image performed better than the model with the TIR-depth image. The detection performance with the multi-source image was better than that with single-source images but was not significantly different from that with the TIR-NIR and NIR-depth images. Therefore, the TIR-NIR image could be used for dead chicken detection to achieve high accuracy while reducing cost.
Overall, we proposed a high-accuracy method for detecting dead chickens on commercial farms that was robust to ambient light. We also evaluated the performance of various dead chicken detection models using different source images. Our findings may prove to be useful for future research on poultry health monitoring using near-infrared, thermal infrared, or depth cameras.