1. Introduction
The Inner Mongolia Autonomous Region is a major horse-breeding area in China. It has a rich cultural heritage and geographical and climatic conditions that favor the development of a modern equine industry [1]. Mongolian horses, a superior breed native to the region, have a history of over a thousand years and have always been an important resource in China [2,3]. In recent years, the traditional horse-breeding industry has been gradually developing into a modern industry that integrates sports, economy, and leisure [4]. Effectively protecting local horse breed resources, carrying out sound breeding work, and fully exploiting superior germplasm characteristics have become key aspects of this development [5,6]. Researchers sequence horse genes to analyze the effects of different gene expressions on various morphological traits [7]. The central task of animal breeding is the improvement of quantitative traits, among which body dimension parameters play an important role in Mongolian horse breeding [8,9]. These parameters directly reflect a horse’s growth and development status, as well as the effectiveness of breed improvement efforts. Differences in body dimension parameters may lead to changes in kinematic and dynamic parameters, thereby affecting the performance of horses [10,11]. However, gene sequencing may not be applicable to the large-scale horse industry. Traditional body measurement requires direct manual contact with the horse using tools; it is inefficient, labor-intensive, and difficult to automate, and it can easily cause stress reactions in horses. A rapid, accurate, and efficient measurement method is therefore crucial for overcoming this dilemma [12].
With the development of Precision Livestock Farming (PLF), researchers are increasingly turning to advanced technologies to replace manual methods, obtaining physiological indicators that reflect the health status of livestock precisely and efficiently [13]. Pallottino et al. [14] proposed an approach that combined a stereo vision system with an image analysis algorithm to automatically extract the body information of a Lipizzan horse. Zhang et al. [15] developed a body measurement system for Yili horses using a MATLAB GUI and the YOLACT algorithm to segment Yili horses from the background, and combined this with manual markers to determine the measurement points. Freitag et al. [16] placed markers on the left side of horses and used ImageJ 1.51r image analysis software to measure the corresponding body size parameters by manual labeling; the average relative error of all body size measurements was within 1.5%. Gmel et al. [17] had operators use the tpsDig2 image analysis tool to manually measure and extract various body dimension parameters of Freiberger horses, and then employed a restricted maximum likelihood model to estimate the heritability of these parameters. Zhang et al. [18] employed a simple linear iterative clustering algorithm to obtain high-precision images of the target sheep, and located the keypoints after calculating the maximum curvature of the image curve. Geng et al. [19] collected point cloud data from pigs using dual Kinect V2 cameras and calculated body parameters by curve fitting and point cloud slicing. These studies indicate that it is feasible to use image processing techniques to obtain Mongolian horses’ body dimension parameters and formulate breeding plans. However, the methods mentioned mainly rely on the manual selection of measurement points, which is complex, lacks automation, and requires expensive equipment. A rapid, accurate, and efficient automatic measurement method is therefore key to advancing Mongolian horse breeding.
In recent years, building on advances in human pose estimation, researchers have applied convolutional neural network-based keypoint detection methods to the real-time automatic detection of keypoints in livestock, with outstanding results. Li et al. [20] used the Hourglass model to locate measurement points in segmented images of the trunks of cows and goats. Du et al. [21] applied the DeepLabCut algorithm to automatically detect measurement keypoints for cattle and pigs, and obtained body size measurements from the keypoint locations. Wang et al. [22] detected keypoints in standing pigs by constructing an HRNet (High-Resolution Network) model. Song et al. [23], addressing the high network complexity of existing deep learning models for cow keypoint detection, proposed the lightweight SimCC-ShuffleNetV2 cow keypoint detection model. This model requires 0.15 G floating-point operations, has 1.31 × 10^6 parameters, and runs at 10.87 f/s, providing technical support for tasks such as cow body dimension measurement, behavior recognition, and weight estimation.
In summary, combining computer vision with deep learning offers the potential to enhance the accuracy and efficiency of non-contact body measurement of livestock while reducing the need for human operators. Notably, there is a lack of both domestic and international studies that use a convolutional neural network-based keypoint detection method for the automatic measurement of Mongolian horses’ body parameters. Therefore, this study focuses on Mongolian horses and proposes an end-to-end approach that fuses target detection and keypoint detection to automatically measure horse body parameters. The objective is to improve the accuracy of keypoint localization while obtaining the measurements needed for breeding programs, and to enable automatic measurement of Mongolian horse body parameters in a natural state.
3. Experiments and Results
3.1. Test Environment and Experimental Configuration
All experiments in this study were conducted in a unified environment. The test platform ran the Ubuntu 20.04 operating system on an Intel Core i7-9700K CPU and an NVIDIA GeForce RTX 2080 Ti GPU (USA). The acceleration environment was CUDA 11.3 with cuDNN 8.2.1, the deep learning framework was PyTorch 1.11.0, and the programming language was Python 3.8.
For training SimAM–YOLOv8n, the input image size was 640 × 640, the batch size was 16, the initial learning rate was 0.01, the momentum was 0.937, the weight decay coefficient was 0.0005, the optimizer was SGD, and the number of training epochs was 150. For training RTMPose, the input image size was 256 × 256, the batch size was 16, the initial learning rate was 0.004, the weight decay was 0.05, the optimizer was AdamW, and the number of training epochs was 200.
Because training spans many iterations, the cosine annealing method is used to adjust the learning rate dynamically throughout training, which gives a more consistent and efficient convergence of the model [28]. The cosine annealing schedule is given in Equation (6):
$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right) \quad (6)$$
where $\eta_{\max}$ and $\eta_{\min}$ represent the maximum and minimum values of the learning rate, respectively; $T_{cur}$ represents the number of epochs that have been executed; and $T_i$ represents the total number of epochs in the $i$th run.
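As an illustration of the schedule described above, the following minimal Python sketch evaluates a cosine-annealed learning rate, assuming the standard form in which the rate decays from its maximum to its minimum along half a cosine period. The function name `cosine_annealing_lr` and the minimum learning rate of 1e-4 are assumptions made for the example; the text reports only the initial learning rate (0.01) and the epoch count (150) for SimAM–YOLOv8n.

```python
import math


def cosine_annealing_lr(t_cur: int, t_total: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate after t_cur of t_total epochs under cosine annealing.

    Assumed form: lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / t_total)).
    """
    if t_total <= 0:
        raise ValueError("t_total must be positive")
    # Clamp so that values outside the run do not extend the cosine curve.
    t_cur = min(max(t_cur, 0), t_total)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t_cur / t_total))


# Detector settings from the text: 150 epochs, initial learning rate 0.01.
# The minimum learning rate (1e-4) is a hypothetical value chosen for illustration.
for epoch in (0, 50, 100, 150):
    print(epoch, cosine_annealing_lr(epoch, 150, lr_max=0.01, lr_min=1e-4))
```

The rate starts at the maximum, passes through the midpoint of the two bounds halfway through training, and reaches the minimum at the final epoch.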
3.2. Evaluation Indicators
This study comprises two models, one for target detection and one for keypoint detection, whose evaluation metrics are computed on different bases. To distinguish between them, AP-obj was used as the evaluation metric for the target detection experiment, as defined in Equations (7)–(9):
$$P = \frac{TP}{TP + FP} \quad (7)$$
$$R = \frac{TP}{TP + FN} \quad (8)$$
$$AP = \int_0^1 P(R)\,dR \quad (9)$$
where $P$ is the precision; $R$ is the recall; $TP$ is the number of true positive samples (correct detections); $FP$ is the number of false positive samples; and $FN$ is the number of false negative samples. AP-obj is the $AP$ value over Intersection over Union (IoU) thresholds spanning from 0.5 to 0.95 in increments of 0.05.
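The quantities behind AP-obj can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study: it assumes precision = TP / (TP + FP), recall = TP / (TP + FN), AP computed as the area under the precision-recall curve with all-point interpolation, and a final average over the ten IoU thresholds (as in COCO-style evaluation). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Area under the precision-recall curve (points sorted by increasing recall)."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# IoU thresholds 0.50, 0.55, ..., 0.95 as stated in the text.
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_ap(ap_per_threshold: Sequence[float]) -> float:
    """Average of the AP values obtained at each IoU threshold (assumed aggregation)."""
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

A detection counts as a true positive at a given threshold when its IoU with a ground-truth box reaches that threshold, so the AP is recomputed once per entry of `IOU_THRESHOLDS` before averaging.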
The evaluation metric chosen for the keypoint detection experiment was AP-kp, whose computation is based on Object Keypoint Similarity (OKS):
$$OKS = \frac{\sum_i \exp\left(-\frac{d_i^2}{2 s^2 k_i^2}\right)\delta(v_i > 0)}{\sum_i \delta(v_i > 0)} \quad (10)$$
where $i$ is the ID of the keypoint; $v_i$ is the visibility of the keypoint; $d_i$ is the Euclidean distance between the detected keypoint and the corresponding labeled keypoint; $s$ is the scale of the target bounding box; $k_i$ is the normalization factor of the $i$th keypoint; and $\delta(\cdot)$ equals 1 when its condition holds and 0 otherwise.
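A minimal Python sketch of an OKS computation follows, assuming the widely used COCO-style form: each visible keypoint contributes exp(-d² / (2 s² k²)), and the contributions are averaged over the visible keypoints. The function name, the argument layout, and the example values for the scale and normalization factors are assumptions for illustration; the study's actual per-keypoint factors are not given in the text.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def object_keypoint_similarity(
    predicted: Sequence[Point],
    labeled: Sequence[Point],
    visibility: Sequence[int],
    scale: float,
    k: Sequence[float],
) -> float:
    """OKS between predicted and labeled keypoints of one animal.

    scale is the object scale s; k[i] is the normalization factor of keypoint i.
    Keypoints with visibility <= 0 are ignored.
    """
    total = 0.0
    visible = 0
    for (px, py), (gx, gy), v, k_i in zip(predicted, labeled, visibility, k):
        if v <= 0:
            continue
        d_squared = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d_squared / (2.0 * scale ** 2 * k_i ** 2))
        visible += 1
    return total / visible if visible else 0.0


# Hypothetical example: two visible keypoints and one unlabeled keypoint.
labeled = [(100.0, 50.0), (200.0, 80.0), (0.0, 0.0)]
predicted = [(100.0, 50.0), (201.0, 80.0), (40.0, 40.0)]
print(object_keypoint_similarity(predicted, labeled, [2, 2, 0], scale=10.0, k=[0.1, 0.1, 0.1]))
```

A prediction is then counted as correct at a given threshold when its OKS reaches that threshold, in the same way that IoU is thresholded for bounding boxes.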
Here, OKS plays the same role as the IoU value in the target detection experiment, and a higher threshold demands closer agreement between the predicted keypoints and the labeled keypoints [29]. In this study, AP-kp values are calculated for thresholds ranging from 0.5 to 0.95 with a step size of 0.05. Additionally, the number of parameters, floating-point operations (FLOPs), and model size are used to evaluate the computational requirements and complexity of the algorithms.
3.3. SimAM–YOLOv8n Performance Validation
To verify the effectiveness of the improved algorithm proposed in this study, its performance was compared with that of prevailing target detection methods: Faster RCNN, SSD, YOLOv5n, YOLOv7-tiny, and YOLOv8n. The experimental results are presented in Table 2.
The results demonstrate that SimAM–YOLOv8n achieves the highest AP-obj value. Moreover, in comparison to the other models, SimAM–YOLOv8n has fewer parameters, lower computational demands, and a smaller model size. These results suggest that the algorithm is efficient in detection and well suited to real-time detection applications.
3.4. RTMPose Performance Validation
RTMPose and three mainstream keypoint detection models were trained and validated on the self-constructed dataset. The accuracy curves of the four models during training are shown in Figure 7, which shows that RTMPose converges earlier than the other algorithms. The performance validation results are shown in Table 3.
The comparison shows that the AP-kp value of RTMPose was 91.4%, which was 2.3% and 1.8% higher than those of Hourglass and HRNet, respectively. Although it was 0.4% lower than that of SimCC, the number of parameters, FLOPs, and model size were reduced by 21.41 M, 5.87 G, and 115.7 MB, respectively. Overall, RTMPose requires fewer computational resources and has lower model complexity, achieving a good balance between accuracy and speed.
3.5. Body Measurements Accuracy Verification
The keypoint detection results and the corresponding heatmaps obtained by feeding a horse image into the model proposed in this study are shown in Figure 8. The figure shows that the proposed algorithm can effectively determine the location and category of each measurement keypoint. Each body measurement is then calculated automatically from the coordinates of the keypoints of the relevant categories.
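How measurements follow from keypoint coordinates can be illustrated with a short Python sketch. It is a simplified example under stated assumptions: the keypoint names and coordinates below are hypothetical (this section does not list the study's keypoint categories), heights are taken as vertical pixel extents, lengths as Euclidean pixel distances, and angles as the angle at a vertex keypoint between two neighboring keypoints. Results are in pixels or degrees; conversion to physical units is a separate step.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels, with y increasing downward


def pixel_distance(p: Point, q: Point) -> float:
    """Euclidean distance between two keypoints, in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def vertical_extent(p: Point, q: Point) -> float:
    """Vertical distance between two keypoints, in pixels."""
    return abs(q[1] - p[1])


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle in degrees at `vertex` between the segments vertex-a and vertex-b."""
    ax, ay = a[0] - vertex[0], a[1] - vertex[1]
    bx, by = b[0] - vertex[0], b[1] - vertex[1]
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0.0:
        raise ValueError("angle is undefined for coincident points")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keypoints for one side-view image (names and values are illustrative only).
keypoints: Dict[str, Point] = {
    "withers": (420.0, 180.0),
    "ground_below_withers": (420.0, 600.0),
    "shoulder_point": (330.0, 300.0),
    "elbow": (400.0, 380.0),
    "buttock_point": (830.0, 300.0),
}

shoulder_height_px = vertical_extent(keypoints["withers"], keypoints["ground_below_withers"])
body_length_px = pixel_distance(keypoints["shoulder_point"], keypoints["buttock_point"])
shoulder_angle_deg = angle_at(keypoints["shoulder_point"], keypoints["withers"], keypoints["elbow"])
```

Angles are independent of image scale, whereas pixel lengths still need a pixel-to-length calibration before they can be compared with manual measurements.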
The conversion factor (CF) was determined by analyzing the height contrast stick positioned within the passageway using image processing software (ImageJ 1.51r, National Institute of Mental Health, USA). The calculation is given in Equation (11):
$$CF = \frac{L_{true}}{L_{pixel}} \quad (11)$$
where $L_{true}$ represents the true length of the stick; and $L_{pixel}$ represents the pixel length of the stick in the image.
A comparison of the manual and model-based measurements of body parameters is shown in Figure 9. The mean relative errors (MRE) of shoulder height, chest depth, body height, body length, croup height, angle of shoulder, and angle of croup were 3.86%, 4.72%, 3.98%, 2.74%, 2.89%, 4.59%, and 5.28%, respectively. The results show that the method proposed in this study can be used as a non-contact method for the automatic measurement of equine body size.
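The calibration and error computation can be sketched as follows. This is a minimal Python illustration assuming that the conversion factor is the true stick length divided by its pixel length, that a pixel measurement is converted by multiplying by this factor, and that the MRE is the mean of |predicted − manual| / manual expressed as a percentage. The function names and all numeric values are hypothetical and are not data from the study.

```python
from typing import Sequence


def conversion_factor(true_length: float, pixel_length: float) -> float:
    """Physical length represented by one pixel (e.g., cm per pixel)."""
    if pixel_length <= 0:
        raise ValueError("pixel_length must be positive")
    return true_length / pixel_length


def to_physical_units(pixel_value: float, cf: float) -> float:
    """Convert a length measured in pixels to physical units."""
    return pixel_value * cf


def mean_relative_error(manual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean relative error in percent between manual and model-based measurements."""
    if len(manual) != len(predicted) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - m) / abs(m) for m, p in zip(manual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical calibration: a 100 cm stick spans 250 pixels in the image.
cf = conversion_factor(100.0, 250.0)
shoulder_height_cm = to_physical_units(350.0, cf)

# Hypothetical manual vs. model-based shoulder heights (cm) for three horses.
print(mean_relative_error([140.0, 135.0, 150.0], [shoulder_height_cm, 138.0, 147.0]))
```

Only linear measurements are scaled by the conversion factor; angular measurements are compared with manual values directly.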
4. Discussion and Conclusions
Given the significance of obtaining the morphological parameters of horses and the limited research on automating this process with deep learning, this paper proposes a deep learning-based method for the automatic measurement of Mongolian horse body size and conformation, aiming to obtain equine morphometric measurements efficiently and accurately. However, the method has shortcomings. Only one side of the Mongolian horse is assessed, which means that traits such as body width, heart girth, and cannon bone girth cannot be evaluated with a 2D camera. Some 3D equipment is already available in equine performance laboratories; however, it is challenging to operate, and the manual placement of anatomical landmarks is time-consuming, which limits its use in routine measurement. Although animal body measurement based on 3D reconstruction has been researched, its real-time performance is poor and its equipment requirements are high, so horse body measurement still relies mainly on side images of the horse. Additionally, since the dataset of this study covered only Mongolian horses, the applicability of the method to other horse breeds remains to be verified.
This study proposes an end-to-end fusion of target and keypoint detection for body measurements, which can realize the measurement task with low computational cost and provides technical support and a theoretical basis for the subsequent development of mobile devices for horse body measurement. The parameter-free SimAM attention mechanism is introduced into the original YOLOv8n backbone network, the coordinate regression-based RTMPose keypoint detection algorithm is selected, and the cosine annealing method is used to dynamically adjust the learning rate so as to ensure a more consistent and effective convergence of the model. The experimental results show that, compared to Faster RCNN, SSD, and YOLOv7-tiny, respectively, the SimAM–YOLOv8n model reduces the number of parameters by 38.35, 23.16, and 3 M, the FLOPs by 164.2, 80.1, and 5.1 G, and the model size by 153.4, 256.1, and 17.7 MB; its AP-obj increases by 4.3, 4.7, 5.5, 4.4, and 3.5 percentage points compared to Faster RCNN, SSD, YOLOv5n, YOLOv7-tiny, and YOLOv8n, respectively. Compared to Hourglass, HRNet, and SimCC, respectively, the keypoint detection model RTMPose reduces the number of parameters by 89.51, 23.19, and 21.41 M, the FLOPs by 27.76, 15.89, and 5.87 G, and the model size by 337, 84.3, and 115.7 MB; its AP-kp value was 91.4%, which was 2.3% and 1.8% higher than those of Hourglass and HRNet, respectively, and 0.4% lower than that of SimCC. Compared with the manual measurements, the shoulder height, chest depth, body height, body length, croup height, angle of shoulder, and angle of croup had mean relative errors (MRE) of 3.86%, 4.72%, 3.98%, 2.74%, 2.89%, 4.59%, and 5.28%, respectively.
The measurement of horse body parameters through machine vision technology has been corroborated by relevant studies. In one instance, researchers used dual cameras for linear and angular measurements on Lipizzan horses. The manual and visual measurements displayed a strong overall correlation among operators (r = 0.998), with an average error rate of less than 3%. Nonetheless, this study was limited to a small sample of only 10 horses [14]. To enhance the precision of body measurements, researchers adhered high-contrast stickers to designated points on horses. Side images of the horses were then captured with cameras, and the body size parameters were determined through image processing and related procedures. The Pearson correlation coefficient between the manual and automated systems reached 0.999. Nonetheless, the manual placement of stickers was prone to human error, and the markers were susceptible to dislodging when the horses were in motion, which could adversely affect body measurements [16]. In another study, a digital 3D modeling-based method was employed to measure the body parameters of five Pura Raza Española horses. A comparison with human measurements revealed that 88% of the system’s samples had an average relative error of less than 20% [30]. Comparing the non-contact measurement methods used for different horse breeds shows that the method proposed in this paper has the advantage of measuring parameters automatically, even though its measurement accuracy is slightly lower than that of manual measurement. Therefore, future research should focus on expanding the variety of body parameters obtained from automated measurements and further improving the measurement accuracy. This will help to enhance the reliability and accuracy of automated measurement techniques in practical applications.
Furthermore, to identify the scenarios and keypoints in which the model underperforms, a detailed error analysis was conducted. This analysis helps to target improvements and to understand the limitations of the current method. Examining the error distribution across keypoints showed that certain keypoints exhibit higher error rates. Specifically, in dynamic scenarios (e.g., when the horse is walking or moving), shoulder height and hip height errors are relatively higher. This may be due to significant posture changes during movement, which increase the difficulty of accurately localizing these keypoints. Additionally, shoulder and hip angles are particularly sensitive to variations in lighting and background clutter: in images with inconsistent lighting or more complex backgrounds, the corresponding keypoints tend to show greater errors. At the same time, in manual measurements, reference points are susceptible to disturbances from both horses and testers, potentially leading to measurement inaccuracies [31]. This suggests that the model needs further optimization to improve robustness under such varying conditions.
The deep learning-based automatic measurement of equine morphological parameters proposed in this paper is an effective tool for farmers, breeding enterprises, and researchers. Additionally, the method can be applied during the daily life of horses, which safeguards their welfare and reduces the risk of injury for the evaluator. In line with the development of precision livestock farming (PLF), the method can be used to establish a database that supports better Mongolian horse breeding programs.