Vision-Based Quadruped Pose Estimation and Gait Parameter Extraction Method
Abstract
1. Introduction
- Because quadruped species are highly diverse, with body shapes, limb proportions, and behavior patterns that vary substantially, a model trained on data from one animal generalizes poorly to others.
- Animal fur comes in many colors; when the fur color is close to the background color, the background easily interferes with the image, making it difficult to extract accurate pose information from complex surroundings.
- Because the limbs of quadrupeds sit beneath the body, the limbs easily occlude one another during locomotion, which makes keypoint extraction more challenging.
2. Models and Methods
2.1. Computational Framework
- (1) Target screening: Raw-resolution video frames were passed through an object detection network to locate the individual animal in each image; the network output the upper-left and lower-right corner coordinates of the bounding box, the confidence of the box, and the animal species. In our experiments, setting the confidence threshold above 0.9 ensured that the selected animals were correct, but the extracted gait parameters were not continuous: although the retained parameters were accurate, some valid information was lost. Setting the threshold below 0.7 kept more detections but introduced more outlier noise into the final gait parameters, which made subsequent processing difficult. We therefore set the confidence threshold to 0.8 to balance accuracy and data integrity. When a detection had a confidence score greater than 0.8, the animal was considered correctly identified; the image was cropped according to the box coordinates and resized to 288 × 384 without changing the aspect ratio of the raw picture, and the result was passed to the subsequent RFB-HRNet network. If the detection confidence was below 0.8, the current image was skipped and detection moved on to the next image.
- (2) Animal pose estimation model: The image cropped by the preceding object detection network was resized to a fixed 288 × 384 × 3 resolution and input into the keypoint extraction network RFB-HRNet, which produced a 72 × 96 × 17 heatmap of the animal's keypoints. The keypoint coordinates in the feature map were computed from the predicted heatmaps, mapped back to coordinates in the original image space, and connected sequentially.
- (3) Animal gait parameter extraction: After the keypoint coordinates of the animal were obtained at the original resolution, they formed a time series that was processed to acquire the gait information of the quadruped, including gait frequency, gait sequence, gait duty cycle, and gait trajectory. Section 2.3 describes the specific implementation. A minimal sketch of this two-stage pipeline follows this list.
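The sketch below is a minimal, hedged outline of the two-stage pipeline described above, not the authors' implementation: the `detector` and `pose_model` interfaces are hypothetical placeholders, and the simple argmax decoding stands in for whatever heatmap decoding the paper actually uses. Only the confidence threshold (0.8), the 288 × 384 input size, the 72 × 96 × 17 heatmap size, and the skip-on-low-confidence rule are taken from the text.

```python
import numpy as np
import cv2

CONF_THRESHOLD = 0.8          # detection confidence threshold from Section 2.1
INPUT_W, INPUT_H = 288, 384   # input resolution of the pose network
HEAT_W, HEAT_H = 72, 96       # heatmap resolution (1/4 of the input)
NUM_KEYPOINTS = 17

def crop_and_resize(frame, box):
    """Crop the detected box and letterbox it to 288 x 384 without changing the aspect ratio."""
    x1, y1, x2, y2 = box
    crop = frame[y1:y2, x1:x2]
    scale = min(INPUT_W / crop.shape[1], INPUT_H / crop.shape[0])
    new_w, new_h = int(crop.shape[1] * scale), int(crop.shape[0] * scale)
    resized = cv2.resize(crop, (new_w, new_h))
    canvas = np.zeros((INPUT_H, INPUT_W, 3), dtype=frame.dtype)
    canvas[:new_h, :new_w] = resized
    return canvas, (x1, y1, scale)

def decode_heatmaps(heatmaps, restore):
    """Take the argmax of each 96 x 72 heatmap and map it back to original-image coordinates."""
    x0, y0, scale = restore
    keypoints = []
    for k in range(NUM_KEYPOINTS):
        idx = np.argmax(heatmaps[k])
        hy, hx = np.unravel_index(idx, heatmaps[k].shape)
        ix, iy = hx * (INPUT_W / HEAT_W), hy * (INPUT_H / HEAT_H)  # heatmap -> network input
        keypoints.append((x0 + ix / scale, y0 + iy / scale))       # network input -> original image
    return keypoints

def process_frame(frame, detector, pose_model):
    box, conf, species = detector(frame)   # hypothetical detector interface
    if conf <= CONF_THRESHOLD:
        return None                         # skip low-confidence frames, move to the next image
    patch, restore = crop_and_resize(frame, box)
    heatmaps = pose_model(patch)            # hypothetical pose model, output shape (17, 96, 72)
    return decode_heatmaps(heatmaps, restore)
```

Running `process_frame` over every frame of a video yields the per-frame keypoint coordinates in original-image space, which is the time series consumed by the gait parameter models of Section 2.3.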
2.2. Animal Pose Estimation Model
2.2.1. Improved Animal Pose Estimation Network (RFB-HRNet)
2.2.2. DyC-RFB Module
2.2.3. Evaluation Metric
- (1) OKS
- (2) AP
- (3) mAP
- (4) AR
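The metrics are only named in this outline, but they all build on the object keypoint similarity (OKS). For reference, the following is a small sketch of the standard COCO-style OKS computation on which AP, mAP, and AR are typically thresholded; the per-keypoint falloff constants and the use of the object area as the scale term follow the usual COCO definition and are assumptions here, not values quoted from the paper.

```python
import numpy as np

def oks(pred, gt, visibility, area, k):
    """Standard COCO-style object keypoint similarity.

    pred, gt   : (N, 2) arrays of predicted and ground-truth keypoint coordinates
    visibility : (N,) array, > 0 for labeled keypoints
    area       : object area used as the scale term s**2
    k          : (N,) per-keypoint falloff constants
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)                      # squared keypoint distances
    e = d2 / (2.0 * area * k ** 2 + np.finfo(float).eps)
    labeled = visibility > 0
    return np.sum(np.exp(-e)[labeled]) / max(np.count_nonzero(labeled), 1)
```

AP at a threshold is then the fraction of predictions whose OKS exceeds that threshold (e.g., 0.50 for AP50 and 0.75 for AP75 in Table 3), mAP averages AP over thresholds from 0.50 to 0.95, and AR is the corresponding average recall.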
2.3. Animal Gait Parameter Extraction Model
2.3.1. Animal Gait Frequency Extraction Model
2.3.2. Gait Sequence Recognition Model
2.3.3. Duty Cycle Parameter Extraction Model
2.3.4. Gait Trajectory Extraction Model
2.3.5. Evaluation Index of Gait Parameter Extraction
- (1) Relative error
- (2) Gait sequence consistency judgment index
3. Experiments and Results
3.1. Animal Pose Estimation Experiment
3.1.1. Dataset Source and Preprocessing
3.1.2. Experimental Design
3.1.3. Experimental Results on the AP10K Dataset
3.1.4. Ablation Study
3.2. Gait Parameter Extraction Experiment
3.2.1. Video Data Collection
3.2.2. Gait Frequency Extraction Experiment
3.2.3. Experiment of Gait Sequence Recognition
3.2.4. Gait Duty Cycle Extraction Experiment
3.2.5. Gait Trajectory Extraction Experiment
4. Conclusions and Discussion
- (1) The original video data were converted to images and passed through an object detection network to obtain the anchor box locating the animal in each image.
- (2) The anchor box obtained in the first stage was used to crop the image, which was then fed into the RFB-HRNet network to obtain the quadruped's keypoints in the original-resolution space.
- (3) Various gait parameters of the quadruped were then computed from these keypoints.
- (1) The feature extraction capability of the whole network could be significantly improved by using a dedicated receptive field module, DyC-RFB, to strengthen the first branch of stage 1 of the HRNet network and by exploiting the information transferred between branches in the later stages. Although the change to this single branch is small, it improved quadruped keypoint extraction performance without significantly increasing the network parameters or the computational cost. A hedged sketch of such a module follows this list.
- (2) A two-stage cascade network was created by adding an object detection network in front of the animal pose estimation model for target screening. This cascade significantly improved pose estimation for small targets and multi-target scenes, as well as the stability and reliability of the pose estimation results.
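The exact DyC-RFB design is not spelled out in this outline, so the following is only a rough sketch of the general idea: a multi-branch receptive-field block built from dilated convolutions (Yu and Koltun) whose branch outputs are re-weighted by an input-dependent attention, loosely standing in for dynamic convolution (Chen et al.). The branch count, dilation rates, channel split, and the branch-level attention are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class DyCRFBSketch(nn.Module):
    """Illustrative receptive-field block: parallel dilated convolutions with
    input-dependent branch weights, fused back onto a residual path."""

    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        branch_ch = channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, branch_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(branch_ch, branch_ch, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_ch),
            )
            for d in dilations
        ])
        # Simple attention over branches, standing in for the "dynamic" re-weighting.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.weights = nn.Sequential(
            nn.Conv2d(channels, len(dilations), kernel_size=1),
            nn.Softmax(dim=1),
        )
        self.fuse = nn.Conv2d(branch_ch * len(dilations), channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        w = self.weights(self.pool(x))                              # (B, num_branches, 1, 1)
        outs = [w[:, i:i + 1] * b(x) for i, b in enumerate(self.branches)]
        return self.relu(x + self.fuse(torch.cat(outs, dim=1)))
```

In the actual model, a module of this kind is inserted into the first branch of stage 1 of HRNet-W48; the ablation study in Section 3.1.4 quantifies the separate contributions of the RFB structure and the dynamic re-weighting.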
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Toshev, A.; Szegedy, C. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1653–1660.
- Fan, X.; Zheng, K.; Lin, Y.; Song, W. Combining local appearance and holistic view: Dual-source deep neural networks for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1347–1355.
- Pfister, T.; Charles, J.; Zisserman, A. Flowing convnets for human pose estimation in videos. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1913–1921.
- Li, C.; Lee, G.H. From synthetic to real: Unsupervised domain adaptation for animal pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1482–1491.
- Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin, Germany, 2016; pp. 483–499.
- Mu, J.; Qiu, W.; Hager, G.D.; Yuille, A.L. Learning from synthetic animals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12386–12395.
- Yuan, Y.; Fu, R.; Huang, L.; Lin, W.; Zhang, C.; Chen, X.; Wang, J. HRFormer: High-resolution vision transformer for dense prediction. Adv. Neural Inf. Process. Syst. 2021, 34, 7281–7293.
- Cao, J.; Tang, H.; Fang, H.S.; Shen, X.; Lu, C.; Tai, Y.W. Cross-domain adaptation for animal pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9498–9507.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
- Geng, Z.; Sun, K.; Xiao, B.; Zhang, Z.; Wang, J. Bottom-up human pose estimation via disentangled keypoint regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14676–14686.
- Zhang, F.; Zhu, X.; Dai, H.; Ye, M.; Zhu, C. Distribution-aware coordinate representation for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7093–7102.
- Izonin, I.; Tkachenko, R.; Fedushko, S.; Koziy, D.; Zub, K.; Vovk, O. RBF-based input doubling method for small medical data processing. In Proceedings of the International Conference on Artificial Intelligence and Logistics Engineering, Kyiv, Ukraine, 22–24 January 2021; Springer: Berlin, Germany, 2021; Volume 82, pp. 23–31.
- Daou, H.E.; Libourel, P.A.; Renous, S.; Bels, V.; Guinot, J.C. Methods and experimental protocols to design a simulated bio-mimetic quadruped robot. Int. J. Adv. Robot. Syst. 2013, 10, 256.
- Yang, A.; Huang, H.; Zhu, X.; Yang, X.; Chen, P.; Li, S.; Xue, Y. Automatic recognition of sow nursing behaviour using deep learning-based segmentation and spatial and temporal features. Biosyst. Eng. 2018, 175, 133–145.
- Dissanayake, G. Infrastructure robotics: Opportunities and challenges. Assistive robotics. In Proceedings of the 18th International Conference on CLAWAR 2015, Hangzhou, China, 6–9 September 2015; p. 3.
- Peng, X.B.; Coumans, E.; Zhang, T.; Lee, T.W.; Tan, J.; Levine, S. Learning agile robotic locomotion skills by imitating animals. arXiv 2020, arXiv:2004.00784.
- Kim, C.H.; Shin, H.C.; Lee, H.H. Trotting gait analysis of a lizard using motion capture. In Proceedings of the 2013 13th International Conference on Control, Automation and Systems (ICCAS 2013), Gwangju, Korea, 20–23 October 2013; pp. 1247–1251.
- Chapinal, N.; de Passille, A.M.; Pastell, M.; Hänninen, L.; Munksgaard, L.; Rushen, J. Measurement of acceleration while walking as an automated method for gait assessment in dairy cattle. J. Dairy Sci. 2011, 94, 2895–2901.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin, Germany, 2016; pp. 21–37.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
- Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11030–11039.
- Golubitsky, M.; Stewart, I.; Buono, P.L.; Collins, J.J. Symmetry in locomotor central pattern generators and animal gaits. Nature 1999, 401, 693–695.
- Yu, H.; Xu, Y.; Zhang, J.; Zhao, W.; Guan, Z.; Tao, D. AP-10K: A benchmark for animal pose estimation in the wild. arXiv 2021, arXiv:2108.12617.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
| Keypoint | Definition | Keypoint | Definition |
|---|---|---|---|
| 1 | Left Eye | 10 | Right Elbow |
| 2 | Right Eye | 11 | Right Front Paw |
| 3 | Nose | 12 | Left Hip |
| 4 | Neck | 13 | Left Knee |
| 5 | Root of Tail | 14 | Left Back Paw |
| 6 | Left Shoulder | 15 | Right Hip |
| 7 | Left Elbow | 16 | Right Knee |
| 8 | Left Front Paw | 17 | Right Back Paw |
| 9 | Right Shoulder | | |
| Keypoint | Average Precision | Keypoint | Average Precision |
|---|---|---|---|
| Left Eye | 0.810 | Right Elbow | 0.701 |
| Right Eye | 0.799 | Right Front Paw | 0.693 |
| Nose | 0.777 | Left Hip | 0.691 |
| Neck | 0.652 | Left Knee | 0.718 |
| Root of Tail | 0.694 | Left Back Paw | 0.685 |
| Left Shoulder | 0.732 | Right Hip | 0.766 |
| Left Elbow | 0.728 | Right Knee | 0.671 |
| Left Front Paw | 0.704 | Right Back Paw | 0.722 |
| Right Shoulder | 0.728 | | |
| Methods | Input Size | GFLOPs | #Params | mAP | AP50 | AP75 | APM | APL | AR |
|---|---|---|---|---|---|---|---|---|---|
| ResNet50 [19] | 288 × 384 | 5.396 | 23.508 M | 0.681 | 0.926 | 0.738 | 0.552 | 0.687 | 0.718 |
| ResNet101 [19] | 288 × 384 | 10.272 | 42.500 M | 0.683 | 0.921 | 0.751 | 0.545 | 0.690 | 0.719 |
| HRNet-W48 [21] | 288 × 384 | 21.059 | 63.595 M | 0.729 | 0.936 | 0.802 | 0.577 | 0.736 | 0.762 |
| RFB-HRNet | 288 × 384 | 22.612 | 63.972 M | 0.750 | 0.958 | 0.818 | 0.621 | 0.753 | 0.781 |
| HRNet | DyConv | RFB | #Params | mAP |
|---|---|---|---|---|
| √ | × | × | 63.595 M | 0.729 |
| √ | × | √ | 63.915 M | 0.742 |
| √ | √ | √ | 63.972 M | 0.750 |
| Species | Mean Interval (Frames) | This Article (Hz) | Manual Method (Hz) | Relative Error/% |
|---|---|---|---|---|
| Buffalo | 38.25 | 0.792 | 0.812 | 2.46 |
| Horse | 59.8 | 0.507 | 0.498 | 1.81 |
| Dog | 12.75 | 2.377 | 2.331 | 1.97 |
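The mean interval (in frames) and the reported frequencies are linked through the video frame rate: the gait frequency is essentially the frame rate divided by the mean number of frames between successive occurrences of the same limb event. As an illustration only, a frequency estimate from repeated limb events might look like the sketch below; the peak indices and the frame rate in the usage comment are hypothetical, not taken from the paper.

```python
import numpy as np

def gait_frequency(peak_frames, fps):
    """Estimate gait frequency (Hz) from the frame indices of repeated gait-cycle events.

    peak_frames : sorted frame indices at which the same limb event recurs
                  (e.g., successive touchdowns of one paw)
    fps         : video frame rate
    """
    intervals = np.diff(np.asarray(peak_frames, dtype=float))  # stride intervals in frames
    return fps / intervals.mean()

# Illustrative use (hypothetical peak indices and an assumed 30 fps video):
# freq = gait_frequency([10, 48, 87, 125], fps=30.0)
```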
| Species | S1 | S2 | S3 | S4 | S5 | Gait Sequence Consistency Judgment Index (Formula (12)) |
|---|---|---|---|---|---|---|
| Buffalo | LF | RB | RF | LB | LF | 100% |
| Horse | LF | RB | RF | LB | LF | 100% |
| Dog | RF | LB | LF | RB | RF | 100% |
| Limb | Buffalo (Our Method) | Buffalo (Manual Calculation) | Horse (Our Method) | Horse (Manual Calculation) | Dog (Our Method) | Dog (Manual Calculation) |
|---|---|---|---|---|---|---|
| LF | 0.543 | 0.512 | 0.615 | 0.608 | 0.372 | 0.356 |
| RF | 0.580 | 0.563 | 0.630 | 0.622 | 0.310 | 0.330 |
| LB | 0.546 | 0.555 | 0.597 | 0.556 | 0.360 | 0.321 |
| RB | 0.540 | 0.525 | 0.631 | 0.642 | 0.308 | 0.288 |
| Average value | 0.552 | 0.539 | 0.618 | 0.607 | 0.338 | 0.324 |
| Relative error/% | 2.41 | | 1.81 | | 4.3 | |
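The duty cycles above follow the usual definition: the fraction of one gait cycle during which a limb is in stance (ground contact). The stance/swing labeling rule is not detailed in this outline, so in the sketch below the boolean stance mask is a hypothetical input rather than the authors' detection method; only the stance-frames-over-cycle-frames ratio is implied by the table.

```python
import numpy as np

def duty_cycle(stance_mask, cycle_bounds):
    """Average duty cycle of one limb: stance frames divided by total frames per gait cycle.

    stance_mask  : boolean array, True where the limb is judged to be in ground contact
    cycle_bounds : list of (start_frame, end_frame) index pairs, one pair per gait cycle
    """
    ratios = []
    for start, end in cycle_bounds:
        cycle = stance_mask[start:end]
        ratios.append(np.count_nonzero(cycle) / len(cycle))
    return float(np.mean(ratios))
```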