YOLOv8-PoseBoost: Advancements in Multimodal Robot Pose Keypoint Detection
Abstract
1. Introduction
- This paper introduces the YOLOv8-PoseBoost model, which improves the network’s ability to focus on small targets and its sensitivity to small-sized pedestrians by incorporating the CBAM attention module, employing multiple-scale detection heads, and redefining the bounding box regression loss with SIoU (a minimal CBAM sketch is given after this list).
- To further strengthen feature fusion and reduce missed detections of small-sized pedestrians, the study establishes two cross-level connectivity channels between the backbone network and the neck; these structural changes improve performance in complex scenes (an illustrative fusion sketch follows this list).
- The SIoU-redefined bounding box regression loss not only accelerates training convergence but also improves the accuracy of motion keypoint detection. Together, these advances offer a more efficient and precise solution for practical applications, particularly small target detection and pose keypoint detection (the reference SIoU formulation is reproduced after this list).
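The CBAM module named above is the standard channel-plus-spatial attention block of Woo et al.; since the paper does not list code in this outline, the following PyTorch sketch is only a minimal, generic CBAM implementation. The module names, the reduction ratio of 16, and the 7×7 spatial kernel are conventional defaults rather than values confirmed by the paper; the sketch simply shows how such a block reweights a feature map before a detection head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Channel attention: avg- and max-pooled descriptors share an MLP; the
    summed output becomes a per-channel gate."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """Spatial attention: channel-wise mean and max maps are concatenated and
    convolved into a single-channel spatial gate."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied as reweighting."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.ca(x)
        return x * self.sa(x)


if __name__ == "__main__":
    # Illustrative shapes only: a P3-level feature map at stride 8 for a 640x640 input.
    feat = torch.randn(1, 256, 80, 80)
    out = CBAM(256)(feat)
    assert out.shape == feat.shape
```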
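The two cross-level connectivity channels are described only at a high level in this outline, so the sketch below is a hypothetical illustration of one such channel rather than the paper’s exact design: a shallow backbone feature is projected to the neck’s channel width, resized to the neck level’s resolution, and fused by concatenation. The class name `CrossLevelFusion`, the channel widths, and the pooling-based resizing are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLevelFusion(nn.Module):
    """Hypothetical cross-level connection: project a shallow backbone feature,
    resize it to the neck level's resolution, and fuse by concatenation."""

    def __init__(self, backbone_channels: int, neck_channels: int):
        super().__init__()
        self.project = nn.Conv2d(backbone_channels, neck_channels, 1, bias=False)
        self.fuse = nn.Conv2d(2 * neck_channels, neck_channels, 1, bias=False)

    def forward(self, backbone_feat: torch.Tensor, neck_feat: torch.Tensor) -> torch.Tensor:
        skip = self.project(backbone_feat)
        # Match the neck feature's spatial size (the skip path is downsampled here).
        skip = F.adaptive_max_pool2d(skip, neck_feat.shape[-2:])
        return self.fuse(torch.cat([skip, neck_feat], dim=1))


if __name__ == "__main__":
    # Illustrative shapes: a stride-4 backbone feature fused into a stride-8 neck level.
    c2 = torch.randn(1, 128, 160, 160)  # shallow, high-resolution backbone feature
    p3 = torch.randn(1, 256, 80, 80)    # neck feature feeding a detection head
    fused = CrossLevelFusion(128, 256)(c2, p3)
    assert fused.shape == p3.shape
```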
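The paper adopts SIoU for bounding box regression; its exact implementation is not reproduced in this outline, so the block below states the reference SIoU formulation (Gevorgyan, 2022) that SIoU-based detectors typically follow, combining an angle-aware distance cost Δ and a shape cost Ω with the IoU term.

```latex
% Reference SIoU box-regression loss (Gevorgyan, 2022). Symbols: (b_{c_x}, b_{c_y}) and
% (b^{gt}_{c_x}, b^{gt}_{c_y}) are the predicted and ground-truth box centers, sigma the
% distance between them, c_h in Lambda their vertical offset, (c_w, c_h) in rho_x, rho_y
% the enclosing-box width and height, and theta the shape-cost exponent.
\begin{aligned}
\Lambda &= 1 - 2\sin^{2}\!\left(\arcsin\frac{c_h}{\sigma} - \frac{\pi}{4}\right), \qquad
\sigma = \sqrt{\big(b^{gt}_{c_x}-b_{c_x}\big)^{2} + \big(b^{gt}_{c_y}-b_{c_y}\big)^{2}}, \\
\Delta &= \sum_{t \in \{x,\,y\}} \Big(1 - e^{-(2-\Lambda)\,\rho_t}\Big), \qquad
\rho_x = \left(\frac{b^{gt}_{c_x}-b_{c_x}}{c_w}\right)^{2}, \quad
\rho_y = \left(\frac{b^{gt}_{c_y}-b_{c_y}}{c_h}\right)^{2}, \\
\Omega &= \sum_{t \in \{w,\,h\}} \big(1 - e^{-\omega_t}\big)^{\theta}, \qquad
\omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad
\omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}, \\
\mathcal{L}_{\mathrm{SIoU}} &= 1 - \mathrm{IoU} + \frac{\Delta + \Omega}{2}.
\end{aligned}
```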
2. Related Work
2.1. Based on the Top-Down Pose Estimation Method
2.2. Based on the Bottom-Up Pose Estimation Method
2.3. Research on Pose Estimation Based on YOLO
3. Method
3.1. YOLO-Pose
3.2. YOLOv8-PoseBoost
3.3. Introducing the CBAM Lightweight Attention Module
3.4. Cross-Layer Cascaded Feature Fusion
3.5. Introducing SIoU to Improve the Loss Function
4. Experimental Section
4.1. Dataset
4.2. Experimental Environment
4.3. Baseline
4.4. Implementation Details
4.4.1. Data Processing
4.4.2. Network Parameter Setting
4.4.3. Evaluation Metrics
4.5. Ablation Experiment
4.6. Presentation of Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cheng, B.; Xiao, B.; Wang, J.; Shi, H.; Huang, T.S.; Zhang, L. HigherHRNet: Scale-aware representation learning for bottom-up human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5386–5395.
- Moon, G.; Yu, S.I.; Wen, H.; Shiratori, T.; Lee, K.M. InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XX; Springer: Berlin/Heidelberg, Germany, 2020; pp. 548–564.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
- Sattler, T.; Zhou, Q.; Pollefeys, M.; Leal-Taixe, L. Understanding the limitations of CNN-based absolute camera pose regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3302–3312.
- Iskakov, K.; Burkov, E.; Lempitsky, V.; Malkov, Y. Learnable triangulation of human pose. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 7718–7727.
- Wang, H.; Sridhar, S.; Huang, J.; Valentin, J.; Song, S.; Guibas, L.J. Normalized object coordinate space for category-level 6D object pose and size estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2642–2651.
- Boukhayma, A.; Bem, R.d.; Torr, P.H. 3D hand shape and pose from images in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10843–10852.
- Pillai, S.; Ambruş, R.; Gaidon, A. SuperDepth: Self-supervised, super-resolved monocular depth estimation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 9250–9256.
- Lin, K.; Wang, L.; Liu, Z. End-to-end human pose and mesh reconstruction with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1954–1963.
- Ke, Y.; Liang, J.; Wang, L. Characterizations of Weighted Right Core Inverse and Weighted Right Pseudo Core Inverse. J. Jilin Univ. Sci. Ed. 2023, 61, 733–738.
- Rasouli, A.; Kotseruba, I.; Kunic, T.; Tsotsos, J.K. PIE: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6262–6271.
- Ji, B.; Zhang, Y. Few-Shot Relation Extraction Model Based on Attention Mechanism Induction Network. J. Jilin Univ. Sci. Ed. 2023, 61, 845–852.
- Li, J.; Su, W.; Wang, Z. Simple pose: Rethinking and improving a bottom-up approach for multi-person pose estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11354–11361.
- Khirodkar, R.; Chari, V.; Agrawal, A.; Tyagi, A. Multi-instance pose networks: Rethinking top-down pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3122–3131.
- Yao, B.; Wang, W. Graph Embedding Clustering Based on Heterogeneous Fusion and Discriminant Loss. J. Jilin Univ. Sci. Ed. 2023, 61, 853–862.
- Yang, G.; Wang, J.; Nie, Z.; Yang, H.; Yu, S. A lightweight YOLOv8 tomato detection algorithm combining feature enhancement and attention. Agronomy 2023, 13, 1824.
- Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A Modified YOLOv8 Detection Network for UAV Aerial Image Recognition. Drones 2023, 7, 304.
- Liu, Q.; Liu, Y.; Lin, D. Revolutionizing Target Detection in Intelligent Traffic Systems: YOLOv8-SnakeVision. Electronics 2023, 12, 4970.
- Hou, T.; Ahmadyan, A.; Zhang, L.; Wei, J.; Grundmann, M. MobilePose: Real-time pose estimation for unseen objects with weak shape supervision. arXiv 2020, arXiv:2003.03522.
- Zhao, X.; Zhang, J.; Tian, J.; Zhuo, L.; Zhang, J. Residual dense network based on channel-spatial attention for the scene classification of a high-resolution remote sensing image. Remote Sens. 2020, 12, 1887.
- Wang, G.; Gu, C.; Li, J.; Wang, J.; Chen, X.; Zhang, H. Heterogeneous Flight Management System (FMS) Design for Unmanned Aerial Vehicles (UAVs): Current Stages, Challenges, and Opportunities. Drones 2023, 7, 380.
- Zhang, J.; Chen, Z.; Tao, D. Towards high performance human keypoint detection. Int. J. Comput. Vis. 2021, 129, 2639–2662.
- Zhang, F.; Zhu, X.; Ye, M. Fast human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3517–3526.
- Chen, W.; Jiang, Z.; Guo, H.; Ni, X. Fall detection based on key points of human-skeleton using OpenPose. Symmetry 2020, 12, 744.
- Fang, H.S.; Li, J.; Tang, H.; Xu, C.; Zhu, H.; Xiu, Y.; Li, Y.L.; Lu, C. AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 7157–7173.
- Maji, D.; Nagori, S.; Mathew, M.; Poddar, D. YOLO-Pose: Enhancing YOLO for multi person pose estimation using object keypoint similarity loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2637–2646.
- Guo, Y.; Li, Z.; Li, Z.; Du, X.; Quan, S.; Xu, Y. PoP-Net: Pose over Parts Network for Multi-Person 3D Pose Estimation from a Depth Image. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 1240–1249.
- Yuan, S.; Zhu, Z.; Lu, J.; Zheng, F.; Jiang, H.; Sun, Q. Applying a Deep-Learning-Based Keypoint Detection in Analyzing Surface Nanostructures. Molecules 2023, 28, 5387.
- Li, X.; Sun, K.; Fan, H.; He, Z. Real-Time Cattle Pose Estimation Based on Improved RTMPose. Agriculture 2023, 13, 1938.
- Yang, Z.; Zeng, A.; Yuan, C.; Li, Y. Effective whole-body pose estimation with two-stages distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4210–4220.
- Shi, L.; Xue, H.; Meng, C.; Gao, Y.; Wei, L. DSC-OpenPose: A Fall Detection Algorithm Based on Posture Estimation Model. In Proceedings of the International Conference on Intelligent Computing, Zhengzhou, China, 10–13 August 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 263–276.
Keypoint detection accuracy compared with representative methods on the COCO and MPII datasets:

| Method | AP50 (COCO) | AP75 (COCO) | APM (COCO) | APL (COCO) | AP50 (MPII) | AP75 (MPII) | APM (MPII) | APL (MPII) |
|---|---|---|---|---|---|---|---|---|
| OpenPose [24] | 81.60 | 52.20 | 55.70 | 70.20 | 79.42 | 50.02 | 53.52 | 68.02 |
| AlphaPose [25] | 82.70 | 52.90 | 56.80 | 71.30 | 80.22 | 49.82 | 54.72 | 68.92 |
| HigherHRNet-W32 [31] | 83.90 | 53.70 | 57.90 | 72.40 | 80.82 | 50.72 | 55.32 | 69.32 |
| YOLO-Pose-640 [26] | 82.40 | 53.30 | 56.60 | 71.10 | 80.32 | 50.22 | 54.42 | 68.22 |
| YOLO-Pose-960 [27] | 83.00 | 53.80 | 57.20 | 71.70 | 80.62 | 50.52 | 54.82 | 68.72 |
| YOLOv7-w6-pose [28] | 84.20 | 54.90 | 58.60 | 73.80 | 81.52 | 51.62 | 56.92 | 69.82 |
| RTMPose [29] | 85.39 | 59.15 | 60.05 | 74.15 | 83.15 | 56.12 | 58.76 | 71.00 |
| DWPose [30] | 85.32 | 58.37 | 60.09 | 74.21 | 83.20 | 56.90 | 58.62 | 70.25 |
| Ours | 85.40 | 59.10 | 60.10 | 74.20 | 83.22 | 56.92 | 58.92 | 71.02 |
Model complexity (parameters and FLOPs) on the COCO and MPII datasets:

| Model | Params (COCO) | FLOPs (COCO) | Params (MPII) | FLOPs (MPII) |
|---|---|---|---|---|
| OpenPose | 7.11 M | 10.08 B | 6.61 M | 9.28 B |
| AlphaPose | 6.92 M | 9.98 B | 6.32 M | 8.88 B |
| HigherHRNet-W32 | 6.85 M | 9.78 B | 6.23 M | 8.68 B |
| YOLO-Pose-640 | 7.45 M | 10.38 B | 6.85 M | 9.58 B |
| YOLO-Pose-960 | 7.65 M | 10.58 B | 7.05 M | 9.78 B |
| YOLOv7-w6-pose | 7.35 M | 10.28 B | 6.75 M | 9.48 B |
| Ours | 4.93 M | 9.08 B | 4.73 M | 8.88 B |
Ablation study of the CBAM attention module and the cross-layer cascaded feature fusion (CCFU):

| Method | CBAM | CCFU | AP50 (COCO) | AP50−95 (COCO) | APM (COCO) | APL (COCO) | AP50 (MPII) | AP50−95 (MPII) | APM (MPII) | APL (MPII) |
|---|---|---|---|---|---|---|---|---|---|---|
| (1) |  |  | 83.70 | 57.80 | 58.00 | 72.60 | 81.52 | 55.62 | 56.82 | 70.42 |
| (2) | ✓ |  | 84.70 | 58.40 | 59.90 | 73.90 | 82.52 | 56.22 | 57.72 | 71.72 |
| (3) |  | ✓ | 85.20 | 59.00 | 60.30 | 72.70 | 83.02 | 56.82 | 58.12 | 70.52 |
| (4) | ✓ | ✓ | 85.40 | 59.10 | 60.10 | 74.20 | 83.22 | 56.92 | 58.92 | 71.02 |
Comparison of bounding box regression losses on the COCO and MPII datasets:

| Method | AP50 (COCO) | AP50−95 (COCO) | APM (COCO) | APL (COCO) | AP50 (MPII) | AP50−95 (MPII) | APM (MPII) | APL (MPII) |
|---|---|---|---|---|---|---|---|---|
| CIoU | 81.98 | 56.08 | 56.28 | 70.88 | 79.80 | 53.90 | 54.10 | 68.70 |
| GIoU | 82.98 | 56.68 | 58.18 | 72.18 | 80.80 | 54.50 | 55.00 | 69.50 |
| DIoU | 83.48 | 57.28 | 58.58 | 71.98 | 81.30 | 55.10 | 55.40 | 69.30 |
| SIoU | 85.40 | 59.10 | 60.10 | 74.20 | 83.22 | 56.92 | 58.92 | 71.02 |