A Light-Weight Grasping Pose Estimation Method for Mobile Robotic Arms Based on Depthwise Separable Convolution
Abstract
1. Introduction
- (1) This paper introduces Grasp-DSC, a light-weight grasping pose estimation method tailored for mobile robotic arms, which strikes a balance between grasping speed and accuracy.
- (2) Grasp-DSC applies adaptive soft thresholding in the DRSN to mitigate the impact of background noise, overlapping objects, and the relative instability of the mobile robotic arm base, improving the model’s robustness to insufficient feature extraction and interference.
- (3) Grasp-DSC introduces DSC, which separates spatial filtering from channel mixing so that the convolutional kernels can be processed in parallel. This reduces network complexity and improves the balance between grasping accuracy and efficiency for robotic arms.
- (4) Experimental evaluations on the Cornell Grasp Dataset and the Jacquard Grasp Dataset validate the effectiveness of Grasp-DSC, and comparisons with multiple algorithms demonstrate its superior performance.
- (5) The practical applicability of Grasp-DSC is validated through grasping experiments on the MR2000-UR5 platform, underscoring its efficacy in real-world applications.
2. Related Works
2.1. Deep Learning Methods for Robotic Grasping
2.2. Deep Residual Shrinkage Network (DRSN)
2.3. Depthwise Separable Convolution (DSC)
3. Methodology
3.1. Grasp-DSC Framework
- Grasp-DSC integrates the DRSN as the core residual unit within the encoder–decoder framework of GG-CNN 2, using its attention mechanism and soft thresholding to improve grasping accuracy and robustness to disturbances.
- Grasp-DSC introduces parallel DSC to lighten the model, enabling the convolutional kernels within the DRSN to be processed simultaneously.
- Grasp-DSC adopts the Smooth L1 loss as its loss function, with weight coefficients that adjust the training emphasis placed on the different grasp parameters. This mitigates the training instability that the conventional mean squared error loss can exhibit.
3.2. Deep Residual Shrinkage Network
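As summarized in Section 3.1, Grasp-DSC builds its residual units on the DRSN's channel-wise soft thresholding: a small attention sub-network learns a per-channel threshold from the feature map itself and shrinks low-magnitude, noise-dominated responses toward zero before the identity shortcut is added. The PyTorch sketch below illustrates such a residual shrinkage unit; the layer sizes, normalization choices, and the class name `ResidualShrinkageBlock` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class ResidualShrinkageBlock(nn.Module):
    """Channel-wise residual shrinkage unit in the spirit of the DRSN.
    Illustrative sketch only; layer sizes are not the authors' configuration."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )
        # Squeeze-and-excitation-style sub-network that learns a scaling
        # factor in (0, 1) for the per-channel threshold.
        self.attention = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.body(x)                      # (B, C, H, W)
        abs_mean = residual.abs().mean(dim=(2, 3))   # (B, C) mean magnitude per channel
        tau = (abs_mean * self.attention(abs_mean)).unsqueeze(-1).unsqueeze(-1)
        # Adaptive soft thresholding: shrink small (noise-dominated) responses to zero.
        shrunk = torch.sign(residual) * torch.clamp(residual.abs() - tau, min=0.0)
        return x + shrunk                            # identity shortcut
```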
3.3. Depthwise Separable Convolution
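DSC factorizes a standard K×K convolution into a depthwise convolution (one K×K filter per input channel, acting only on the spatial dimensions) followed by a 1×1 pointwise convolution (mixing the channel dimension). Because the two stages are much smaller than a full convolution, parameters and multiply-accumulate operations drop by roughly a factor of K² for wide layers. A minimal PyTorch sketch follows; the kernel size, stride, and BatchNorm/ReLU choices are assumptions for illustration, and Grasp-DSC may configure them differently.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise filtering per channel followed by 1x1 pointwise channel mixing.
    Illustrative sketch; not the paper's exact layer configuration."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch,
                                   bias=False)                    # spatial filtering
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # channel mixing
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter count for a 3x3 convolution from 64 to 128 channels (bias omitted):
#   standard convolution:  3*3*64*128          = 73,728
#   depthwise separable:   3*3*64 + 1*1*64*128 =  8,768  (~8.4x fewer parameters)
```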
3.4. Loss Function
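Section 3.1 states that Grasp-DSC trains with a weighted Smooth L1 loss over the grasp parameters. The sketch below assumes the four pixel-wise output maps used by the GG-CNN family (grasp quality, cos 2θ, sin 2θ, gripper width); the weight values are placeholders, not the coefficients used in the paper.

```python
import torch
import torch.nn.functional as F

def grasp_smooth_l1_loss(pred: dict, target: dict, weights: dict = None) -> torch.Tensor:
    """Weighted Smooth L1 loss over pixel-wise grasp maps.
    Output heads and weight values are illustrative assumptions."""
    if weights is None:
        weights = {"quality": 1.0, "cos": 1.0, "sin": 1.0, "width": 1.0}
    loss = torch.zeros((), device=next(iter(pred.values())).device)
    for key, w in weights.items():
        # Smooth L1 is quadratic near zero and linear for large errors, which
        # avoids the exploding gradients that plain MSE can produce on outliers.
        loss = loss + w * F.smooth_l1_loss(pred[key], target[key])
    return loss
```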
4. Experiments and Results on the Datasets
4.1. Dataset
4.2. Evaluation Metrics
- (1) IoU: this criterion measures the agreement in grasp orientation (angle) between the predicted grasping rectangle G and the ground-truth grasping rectangle g.
- (2) Jaccard index: this criterion measures the area overlap between the predicted grasping rectangle G and the ground-truth grasping rectangle g, i.e., the area of their intersection divided by the area of their union. Both criteria are stated compactly below.
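Taken together, these two criteria form the rectangle metric commonly used on the Cornell and Jacquard benchmarks: a predicted rectangle counts as correct only if it satisfies both with respect to some ground-truth rectangle. A compact statement, assuming the standard thresholds of 30° for the orientation difference and 0.25 for the area overlap, is:

$$
\left|\theta_{G} - \theta_{g}\right| < 30^{\circ}, \qquad
J(G, g) = \frac{\left|G \cap g\right|}{\left|G \cup g\right|} > 0.25
$$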
4.3. Training Process
4.4. Results Comparison
5. Experiments and Results in Real-World Scenarios
- (1) The RGB-D camera (Intel RealSense D435i) connects to the laptop via a Type-C data cable, while the laptop interfaces with the UR5 robot arm controller via an Ethernet cable.
- (2) The Intel RealSense D435i camera captures RGB and depth images of the object to be grasped.
- (3) The captured images are fed into the Grasp-DSC network running on the laptop to determine a grasping rectangle.
- (4) The laptop then transmits the identified grasping parameters to the UR5 robot arm controller.
- (5) Upon receiving the command, the UR5 robot arm controller directs the UR5 robot arm to execute the grasping motion (a minimal code sketch of this pipeline is given after this list).
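A minimal Python sketch of this capture–predict–execute loop is given below. It assumes the pyrealsense2 SDK for the D435i, a placeholder trained Grasp-DSC model exposing a `predict` method (not the paper's actual interface), and the UR secondary socket interface (port 30002) for sending URScript commands; the controller IP address, hand-eye calibration, and gripper actuation are stubbed or omitted and would have to be supplied for a real deployment.

```python
import socket
import numpy as np
import pyrealsense2 as rs
import torch

ROBOT_IP = "192.168.1.10"  # placeholder address of the UR5 controller
model = torch.load("grasp_dsc.pth", map_location="cpu")  # hypothetical trained model
model.eval()

def camera_to_base(u, v, depth_raw, theta):
    """Placeholder for deprojection + hand-eye calibration of the MR2000-UR5 setup."""
    raise NotImplementedError("supply the real calibration here")

# Steps (1)-(2): start the D435i and grab one depth/colour frame pair.
pipeline, config = rs.pipeline(), rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)
frames = pipeline.wait_for_frames()
depth = np.asanyarray(frames.get_depth_frame().get_data()).astype(np.float32)
color = np.asanyarray(frames.get_color_frame().get_data())
pipeline.stop()

# Step (3): run Grasp-DSC to obtain a grasp rectangle (centre u, v, angle, width).
with torch.no_grad():
    u, v, theta, width = model.predict(depth, color)

# Step (4): convert the image-frame grasp into a robot-frame pose.
x, y, z, rx, ry, rz = camera_to_base(int(u), int(v), depth[int(v), int(u)], theta)

# Step (5): send a URScript motion command to the UR5 secondary interface.
script = f"movel(p[{x:.4f},{y:.4f},{z:.4f},{rx:.4f},{ry:.4f},{rz:.4f}], a=0.5, v=0.25)\n"
with socket.create_connection((ROBOT_IP, 30002), timeout=5) as sock:
    sock.sendall(script.encode("utf-8"))
```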
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zou, M.; Li, X.; Yuan, Q.; Xiong, T.; Zhang, Y.; Han, J.; Xiao, Z. Robotic grasp detection network based on improved deformable convolution and spatial feature center mechanism. Biomimetics 2023, 8, 403. [Google Scholar] [CrossRef]
- Fang, H.; Gou, M.; Wang, C.; Lu, C. Robust grasping across diverse sensor qualities: The GraspNet-1Billion dataset. Int. J. Robot. Res. 2023, 42, 1094–1103. [Google Scholar] [CrossRef]
- Wang, S.; Jiang, X.; Zhao, J.; Wang, X.; Zhou, W.; Liu, Y. Efficient fully convolution neural network for generating pixel wise robotic grasps with high resolution images. In Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), Dali, China, 6–8 December 2019; pp. 474–480. [Google Scholar] [CrossRef]
- William, A.; Christopher, X.; Aaron, W.; Caelen, W.; Pedro, D.; Siddhartha, S. Amodal 3D reconstruction for robotic manipulation via stability and connectivity. In Proceedings of the 4th Conference on Robot Learning, (CoRL 2020), Cambridge, MA, USA, 16–18 November 2020; pp. 1498–1508. [Google Scholar] [CrossRef]
- Wang, C.; Zang, X.; Zhang, X.; Liu, Y.; Zhao, J. Parameter estimation and object gripping based on fingertip force/torque sensors. Measurement 2021, 179, 109479. [Google Scholar] [CrossRef]
- Hong, Q.; Yang, L.; Zeng, B. RANET: A grasp generative residual attention network for robotic grasping detection. Int. J. Control Autom. Syst. 2022, 20, 3996–4004. [Google Scholar] [CrossRef]
- Zhai, D.; Yu, S.; Xia, Y. FANet: Fast and accurate robotic grasp detection based on keypoints. IEEE Trans. Autom. Sci. Eng. 2024, 21, 2974–2986. [Google Scholar] [CrossRef]
- Kumra, S.; Joshi, S.; Sahin, F. GR-ConvNet v2: A real-time multi-grasp detection network for robotic grasping. Sensors 2022, 22, 6208. [Google Scholar] [CrossRef] [PubMed]
- Huang, Z.; Wang, L.; An, Q.; Zhou, Q.; Hong, H. Learning a contrast enhancer for intensity correction of remotely sensed images. IEEE Signal Process. Lett. 2022, 29, 394–398. [Google Scholar] [CrossRef]
- Yan, M.; Li, A.; Kalakrishnan, M.; Pastor, P. Learning probabilistic multi-modal actor models for vision-based robotic grasping. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4804–4810. [Google Scholar] [CrossRef]
- Antanas, L.; Moreno, P.; Neumann, M.; de Figueiredo, R.P.; Kersting, K.; De Raedt, L. Semantic and geometric reasoning for robotic grasping: A probabilistic logic approach. Auton. Robot. 2019, 43, 1393–1418. [Google Scholar] [CrossRef]
- Zhang, H.; Lan, X.; Bai, S.; Zhou, X.; Tian, Z.; Zheng, N. ROI-based robotic grasp detection for object overlapping scenes. In Proceedings of the International Conference on Intelligent Robots and Systems, Macau, China, 3–8 November 2019; pp. 4768–4775. [Google Scholar] [CrossRef]
- Morrison, D.; Corke, P.; Leitner, J. Learning robust, real-time, reactive robotic grasping. Int. J. Robot. Res. 2020, 39, 183–201. [Google Scholar] [CrossRef]
- Kumra, S.; Joshi, S.; Sahin, F. Antipodal robotic grasping using generative residual convolutional neural network. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 9626–9633. [Google Scholar] [CrossRef]
- Wang, D.; Liu, C.; Chang, F.; Li, N.; Li, G. High-performance pixel-level grasp detection based on adaptive grasping and grasp-aware network. IEEE Trans. Ind. Electron. 2022, 69, 11611–11621. [Google Scholar] [CrossRef]
- Yu, S.; Zhai, D.; Xia, Y.; Wu, H.; Liao, J. SE-ResUNet: A novel robotic grasp detection method. IEEE Robot. Autom. Lett. 2022, 7, 5238–5245. [Google Scholar] [CrossRef]
- Shukla, P.; Pramanik, N.; Mehta, D.; Nandi, G.C. Generative model based robotic grasp pose prediction with limited dataset. Appl. Intell. 2022, 52, 9952–9966. [Google Scholar] [CrossRef]
- Fang, H.; Wang, C.; Fang, H.; Gou, M.; Yan, H.; Liu, W.; Xie, Y.; Lu, C. AnyGrasp: Robust and efficient grasp perception in spatial and temporal domains. IEEE Trans. Robot. 2023, 39, 3929–3945. [Google Scholar] [CrossRef]
- Fang, H.; Wang, C.; Gou, M.; Lu, C. GraspNet-1Billion: A large-scale benchmark for general object grasping. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11441–11450. [Google Scholar] [CrossRef]
- Chen, H.; Guo, W.; Kang, K.; Hu, G. Automatic modulation recognition method based on phase transformation and deep residual shrinkage network. Electronics 2024, 13, 2141. [Google Scholar] [CrossRef]
- Su, Z.; Yu, J.; Xiao, X.; Wang, J.; Wang, X. Deep learning seismic damage assessment with embedded signal denoising considering three-dimensional time-frequency feature correlation. Eng. Struct. 2023, 286, 116148. [Google Scholar] [CrossRef]
- Wang, H.; Wang, J.; Xu, H.; Sun, Y.; Yu, Z. DRSNFuse: Deep residual shrinkage network for infrared and visible image fusion. Sensors 2022, 22, 5149. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; Wang, Z.; Liu, D.; Sun, Q.; Wang, J. Rock thin section image classification based on depth residuals shrinkage network and attention mechanism. Earth Sci. Inform. 2023, 16, 1449–1457. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar] [CrossRef]
- Liu, Z.; Liu, Q.; Yan, S.; Cheung, R.C.C. An efficient FPGA-based depthwise separable convolutional neural network accelerator with hardware pruning. ACM Trans. Reconfigurable Technol. Syst. 2024, 17, 1–20. [Google Scholar] [CrossRef]
- Li, S.; Liu, Z.; Gao, M.; Bai, Y.; Yin, H. MDSCN: Multiscale depthwise separable convolutional network for underwater graphics restoration. Vis. Comput. 2024. [Google Scholar] [CrossRef]
- Yi, S.; Mei, Z.; Kamen, I.; Mei, Z.; He, T.; Zeng, H. Gait-based identification using wearable multimodal sensing and attention neural networks. Sens. Actuators A Phys. 2024, 374, 115478. [Google Scholar] [CrossRef]
- Li, L.; Qin, S.; Yang, N.; Hong, L.; Dai, Y.; Wang, Z. LVNet: A lightweight volumetric convolutional neural network for real-time and high-performance recognition of 3D objects. Multimed. Tools Appl. 2024, 83, 61047–61063. [Google Scholar] [CrossRef]
- Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform. 2020, 16, 4681–4690. [Google Scholar] [CrossRef]
- Sifre, L.; Mallat, S. Rigid-motion scattering for image classification. arXiv 2014, arXiv:1403.1687. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Lenz, I.; Lee, H.; Saxena, A. Deep learning for detecting robotic grasps. Int. J. Robot. Res. 2015, 34, 705–724. [Google Scholar] [CrossRef]
- Depierre, A.; Dellandréa, E.; Chen, L. Jacquard: A large scale dataset for robotic grasp detection. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3511–3516. [Google Scholar] [CrossRef]
- Zhao, J.; Sui, Y.; Xu, Y.; Lai, K. Industrial robot selection using a multiple criteria group decision making method with individual preferences. PLoS ONE 2021, 16, e0259354. [Google Scholar] [CrossRef] [PubMed]
- Chen, L.; Huang, P.; Meng, Z. Convolutional multi-grasp detection using grasp path for RGBD images. Robot. Auton. Syst. 2019, 113, 94–103. [Google Scholar] [CrossRef]
- Asif, U.; Tang, J.; Harrer, S. GraspNet: An efficient convolutional neural network for real-time grasp detection for low-powered devices. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 4875–4882. [Google Scholar] [CrossRef]
- Karaoguz, H.; Jensfelt, P. Object detection approach for robot grasp detection. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4953–4959. [Google Scholar] [CrossRef]
- Shao, Z.; Qu, Y.; Ren, G.; Wang, G.; Guan, Y.; Shi, Z.; Tan, J. Batch normalization masked sparse autoencoder for robotic grasping detection. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 27 October 2020; pp. 9614–9619. [Google Scholar] [CrossRef]
- Yu, Q.; Shang, W.; Zhao, Z.; Cong, S.; Li, Z. Robotic grasping of unknown objects using novel multilevel convolutional neural networks: From parallel gripper to dexterous hand. IEEE Trans. Autom. Sci. Eng. 2021, 18, 1730–1741. [Google Scholar] [CrossRef]
- Xu, R.; Chu, F.; Vela, P.A. GKNet: Grasp keypoint network for grasp candidates detection. arXiv 2021, arXiv:2106.08497. [Google Scholar] [CrossRef]
- Hu, W.; Wang, C.; Liu, F.; Peng, X.; Sun, P.; Tan, J. A grasps-generation-and-selection convolutional neural network for a digital twin of intelligent robotic grasping. Robot. Comput.-Integr. Manuf. 2022, 77, 102371. [Google Scholar] [CrossRef]
- Song, Y.; Wen, J.; Liu, D.; Yu, C. Deep robotic grasping prediction with hierarchical RGB-D fusion. Int. J. Control Autom. Syst. 2022, 20, 243–254. [Google Scholar] [CrossRef]
- Yu, Y.; Cao, Z.; Liu, Z.; Geng, W.; Yu, J.; Zhang, W. A two-stream CNN with simultaneous detection and segmentation for robotic grasping. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1167–1181. [Google Scholar] [CrossRef]
- Li, X.; Zhang, X.; Zhou, X.; Chen, I. UPG: 3D vision-based prediction framework for robotic grasping in multi-object scenes. Knowl.-Based Syst. 2023, 270, 110491. [Google Scholar] [CrossRef]
- Duan, J.; Zhuang, L.; Zhang, Q.; Qin, J.; Zhou, Y. Vision-based robotic grasping using faster R-CNN-GRCNN dual-layer detection mechanism. Proc. IMechE. Part B J. Eng. Manuf. 2024. [Google Scholar] [CrossRef]
- Depierre, A.; Dellandréa, E.; Chen, L. Scoring graspability based on grasp regression for better grasp prediction. In Proceedings of the 2021 International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 4370–4376. [Google Scholar] [CrossRef]
| Algorithm | Accuracy (%) | Speed (ms) |
|---|---|---|
| GG-CNN 2 [13] | 65 | 20 |
| GG-CNN 2 + DRSN | 98.9 | 35.1 |
| GG-CNN 2 + DSC | 89.2 | 9.7 |
| GG-CNN 2 + DRSN + DSC | 96.6 | 14.4 |
| Authors | Algorithm | Accuracy (%) | Speed (ms) | Final Score |
|---|---|---|---|---|
| Chen et al. [35] | Multi-grasp | 86.4 | 25.7 | 53.4 |
| Asif et al. [36] | GraspNet | 90.5 | 24.4 | 55.9 |
| Zhang et al. [12] | ROI-GD | 92.3 | 40.1 | 56.4 |
| Karaoguz et al. [37] | GRPN | 88.7 | 189.6 | 53.4 |
| Shao et al. [38] | SAE + BN + SAE | 95.5 | - | - |
| Yu et al. [39] | Multilevel CNNs | 95. | 36.5 | 58.6 |
| Xu et al. [40] | GK-Net | 96.9 | 41.7 | 59.1 |
| Wang et al. [3] | GPWRG | 94.4 | 12.9 | 59.7 |
| Hu et al. [41] | GGS-CNN | 95.5 | 43.5 | 58.2 |
| Song et al. [42] | Confinet + BEM | 92.3 | 18.7 | 57.5 |
| Yu et al. [43] | TsGNet | 93.1 | - | - |
| Zhai et al. [7] | FANet-CPU | 95.3 | 25.6 | 58.7 |
| Li et al. [44] | UPG | 93.7 | 21.1 | 58.1 |
| Duan et al. [45] | Faster R-CNN + CBAM | 94.6 | 42.9 | 57.7 |
| Ours | Grasp-DSC | 96.6 | 14.4 | 60.7 |
| Authors | Algorithm | Accuracy (%) |
|---|---|---|
| Morrison et al. [13] | GG-CNN 2 | 84 |
| Depierre et al. [46] | Grasp Regression | 85.7 |
| Kumra et al. [14] | GR-ConvNet | 89.5 |
| Kumra et al. [8] | GR-ConvNet v2 | 91.4 |
| Ours | Grasp-DSC | 92.2 |
| Algorithm | Physical Grasps (Successes/Trials) | Accuracy (%) | Time (s) | Final Score |
|---|---|---|---|---|
| UPG [44] | 91/100 | 91 | 4.83 | 8.8 |
| Faster R-CNN + CBAM [45] | 94/100 | 94 | 4.96 | 8.6 |
| Grasp-DSC | 98/100 | 98 | 4.08 | 10.4 |
| Algorithm | Physical Grasps (Successes/Trials) | Accuracy (%) | Time (s) | Final Score |
|---|---|---|---|---|
| UPG [44] | 96/100 | 96 | 4.83 | 8.9 |
| Faster R-CNN + CBAM [45] | 90/100 | 90 | 4.98 | 8.6 |
| Grasp-DSC | 94/100 | 94 | 4.09 | 10.3 |
Citation: Duan, J.; Ye, C.; Wang, Q.; Zhang, Q. A Light-Weight Grasping Pose Estimation Method for Mobile Robotic Arms Based on Depthwise Separable Convolution. Actuators 2025, 14, 50. https://doi.org/10.3390/act14020050