Dynamic Task Planning for Multi-Arm Harvesting Robots Under Multiple Constraints Using Deep Reinforcement Learning
Abstract
1. Introduction
- To minimize the overall operational time of the robot, we modeled the problem of task planning as a Markov game and designed the states, actions, state transition, and reward function of the model.
- To mitigate the technical challenges of slow model convergence and insufficient stability, we introduced a self-attention mechanism and a centralized training and execution strategy into the design and training of the deep reinforcement learning (DRL) model, accelerating convergence and enhancing overall stability.
- To address the dynamic task planning challenges of multi-arm robots, we propose a DRL-based task planning framework. Numerical simulation results demonstrate that the framework is effective and outperforms the compared methods.
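The Markov-game framing described in the contributions above can be sketched minimally. All names below (`PlanningState`, `reward`, the status strings) are hypothetical, chosen for illustration rather than taken from the authors' implementation; the reward simply follows the stated goal of minimizing overall operational time.

```python
from dataclasses import dataclass

@dataclass
class PlanningState:
    """Illustrative state for the task-planning Markov game."""
    fruit_positions: list   # remaining target coordinates (x, y, z)
    group_status: dict      # e.g. {"U": "waiting", "D": "working"}
    elapsed: float = 0.0    # total operation time so far

def reward(prev: PlanningState, nxt: PlanningState) -> float:
    """Negative time increment: maximizing return minimizes total time."""
    return -(nxt.elapsed - prev.elapsed)

s0 = PlanningState([(0.4, 1.2, 0.3)], {"U": "waiting", "D": "waiting"})
s1 = PlanningState([], {"U": "working", "D": "waiting"}, elapsed=8.2)
print(reward(s0, s1))  # -8.2
```

Under this convention, any action that lengthens the makespan is penalized in proportion, which is what lets a generic policy-gradient learner trade off arm assignments.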
2. Preliminary Model and Problem Statement
2.1. Robot Model Description
- Approaching: moving the horizontal and vertical joints from their initial positions to the desired ones.
- Extension: moving the extension joint from its initial position to the desired one.
- Grasp: grasping a fruit with the flexible three-finger gripper and rotating the arm.
- Retraction: moving the extension joint from the desired position back to the initial one.
- Placement: releasing the fruit and rotating the arm.
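To make the stage decomposition concrete, here is a rough cycle-time model that sums the five stages. The velocities, the fixed grasp/placement times, and the assumption that the horizontal and vertical joints move in parallel during approaching are all illustrative placeholders, not the paper's exact kinematic model.

```python
def picking_cycle_time(dx, dy, dz, v_h=0.25, v_v=0.10, v_e=0.30,
                       t_grasp=3.0, t_place=2.0):
    """Sum of the five stages: approach (horizontal and vertical joints
    assumed to move in parallel), extension, grasp, retraction, placement.
    Distances in metres, velocities in m/s, times in seconds."""
    t_approach = max(dx / v_h, dy / v_v)  # the slower joint dominates
    t_extend = dz / v_e
    t_retract = dz / v_e
    return t_approach + t_extend + t_grasp + t_retract + t_place

print(picking_cycle_time(0.5, 0.3, 0.45))  # ~11.0 seconds
```

Even this crude model shows why per-fruit times vary: approach time depends on how far the target is from the arm's current pose, which is exactly what a task planner can exploit by ordering targets well.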
2.2. Picking Time Analysis
- Additional attempts may be required when an initial attempt fails to pick the fruit.
- Mechanical limitations require safety distances in both horizontal and vertical directions. This means that any two joints in the same direction must be at least a certain distance apart.
- The motors driving the joints in the horizontal, vertical, and extension directions rotate at different speeds, resulting in different joint velocities in the three directions.
- The total harvesting time is determined by the longer operating time of the lower or upper group.
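Two of the constraints above lend themselves to a direct sketch: the minimum safety distance between joints moving in the same direction, and the total harvesting time being set by the slower group. The threshold value here is a placeholder, not the paper's specification.

```python
SAFETY_DIST = 0.2  # metres; placeholder value, not from the paper

def targets_compatible(y1, y2, d_min=SAFETY_DIST):
    """Two joints moving in the same direction must stay >= d_min apart,
    so their current targets are only compatible if that holds."""
    return abs(y1 - y2) >= d_min

def total_harvest_time(t_group_u, t_group_d):
    """Overall time is determined by whichever group finishes later."""
    return max(t_group_u, t_group_d)

print(targets_compatible(0.50, 0.62))    # False: only 0.12 m apart
print(total_harvest_time(156.7, 148.2))  # 156.7
```

The compatibility test is what makes the planning problem coupled: an arm cannot always take the nearest fruit, because its rail-mate may be working too close by.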
3. Multi-Arm Task Scheduling Framework
3.1. Problem Statement
3.2. Markov Game Model
3.2.1. State
3.2.2. Action
3.2.3. State Transition
3.2.4. Reward Function
4. Experimental Results
4.1. Model Optimization and Training
4.1.1. Self-Attention
4.1.2. Training Progress
Algorithm 1 Multi-arm Task Planning and Cooperative Control
Require: the location of each target
1: Initialize the layout
2: Initialize the state tuple based on the layout
3: Initialize the policy
4: Initialize the number of targets n
5: while Ω ≤ n do
6:   if the status of Group-U is waiting then
7:     Decode the target from the action received from the policy
8:     if the target is unavailable then
9:       Pass
10:    else
11:      Execute picking using Group-U
12:      Set the status of Group-U as working
13:    end if
14:  else
15:    if the status of Group-D is waiting then
16:      Decode the target from the action received from the policy
17:      if the target is unavailable then
18:        Pass
19:      else
20:        Execute picking using Group-D
21:        Set the status of Group-D as working
22:      end if
23:    end if
24:  end if
25:  if Group-U has finished picking then
26:    Calculate the picking time of Group-U
27:    Set the status of Group-U as waiting
28:  end if
29:  if Group-D has finished picking then
30:    Calculate the picking time of Group-D
31:    Set the status of Group-D as waiting
32:  end if
33:  Update r_k by the calculated picking times
34:  Update the layout
35:  Update the state tuple based on the layout and r_k
36:  Update k and Ω
37: end while
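Algorithm 1 can be paraphrased as a plain event loop. Everything below (`plan_and_pick`, the target dictionaries, the greedy stand-in policy) is a hypothetical sketch of the dispatch pattern only, not the paper's learned PPO controller.

```python
def plan_and_pick(targets, policy):
    """Event-driven sketch of Algorithm 1 for two arm groups, U and D."""
    status = {"U": "waiting", "D": "waiting"}
    busy_until = {"U": 0.0, "D": 0.0}
    clock = 0.0
    remaining = list(targets)
    while remaining or "working" in status.values():
        # dispatch any waiting group (lines 6-24 of Algorithm 1)
        for g in ("U", "D"):
            if status[g] == "waiting" and remaining:
                tgt = policy(remaining, g)           # decode action -> target
                if tgt is None:                      # target unavailable: pass
                    continue
                remaining.remove(tgt)
                busy_until[g] = clock + tgt["time"]  # execute picking
                status[g] = "working"
        # advance to the next completion and release that group (lines 25-32)
        working = [busy_until[g] for g in status if status[g] == "working"]
        if not working:
            break
        clock = min(working)
        for g in ("U", "D"):
            if status[g] == "working" and busy_until[g] <= clock:
                status[g] = "waiting"
    return clock

# Greedy stand-in policy: always take the first remaining target.
total = plan_and_pick(
    [{"time": 8.0}, {"time": 6.0}, {"time": 7.0}],
    lambda rem, g: rem[0],
)
print(total)  # 13.0
```

The returned clock is the makespan: group U picks the 8.0 s target while group D picks the 6.0 s and 7.0 s targets back to back, so the slower group finishes at 13.0 s. A trained policy would replace the greedy lambda with the DRL model's action decoding.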
4.2. Simulation of the PPO Framework
4.2.1. Static Environment
4.2.2. Dynamic Environment
4.2.3. Real-World Testing
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zou, B.; Liu, F.; Zhang, Z.; Hong, T.; Wu, W.; Lai, S. Mechanization of mountain orchards: Development bottleneck and foreign experiences. J. Agric. Mech. Res. 2019, 41, 254–260.
- Bac, C.W.; Van Henten, E.J.; Hemming, J.; Edan, Y. Harvesting robots for high-value crops: State-of-the-art review and challenges ahead. J. Field Robot. 2014, 31, 888–911.
- Kang, H.; Chen, C. Fruit detection, segmentation and 3D visualisation of environments in apple orchards. Comput. Electron. Agric. 2020, 171, 105302.
- Jin, Y.; Liu, J.; Xu, Z.; Yuan, S.; Li, P.; Wang, J. Development status and trend of agricultural robot technology. Int. J. Agric. Biol. Eng. 2021, 14, 1–19.
- Kim, W.S.; Lee, D.H.; Kim, Y.J.; Kim, T.; Lee, H.J. Path detection for autonomous traveling in orchards using patch-based CNN. Comput. Electron. Agric. 2020, 175, 105620.
- Li, T.; Yu, J.; Qiu, Q.; Zhao, C. Hybrid Uncalibrated Visual Servoing Control of Harvesting Robots with RGB-D Cameras. IEEE Trans. Ind. Electron. 2022, 70, 2729–2738.
- Li, T.; Xie, F.; Zhao, Z.; Zhao, H.; Guo, X.; Feng, Q. A multi-arm robot system for efficient apple harvesting: Perception, task plan and control. Comput. Electron. Agric. 2023, 211, 107979.
- Chakraborty, S.K.; Subeesh, A.; Potdar, R.; Chandel, N.S.; Jat, D.; Dubey, K.; Shelake, P. AI-enabled farm-friendly automatic machine for washing, image-based sorting, and weight grading of citrus fruits: Design optimization, performance evaluation, and ergonomic assessment. J. Field Robot. 2023, 40, 1581–1602.
- Li, Y.; Feng, Q.; Liu, C.; Xiong, Z.; Sun, Y.; Xie, F.; Li, T.; Zhao, C. MTA-YOLACT: Multitask-aware network on fruit bunch identification for cherry tomato robotic harvesting. Eur. J. Agron. 2023, 146, 126812.
- Williams, H.; Ting, C.; Nejati, M.; Jones, M.H.; Penhall, N.; Lim, J.; Seabright, M.; Bell, J.; Ahn, H.S.; Scarfe, A.; et al. Improvements to and large-scale evaluation of a robotic kiwifruit harvester. J. Field Robot. 2020, 37, 187–201.
- Defterli, S.G. Review of robotic technology for strawberry production. Appl. Eng. Agric. 2016, 32, 301–318.
- Tibbetts, J.H. Agricultural Disruption: New technology, consolidation, may yield production gains, job upheaval. BioScience 2019, 69, 237–243.
- Sepúlveda, D.; Fernández, R.; Navas, E.; Armada, M.; González-de-Santos, P. Robotic aubergine harvesting using dual-arm manipulation. IEEE Access 2020, 8, 121889–121904.
- Bogue, R. Fruit picking robots: Has their time come? Ind. Robot. Int. J. Robot. Res. Appl. 2020, 47, 141–145.
- Delbridge, T. Robotic strawberry harvest is promising but will need improved technology and higher wages to be economically viable. Calif. Agric. 2021, 75, 57–63.
- Xiong, Z.; Feng, Q.; Li, T.; Xie, F.; Liu, C.; Liu, L.; Guo, X.; Zhao, C. Dual-Manipulator Optimal Design for Apple Robotic Harvesting. Agronomy 2022, 12, 3128.
- Barnett, J.; Duke, M.; Au, C.K.; Lim, S.H. Work distribution of multiple Cartesian robot arms for kiwifruit harvesting. Comput. Electron. Agric. 2020, 169, 105202.
- Au, C.; Barnett, J.; Lim, S.H.; Duke, M. Workspace analysis of Cartesian robot system for kiwifruit harvesting. Ind. Robot. Int. J. Robot. Res. Appl. 2020, 47, 503–510.
- Mann, M.P.; Zion, B.; Shmulevich, I.; Rubinstein, D.; Linker, R. Combinatorial optimization and performance analysis of a multi-arm Cartesian robotic fruit harvester—Extensions of graph coloring. J. Intell. Robot. Syst. 2016, 82, 399–411.
- Tang, S.Y.; Zhu, Y.F.; Li, Q.; Lei, Y.L. Survey of task allocation in multi-agent systems. Syst. Eng. Electron. 2010, 32, 2155–2161.
- Tiacci, L.; Rossi, A. A discrete event simulator to implement deep reinforcement learning for the dynamic flexible job shop scheduling problem. Simul. Model. Pract. Theory 2024, 134, 102948.
- Bouktif, S.; Cheniki, A.; Ouni, A. Traffic Signal Control Using Hybrid Action Space Deep Reinforcement Learning. Sensors 2021, 21, 2302.
- Lu, R.; Bai, R.; Luo, Z.; Jiang, J.; Sun, M.; Zhang, H.T. Deep Reinforcement Learning-Based Demand Response for Smart Facilities Energy Management. IEEE Trans. Ind. Electron. 2022, 69, 8554–8565.
- Horst, R.; Pardalos, P.M.; Van Thoai, N. Introduction to Global Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2000.
- Floudas, C.A.; Lin, X. Mixed integer linear programming in process scheduling: Modeling, algorithms, and applications. Ann. Oper. Res. 2005, 139, 131–162.
- Li, W.; Ding, Y.; Yang, Y.; Sherratt, R.S.; Park, J.H.; Wang, J. Parameterized algorithms of fundamental NP-hard problems: A survey. Hum.-Centric Comput. Inf. Sci. 2020, 10, 1–24.
- Sasaki, T.; Biro, D. Cumulative culture can emerge from collective intelligence in animal groups. Nat. Commun. 2017, 8, 15049.
- Li, Z.; Barenji, A.V.; Jiang, J.; Zhong, R.Y.; Xu, G. A mechanism for scheduling multi robot intelligent warehouse system face with dynamic demand. J. Intell. Manuf. 2020, 31, 469–480.
- Zhang, K.; He, F.; Zhang, Z.; Lin, X.; Li, M. Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach. Transp. Res. Part C Emerg. Technol. 2020, 121, 102861.
- Panerati, J.; Gianoli, L.; Pinciroli, C.; Shabah, A.; Nicolescu, G.; Beltrame, G. From swarms to stars: Task coverage in robot swarms with connectivity constraints. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 7674–7681.
- Chen, J.; Zhang, Y.; Wu, L.; You, T.; Ning, X. An adaptive clustering-based algorithm for automatic path planning of heterogeneous UAVs. IEEE Trans. Intell. Transp. Syst. 2021, 23, 16842–16853.
- Li, T.; Qiu, Q.; Zhao, C. Task planning of multi-arm harvesting robots for high-density dwarf orchards. Trans. Chin. Soc. Agric. Eng. 2021, 37, 1–10.
- Tian, Y.T.; Yang, M.; Qi, X.Y.; Yang, Y.M. Multi-robot task allocation for fire-disaster response based on reinforcement learning. In Proceedings of the 2009 International Conference on Machine Learning and Cybernetics, Baoding, China, 12–15 July 2009; IEEE: Piscataway, NJ, USA, 2009; Volume 4, pp. 2312–2317.
- Wilson, A.; Fern, A.; Ray, S.; Tadepalli, P. Multi-task reinforcement learning: A hierarchical Bayesian approach. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 1015–1022.
- Sun, L.; Yu, X.; Guo, J.; Yan, Y.; Yu, X. Deep reinforcement learning for task assignment in spatial crowdsourcing and sensing. IEEE Sens. J. 2021, 21, 25323–25330.
- Choudhury, S.; Gupta, J.K.; Kochenderfer, M.J.; Sadigh, D.; Bohg, J. Dynamic multi-robot task allocation under uncertainty and temporal constraints. Auton. Robot. 2021, 46, 231–247.
- Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 268:1–268:8.
Parameter | Value | Parameter | Value | Parameter | Value
---|---|---|---|---|---
 | 0.25 m/s | | 0.1 m/s | | 0.30 m/s
 | 3 s | | 2 s | - | -
γ | 0.95 | lr | 0.0005 | - | -
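The discount factor γ = 0.95 in the table above governs how strongly future rewards are weighted during training. As a minimal illustration (not the authors' code), the discounted return that a DRL agent maximizes can be computed as:

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of r_t * gamma**t, accumulated backwards for simplicity."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With time-cost rewards (negative seconds), earlier costs weigh more,
# so the agent is pushed to keep every picking step short.
print(round(discounted_return([-8.2, -6.5, -7.1]), 3))
```

A γ close to 1 makes the planner care about the whole harvesting episode rather than just the next pick, which matches the makespan objective.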
Quantity of Fruits | Method | Max. (s) | Min. (s) | Mean (s) | Var. | Calc. (s)
---|---|---|---|---|---|---
25 | random | 320.6 | 220.0 | 265.2 | 581.6 | —
25 | [32] | 173.6 | 159.3 | 165.3 | 36.7 | 6.3
25 | ours | 164.5 | 150.8 | 156.7 | 33.5 | 1.0
50 | random | 591.1 | 449.1 | 535.1 | 1536.5 | —
50 | [32] | 303.6 | 284.9 | 296.0 | 64.5 | 13.1
50 | ours | 291.2 | 279.8 | 286.2 | 22.7 | 1.16
Quantity of Fruits | Round | Total Time (s), [32] | Total Time (s), Ours | Average Time (s), [32] | Average Time (s), Ours | Remaining Fruits, [32] | Remaining Fruits, Ours
---|---|---|---|---|---|---|---
25 | I | 153.4 | 189.3 | 6.1 | 6.5 | 4 | 0
25 | II | 181.2 | 185.2 | 7.2 | 6.4 | 4 | 0
25 | III | 166.7 | 193.9 | 6.7 | 6.7 | 4 | 0
50 | I | 283.3 | 335.8 | 5.7 | 5.7 | 9 | 1
50 | II | 294.4 | 299.5 | 5.9 | 5.1 | 9 | 2
50 | III | 304.9 | 318.8 | 6.1 | 5.4 | 9 | 2
Fruit No. | Picking Arm, [32] | Picking Arm, Ours | Picking Time (s), [32] | Picking Time (s), Ours
---|---|---|---|---
1 | 1 | 2 | 8.2 | 8.4
2 | 3 | 3 | 8.6 | 12.7
3 | 2 | 1 | 7.4 | 7.2
4 | 4 | 4 | 8.1 | 8.8
5 | 1 | 2 | 7.6 | 8.0
6 | 2 | 2 | 7.9 | 8.3
7 | 2 | 1 | 8.7 | 8.1
8 | 3 | 3 | 14.6 | 16.2
9 | 1 | 2 | 14.7 | 8.5
10 | 4 | 1 | 12.8 | 8.4
11 | 2 | 4 | 15.3 | 17.1
12 | 1 | 2 | 16.4 | 16.3
13 | 3 | 3 | 15.2 | 10.8
14 | 1 | 4 | 13.8 | 8.9
15 | 2 | 1 | 17.1 | 8.7
16 | 4 | 3 | 10.8 | 16.9
17 | 2 | 2 | 8.6 | 8.7
18 | 1 | 1 | 8.2 | 7.6
19 | 3 | 2 | 15.3 | 7.8
20 | 2 | 1 | 14.2 | 6.9
21 | 1 | 2 | 7.6 | 7.2
22 | 3 | 1 | 18.4 | 7.5
23 | 4 | 4 | 8.3 | 20.7
24 | 3 | 1 | 8.1 | 9.3
25 | 2 | 2 | 9.2 | 8.1
1 (repeat) | - | 1 | - | 7.8
3 (repeat) | - | 2 | - | 7.4
13 (repeat) | - | 4 | - | 8.5
24 (repeat) | - | 2 | - | 7.9
Citation: Xie, F.; Guo, Z.; Li, T.; Feng, Q.; Zhao, C. Dynamic Task Planning for Multi-Arm Harvesting Robots Under Multiple Constraints Using Deep Reinforcement Learning. Horticulturae 2025, 11, 88. https://doi.org/10.3390/horticulturae11010088