Resource Allocation in UAV-D2D Networks: A Scalable Heterogeneous Multi-Agent Deep Reinforcement Learning Approach
Abstract
1. Introduction
2. System Model
2.1. UAV Mobility Model
2.2. Communication Model
2.3. Transmission Delay Model
2.4. UAV Energy Consumption Model
2.5. Cache Hit Probability Model
3. Problem Formulation and Optimization
3.1. Problem Formulation
3.2. Partially Observable Markov Game
3.3. Markov Decision Process
3.4. Scalable Heterogeneous Multi-Agent Mean-Field Actor–Critic
Algorithm 1 SH-MAMFAC Algorithm
3.5. Computational Complexity Analysis
4. Numerical Results
4.1. Simulation Settings
4.2. Performance Evaluation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Summary of Acronyms
Acronym | Description |
---|---|
AR | Augmented reality |
BaB | Branch and bound |
BS | Base station |
CDF | Cumulative distribution function |
CHP | Cache hit probability |
CU/RU | Caching user/requesting user |
CTDE | Centralized training with decentralized execution |
D2D | Device to device |
DRL | Deep reinforcement learning |
LoS/NLoS | Line of sight/non-line of sight |
MADDPG | Multi-agent deep deterministic policy gradient |
MADRL | Multi-agent deep reinforcement learning |
MIMO | Multiple input multiple output |
MBS | Macro base station |
NP-SH-MAMFAC | No-power scalable heterogeneous multi-agent mean-field actor–critic |
NPNT-SH-MAMFAC | No-power and no-trajectory scalable heterogeneous multi-agent mean-field actor–critic |
NT-SH-MAMFAC | No-trajectory scalable heterogeneous multi-agent mean-field actor–critic |
PPO | Proximal policy optimization |
SH-MAMFAC | Scalable heterogeneous multi-agent mean-field actor–critic |
SI-DRL | Scalable independent deep reinforcement learning |
STP | Successful transmission probability |
TDMA/NOMA | Time-division multiple access/non-orthogonal multiple access |
UH-MADDPG | Unscalable heterogeneous multi-agent deep deterministic policy gradient |
UC-DRL | Unscalable centralized deep reinforcement learning |
Appendix B. Proof of Lemma 1
Appendix C. Derivation of D2D Successful Transmission Probability
Appendix D. Proof of Theorem 1
Appendix E. Proof of Theorem 2
References
- Zeng, Y.; Zhang, R.; Lim, T.J. Wireless communications with unmanned aerial vehicles: Opportunities and challenges. IEEE Commun. Mag. 2016, 54, 36–42.
- Zhang, T.; Wang, Y.; Liu, Y.; Xu, W.; Nallanathan, A. Cache-enabling UAV communications: Network deployment and resource allocation. IEEE Trans. Wirel. Commun. 2020, 19, 7470–7483.
- Lyu, J.; Zeng, Y.; Zhang, R. UAV-aided offloading for cellular hotspot. IEEE Trans. Wirel. Commun. 2018, 17, 3988–4001.
- Li, Y.; Xia, M. Ground-to-air communications beyond 5G: A coordinated multipoint transmission based on Poisson-Delaunay triangulation. IEEE Trans. Wirel. Commun. 2022, 22, 1841–1854.
- Zhao, N.; Liu, X.; Yu, F.R.; Li, M.; Leung, V.C. Communications, caching, and computing oriented small cell networks with interference alignment. IEEE Commun. Mag. 2016, 54, 29–35.
- Liu, D.; Chen, B.; Yang, C.; Molisch, A.F. Caching at the wireless edge: Design aspects, challenges, and future directions. IEEE Commun. Mag. 2016, 54, 22–28.
- Zhang, T.; Wang, Y.; Yi, W.; Liu, Y.; Nallanathan, A. Joint optimization of caching placement and trajectory for UAV-D2D networks. IEEE Trans. Commun. 2022, 70, 5514–5527.
- Ding, R.; Xu, Y.; Gao, F.; Shen, X. Trajectory design and access control for air–ground coordinated communications system with multiagent deep reinforcement learning. IEEE Internet Things J. 2021, 9, 5785–5798.
- Zhang, Y.; Mou, Z.; Gao, F.; Jiang, J.; Ding, R.; Han, Z. UAV-enabled secure communications by multi-agent deep reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 11599–11611.
- Chang, Z.; Deng, H.; You, L.; Min, G.; Garg, S.; Kaddoum, G. Trajectory design and resource allocation for multi-UAV networks: Deep reinforcement learning approaches. IEEE Trans. Netw. Sci. Eng. 2022, 10, 2940–2951.
- Yu, J.; Li, Y.; Liu, X.; Sun, B.; Wu, Y.; Tsang, D.H.K. IRS assisted NOMA aided mobile edge computing with queue stability: Heterogeneous multi-agent reinforcement learning. IEEE Trans. Wirel. Commun. 2022, 22, 4296–4312.
- Li, Y.; Aghvami, A.H.; Dong, D. Path planning for cellular-connected UAV: A DRL solution with quantum-inspired experience replay. IEEE Trans. Wirel. Commun. 2022, 21, 7897–7912.
- Chen, Y.J.; Liao, K.M.; Ku, M.L.; Tso, F.P.; Chen, G.Y. Multi-agent reinforcement learning based 3D trajectory design in aerial-terrestrial wireless caching networks. IEEE Trans. Veh. Technol. 2021, 70, 8201–8215.
- Zhang, T.; Wang, Z.; Liu, Y.; Xu, W.; Nallanathan, A. Joint resource, deployment, and caching optimization for AR applications in dynamic UAV NOMA networks. IEEE Trans. Wirel. Commun. 2021, 21, 3409–3422.
- Al-Hilo, A.; Samir, M.; Assi, C.; Sharafeddine, S.; Ebrahimi, D. UAV-assisted content delivery in intelligent transportation systems: Joint trajectory planning and cache management. IEEE Trans. Intell. Transp. Syst. 2020, 22, 5155–5167.
- Qin, P.; Fu, Y.; Zhang, J.; Geng, S.; Liu, J.; Zhao, X. DRL-based resource allocation and trajectory planning for NOMA-enabled multi-UAV collaborative caching 6G network. IEEE Trans. Veh. Technol. 2024, 73, 8750–8764.
- Ji, J.; Zhu, K.; Cai, L. Trajectory and communication design for cache-enabled UAVs in cellular networks: A deep reinforcement learning approach. IEEE Trans. Mob. Comput. 2022, 22, 6190–6204.
- Peng, H.; Shen, X. Multi-agent reinforcement learning based resource management in MEC- and UAV-assisted vehicular networks. IEEE J. Sel. Areas Commun. 2020, 39, 131–141.
- Araf, S.; Saha, A.S.; Kazi, S.H.; Tran, N.H.; Alam, M.G.R. UAV assisted cooperative caching on network edge using multi-agent actor-critic reinforcement learning. IEEE Trans. Veh. Technol. 2023, 72, 2322–2337.
- Rodrigues, T.K.; Suto, K.; Nishiyama, H.; Liu, J.; Kato, N. Machine learning meets computation and communication control in evolving edge and cloud: Challenges and future perspective. IEEE Commun. Surv. Tutor. 2019, 22, 38–67.
- Yang, Y.; Luo, R.; Li, M.; Zhou, M.; Zhang, W.; Wang, J. Mean field multi-agent reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5571–5580.
- Baştuğ, E.; Bennis, M.; Zeydan, E.; Kader, M.A.; Karatepe, I.A.; Er, A.S.; Debbah, M. Big data meets telcos: A proactive caching perspective. J. Commun. Netw. 2015, 17, 549–557.
- Wu, H.; Lyu, F.; Zhou, C.; Chen, J.; Wang, L.; Shen, X. Optimal UAV caching and trajectory in aerial-assisted vehicular networks: A learning-based approach. IEEE J. Sel. Areas Commun. 2020, 38, 2783–2797.
- Ji, J.; Zhu, K.; Niyato, D.; Wang, R. Joint cache placement, flight trajectory, and transmission power optimization for multi-UAV assisted wireless networks. IEEE Trans. Wirel. Commun. 2020, 19, 5389–5403.
- Wang, J.; Zhou, X.; Zhang, H.; Yuan, D. Joint trajectory design and power allocation for UAV assisted network with user mobility. IEEE Trans. Veh. Technol. 2023, 72, 13173–13189.
- Ji, J.; Zhu, K.; Niyato, D.; Wang, R. Joint trajectory design and resource allocation for secure transmission in cache-enabled UAV-relaying networks with D2D communications. IEEE Internet Things J. 2020, 8, 1557–1571.
- Mozaffari, M.; Saad, W.; Bennis, M.; Debbah, M. Unmanned aerial vehicle with underlaid device-to-device communications: Performance and tradeoffs. IEEE Trans. Wirel. Commun. 2016, 15, 3949–3963.
- Wang, L.; Tang, H.; Čierny, M. Device-to-device link admission policy based on social interaction information. IEEE Trans. Veh. Technol. 2014, 64, 4180–4186.
- Zhang, Y.; Pan, E.; Song, L.; Saad, W.; Dawy, Z.; Han, Z. Social network enhanced device-to-device communication underlaying cellular networks. In Proceedings of the 2013 IEEE/CIC International Conference on Communications in China-Workshops (CIC/ICCC), Xi’an, China, 12–14 August 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 182–186.
- Feng, Q.; McGeehan, J.; Tameh, E.K.; Nix, A.R. Path loss models for air-to-ground radio channels in urban environments. In Proceedings of the 2006 IEEE 63rd Vehicular Technology Conference, Melbourne, Australia, 7–10 May 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 6, pp. 2901–2905.
- Gao, X.; Wang, X.; Qian, Z. Probabilistic caching strategy and TinyML-based trajectory planning in UAV-assisted cellular IoT system. IEEE Internet Things J. 2024, 11, 21227–21238.
- Zeng, Y.; Xu, J.; Zhang, R. Energy minimization for wireless communication with rotary-wing UAV. IEEE Trans. Wirel. Commun. 2019, 18, 2329–2345.
- Kangasharju, J.; Roberts, J.; Ross, K.W. Object replication strategies in content distribution networks. Comput. Commun. 2002, 25, 376–383.
- Liang, L.; Ye, H.; Li, G.Y. Spectrum sharing in vehicular networks based on multi-agent reinforcement learning. IEEE J. Sel. Areas Commun. 2019, 37, 2282–2292.
- Lu, Y.; Zhang, M.; Tang, M. Caching for edge inference at scale: A mean field multi-agent reinforcement learning approach. In Proceedings of the GLOBECOM 2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 8–12 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 332–337.
- Zhang, H.; Tang, H.; Hu, Y.; Wei, X.; Wu, C.; Ding, W.; Zhang, X.P. Heterogeneous mean-field multi-agent reinforcement learning for communication routing selection in SAGI-net. In Proceedings of the 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), London, UK, 26–29 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
- Qiu, D.; Wang, J.; Dong, Z.; Wang, Y.; Strbac, G. Mean-field multi-agent reinforcement learning for peer-to-peer multi-energy trading. IEEE Trans. Power Syst. 2022, 38, 4853–4866.
- Li, T.; Zhu, K.; Luong, N.C.; Niyato, D.; Wu, Q.; Zhang, Y.; Chen, B. Applications of multi-agent reinforcement learning in future internet: A comprehensive survey. IEEE Commun. Surv. Tutor. 2022, 24, 1240–1279.
- Zhao, N.; Pang, X.; Li, Z.; Chen, Y.; Li, F.; Ding, Z.; Alouini, M.S. Joint trajectory and precoding optimization for UAV-assisted NOMA networks. IEEE Trans. Commun. 2019, 67, 3723–3735.
- Hu, J.; Wellman, M.P. Nash Q-learning for general-sum stochastic games. J. Mach. Learn. Res. 2003, 4, 1039–1069.
- Jaakkola, T.; Jordan, M.I.; Singh, S.P. On the convergence of stochastic iterative dynamic programming algorithms. Neural Comput. 1994, 6, 1185–1201.
Work | Trajectory Design | Heterogeneous Services | Mixed Actions | Joint Optimization | Scalability |
---|---|---|---|---|---|
[8] | ✓ | ✕ | ✓ | ✓ | ✕ |
[10] | ✓ | ✕ | ✕ | ✓ | ✕ |
[16] | ✓ | ✕ | ✓ | ✓ | ✕ |
[17] | ✓ | ✓ | ✓ | ✓ | ✕ |
[18] | ✕ | ✓ | ✓ | ✓ | ✕ |
[13] | ✓ | ✕ | ✕ | ✕ | ✕ |
[14] | ✓ | ✕ | ✕ | ✓ | ✕ |
[15] | ✓ | ✕ | ✕ | ✓ | ✕ |
[19] | ✕ | ✓ | ✕ | ✓ | ✕ |
Ours | ✓ | ✓ | ✓ | ✓ | ✓ |
Parameter | Description |
---|---|
I | Number of CUs. |
J | Number of RUs. |
F | Number of files. |
– | UAV flight trajectory. |
– | Cache placement matrix of CUs. |
– | Cache placement vector of UAV. |
– | Transmission power vector of CUs. |
– | Data rate from CU i to RU j. |
– | Successful transmission probability from CU i to RU j. |
– | Data rate from UAV to RU j. |
– | Maximum communication range for D2D transmission. |
– | Maximum communication range for UAV transmission. |
– | LoS channel probability. |
– | Noise spectral density. |
– | The probability of RU j requesting file f. |
– | The set of CUs that meet the conditions for establishing D2D links. |
H | The total cache hit probability. |
– | The total transmission delay for RU j. |
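To make the quantities in this table concrete, the sketch below computes a total cache hit probability H from a CU cache placement matrix, a UAV cache placement vector, D2D reachability, and per-RU request probabilities. It is a minimal sketch under stated assumptions, not the model of Section 2.5: requests are taken as Zipf-distributed, a request counts as a hit if the UAV or any reachable CU caches the file, and all function and variable names (`zipf_popularity`, `cache_hit_probability`, `reachable`, etc.) are illustrative rather than the paper's notation.

```python
import numpy as np

def zipf_popularity(F: int, gamma: float = 0.8) -> np.ndarray:
    """Zipf request probabilities over F files (assumed popularity model)."""
    ranks = np.arange(1, F + 1)
    p = ranks ** (-gamma)
    return p / p.sum()

def cache_hit_probability(cu_cache: np.ndarray,
                          uav_cache: np.ndarray,
                          reachable: np.ndarray,
                          request_prob: np.ndarray) -> float:
    """Average probability that a random request is served from a cache.

    cu_cache:     (I, F) 0/1 cache placement matrix of the CUs.
    uav_cache:    (F,)   0/1 cache placement vector of the UAV.
    reachable:    (J, I) 0/1 matrix; 1 if CU i can form a D2D link with RU j.
    request_prob: (J, F) probability of RU j requesting file f.
    """
    # File f is a hit for RU j if any reachable CU caches it...
    cu_hit = (reachable @ cu_cache) > 0            # (J, F)
    # ...or the UAV caches it (broadcast over RUs).
    hit = np.logical_or(cu_hit, uav_cache > 0)     # (J, F)
    # Total cache hit probability H: per-RU hit mass, averaged over RUs.
    return float((request_prob * hit).sum(axis=1).mean())

# Usage example with random placements and links.
I, J, F = 10, 20, 50
rng = np.random.default_rng(0)
cu_cache = (rng.random((I, F)) < 0.1).astype(int)
uav_cache = (rng.random(F) < 0.1).astype(int)
reachable = (rng.random((J, I)) < 0.3).astype(int)
request_prob = np.tile(zipf_popularity(F), (J, 1))
print(f"H = {cache_hit_probability(cu_cache, uav_cache, reachable, request_prob):.3f}")
```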
System Parameters | | Training Parameters | |
---|---|---|---|
Parameters | Values | Parameters | Values |
– [7,8] | 0.2 s, 200 s | – | 0.1 |
– [25] | 0.503, 120 | – | 0.95 |
– [25] | 0.05, 1.225 | – | 3000 |
– [17] | 20 MHz | – | 128 |
– [25] | 0.3, 4.03 | – | 0.01 |
– | 3.5 W | | |
– [17] | −100 dBm/Hz | | |
– [25] | 79.86 W, 88.63 W | – | 5, 5 |
– [23] | 50 kJ | – | 256, 64, 32 |
V [23] | 15 m/s | – | 256, 64, 32 |
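The parameter symbols in the training columns are not shown above, so the grouping below maps the surviving values to common actor-critic hyperparameter names purely as an assumption for illustration; only the numeric values come from the table, and every field name is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical reading of the training values in the table above.

    Field names are assumptions following common actor-critic conventions;
    only the numbers are taken from the paper's table.
    """
    discount: float = 0.95                          # likely the discount factor
    max_episodes: int = 3000                        # likely the episode budget
    batch_size: int = 128                           # likely the minibatch size
    lr: float = 0.01                                # likely a learning rate
    exploration: float = 0.1                        # e.g., exploration noise scale
    actor_hidden: Tuple[int, ...] = (256, 64, 32)   # likely actor layer widths
    critic_hidden: Tuple[int, ...] = (256, 64, 32)  # likely critic layer widths

print(TrainingConfig())
```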
Algorithm | Episodic Time (s) | Number of Episodes | Total Time (h) |
---|---|---|---|
SH-MAMFAC | 1.38 | 1600 | 0.61 |
UH-MADDPG | 1.79 | 3000 | 1.49 |
UC-DRL | 2.62 | 5500 * | 4.00 * |
SI-DRL | 0.96 | 1100 | 0.29 |
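As a quick consistency check on this table, each "Total Time" entry equals episodic time × number of episodes, converted to hours. The snippet below, with a hypothetical helper `total_hours`, reproduces all four rows, including the asterisked UC-DRL entry.

```python
def total_hours(episodic_time_s: float, episodes: int) -> float:
    """Total training time in hours from per-episode wall-clock time."""
    return episodic_time_s * episodes / 3600.0

runs = {
    "SH-MAMFAC": (1.38, 1600),   # -> 0.61 h
    "UH-MADDPG": (1.79, 3000),   # -> 1.49 h
    "UC-DRL":    (2.62, 5500),   # -> 4.00 h (asterisked entry in the table)
    "SI-DRL":    (0.96, 1100),   # -> 0.29 h
}
for name, (t, n) in runs.items():
    print(f"{name}: {total_hours(t, n):.2f} h")
```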
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cite as: Wang, H.; Li, H.; Wang, X.; Xia, S.; Liu, T.; Wang, R. Resource Allocation in UAV-D2D Networks: A Scalable Heterogeneous Multi-Agent Deep Reinforcement Learning Approach. Electronics 2024, 13, 4401. https://doi.org/10.3390/electronics13224401