Policy Optimization of the Power Allocation Algorithm Based on the Actor–Critic Framework in Small Cell Networks
Abstract
1. Introduction
- Based on the actor–critic framework, we propose a novel policy optimization algorithm for power allocation (POPA) in small cell networks that combines distributed execution with centralized exploration and training. It is suited to continuous action spaces, reduces the computational and storage burden on small base stations (SBSs), and allows SBSs to obtain their power policies in real time (a minimal sketch of this style of update follows this list).
- The proposed actor–critic-based policy optimization framework works effectively without real-time global CSI, and network training converges quickly and robustly.
- The proposed framework is scalable: it accommodates an increasing or decreasing number of base stations in an environment and can be deployed in different environments and other wireless networks without adjusting most of the hyperparameters.
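For concreteness, the following is a minimal sketch of the kind of PPO-style clipped-surrogate actor–critic update that such a framework builds on. The two-layer networks, the Gaussian policy over normalized transmit power, and all variable names are illustrative assumptions rather than the authors' exact implementation.

```python
# Minimal PPO-style actor-critic sketch for continuous power allocation.
# Assumptions (not from the paper): a Gaussian policy over normalized
# transmit power, two hidden layers per network, and a single update step.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, state):
        mean = self.net(state)                    # mean of the power action
        return torch.distributions.Normal(mean, self.log_std.exp())

class Critic(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)        # state-value estimate

def ppo_update(actor, critic, opt_a, opt_c,
               states, actions, old_logp, returns, advantages, clip=0.2):
    """One clipped-surrogate policy step and one value-regression step."""
    dist = actor.dist(states)
    logp = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(logp - old_logp)            # pi_new / pi_old
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantages)
    actor_loss = -surrogate.mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    critic_loss = ((critic(states) - returns) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
```

The clipping term bounds how far the updated policy can move from the one that collected the data, which is what keeps this kind of on-policy training stable enough for each SBS to execute its actor locally while exploration and training are handled centrally.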
2. System Model and Problem Formulation
2.1. Channel Model
2.2. Problem Formulation
3. Proximal Policy Optimization Algorithm
3.1. Overview of Deep Reinforcement Learning
3.2. Proximal Policy Optimization Algorithm
4. The Proposed Power Allocation Algorithm
4.1. Framework Architecture
4.2. Process of Proximal Policy Optimization
4.3. Algorithm Design
4.3.1. State and Action Space
4.3.2. Reward Function
4.3.3. Action Selection Distribution
4.4. Framework Scalability
5. Simulation
5.1. Simulation Setup
5.2. Clip Range
5.3. Algorithm Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Parameters | Values |
|---|---|
| n | 16 |
| c | 5 |
| clip range | 0.2 |
| T | 100 |
|  | 0.99 |
|  | 0.97 |
|  | 50 |
|  | 50 |
|  | 0.001 |
|  | 0.001 |
|  | 20 ms |
| hidden layers | (64, 64) |
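As a rough usage illustration only, the snippet below wires the named table entries into the classes from the earlier sketch: it reuses `Actor` and `Critic` from above, reads n as the number of SBSs and T as the steps per episode, and treats `state_dim`, `action_dim`, and the choice of the Adam optimizer as assumptions rather than values taken from the paper.

```python
# Hypothetical wiring of the named table values into the earlier sketch.
# Only n, T, and the (64, 64) hidden layers come from the table; the rest
# is illustrative.
import torch

n_sbs = 16                       # "n": read here as the number of SBSs (agents)
steps_per_episode = 100          # "T": environment steps collected per episode
state_dim, action_dim = 8, 1     # illustrative sizes, not from the paper

actors = [Actor(state_dim, action_dim, hidden=64) for _ in range(n_sbs)]
critic = Critic(state_dim, hidden=64)
actor_opts = [torch.optim.Adam(a.parameters()) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters())
```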