A Power Allocation Scheme for MIMO-NOMA and D2D Vehicular Edge Computing Based on Decentralized DRL
Abstract
1. Introduction
- (1) We propose a power allocation model for VEC based on decentralized DRL, defining the state, action, and reward functions. The DDPG algorithm is employed to handle the continuous action space and to guide the model toward the optimal policy.
- (2) Extensive performance tests of the trained model show that the proposed approach outperforms existing schemes.
2. Related Work
2.1. D2D and MIMO-NOMA Technology in MEC and VEC
2.2. V2V and V2I Communication in VEC
2.3. DRL-Based Resource Allocation in VEC
3. System Model
3.1. Mobility Model
3.2. Communication Model
3.2.1. V2I Communication
3.2.2. V2V Communication
3.3. Task-Computation Model
3.3.1. Local Processing
3.3.2. V2I and V2V Processing
4. Problem Formulation
4.1. State
4.2. Action
4.3. Reward
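To make the formulation above concrete, the following is a minimal sketch of how the per-SVU state, action, and reward could be represented in code. The field names (buffer_bits, v2i_gain, v2v_gain) and the weighted-penalty form of the reward are assumptions suggested by the notation table, not the authors' exact definitions.

```python
from dataclasses import dataclass

# Minimal sketch of the per-SVU state/action/reward tuple. The fields and
# the weighted-penalty reward are illustrative assumptions drawn from the
# notation table (buffer capacity, channel gains, reward weight factors).

@dataclass
class SvuState:
    buffer_bits: float   # queued task bits of SVU k at the start of slot n
    v2i_gain: float      # channel gain toward the BS (V2I)
    v2v_gain: float      # channel gain toward the paired VEC vehicle (V2V)

@dataclass
class SvuAction:
    p_local: float       # local processing power, 0 <= p_local <= P_local_max
    p_v2i: float         # V2I transmit power,     0 <= p_v2i  <= P_v2i_max
    p_v2v: float         # V2V transmit power,     0 <= p_v2v  <= P_v2v_max

def reward(action: SvuAction, buffer_bits: float,
           w_power: float = 0.5, w_buffer: float = 0.5) -> float:
    """Hypothetical reward: jointly penalize power consumption and buffer
    backlog via the reward weight factors listed in the notation table."""
    power = action.p_local + action.p_v2i + action.p_v2v
    return -(w_power * power + w_buffer * buffer_bits)
```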
5. Solution
5.1. Training Stage
Algorithm 1: Model training stage based on the DDPG algorithm
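Below is a minimal sketch of a training loop in the shape of Algorithm 1, following the standard DDPG recipe of Lillicrap et al. (actor/critic with target networks, replay buffer, exploration noise, soft updates). The network sizes, learning rates, dimensions, and replay format are illustrative assumptions, not the authors' reported settings.

```python
import copy
import random
from collections import deque
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, out_act=None):
    layers = [nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

# Illustrative dimensions: 3 state features, 3 power levels to allocate.
state_dim, action_dim, gamma, tau = 3, 3, 0.99, 0.005
actor = mlp(state_dim, action_dim, nn.Sigmoid())   # outputs in (0,1), scaled by P_max
critic = mlp(state_dim + action_dim, 1)
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer = deque(maxlen=100_000)  # stores (s, a, r, s2) float tensors, r of shape (1,)

def select_action(s, noise_std=0.1):
    """Actor output plus Gaussian exploration noise, clipped to the valid range."""
    with torch.no_grad():
        a = actor(s) + noise_std * torch.randn(action_dim)
    return a.clamp(0.0, 1.0)

def train_step(batch_size=64):
    if len(buffer) < batch_size:
        return
    s, a, r, s2 = map(torch.stack, zip(*random.sample(buffer, batch_size)))
    with torch.no_grad():  # target y_i = r_i + gamma * Q'(s', mu'(s'))
        y = r + gamma * target_critic(torch.cat([s2, target_actor(s2)], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Deterministic policy gradient: ascend Q(s, mu(s)) w.r.t. actor parameters.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft update of both target networks.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```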
5.2. Testing Stage
Algorithm 2: Testing stage for the trained model
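A matching sketch of the testing stage: the trained actor from the training sketch above is run deterministically, with exploration noise disabled, while the per-slot power consumption and buffer backlog are averaged. The env_reset and env_step hooks are hypothetical simulator stand-ins, not part of the paper.

```python
import torch

@torch.no_grad()
def test_episode(env_reset, env_step, slots=100):
    s = env_reset()                      # initial state tensor
    total_power = total_backlog = 0.0
    for _ in range(slots):
        a = actor(s)                     # deterministic action (no noise)
        s, power, backlog = env_step(a)  # next state and per-slot metrics
        total_power += power
        total_backlog += backlog
    return total_power / slots, total_backlog / slots
```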
6. Simulation Results and Analysis
6.1. Training Stage
6.2. Testing Stage
- GD-Local policy: in each slot, SVU k first allocates the maximum local processing power; the remaining task bits are split equally between V2I and V2V processing.
- GD-V2I policy: in each slot, SVU k first allocates the maximum V2I processing power; the remaining task bits are split equally between local and V2V processing.
- GD-V2V policy: in each slot, SVU k first allocates the maximum V2V processing power; the remaining task bits are split equally between local and V2I processing (a sketch of these three baselines follows this list).
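The following is a minimal sketch of the three greedy (GD) baselines. Here bits_at_pmax[i] is how many task bits mode i (0 = local, 1 = V2I, 2 = V2V) can absorb in one slot at its maximum power; it is a stand-in for the rate and CPU models of Section 3, not the authors' exact expressions.

```python
def greedy_allocate(task_bits, bits_at_pmax, first):
    """Allocate task_bits over (local, V2I, V2V), saturating mode `first`
    and splitting the remainder equally over the other two modes."""
    alloc = [0.0, 0.0, 0.0]
    alloc[first] = min(task_bits, bits_at_pmax[first])  # max out preferred mode
    remainder = task_bits - alloc[first]
    for i in (j for j in range(3) if j != first):
        alloc[i] = remainder / 2.0                      # equal split of the rest
    return alloc  # (local bits, V2I bits, V2V bits)

# GD-Local, GD-V2I, GD-V2V differ only in which mode is saturated first.
gd_local = lambda bits, caps: greedy_allocate(bits, caps, first=0)
gd_v2i   = lambda bits, caps: greedy_allocate(bits, caps, first=1)
gd_v2v   = lambda bits, caps: greedy_allocate(bits, caps, first=2)
```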
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Raza, S.; Wang, S.; Ahmed, M.; Anwar, M.R.; Mirza, M.A.; Khan, W.U. Task Offloading and Resource Allocation for IoV Using 5G NR-V2X Communication. IEEE Internet Things J. 2022, 9, 10397–10410.
- Wu, Q.; Wan, Z.; Fan, Q.; Fan, P.; Wang, J. Velocity-Adaptive Access Scheme for MEC-Assisted Platooning Networks: Access Fairness Via Data Freshness. IEEE Internet Things J. 2022, 9, 4229–4244.
- Wu, Q.; Xia, S.; Fan, Q.; Li, Z. Performance Analysis of IEEE 802.11p for Continuous Backoff Freezing in IoV. Electronics 2019, 8, 1404.
- Wu, Q.; Zheng, J. Performance Modeling and Analysis of IEEE 802.11 DCF Based Fair Channel Access for Vehicle-to-Roadside Communication in a Non-Saturated State. Wirel. Netw. 2015, 21, 1–11.
- Sabireen, H.; Neelanarayanan, V. A Review on Fog Computing: Architecture, Fog with IoT, Algorithms and Research Challenges. ICT Express 2021, 7, 162–176.
- Zhang, X.; Zhang, J.; Liu, Z.; Cui, Q.; Tao, X.; Wang, S. MDP-Based Task Offloading for Vehicular Edge Computing Under Certain and Uncertain Transition Probabilities. IEEE Trans. Veh. Technol. 2020, 69, 3296–3309.
- Zhang, K.; Mao, Y.; Leng, S.; He, Y.; Zhang, Y. Mobile-Edge Computing for Vehicular Networks: A Promising Network Paradigm with Predictive Off-Loading. IEEE Veh. Technol. Mag. 2017, 12, 36–44.
- Wu, Q.; Zhao, Y.; Fan, Q.; Fan, P.; Wang, J.; Zhang, C. Mobility-Aware Cooperative Caching in Vehicular Edge Computing Based on Asynchronous Federated and Deep Reinforcement Learning. IEEE J. Sel. Top. Signal Process. 2022, 17, 66–81.
- Hou, X.; Li, Y.; Chen, M.; Wu, D.; Jin, D.; Chen, S. Vehicular Fog Computing: A Viewpoint of Vehicles as the Infrastructures. IEEE Trans. Veh. Technol. 2016, 65, 3860–3873.
- Hou, X.; Ren, Z.; Wang, J.; Cheng, W.; Ren, Y.; Chen, K.-C.; Zhang, H. Reliable Computation Offloading for Edge-Computing-Enabled Software-Defined IoV. IEEE Internet Things J. 2020, 7, 7097–7111.
- Zhu, H.; Wu, Q.; Wu, X.-J.; Fan, Q.; Fan, P.; Wang, J. Decentralized Power Allocation for MIMO-NOMA Vehicular Edge Computing Based on Deep Reinforcement Learning. IEEE Internet Things J. 2022, 9, 12770–12782.
- Asadi, A.; Wang, Q.; Mancuso, V. A Survey on Device-to-Device Communication in Cellular Networks. IEEE Commun. Surv. Tut. 2014, 16, 1801–1819.
- Ren, Y.; Liu, F.; Liu, Z.; Wang, C.; Ji, Y. Power Control in D2D-Based Vehicular Communication Networks. IEEE Trans. Veh. Technol. 2015, 64, 5547–5562.
- Sun, W.; Yuan, D.; Ström, E.G.; Brännström, F. Cluster-Based Radio Resource Management for D2D-Supported Safety-Critical V2X Communications. IEEE Trans. Wirel. Commun. 2016, 15, 2756–2769.
- Sun, W.; Ström, E.G.; Brännström, F.; Sou, K.C.; Sui, Y. Radio Resource Management for D2D-Based V2V Communication. IEEE Trans. Veh. Technol. 2016, 65, 6636–6650.
- Nguyen, K.K.; Duong, T.Q.; Vien, N.A.; Le-Khac, N.-A.; Nguyen, L.D. Distributed Deep Deterministic Policy Gradient for Power Allocation Control in D2D-Based V2V Communications. IEEE Access 2019, 7, 164533–164543.
- Wu, Q.; Shi, S.; Wan, Z.; Fan, Q.; Fan, P.; Zhang, C. Towards V2I Age-Aware Fairness Access: A DQN Based Intelligent Vehicular Node Training and Test Method. Chin. J. Electron. 2022, 32, 1.
- Wang, H.; Ke, H.; Liu, G.; Sun, W. Computation Migration and Resource Allocation in Heterogeneous Vehicular Networks: A Deep Reinforcement Learning Approach. IEEE Access 2020, 8, 171140–171153.
- Dong, P.; Ning, Z.; Ma, R.; Wang, X.; Hu, X.; Hu, B. NOMA-Based Energy-Efficient Task Scheduling in Vehicular Edge Computing Networks: A Self-Imitation Learning-Based Approach. China Commun. 2020, 17, 1–11.
- Wang, Q.; Fan, P.; Letaief, K.B. On the Joint V2I and V2V Schedule for Cooperative VANET with Network Coding. IEEE Trans. Veh. Technol. 2012, 61, 62–73.
- He, Y.; Zhao, N.; Yin, H. Integrated Networking, Caching, and Computing for Connected Vehicles: A Deep Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2018, 67, 44–55.
- Luo, Q.; Li, C.; Luan, T.H.; Shi, W. Collaborative Data Scheduling for Vehicular Edge Computing via Deep Reinforcement Learning. IEEE Internet Things J. 2020, 7, 9637–9650.
- Liu, Y.; Yu, H.; Xie, S.; Zhang, Y. Deep Reinforcement Learning for Offloading and Resource Allocation in Vehicle Edge Computing and Networks. IEEE Trans. Veh. Technol. 2019, 68, 11158–11168.
- Tan, L.T.; Hu, R.Q. Mobility-Aware Edge Caching and Computing in Vehicle Networks: A Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2018, 67, 10190–10203.
- Zhu, Z.; Wan, S.; Fan, P.; Letaief, K.B. Federated Multiagent Actor–Critic Learning for Age Sensitive Mobile-Edge Computing. IEEE Internet Things J. 2022, 9, 1053–1067.
- Wu, Q.; Zhao, Y.; Fan, Q. Time-Dependent Performance Modeling for Platooning Communications at Intersection. IEEE Internet Things J. 2022, 9, 18500–18513.
- Hai, T.; Zhou, J.; Padmavathy, T.V.; Md, A.Q.; Jawawi, D.N.A.; Aksoy, M. Design and Validation of Lifetime Extension Low Latency MAC Protocol (LELLMAC) for Wireless Sensor Networks Using a Hybrid Algorithm. Sustainability 2022, 14, 15547.
- Wu, Q.; Liu, H.; Zhang, C.; Fan, Q.; Li, Z.; Wang, K. Trajectory Protection Schemes Based on a Gravity Mobility Model in IoT. Electronics 2019, 8, 148.
- Wang, K.; Yu, F.; Wang, L.; Li, J.; Zhao, N.; Guan, Q.; Li, B.; Wu, Q. Interference Alignment with Adaptive Power Allocation in Full-Duplex-Enabled Small Cell Networks. IEEE Trans. Veh. Technol. 2019, 68, 3010–3015.
- Fan, J.; Yin, S.; Wu, Q.; Gao, F. Study on Refined Deployment of Wireless Mesh Sensor Network. In Proceedings of the 2010 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM), Chengdu, China, 23–25 September 2010; pp. 1–5.
- Ye, H.; Li, G.Y.; Juang, B.-H.F. Deep Reinforcement Learning Based Resource Allocation for V2V Communications. IEEE Trans. Veh. Technol. 2019, 68, 3163–3173.
- Xu, Y.-H.; Yang, C.-C.; Hua, M.; Zhou, W. Deep Deterministic Policy Gradient (DDPG)-Based Resource Allocation Scheme for NOMA Vehicular Communications. IEEE Access 2020, 8, 18797–18807.
- Ding, C.; Wang, J.-B.; Zhang, H.; Lin, M.; Wang, J. Joint MU-MIMO Precoding and Resource Allocation for Mobile-Edge Computing. IEEE Trans. Wirel. Commun. 2021, 20, 1639–1654.
- Liu, Y.; Cai, Y.; Liu, A.; Zhao, M.; Hanzo, L. Latency Minimization for mmWave D2D Mobile Edge Computing Systems: Joint Task Allocation and Hybrid Beamforming Design. IEEE Trans. Veh. Technol. 2022, 71, 12206–12221.
- Li, Y.; Xu, G.; Yang, K.; Ge, J.; Liu, P.; Jin, Z. Energy Efficient Relay Selection and Resource Allocation in D2D-Enabled Mobile Edge Computing. IEEE Trans. Veh. Technol. 2020, 69, 15800–15814.
- Zhang, H.; Wang, Z.; Liu, K. V2X Offloading and Resource Allocation in SDN-Assisted MEC-Based Vehicular Networks. China Commun. 2020, 17, 266–283.
- Bai, X.; Chen, S.; Shi, Y.; Liang, C.; Lv, X. Collaborative Task Processing in Vehicular Edge Computing Networks. In Proceedings of the 2021 4th International Conference on Hot Information-Centric Networking (HotICN), Nanjing, China, 25–27 November 2021; pp. 92–97.
- Ning, Z.; Zhang, K.; Wang, X.; Obaidat, M.S.; Guo, L.; Hu, X.; Hu, B.; Guo, Y.; Sadoun, B.; Kwok, R.Y.K. Joint Computing and Caching in 5G-Envisioned Internet of Vehicles: A Deep Reinforcement Learning-Based Traffic Control System. IEEE Trans. Intell. Transp. 2021, 22, 5201–5212.
- Ren, T.; Yu, X.; Chen, X.; Guo, S.; Qiu, X.-S. Vehicular Network Edge Intelligent Management: A Deep Deterministic Policy Gradient Approach for Service Offloading Decision. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 905–910.
- Jang, Y.; Na, J.; Jeong, S.; Kang, J. Energy-Efficient Task Offloading for Vehicular Edge Computing: Joint Optimization of Offloading and Bit Allocation. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; pp. 1–5.
- Zhan, W.; Luo, C.; Wang, J.; Wang, C.; Min, G.; Duan, H.; Zhu, Q. Deep-Reinforcement-Learning-Based Offloading Scheduling for Vehicular Edge Computing. IEEE Internet Things J. 2020, 7, 5449–5465.
- Ngo, H.Q.; Larsson, E.G.; Marzetta, T.L. Energy and Spectral Efficiency of Very Large Multiuser MIMO Systems. IEEE Trans. Commun. 2013, 61, 1436–1449.
- Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Am. J. Phys. 1988, 55, 958–962.
- Kwak, J.; Kim, Y.; Lee, J.; Chong, S. DREAM: Dynamic Resource and Task Allocation for Energy Minimization in Mobile Cloud Systems. IEEE J. Sel. Areas Commun. 2015, 33, 2510–2523.
- King, C. Fundamentals of Wireless Communications. In Proceedings of the 2014 IEEE-IAS/PCA Cement Industry Technical Conference, National Harbor, MD, USA, 13–17 April 2014; pp. 1–7.
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the 2014 International Conference on Machine Learning (ICML), Beijing, China, 21–26 June 2014; pp. 387–395.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
| Notation | Description | Notation | Description |
|---|---|---|---|
| | Action of SVU k in slot n. | | Number of task bits of SVU k arriving in slot n. |
| | Abbreviation for . | | Action of the ith tuple. |
| | Abbreviation for . | | Next action of the ith tuple. |
| | Channel power gain at the reference distance. | | Buffer capacity of SVU k in slot n. |
| D | Diameter of the BS's range. | | Error vector. |
| | Number of task bits processed locally by SVU k in slot n. | | Number of task bits processed for SVU k performing V2I communication in slot n. |
| | Distance between SVU k and the BS along the x-axis in slot n. | | Number of task bits processed for SVU k performing V2V communication in slot n. |
| | An exponentially distributed random variable with unit mean. | | CPU frequency of SVU k in slot n. |
| | Small-scale fading channel gain of SVU k in slot n. | | Channel vector of SVU k for V2V communication in slot n. |
| | Large-scale fading coefficient for V2I communication in slot n. | | Channel power gain at 1 m. |
| | MIMO-NOMA channel matrix. | | Channel gain of SVU k for V2V communication in slot n. |
| | Objective function. | | Maximum number of episodes in the training stage. |
| | Loss function. | L | Number of CPU cycles for processing one bit. |
| | Noise received by the BS. | | Last slot in lane j. |
| | Maximum number of SVUs in lane j. | | Number of antennas. |
| | Location of SVU k in slot n. | | Position of the BS antenna. |
| | Processing power of SVU k for V2I communication in slot n. | | Local processing power of SVU k in slot n. |
| | Processing power of SVU k for V2V communication in slot n. | | Maximum V2I processing power. |
| | Maximum V2V processing power. | | Maximum local processing power. |
| | Action value function of SVU k. | | Action value function output from the critic network. |
| | Action value function output from the target critic network. | | Reward of SVU k in slot n. |
| | Abbreviation for . | | Reward of the ith tuple. |
| | Experience buffer. | | State space of SVU k in slot n. |
| | Abbreviation for . | | State of the ith tuple. |
| | Abbreviation for . | | Next state of the ith tuple. |
| T | Maximum number of tuples in a minibatch. | | Velocity of SVU k. |
| | Velocity of lane j. | | Distance from lane 1 to the BS. |
| | Bandwidth. | | Distance between SVU k driving in lane j and the antennas along the y-axis. |
| | Lane width. | | Target value of the ith tuple. |
| | Signal received by the BS. | | Learning rate of the critic network. |
| | Learning rate of the actor network. | | Path loss exponent for V2V communication. |
| | Normalized channel correlation coefficient of SVU k. | | Mean rate of task arrival for SVU k. |
| | Parameter of the critic network. | | Parameter of the target critic network. |
| | Effective converted capacitance of SVU k. | | Distance between SVU k and the corresponding VEC vehicle. |
| | Parameter of the actor network. | | Parameter of the target actor network. |
| | Discount factor. | | Exploration noise in slot n. |
| | Policy of SVU k approximated by the actor network. | | Optimal policy of SVU k. |
| | Variance of the Gaussian noise in communication. | | Update degree parameter for the target networks. |
| | Slot duration. | | Reward weight factors. |
| | SINR of SVU k for V2I communication in slot n. | | Path loss exponent. |
| | SINR of SVU k for V2V communication in slot n. | | |
Parameters of the System Model

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| W | | | dB |
| | dB | | 1 MHz |
| | 20 ms | | |
| | 20 m/s | | 25 m/s |
| | 30 m/s | | 10 m |
| L | 500 cycles/bit | | 3 Mbps |
| H | 10 m | | 4 |
| D | 500 m | | 1 W |
| | 1 W | | 1 W |
| | 65 m | | 10 m |
| | 2 | | |

Parameters of the Training Process

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| | 1000 | T | 64 |
| | 10 | | |
Policies | Average Power Consumption | Average Buffer Capacity | Cumulative Discounted Reward |
---|---|---|---|
DDPG | A | B | A |
GD-Local | B | B | B |
GD-V2I | D | C | D |
GD-V2V | B | B | B |