Ultra-Reliable Deep-Reinforcement-Learning-Based Intelligent Downlink Scheduling for 5G New Radio-Vehicle to Infrastructure Scenarios
Abstract
1. Introduction
- (1) Mitigating Q-value overestimation: when the DQN algorithm learns the Q-value function, the Q values of some state–action pairs tend to be overestimated. As a result, DQN may occasionally select ineffective actions, which degrades scheduling efficiency. DDQN reduces this overestimation by using two Q networks, one to select actions and the other to evaluate the value of the selected actions, thereby improving the accuracy and stability with which the downlink scheduling policy is learned (a minimal sketch of this double-Q target is given after this list).
- (2) More accurate action selection: the DDQN algorithm uses dual Q networks to select actions, which allows the relative value of different actions to be assessed more precisely. Compared with DQN, OLLA, and NoOLLA, DDQN can therefore select with greater precision the actions that maximize throughput or lower the BER, thereby improving link performance and the downlink scheduling policy.
- (3) Overcoming local optima: because the OLLA algorithm optimizes only local action selection, it may become trapped in a local optimum and fail to reach the global optimum. In contrast, the DDQN algorithm employs dual Q networks throughout the learning phase, which helps it avoid local optima and explore a larger action space more effectively.
- (4) Adapting to more complex environments: by combining two Q networks with reinforcement learning, the DDQN algorithm can adapt flexibly to varying channel environments and network requirements in a dynamic, changing environment, improving the efficiency and reliability of the communication link. This gives DDQN strong adaptability and superior performance in complex environments.
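To make point (1) concrete, the sketch below contrasts the DQN and DDQN target computations. It is a minimal PyTorch illustration; the network handles, tensor shapes, and variable names (q_net, target_net, gamma, done) are assumptions for illustration rather than the authors' implementation.

```python
import torch

def dqn_target(target_net, rewards, next_states, gamma, done):
    # Standard DQN: the target network both selects and evaluates the next
    # action; the max operator tends to overestimate Q values.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1.0 - done)

def ddqn_target(q_net, target_net, rewards, next_states, gamma, done):
    # Double DQN: the computational (online) network selects the action and
    # the target network evaluates it, which reduces the overestimation bias.
    with torch.no_grad():
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * next_q * (1.0 - done)
```

Here gamma plays the role of the decay rate of future rewards used later in Algorithm 1, and done marks terminal states.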
2. Problem Formulation
3. DDQN-Based V2I Downlink Scheduling Adaptation
3.1. Downlink Channel Measurement
3.2. Data Processing and Network Architecture
- 1. Matrix processing: Equation (15) illustrates how the processed matrix is acquired from the precoding matrix; the remaining precoding matrices are obtained following the same matrix processing:
- 2. Embedding layer: to satisfy the network input conditions, the CQI encoding vector and the RI encoding vector must be obtained. These vectors can be produced by an embedding layer, which transforms an input index value into a vector of a specified dimension. The embedding layer is essentially composed of several fully connected networks, but with a different emphasis: its output corresponds to the weights of the fully connected network, i.e., the encoding vector is read out from the learned network weights.
- 3. Fully connected layer: to obtain the mapping vector of the BER, a high-dimensional mapping is performed by the fully connected network (FCN).
- 4. Concat operation: after the processing described above, the processed data are concatenated along a single dimension to form the input of the DNN layer (a sketch of this preprocessing pipeline is given after this list).
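The following sketch shows how such a preprocessing pipeline could be assembled: embedding layers for the CQI and RI indices, a fully connected mapping for the BER input, and a concatenation that forms the DNN input. All layer sizes, dimensions, and names are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class InputPreprocessor(nn.Module):
    """Embeds CQI/RI indices, maps the BER, and concatenates the DNN input."""
    def __init__(self, num_cqi=16, num_ri=4, embed_dim=8, ber_map_dim=8):
        super().__init__()
        self.cqi_embed = nn.Embedding(num_cqi, embed_dim)  # CQI index -> encoding vector
        self.ri_embed = nn.Embedding(num_ri, embed_dim)    # RI index -> encoding vector
        self.ber_map = nn.Linear(1, ber_map_dim)            # high-dimensional BER mapping

    def forward(self, cqi_idx, ri_idx, ber, channel_feats):
        # channel_feats: flattened features from the processed precoding matrix
        parts = [
            self.cqi_embed(cqi_idx),
            self.ri_embed(ri_idx),
            self.ber_map(ber),
            channel_feats,
        ]
        return torch.cat(parts, dim=-1)  # concat into a single input vector
```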
- (1) Environment: the NR-V2I downlink communication system with adaptive scheduling;
- (2) Agent: the vehicle-mounted terminal;
- (3) Action: the number of spatial multiplexing layers (RI) and the MCS used by the RSU for downlink scheduling, which together constitute the action in DDQN;
- (4) State: the state is defined as specified in Equation (20), comprising the quantities acquired from downlink measurement, the corresponding precoding matrix, and the state matrix produced after data preprocessing;
- (5) Reward: the reward, specified in Equation (5), is defined as the BER obtained after downlink adaptive scheduling (a small illustrative sketch of these elements follows this list).
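As a concrete illustration of the agent's interface, the sketch below enumerates a candidate action space as (RI, MCS) pairs and a transition record matching the state and reward definitions above. The RI range, the number of MCS indices, and all names are illustrative assumptions rather than values taken from the paper.

```python
from dataclasses import dataclass
from itertools import product

# Assumed ranges for illustration: up to 4 spatial multiplexing layers (RI)
# and 28 MCS indices; the paper does not list these ranges in this section.
RI_VALUES = range(1, 5)
MCS_INDICES = range(28)
ACTIONS = list(product(RI_VALUES, MCS_INDICES))   # each action = (RI, MCS index)

@dataclass
class Transition:
    """One scheduling step as experienced by the agent (vehicle-mounted terminal)."""
    state: object        # preprocessed state matrix (cf. Equation (20))
    action: int          # index into ACTIONS, i.e., a chosen (RI, MCS) pair
    reward: float        # reward derived from the post-scheduling BER (cf. Equation (5))
    next_state: object   # state after the RSU applies the scheduling decision
    done: bool           # whether the final subframe has been reached
```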
Algorithm 1: Intelligent DDQN-based link scheduling algorithm for NR-V2I
Input: computational network weights θ; target network weights θ⁻.
Initialization: memory database of size N.
Step 1: Repeat for the specified number of iterations do
Step 2: Initialize the state s;
Step 3: for each subframe do
Step 4: Following the ε-greedy policy, the E-IoV server chooses the action a, i.e., the number of spatial multiplexing layers and the MCS order;
Step 5: The E-IoV server schedules the corresponding number of layers and MCS order for the downlink, calculates the reward value (the BER) r, and the system enters the new state s′;
Step 6: The memory database stores the experience (s, a, r, s′) of the previous iteration;
Step 7: Randomly select a small batch of sample data from the memory database and train the network; the target network produces the Q target value and the computational network produces the Q estimated value;
Step 8: If the final state is reached,
Step 9: then y = r;
Step 10: otherwise, y = r + γQ(s′, argmax_a Q(s′, a; θ); θ⁻), where γ is the decay rate of future rewards;
Step 11: Calculate the loss function according to Equation (21) and update the weights of the computational network according to Equation (22);
Step 12: Every fixed number of iterations, update the parameters of the target network with those of the computational network, setting θ⁻ ← θ;
Step 13: end
Step 14: until the iteration termination condition is reached.
Output: DDQN downlink adaptive scheduling model.
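A compact sketch of the core update in Algorithm 1 (Steps 7 to 12), assuming a PyTorch implementation with the Huber loss listed in the training parameters, is shown below. The replay-buffer format, network objects, and optimizer are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def ddqn_train_step(q_net, target_net, optimizer, batch, gamma=0.9):
    """One Double-DQN update on a mini-batch drawn from the memory database."""
    states, actions, rewards, next_states, done = batch

    # Q estimate from the computational network for the actions actually taken (Step 7).
    q_est = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Q target: the computational network selects the next action, the target
    # network evaluates it; y = r at terminal states (Steps 8-10).
    with torch.no_grad():
        next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        y = rewards + gamma * next_q * (1.0 - done)

    # Huber loss and computational-network weight update (Step 11).
    loss = F.smooth_l1_loss(q_est, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Step 12 would then periodically copy the computational network into the target network, e.g. target_net.load_state_dict(q_net.state_dict()).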
3.3. Training Parameter Settings
4. Simulation Results and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Chen, C.; Yao, G.; Liu, L.; Pei, Q.; Song, H.; Dustdar, S. A Cooperative Vehicle-Infrastructure System for Road Hazards Detection with Edge Intelligence. IEEE Trans. Intell. Transp. Syst. 2023, 24, 5186–5198.
- Chen, C.; Wang, C.; Li, C.; Xiao, M.; Pei, Q. A V2V Emergent Message Dissemination Scheme for 6G-Oriented Vehicular Networks. Chin. J. Electron. 2023, 32, 1179–1191.
- Yan, X.; Liu, G.; Wu, H.-C.; Zhang, G.; Wang, Q.; Wu, Y. Robust Modulation Classification Over α-Stable Noise Using Graph-Based Fractional Lower-Order Cyclic Spectrum Analysis. IEEE Trans. Veh. Technol. 2020, 69, 2836–2849.
- Mota, M.P.; Araujo, D.C.; Costa Neto, F.H.; de Almeida, A.L.F.; Cavalcanti, F.R. Adaptive Modulation and Coding Based on Reinforcement Learning for 5G Networks. In Proceedings of the 2019 IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
- Tsurumi, S.; Fujii, T. Reliable Vehicle-to-Vehicle Communication Using Spectrum Environment Map. In Proceedings of the 2018 International Conference on Information Networking (ICOIN), Chiang Mai, Thailand, 10–12 January 2018; pp. 310–315.
- Huang, Z.; Zheng, B.; Zhang, R. Roadside IRS-Aided Vehicular Communication: Efficient Channel Estimation and Low-Complexity Beamforming Design. IEEE Trans. Wirel. Commun. 2023, 22, 5976–5989.
- Veres, M.; Moussa, M. Deep Learning for Intelligent Transportation Systems: A Survey of Emerging Trends. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3152–3168.
- Ramjee, S.; Ju, S.; Yang, D.; Liu, X.; Gamal, A.E.; Eldar, Y.C. Fast Deep Learning for Automatic Modulation Classification. Available online: https://arxiv.org/abs/1901.05850v1 (accessed on 3 July 2023).
- Liu, X.; Yang, D.; Gamal, A.E. Deep Neural Network Architectures for Modulation Classification. In Proceedings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 29 October–1 November 2017; pp. 915–919.
- Klaine, P.V.; Imran, M.A.; Onireti, O.; Souza, R.D. A Survey of Machine Learning Techniques Applied to Self-Organizing Cellular Networks. IEEE Commun. Surv. Tutor. 2017, 19, 2392–2431.
- de Carvalho, P.H.P.; Vieira, R.D.; Leite, J.P. A Continuous-State Reinforcement Learning Strategy for Link Adaptation in OFDM Wireless Systems. J. Commun. Inf. Syst. 2015, 30, 47–57.
- Robust Adaptive Modulation and Coding (AMC) Selection in LTE Systems Using Reinforcement Learning. Available online: https://ieeexplore.ieee.org/abstract/document/6966162/ (accessed on 3 July 2023).
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- Zhang, L.; Tan, J.; Liang, Y.-C.; Feng, G.; Niyato, D. Deep Reinforcement Learning-Based Modulation and Coding Scheme Selection in Cognitive Heterogeneous Networks. IEEE Trans. Wirel. Commun. 2019, 18, 3281–3294.
- Liao, Y.; Yang, Z.; Yin, Z.; Shen, X. DQN-Based Adaptive MCS and SDM for 5G Massive MIMO-OFDM Downlink. IEEE Commun. Lett. 2023, 27, 185–189.
- Xi, L.; Yu, L.; Xu, Y.; Wang, S.; Chen, X. A Novel Multi-Agent DDQN-AD Method-Based Distributed Strategy for Automatic Generation Control of Integrated Energy Systems. IEEE Trans. Sustain. Energy 2020, 11, 2417–2426.
- Bui, V.-H.; Hussain, A.; Kim, H.-M. Double Deep Q-Learning-Based Distributed Operation of Battery Energy Storage System Considering Uncertainties. IEEE Trans. Smart Grid 2020, 11, 457–469.
- Xiaoqin, S.; Juanjuan, M.; Lei, L.; Tianchen, Z. Maximum-Throughput Sidelink Resource Allocation for NR-V2X Networks with the Energy-Efficient CSI Transmission. IEEE Access 2020, 8, 73164–73172.
- Kim, J.; Choi, Y.; Noh, G.; Chung, H. On the Feasibility of Remote Driving Applications Over mmWave 5G Vehicular Communications: Implementation and Demonstration. IEEE Trans. Veh. Technol. 2023, 9, 2009–2023.
- Liao, Y.; Yin, Z.; Yang, Z.; Shen, X. DQL-Based Intelligent Scheduling Algorithm for Automatic Driving in Massive MIMO V2I Scenarios. China Commun. 2023, 3, 18–26.
- Sasaoka, N.; Sasaki, T.; Itoh, Y. PMI/RI Selection Based on Channel Capacity Increment Ratio. In Proceedings of the 2019 International Symposium on Multimedia and Communication Technology (ISMAC), Quezon City, Philippines, 19–21 August 2019; pp. 1–4.
- Passive Method for Estimating Available Throughput for Autonomous Off-Peak Data Transfer. Available online: https://www.hindawi.com/journals/wcmc/2020/3502394/ (accessed on 19 July 2023).
- Zhao, D.; Qin, H.; Song, B.; Zhang, Y.; Du, X.; Guizani, M. A Reinforcement Learning Method for Joint Mode Selection and Power Adaptation in the V2V Communication Network in 5G. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 452–463.
Parameter Name | Parameter Value |
---|---|
Carrier Frequency | 5.925 GHz |
Subcarrier spacing | 30 kHz
Number of subcarriers | 624 |
FFT Points | 1024 |
Modulation mode | QPSK, 16 QAM, 64 QAM, 256 QAM |
Channel model | TDL |
Number of roadside unit (RSU) antennas | 32
Number of vehicle terminal antennas | 4 |
Number of subframes | 300 |
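As a quick sanity check on the numerology above, the occupied bandwidth and baseband sampling rate implied by these parameters can be computed directly; this is a small illustrative calculation, not a figure taken from the paper.

```python
subcarrier_spacing_hz = 30e3   # 30 kHz subcarrier spacing
num_subcarriers = 624          # occupied subcarriers
fft_size = 1024                # FFT points

occupied_bandwidth_hz = num_subcarriers * subcarrier_spacing_hz  # 18.72 MHz
sampling_rate_hz = fft_size * subcarrier_spacing_hz              # 30.72 MHz

print(f"Occupied bandwidth: {occupied_bandwidth_hz / 1e6:.2f} MHz")
print(f"Baseband sampling rate: {sampling_rate_hz / 1e6:.2f} MHz")
```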
Parameter Name | Parameter Value |
---|---|
Iteration number | 1000 |
Memory size | 1000 |
Target network update period (iterations) | 150
Activation function | Tanh |
Loss function | Huber |
Learning rate | 0.01 |
Batch size | 16 |
Decay rate of future rewards γ | 0.9
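The training hyperparameters above could be gathered into a configuration such as the sketch below; the optimizer choice, layer sizes, and number of output actions are assumptions for illustration rather than details given by the paper.

```python
import torch
import torch.nn as nn

config = {
    "num_iterations": 1000,       # iteration number
    "memory_size": 1000,          # replay memory size
    "target_update_period": 150,  # target-network update period (iterations)
    "learning_rate": 0.01,
    "batch_size": 16,
    "gamma": 0.9,                 # decay rate of future rewards
}

# Tanh activations and a Huber loss, as listed in the table; layer sizes
# and the number of output actions are assumptions.
q_net = nn.Sequential(
    nn.Linear(128, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 28),            # one output per candidate (RI, MCS) action
)
loss_fn = nn.HuberLoss()
optimizer = torch.optim.SGD(q_net.parameters(), lr=config["learning_rate"])
```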
Channel Condition | Performance Metric | DQN Average Iterations | DDQN Average Iterations |
---|---|---|---|
Time delay = 0 μs | Average BER | 732 | 761 |
Time delay = 10 μs | Average BER | 775 | 784 |
Frequency offset = 250 Hz | Average BER | 753 | 789 |
Frequency offset = 500 Hz | Average BER | 794 | 810 |
Time delay = 0 μs | Average Throughput/(Mbps) | 703 | 732 |
Time delay = 10 μs | Average Throughput/(Mbps) | 731 | 747 |
Frequency offset = 250 Hz | Average Throughput/(Mbps) | 726 | 752 |
Frequency offset = 500 Hz | Average Throughput/(Mbps) | 765 | 773 |