BeiDou Short-Message Satellite Resource Allocation Algorithm Based on Deep Reinforcement Learning
Abstract
1. Introduction
- (1) Based on the characteristics of the BDS3-SMC, an ideal SMSCS model is proposed. Using this model, we formally describe the path transmission loss of the short-message terminal, the satellite load balance, and the satellite service quality, and then establish a multi-objective optimization model for short-message satellite resource allocation.
- (2) Because the number of short-message terminals in the application scenario can exceed one million, the sheer volume of input data makes DRL-SRA difficult to train. We therefore improve the ideal model of short-message satellite resource allocation and propose a region division strategy, together with a resource allocation model based on this strategy, to reduce the computational complexity. The state space, action space, and reward mechanism of satellite resource allocation are defined according to the improved model.
- (3) We design a feature extraction network that extracts features from the state space and thereby reduces the dimensionality of the input data. Combined with the DDPG framework, this network handles resource allocation over continuous states. On this basis, we propose a BeiDou short-message satellite resource allocation algorithm based on DRL (DRL-SRA).
2. System Model
- (1) To reduce the transmission energy consumption of short-message terminals;
- (2) To improve the resource utilization of short-message satellites;
- (3) To provide adequate service quality for short-message terminals.
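The three objectives above compete with one another, and the later analysis of weight values suggests they are combined through weights. The following is a minimal sketch, assuming a weighted-sum scalarization over three metrics that have already been normalized to [0, 1]. The function name `weighted_objective`, its argument names, and the normalization are illustrative assumptions, not the paper's formulation.

```python
def weighted_objective(path_loss, load_imbalance, satisfaction,
                       weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return a scalar cost to minimize (hypothetical scalarization).

    path_loss, load_imbalance, satisfaction: normalized metrics in [0, 1].
    Lower path loss and lower imbalance are better; higher satisfaction is
    better, so it enters the cost as (1 - satisfaction).
    """
    w_loss, w_load, w_sat = weights
    if abs(w_loss + w_load + w_sat - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, value in (("path_loss", path_loss),
                        ("load_imbalance", load_imbalance),
                        ("satisfaction", satisfaction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return (w_loss * path_loss
            + w_load * load_imbalance
            + w_sat * (1.0 - satisfaction))


# Example: emphasize terminal energy (path loss) over the other two goals.
cost = weighted_objective(0.2, 0.4, 0.9, weights=(0.5, 0.25, 0.25))
```

Changing the weight tuple shifts the trade-off between terminal energy, satellite utilization, and service quality without touching the individual metrics.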
2.1. Path Transmission Loss Model
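As a rough illustration of the magnitude involved, the sketch below evaluates the standard free-space path loss formula for the uplink frequency (1620 MHz) and orbital altitude (21,528 km) listed in the simulation parameters. It assumes free-space propagation only and a satellite directly overhead, so the distance equals the orbital altitude; the paper's full loss model may include further terms that are not reproduced here. The function name is hypothetical.

```python
import math


def free_space_path_loss_db(distance_km, frequency_mhz):
    """Free-space path loss in dB for distance in km and frequency in MHz."""
    if distance_km <= 0 or frequency_mhz <= 0:
        raise ValueError("distance and frequency must be positive")
    # 32.44 is the constant for the km / MHz unit combination.
    return 20 * math.log10(distance_km) + 20 * math.log10(frequency_mhz) + 32.44


# Uplink at 1620 MHz, terminal directly below a satellite at 21,528 km.
uplink_loss_db = free_space_path_loss_db(21528, 1620)
print(f"{uplink_loss_db:.1f} dB")
```

Because the loss grows with 20·log10 of the distance, a terminal near the edge of coverage, where the slant range is longer than the altitude, sees a higher loss than this overhead case.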
2.2. Satellite Load Balancing Model
2.3. Terminal Satisfaction Model
3. Algorithm Design
3.1. Regional Division Strategy
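The idea of replacing per-terminal inputs with per-subregion aggregates can be sketched as follows. This is a minimal example that assumes a uniform latitude/longitude grid; the grid shape, the cell size `cell_deg`, and both function names are hypothetical, and the paper's actual division rule may differ.

```python
import math
from collections import Counter


def assign_subregion(lat_deg, lon_deg, cell_deg=10.0):
    """Map a terminal position to a (row, col) cell of a uniform grid."""
    if not (-90.0 <= lat_deg <= 90.0 and -180.0 <= lon_deg <= 180.0):
        raise ValueError("position out of range")
    rows = math.ceil(180.0 / cell_deg)
    cols = math.ceil(360.0 / cell_deg)
    # Clamp so that lat = 90 and lon = 180 fall into the last cell.
    row = min(int((lat_deg + 90.0) // cell_deg), rows - 1)
    col = min(int((lon_deg + 180.0) // cell_deg), cols - 1)
    return row, col


def terminals_per_subregion(positions, cell_deg=10.0):
    """Count terminals per grid cell; the counts replace per-terminal rows."""
    return Counter(assign_subregion(lat, lon, cell_deg) for lat, lon in positions)


positions = [(31.2, 121.5), (39.9, 116.4), (31.9, 121.0), (-33.9, 151.2)]
counts = terminals_per_subregion(positions)
```

With this kind of aggregation the size of the input depends on the number of subregions rather than on the number of terminals, which is what keeps the state manageable when terminals number in the millions.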
3.2. DRL-SRA Algorithm
- (1) State
- (2) Action
- (3) Reward
- STEP 1: DRL-SRA input data reconstruction
- STEP 2: DRL-SRA training and update
- (1) The target critic network calculates the target values based on the states sampled from the experience replay pool. Its parameters are regularly copied from those of the critic network, i.e.,
- (2) The critic network iteratively updates the parameters of the value function and calculates the current value estimates. The loss function of the critic network can be defined as:
- (3) The target actor network selects the optimal action based on the states sampled from the experience replay pool. Its parameters are periodically copied from those of the actor network, i.e.,
- (4) The actor network iteratively updates the parameters of the strategy function. According to the state of the SMSCS in the current snapshot, it selects the current action, obtains the reward, and determines the initial state of the next snapshot. The loss of the actor network can be understood as follows: the greater the value of the selected action, the smaller the network loss. Therefore, the loss function of the actor network can be defined as:
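For orientation, the four components described above take the following form in the standard DDPG formulation. The symbols here follow the common DDPG convention and are assumptions: $\theta^{Q}$ and $\theta^{\mu}$ denote the critic and actor parameters, primes mark the target networks, $\gamma$ is the discount factor, $\tau$ is the soft update factor, and $N$ is the batch size. The paper's own notation may differ.

```latex
\begin{align}
y_i &= r_i + \gamma\, Q'\!\left(s_{i+1},\, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \\
L(\theta^{Q}) &= \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^{2} \\
L(\theta^{\mu}) &= -\frac{1}{N} \sum_{i=1}^{N} Q\!\left(s_i,\, \mu(s_i \mid \theta^{\mu}) \mid \theta^{Q}\right) \\
\theta^{Q'} &\leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}
\end{align}
```

The negative sign in the actor loss matches the description in item (4): the larger the value the critic assigns to the selected action, the smaller the loss.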
Algorithm 1 DRL-SRA
Output: Optimization result
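The bookkeeping that surrounds the networks in DDPG-style training can be sketched without any deep learning library. The example below is a minimal illustration, not the authors' Algorithm 1: it uses the discount factor (0.9), soft update factor (0.01), and batch size (8) from the simulation parameters, while the buffer capacity, the class `ReplayBuffer`, and the helper names are hypothetical.

```python
import random
from collections import deque

GAMMA = 0.9       # discount factor (from the simulation parameters)
TAU = 0.01        # soft update factor (from the simulation parameters)
BATCH_SIZE = 8    # batch size (from the simulation parameters)


class ReplayBuffer:
    """Fixed-capacity experience replay pool (capacity is an assumption)."""

    def __init__(self, capacity=10_000):
        self._items = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size=BATCH_SIZE):
        return random.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def td_target(reward, next_q, gamma=GAMMA):
    """Target value: reward plus discounted target-critic estimate."""
    return reward + gamma * next_q


def critic_loss(targets, q_values):
    """Mean squared error between target values and current estimates."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, tau=TAU):
    """Move each target parameter a fraction tau toward the online one."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


buffer = ReplayBuffer()
for step in range(10):
    buffer.push(state=step, action=0.0, reward=1.0, next_state=step + 1)
batch = buffer.sample()
```

In a full training loop, each step would push one transition, sample a batch once the buffer holds at least `BATCH_SIZE` items, update the critic and actor, and then apply `soft_update` to both target networks.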
4. Simulation and Performance Analysis
4.1. Simulation Parameter Setting
4.2. Analysis of Simulation Results
4.2.1. Algorithm Performance Comparison
- (1) DQN: The Q network and the target Q network share the same structure. The Q network has three hidden layers with 64 neurons each. The DQN strategy is shown in Equation (38).
- (2) TS-IHA: A virtual satellite and a virtual terminal are added so that the problem satisfies the requirements of the Hungarian algorithm.
- (3) GA: The population size is 50, the maximum number of generations is 400, the crossover probability is 0.9, and the mutation probability is 0.01.
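The padding step mentioned for TS-IHA can be illustrated on a toy cost matrix. The sketch below pads a rectangular terminal-by-satellite cost matrix with zero-cost virtual rows or columns so that it becomes square. To keep the block dependency-free, an exhaustive search over permutations stands in for the Hungarian algorithm; it returns the same optimum but is only practical for very small matrices. The function names, the zero pad value, and the example costs are all hypothetical.

```python
from itertools import permutations


def pad_to_square(cost, pad_value=0.0):
    """Add virtual rows/columns so the cost matrix becomes square."""
    rows = len(cost)
    cols = len(cost[0]) if rows else 0
    n = max(rows, cols)
    padded = [list(row) + [pad_value] * (n - cols) for row in cost]
    padded += [[pad_value] * n for _ in range(n - rows)]
    return padded


def best_assignment(cost):
    """Minimum-cost one-to-one assignment by exhaustive search.

    Stand-in for the Hungarian algorithm; pairs that involve a virtual
    row or column are dropped from the result.
    """
    rows, cols = len(cost), len(cost[0])
    square = pad_to_square(cost)
    n = len(square)
    best = min(permutations(range(n)),
               key=lambda perm: sum(square[i][perm[i]] for i in range(n)))
    return [(i, j) for i, j in enumerate(best) if i < rows and j < cols]


# Two terminals (rows), three satellites (columns): one virtual terminal is added.
cost = [[4, 1, 5],
        [2, 0, 6]]
pairs = best_assignment(cost)
```

Here terminal 0 is paired with satellite 1 and terminal 1 with satellite 0, for a total cost of 3; the third satellite is matched to the virtual terminal and therefore left unassigned.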
4.2.2. The Influences of Different Weight Values on the Optimization Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Summary of Main Notations
Description | Symbol |
---|---|
Model | |
The set of short-message satellites | |
The set of short-message terminals | |
The number of short-message satellites | |
The number of short-message terminals | |
Snapshot | |
Resource matrix | |
The set of task queues | |
The satellite-to-Earth link matrix | |
The path transmission loss | |
The path transmission loss matrix | |
The total path transmission loss | |
The load balancing index | |
The set of short-message tasks | |
The service satisfaction index | |
The objective function | |
Algorithm | |
The set of subregions | |
State space | |
The system state under a snapshot | |
The resource allocation decisions | |
The action under a given state | |
The reward function | |
The size of the tensor | |
The action function | |
The state function | |
The optimal action function | |
The optimal state function | |
The optimal action | |
The loss function of the critic network | |
The loss function of the actor network | |
Random noise | |
Layer | Input | Kernel | Activation | Output |
---|---|---|---|---|
Conv1 | | ,4 | ReLU | |
Conv2 | | ,8 | ReLU | |
FC1 | | NA | ReLU | 1024 |
FC2 | 1024 | NA | ReLU | 256 |
FC3 | 256 | NA | ReLU | |
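To give a sense of the size of the fully connected part of this network, the short sketch below counts the trainable parameters of the FC2 row, which maps 1024 inputs to 256 outputs. It assumes an ordinary dense layer with one bias per output; the function name is hypothetical.

```python
def dense_param_count(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


fc2_params = dense_param_count(1024, 256)
print(fc2_params)
```

Under that assumption FC2 holds 262,400 parameters (262,144 weights and 256 biases).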
Simulation Parameter | Value |
---|---|
Uplink operating frequency/MHz | 1620 |
Downlink operating frequency/MHz | 1207.14 |
Orbital altitude/km | 21,528 |
Total number of short-message terminals | 1100~2400 |
/bit | 560 |
/s | 60 |
Time taken up by each snapshot/s | 60 |
1000 | |
/Mbit/s | 50 |
10 | |
Episodes | 500 |
Steps T | 500 |
Batch size | 8 |
Discount factor | 0.9 |
Soft update factor | 0.01 |
Learning rate | 0.01 |
Activation function | ReLU |
64 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xia, K.; Feng, J.; Yan, C.; Duan, C. BeiDou Short-Message Satellite Resource Allocation Algorithm Based on Deep Reinforcement Learning. Entropy 2021, 23, 932. https://doi.org/10.3390/e23080932