Learning-Based Multi-Domain Anti-Jamming Communication with Unknown Information
Abstract
1. Introduction
1.1. Related Work
1.2. Contribution
- A non-zero-sum game model is formulated to capture the competition between the legitimate transmitter and the smart jammers. Two learning-based schemes are proposed to solve the frequency–power domain anti-jamming communication game under the assumption that information about the jammers is unavailable to the transmitter.
- An anti-jamming scheme based on the DQN algorithm is proposed to jointly optimize the transmit channel and transmit power, while another scheme based on the HL algorithm is proposed to obtain the mixed-strategy Nash equilibrium. Simulation results show that the HL-based scheme has the best convergence performance among the learning-based schemes, while the DQN-based scheme achieves the largest transmitter utility among the compared anti-jamming schemes.
1.3. Organization
2. System Model and Anti-Jamming Game with Smart Jammers
2.1. System Model
2.2. Game Formulation
3. Anti-Jamming Communication Scheme without Opponents’ Information
3.1. The DQN-Based Scheme for Multi-Domain Anti-Jamming Strategies
3.1.1. The Process of the Proposed Scheme
Algorithm 1 The pseudocode of the proposed DQN-based scheme

Without loss of generality, the DQN procedures of S and the jammers are illustrated in the following:
Set up DQNs for S and each jammer, and set up an empty experience pool for each agent
Initialize each agent's train DQN with random weights
Initialize each agent's target DQN with the weights of the corresponding train DQN
Each agent chooses actions randomly and stores the resulting experiences into its experience pool until the pool is full
Repeat
  Each agent observes its state in the k-th time slot
  Each agent chooses an action
  The agent of S calculates its reward according to the feedback of D
  The agent of each jammer calculates its reward according to the feedback of D and the jamming powers shared among the jammers
  Each agent obtains its next state
  Each agent stores its experience into its experience pool
  Each agent samples a mini-batch from its experience pool
  Each agent updates the weights of its train DQN
  Each agent periodically updates the weights of its target DQN with those of its train DQN
Until convergence
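For concreteness, the following is a minimal sketch of the transmitter-side agent in Algorithm 1, written in Python with PyTorch. Only the action-space sizes and the discount factor 0.5 come from the simulation parameter table; the state dimension, network width, learning rate, exploration rate, batch size, and sync period are illustrative assumptions, and the paper's exact state and reward definitions are those of Section 3.1.2.

```python
# Minimal sketch of the transmitter-side DQN agent in Algorithm 1.
# Assumptions (not from the paper): PyTorch, STATE_DIM, the 64-unit hidden
# layer, lr=1e-3, EPS, BATCH, and SYNC. GAMMA = 0.5 matches the parameter table.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

M, P_LEVELS = 5, 3                    # usable channels and transmit-power levels (parameter table)
N_ACTIONS = M * P_LEVELS              # joint (channel, power) action space
STATE_DIM = 8                         # assumed size of the observed state vector
GAMMA, EPS, BATCH, SYNC = 0.5, 0.1, 32, 100

def make_net() -> nn.Module:
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

train_net, target_net = make_net(), make_net()
target_net.load_state_dict(train_net.state_dict())   # target DQN starts from the train DQN's weights
optimizer = optim.Adam(train_net.parameters(), lr=1e-3)
replay: deque = deque(maxlen=10_000)                 # experience pool of (s, a, r, s') tuples

def choose_action(state: torch.Tensor) -> int:
    """Epsilon-greedy choice over the joint channel/power actions."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(train_net(state).argmax())

def learn(step: int) -> None:
    """Sample a mini-batch, update the train DQN, and sync the target DQN periodically."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.stack([b[3] for b in batch])
    q = train_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                            # target: r + gamma * max_a' Q_target(s', a')
        q_target = r + GAMMA * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % SYNC == 0:                             # copy weights into the target DQN
        target_net.load_state_dict(train_net.state_dict())

# An action index a encodes the channel as a // P_LEVELS and the power level as a % P_LEVELS.
```

Each jammer's agent would run the same loop with its own experience pool and with the reward computed from the feedback of D plus the jamming powers shared among the jammers.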
3.1.2. The Definition of the Action, Reward and State
3.2. Hierarchical Learning-Based Scheme for Mixed Strategies
Algorithm 2 The pseudocode of the HL-based scheme

Set up Q-functions for S and each jammer
Set up mixed strategies for S and each jammer
Initialize the Q-values of all actions of S and the jammers
Initialize the mixed strategies so that every action can be chosen with equal probability
Repeat
  S chooses an action according to its mixed strategy in the k-th time slot
  Each jammer chooses an action according to its mixed strategy in the k-th time slot
  S calculates its reward according to the SINR fed back by D
  Each jammer calculates its reward according to the SINR fed back by D as well as the jamming powers shared among the jammers
  S and the jammers update their Q-functions via (16) and (18), respectively
  S and the jammers update their mixed strategies via (17) and (19), respectively
Until convergence
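A minimal sketch of the transmitter side of Algorithm 2 follows. Since Equations (16)–(19) are not reproduced in this excerpt, a standard hierarchical-learning instantiation is assumed: a stochastic Q-value update together with a Boltzmann (softmax) mixed-strategy update. The step sizes ALPHA and BETA and the temperature TEMP are illustrative, not the paper's values.

```python
# Minimal sketch of the transmitter side of Algorithm 2 (HL-based scheme).
# The update forms below are assumptions standing in for (16) and (17).
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 15                             # joint (channel, power) actions, e.g. 5 channels x 3 powers
Q = np.zeros(N_ACTIONS)                    # Q-function of the transmitter S
pi = np.full(N_ACTIONS, 1.0 / N_ACTIONS)   # mixed strategy, initially uniform
ALPHA, BETA, TEMP = 0.1, 0.05, 0.5         # Q step size, strategy step size, temperature (assumed)

def step(reward_fn):
    """One iteration: sample an action from pi, observe the reward, update Q and pi."""
    global Q, pi
    a = rng.choice(N_ACTIONS, p=pi)
    r = reward_fn(a)                        # reward from the SINR fed back by D
    Q[a] += ALPHA * (r - Q[a])              # assumed stochastic Q update (cf. (16))
    boltz = np.exp((Q - Q.max()) / TEMP)    # Boltzmann distribution, shifted for stability
    boltz /= boltz.sum()
    pi += BETA * (boltz - pi)               # assumed mixed-strategy update (cf. (17))
    pi = np.clip(pi, 1e-9, None)
    pi /= pi.sum()                          # keep pi a valid probability distribution
    return a, r

# Example use with a toy reward: step(lambda a: 1.0 if a == 3 else 0.0)
```

Each jammer would run the symmetric updates corresponding to (18) and (19), with its reward computed from the SINR fed back by D and the shared jamming powers.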
4. Simulation Results
4.1. Comparison Schemes
- Q-learning scheme: like the DQN algorithm, Q-learning estimates the value of each action, but it records the Q-values of all actions in a local table rather than in a neural network (see the sketch after this list).
- Random strategy: choosing every action with equal probability, a classic and commonly used defense against jamming.
- No-jamming: the transmitter and receiver operate without malicious interference from the jammers; this scheme serves as an upper bound on the transmitter's utility.
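As a point of comparison with the DQN sketch in Section 3.1, a minimal tabular Q-learning baseline is sketched below; the state discretization (N_STATES) and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the tabular Q-learning baseline: a local Q-table replaces
# the neural network. N_STATES, ALPHA, and EPS are assumed; GAMMA = 0.5
# matches the discount factor in the parameter table.
import random

import numpy as np

N_STATES, N_ACTIONS = 32, 15
Q = np.zeros((N_STATES, N_ACTIONS))        # local table of Q-values for all actions
ALPHA, GAMMA, EPS = 0.1, 0.5, 0.1

def act(s: int) -> int:
    """Epsilon-greedy action selection from the Q-table."""
    return random.randrange(N_ACTIONS) if random.random() < EPS else int(Q[s].argmax())

def update(s: int, a: int, r: float, s2: int) -> None:
    """Standard one-step Q-learning update."""
    Q[s, a] += ALPHA * (r + GAMMA * Q[s2].max() - Q[s, a])
```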
4.2. Discussion on the Parameters of the HL-Based Scheme
4.3. Performance Comparisons of Different Schemes
4.4. Complexity Analysis of Different Learning-Based Schemes
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| FH | Frequency hopping |
| DQN | Deep Q-network |
| HL | Hierarchical learning |
| SINR | Signal-to-interference-plus-noise ratio |
| Parameter | Value |
|---|---|
| The distance between S and D | |
| The distance between the 1st jammer and D | |
| The distance between the 2nd jammer and D | |
| The distance between the 3rd jammer and D | |
| The distance between the 4th jammer and D | |
| The number of usable channels, M | 5 |
| The set of usable transmission powers | |
| The set of jamming powers for each jammer | |
| The noise power | |
| The number of transmission power levels | 3 |
| The number of jamming power levels | 2 |
| The discount factor of the long-term reward | 0.5 |
| The far-field reference distance in (20) | |
| The path-loss exponent in (20) | 3 |
| The central frequency in (20) | |
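Equation (20) itself is not reproduced in this excerpt. A common log-distance path-loss model consistent with the listed parameters (a far-field reference distance, a path-loss exponent of 3, and a central frequency) would be:

```latex
% Hedged reconstruction: this is an assumed form, not the paper's verbatim (20).
% Log-distance path loss with free-space loss at the reference distance d_0:
\[
  \mathrm{PL}(d) \;=\; 20 \log_{10}\!\left(\frac{4\pi d_0 f_c}{c}\right)
  \;+\; 10\,\alpha \log_{10}\!\left(\frac{d}{d_0}\right) \quad \text{[dB]}
\]
% d: transmitter-receiver distance, d_0: far-field reference distance,
% alpha = 3: path-loss exponent, f_c: central frequency, c: speed of light.
```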
| Parameter Pair | Exp. 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| (0.3, 0.3) | 5.03 | 2.52 | 2.52 | 5.03 | 2.51 | 2.52 | 2.52 | 5.03 | 5.03 | 2.52 |
| (5, 3) | 4.11 | 4.08 | 4.07 | 4.05 | 4.09 | 4.04 | 4.04 | 4.02 | 4.04 | 3.99 |
| (0.3, 0.3) | −7.37 | −3.89 | −4.02 | −7.37 | −4.02 | −3.92 | −3.82 | −7.40 | −7.53 | −3.86 |
| (5, 3) | −6.76 | −6.72 | −6.70 | −6.68 | −6.73 | −6.67 | −6.67 | −6.64 | −6.67 | −6.62 |
| Transmitter Utility | No-Jamming | DQN-Based Scheme | HL-Based Scheme | Q-Learning Scheme | Random Strategy |
|---|---|---|---|---|---|
| 1 jammer | 7.55 | 6.03 | 4.76 | 4.89 | 3.66 |
| 2 jammers | 7.55 | 4.75 | 3.36 | 3.28 | 2.48 |
| 4 jammers | 7.55 | 1.50 | 0.99 | 0.97 | 0.88 |