Adversarial Attacks on Heterogeneous Multi-Agent Deep Reinforcement Learning System with Time-Delayed Data Transmission
Abstract
1. Introduction
1.1. Contributions
1.2. Related Research
2. Background
3. Methodology
3.1. Leaderless and Leader-Follower Topologies
3.2. Observation
3.3. Action
- No Selection: If the channel is busy at time step $t$ (checked within less than one mini-slot), the DQN agent does not take any action at the next time step; hence, no action is selected at time step $t+1$.
- Uniform Selection with Probability $\varepsilon$: If the channel is idle at time step $t$, the DQN agent chooses an action (to transfer or not to transfer a packet) at time step $t+1$. If the agent at the next mini-slot transmits packets of length $L$, then the action at time step $t+1$ is $a_{t+1} = L$, where $L$ belongs to the set of admissible packet lengths. This action selection is a uniform random selection with probability $\varepsilon$ (exploration) using the $\varepsilon$-greedy algorithm.
- Non-uniform Selection with Probability $1-\varepsilon$: If the channel is idle at time step $t$, an action to transfer or not to transfer a packet at time step $t+1$ can be chosen by the DQN agent. According to the conventional $\varepsilon$-greedy DQN algorithm, the selected action is the one with the maximum Q-value $Q(s, a; \boldsymbol{\theta})$, where $Q$ is a parametric function of the state $s$, the action $a$, and the parameter vector $\boldsymbol{\theta}$ that contains the weights of the NN, and $\mathcal{A}$ is the set of actions. Therefore, the action at time step $t+1$ is $a_{t+1} = \arg\max_{a \in \mathcal{A}} Q(s_{t+1}, a; \boldsymbol{\theta})$; a minimal sketch of this three-case selection rule is given after this list.
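The following is a minimal Python sketch of the three-case selection rule above. The helper interface (a `q_network` callable returning one Q-value per action and a `channel_busy` flag obtained from carrier sensing) and the action encoding are illustrative assumptions rather than the exact implementation used in the experiments.

```python
import random
import numpy as np


def select_action(q_network, state, action_set, epsilon, channel_busy):
    """Epsilon-greedy action selection with carrier sensing (illustrative sketch).

    q_network    -- callable returning a vector of Q-values, one per action (assumed interface)
    state        -- current observation s_t
    action_set   -- candidate actions (e.g., packet lengths; encoding is an assumption)
    epsilon      -- exploration probability of the epsilon-greedy rule
    channel_busy -- result of sensing the channel during the mini-slot
    """
    # No Selection: the channel is busy, so the agent takes no action at t+1.
    if channel_busy:
        return None

    # Uniform Selection with probability epsilon: explore uniformly at random.
    if random.random() < epsilon:
        return random.choice(action_set)

    # Non-uniform Selection with probability 1 - epsilon: exploit the Q-network,
    # i.e., a_{t+1} = argmax_a Q(s_{t+1}, a; theta).
    q_values = q_network(state)
    return action_set[int(np.argmax(q_values))]
```

Called once per time step, the function returns `None` when the channel is busy, a uniformly random action with probability $\varepsilon$, and the greedy action otherwise.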
3.4. State
3.5. Reward
- At time step $t$, if the transferred packet does not reach its destination (another agent from another cluster or an agent from the same cluster) successfully and collides on the way, the $i$-th agent receives the corresponding collision reward at time step $t+1$.
- Using the observation set of [20], if the data packet is successfully transferred by each agent in the network, the immediate reward for the $i$-th agent at time step $t+1$ follows the successful-transmission reward of [20].
- By proposing on-time and time-delayed observations, we append additional components to the immediate reward and present a new immediate reward for each agent. Weighting the on-time and time-delayed components with two constants, the new immediate reward for the $i$-th agent combines both contributions.
- Considering the distance between agents that transmit data to each other, we propose another type of immediate reward for the $i$-th agent using the combined immediate reward function [41]. If the data packet is successfully transferred on-time by the $i$-th agent to the $j$-th agent, we propose a novel distance-based immediate reward for the $i$-th agent at time step $t+1$; if the data packet is transferred by the $i$-th agent to the $j$-th agent successfully but time-delayed, a corresponding time-delayed distance-based immediate reward is assigned at time step $t+1$. A structural sketch of these reward cases is given after this list.
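To make the case analysis above concrete, the sketch below gathers the reward cases into one function. The constants `c1` and `c2`, the `base_reward`, the zero-valued collision case, and the `1/(1 + d_ij)` distance scaling are placeholder assumptions chosen only to illustrate the structure described in the text; the exact reward functions are given by the equations of this section.

```python
def immediate_reward(delivered, delayed, base_reward=1.0,
                     c1=1.0, c2=0.5, d_ij=None):
    """Structural sketch of the immediate reward for agent i (assumed constants).

    delivered -- True if the packet reached its destination without collision
    delayed   -- True if the delivery was time-delayed rather than on-time
    d_ij      -- optional distance between sender i and receiver j for the
                 distance-based variant (None -> plain reward)
    """
    # Collision: the packet did not reach the destination (reward assumed zero here).
    if not delivered:
        return 0.0

    # On-time vs. time-delayed delivery, weighted by the (assumed) constants c1, c2.
    reward = c1 * base_reward if not delayed else c2 * base_reward

    # Distance-based variant: scale the reward by an (assumed) decreasing
    # function of the inter-agent distance d_ij.
    if d_ij is not None:
        reward *= 1.0 / (1.0 + d_ij)

    return reward
```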
3.6. DQN Loss
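As a reference point for this subsection, the sketch below shows the standard DQN objective: the mean-squared temporal-difference error between the online Q-estimate and a bootstrapped target computed with a frozen target network. It is a generic PyTorch sketch under assumed network and batch interfaces, not the exact loss configuration of the proposed system.

```python
import torch
import torch.nn.functional as F


def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Standard DQN mean-squared TD-error loss (sketch; network classes are assumed).

    batch -- tuple of tensors (states, actions, rewards, next_states, dones)
    """
    states, actions, rewards, next_states, dones = batch

    # Q(s_t, a_t; theta) for the actions that were actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target r_{t+1} + gamma * max_a Q(s_{t+1}, a; theta^-),
    # computed with the frozen target network.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    # Mean squared error between the online estimate and the target.
    return F.mse_loss(q_values, targets)
```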
3.7. Adversarial Attacks
3.7.1. FGSM Adversarial Attack
3.7.2. FGM Adversarial Attack
3.7.3. BIM Adversarial Attack
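The three attacks differ only in how the input gradient is turned into a perturbation of the observation: FGSM takes a single step along the sign of the gradient, FGM takes a single step along the L2-normalized gradient, and BIM applies several small FGSM steps while clipping the accumulated perturbation to an epsilon-ball. The PyTorch sketch below illustrates this; the attacker's loss function `loss_fn` (for example, a loss that degrades the Q-value of the victim agent's chosen action) and the step-size values are assumptions, not the paper's exact attack configuration.

```python
import torch


def fgsm(x, loss_fn, epsilon):
    """FGSM: one step along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(x_adv).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()


def fgm(x, loss_fn, epsilon):
    """FGM: one step along the L2-normalized input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(x_adv).backward()
    grad = x_adv.grad
    return (x_adv + epsilon * grad / (grad.norm() + 1e-12)).detach()


def bim(x, loss_fn, epsilon, alpha, num_iter):
    """BIM: iterative FGSM with the perturbation clipped to an epsilon-ball."""
    x_orig = x.clone().detach()
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss_fn(x_adv).backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # Keep the accumulated perturbation within the epsilon-ball around x.
            x_adv = torch.clamp(x_adv, x_orig - epsilon, x_orig + epsilon)
        x_adv = x_adv.detach()
    return x_adv
```

A typical call would be `bim(state, loss_fn, epsilon=0.1, alpha=0.01, num_iter=10)` (values are illustrative) on the agent's observation tensor before it is fed to the DQN.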
3.8. First Adversarial Attack Defense
3.8.1. FGSM Adversarial Attack Defense
Algorithm 1 First Adversarial Attack Defense
Input: s, …, T, …; Output: …
3.8.2. FGM Adversarial Attack Defense
3.8.3. BIM Adversarial Attack Defense
3.9. Second Adversarial Attack Defense
Algorithm 2 Second Adversarial Attack Defense
Input: s, …, T, …; Output: …
4. Results and Discussion
4.1. Multi-Agent Performance Analysis
4.2. Performance Analysis of the Proposed MAS under Adversarial Attacks
4.3. Performance Analysis of the Proposed MAS after Applying First Adversarial Attack Defense
4.4. Performance Analysis of the Proposed MAS after Applying Second Adversarial Attack Defense
4.5. Variety of Agents
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
BIM | Basic Iterative Method
CS-DLMA | Carrier-sense Deep Reinforcement Learning Multiple Access
CSMA | Carrier-sense Multiple Access
DQN | Deep Q-network
DL | Deep Learning
DRL | Deep Reinforcement Learning
FGM | Fast Gradient Method
FGSM | Fast Gradient Sign Method
ML | Machine Learning
MSE | Mean Squared Error
MADRL | Multi-agent Deep Reinforcement Learning
MAS | Multi-agent System
MTDDQN | Multi-source Transfer Double DQN
NN | Neural Network
ReLU | Rectified Linear Unit
RL | Reinforcement Learning
TDMA | Time Division Multiple Access
TL | Transfer Learning
References
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Canese, L.; Cardarilli, G.C.; Di Nunzio, L.; Fazzolari, R.; Giardino, D.; Re, M.; Spanò, S. Multi-agent reinforcement learning: A review of challenges and applications. Appl. Sci. 2021, 11, 4948.
- Mousavi, S.S.; Schukat, M.; Howley, E. Deep reinforcement learning: An overview. In Proceedings of the SAI Intelligent Systems Conference, London, UK, 21–22 September 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 426–440.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. arXiv 2018, arXiv:1811.12560.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Chen, Z.; Zhang, S.; Doan, T.T.; Maguluri, S.T.; Clarke, J.P. Performance of Q-learning with linear function approximation: Stability and finite-time analysis. arXiv 2019, arXiv:1905.11425.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- Gao, Z.; Gao, Y.; Hu, Y.; Jiang, Z.; Su, J. Application of deep Q-network in portfolio management. In Proceedings of the IEEE International Conference on Big Data Analytics (ICBDA), Xiamen, China, 8–11 May 2020; pp. 268–275.
- Carta, S.; Ferreira, A.; Podda, A.S.; Recupero, D.R.; Sanna, A. Multi-DQN: An ensemble of Deep Q-learning agents for stock market forecasting. Expert Syst. Appl. 2021, 164, 113820.
- Jang, B.; Kim, M.; Harerimana, G.; Kim, J.W. Q-learning algorithms: A comprehensive classification and applications. IEEE Access 2019, 7, 133653–133667.
- Pan, J.; Wang, X.; Cheng, Y.; Yu, Q. Multisource transfer double DQN based on actor learning. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2227–2238.
- Zhao, C.; Liu, X.; Zhong, S.; Shi, K.; Liao, D.; Zhong, Q. Secure consensus of multi-agent systems with redundant signal and communication interference via distributed dynamic event-triggered control. ISA Trans. 2021, 112, 89–98.
- Zheng, S.; Shi, P.; Agarwal, R.K.; Lim, C.P. Periodic event-triggered output regulation for linear multi-agent systems. Automatica 2020, 122, 109223.
- Yuan, S.; Yu, C.; Sun, J. Adaptive event-triggered consensus control of linear multi-agent systems with cyber attacks. Neurocomputing 2021, 442, 1–9.
- Chen, M.; Yan, H.; Zhang, H.; Chi, M.; Li, Z. Dynamic event-triggered asynchronous control for nonlinear multi-agent systems based on T-S fuzzy models. IEEE Trans. Fuzzy Syst. 2020, 29, 2580–2592.
- Rehan, M.; Tufail, M.; Ahmed, S. Leaderless consensus control of nonlinear multi-agent systems under directed topologies subject to input saturation using adaptive event-triggered mechanism. J. Frankl. Inst. 2021, 358, 6217–6239.
- Zhou, B.; Yang, Y.; Li, L.; Hao, R. Leaderless and leader-following consensus of heterogeneous second-order multi-agent systems on time scales: An asynchronous impulsive approach. Int. J. Control 2021, 1–11.
- Yu, Y.; Liew, S.C.; Wang, T. Non-uniform time-step deep Q-network for carrier-sense multiple access in heterogeneous wireless networks. IEEE Trans. Mob. Comput. 2021, 20, 2848–2861.
- Chakraborty, A.; Alam, M.; Dey, V.; Chattopadhyay, A.; Mukhopadhyay, D. Adversarial attacks and defences: A survey. arXiv 2018, arXiv:1810.00069.
- Akhtar, N.; Mian, A. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access 2018, 6, 14410–14430.
- Goodfellow, I.; McDaniel, P.; Papernot, N. Making machine learning robust against adversarial inputs. Commun. ACM 2018, 61, 56–66.
- Wang, X.; Li, J.; Kuang, X.; Tan, Y.a.; Li, J. The security of machine learning in an adversarial setting: A survey. J. Parallel Distrib. Comput. 2019, 130, 12–23.
- Pitropakis, N.; Panaousis, E.; Giannetsos, T.; Anastasiadis, E.; Loukas, G. A taxonomy and survey of attacks against machine learning. Comput. Sci. Rev. 2019, 34, 100199.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572.
- Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial machine learning at scale. arXiv 2016, arXiv:1611.01236.
- Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; Li, J. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9185–9193.
- Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533.
- Haydari, A.; Zhang, M.; Chuah, C.N. Adversarial attacks and defense in deep reinforcement learning (DRL)-based traffic signal controllers. IEEE Open J. Intell. Transp. Syst. 2021, 2, 402–416.
- Hussenot, L.; Geist, M.; Pietquin, O. Manipulating neural policies with adversarial observations. In Proceedings of the Real-World Sequential Decision Making Workshop, ICML, Long Beach, CA, USA, 14 June 2019.
- Yuan, Z.; Zhang, J.; Jia, Y.; Tan, C.; Xue, T.; Shan, S. Meta gradient adversarial attack. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7748–7757.
- Metzen, J.H.; Genewein, T.; Fischer, V.; Bischoff, B. On detecting adversarial perturbations. arXiv 2017, arXiv:1702.04267.
- Dong, Y.; Su, H.; Zhu, J.; Bao, F. Towards interpretable deep neural networks by leveraging adversarial examples. arXiv 2017, arXiv:1708.05493.
- Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; McDaniel, P. Ensemble adversarial training: Attacks and defenses. arXiv 2017, arXiv:1705.07204.
- Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597.
- Elhami Fard, N.; Selmic, R.R. Time-delayed data transmission in heterogeneous multi-agent deep reinforcement learning system. In Proceedings of the 2022 30th Mediterranean Conference on Control and Automation (MED), Athens, Greece, 28 June–1 July 2022; pp. 636–642.
- Mesbahi, M.; Egerstedt, M. Graph Theoretic Methods in Multiagent Networks; Princeton University Press: Princeton, NJ, USA, 2010; Volume 33.
- Wang, S.; Diao, R.; Lan, T.; Wang, Z.; Shi, D.; Li, H.; Lu, X. A DRL-aided multi-layer stability model calibration platform considering multiple events. In Proceedings of the 2020 IEEE Power & Energy Society General Meeting (PESGM), Virtual, 3–6 August 2020; pp. 1–5.
- Liu, Y.; Cao, B.; Li, H. Improving ant colony optimization algorithm with epsilon greedy and Levy flight. Complex Intell. Syst. 2021, 7, 1711–1722.
- Elhami Fard, N.; Selmic, R.R. Consensus of multi-agent reinforcement learning systems: The effect of immediate rewards. J. Robot. Control (JRC) 2022, 3, 115–127.
- Ohnishi, S.; Uchibe, E.; Yamaguchi, Y.; Nakanishi, K.; Yasui, Y.; Ishii, S. Constrained deep Q-learning gradually approaching ordinary Q-learning. Front. Neurorobot. 2019, 13, 103.
- Yu, Y.; Wang, T.; Liew, S.C. Deep-reinforcement learning multiple access for heterogeneous wireless networks. IEEE J. Sel. Areas Commun. 2019, 37, 1277–1290.
- Ammouri, K. Deep Reinforcement Learning for Temperature Control in Buildings and Adversarial Attacks. 2021. Available online: https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1590898&dswid=-7284 (accessed on 6 September 2021).
- Yu, Y. CS-DLMA. 2019. Available online: https://github.com/YidingYu/CS-DLMA (accessed on 26 August 2019).
- Abramson, N. The ALOHA system: Another alternative for computer communications. In Proceedings of the Fall Joint Computer Conference, Houston, TX, USA, 17–19 November 1970; pp. 281–285.
- Kuo, F.F. The ALOHA system. ACM SIGCOMM Comput. Commun. Rev. 1995, 25, 41–44.
- Kuo, F.F. Computer Networks–The ALOHA System; Technical Report; Hawaii University at Manoa Honolulu Department of Electrical Engineering: Honolulu, HI, USA, 1981.
- Jung, P. Time Division Multiple Access (TDMA). In Wiley Encyclopedia of Telecommunications; John Wiley & Sons: Hoboken, NJ, USA, 2003.
Various Graphs / Rewards and Loss | Agent 1 Reward | Agent 2 Reward | Agent 3 Reward | Agent 4 Reward | Agent 5 Reward | Team Reward | DQN Loss
---|---|---|---|---|---|---|---
Leaderless MAS | | | | | | |
Leaderless MAS with time-delay | | | | | | |
Leader-follower MAS | | | | | | |
Leader-follower MAS with time-delay | | | | | | |
Various Graphs / Rewards and Loss | Agent 1 Reward | Agent 2 Reward | Agent 3 Reward | Agent 4 Reward | Agent 5 Reward | Team Reward | DQN Loss
---|---|---|---|---|---|---|---
Leaderless MAS | | | | | | |
Leaderless MAS with time-delay | | | | | | |
Leader-follower MAS | | | | | | |
Leader-follower MAS with time-delay | | | | | | |
Various Attacks / Rewards and Loss | Agent 1 Reward | Agent 2 Reward | Agent 3 Reward | Agent 4 Reward | Agent 5 Reward | Team Reward | DQN Loss
---|---|---|---|---|---|---|---
Leader-follower MAS without attack | | | | | | |
Leader-follower MAS with FGSM attack | | | | | | | 20,920.10
Leader-follower MAS with FGM attack | | | | | | | 60,232.71
Leader-follower MAS with BIM attack | | | | | | | 27,949.57
Various Attacks / Rewards and Loss | Agent 1 Reward | Agent 2 Reward | Agent 3 Reward | Agent 4 Reward | Agent 5 Reward | Team Reward | DQN Loss
---|---|---|---|---|---|---|---
Leader-follower MAS without attack | | | | | | |
Leader-follower MAS with FGSM attack | | | | | | |
Leader-follower MAS with FGM attack | | | | | | |
Leader-follower MAS with BIM attack | | | | | | |
Various Attacks / Rewards and Loss | Agent 1 Reward | Agent 2 Reward | Agent 3 Reward | Agent 4 Reward | Agent 5 Reward | Team Reward | DQN Loss
---|---|---|---|---|---|---|---
Leader-follower MAS without attack | | | | | | |
Leader-follower MAS with FGSM attack | | | | | | |
Leader-follower MAS with FGM attack | | | | | | |
Leader-follower MAS with BIM attack | | | | | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).