Network Attack Path Selection and Evaluation Based on Q-Learning
Abstract
1. Introduction
2. FPN-Q Learning Algorithm to Determine the Best Attack Path
2.1. Attack Model
- (1) H = {h1, h2, h3, …, hn} is a finite set of places h; each place represents a host of the information system in the model;
- (2) T = {t1, t2, t3, …, tm} is a finite set of transitions t; each transition represents an exploitable vulnerability of a system host in the model;
- (3) α represents the risk value incurred when the host represented by a place is compromised, that is, the threat index; and
- (4) μ: T × H → (1,10) represents the confidence of the transition rule, that is, the probability that a certain transition is triggered. In the network attack model, it represents the complexity of the attack process: the higher the attack complexity, the lower the probability that the host is attacked. Attack complexity is affected by many factors, such as the attack tools and the attacker's experience. Its value is assigned according to the Common Vulnerability Scoring System (CVSS) [19].
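The four components above map naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `AttackModel`, its field names, the `attackable` helper, and all numeric values are hypothetical. For brevity, the confidence μ is keyed by vulnerability only, which is a simplification of the T × H domain given above.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class AttackModel:
    """Illustrative container for the attack model (names are assumptions)."""
    hosts: Set[str]                          # H: places, one per host
    transitions: Dict[str, Tuple[str, str]]  # T: vulnerability -> (source host, target host)
    threat: Dict[str, float]                 # alpha: threat index per host
    confidence: Dict[str, float]             # mu: confidence per vulnerability (simplified)

    def attackable(self, host: str) -> List[Tuple[str, str]]:
        """Return sorted (vulnerability, next host) pairs reachable from `host`."""
        return sorted(
            (t, dst) for t, (src, dst) in self.transitions.items() if src == host
        )


# Hypothetical three-host example; every value here is made up for illustration.
model = AttackModel(
    hosts={"h1", "h2", "h3"},
    transitions={"t1": ("h1", "h2"), "t2": ("h1", "h3"), "t3": ("h2", "h3")},
    threat={"h1": 0.2, "h2": 0.5, "h3": 0.9},
    confidence={"t1": 3.9, "t2": 8.0, "t3": 6.5},
)
print(model.attackable("h1"))  # [('t1', 'h2'), ('t2', 'h3')]
```

Storing transitions as a mapping from vulnerability to a (source, target) pair keeps the lookup of attackable hosts to a single pass over T, which is sufficient for small models of this kind.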
2.2. FPN-Q Learning Algorithm to Determine the Best Attack Path
2.3. Learning Stage
Algorithm 1. FPN-Q Learning algorithm: learning stage
Initialization: Initialize the Q-table
Determine the attack access host h0 and the target host ht.
for current number of episodes ≤ maximal episodes
do
    Reset: Current host: hc = h0;
    Target host = ht;
    while hc ≠ ht do
        Obtain all vulnerabilities t from the attackable hosts h;
        Obtain the probability pei of each vulnerability being selected according to (1);
        Choose a vulnerability tei according to pei;
        Obtain evaluative feedback: Generate the reward rt+1 according to (2);
        Update the current host: hc = he;
        Update the value of Q(he, vi) according to (4).
    end while
end for
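The learning stage can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's method: Equations (1), (2), and (4) are not reproduced in this text, so the sketch substitutes a Boltzmann (softmax) selection rule for (1), fixed per-edge rewards for (2), and the standard Q-learning update of Watkins and Dayan for (4). The toy network `EDGES`, the function names, and all parameter values are hypothetical.

```python
import math
import random

# Hypothetical toy network: host -> list of (vulnerability, next host, reward).
EDGES = {
    "h0": [("t1", "h1", 1.0), ("t2", "h2", 2.0)],
    "h1": [("t3", "ht", 5.0)],
    "h2": [("t4", "ht", 1.0)],
}


def select(q, host, temperature, rng):
    """Assumed stand-in for Eq. (1): softmax choice over the current Q-values."""
    options = EDGES[host]
    weights = [math.exp(q[(host, t)] / temperature) for t, _, _ in options]
    return rng.choices(options, weights=weights, k=1)[0]


def learn(start, target, episodes=200, lr=0.5, gamma=0.9, temperature=1.0, seed=0):
    """Run the episode loop of Algorithm 1 and return the learned Q-table."""
    rng = random.Random(seed)
    q = {(h, t): 0.0 for h, opts in EDGES.items() for t, _, _ in opts}
    for _ in range(episodes):
        hc = start                       # reset the current host
        while hc != target:
            t, he, reward = select(q, hc, temperature, rng)
            # Assumed stand-in for Eq. (4): standard Q-learning update.
            best_next = max(
                (q[(he, t2)] for t2, _, _ in EDGES.get(he, [])), default=0.0
            )
            q[(hc, t)] += lr * (reward + gamma * best_next - q[(hc, t)])
            hc = he                      # move to the compromised host
    return q


q_table = learn("h0", "ht")
for key in sorted(q_table):
    print(key, round(q_table[key], 3))
```

In this toy network the path through h1 carries the larger cumulative reward, so the learned value of (h0, t1) ends up above that of (h0, t2) even though t2 has the larger immediate reward.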
2.4. Attack Stage
Algorithm 2. FPN-Q Learning algorithm: attack stage
Initialization: Initialize G
Determine the attack access host h0 and the target host ht.
Set: Current host: hc = h0;
Target host = ht;
while hc ≠ ht do
    Obtain all attackable hosts h;
    Obtain Q(hi,tij,hj) of each attackable host;
    Find the max Q(hi,tij,he) from attackable hosts;
    Update G: G ← G + gie
    Update the current host: hc = he;
end while
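The attack stage amounts to a greedy walk over a learned Q-table. The sketch below is a minimal illustration with assumed names: the function `attack`, the dictionary layout keyed by (hi, tij, hj), and the numeric Q-values and gains are all hypothetical. The loop guard that raises on a revisited host is an addition for safety and is not part of Algorithm 2.

```python
def attack(q, gain, start, target):
    """Follow the largest Q-value from each host and accumulate the gain G."""
    hc, total, path = start, 0.0, [start]
    visited = {start}
    while hc != target:
        # All transitions (hi, tij, hj) leaving the current host.
        candidates = [key for key in q if key[0] == hc]
        if not candidates:
            raise ValueError(f"no attackable host reachable from {hc}")
        hi, t, he = max(candidates, key=lambda key: q[key])
        total += gain[(hi, t, he)]       # G <- G + g_ie
        if he in visited:                # added guard against cycles (assumption)
            raise ValueError(f"greedy walk revisits {he}")
        visited.add(he)
        path.append(he)
        hc = he
    return path, total


# Hypothetical Q-table and per-step gains for a four-host toy network.
q = {
    ("h0", "t1", "h1"): 5.5,
    ("h0", "t2", "h2"): 2.9,
    ("h1", "t3", "ht"): 5.0,
    ("h2", "t4", "ht"): 1.0,
}
gain = {
    ("h0", "t1", "h1"): 1.0,
    ("h0", "t2", "h2"): 2.0,
    ("h1", "t3", "ht"): 5.0,
    ("h2", "t4", "ht"): 1.0,
}
print(attack(q, gain, "h0", "ht"))  # (['h0', 'h1', 'ht'], 6.0)
```

Because the walk is purely greedy, its result is only as good as the Q-table it is given; no exploration happens at this stage.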
3. Information-Physical Cross-Layer Risk Spread Model
4. Security Risk Assessment of Electric Power CPS under Cyber Attack
- (1) The attacker uses a certain strategy to launch an attack through the smart terminal and enter the control center, and then randomly or deliberately tampers with the business data, drawing on knowledge of the physical power grid, so that the load of some physical nodes exceeds the predetermined quota.
- (2) The control center concludes that the load on the node exceeds its capacity and judges the node to be faulty. The control center therefore performs load reduction according to Formulas (5)–(10) with the goal of minimizing load loss, and issues a control command to cut off part of the load of the node and its neighboring nodes to ensure the safe and stable operation of the system.
- (1) The intelligent terminal adjusts the load of the physical system according to the wrong instruction issued by the control system;
- (2) Each node of the physical power grid adjusts its load according to the control command, and some nodes lose load that they need for normal operation. As a result, the power flow of the physical power grid changes, and new business data are transmitted to the control center.
5. Simulation Evaluation
5.1. Establishment of Simulation Environment
- (1) The attacker uses the power distribution terminal equipment as the access point to invade the DMZ area, and then continuously exploits the system's security vulnerabilities to escalate privileges until reaching the control center application server. Having compromised the application server, the attacker can deliberately tamper with business data, which causes greater losses to the system. In this mode, the attacker's target is H8.
- (2) The attacker uses the power distribution terminal equipment as the access point to invade the DMZ area, and then continuously exploits the system's security vulnerabilities to escalate privileges until reaching the operator's Human Machine Interface (HMI). Because the attacker does not have detailed parameters and data for the physical power grid, the business data are tampered with randomly on the HMI side. In this mode, the attacker's target is H4.
5.2. Analysis of Experimental Results
5.2.1. Experimental Results—Security Analysis of the Information Layer
5.2.2. Experimental Results—Algorithm Comparison
- (1) Compare the algorithm proposed in this paper with random attacks and selective attacks. Taking attack mode 1 as an example, the attack gains obtained over 100 attacks are shown in Figure 7a (the abscissa is the number of experiments). The figure shows that, compared with random attacks, selective attacks are more likely to follow the best attack path and obtain the greatest attack gain, but the algorithm proposed in this paper shows a clear advantage: after 15 rounds of learning, it finds the best attack path and obtains the maximum attack gain.
- (2) Compare the algorithm proposed in this paper with the traditional Q-Learning algorithm. The traditional Q-Learning algorithm is used to perform path learning and attack gain calculation for attack mode 1 and attack mode 2, and the results of the two methods are compared in Figure 7b. The figure shows that the attack gain obtained by the improved algorithm is consistent with that of the traditional algorithm, indicating that the algorithm in this paper is accurate. In terms of learning speed, however, the proposed algorithm finds the best attack path faster. For the more complex attack mode (mode 1) in particular, the traditional algorithm needs 30 rounds of learning to reach the best gain and converge, while the algorithm in this paper needs only 15, so the learning efficiency is roughly doubled.
5.2.3. Experimental Results—Cross-Layer Risk Communication Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Xin, S.; Guo, Q.; Sun, H.; Zhang, B.; Wang, J.; Chen, C. Cyber-Physical Modeling and Cyber-Contingency Assessment of Hierarchical Control Systems. IEEE Trans. Smart Grid 2017, 6, 2375–2385.
- Sridhar, S.; Hahn, A.; Govindarasu, M. Cyber–Physical System Security for the Electric Power Grid. Proc. IEEE Inst. Electr. Electron. Eng. 2011, 100, 210–224.
- Alshamrani, A.; Myneni, S.; Chowdhary, A.; Huang, D. A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutor. 2019, 21, 1851–1877.
- Liang, G.; Weller, S.R.; Zhao, J.; Luo, F.; Dong, Z.Y. The 2015 Ukraine blackout: Implications for false data injection attacks. IEEE Trans. Power Syst. 2016, 32, 3317–3318.
- Staff, T. Steinitz: Israel’s Electric Authority Hit by ‘Severe’ Cyber-Attack. The Times of Israel. 2016. Available online: https://www.timesofisrael.com/steinitz-israels-electric-authority-hit-by-severe-cyber-attack/ (accessed on 26 January 2016).
- Liu, S.; Chen, B.; Zourntos, T.; Kundur, D.; Butler-Purry, K. A Coordinated Multi-Switch Attack for Cascading Failures in Smart Grid. IEEE Trans. Smart Grid 2014, 5, 1183–1195.
- Zhou, Y.; Chen, N. The LAP under facility disruptions during early post-earthquake rescue using PSO-GA hybrid algorithm. Fresen. Environ. Bull. 2019, 28, 9906–9914.
- Liu, X. A network attack path prediction method using attack graph. J. Ambient Intell. Humaniz. Comput. 2020, 1–8.
- Swiler, L.P.; Phillips, C.; Ellis, D.; Chakerian, S. Computer-attack graph generation tool. In Proceedings of the DARPA Information Survivability Conference and Exposition II, DISCEX’01, Anaheim, CA, USA, 12–14 June 2001; Volume 2, pp. 307–321.
- Zhang, B.; Lu, K.; Pan, X.; Wu, Z. Reverse search based network attack graph generation. In Proceedings of the International Conference on Computational Intelligence and Software Engineering, Wuhan, China, 11–13 December 2009; pp. 1–4.
- Singhal, A.; Ou, X. Security risk analysis of enterprise networks using probabilistic attack graphs. In Network Security Metrics; Springer: Cham, Switzerland, 2017; pp. 53–73.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: London, UK, 1998.
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Jaradat, M.A.K.; Al-Rousan, M.; Quadan, L. Reinforcement based mobile robot navigation in dynamic environment. Robot. Comput. Integr. Manuf. 2011, 27, 135–149.
- Maoudj, A.; Hentout, A. Optimal path planning approach based on Q-learning algorithm for mobile robots. Appl. Soft Comput. 2020, 97, 106796.
- Yan, J.; He, H.; Zhong, X.; Tang, Y. Q-Learning-Based Vulnerability Analysis of Smart Grid against Sequential Topology Attacks. IEEE Trans. Inf. Forensics Secur. 2016, 12, 200–210.
- Yousefi, M.; Mtetwa, N.; Zhang, Y.; Tianfield, H. A Reinforcement Learning Approach for Attack Graph Analysis. In Proceedings of the 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), New York, NY, USA, 1–3 August 2018; pp. 212–217.
- Chen, S.M. Fuzzy backward reasoning using fuzzy Petri nets. IEEE Trans. Syst. Man Cybern. Syst. Part B (Cybern.) 2000, 30, 846–856.
- Mell, P.; Scarfone, K.; Romanosky, S. Common vulnerability scoring system. IEEE Secur. Priv. 2006, 4, 85–89.
- Xu, L.; Guo, Q.; Yang, T.; Sun, H. Robust Routing Optimization for Smart Grids Considering Cyber-Physical Interdependence. IEEE Trans. Smart Grid 2018, 10, 5620–5629.
- Li, X.; Zhou, C.; Tian, Y.C.; Xiong, N.; Qin, Y. Asset-based dynamic impact assessment of cyberattacks for risk analysis in industrial control systems. IEEE Trans. Ind. Inform. 2017, 14, 608–618.
- Ye, X.; Zhao, J.; Zhang, Y.; Wen, F. Quantitative vulnerability assessment of cyber security for distribution automation systems. Energies 2015, 8, 5266–5286.
- Zhou, X.; Yang, Z.; Ni, M.; Lin, H.; Li, M.; Tang, Y. Analysis of the Impact of Combined Information-Physical-Failure on Distribution Network CPS. IEEE Access 2020, 8, 44140–44152.
- Stellios, I.; Kotzanikolaou, P.; Psarakis, M.; Alcaraz, C.; Lopez, J. A survey of IoT-enabled cyberattacks: Assessing attack paths to critical infrastructures and services. IEEE Commun. Surv. Tutor. 2018, 20, 3453–3495.
- Teixeira, A.; Shames, I.; Sandberg, H.; Johansson, K.H. A secure control framework for resource-limited adversaries. Automatica 2015, 51, 135–148.
Node | Risk (Attack Mode 1) | Risk (Attack Mode 2) | Node | Risk (Attack Mode 1) | Risk (Attack Mode 2)
---|---|---|---|---|---
1 | 0.05571 | 0.16091 | 8 | 0.28763 | 0.36563
2 | 0.0402 | 0.13669 | 9 | 1.96169 | 0.95486
3 | 0.08665 | 0.20068 | 10 | 0.27507 | 0.35756
4 | 0.17548 | 0.28559 | 11 | 0.30509 | 0.37656
5 | 0.13683 | 0.25218 | 12 | 0.29032 | 0.36734
6 | 9.77915 | 2.13194 | 13 | 0.20045 | 0.30523
7 | 0.15986 | 0.27258 | 14 | 0.24754 | 0.33920
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wu, R.; Gong, J.; Tong, W.; Fan, B. Network Attack Path Selection and Evaluation Based on Q-Learning. Appl. Sci. 2021, 11, 285. https://doi.org/10.3390/app11010285