Enhanced Encrypted Traffic Analysis Leveraging Graph Neural Networks and Optimized Feature Dimensionality Reduction
Abstract
1. Introduction
- The metadata used for encrypted network traffic classification was analyzed, with a focus on TLS session data, to identify the most effective combination of usable features.
- A classification architecture was proposed that distinguishes normal from malicious encrypted network traffic using fewer features than existing GNN-based encrypted traffic analysis (ETA) algorithms.
- The accuracy, precision, recall, and F1-score of each feature combination were analyzed to derive the ideal combination, demonstrating that the proposed combination improves the classification of malicious network traffic while using fewer features.
2. Related Work
2.1. Encrypted Network Traffic
2.2. Graph Neural Network and GraphSAGE
3. Proposed Architecture
3.1. Expression Notations
3.2. Data Collection
3.3. Feature Extraction
NETCAP-Based TLS Metadata Extraction
- Step 1. Setting up the Docker environment.
- Step 2. Identifying extractable fields.
- Step 3. Extracting data from metadata.
- Step 4. Storing extracted data.
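Once Step 4 has stored the extracted TLS records, they can be read back for preprocessing. The following is a minimal sketch, assuming the records were saved as CSV; the CSV format, the column names (`SrcIP`, `SrcPort`, `DstIP`, `DstPort`, `CipherSuite`, `MessageLen`, `JA3`), and the sample values are illustrative assumptions, not NETCAP's documented output schema.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class TlsRecord:
    src: str           # "ip:port" of the sender
    dst: str           # "ip:port" of the receiver
    cipher_suite: int
    message_len: int
    ja3: str


def load_tls_records(handle):
    """Parse CSV rows into TlsRecord objects, skipping incomplete rows."""
    records = []
    for row in csv.DictReader(handle):
        try:
            records.append(TlsRecord(
                src=f"{row['SrcIP']}:{row['SrcPort']}",
                dst=f"{row['DstIP']}:{row['DstPort']}",
                cipher_suite=int(row["CipherSuite"]),
                message_len=int(row["MessageLen"]),
                ja3=row["JA3"],
            ))
        except (KeyError, TypeError, ValueError):
            continue  # missing column or non-numeric value
    return records


# Hypothetical column names and made-up values, for illustration only.
sample = io.StringIO(
    "SrcIP,SrcPort,DstIP,DstPort,CipherSuite,MessageLen,JA3\n"
    "10.0.0.5,50412,93.184.216.34,443,4865,512,0123456789abcdef0123456789abcdef\n"
    "10.0.0.5,50413,93.184.216.34,443,,508,0123456789abcdef0123456789abcdef\n"
)
records = load_tls_records(sample)
print(len(records))  # the second row lacks a CipherSuite and is skipped
```

Dropping incomplete rows is one possible policy; imputing a default value would be an equally valid choice depending on the dataset.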
3.4. Data Preprocessing
3.4.1. Data Scaling and Normalization
3.4.2. Graph Data Generation
3.5. Feature Selection
3.5.1. CipherSuite
3.5.2. MessageLen
3.5.3. JA3
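JA3 fingerprints a TLS client by hashing selected fields of its Client Hello message. The sketch below follows the publicly documented JA3 method: the decimal values of the TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats are joined with `-` inside each field and with `,` between fields, and the result is hashed with MD5. The function names and the sample values are illustrative, and the filtering of GREASE values that JA3 prescribes is omitted for brevity.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input: five comma-separated fields, values joined by '-'."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the 32-character hexadecimal MD5 digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Made-up Client Hello values, for illustration only.
example = dict(
    version=771,              # 0x0303
    ciphers=[4865, 4866],
    extensions=[0, 11, 10],
    curves=[29, 23],
    point_formats=[0],
)
print(ja3_string(**example))  # 771,4865-4866,0-11-10,29-23,0
print(ja3_hash(**example))
```

Because the digest depends on the order of the listed values, two clients offering the same cipher suites in a different order produce different fingerprints.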
3.6. ETA-GNN Model
3.6.1. GraphSAGE Model
Algorithm 1: GraphSAGE embedding
1. Input:
2. Graph: G
3. Node feature (), number of epochs (),
4. loss function () (cross-entropy loss)
5. Edge features:
6. Message passing depth:
7. Aggregator:
8. Parameters:
9. Weight matrices:
10. Output:
11. Edge embeddings
12. for iteration 1 to do:
13. for all do:
14. = ,
15. for k ← 1 to do:
16. for :
17. ← // Message aggregation
18. ← // Message concatenation
19. ← // Add nonlinearity (sigmoid function)
20. ← cross-entropy loss
21.
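The three inner steps of Algorithm 1 (message aggregation, message concatenation, and a sigmoid nonlinearity) can be illustrated with a small forward pass. This is a sketch under stated assumptions rather than the authors' implementation: it uses the mean aggregator, fixed rather than trained weights, and forms each edge embedding by concatenating the embeddings of the edge's two endpoints. All names and numbers are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_vectors(vectors, dim):
    """Element-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sage_layer(h, neighbors, weight):
    """One message-passing step over every node."""
    dim = len(next(iter(h.values())))
    new_h = {}
    for v, h_v in h.items():
        agg = mean_vectors([h[u] for u in neighbors[v]], dim)  # message aggregation
        concat = h_v + agg                                     # message concatenation
        new_h[v] = [sigmoid(sum(w * x for w, x in zip(row, concat)))  # nonlinearity
                    for row in weight]
    return new_h


def edge_embeddings(h, edges):
    """Assumption: an edge embedding concatenates its endpoint embeddings."""
    return {(u, v): h[u] + h[v] for u, v in edges}


# Toy graph with three nodes and two edges.
edges = [("a", "b"), ("b", "c")]
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
weight = [[0.5, -0.5, 0.25, 0.25],   # maps the 4-dim concatenation to 2 dims
          [-0.25, 0.5, 0.5, -0.5]]

h1 = sage_layer(h0, neighbors, weight)   # message passing depth 1
h2 = sage_layer(h1, neighbors, weight)   # message passing depth 2
z = edge_embeddings(h2, edges)
```

In training, the weight matrix would be updated each epoch from the cross-entropy loss; the sketch reuses one weight matrix for both depths only to stay short.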
3.6.2. Prediction Model
Algorithm 2: Prediction model
1. Input:
2. input feature {}
3. number of epochs ()
4. loss function ()
5. Parameters:
6. weight and bias matrix {, }
7. Output:
8. Classification result ()
9. for iteration 1 to do:
10.
11.
12. ← binary cross-entropy loss
13. backpropagate the gradient and update trainable parameters
14.
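The training loop of Algorithm 2 (forward pass, binary cross-entropy loss, gradient update of the weights and bias) can be sketched with a single sigmoid unit. The layer structure of the actual prediction model is not recoverable from the listing, so the single-layer classifier, the learning rate, and the toy data below are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def train(features, labels, epochs=500, lr=0.5):
    """Fit weights and bias by full-batch gradient descent on the BCE loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    history = []
    for _ in range(epochs):
        probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in features]
        history.append(bce(labels, probs))
        # For sigmoid + BCE, the gradient with respect to the logit is (p - y).
        errors = [p - y for p, y in zip(probs, labels)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, features)) / n
        b -= lr * sum(errors) / n
    return w, b, history


def predict(w, b, x):
    """Return 1 (malicious) if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Toy, linearly separable embeddings; label 1 stands for malicious.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b, history = train(X, y)
predictions = [predict(w, b, x) for x in X]
```

The gradient is written out by hand here; in a framework such as PyTorch the same update would come from automatic differentiation and an optimizer step.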
4. Implementation
4.1. Experimental Settings
4.2. Performance Metrics
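The four metrics reported for each feature combination (accuracy, precision, recall, and F1-score) follow from the confusion-matrix counts. The sketch below assumes malicious traffic is the positive class (label 1); the function names and the sample labels are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1; undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = malicious, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these sample labels there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics evaluate to 0.75.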
4.3. Results
4.3.1. CipherSuite + SupportedGroups
4.3.2. CipherSuite + HandshakeLen
4.3.3. CipherSuite + SignatureAlgs
4.3.4. CipherSuite + JA3
4.3.5. CipherSuite + MessageLen
4.3.6. CipherSuite + MessageLen + JA3
4.4. Comparison with Related Work
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Rescorla, E. The Transport Layer Security (TLS) Protocol Version 1.3. RFC. 2018. Available online: https://tools.ietf.org/html/rfc8446 (accessed on 16 May 2023).
- Google. Google Transparency Report: HTTPS Encryption on the Web. 2023. Available online: https://transparencyreport.google.com/https/overview?hl=en (accessed on 16 May 2023).
- Let’s Encrypt. Let’s Encrypt Stats. 2023. Available online: https://letsencrypt.org/stats/ (accessed on 16 May 2023).
- Fu, C.; Li, Q.; Shen, M.; Xu, K. Realtime robust malicious traffic detection via frequency domain analysis. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 15–19 November 2021; pp. 3431–3446.
- Papadogiannaki, E.; Ioannidis, S. A survey on encrypted network traffic analysis applications, techniques, and countermeasures. ACM Comput. Surv. 2022, 54, 123.
- Wang, Z.; Thing, V.L.L. Feature mining for encrypted malicious traffic detection with deep learning and other machine learning algorithms. Comput. Sec. 2023, 128, 103143.
- Shen, M.; Zhang, J.; Zhu, L.; Xu, K.; Du, X. Accurate decentralized application identification via encrypted traffic analysis using graph neural networks. IEEE Trans. Inform. Forensics Secur. 2021, 16, 2367–2380.
- Liu, T.; Li, Z.; Long, H.; Bilal, A. NT-GNN: Network Traffic Graph for 5G Mobile IoT Android Malware Detection. Electronics 2023, 12, 789.
- Zhang, H.; Yu, L.; Xiao, X.; Li, Q.; Mercaldo, F.; Luo, X.; Liu, Q. TFE-GNN: A temporal fusion encoder using graph neural networks for fine-grained encrypted traffic classification. In Proceedings of the ACM Web Conference WWW, New York, NY, USA, 30 April–4 May 2023; pp. 2066–2075.
- Hu, G.; Xiao, X.; Shen, M.; Zhang, B.; Yan, X.; Liu, Y. TCGNN: Packet-grained network traffic classification via graph neural networks. Eng. Appl. Artif. Intell. 2023, 123, 106531.
- Hong, Y.; Li, Q.; Yang, Y.; Shen, M. Graph based encrypted malicious traffic detection with hybrid analysis of multi-view features. Inf. Sci. 2023, 644, 119229.
- Pang, B.; Fu, Y.; Ren, S.; Wang, Y.; Liao, Q.; Jia, Y. CGNN: Traffic classification with graph neural network. arXiv 2021, arXiv:2110.09726.
- Sun, B.; Yang, W.; Yan, M.; Wu, D.; Zhu, Y.; Bai, Z. An encrypted traffic classification method combining graph convolutional network and autoencoder. In Proceedings of the 2020 IEEE 39th International Performance Computing and Communications Conference, Austin, TX, USA, 6–8 November 2020; pp. 1–8.
- Huoh, T.L.; Luo, Y.; Li, P.; Zhang, T. Flow-based encrypted network traffic classification with graph neural networks. IEEE Trans. Netw. Serv. Manag. 2023, 20, 1224–1237.
- Jiang, M.; Li, Z.; Fu, P.; Cai, W.; Cui, M.; Xiong, G.; Gou, G. Accurate mobile-app fingerprinting using flow-level relationship with graph neural networks. Comput. Netw. 2022, 217, 109309.
- Diao, Z.; Xie, G.; Wang, X.; Ren, R.; Meng, X.; Zhang, G.; Xie, K.; Qiao, M. EC-GCN: A encrypted traffic classification framework based on multi-scale graph convolution networks. Comput. Netw. 2023, 224, 109614.
- Huoh, T.L.; Luo, Y.; Zhang, T. Encrypted network traffic classification using a geometric learning model. In Proceedings of the 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM), Bordeaux, France, 17–21 May 2021; pp. 376–383.
- Okonkwo, Z.; Foo, E.; Hou, Z.; Li, Q.; Jadidi, Z. Encrypted network traffic classification with higher order graph neural network. In Proceedings of the Australasian Conference on Information Security and Privacy, Brisbane, QLD, Australia, 15 June 2023; pp. 630–650.
- Pang, B.; Fu, Y.; Ren, S.; Jia, Y. High-performance network traffic classification based on graph neural network. In Proceedings of the 2023 IEEE 6th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 24–26 February 2023; pp. 800–804.
- Zhao, R.; Deng, X.; Wang, Y.; Chen, L.; Liu, M.; Xue, Z.; Wang, Y. Flow sequence-based anonymity network traffic identification with residual graph convolutional networks. In Proceedings of the 2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS), Oslo, Norway, 10–12 June 2022; pp. 1–10.
- Pham, T.D.; Ho, T.L.; Truong-Huu, T.; Cao, T.D.; Truong, H.L. Mappgraph: Mobile-app classification on encrypted network traffic using deep graph convolution neural networks. In Proceedings of the 37th Annual Computer Security Applications Conference, Virtual, 6–10 December 2021; pp. 1025–1038.
- Shi, Z.; Luktarhan, N.; Song, Y.; Tian, G. BFCN: A novel classification method of encrypted traffic based on BERT and CNN. Electronics 2023, 12, 516.
- Zeng, Z.; Xun, P.; Peng, W.; Zhao, B.K. Toward identifying malicious encrypted traffic with a causality detection system. J. Inf. Sec. Appl. 2024, 80, 103644.
- Anderson, B.; McGrew, D. Identifying encrypted malware traffic with contextual flow data. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, Vienna, Austria, 24–28 October 2016; pp. 35–46.
- Anderson, B.; McGrew, D. Machine learning for encrypted malware traffic classification: Accounting for noisy labels and non-stationarity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1723–1732.
- Shen, M.; Ye, K.; Liu, X.; Zhu, L.; Kang, J.; Yu, S.; Li, Q.; Xu, K. Machine learning-powered encrypted network traffic analysis: A comprehensive survey. IEEE Commun. Surv. Tutor. 2022, 25, 791–824.
- Choi, Y.S.; Yoo, J.H.; Koo, K.J.; Moon, D.S. Trends of encrypted network traffic analysis technologies for network anomaly detection. Elec. Telecommun. Trends 2023, 38, 71–80.
- Srivastava, G.; Jhaveri, R.H.; Bhattacharya, S.; Pandya, S.; Rajeswari; Maddikunta, P.K.R.; Yenduri, G.; Hall, J.G.; Alazab, M.; Gadekallu, T.R. XAI for Cybersecurity: State of the art, challenges, open issues and future directions. arXiv 2022, arXiv:2206.03585.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
- Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1263–1272.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
- Hamilton, W.L.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1025–1035.
- Wang, Y.; Xiong, G.; Liu, C.; Li, Z.; Cui, M.; Gou, G. CQNET: A clustering-based quadruplet network for decentralized application classification via encrypted traffic. In Proceedings of the Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track: European Conference, ECML PKDD 2021, Bilbao, Spain, 10 September 2021; pp. 518–534.
- Lab, P. Real-World Web Test Bed. 2023. Available online: https://www.pcl.ac.cn/html/1030/2021-10-20/content-3879.html (accessed on 1 May 2023).
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2009, 20, 61–80.
- Yang, S.; Verma, S.; Cai, B.; Jiang, J.; Yu, K.; Chen, F.; Yu, S. Variational co-embedding learning for attributed network clustering. arXiv 2021, arXiv:2104.07295.
- Yin, H.; Yang, S.; Song, X.; Liu, W.; Li, J. Deep fusion of multimodal features for social media retweet time prediction. World Wide Web 2021, 24, 1027–1044.
- Dong, W.; Wu, J.; Zhang, X.; Bai, Z.; Wang, P.; Woźniak, M. Improving performance and efficiency of Graph Neural Networks by injective aggregation. Knowl.-Based Syst. 2022, 254, 109616.
- Hamilton, W.L. Graph Representation Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning; Morgan & Claypool Publishers: San Rafael, CA, USA, 2020; Volume 14, pp. 1–159.
- Wireshark. The World’s Most Popular Network Protocol Analyzer. 2023. Available online: https://www.wireshark.org/ (accessed on 23 March 2023).
- Malware-traffic-analysis.net. A Source for Packet Capture (pcap) Files and Malware Samples. 2023. Available online: https://www.malware-traffic-analysis.net/ (accessed on 1 February 2023).
- NETCAP. Netcap Overview. 2023. Available online: https://docs.netcap.io/ (accessed on 1 February 2023).
- Scikit-Learn. Examples. 2023. Available online: https://scikit-learn.org/stable/auto_examples/index.html#preprocessing (accessed on 1 February 2023).
- Rescorla, E.; Dierks, T. The Transport Layer Security (TLS) Protocol Version 1.2. RFC. 2008. Available online: https://tools.ietf.org/html/rfc5246 (accessed on 16 February 2023).
- Malhotra, V.; Potika, K.; Stamp, M. A Comparison of graph neural networks for malware classification. J. Comput. Virol. Hack. Tech. 2023, 20, 53–69.
- Yuan, Y.; Wang, W.; Pang, W. Which hyperparameters to optimise? An investigation of evolutionary hyperparameter optimisation in graph neural network for molecular property prediction. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Lille, France, 10–14 July 2021; pp. 1403–1404.
- Gonzales, C.; Lee, E.H.; Lee, K.L.K.; Tang, J.; Miret, S. Hyperparameter optimization of graph neural networks for the OpenCatalyst dataset: A case study AI for accelerated materials design. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28–29 December 2022.
Author | Feature | Model | Performance (%) | Dataset (Ref. or Name) |
---|---|---|---|---|
Wang et al. [6] | Flow duration, Payload ratio, TCP payload length, Number of payloads per session, TTL of each encrypted session, Ratio of flow duration | LSTM, ResNet, Random Forest/XGBoost | 99.73 | Benign Capture, Mixture Capture, CICIDS-2017, CICIDS-2012, CIRA-CIC-DoHBRW-2020 |
Shen et al. [7] | Src IP, Dst IP, Src Port, Dst Port, Protocol, Packet length, TCP/IP flags | GNN | 99.73 (Acc.) | Router of China Univ. |
Liu et al. [8] | Packet header, Payload byte | MPNN | 97 (Acc.) | CICAndMal2017, AAGM |
Zhang et al. [9] | Packet header, Payload | GraphSAGE | 95.97–98.88 (Acc.) | ISCX VPN-nonVPN, ISCX Tor-nonTor, self-collected WWT |
Hu et al. [10] | Payload (excluding packet Ethernet header, IP, transport header, and upper layer payload) | GCN | 99.9 (Acc.) | UNB-ISCX, TrafficX |
Hong et al. [11] | Handshake data and session metadata, conn.log, ssl.log, x509.log | GCN, GraphSAGE | 99.9 (Acc.) | CTU-13, MCFP |
Pang et al. [12] | Src IP, Src Port, Dst IP, Dst port, Protocol type | SGC | 95.23 (Acc.) | ISCX |
Sun et al. [13] | Src IP, Src Port, Dst IP, Dst Port, Protocol type | GCN | 94.33 (Acc.) | ISCXVPN-nonVPN, USTC-TFC2016 |
Huoh et al. [14] | Packet raw byte, Src IP, Src port, Dst IP, Dst port, Protocol type | GNN | 89.55 (Acc.) | ISCXVPN 2016 |
Jiang et al. [15] | Packet length seq, Adjacent packet time interval seq, Network flow start time | GNN | 98.66 (Acc.) | Wang [33], self-collected |
Diao et al. [16] | Packet length seq | GNN | 96.86 (Acc.) | OBW30, HW19, ISCX-Tor |
Huoh et al. [17] | IP, port, raw byte | GCN | 88.5 (Acc.) | ISCXVPN-nonVPN |
Okonkwo et al. [18] | Set data individually by OSI layer | GNN | 97.48 (Acc.) | ISCXVPN-nonVPN |
Pang et al. [19] | Handshake data | GNN | ≈96 (Acc.) | Pengcheng lab. real-world web testbed [34] |
Zhao et al. [20] | Flow seq, Src IP, Src port, Dst IP, Dst port, Protocol type, Raw byte, Interval time | ResGCN | 95.4 (Acc.) | SJTU-An21, ISCXVPN2016 |
Pham et al. [21] | Flow-related features | GCN | ≈93 (Acc.) | Self-collected |
Shi et al. [22] | Byte/Packet level feature | BFCN | 99.12–99.65 (Acc.) | ISCX-VPN |
Zeng et al. [23] | Time sequence feature (arrival order, packet time arrival, etc.), flow ID, Src IP, Dst IP, Src Port, Dst Port, time series features, protocol features, payload features, statistical feature, etc. | WGAN | 99.5–100 (FSr) | CICIDS2017, DoHBrw2020 |
Notation | Description |
---|---|
Graph structure based on TLS session data ( denotes vertex (node) using IP address and port; denotes edge using CipherSuite, MessageLen, JA3, Label (normal or malicious)) | |
Vertex (node) features using TLS session data (IP addr., port) | |
denotes the source and denotes the destination of network traffic for composing the edge | |
Edge features using TLS session data (CipherSuite, MessageLen, JA3, Label) | |
Number of epochs | |
Loss function (cross-entropy loss function in GraphSAGE model and binary cross-entropy loss function in prediction model) | |
Message passing depth for GraphSAGE model processing | |
Message aggregation function using the message passing depth for message passing in GraphSAGE model | |
Weight matrices value element from the latent vector | |
Outcomes of the GraphSAGE model, embedding result or contextualized representation of the input graph structure data | |
Sigmoid function symbol | |
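The graph structure described in the notation table, with vertices keyed by IP address and port and edges carrying CipherSuite, MessageLen, JA3, and a label, can be assembled from session records as in the following sketch. The dictionary keys, the sample values, and the choice to treat adjacency as undirected are assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(sessions):
    """Return (nodes, edges, neighbors) built from a list of TLS session dicts."""
    nodes = set()
    edges = {}                     # (src, dst) -> list of (features, label)
    neighbors = defaultdict(set)   # undirected adjacency for message passing
    for s in sessions:
        src = f"{s['src_ip']}:{s['src_port']}"
        dst = f"{s['dst_ip']}:{s['dst_port']}"
        nodes.update((src, dst))
        features = (s["cipher_suite"], s["message_len"], s["ja3"])
        # Several sessions between the same endpoints are kept as parallel edges.
        edges.setdefault((src, dst), []).append((features, s["label"]))
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return nodes, edges, dict(neighbors)


# Made-up sessions; label 0 = normal, 1 = malicious.
sessions = [
    {"src_ip": "10.0.0.5", "src_port": 50412, "dst_ip": "93.184.216.34",
     "dst_port": 443, "cipher_suite": 4865, "message_len": 512,
     "ja3": "0123456789abcdef0123456789abcdef", "label": 0},
    {"src_ip": "10.0.0.7", "src_port": 50999, "dst_ip": "93.184.216.34",
     "dst_port": 443, "cipher_suite": 49199, "message_len": 230,
     "ja3": "fedcba9876543210fedcba9876543210", "label": 1},
]
nodes, edges, neighbors = build_graph(sessions)
print(len(nodes), len(edges))
```

Because the label sits on the edge rather than on the vertex, classification in this setting is an edge-level task: each TLS session, not each host, is judged normal or malicious.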
Network Traffic Dataset | Total | Normal | Malicious |
---|---|---|---|
All network traffic data | 1,275,521 | 822,367 | 453,154 |
TLS session data (server/client) | 7152 (3583/3569) | 3364 (1684/1680) | 3788 (1899/1889) |
Hyperparameter | Range |
---|---|
Message aggregator | Mean, maxpool, meanpool, and lstm |
Learning rate | 0.1, 0.01, 0.001, 0.0001, 0.00001 |
Batch size | 32, 64, 128, 256, 512 |
Max neighborhood depth for the first hop sampling | 15, 20, 25, 30 |
Max neighborhood depth for the second hop sampling | 5, 8, 10 |
Hidden-layer size | 64, 128, 256 |
Dropout probability | 0.1, 0.2, 0.3, 0.4, 0.5 |
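The ranges in the hyperparameter table define a finite search space whose size can be checked by enumeration. The table does not state how the space was searched, so the exhaustive grid below is an assumption, and the dictionary keys are illustrative names rather than identifiers from the original implementation.

```python
from itertools import product

# Values copied from the hyperparameter table; key names are hypothetical.
search_space = {
    "aggregator": ["mean", "maxpool", "meanpool", "lstm"],
    "learning_rate": [0.1, 0.01, 0.001, 0.0001, 0.00001],
    "batch_size": [32, 64, 128, 256, 512],
    "first_hop_samples": [15, 20, 25, 30],
    "second_hop_samples": [5, 8, 10],
    "hidden_size": [64, 128, 256],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def iter_configs(space):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


total = sum(1 for _ in iter_configs(search_space))
first = next(iter_configs(search_space))
print(total)  # 4 * 5 * 5 * 4 * 3 * 3 * 5 = 18000
```

At 18,000 combinations, training one model per configuration would be costly on a single GPU, which is why random or staged searches are common alternatives to a full grid.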
Components | Specification |
---|---|
Operating system | Windows 10 Pro |
CPU | Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz, x64-based processor |
GPU | NVIDIA GeForce GTX 1060 6 GB |
Random access memory (RAM) | 32 GB DDR4 RAM |
Machine learning framework | PyTorch 1.13 |
Programming environment | Jupyter Notebook with Python 3.11 |
Author | Used Features | Number of Used Features | Accuracy (%) |
---|---|---|---|
Zhang et al. [9] | Packet header with source and destination IP addresses and ports removed (sequence number, acknowledgment number, data offset, flag, window, checksum, etc.), packet payload | More than 7 | 95.97–98.88 |
Hong et al. [11] | Handshake data (version, extension) and session metadata (packet length seq., packet time interval seq., etc.), conn.log (IP addr, port, connection duration, number and size of upstream and downstream packet), ssl.log (timestamp, version, key, server name, etc.), x509.log (certificate serial number, version, issuer, validity period, server DNS, the type of key, the length of the key, and so on) | More than 23 | 99.90 |
Proposed model | TLS session data (IP addr., port, CipherSuite, MessageLen, JA3) | 5 | 99.50 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jung, I.-S.; Song, Y.-R.; Jilcha, L.A.; Kim, D.-H.; Im, S.-Y.; Shim, S.-W.; Kim, Y.-H.; Kwak, J. Enhanced Encrypted Traffic Analysis Leveraging Graph Neural Networks and Optimized Feature Dimensionality Reduction. Symmetry 2024, 16, 733. https://doi.org/10.3390/sym16060733