Application Layer Protocol Identification Method Based on ResNet
Abstract
1. Introduction
- Proposed an entropy-based delimiter determination algorithm. The algorithm uses information entropy and frequency to select an optimal set of delimiters for application layer protocols. These delimiters segment the original data into feature data blocks, from which a feature table is generated.
- Introduced a composite-feature-based RGB image generation algorithm. The algorithm uses the feature table to refine the original data into purified data that retains positional information, then merges the original and purified data into composite feature images. The resulting images represent the data features more completely and give the training task more distinctive features.
- Trained and tested pre-trained ResNet models on the composite feature images. With the proposed method, the accuracy, precision, and recall of application layer protocol recognition all exceed 98%, which supports the effectiveness of the method.
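To make the first contribution concrete, the following is a minimal sketch of how candidate delimiters could be scored by entropy and frequency. It is not the authors' algorithm: the function names, the definition of frequency (number of messages containing the symbol), and the choice to take entropy over per-message occurrence counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def score_delimiters(messages, candidates):
    """Return (symbol, entropy, frequency) tuples, highest entropy first.

    Assumption (not from the paper): frequency is the number of messages
    that contain the symbol, and entropy is computed over the symbol's
    per-message occurrence counts.
    """
    scores = []
    for sym in candidates:
        per_message = [msg.count(sym) for msg in messages]
        frequency = sum(1 for n in per_message if n > 0)
        if frequency == 0:
            continue  # symbol never occurs, so it cannot act as a delimiter
        scores.append((sym, shannon_entropy(per_message), frequency))
    return sorted(scores, key=lambda s: (s[1], s[2]), reverse=True)


# Hypothetical text-like payloads, loosely modelled on a decoded BGP dump.
messages = [
    b"Type: OPEN\r\nLength: 29\r\n",
    b"Type: NOTIFICATION\r\nLength: 21\r\nError Code: Cease\r\n",
    b"Type: OPEN\r\nLength: 29\r\n",
]
ranking = score_delimiters(messages, [b" ", b"\r", b"\n", b":", b"."])
```

A real implementation would presumably cut the ranking at a threshold or a fixed size to obtain the delimiter set; that selection rule is not specified here and is left out of the sketch.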
2. Related Work
2.1. DPI
2.2. Protocol Feature Matching-Based Approach
2.3. Deep Learning
2.4. Protocol Format
2.4.1. BGP Protocol
2.4.2. SMB Protocol
2.4.3. HTTP Protocol
3. Materials and Methods
- Truncating and processing network traffic data.
- Utilizing a delimiter determination algorithm based on information entropy to determine an optimal set of delimiters. This set is then used to extract features from the application layer data, generating a feature table.
- Employing a composite-feature-based RGB image generation algorithm to generate images required for model training.
- Fine-tuning and protocol type recognition using a pre-trained ResNet model.
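The middle steps of this pipeline can be sketched as follows. This is an illustrative reading, not the published procedure: the helper names, the `top_k` cut-off for the feature table, and the channel layout (red = original byte, green = purified byte, blue = unused) are assumptions introduced here.

```python
from collections import Counter


def split_blocks(message, delimiters):
    """Return (offset, block) pairs for each non-empty run between delimiters."""
    blocks, start = [], 0
    for i, byte in enumerate(message):
        if byte in delimiters:
            if i > start:
                blocks.append((start, message[start:i]))
            start = i + 1
    if start < len(message):
        blocks.append((start, message[start:]))
    return blocks


def build_feature_table(messages, delimiters, top_k):
    """Keep the top_k most frequent data blocks (top_k is a hypothetical knob)."""
    counts = Counter(
        block for msg in messages for _, block in split_blocks(msg, delimiters)
    )
    return {block for block, _ in counts.most_common(top_k)}


def purify(message, delimiters, feature_table):
    """Zero every byte that is not part of a feature block, keeping offsets."""
    purified = bytearray(len(message))
    for offset, block in split_blocks(message, delimiters):
        if block in feature_table:
            purified[offset:offset + len(block)] = block
    return bytes(purified)


def composite_pixels(message, delimiters, feature_table, side):
    """Truncate or zero-pad to side*side bytes and emit one RGB tuple per byte."""
    size = side * side
    original = message[:size].ljust(size, b"\x00")
    purified = purify(original, delimiters, feature_table)
    # Assumed layout: R = original byte, G = purified byte, B = 0.
    return [(original[i], purified[i], 0) for i in range(size)]


messages = [
    b"Type: OPEN\r\nLength: 29\r\n",
    b"Type: NOTIFICATION\r\nLength: 21\r\n",
]
delimiters = b"\r\n :\x00"
table = build_feature_table(messages, delimiters, top_k=2)
pixels = composite_pixels(messages[0], delimiters, table, side=6)
```

The flat pixel list can be reshaped to `side` × `side` and written out with any imaging library before being fed to the classifier. Including NUL among the delimiters keeps the zero padding from being absorbed into the last data block.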
3.1. Data Truncation
3.2. Algorithm for Determining Delimiters Based on Information Entropy
3.3. Composite Feature RGB Image Generation Algorithm
3.3.1. Data Purification
3.3.2. Data Transformation and Composition
3.4. Model Training and Protocol Recognition
4. Experimental Results and Discussion
4.1. Datasets
4.1.1. CIC-IDS2017 Dataset
4.1.2. Shodan Dataset
4.2. Software and Hardware Environment
4.3. Evaluation Metrics
4.4. Experimental Results and Analysis
4.4.1. Determination of Optimal Delimiters
4.4.2. Generation of Feature Data Block Frequency Table
4.4.3. Training Performance
4.4.4. Test Results
4.4.5. Test Summary
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
BGP | SMB | HTTP |
---|---|---|
BGP: Message #1: Type: OPEN Length: 29 Version: 4 Hold Time: 90 ASN: 19009 BGP Identifier: 172.16.255.212 Message #2: Type: NOTIFICATION Length: 21 Error Code: Cease Error Subcode: Connection Rejected | rだ?2 ? ? EDEJEDCNFHEFECFDEFFCFGEFFCCACAAA FEEFFDFEECEFEEDBCACACACACACACABO SMB% E E V \MAILSLOT\BROWSE *€? 泟 U猚ic-webserver server (Samba, Ubuntu) | HTTP/1.1 200 OK Cache-Control: no-cache Pragma: no-cache Content-Length: 113 Content-Type: text/html; charset = utf-8 Expires: Thu, 06 Jul 2017 12:00:18 GMT Server: Microsoft-IIS/8.5 P3P: CP = “DSP CUR OTPi IND OTRi ONL FIN” X-Content-Type-Options: nosniff Strict-Transport-Security: max-age = 31536000 X-XSS-Protection: 1; mode = block Date: Thu, 06 Jul 2017 12:01:18 GMT Connection: close <Config> <DeviceID minversion = “16.000.26889.00”/> <MobileCfg minversion = “16.000.26208.0”/> </Config> |
Name | Model or Version |
---|---|
CPU | Intel Core i9-12900K (24 CPUs), ~3.2 GHz |
Memory | 128 GB DDR4 3600 MHz |
Hard Disk | 2 TB SSD + 16 TB HDD |
Graphics Card | NVIDIA GeForce RTX 3090 Ti |
Operating System | Windows 10 Professional 22H2 |
Python Interpreter | Conda 23.5.2 + Python 3.9 |
CUDA | 12.2 |
PyTorch | 1.12.1 + cu113 |
Protocol | Symbol | Information Entropy | Frequency |
---|---|---|---|
BGP | SPACE | 1.402 | 792 |
BGP | CR | 1.282 | 767 |
BGP | LF | 1.275 | 767 |
BGP | : | 1.257 | 767 |
BGP | . | 0.381 | 299 |
SMB | SPACE | 1.908 | 695 |
SMB | LF | 0.565 | 695 |
SMB | \ | 0.213 | 695 |
HTTP | SPACE | 2.397 | 271 |
HTTP | / | 1.938 | 523 |
HTTP | . | 1.893 | 253 |
HTTP | CR | 1.721 | 325 |
HTTP | LF | 1.602 | 325 |
Group | Delimiters |
---|---|
A | CR LF SPACE NUL , . : / \ & ? = ( ) [ ] " ; < > |
B | LF NUL ! # " % * + , - . / : @ _ ? = |
C | LF NUL , . : / |
Group | BGP Data Block (Character) | BGP Frequency | SMB Data Block (Character + Binary) | SMB Frequency | HTTP Data Block (Character) | HTTP Frequency | Time (ms) |
---|---|---|---|---|---|---|---|
A | 21 | 786 | MAILSLOT | 695 | HTTP | 679 | 857.768 |
A | #1 | 767 | server | 369 | GMT | 669 | |
A | NOTIFICATION | 766 | Samba | 369 | application | 644 | |
A | Code | 766 | Ubuntu | 369 | 2017 | 639 | |
A | Subcode | 766 | FFECFFEOFEF FDBDECNDDDC CACACACAAA | 177 | com | 633 | |
B | BGP | 772 | Ubuntu) | 355 | Connection | 515 | 823.088 |
B | NOTIFICATION\r | 766 | \x11\x1a\x1aV \x03\x01\x01\x02 | 113 | 54 | 469 | |
B | 21\r | 766 | 32 | 105 | Thu | 455 | |
B | Error Code | 766 | webserver server (Samba | 96 | 1\r | 448 | |
B | Error Subcode | 766 | \x11 | 89 | 5\r | 440 | |
C | BGP | 767 | SMB%, | 695 | Connection | 515 | 786.628 |
C | Message #1 | 767 | \\MAILSLOT\\BROWSE | 695 | 54 | 463 | |
C | NOTIFICATION\r | 766 | FEEFFDFEECE FEEDBCACACA CACACACABN | 408 | Thu | 455 | |
C | 21\r | 766 | Ubuntu) | 369 | 1\r | 445 | |
C | Error Code | 766 | \x06\x01\x03\x9b\x81 | 276 | 5\r | 440 | |
Ours | 21 | 786 | SMB% | 695 | HTTP | 679 | 831.932 |
Ours | #1 | 767 | \\MAILSLOT\\BROWSE | 695 | GMT | 646 | |
Ours | NOTIFICATION | 766 | FEEFFDFEECE FEEDBCACACA CACACACABN | 408 | application | 644 | |
Ours | Code | 766 | Server | 369 | 2017 | 639 | |
Ours | Subcode | 766 | (Samba | 369 | com | 613 | |
Model | Indicator | Origin | A | B | C | Ours |
---|---|---|---|---|---|---|
ResNet18 | Accuracy | 0.95775 | 0.988 | 0.9875 | 0.987 | 0.98325 |
ResNet18 | Precision | 0.961408 | 0.989129 | 0.988522 | 0.988129 | 0.98377 |
ResNet18 | Recall | 0.95775 | 0.988 | 0.9875 | 0.987 | 0.98325 |
ResNet18 | F1 | 0.954432 | 0.988061 | 0.987562 | 0.98707 | 0.983358 |
ResNet34 | Accuracy | 0.95775 | 0.98925 | 0.9875 | 0.98525 | 0.98875 |
ResNet34 | Precision | 0.960454 | 0.990462 | 0.988699 | 0.985907 | 0.989778 |
ResNet34 | Recall | 0.95775 | 0.98925 | 0.9875 | 0.98525 | 0.98875 |
ResNet34 | F1 | 0.954994 | 0.989333 | 0.987546 | 0.985228 | 0.988799 |
ResNet50 | Accuracy | 0.96225 | 0.98875 | 0.987 | 0.988 | 0.9875 |
ResNet50 | Precision | 0.968945 | 0.989618 | 0.987622 | 0.989147 | 0.98897 |
ResNet50 | Recall | 0.96225 | 0.98875 | 0.987 | 0.988 | 0.9875 |
ResNet50 | F1 | 0.959011 | 0.988839 | 0.986982 | 0.988029 | 0.987634 |
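In every row of the table above, Recall is identical to Accuracy. That is the behaviour of support-weighted averaging across classes, where weighted recall reduces to overall accuracy. The sketch below illustrates that relationship under the assumption that the reported precision, recall, and F1 are support-weighted averages. The confusion matrix is made up for illustration and is not taken from the experiments.

```python
def weighted_metrics(confusion):
    """Support-weighted metrics from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i that were
    predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for i in range(n_classes):
        tp = confusion[i][i]
        support = sum(confusion[i])  # true samples of class i
        predicted = sum(confusion[j][i] for j in range(n_classes))
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical three-class confusion matrix (e.g. BGP, SMB, HTTP).
confusion = [
    [95, 5, 0],
    [2, 97, 1],
    [0, 3, 97],
]
metrics = weighted_metrics(confusion)
```

Because each class's recall is weighted by its support, the supports cancel and the sum collapses to correct predictions divided by total samples, which is the accuracy.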
Serial | HTTP Data Block | HTTP Frequency | TLS Data Block | TLS Frequency |
---|---|---|---|---|
1 | HTTP | 679 | \x17\x03\x03 | 165 |
2 | GMT | 646 | \x06\t *\x86H\x86 | 83 |
3 | application | 644 | com | 75 |
4 | 2017 | 639 | \x16\x03\x03 | 65 |
5 | com | 613 | \x01\x01\x16\x03\x03 | 55 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fang, Z.; Gao, X.; Zhang, H.; Tang, J.; Gao, Q. Application Layer Protocol Identification Method Based on ResNet. Algorithms 2025, 18, 52. https://doi.org/10.3390/a18010052