A Review of Deep Learning-Based Binary Code Similarity Analysis
Abstract
1. Introduction
2. Basic Process of BCSA
2.1. Compile Preprocessing
2.2. Basic Process of BCSA
3. Classification of BCSA
3.1. Analysis Methods
3.2. Feature Type
3.3. The Evolution of BCSA Techniques
4. How Deep Learning Technology Is Applied to Existing Technologies
4.1. Text Semantic Features
4.2. Functional Structural Features
5. Summary and Prospects
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Synopsys. 2022 Open Source Security and Analysis Report. Available online: https://www.synopsys.com/software-integrity/resources/analyst-reports/open-source-security-risk-analysis.html (accessed on 16 June 2022).
- CVE-2021-44228. Available online: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-44228 (accessed on 10 January 2022).
- Haq, I.U.; Caballero, J. A Survey of Binary Code Similarity. ACM Comput. Surv. 2022, 54, 1–38.
- Kim, D.; Kim, E.; Cha, S.K.; Son, S.; Kim, Y. Revisiting BCSA Using Interpretable Feature Engineering and Lessons Learned. IEEE Trans. Softw. Eng. 2022, 49, 1661–1682.
- Yu, Y.; Gan, S.; Qin, X.; Qiu, J.; Chen, Z. Research on the Technologies of BCSA and Their Applications on the Embedded Device Firmware Vulnerability Search. J. Softw. 2022, 33, 4137–4172.
- Hex-Rays about IDA. Available online: https://www.hex-rays.com/products/ida/ (accessed on 10 January 2022).
- Brumley, D.; Jager, I.; Avgerinos, T.; Schwartz, E.J. BAP: A Binary Analysis Platform. In Computer Aided Verification, Proceedings of the 23rd International Conference, CAV 2011, Snowbird, UT, USA, 14–20 July 2011; Gopalakrishnan, G., Qadeer, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6806, pp. 463–469.
- Wang, F.; Shoshitaishvili, Y. Angr - The Next Generation of Binary Analysis. In Proceedings of the IEEE Cybersecurity Development, SecDev 2017, Cambridge, MA, USA, 24–26 September 2017; IEEE Computer Society: Washington, DC, USA, 2017; pp. 8–9.
- Nethercote, N.; Seward, J. Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, USA, 10–13 June 2007.
- Xu, X.; Liu, C.; Feng, Q.; Yin, H.; Song, L.; Song, D. Neural Network-Based Graph Embedding for Cross-Platform Binary Code Similarity Detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; ACM: Dallas, TX, USA, 2017; pp. 363–376.
- Ding, S.H.H.; Fung, B.C.M.; Charland, P. Kam1n0: MapReduce-Based Assembly Clone Search for Reverse Engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Krishnapuram, B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R., Eds.; ACM: New York, NY, USA, 2016; pp. 461–470.
- Lageman, N.; Kilmer, E.D.; Walls, R.J.; McDaniel, P.D. BinDNN: Resilient Function Matching Using Deep Learning. In Proceedings of the Security and Privacy in Communication Networks - 12th International Conference, SecureComm 2016, Guangzhou, China, 10–12 October 2016; Deng, R.H., Weng, J., Ren, K., Yegneswaran, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; Volume 198, pp. 517–537.
- Hu, Y.; Zhang, Y.; Li, J.; Gu, D. Cross-Architecture Binary Semantics Understanding via Similar Code Comparison. In Proceedings of the IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016, Suita, Japan, 14–18 March 2016; IEEE Computer Society: Washington, DC, USA, 2016; Volume 1, pp. 57–67.
- Chandramohan, M.; Xue, Y.; Xu, Z.; Liu, Y.; Cho, C.Y.; Tan, H.B.K. BinGo: Cross-Architecture Cross-OS Binary Search. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Seattle, WA, USA, 13–18 November 2016; ACM: Seattle, WA, USA, 2016; pp. 678–689.
- David, Y.; Partush, N.; Yahav, E. Statistical Similarity of Binaries. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, 13–17 June 2016; Krintz, C., Berger, E.D., Eds.; ACM: New York, NY, USA, 2016; pp. 266–280.
- Feng, Q.; Zhou, R.; Xu, C.; Cheng, Y.; Testa, B.; Yin, H. Scalable Graph-Based Bug Search for Firmware Images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; ACM: Vienna, Austria, 2016; pp. 480–491.
- Eschweiler, S.; Yakdan, K.; Gerhards-Padilla, E. discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code. In Proceedings of the 2016 Network and Distributed System Security Symposium, San Diego, CA, USA, 21–24 February 2016.
- Wang, S.; Wu, D. In-Memory Fuzzing for Binary Code Similarity Analysis. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, 30 October–3 November 2017; Rosu, G., Penta, M.D., Nguyen, T.N., Eds.; IEEE Computer Society: Washington, DC, USA, 2017; pp. 319–330.
- Ming, J.; Xu, D.; Jiang, Y.; Wu, D. BinSim: Trace-Based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking. In Proceedings of the 26th USENIX Security Symposium, USENIX Security 2017, Vancouver, BC, Canada, 16–18 August 2017; Kirda, E., Ristenpart, T., Eds.; USENIX Association: Washington, DC, USA, 2017; pp. 253–270.
- David, Y.; Partush, N.; Yahav, E. Similarity of Binaries through Re-Optimization. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, 18–23 June 2017; Cohen, A., Vechev, M.T., Eds.; ACM: New York, NY, USA, 2017; pp. 79–94.
- Feng, Q.; Wang, M.; Zhang, M.; Zhou, R.; Henderson, A.; Yin, H. Extracting Conditional Formulas for Cross-Platform Bug Search. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2017, Abu Dhabi, United Arab Emirates, 2–6 April 2017; Karri, R., Sinanoglu, O., Sadeghi, A.-R., Yi, X., Eds.; ACM: New York, NY, USA, 2017; pp. 346–359.
- Gao, J.; Yang, X.; Fu, Y.; Jiang, Y.; Sun, J. VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-Platform Binary. In Proceedings of the 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), Montpellier, France, 3–7 September 2018; pp. 896–899.
- Liu, B.; Huo, W.; Zhang, C.; Li, W.; Li, F.; Piao, A.; Zou, W. αDiff: Cross-Version Binary Code Similarity Detection with DNN. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, 3–7 September 2018; Huchard, M., Kästner, C., Fraser, G., Eds.; ACM: New York, NY, USA, 2018; pp. 667–678.
- David, Y.; Partush, N.; Yahav, E. FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018, Williamsburg, VA, USA, 24–28 March 2018; Shen, X., Tuck, J., Bianchini, R., Sarkar, V., Eds.; ACM: New York, NY, USA, 2018; pp. 392–404.
- Shalev, N.; Partush, N. Binary Similarity Detection Using Machine Learning. In Proceedings of the 13th Workshop on Programming Languages and Analysis for Security, PLAS@CCS 2018, Toronto, ON, Canada, 15–19 October 2018; Alvim, M.S., Delaune, S., Eds.; ACM: New York, NY, USA, 2018; pp. 42–47.
- Hu, Y.; Zhang, Y.; Li, J.; Wang, H.; Li, B.; Gu, D. BinMatch: A Semantics-Based Hybrid Approach on Binary Code Clone Analysis. In Proceedings of the 2018 IEEE International Conference on Software Maintenance and Evolution, ICSME 2018, Madrid, Spain, 23–29 September 2018; IEEE Computer Society: Washington, DC, USA, 2018; pp. 104–114.
- Marastoni, N.; Giacobazzi, R.; Preda, M.D. A Deep Learning Approach to Program Similarity. In Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis, MASES@ASE 2018, Montpellier, France, 3 September 2018; Perrouin, G., Acher, M., Cordy, M., Devroey, X., Eds.; ACM: New York, NY, USA, 2018; pp. 26–35.
- Yuan, B.; Wang, J.; Fang, Z.; Qi, L. A New Software Birthmark Based on Weight Sequences of Dynamic Control Flow Graph for Plagiarism Detection. Comput. J. 2018, 61, 1202–1215.
- Xue, Y.; Xu, Z.; Chandramohan, M.; Liu, Y. Accurate and Scalable Cross-Architecture Cross-OS Binary Code Search with Emulation. IEEE Trans. Softw. Eng. 2019, 45, 1125–1149.
- Zuo, F.; Li, X.; Young, P.; Luo, L.; Zeng, Q.; Zhang, Z. Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs. In Proceedings of the 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, CA, USA, 24–27 February 2019; The Internet Society: Reston, VA, USA, 2019.
- Massarelli, L.; Luna, G.A.D.; Petroni, F.; Baldoni, R.; Querzoni, L. SAFE: Self-Attentive Function Embeddings for Binary Similarity. In Detection of Intrusions and Malware, and Vulnerability Assessment, Proceedings of the 16th International Conference, DIMVA 2019, Gothenburg, Sweden, 19–20 June 2019; Perdisci, R., Maurice, C., Giacinto, G., Almgren, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11543, pp. 309–329.
- Redmond, K.; Luo, L.; Zeng, Q. A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis. In Proceedings of the Workshop on Binary Analysis Research (BAR) 2019, San Diego, CA, USA, 24 February 2019.
- Ding, S.H.H.; Fung, B.C.M.; Charland, P. Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 472–489.
- Luo, M.; Yang, C.; Gong, X.; Yu, L. FuncNet: A Euclidean Embedding Approach for Lightweight Cross-Platform Binary Recognition. In Proceedings of the Security and Privacy in Communication Networks, Orlando, FL, USA, 23–25 October 2019; Chen, S., Choo, K.-K.R., Fu, X., Lou, W., Mohaisen, A., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 319–337.
- Massarelli, L.; Luna, G.; Petroni, F.; Querzoni, L. Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis. In Proceedings of the Workshop on Binary Analysis Research (BAR) 2019, San Diego, CA, USA, 24 February 2019.
- Li, Y.; Gu, C.; Dullien, T.; Vinyals, O.; Kohli, P. Graph Matching Networks for Learning the Similarity of Graph Structured Objects. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 3835–3845.
- Duan, Y.; Li, X.; Wang, J.; Yin, H. DeepBinDiff: Learning Program-Wide Code Representations for Binary Diffing. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 1 January 2020.
- Sun, P.; Garcia, L.; Salles-Loustau, G.; Zonouz, S. Hybrid Firmware Analysis for Known Mobile and IoT Security Vulnerabilities. In Proceedings of the 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2020), Valencia, Spain, 29 June–2 July 2020.
- Yu, Z.; Cao, R.; Tang, Q.; Nie, S.; Huang, J.; Wu, S. Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, 7–12 February 2020; AAAI Press: Menlo Park, CA, USA, 2020; pp. 1145–1152.
- Guo, H.; Huang, S.; Huang, C.; Zhang, M.; Pan, Z.; Shi, F.; Huang, H.; Hu, D.; Wang, X. A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features. IEEE Access 2020, 8, 120501–120512.
- Pei, K.; Xuan, Z.; Yang, J.; Jana, S.; Ray, B. Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity. arXiv 2020, arXiv:2012.08680.
- Peng, D.; Zheng, S.; Li, Y.; Ke, G.; He, D.; Liu, T.-Y. How Could Neural Networks Understand Programs? In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event, 18–24 July 2021; Volume 139, pp. 8476–8486.
- Yang, J.; Fu, C.; Liu, X.-Y.; Yin, H.; Zhou, P. Codee: A Tensor Embedding Scheme for Binary Code Search. IEEE Trans. Softw. Eng. 2022, 48, 2224–2244.
- Tian, D.; Jia, X.; Ma, R.; Liu, S.; Liu, W.; Hu, C. BinDeep: A Deep Learning Approach to Binary Code Similarity Detection. Expert Syst. Appl. 2021, 168, 114348.
- Yang, S.; Cheng, L.; Zeng, Y.; Lang, Z.; Zhu, H.; Shi, Z. Asteria: Deep Learning-Based AST-Encoding for Cross-Platform Binary Code Similarity Detection. In Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2021, Taipei, Taiwan, 21–24 June 2021; pp. 224–236.
- Wang, H.; Qu, W.; Katz, G.; Zhu, W.; Gao, Z.; Qiu, H.; Zhuge, J.; Zhang, C. jTrans: Jump-Aware Transformer for Binary Code Similarity Detection. In Proceedings of the ISSTA’22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, Republic of Korea, 18–22 July 2022; Ryu, S., Smaragdakis, Y., Eds.; ACM: New York, NY, USA, 2022; pp. 1–13.
- Kim, G.; Hong, S.; Franz, M.; Song, D. Improving Cross-Platform Binary Analysis Using Representation Learning via Graph Alignment. In Proceedings of the ISSTA’22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, Republic of Korea, 18–22 July 2022; Ryu, S., Smaragdakis, Y., Eds.; ACM: New York, NY, USA, 2022; pp. 151–163.
- Dai, H.; Dai, B.; Song, L. Discriminative Embeddings of Latent Variable Models for Structured Data. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, NY, USA, 19–24 June 2016; Balcan, M.-F., Weinberger, K.Q., Eds.; JMLR: New York, NY, USA, 2016; Volume 48, pp. 2702–2711.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, AZ, USA, 2–4 May 2013; Bengio, Y., LeCun, Y., Eds.; Workshop Track Proceedings: New York, NY, USA, 2013.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
- Le, Q.V.; Mikolov, T. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014; JMLR: New York, NY, USA, 2014; Volume 32, pp. 1188–1196.
- Marcelli, A.; Graziano, M.; Ugarte-Pedrero, X.; Fratantonio, Y.; Mansouri, M.; Balzarotti, D. How Machine Learning Is Solving the Binary Function Similarity Problem. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; USENIX Association: Boston, MA, USA, 2022.
- Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1263–1272.
- Pewny, J.; Garmany, B.; Gawlik, R.; Rossow, C.; Holz, T. Cross-Architecture Bug Search in Binary Executables. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, 17–21 May 2015; IEEE Computer Society: Washington, DC, USA, 2015; pp. 709–724.
- Tai, K.S.; Socher, R.; Manning, C.D. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 1556–1566.
- Baker, B.S. Compressing Differences of Executable Code. In Proceedings of the ACM SIGPLAN 1999 Workshop on Compiler Support for System Software (WCSSS’99), Atlanta, GA, USA, 1 May 1999.
Year | Name | Method | Feature Type | Open Source | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Static | Dynamic | Hybrid | Raw Bytes | Trace | Architecture | Strand | IR | Call | Word | Embedding | Code | Dataset | ||
2016 | Kam1n0 [11] | √ | √ | √ | ||||||||||
BinDNN [12] | √ | √ | ||||||||||||
MOCKINGBIRD [13] | √ | √ | √ | √ | ||||||||||
BinGo [14] | √ | √ | √ | |||||||||||
Esh [15] | √ | √ | √ | √ | ||||||||||
Genius [16] | √ | √ | √ | √ | ||||||||||
discovRE [17] | √ | √ | √ | |||||||||||
2017 | IMF-sim [18] | √ | √ | |||||||||||
BinSim [19] | √ | √ | √ | |||||||||||
GitZ [20] | √ | √ | √ | |||||||||||
Gemini [10] | √ | √ | √ | √ | √ | |||||||||
xmatch [21] | √ | √ | ||||||||||||
2018 | VulSeeker [22] | √ | √ | √ | √ | √ | √ | |||||||
αdiff [23] | √ | √ | √ | √ | √ | √ | ||||||||
FirmUP [24] | √ | √ | √ | |||||||||||
Zeek [25] | √ | √ | √ | |||||||||||
BinMatch [26] | √ | √ | √ | |||||||||||
MASES2018 [27] | √ | √ | √ | |||||||||||
WSB [28] | √ | √ | √ | |||||||||||
Bingo-E [29] | √ | √ | √ | √ | √ | |||||||||
2019 | InnerEye [30] | √ | √ | √ | √ | √ | √ | |||||||
Safe [31] | √ | √ | √ | √ | √ | |||||||||
InstrModel [32] | √ | √ | √ | √ | √ | |||||||||
Asm2Vec [33] | √ | √ | √ | √ | √ | √ | √ | |||||||
FuncNet [34] | √ | √ | √ | √ | √ | |||||||||
GENN [35] | √ | √ | √ | √ | √ | √ | ||||||||
GMN [36] | √ | √ | √ | |||||||||||
2020 | DeepBinDiff [37] | √ | √ | √ | √ | √ | √ | √ | √ | |||||
Patchecko [38] | √ | √ | √ | √ | ||||||||||
Ordermatters [39] | √ | √ | √ | √ | √ | |||||||||
ACCESS2020 [40] | √ | √ | ||||||||||||
Trex [41] | √ | √ | √ | √ | √ | √ | ||||||||
2021 | Oscar [42] | √ | √ | √ | √ | √ | ||||||||
TIKNIB [4] | √ | √ | √ | √ | √ | √ | ||||||||
Codee [43] | √ | √ | √ | √ | √ | √ | √ | |||||||
BinDeep [44] | √ | √ | √ | |||||||||||
Asteria [45] | √ | √ | √ | √ | ||||||||||
2022 | Jtrans [46] | √ | √ | √ | √ | √ | √ | |||||||
XBA [47] | √ | √ | √ | √ | √ |
Name | Semantic | Supporting Info | Graph | Dataset | Corpus | Comparison | Machine Learning Technology | Score |
---|---|---|---|---|---|---|---|---|
BinDNN [12] | Wordlist |  |  | 13 kF |  | Classifier | LSTM, Fully Connected |  |
αdiff [23] | CNN | Call Info |  | 2.49 mFP |  | Vector | CNN | R: 0.955 |
InnerEye [30] | Word2Vec | LSTM Block Embedding | CFG | 830 kB |  | Manhattan | Word2Vec, LSTM, Siamese | A: 0.944 |
Safe [31] | Word2Vec | Bi-RNN |  | 517 kF | 190 mL | Cosine | Word2Vec, Siamese, Bi-RNN | A: 0.992 |
InstrModel [32] | Word2Vec | Instr Alignment |  | 202 kBP |  | Classifier | Word2Vec | A: 0.900 |
Asm2Vec [33] | PV-DM | Random Walk | CFG | 140 kF |  | Vector | PV-DM | R: 0.809 |
GENN [35] | Word2Vec | Struc2Vec | CFG | 96 kF |  | Cosine | Word2Vec, Struc2Vec | A: 0.964 |
DeepBinDiff [37] | Word2Vec | Random Walk | ICFG | 113 BF |  | K-Hop | Word2Vec | r: 0.904 |
Ordermatters [39] | BERT | CNN, MPNN | CFG | 63 kF |  | Cosine | BERT, CNN, MPNN | R: 0.742 |
Oscar [42] | BERT | Jump |  | 110 kF | 500 kF | Cosine | Moco, BERT | R: 0.884 |
Codee [43] | Word2Vec | Random Walk | ICFG | 15 kF |  | LSH | Word2Vec | r: 0.851 |
BinDeep [44] | Word2Vec | Classifier |  | 4.7 mFP |  | Vector | Word2Vec, RNN, CNN, LSTM, Siamese | r: 0.990 |
Jtrans [46] | BERT | Jump |  | 26 mF |  | Cosine | BERT | R: 0.962 |
Name | Struc Info | Graph | Semantic | Dataset | Cross Architecture | Comparison | Machine Learning Technology | Score |
---|---|---|---|---|---|---|---|---|
Gemini [10] | Struc2Vec | ACFG |  | 129 kF | Y | Cosine | Struc2Vec, Siamese | A: 0.971 |
VulSeeker [22] | Struc2Vec | LSFG |  | 730 kF | Y | Cosine | Struc2Vec, Fully Connected | A: 0.885 |
Asm2Vec [33] | Random Walk | CFG | PV-DM | 140 kF | N | Vector | PV-DM | R: 0.809 |
FuncNet [34] | Struc2Vec | ACFG |  | 355 kF | Y | SOM | Struc2Vec, Fully Connected | A: 0.980 |
GENN [35] | Struc2Vec | CFG | Word2Vec | 96 kF | Y | Cosine | Word2Vec, Struc2Vec | A: 0.964 |
GMN [36] | GMN | CFG |  | 64 kF | N | Hamming | GMN | A: 0.993 |
DeepBinDiff [37] | Random Walk | ICFG | Word2Vec | 113 BF | Y | K-Hop | Word2Vec | r: 0.904 |
Ordermatters [39] | CNN, MPNN | CFG | BERT | 63 kF | Y | Cosine Distance | BERT, CNN, MPNN | R: 0.742 |
Codee [43] | Random Walk | ICFG | Word2Vec | 15 kF | Y | LSH | Word2Vec | r: 0.851 |
Asteria [45] | Tree-LSTM | AST |  | 7.56 mF | Y | Classifier | Tree-LSTM, Siamese | A: 0.969 |
XBA [47] | GCN | BDG |  | - | Y | Vector Distance | Siamese, GCN |  |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Du, J.; Wei, Q.; Wang, Y.; Sun, X. A Review of Deep Learning-Based Binary Code Similarity Analysis. Electronics 2023, 12, 4671. https://doi.org/10.3390/electronics12224671