APIASO: A Novel API Call Obfuscation Technique Based on Address Space Obscurity
Abstract
1. Introduction
- We provide a comprehensive overview of existing API call obfuscation and deobfuscation techniques. We also discuss the limitations of current API obfuscation techniques and their inability to effectively counter advanced deobfuscation techniques.
- We present API address space obscurity (APIASO), an API call obfuscation technique specifically designed for Windows applications. APIASO provides stronger protection than existing API call obfuscation techniques by protecting the entire API call process, covering both API address resolution and the call itself, and by preventing analysts from obtaining address information associated with known APIs during that process.
- We compare APIASO with several existing API call obfuscation techniques. Our experiments show that APIASO is highly effective at thwarting existing deobfuscation techniques and significantly strengthens the protection of a program's API information.
- We implement an automatic API call obfuscation system based on LLVM [2] that automatically obfuscates the source code of an input program. The source code is available on GitHub (https://github.com/Rookiellvm/APIASO, accessed on 13 August 2022).
2. Background
2.1. API Call Obfuscation Techniques
2.2. API Deobfuscation Techniques
2.3. Motivation
3. API Address Space Obscurity Model
- (1) API call space obfuscation: This process moves API internal functions into user space for execution. Choosing which functions to move requires an in-depth analysis of the API's internal functions, so a function selection strategy is constructed that considers the call relationships, the properties of each function, the cost of moving it, and the analyst's experience.
- (2) API name clue obfuscation: This process builds hash function generators and combines more secure API address resolution methods with function movement schemes to obscure clues to API names.
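To illustrate how hashing can remove plaintext API name clues, the following minimal sketch resolves a function from a simulated export table by comparing 32-bit FNV-1a hashes instead of name strings. The choice of FNV-1a, the names `fnv1a32` and `resolve_by_hash`, and the dictionary standing in for a DLL export table are assumptions for illustration; this is a common API-hashing pattern, not APIASO's own resolution method.

```python
from __future__ import annotations


def fnv1a32(name: str) -> int:
    """32-bit FNV-1a hash of an ASCII API name."""
    h = 0x811C9DC5  # FNV-1a 32-bit offset basis
    for byte in name.encode("ascii"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


def resolve_by_hash(exports: dict[str, int], target: int) -> int | None:
    """Return the address of the first export whose name hashes to `target`."""
    for name, address in exports.items():
        if fnv1a32(name) == target:
            return address
    return None


# Hypothetical export table (name -> address); the addresses are made up.
EXPORTS = {"CreateFileA": 0x1000, "CreateFileW": 0x2000, "NtCreateFile": 0x3000}

# In a protected binary this constant would be precomputed at build time,
# so the string "CreateFileW" would not appear in the program.
CREATE_FILE_W = fnv1a32("CreateFileW")
print(hex(resolve_by_hash(EXPORTS, CREATE_FILE_W)))  # 0x2000
```

Such a hash only removes the literal string: an analyst who hashes every known export name can map the constant back to "CreateFileW", so hashing by itself is a weak way to hide name clues.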
3.1. Overview of APIASO
3.2. API Call Space Obfuscation
- (1) Pruning high-level function nodes: As can be seen from Definition 3, the API address space contains both low-level and high-level functions. Low-level functions are strongly associated with the upper-layer API. Among them, functions at Levels 0 and 1 are directly associated with the API itself and must therefore be moved in full, whereas low-level functions at Levels greater than 1 can be moved selectively according to the required protection strength. High-level functions are called by many different upper-layer APIs, so calls to them do not provide directly usable information for reconstructing a specific API. Because moving them would incur large memory and runtime overhead, high-level function nodes are pruned from the call graph.
- (2) Adding special function nodes: Besides the low-level functions in the API address space, some functions have special call relationships. Although they are not low-level functions, they still provide key information for accessing the API. For example, a call to CreateFileA is eventually translated into a call to CreateFileW, so API calls with xxxA/xxxW dependencies also require address space obfuscation. In addition, some APIs call functions exported from Ntdll.dll whose names begin with Nt and which use a system call number to enter the kernel; the names of these APIs also provide information that can be used for deobfuscation. Because these functions may not be low-level functions in that address space, the special function nodes removed by pruning must be restored.
- (3) Adding bogus function nodes: Low-level functions are strongly associated with their API. Moving the low-level functions in the protected API's address space hides this association, while introducing low-level functions from other APIs' address spaces creates associations with those APIs and thus misleads analysts. Therefore, low-level functions from other address spaces are selected and moved to user space together with the protected functions as bogus function nodes.
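The three selection steps above can be sketched as follows. This is a hypothetical illustration, not APIASO's implementation: because Definition 3 is not reproduced here, the sketch assumes that a function's level is its shortest call depth from the API entry function (Level 0), treats a function as high-level when more than `shared_limit` API entry functions reach it, and uses simple name rules (an `Nt` prefix and the xxxA/xxxW convention) to find special nodes. The helper function names in the call graph and the parameters `threshold`, `shared_limit`, `n_bogus`, and `seed` are invented for illustration.

```python
from __future__ import annotations

import random
from collections import deque


def call_levels(graph: dict[str, list[str]], entry: str) -> dict[str, int]:
    """Shortest call depth of every function reachable from `entry` (entry = Level 0)."""
    levels = {entry: 0}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, []):
            if callee not in levels:
                levels[callee] = levels[fn] + 1
                queue.append(callee)
    return levels


def select_functions(graph, apis, target, threshold, shared_limit=1, n_bogus=1, seed=0):
    """Return (moved, bogus): functions moved for `target` and decoy functions."""
    # Count how many API entry functions reach each function; widely shared
    # functions are treated as high-level (a hypothetical criterion).
    reach_count: dict[str, int] = {}
    for api in apis:
        for fn in call_levels(graph, api):
            reach_count[fn] = reach_count.get(fn, 0) + 1

    levels = call_levels(graph, target)

    # (1) Prune high-level nodes; keep low-level nodes at Levels 0-1 and
    #     deeper low-level nodes down to `threshold`.
    moved = {fn for fn, lv in levels.items()
             if reach_count.get(fn, 0) <= shared_limit and lv <= max(1, threshold)}
    moved.add(target)

    # (2) Restore pruned special nodes: Nt* functions and the xxxW counterpart of xxxA.
    moved |= {fn for fn in levels if fn.startswith("Nt")}
    wide = target[:-1] + "W"
    if target.endswith("A") and wide in levels:
        moved.add(wide)

    # (3) Bogus nodes: direct callees of other APIs that the target never reaches.
    candidates = sorted({fn for api in apis if api != target
                         for fn, lv in call_levels(graph, api).items()
                         if lv == 1 and reach_count.get(fn, 0) <= shared_limit
                         and fn not in levels})
    rng = random.Random(seed)
    bogus = set(rng.sample(candidates, min(n_bogus, len(candidates))))
    return moved, bogus


# Hypothetical call graph (caller -> callees); only the API names are real.
graph = {
    "CreateFileA": ["conv_ansi_to_wide", "CreateFileW"],
    "conv_ansi_to_wide": ["ansi_codepage_lookup"],
    "CreateFileW": ["prepare_attrs", "NtCreateFile"],
    "prepare_attrs": ["heap_alloc"],
    "ReadFile": ["read_impl", "NtReadFile"],
    "read_impl": ["heap_alloc"],
}
apis = ["CreateFileA", "CreateFileW", "ReadFile"]
moved, bogus = select_functions(graph, apis, "CreateFileA", threshold=2)
print(sorted(moved))
# ['CreateFileA', 'CreateFileW', 'NtCreateFile', 'ansi_codepage_lookup', 'conv_ansi_to_wide']
print(bogus)  # one of NtReadFile or read_impl, chosen by the seeded RNG
```

In this toy graph, `heap_alloc` and `prepare_attrs` are pruned because several APIs share them, while `CreateFileW` and `NtCreateFile`, although shared, are restored as special nodes. Lowering `threshold` to 1 drops the Level 2 function `ansi_codepage_lookup`.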
3.3. API Name Clue Obfuscation
- An integer greater than a given lower bound is selected at random, and two hash functions are then selected at random from the candidate set of hash functions.
- For each element of the key set, the values of both hash functions are computed.
- An undirected graph is created whose vertices are defined by these hash values. For each element, the two vertices given by its two hash values are linked by an edge, so that each edge corresponds to exactly one element of the key set.
- The graph is checked for acyclicity; if it contains a cycle, the procedure returns to Step 1.
- Values are selected at random in the hash generation space and assigned at random to the edges of the graph, giving each element its hash value.
- A randomly selected vertex is assigned the value 0, and a depth-first search then traverses the graph vertices. Each vertex reached through an edge is assigned a value such that the sum of the values of the two vertices sharing that edge equals the hash value of the edge.
- The vertices of the graph and their assigned values form a mapping function; together with the two hash functions, this mapping constitutes a collision-free hash function.
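The steps above match the Czech, Havas, and Majewski (CHM) construction of a minimal perfect hash function [20]. The following sketch implements it under several assumptions that are not taken from the paper: keyed BLAKE2b (via Python's `hashlib.blake2b`) stands in for the randomly chosen hash functions; the vertex count is fixed at int(2.09·m) + 1 for m keys rather than drawn at random (CHM needs more than 2m vertices); the edge values are a random permutation of 0..m-1, so the result is both minimal and collision-free; and, as in CHM, vertex sums are taken modulo m. Because an acyclic graph is generally a forest, the depth-first search starts a new root with value 0 in every connected component. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import random


def seeded_hash(seed: int, n: int):
    """A hash function chosen by `seed`: keyed BLAKE2b of the name, reduced modulo n."""
    salt = seed.to_bytes(8, "little")

    def h(name: str) -> int:
        digest = hashlib.blake2b(name.encode(), digest_size=8, key=salt).digest()
        return int.from_bytes(digest, "little") % n

    return h


def assign_vertex_values(adj, n: int, m: int, values: list[int]):
    """Depth-first value assignment; returns None when the graph has a cycle."""
    g: list[int | None] = [None] * n
    for root in range(n):  # one root per connected component
        if g[root] is not None:
            continue
        g[root] = 0
        stack = [(root, -1)]  # (vertex, index of the edge used to reach it)
        while stack:
            u, via = stack.pop()
            for v, idx in adj[u]:
                if idx == via:
                    continue
                if g[v] is not None:
                    return None  # a second path to v: the graph has a cycle
                g[v] = (values[idx] - g[u]) % m  # so that g[u] + g[v] = values[idx] (mod m)
                stack.append((v, idx))
    return g


def build_chm(keys: list[str], c: float = 2.09, seed: int = 0, max_tries: int = 1000):
    """Return a collision-free hash mapping the m keys onto 0..m-1."""
    if not keys:
        raise ValueError("need at least one key")
    m = len(keys)
    n = int(c * m) + 1  # number of graph vertices, greater than c * m
    rng = random.Random(seed)
    for _ in range(max_tries):
        # Pick two random hash functions (Step 1); they are applied to every key (Step 2).
        f1 = seeded_hash(rng.getrandbits(64), n)
        f2 = seeded_hash(rng.getrandbits(64), n)
        # Step 3: key number idx becomes the undirected edge {f1(key), f2(key)}.
        adj: list[list[tuple[int, int]]] = [[] for _ in range(n)]
        for idx, key in enumerate(keys):
            u, v = f1(key), f2(key)
            adj[u].append((v, idx))
            adj[v].append((u, idx))
        # Step 5: distinct edge values, a random permutation of 0..m-1.
        values = list(range(m))
        rng.shuffle(values)
        # Steps 4 and 6: cycle check and DFS assignment; retry on failure.
        g = assign_vertex_values(adj, n, m, values)
        if g is not None:
            return lambda key: (g[f1(key)] + g[f2(key)]) % m
    raise RuntimeError("no acyclic graph found; increase c or max_tries")


api_names = ["CreateFileA", "CreateFileW", "ReadFile", "WriteFile", "NtCreateFile"]
h = build_chm(api_names, seed=1)
print(sorted(h(name) for name in api_names))  # [0, 1, 2, 3, 4]
```

Detecting the cycle during the depth-first search folds Steps 4 and 6 into one pass: in a forest every edge is traversed exactly once, so reaching an already-valued vertex through a different edge can only mean a cycle.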
Algorithm 1: Address Space Obscurity Algorithm
Input: Program P, obfuscation threshold; Output: the obfuscated program
4. System Implementation
5. Experimental Evaluation
- (1) Model protection strength evaluation: We compare APIASO with other API call obfuscation techniques in terms of resistance to API deobfuscation techniques, and we use online antivirus and sandbox platforms to compare the resistance of programs to dynamic analysis before and after APIASO protection.
- (2) Model protection efficiency evaluation: We apply APIASO to large-scale code to evaluate its availability and accuracy, and we measure the program execution time overhead before and after obfuscation.
5.1. Model Protection Strength Evaluation
5.1.1. The Obscurity Degree of API Address Space
5.1.2. Anti API Deobfuscation Techniques
5.1.3. Sandbox and Antivirus Platform Detection
5.2. Model Protection Effect Evaluation
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- [1] Choi, J.; Kim, K.; Lee, D.; Cha, S.K. NTFuzz: Enabling type-aware kernel fuzzing on windows with static binary analysis. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; pp. 677–693.
- [2] Lattner, C.; Adve, V. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization, CGO 2004, San Jose, CA, USA, 20–24 March 2004; pp. 75–86.
- [3] Kawakoya, Y.; Shioji, E.; Otsuki, Y.; Iwamura, M.; Yada, T. Stealth loader: Trace-free program loading for API obfuscation. In Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, Atlanta, GA, USA, 18–20 September 2017; pp. 217–237.
- [4] Cheng, B.; Ming, J.; Leal, E.A.; Zhang, H.; Fu, J.; Peng, G. Obfuscation-Resilient Executable Payload Extraction From Packed Malware. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual, 11–13 August 2021; pp. 3451–3468.
- [5] Suenaga, M. A Museum of Api Obfuscation on Win32; Symantec Security Response; Symantec Corp: Tempe, AZ, USA, 2009.
- [6] Roundy, K.A.; Miller, B.P. Binary-code obfuscations in prevalent packer tools. ACM Comput. Surv. (CSUR) 2013, 46, 1–32.
- [7] Cheng, B.; Ming, J.; Fu, J.; Peng, G.; Chen, T.; Zhang, X.; Marion, J. Towards paving the way for large-scale windows malware analysis: Generic binary unpacking with orders-of-magnitude performance boost. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 395–411.
- [8] Ugarte-Pedrero, X.; Balzarotti, D.; Santos, I.; Bringas, B.G. SoK: Deep packer inspection: A longitudinal study of the complexity of run-time packers. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 17–21 May 2015; pp. 659–673.
- [9] Aguila. Scylla: x64/x86 Imports Reconstruction. 2016. Available online: https://github.com/NtQuery/Scylla (accessed on 28 May 2022).
- [10] Sharif, M.; Yegneswaran, V.; Saidi, H.; Porras, P.; Lee, W. Eureka: A framework for enabling static malware analysis. In Proceedings of the European Symposium on Research in Computer Security, Málaga, Spain, 6–8 October 2008; pp. 481–500.
- [11] Wei, T.E.; Chen, Z.W.; Tien, C.W.; Wu, J.S.; Lee, H.M.; Jeng, A.B. RePEF: A system for restoring packed executable file for malware analysis. In Proceedings of the 2011 International Conference on Machine Learning and Cybernetics, Guilin, China, 10–13 July 2011; Volume 2, pp. 519–527.
- [12] D’Alessio, S.; Mariani, S. PinDemonium: A DBI-based generic unpacker for Windows executables. In Proceedings of the Black Hat USA 2016, Las Vegas, NV, USA, 30 July–4 August 2016.
- [13] Polino, M.; Continella, A.; Mariani, S.; D’Alessio, S.; Fontana, L.; Gritti, F.; Zanero, S. Measuring and defeating anti-instrumentation-equipped malware. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, Germany, 6–7 July 2017; pp. 73–96.
- [14] Kotov, V.; Wojnowicz, M. Towards generic deobfuscation of windows API calls. arXiv 2018, arXiv:1802.04466.
- [15] Kawakoya, Y.; Iwamura, M.; Shioji, E.; Hariu, T. Api chaser: Anti-analysis resistant malware analyzer. In Proceedings of the International Workshop on Recent Advances in Intrusion Detection, Rodney Bay, St. Lucia, 23–25 October 2013; pp. 123–143.
- [16] Raber, J.; Krumheuer, B. QuietRIATT: Rebuilding the import address table using hooked DLL calls. In Proceedings of the Black Hat Technical Security Conference, Washington, DC, USA, 29–30 July 2009.
- [17] Josse, S. Secure and advanced unpacking using computer emulation. J. Comput. Virol. 2007, 3, 221–236.
- [18] Kawakoya, Y.; Iwamura, M.; Miyoshi, J. Taint-assisted IAT Reconstruction against Position Obfuscation. J. Inf. Process. 2018, 26, 813–824.
- [19] Korczynski, D. Repeconstruct: Reconstructing binaries with self-modifying code and import address table destruction. In Proceedings of the 2016 11th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, PR, USA, 18–21 October 2016; pp. 1–8.
- [20] Czech, Z.J.; Havas, G.; Majewski, B.S. An optimal algorithm for generating minimal perfect hash functions. Inf. Process. Lett. 1992, 43, 257–264.
- [21] Havas, G.; Majewski, B.S. Optimal Algorithms for Minimal Perfect Hashing; Key Centre for Software Technology, Department of Computer Science, University of Queensland: Brisbane, Australia, 1992.
- [22] Bayer, U.; Comparetti, P.M.; Hlauschek, C.; Krügel, C. Scalable, behavior-based malware clustering. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2009, San Diego, CA, USA, 8–11 February 2009; Volume 9, pp. 8–11.
- [23] Shirani, P.; Wang, L.; Debbabi, M. Binshape: Scalable and robust binary library function identification using function shape. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, Germany, 6–7 July 2017; pp. 301–324.
- [24] Hex-Rays Corporation. Fast Library Identification and Recognition Technology. Available online: https://hex-rays.com/products/ida/tech/flirt/ (accessed on 30 July 2021).
- [25] Choi, S. API Deobfuscator: Identifying Runtime-obfuscated API calls via Memory Access Analysis. In Proceedings of the Black Hat Asia, Singapore, 24–27 March 2015.
Classification | Pathways |
---|---|
IAT redirection [4] | Anti-debugging, exception triggering, ROP |
Position obfuscation [5] | Instruction stolen, function stolen, DLL stolen |
Call site tampering [6] | GetModuleHandle/LoadLibrary and GetProcAddress |
Classification | Citations |
---|---|
Call site monitoring | BinUnpack [7], SoK [8], Scylla [9], Eureka [10], RePEF [11], PinDemonium [12], Arancino [13], Arg Prediction [14] |
Position monitoring | API Chaser [15], QuietRIATT [16], Secure unpack [17], Taint-assisted [18] |
Hybrid monitoring | API-Xray [4], RePEconstruct [19] |
Obfuscation Type | Resolution Process | Calling Process | Anti-Call Site Monitoring | Anti-Position Monitoring | Anti-Hybrid Monitoring |
---|---|---|---|---|---|
IAT redirection | × | √ | × | × | × |
Position obfuscation | × | √ | √ | × | × |
Call site tampering | × | √ | × | × | × |
APIASO | √ | √ | √ | √ | √ |
Notation | Description |
---|---|
DLL address space | |
API entry functions | |
API internal functions | |
Function call graph | |
API address space | |
Adjacency matrix | |
Reachable matrix | |
Level of functions in the DLL address space | |
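The notation table lists both an adjacency matrix and a reachable matrix for the function call graph. Assuming the convention used in interpretive structural modeling, where the reachable matrix is the Boolean transitive closure of the adjacency matrix with the identity added (each function reaches itself), it can be computed with Warshall's algorithm, as in the sketch below. The inclusion of the identity, the function name `reachable_matrix`, and the example graph are assumptions, not taken from the paper.

```python
from __future__ import annotations


def reachable_matrix(adj: list[list[int]]) -> list[list[int]]:
    """Boolean reachable matrix: r[i][j] == 1 if function j is reachable from function i."""
    n = len(adj)
    # Start from A + I: every function is assumed to reach itself.
    r = [[1 if i == j or adj[i][j] else 0 for j in range(n)] for i in range(n)]
    # Warshall's algorithm: after round k, paths may pass through functions 0..k.
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                for j in range(n):
                    if r[k][j]:
                        r[i][j] = 1
    return r


# Hypothetical call graph: f0 calls f1, f1 calls f2, and f3 calls f2.
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
for row in reachable_matrix(A):
    print(row)
# [1, 1, 1, 0]
# [0, 1, 1, 0]
# [0, 0, 1, 0]
# [0, 0, 1, 1]
```

Warshall's triple loop costs O(n³) for n functions, which is acceptable for the call graph of a single DLL address space.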
No. | Tools | Types |
---|---|---|
1 | Yoda’s Crypter | 1 |
2 | Yoda’s Protector | 1 |
3 | TELock | 1 |
4 | ZProtect | 1 |
5 | Enigma | 1 |
6 | Armadillo | 1 |
7 | Obsidium | 1 |
8 | PESpin | 1, 2 |
9 | PELock | 1, 2 |
10 | PEP | 1, 3 |
Type (Sandboxie) | Program I | Program II | Program III | Program IV | Program V | Program VI |
---|---|---|---|---|---|---|
IAT redirection | √ | √ | √ | √ | √ | √ |
Position obfuscation | × | × | × | × | × | × |
Call site tampering | √ | √ | √ | √ | √ | √ |
APIASO | × | × | × | × | × | × |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Y.; Kang, F.; Shu, H.; Xiong, X.; Zhao, Y.; Sun, R. APIASO: A Novel API Call Obfuscation Technique Based on Address Space Obscurity. Appl. Sci. 2023, 13, 9056. https://doi.org/10.3390/app13169056
Chicago/Turabian Style: Li, Yang, Fei Kang, Hui Shu, Xiaobing Xiong, Yuntian Zhao, and Rongbo Sun. 2023. "APIASO: A Novel API Call Obfuscation Technique Based on Address Space Obscurity" Applied Sciences 13, no. 16: 9056. https://doi.org/10.3390/app13169056