MultiTagging: A Vulnerable Smart Contract Labeling and Evaluation Framework
Abstract
1. Introduction
- The article proposes a parser mechanism for efficiently extracting vulnerability tags from SC analysis tool reports. By automating the parsing process, it addresses the time-consuming and error-prone issue of manual parsing, ensuring consistent, standardized, and accurate tag extraction across diverse tools.
- To address the issue of inconsistent vulnerability labeling, this article introduces a mapper approach that automates the assignment of analysis tool tags to standard labels, namely SWC codes and DASP ranks. It also develops a new SC vulnerability taxonomy that maps SWC codes to the DASP Top 10 ranks. Furthermore, the article establishes a public registry that maps tool-specific vulnerability labels to standard labels, addressing labeling heterogeneity among analysis tools. This registry facilitates consistent comparisons across tools and fosters a cohesive research environment where findings are comparable, reproducible, and verifiable.
- To improve vote-based labeling accuracy, this article proposes Power-based voting. This novel method systematically establishes the role of each analysis tool and sets optimal voting thresholds for each vulnerability type. Additionally, it develops a decision strategy for selecting the optimal voting technique, based on two key factors: the degree of overlap and voters’ performance. These factors are designed to enhance labeling accuracy and reduce the likelihood of false positives. This approach increases the reliability of vulnerability detection across multiple analysis tools.
- To advance the field of SC vulnerability detection, this article presents an evaluation study and draws notable conclusions on the performance of six analysis tools—MAIAN, Mythril, Semgrep, Slither, Solhint, and VeriSmart. Additionally, it investigates the effectiveness of the Power-based voting method compared to two traditional voting methods—AtLeastOne and Majority voting. By evaluating recent versions of these SC analysis tools, fully disclosing the experimental setup, and providing a replication package, this study establishes a benchmark for tool performance, offering a valuable resource for researchers and practitioners. This methodology enables meaningful comparisons and iterative improvements to SC vulnerability detection methods.
2. Literature Review
2.1. Definition of SC Vulnerabilities
2.2. Identification of SC Vulnerabilities
2.3. Research Gaps
- Lack of a standardized and comprehensive taxonomy of SC vulnerabilities. Table 3 shows that studies label SC vulnerabilities using three approaches: (1) employing established taxonomies, such as DASP ranks or SWC codes; (2) using their own taxonomy; and (3) not applying any taxonomy, though labels are often similar to SWC codes. The absence of uniform definitions leads to inconsistencies, with multiple labels referring to the same vulnerabilities [39]. Table 2 demonstrates efforts to establish a uniform and simple taxonomy by mapping SWC codes to DASP categories; however, certain conflicts remain to be addressed. Furthermore, the majority of analysis tool developers use different tag names for their vulnerability detectors. Tool evaluators may therefore assign the same detectors to distinct vulnerability classes. Because of these variations, comparing evaluation study results and drawing general conclusions is challenging. This article addresses this gap by resolving conflicts in existing mappings and proposing a comprehensive taxonomy that aligns SWC codes with corresponding DASP ranks. Additionally, it develops a public registry that maps detectors from various SC analysis tools to the relevant tags using the proposed taxonomy. Designed to be generic and updatable, the registry enables the inclusion of additional analysis tools, thereby broadening its applicability across diverse tools and contexts.
- Lack of automated vulnerability parsing and mapping approaches. Analysis tools report results in various formats that require parsing and interpretation to yield understandable and comparable conclusions. Despite notable efforts to address this gap, further contributions are needed. For instance, the ScrawlD parser [34] is limited to five analysis tools, and the mapper supports only eight SWC codes. Additionally, this project has not been updated since July 2022. The SmartBugs parser [15,35,36] cannot directly extract tags from reports generated outside its framework. Furthermore, it does not provide an automated method for mapping tool tags to a common SC vulnerability taxonomy. The SolidiFI [33] and USCV [18] frameworks also lack comprehensive mappers that consider all detectors of the analysis tools. Additionally, their repositories appear to be deprecated, with the most recent updates in May 2022 and July 2021, respectively. This study contributes to bridging this gap by developing MultiTagging, an open-source framework that introduces a parser mechanism and mapping strategy for automating the generation of common vulnerability tags from contract analysis reports. Its modular architecture enables adaptation to various SC analysis tools, enhancing the efficiency and consistency of vulnerability identification.
- Lack of key replication information. As demonstrated in Table 3, many studies failed to provide essential details about the experiment—such as the execution environment, tool versions, and parameter values—making replication challenging. Additionally, variations in experimental settings can significantly influence tool performance, potentially leading to misleading conclusions [17]. This study addresses these issues by investigating the effectiveness of vote-based methods versus individual SC analysis tools in detecting SC vulnerabilities. By evaluating recent versions of multiple SC analysis tools, fully disclosing the experimental setup, and providing a replication package, this study establishes a current benchmark for tool performance and serves as a valuable resource for both researchers and practitioners. Both the comprehensive disclosure and the replicable setup enhance reproducibility, facilitate future comparisons and improvements, foster consistent assessment processes, and promote the ongoing development of effective vulnerability detection techniques.
3. SC Vulnerability Taxonomy
- SWC100 and SWC108 are mapped to DASP classes 2 and 10 in [22,29], respectively. Specifically, DASP class 2 [40]—termed Access Control class—includes all vulnerabilities that grant an attacker access to a contract’s private values. The insecure visibility setting is an example of such vulnerabilities. The titles of SWC100 [41] and SWC108 [42] are “Function Default Visibility” and “State Variable Default Visibility”, respectively. These vulnerabilities arise from failing to explicitly declare the visibility type (access modifier) of functions or variables. In Solidity, the default access modifier is “public”. Consequently, SWC100 and SWC108 should be classified under DASP class 2, not class 10.
- SWC106 is mapped to DASP classes 2 and 5 in [22,29], respectively. The title of SWC106 [43] is “Unprotected SELFDESTRUCT Instruction”. This vulnerability arises due to insufficient access control rules that allow attackers to trigger the self-destruct function of the contract. Since access control breaches can lead to a DoS attack, correctly assigning this SWC code to a DASP class is challenging. Considering the core cause of the vulnerability—improper access control rules—DASP class 2 is more appropriate for SWC106 than class 5.
- SWC121 and SWC122 codes are unmapped in studies [22,29]. The titles of SWC121 [44] and SWC122 [45] are “Missing Protection against Signature Replay Attacks” and “Lack of Proper Signature Verification”, respectively. SWC121 arises from the absence of a reliable mechanism for verifying cryptographic signatures, allowing a malicious user to gain unauthorized access by launching a signature replay attack with a hash of another processed message. SWC122 results from a lack of an effective method for verifying data authenticity. In blockchain systems, messages are authenticated using digital signatures; however, since SCs cannot sign messages, alternative signature verification procedures are necessary. Implementing an improper verification method can lead to the acceptance of invalid authentication data, compromising the system’s integrity. A malicious user could exploit this vulnerability to gain unauthorized access. Consequently, it is more appropriate to classify SWC121 and SWC122 under DASP class 2.
- SWC132 is mapped to DASP class 10 in [29]. The title of SWC132 [46] is “Unexpected Ether balance”. This vulnerability arises from strict equality checks on a contract’s Ether balance. A malicious user can manipulate the balance of the target contract by forcibly sending Ether using the “selfdestruct” function. This action can cause the check to fail and potentially lock the contract, resulting in a DoS attack. Given these characteristics, DASP class 5, which deals with vulnerabilities leading to DoS attacks, is more appropriate for SWC132.
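The corrected assignments argued for above can be summarized as a simple lookup table. The sketch below is illustrative only: the dictionary layout and function name are assumptions for exposition, not the format of the framework's actual registry.

```python
# Illustrative lookup capturing the corrected SWC-to-DASP assignments
# discussed above; not the actual registry format used by MultiTagging.
CORRECTED_SWC_TO_DASP = {
    "SWC-100": 2,  # Function Default Visibility -> Access Control
    "SWC-108": 2,  # State Variable Default Visibility -> Access Control
    "SWC-106": 2,  # Unprotected SELFDESTRUCT Instruction -> Access Control
    "SWC-121": 2,  # Missing Protection against Signature Replay -> Access Control
    "SWC-122": 2,  # Lack of Proper Signature Verification -> Access Control
    "SWC-132": 5,  # Unexpected Ether balance -> Denial of Service
}

def dasp_rank(swc_code: str):
    """Return the DASP rank for an SWC code, or None if unmapped."""
    return CORRECTED_SWC_TO_DASP.get(swc_code)
```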
4. MultiTagging Framework
4.1. Analysis Tool Reports Tagger
- Parser: This component takes two inputs—the analysis reports generated by each tool and the analysis time taken to produce each report. Each tool uses a specific keyword to indicate the vulnerability location in its report (e.g., Slither uses “check”). The parser scans each report for this indicator keyword and extracts the relevant vulnerability tags. It then passes the extracted tags and corresponding analysis time to the Mapper for further processing.
- Mapper: To address the lack of uniformity in naming SC vulnerabilities, we developed a public Vulnerability Map Registry, which is available online [47], for six selected tools. This registry, however, can be expanded to support additional tools. The Mapper uses this registry to map the tool-specific tags to standardized SWC/DASP labels. It outputs a dataframe consisting of the following columns: vulnerability tags extracted by the tool, SWC codes and titles, DASP ranks and titles, and the analysis time.
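The Parser/Mapper flow described above can be sketched as follows. The report format, indicator keyword handling, registry entries, and function names here are all hypothetical simplifications; MultiTagging's actual implementation may differ.

```python
# Hypothetical sketch of the Parser/Mapper pipeline described above.
# Toy registry: (tool, tool-specific tag) -> (SWC code, DASP rank).
REGISTRY = {
    ("slither", "suicidal"): ("SWC-106", 2),
    ("slither", "reentrancy-eth"): ("SWC-107", 1),
}

def parse(report_lines, keyword):
    """Scan a report for lines containing the tool's indicator keyword
    and extract the tag that follows it."""
    tags = []
    for line in report_lines:
        if keyword in line:
            tags.append(line.split(keyword, 1)[1].strip(" :"))
    return tags

def map_tags(tool, tags, analysis_time):
    """Map tool-specific tags to standard SWC/DASP labels via the registry,
    producing one row per extracted tag."""
    rows = []
    for tag in tags:
        swc, dasp = REGISTRY.get((tool, tag), (None, None))
        rows.append({"tag": tag, "swc": swc, "dasp": dasp, "time": analysis_time})
    return rows
```

For example, `map_tags("slither", parse(["check: suicidal"], "check"), 1.2)` would yield a single row mapping the tool tag "suicidal" to SWC-106 and DASP rank 2.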
4.2. Analysis Tool Evaluator
- Preparer: This component reads and filters both actual (base data) and predicted labeled data from analysis tools based on user requirements. It applies three filters: (1) Tools, which includes only the tools specified by the user; (2) Base, which considers scores derived from base data identified by the user, allowing for cases where multiple datasets are tested; and (3) Fairness, which addresses instances where some tools fail to process certain samples, allowing the user to include only those samples analyzed by all tools. The Preparer then identifies the vulnerabilities that each tool can detect and, if Fairness is not applied, removes any samples that the tool failed to analyze. This process ensures an accurate evaluation of each tool.
- Counter: This component compares each tool’s predicted labels to the actual labels, using the Preparer’s output to generate a confusion matrix for each tool. The confusion matrix provides essential metrics—true positives, true negatives, false positives, and false negatives—which are passed to the next component to obtain a detailed assessment of each tool’s performance.
- Performance Measure: This component uses the tool’s confusion matrix to compute a range of performance metrics, such as precision and recall. These metrics provide valuable insights into the tool’s performance for each label (SC vulnerability).
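The Counter and Performance Measure steps amount to standard confusion-matrix bookkeeping per label, which can be sketched as below (function names are illustrative; the actual components operate on the Preparer's dataframes).

```python
def confusion_matrix(actual, predicted):
    """Confusion counts for one label from binary actual/predicted vectors
    (1 = vulnerability present, 0 = absent)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

def precision(cm):
    # Proportion of flagged samples that are truly positive.
    denom = cm["tp"] + cm["fp"]
    return cm["tp"] / denom if denom else 0.0

def recall(cm):
    # Proportion of truly positive samples that were flagged.
    denom = cm["tp"] + cm["fn"]
    return cm["tp"] / denom if denom else 0.0
```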
4.3. Labels Elector
- Preparer: This component reads and filters the labeled data produced by each tool according to user requirements, applying the same filters as the Evaluator module—Tools, Base, and Fairness. It then passes a dataframe containing aggregated votes (i.e., labels produced by each tool) for each vulnerability to the next component.
- Voter: This component applies voting methods to the aggregated vote data produced by the Preparer. It outputs a dataset labeled based on tool votes. The Labels Elector module offers two voting mechanisms, threshold-based and power-based, which are described in the following subsections.
4.3.1. Threshold-Based Voting
- AtLeastOne. The contract’s vulnerability exists if at least one tool predicts it.
- Majority. The contract’s vulnerability exists if at least half of the tools can identify it.
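Treating each tool's prediction for a given vulnerability as a binary vote, the two threshold-based rules can be sketched as:

```python
def at_least_one(votes):
    """AtLeastOne: flag the vulnerability if any tool predicts it."""
    return int(any(votes))

def majority(votes):
    """Majority: flag the vulnerability if at least half of the tools
    predict it."""
    return int(sum(votes) >= len(votes) / 2)
```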
4.3.2. Power-Based Voting
Algorithm 1: Power-based Voting
- Tool Roles Identification. This step examines and determines the role of each tool in the vulnerability detection process, which helps improve voting results. Each analysis tool may play one of three roles:
- None: The tool is excluded because it cannot identify the vulnerability. This occurs in two cases: (i) its recall and precision scores are both zero; or (ii) the tool’s results are a subset of a better-performing tool’s findings. The corresponding threshold was established at 50% to guarantee that low-performing tools are not eliminated unless better-performing tools are available.
- Inverter: This role contributes to adjusting the findings of another tool. An extremely low recall score, below the corresponding threshold, implies that the majority of the tool’s flags are false positives. If the similarity rate between such a tool and a higher-performing one is large, the agreement is probably due to false positives. In this case, the poorly performing tool can assist in correcting (inverting) the false positives of the other tool, thus increasing its precision. To ensure that the tool’s true positives are close to zero, the recall and precision thresholds are set at 10% and 20%, respectively. The similarity threshold is set at 60%.
- Voter: This tool is engaged in the voting process.
- Voting Methods Identification. This step determines the appropriate voting method for each vulnerability class. For each class, voters are categorized as high- and low-performance tools based on their recall scores, with a recall threshold of 95% to ensure that the majority of positive samples are identified. The appropriate voting method is then determined as follows: (a) Majority: Used when all voters are high-performance tools, intended to reduce false positives while maintaining a high true positive rate; (b) AtLeastOne: Applied when all voters are low-performance tools, aimed at enhancing the true positive rate; and (c) Weight-based: Used when there is a combination of high- and low-performance tools. Majority voting is applied to the high-performance tools, while AtLeastOne voting is applied to the others. The outputs of these two methods are then combined using OR logic to determine the final voting result.
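A minimal sketch of this method-selection rule, assuming binary votes and the 95% recall threshold stated above; tool names, scores, and function names are hypothetical, and the real algorithm also incorporates the tool-role step:

```python
# Recall threshold separating high- and low-performance voters (from the text).
HIGH_RECALL = 0.95

def choose_method(recalls):
    """Pick the voting method for one vulnerability class from the voters'
    recall scores, following the three decision rules described above."""
    high = [t for t, r in recalls.items() if r >= HIGH_RECALL]
    low = [t for t, r in recalls.items() if r < HIGH_RECALL]
    if not low:
        return "Majority"
    if not high:
        return "AtLeastOne"
    return "Weight-based"

def weight_based(high_votes, low_votes):
    """Majority over high-performance tools, OR'ed with AtLeastOne over
    low-performance tools."""
    maj = int(sum(high_votes) >= len(high_votes) / 2) if high_votes else 0
    alo = int(any(low_votes))
    return int(maj or alo)
```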
4.4. Evaluation Scores Plotter
- Preparer: This component reads and filters the evaluation data based on user requirements, applying the same filters as the Evaluator module—Tools, Base, and Fairness. The prepared data is then passed to the Plotter.
- Plotter: This component takes two main inputs: (1) plotting style and (2) performance scores. The plotting style specifies whether to display the scores of a single tool or a group of tools, and whether or not to present the tool’s performance across various datasets. By processing these inputs, the Plotter generates graphical charts representing the evaluation metrics of the tools.
5. Research Method and Design
5.1. Goal
5.2. Research Questions
- RQ 1: What is the best SC analysis tool for identifying each SC vulnerability in terms of precision and recall scores?
- RQ 2: To what extent are the investigated analysis tools comparable in terms of SC vulnerability detection?
- RQ 3: To what extent will voting methods offer an increase in multi-tagging coverage when used to identify various SC vulnerabilities?
5.3. Benchmark
5.4. Analysis Tools
- MAIAN [6] is an open-source, Python-based dynamic analysis tool, developed collaboratively by researchers from the National University of Singapore and University College London and launched in 2018. MAIAN takes SC bytecode as input and runs multiple symbolic execution traces on a custom-built EVM until it discovers one that meets a predefined set of properties. MAIAN uses the Z3 solver [58] to produce concrete values for symbolic inputs. If an SC is flagged as positive—meaning a trace is found—MAIAN performs a validation step to reduce the false positive rate: it deploys the SC on a private Ethereum blockchain network to validate the detected properties. MAIAN considers three kinds of vulnerable SCs that violate either safety or liveness properties: (1) Suicidal contracts; (2) Prodigal contracts; (3) Greedy contracts.
- Mythril [23] is an open-source, Python-based dynamic analysis tool developed by the ConsenSys team and launched in 2017. Mythril uses the Z3 solver [58], a symbolic virtual machine (SVM) called LASER [59], and a control-flow graph to detect a variety of SC vulnerabilities. It accepts bytecode as input and employs concolic execution for in-depth analysis.
- Semgrep [60] is an open-source, lightweight static analysis tool written in Python 3 and launched in 2020. It supports a variety of programming languages, including Solidity, which was added in December 2021. It was developed by Semgrep, a cybersecurity company founded in 2017. Semgrep scans SC codes to detect vulnerability patterns and style violations using predefined or user-defined custom rules. These rules are written in YAML. Each rule contains metadata, conditions, and actions that instruct the analyzer to perform specific actions when certain conditions are met.
- Slither [61] is an open-source static analysis framework written in Python 3, developed by the Trail of Bits team and launched in 2018. Slither accepts the Solidity Abstract Syntax Tree (AST) as input, which is generated by the Solidity compiler from SC code. It first extracts information from the AST, including the SC’s inheritance graph, control-flow graph, and list of expressions. Next, it converts the SC code into an internal representation language called SlithIR, which uses the Static Single Assignment (SSA) [62] form to facilitate code analysis computations. Slither can be utilized to identify SC vulnerabilities or to optimize and understand SC code. The latest version of Slither [63] includes more than 90 detectors.
- Solhint [18,19] is an open-source static analysis tool developed in JavaScript and launched in 2017. It uses predefined patterns and rules to detect code security vulnerabilities. It employs an ANTLR4-based Solidity parser. Solhint also provides recommendations on style and best coding practices. It is customizable, allowing users to modify existing rules or add new ones.
- VeriSmart [64] is an OCaml-based, open-source, dynamic analysis tool introduced in 2020 by the Software Analysis Lab at Korea University. Like MAIAN and Mythril, VeriSmart uses the Z3 solver [58] but performs domain-specific preprocessing and optimization before employing it. It automatically generates contract assertion statements, using a Counterexample-Guided Inductive Synthesis (CEGIS) verification method that iteratively searches for the hidden invariants necessary to validate safety properties. VeriSmart consists of two main components: a generator and a validator. The generator produces candidate invariants, which the validator then uses to prove or disprove assertion safety. The validator flags unproven assertions, prompting the generator to produce new invariants. This process repeats until the contract is verified as safe or the time budget is exhausted.
5.5. Evaluation Measures
- Average Analysis Time: This measures the average time the analysis tool takes to produce a report for a single contract. It can be computed using Formula (1).
- Failure Rate: This measures the proportion of benchmark samples that the analysis tool fails to analyze. The lower the failure rate, the more robust the tool. It can be computed using Formula (2).
- Coverage: This shows the proportion of unique vulnerabilities correctly reported by the analysis tool when applied to the benchmark. It is computed using Formula (3).
- Precision: This is the proportion of positive samples that are accurately classified as positive. It is computed using Formula (4).
- Recall: This evaluates the analysis tool’s capacity to identify positive samples and is computed using Formula (5).
- Overlap degree: This computes the agreement degree among analysis tools in terms of judgments. It was proposed by Di Angelo et al. [20]. For tool t, let V_t be the set of DASP classes that t can identify, and let P_{t,v} be the set of positive samples flagged by tool t as containing the vulnerability v. The overlap of tool t1 with tool t2 can be computed by Formula (6). The numerator is the total number of samples flagged by both tools, summed over all vulnerabilities common to the two tools, whereas the denominator represents the number of samples flagged by t1 over the same vulnerabilities. This metric is asymmetric, meaning that Overlap(t1, t2) is not necessarily equal to Overlap(t2, t1).
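Under this reading (summing over the vulnerability classes common to both tools, with the first tool's flags in the denominator), the overlap degree can be sketched as follows; the data layout is an assumption for illustration.

```python
def overlap(flags_a, flags_b):
    """Asymmetric overlap of tool A with tool B. Each argument maps a
    vulnerability class to the set of samples that tool flags for it.
    Returns the fraction of A's flags (over classes common to both tools)
    that B also flags."""
    common = set(flags_a) & set(flags_b)
    num = sum(len(flags_a[v] & flags_b[v]) for v in common)
    den = sum(len(flags_a[v]) for v in common)
    return num / den if den else 0.0
```

Note the asymmetry: swapping the arguments changes the denominator, so `overlap(a, b)` and `overlap(b, a)` generally differ.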
5.6. Execution Environment
6. Results and Discussion
6.1. Individual-Based Labeling
6.1.1. Analysis Tools Efficiency
6.1.2. Analysis Tools Performance
6.1.3. Similarity
6.2. Vote-Based Labeling
6.3. MultiTagging Effectiveness
7. Threats to Validity
8. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bocek, T.; Stiller, B. Smart contracts–blockchains in the wings. In Digital Marketplaces Unleashed; Springer: Berlin/Heidelberg, Germany, 2017; pp. 169–184.
- Choi, T.M.; Siqin, T. Blockchain in logistics and production from Blockchain 1.0 to Blockchain 5.0: An intra-inter-organizational framework. Transp. Res. Part E Logist. Transp. Rev. 2022, 160, 102653.
- Buterin, V. A next-generation smart contract and decentralized application platform. White Pap. 2014, 3, 1–36.
- Zheng, Z.; Xie, S.; Dai, H.N.; Chen, W.; Chen, X.; Weng, J.; Imran, M. An overview on smart contracts: Challenges, advances and platforms. Future Gener. Comput. Syst. 2020, 105, 475–491.
- Wang, S.; Yuan, Y.; Wang, X.; Li, J.; Qin, R.; Wang, F.Y. An overview of smart contract: Architecture, applications, and future trends. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 108–113.
- Nikolić, I.; Kolluri, A.; Sergey, I.; Saxena, P.; Hobor, A. Finding the greedy, prodigal, and suicidal contracts at scale. In Proceedings of the 34th Annual Computer Security Applications Conference, San Juan, PR, USA, 3–7 December 2018; pp. 653–663.
- Ibba, G.; Pierro, G.A.; Di Francesco, M. Evaluating machine-learning techniques for detecting smart ponzi schemes. In Proceedings of the 2021 IEEE/ACM 4th International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), Madrid, Spain, 31 May 2021; pp. 34–40.
- Bartoletti, M.; Carta, S.; Cimoli, T.; Saia, R. Dissecting Ponzi schemes on Ethereum: Identification, analysis, and impact. Future Gener. Comput. Syst. 2020, 102, 259–277.
- Slowmist. 2024. Available online: https://hacked.slowmist.io/?c=ETH (accessed on 18 November 2024).
- Ivanov, N.; Li, C.; Yan, Q.; Sun, Z.; Cao, Z.; Luo, X. Security threat mitigation for smart contracts: A comprehensive survey. ACM Comput. Surv. 2023, 55, 1–37.
- Jiang, F.; Chao, K.; Xiao, J.; Liu, Q.; Gu, K.; Wu, J.; Cao, Y. Enhancing smart-contract security through machine learning: A survey of approaches and techniques. Electronics 2023, 12, 2046.
- Smart Contract Weakness Classification (SWC). 2020. Available online: https://swcregistry.io/ (accessed on 18 November 2024).
- Decentralized Application Security Project (DASP) Top 10. 2018. Available online: https://dasp.co/ (accessed on 18 November 2024).
- Parizi, R.M.; Dehghantanha, A.; Choo, K.K.R.; Singh, A. Empirical vulnerability analysis of automated smart contracts security testing on blockchains. In Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering, Markham, ON, Canada, 29–31 October 2018; pp. 103–113.
- Durieux, T.; Ferreira, J.F.; Abreu, R.; Cruz, P. Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020; pp. 530–541.
- Zhang, P.; Xiao, F.; Luo, X. A framework and dataset for bugs in ethereum smart contracts. In Proceedings of the 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), Adelaide, Australia, 28 September–2 October 2020; pp. 139–150.
- Ren, M.; Yin, Z.; Ma, F.; Xu, Z.; Jiang, Y.; Sun, C.; Li, H.; Cai, Y. Empirical evaluation of smart contract testing: What is the best choice? In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual, Denmark, 11–17 July 2021; pp. 566–579.
- Ji, S.; Kim, D.; Im, H. Evaluating countermeasures for verifying the integrity of Ethereum smart contract applications. IEEE Access 2021, 9, 90029–90042.
- Kushwaha, S.S.; Joshi, S.; Singh, D.; Kaur, M.; Lee, H.N. Ethereum smart contract analysis tools: A systematic review. IEEE Access 2022, 10, 57037–57062.
- Di Angelo, M.; Durieux, T.; Ferreira, J.F.; Salzer, G. Evolution of automated weakness detection in Ethereum bytecode: A comprehensive study. Empir. Softw. Eng. 2024, 29, 41.
- SWC-Registry. Available online: https://github.com/SmartContractSecurity/SWC-registry (accessed on 18 November 2024).
- Rameder, H.; Di Angelo, M.; Salzer, G. Review of automated vulnerability analysis of smart contracts on Ethereum. Front. Blockchain 2022, 5, 814977.
- Mueller, B. Smashing ethereum smart contracts for fun and real profit. HITB SECCONF Amst. 2018, 9, 4–17.
- NCC Group. Available online: https://www.nccgroup.com/us/ (accessed on 18 November 2024).
- Common Weakness Enumeration (CWE). 2024. Available online: https://cwe.mitre.org/index.html (accessed on 18 November 2024).
- EEA EthTrust Security Levels Specification Version 2. 2023. Available online: https://entethalliance.org/specs/ethtrust-sl/v2/ (accessed on 18 November 2024).
- Wang, S.; Ouyang, L.; Yuan, Y.; Ni, X.; Han, X.; Wang, F.Y. Blockchain-enabled smart contracts: Architecture, applications, and future trends. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 2266–2277.
- Dia, B.; Ivaki, N.; Laranjeiro, N. An empirical evaluation of the effectiveness of smart contract verification tools. In Proceedings of the 2021 IEEE 26th Pacific Rim International Symposium on Dependable Computing (PRDC), Perth, Australia, 1–4 December 2021; pp. 17–26.
- Di Angelo, M.; Salzer, G. Consolidation of Ground Truth Sets for Weakness Detection in Smart Contracts. In Proceedings of the Financial Cryptography and Data Security. FC 2023 International Workshops, Brač, Croatia, 5 May 2023; Essex, A., Matsuo, S., Kulyk, O., Gudgeon, L., Klages-Mundt, A., Perez, D., Werner, S., Bracciali, A., Goodell, G., Eds.; Springer: Cham, Switzerland, 2024; pp. 439–455.
- Chen, J.; Xia, X.; Lo, D.; Grundy, J.; Luo, X.; Chen, T. Defining smart contract defects on ethereum. IEEE Trans. Softw. Eng. 2020, 48, 327–345.
- Di Angelo, M.; Salzer, G. A survey of tools for analyzing ethereum smart contracts. In Proceedings of the 2019 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPCON), Newark, CA, USA, 4–9 April 2019; pp. 69–78.
- Leid, A.; van der Merwe, B.; Visser, W. Testing ethereum smart contracts: A comparison of symbolic analysis and fuzz testing tools. In Proceedings of the Conference of the South African Institute of Computer Scientists and Information Technologists 2020, Cape Town, South Africa, 14–16 September 2020; pp. 35–43.
- Ghaleb, A.; Pattabiraman, K. How effective are smart contract analysis tools? Evaluating smart contract static analysis tools using bug injection. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, USA, 18–22 July 2020; pp. 415–427.
- Yashavant, C.S.; Kumar, S.; Karkare, A. ScrawlD: A dataset of real world ethereum smart contracts labelled with vulnerabilities. arXiv 2022, arXiv:2202.11409.
- Ferreira, J.F.; Cruz, P.; Durieux, T.; Abreu, R. Smartbugs: A framework to analyze solidity smart contracts. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Virtual Event, Australia, 21–25 December 2020; pp. 1349–1352.
- Di Angelo, M.; Durieux, T.; Ferreira, J.F.; Salzer, G. Smartbugs 2.0: An execution framework for weakness detection in ethereum smart contracts. In Proceedings of the 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Luxembourg, 11–15 September 2023; pp. 2102–2105.
- Smartbugs. Available online: https://github.com/smartbugs/smartbugs (accessed on 18 November 2024).
- USCV: A Unified Smart Contract Validator. Available online: https://github.com/93suhwan/uscv (accessed on 18 November 2024).
- Zhou, H.; Milani Fard, A.; Makanju, A. The state of ethereum smart contracts security: Vulnerabilities, countermeasures, and tool support. J. Cybersecur. Priv. 2022, 2, 358–378.
- DASP 2. Available online: https://dasp.co//#item-2 (accessed on 18 November 2024).
- SWC100. Available online: https://swcregistry.io/docs/SWC-100/ (accessed on 18 November 2024).
- SWC108. Available online: https://swcregistry.io/docs/SWC-108/ (accessed on 18 November 2024).
- SWC106. Available online: https://swcregistry.io/docs/SWC-106/ (accessed on 18 November 2024).
- SWC121. Available online: https://swcregistry.io/docs/SWC-121/ (accessed on 18 November 2024).
- SWC122. Available online: https://swcregistry.io/docs/SWC-122/ (accessed on 18 November 2024).
- SWC132. Available online: https://swcregistry.io/docs/SWC-132/ (accessed on 18 November 2024).
- Mapping Registry. 2024. Available online: https://github.com/MultiTagging/MultiTagging/blob/main/Mapping/VulnerablityMap.xlsx (accessed on 18 November 2024).
- MultiTagging Framework. 2024. Available online: https://github.com/MultiTagging/MultiTagging (accessed on 18 November 2024).
- Doublade. 2019. Available online: https://doublade.readthedocs.io/en/latest/index.html (accessed on 18 November 2024).
- Schneidewind, C.; Grishchenko, I.; Scherer, M.; Maffei, M. eThor: Practical and provably sound static analysis of ethereum smart contracts. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, 9–13 November 2020; pp. 621–640.
- NotSoSmartC. 2023. Available online: https://github.com/crytic/not-so-smart-contracts/ (accessed on 18 November 2024).
- Tsankov, P.; Dan, A.; Drachsler-Cohen, D.; Gervais, A.; Buenzli, F.; Vechev, M. Securify: Practical security analysis of smart contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 67–82.
- Torres, C.F.; Iannillo, A.K.; Gervais, A.; State, R. Confuzzius: A data dependency-aware hybrid fuzzer for smart contracts. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P), Vienna, Austria, 6–10 September 2021; pp. 103–119.
- Liu, Z.; Qian, P.; Yang, J.; Liu, L.; Xu, X.; He, Q.; Zhang, X. Rethinking smart contract fuzzing: Fuzzing with invocation ordering and important branch revisiting. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1237–1251.
- Tikhomirov, S.; Voskresenskaya, E.; Ivanitskiy, I.; Takhaviev, R.; Marchenko, E.; Alexandrov, Y. Smartcheck: Static analysis of ethereum smart contracts. In Proceedings of the 1st International Workshop on Emerging Trends in Software Engineering for Blockchain, Gothenburg, Sweden, 27 May 2018; pp. 9–16.
- Mossberg, M.; Manzano, F.; Hennenfent, E.; Groce, A.; Grieco, G.; Feist, J.; Brunson, T.; Dinaburg, A. Manticore: A user-friendly symbolic execution framework for binaries and smart contracts. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), San Diego, CA, USA, 11–15 November 2019; pp. 1186–1189.
- Torres, C.F.; Schütte, J.; State, R. Osiris: Hunting for integer bugs in ethereum smart contracts. In Proceedings of the 34th Annual Computer Security Applications Conference, San Juan, PR, USA, 3–7 December 2018; pp. 664–676.
- De Moura, L.; Bjørner, N. Z3: An efficient SMT solver. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Budapest, Hungary, 29 March–6 April 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 337–340.
- LASER-Ethereum. Available online: https://github.com/muellerberndt/laser-ethereum (accessed on 18 November 2024).
- Semgrep. Available online: https://semgrep.dev/ (accessed on 18 November 2024).
- Feist, J.; Grieco, G.; Groce, A. Slither: A static analysis framework for smart contracts. In Proceedings of the 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), Montreal, QC, Canada, 27 May 2019; pp. 8–15.
- Rosen, B.K.; Wegman, M.N.; Zadeck, F.K. Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, CA, USA, 10–13 January 1988; pp. 12–27.
- Slither, the Smart Contract Static Analyzer. Available online: https://github.com/crytic/slither (accessed on 18 November 2024).
- So, S.; Lee, M.; Park, J.; Lee, H.; Oh, H. Verismart: A highly precise safety verifier for ethereum smart contracts. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1678–1694. [Google Scholar]
- VeriSmart. Available online: https://github.com/kupl/VeriSmart-public (accessed on 18 November 2024).
DASP Rank | Class | Description |
---|---|---|
1 | Reentrancy | This occurs when external contract calls initiate new calls to the calling contract before the first execution is completed. |
2 | Access Control | This occurs when an attacker gains illegal access rights. |
3 | Arithmetic Issues | This occurs when an arithmetic operation’s result is not checked against the bounds of its data type (overflow/underflow), allowing an attacker to tamper with state variables. |
4 | Unchecked Low-Level Calls | This occurs because there is no mechanism to propagate exceptions in low-level external calls, causing the code to continue running despite the failure. |
5 | Denial of Service (DoS) | DoS can occur in various ways, e.g., by intentionally raising the gas required to execute a function or by abusing access control rules. |
6 | Bad Randomness | This occurs due to the use of predictable randomness. |
7 | Front Running | Blockchain transactions are executed in a certain sequence, often determined by transaction fees. Because pending transactions are publicly visible, malicious users can exploit this by paying higher gas fees so that their transactions are processed first. |
8 | Time Manipulation | This occurs when a random number is generated using an initial seed that miners can control. A malicious miner can exploit such variables to their benefit. |
9 | Short Address Attack | The Ethereum Virtual Machine (EVM) pads transaction arguments shorter than 32 bytes with trailing zeros. If the address argument, rather than the data, is the one that is short, this padding can cause an incorrect address to be accepted. |
10 | Unknown Unknowns | Unknown vulnerabilities. |
DASP Rank | SWC Code Mapping: Dia et al. [28] | SWC Code Mapping: Rameder et al. [22] | SWC Code Mapping: Di Angelo and Salzer [29]
---|---|---|---
1 | 107 | 107 | 107 |
2 | - | 100, 108, 112, 115 | 105, 106, 112, 115, 117, 118, 124 |
3 | 101 | 101 | 101 |
4 | 104 | 104 | 104 |
5 | 106 | 106, 113, 126, 128 | 113, 126, 128, 134 |
6 | 120 | 120 | 120 |
7 | 114 | 114 | 114 |
8 | 116 | 116 | 116 |
9 | - | - | - |
10 | - | - | 100, 102, 103, 108–111, 119, 123, 125, 127, 129, 130–133, 135, 136 |
Unmapped Codes | 100, 102, 103, 105, 108–113, 115, 117–119, 121–136 | 102, 103, 105, 109–111, 117–119, 121–125, 127, 129–136 | 121, 122 |
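The mapping columns above can be encoded as simple lookup tables. A minimal Python sketch of the Rameder et al. [22] column follows; treating codes absent from the map as DASP rank 10 ("Unknown Unknowns") is an illustrative assumption, not part of [22]:

```python
# SWC -> DASP rank lookup, following the Rameder et al. [22] column above.
SWC_TO_DASP = {
    107: 1,                           # Reentrancy
    100: 2, 108: 2, 112: 2, 115: 2,   # Access Control
    101: 3,                           # Arithmetic Issues
    104: 4,                           # Unchecked Low-Level Calls
    106: 5, 113: 5, 126: 5, 128: 5,   # Denial of Service
    120: 6,                           # Bad Randomness
    114: 7,                           # Front Running
    116: 8,                           # Time Manipulation
}

def dasp_rank(swc_code: int) -> int:
    """Return the DASP rank for an SWC code.

    Unmapped codes default to rank 10 (Unknown Unknowns) -- an
    assumption made here for illustration only.
    """
    return SWC_TO_DASP.get(swc_code, 10)
```

A registry of this shape underlies the mapper component: each tool-specific label is first normalized to an SWC code, then resolved to a DASP rank through a single lookup.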
Ref. | Year | No. of Tools | Benchmark Size | Labels Taxonomy and No. | Evaluation Metrics | Automation: Parsing | Automation: Mapping | Execution Environment Declared | Tools’ Info: Version | Tools’ Info: Settings
---|---|---|---|---|---|---|---|---|---|---
Parizi et al. [14] | 2018 | 4 | 10 SCs | None: 11 | ROC, accuracy | ✗ | ✗ | ✓ | ✗ | ✗ |
Durieux et al. [15] | 2020 | 9 | 47,587 SCs | DASP: 10 | Accuracy, ET | ✓ | ✗ | ✓ | ✗ | ✓ * |
Leid et al. [32] | 2020 | 3 | 20 tokens | Not clear | Coverage, ET | ✗ | ✗ | ✓ | ✓ | ✓ |
Ghaleb and Pattabiraman [33] | 2020 | 6 | 50 SCs | None: 8 | FN, FP | ✓ | ✓ | ✗ | ✓ | ✓ * |
Zhang et al. [16] | 2020 | 9 | 176 SCs | None: 49 | Coverage, precision, recall | ✗ | ✗ | ✗ | ✗ | ✗ |
Dias et al. [28] | 2021 | 3 | 222 SCs | Own: 141 | TP, FP, TN, FN, recall, F1-score, markedness, informedness | ✓ | ✗ | ✗ | ✓ | ✓ |
Ren et al. [17] | 2021 | 9 | 46,186 SCs | None: 8 ** | Coverage, precision, recall | ✗ | ✗ | ✓ | ✗ | ✓ |
Ji et al. [18] | 2021 | 8 | 273 SCs | DASP: 6 | TP, FP, TN, FN, precision, recall, accuracy, F1-score, AUC | ✓ | ✓ | ✗ | ✗ | ✗ |
Kushwaha et al. [19] | 2022 | 13 | 30 SCs | None: 13 | ET | ✗ | ✗ | ✗ | ✗ | ✗ |
Di Angelo et al. [20] | 2024 | 12 | 248,328 SCs | SWC: 15 | Error rate, overlap | ✓ | ✓ | ✓ | ✓ | ✗ |
Benchmark | Year | Type | No. of Entries | No. of Positive Entries | No. of SWC Classes | No. of DASP Classes | Final No. of Samples |
---|---|---|---|---|---|---|---|
Doublade [49] | 2019 | Wild | 319 | 152 | 5 | 4 | 225 |
eThor [50] | 2020 | Wild | 720 | 196 | 1 | 1 | 223 |
JiuZhou [16] | 2020 | Crafted | 168 | 68 | 33 | 10 | 164 |
SBcurated [35] | 2020 | Crafted | 143 | 145 | 16 | 10 | 101 |
SolidFI [33] | 2020 | Crafted | 350 | 350 | 7 | 6 | 343 |
SWCregistry [12] | 2020 | Crafted | 117 | 76 | 33 | 6 | 91 |
NotSoSmartC. [51] | 2023 | Crafted | 31 | 24 | 12 | 6 | 26 |
Study Dataset (Total) | – | – | 1848 | 1011 | 33 | 10 | 1173
Analysis Tool | Implemented Version | Available On |
---|---|---|
MAIAN | #4bab09a | https://github.com/smartbugs/MAIAN, Accessed on: 17 December 2023 |
Mythril | v0.24.8 | https://github.com/ConsenSys/mythril-classic, Accessed on: 24 July 2024 |
Semgrep | #c3a9f40 | https://github.com/Decurity/semgrep-smart-contracts, Accessed on: 28 January 2024 |
Slither | v0.10.0 | https://github.com/crytic/slither, Accessed on: 17 December 2023 |
Solhint | v4.1.1 | https://github.com/protofire/solhint, Accessed on: 12 January 2024 |
VeriSmart | #36d191e | https://github.com/kupl/VeriSmart-public, Accessed on: 25 January 2024 |
Actual No. of Samples | MAIAN | Mythril | Semgrep | Slither | Solhint | VeriSmart | No. of Common Samples |
---|---|---|---|---|---|---|---|
1173 | 1084 | 852 | 1153 | 1063 | 1159 | 663 | 645 |
Label | No. of Positive Samples | Analysis Tool | TP | TN | FP | FN | Recall | Precision
---|---|---|---|---|---|---|---|---
Reentrancy | 54 | Mythril | 46 | 452 | 139 | 8 | 0.85 | 0.25
 | | Semgrep | 1 | 583 | 8 | 53 | 0.02 | 0.11
 | | Slither | 54 | 373 | 218 | 0 | 1.00 | 0.20
 | | Solhint | 35 | 457 | 134 | 19 | 0.65 | 0.21
 | | AtLeastOne voting | 54 | 313 | 278 | 0 | 1.00 | 0.16
 | | Majority voting | 53 | 416 | 175 | 1 | 0.98 | 0.23
 | | Power-based voting | 54 | 319 | 272 | 0 | 1.00 | 0.17
Access Control | 76 | MAIAN | 12 | 553 | 16 | 64 | 0.16 | 0.43
 | | Mythril | 62 | 360 | 209 | 14 | 0.82 | 0.23
 | | Semgrep | 3 | 569 | 0 | 73 | 0.04 | 1.00
 | | Slither | 60 | 257 | 312 | 16 | 0.79 | 0.16
 | | Solhint | 63 | 217 | 352 | 13 | 0.83 | 0.15
 | | VeriSmart | 57 | 262 | 307 | 19 | 0.75 | 0.16
 | | AtLeastOne voting | 74 | 100 | 469 | 2 | 0.97 | 0.14
 | | Majority voting | 63 | 324 | 245 | 13 | 0.83 | 0.20
 | | Power-based voting | 74 | 100 | 469 | 2 | 0.97 | 0.14
Arithmetic | 45 | Mythril | 37 | 460 | 140 | 8 | 0.82 | 0.21
 | | Semgrep | 0 | 600 | 0 | 45 | 0.00 | NaN
 | | Slither | 4 | 547 | 53 | 41 | 0.09 | 0.07
 | | VeriSmart | 44 | 256 | 344 | 1 | 0.98 | 0.11
 | | AtLeastOne voting | 44 | 216 | 384 | 1 | 0.98 | 0.10
 | | Majority voting | 38 | 466 | 134 | 7 | 0.84 | 0.22
 | | Power-based voting | 44 | 216 | 384 | 1 | 0.98 | 0.10
Unchecked Return Values | 116 | Mythril | 67 | 519 | 10 | 49 | 0.58 | 0.87
 | | Slither | 89 | 422 | 107 | 27 | 0.77 | 0.45
 | | Solhint | 87 | 406 | 123 | 29 | 0.75 | 0.41
 | | AtLeastOne voting | 89 | 388 | 141 | 27 | 0.77 | 0.39
 | | Majority voting | 87 | 438 | 91 | 29 | 0.75 | 0.49
 | | Power-based voting | 89 | 388 | 141 | 27 | 0.77 | 0.39
DoS | 38 | MAIAN | 1 | 586 | 21 | 37 | 0.03 | 0.05
 | | Mythril | 12 | 544 | 63 | 26 | 0.32 | 0.16
 | | Slither | 24 | 507 | 100 | 14 | 0.63 | 0.19
 | | Solhint | 9 | 549 | 58 | 29 | 0.24 | 0.13
 | | AtLeastOne voting | 26 | 423 | 184 | 12 | 0.68 | 0.12
 | | Majority voting | 15 | 554 | 53 | 23 | 0.39 | 0.22
 | | Power-based voting | 26 | 444 | 163 | 12 | 0.68 | 0.14
Bad Randomness | 4 | Mythril | 1 | 618 | 23 | 3 | 0.25 | 0.04
 | | Semgrep | 0 | 637 | 4 | 4 | 0.00 | 0.00
 | | Slither | 1 | 586 | 55 | 3 | 0.25 | 0.02
 | | Solhint | 1 | 631 | 10 | 3 | 0.25 | 0.09
 | | AtLeastOne voting | 3 | 569 | 72 | 1 | 0.75 | 0.04
 | | Majority voting | 0 | 626 | 15 | 4 | 0.00 | 0.00
 | | Power-based voting | 3 | 573 | 68 | 1 | 0.75 | 0.04
Front Running | 34 | Mythril | 25 | 453 | 158 | 9 | 0.74 | 0.14
 | | AtLeastOne voting | 25 | 453 | 158 | 9 | 0.74 | 0.14
 | | Majority voting | 25 | 453 | 158 | 9 | 0.74 | 0.14
 | | Power-based voting | 25 | 453 | 158 | 9 | 0.74 | 0.14
Time Manipulation | 35 | Mythril | 28 | 539 | 71 | 7 | 0.80 | 0.28
 | | Slither | 34 | 493 | 117 | 1 | 0.97 | 0.23
 | | Solhint | 34 | 459 | 151 | 1 | 0.97 | 0.18
 | | AtLeastOne voting | 34 | 459 | 151 | 1 | 0.97 | 0.18
 | | Majority voting | 34 | 492 | 118 | 1 | 0.97 | 0.22
 | | Power-based voting | 34 | 492 | 118 | 1 | 0.97 | 0.22
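The recall and precision columns above follow the standard confusion-matrix definitions. A small Python sketch reproduces, e.g., the Mythril row for Reentrancy (TP = 46, FP = 139, FN = 8):

```python
def recall_precision(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard recall = TP / (TP + FN) and precision = TP / (TP + FP).

    Precision is undefined (reported as NaN in the table) when a tool
    produces no positive predictions (TP + FP == 0), as in the Semgrep
    row for Arithmetic.
    """
    recall = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return recall, precision

# Mythril on Reentrancy: TP=46, FP=139, FN=8
r, p = recall_precision(46, 139, 8)
print(round(r, 2), round(p, 2))  # 0.85 0.25
```

Note that TP + FN equals the number of positive samples for the class (54 for Reentrancy), so recall is computed against the labeled ground truth rather than the tool’s output size.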
Vulnerability Class | Voters | Inverter: [Tools] | Vote Strategy
---|---|---|---
Reentrancy | [Mythril, Slither, Solhint] | Semgrep: [Slither] | AtLeastOne
Access Control | [MAIAN, Mythril, Semgrep, Slither, Solhint, VeriSmart] | – | AtLeastOne
Arithmetic | [Mythril, Slither, VeriSmart] | – | AtLeastOne
Unchecked Return Values | [Mythril, Slither, Solhint] | – | AtLeastOne
DoS | [Mythril, Slither, Solhint] | MAIAN: [Slither] | AtLeastOne
Bad Randomness | [Mythril, Slither, Solhint] | Semgrep: [Slither, Solhint] | AtLeastOne
Front Running | [Mythril] | – | AtLeastOne
Time Manipulation | [Mythril, Slither, Solhint] | Majority
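The two traditional voting rules referenced above combine per-tool boolean verdicts in the obvious way. A minimal sketch follows; the inverter handling and the Power-based weighting are specific to the article’s method and are not reproduced here:

```python
def at_least_one(votes: list[bool]) -> bool:
    """AtLeastOne: flag the contract as vulnerable if any voter flags it."""
    return any(votes)

def majority(votes: list[bool]) -> bool:
    """Majority: flag the contract as vulnerable if more than half of
    the voters flag it."""
    return sum(votes) * 2 > len(votes)

# e.g. hypothetical verdicts from [Mythril, Slither, Solhint] on one contract
votes = [True, False, True]
print(at_least_one(votes), majority(votes))  # True True
```

AtLeastOne maximizes recall at the cost of false positives, while Majority trades some recall for precision, which matches the pattern visible in the per-class results table.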
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alsunaidi, S.J.; Aljamaan, H.; Hammoudeh, M. MultiTagging: A Vulnerable Smart Contract Labeling and Evaluation Framework. Electronics 2024, 13, 4616. https://doi.org/10.3390/electronics13234616