1. Introduction
The rapid development of communication and data analysis technologies has brought about several paradigm shifts in networking. The commercialization of fifth-generation (5G) communication and the growth of the Internet of Things (IoT) have connected a wide variety of devices to the network, and edge and cloud computing technologies have enabled high-level services such as smart cities and SCADA networks [1]. These changes have dramatically increased the size and diversity of networks as a whole and, together with the large amounts of data collected from various sources, have created a variety of added value [2].
However, the increased connectivity and diversity among the devices that make up the network have caused various information security problems [3]. Network diversity has increased both the types and the number of vulnerabilities, which in turn has expanded the attack vectors available to attackers [4]. Attackers can use these advanced attack vectors to carry out more intelligent and targeted attacks. In particular, the incidence of threats such as the advanced persistent threat (APT), which conducts long-term attacks for specific purposes, continues to increase [5]. Such threats collect information about specific targets over a long period and use targeted attack techniques that exploit various vulnerabilities to maximize their attack capability. These attacks are harder to detect than other attacks, and it takes much longer to determine whether a breach has occurred [6]. In addition, network diversity can give rise to many new vulnerabilities, which enable zero-day attacks that exploit these unknown vulnerabilities. Zero-day attacks [7] are usually severe because they can cause lasting damage until security patches become available. They are also difficult to detect because they use unknown attack patterns [8].
Existing security and incident response systems, such as firewalls, intrusion detection/prevention systems (IDS/IPS), and security information and event management (SIEM) systems, are not sufficient to respond to unknown attack patterns. Coping with and predicting advanced threats that use new attack vectors and patterns requires large amounts of data for in-depth analysis, such as machine learning or deep learning. However, because the types, techniques, and victims of attacks differ widely, the observed data are highly diverse, and it is particularly challenging to collect large amounts of security-related data in a usable form. Thus, there is a need to integrate heterogeneous security systems and to share threat-related information in a usable form, in order to raise the level of understanding of cyber threats and establish active and effective countermeasures.
The cyber threat intelligence (CTI) system is a threat analysis and information sharing system for improving the understanding of cyber threats and responding to them proactively. CTI systems enhance the understanding of cyber threats by reorganizing threat-related data into a formalized form and analyzing it. Moreover, a core aim of the CTI system is to maximize the threat response capability of each node by sharing information. This approach enables the profiling of attack types and patterns, attackers, and attack groups, thereby making it possible to predict potential threats and respond proactively.
However, CTI systems also face the challenge of collecting the amount of data required for analysis and sharing. To collect large amounts of data, open source intelligence (OSINT) and various other data-collection channels are used in addition to internal data collectors. However, the data collected from such sources may be inaccurate or malicious. Because a CTI system forms reputation information about specific network nodes, an attacker can perform a Sybil attack that spreads large amounts of malicious data to isolate a specific node and undermine the availability of the network. Thus, resistance to Sybil attacks is a security requirement that must be considered in the operation of a CTI system.
This study proposes a blockchain-based open CTI framework that can verify the validity of data by providing traceability, integrity, and Sybil-resistance. The proposed framework consists of contributors, which collect and share threat-related data; consumers, which consume such data; and feeds, which provide CTI data sharing services. The framework collects data through contributors to maximize the ability to gather threat-related data, while also providing a mechanism to prevent Sybil attacks by malicious contributors. An attacker may attempt to damage the reputation information of a specific node using malicious contributors and miners. To prevent malicious contributors from continuously distributing data, the proposed framework includes a mechanism for validating the data that contributors provide. The data verification performed by each CTI feed evaluates the reliability of data contributors and thereby degrades the ability of malicious contributors to disseminate data. The framework also increases the mining costs of malicious miners by weakening malicious contributors and causing them to lose their deposits. This mechanism allows the CTI system to block malicious data injection automatically.
This paper proposes a blockchain-based open CTI framework for collecting reliable data from various channels and presents a design method for implementing the framework.
Section 2 introduces related work and the security considerations of the CTI system, and Section 3 describes the system model. Section 4 illustrates each layer of the proposed framework in detail, and Section 5 presents the detailed implementation of our proposal. In Section 6, the simulation results of the proposed framework are presented in terms of the proposed security considerations, and Section 7 concludes this research.
2. Related Work
In this section, we discuss the background, related research, and security considerations related to CTI technology.
2.1. Background
In this subsection, we describe the basic concepts and principles of CTI and the characteristics of Sybil attacks that target CTI systems.
2.1.1. CTI System
The CTI system is an evidence-based, intelligence-driven system for threat detection and prevention [9]. The final goal of CTI is to enable preemptive responses to cyber threats, such as advanced persistent threats (APTs) and zero-day attacks, and to profile attackers and attacker groups. The CTI system collects threat-related data from various channels, analyzes the data, and shares the resulting information with other systems. It analyzes network logs, system logs, firewall logs, traffic, reputation information of network resources, and information collected from security information and event management (SIEM) systems. The CTI system extracts information such as attack patterns, identifiers, malware, attackers, and tactics, techniques, and procedures (TTPs) from various types of data and expresses them as entities in order to analyze the associations between them. Structured Threat Information Expression (STIX) [10] is the most commonly used CTI data expression language, and Trusted Automated Exchange of Intelligence Information (TAXII) is the communication protocol for exchanging data expressed in STIX. The analyzed CTI data are expressed in STIX and then exchanged via the TAXII protocol [10]. Fast and efficient data sharing is essential for mitigating cyber threats proactively. When malicious behavior by an attacker is observed, the CTI system generates information about the cyber threat by combining newly collected and previously identified information. This information includes the type and procedure of the attack and the course of action. By sharing this information with other nodes participating in the CTI system, information about the cyber threat can be spread quickly. Each node establishes and updates its security policy using this information. The CTI system also profiles attackers and attacker groups to thwart zero-day attacks. By characterizing the behavior patterns of attackers and malware, the system can predict future attack patterns and quickly establish countermeasures against them.
Figure 1 shows the system model of the traditional CTI system.
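The following Python sketch makes the entity-based representation concrete. It builds a minimal STIX 2.1 bundle in which an indicator is linked to a malware entity through an "indicates" relationship, ready to be serialized and exchanged (for example, over TAXII). It is an illustration and not the authors' implementation. The IP address (from a documentation range), the malware name "ExampleRAT", and the helper names are hypothetical. The object types and property names follow the STIX 2.1 specification.

```python
import json
import uuid
from datetime import datetime, timezone


def _now() -> str:
    # STIX 2.1 timestamps: UTC, RFC 3339, millisecond precision, trailing "Z".
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def _stix_object(obj_type: str, **props) -> dict:
    # Common properties required on STIX 2.1 domain and relationship objects.
    ts = _now()
    return {"type": obj_type, "spec_version": "2.1",
            "id": f"{obj_type}--{uuid.uuid4()}",
            "created": ts, "modified": ts, **props}


def make_bundle(ip: str, malware_name: str) -> dict:
    indicator = _stix_object("indicator",
                             pattern=f"[ipv4-addr:value = '{ip}']",
                             pattern_type="stix",
                             valid_from=_now())
    malware = _stix_object("malware", name=malware_name, is_family=True)
    # The relationship entity is what lets analysts link IoCs to threats.
    relation = _stix_object("relationship", relationship_type="indicates",
                            source_ref=indicator["id"], target_ref=malware["id"])
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}",
            "objects": [indicator, malware, relation]}


bundle = make_bundle("203.0.113.7", "ExampleRAT")  # hypothetical IoC and name
print(json.dumps(bundle, indent=2))
```

In practice, a library such as the OASIS `stix2` Python package would generate and validate these objects. Plain dictionaries are used here only to keep the sketch dependency-free.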
2.1.2. Reputation Information
Reputation is one of the representative ways of putting CTI to practical use. Reputation information makes it possible to judge whether a network node behaves maliciously based on a unique identifier (such as an IP address, a domain, or a hash). Nodes that detect malicious attacks or threats generate reputation information about the source of the attack and share it with other nodes. A CTI system that uses reputation information can easily convert received reputation information into Snort or YARA rules to form practical security rules [11]. Reputation information is, therefore, one of the most feasible and efficient applications of the CTI system.
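The conversion of reputation information into enforceable rules can be illustrated with a short Python sketch that turns IP reputation records into Snort alert rules. The record fields, the score threshold of 0.8, and the feed names are hypothetical. The SIDs start at 1,000,001 because Snort reserves SIDs of 1,000,000 and above for local rules.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    ip: str        # identifier of the reported node
    score: float   # hypothetical maliciousness score in [0, 1]
    source: str    # feed that produced the record


def to_snort_rules(records, threshold=0.8, first_sid=1_000_001):
    """Emit one Snort alert rule per IP whose score reaches the threshold."""
    rules = []
    sid = first_sid
    for rec in records:
        if rec.score < threshold:
            continue  # not malicious enough to become a rule
        msg = f"CTI reputation hit from {rec.source} score {rec.score:.2f}"
        rules.append(f'alert ip {rec.ip} any -> $HOME_NET any '
                     f'(msg:"{msg}"; sid:{sid}; rev:1;)')
        sid += 1
    return rules


records = [Reputation("198.51.100.23", 0.93, "feed-A"),
           Reputation("203.0.113.9", 0.40, "feed-B")]
rules = to_snort_rules(records)
for rule in rules:
    print(rule)
```

An inline deployment could use the `drop` action instead of `alert`. The choice of action, like the threshold, is a policy decision of the consuming node.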
2.1.3. Sybil Attack on CTI System
Security policies generated from reputation information in a CTI system perform their security function by blocking resources on the network. If incorrect data are input to the CTI system, the system generates false reputation information, which can cause innocent nodes to be blocked from the network. An attacker can exploit this by mounting a Sybil attack, that is, by producing a series of malicious CTI data entries that manipulate the system into generating false reputation information.
As the first step in a Sybil attack, an attacker can inject malicious data through a naive data-collection point. For example, ThreatCrowd.org [12], one of the well-known OSINT CTI services, uses voting results as one of the indicators of reputation for a particular domain. If CTI service users detect malicious activity in a particular domain, they can vote to notify the CTI service that the domain is malicious. Some CTI services, including ThreatCrowd.org, place no restrictions on who can vote, and an attacker can exploit this weakness to compromise a domain’s reputation, as shown in Figure 2. The left screen of Figure 2 shows the result of querying the reputation of the ‘seoultech.ac.kr’ domain. An attacker can contribute malicious data to compromise the reputation of the target domain by impersonating multiple nodes or users. The right screen of Figure 2 shows the result of an experimental attack that lowered the reputation of the ‘seoultech.ac.kr’ domain after spoofing the source IP address using the Tor Browser.
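The weakness of unrestricted voting can be quantified with a toy model. For illustration only, the Python sketch below assumes that a domain is labeled malicious when malicious votes form a strict majority. ThreatCrowd.org's actual scoring rule is not described here. The sketch computes how many Sybil identities are needed to flip the label and how the attacker's outlay changes if each identity must pay a cost, which anticipates the role of deposits in the proposed framework. The vote counts and the per-identity cost are hypothetical.

```python
def verdict(benign_votes: int, malicious_votes: int) -> str:
    # Hypothetical rule: a domain is labeled malicious on a strict vote majority.
    if benign_votes + malicious_votes == 0:
        return "unknown"
    return "malicious" if malicious_votes > benign_votes else "benign"


def sybils_needed(benign_votes: int, malicious_votes: int = 0) -> int:
    # Smallest number of extra malicious votes that yields a strict majority.
    return max(0, benign_votes - malicious_votes + 1)


def attack_cost(sybils: int, cost_per_identity: float) -> float:
    # Total outlay if every fake identity has to pay a fixed cost.
    return sybils * cost_per_identity


benign = 12                      # hypothetical honest votes for the target
k = sybils_needed(benign)
print(k, verdict(benign, k))     # 13 malicious
print(attack_cost(k, 0.0))       # unrestricted voting: 0.0
print(attack_cost(k, 5.0))       # hypothetical 5-token deposit per identity: 65.0
```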
OSINT CTI services cross-reference each other’s data to collect data effectively. For example, data from ThreatCrowd.org are also used by other CTI services, such as ThreatMiner.org [13], and by other OSINT data-collection and analysis tools, such as Maltego [14]. This approach to collecting massive amounts of data efficiently can also be an effective means of spreading a malicious reputation. An attacker can exploit these data propagation paths in a Sybil attack by using a variety of data contribution paths and spending a reasonable cost to impersonate valid users.
2.2. State of the Art
CTI systems formalize and classify cyber threat patterns to increase the understanding of cyber threats. In [15], CTI data are categorized by type from the perspective of data sharing. The CTI system can use the kill-chain model [16] as a threat response technique that formalizes the stages of a cyber threat and applies the optimal countermeasures for each attack phase. In [17,18,19], the types of CTI data are classified based on the kill-chain model. Furthermore, to identify cyber threats from data, [20] proposed a data-driven threat analysis method. In [9], the authors propose a classification model of technologies for CTI data exchange.
How threat-related data are represented is an essential issue for the practical use of CTI. Indicators of compromise (IoCs) are the main means by which CTI systems represent cyber threats. Any information that can identify a target on the network or system can serve as an IoC, such as the hash value of a malware sample, its file name, the IP address used to distribute it, a domain name, or a URL. STIX [10] is a threat-related data representation language and is currently the de facto standard. In [21], the authors propose an extension of the language model to improve STIX’s ability to express threat data. In [22], the authors analyze data exchange formats using an ontology to enhance the capabilities of cyber threat information sharing standards.
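As a small example of how typed IoCs map onto STIX, the following Python sketch converts IoC values into STIX 2.1 patterning expressions. The IoC type labels ("ipv4", "domain", "url", "filename", "sha256") and the function names are assumptions made for this sketch. The object paths (for example, `ipv4-addr:value` and `file:hashes.'SHA-256'`) are taken from the STIX 2.1 cyber-observable vocabulary.

```python
PATHS = {
    "ipv4": "ipv4-addr:value",
    "domain": "domain-name:value",
    "url": "url:value",
    "filename": "file:name",
    "sha256": "file:hashes.'SHA-256'",
}


def _quote(value: str) -> str:
    # STIX pattern string literals are single-quoted; escape \ and ' inside.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def ioc_to_pattern(ioc_type: str, value: str) -> str:
    if ioc_type not in PATHS:
        raise ValueError(f"unsupported IoC type: {ioc_type}")
    return f"[{PATHS[ioc_type]} = {_quote(value)}]"


def any_of(iocs) -> str:
    # OR-combine several comparisons inside one observation expression.
    return "[" + " OR ".join(ioc_to_pattern(t, v)[1:-1] for t, v in iocs) + "]"


empty_file = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(ioc_to_pattern("sha256", empty_file))
print(any_of([("ipv4", "198.51.100.1"), ("domain", "bad.example")]))
```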
Data-collection methods and collection channels are also important factors in operating a CTI system. In [23,24,25,26,27], the authors study how to collect, refine, and operate CTI data from the Darknet and hacker forums. In [28,29], the authors propose a methodology for collecting CTI data from publicly available sources and sharing them. Because the data collected from these sources are in natural language or a similar unstructured form rather than a standardized form, a method of extracting the context of information from such data is required. In [29,30,31,32,33], the authors propose methods to collect and analyze CTI data using natural language processing and semantic analysis techniques.
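The NLP and semantic-analysis approaches cited above go well beyond pattern matching. Still, a regular-expression baseline shows the first step of turning free text into structured indicators. The Python sketch below extracts IPv4 addresses, SHA-256 hashes, and domain names from a forum-style post. The patterns are simplified assumptions: only the "[.]" defanging convention is undone, and the domain pattern will also match file names such as "payload.exe".

```python
import ipaddress
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")
DOMAIN = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
                    re.IGNORECASE)


def extract_iocs(text: str) -> dict:
    text = text.replace("[.]", ".")  # undo the common "[.]" defanging only
    ips = set()
    for candidate in IPV4.findall(text):
        try:
            ips.add(str(ipaddress.IPv4Address(candidate)))  # rejects 999.1.1.1
        except ValueError:
            pass
    return {
        "ipv4": ips,
        "sha256": {h.lower() for h in SHA256.findall(text)},
        # Naive: also matches file names such as "payload.exe".
        "domain": {d.lower() for d in DOMAIN.findall(text)},
    }


post = ("C2 at 203.0.113[.]45 and update.bad-domain[.]example; payload sha256 "
        "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855")
print(extract_iocs(post))
```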
When data are collected from various channels, the validity and reliability of the data must be considered. In our previous study [34], we proposed a model for determining the reliability of data collected from open source intelligence (OSINT). In [35], the authors show that CTI data analysis using machine learning may be vulnerable to data poisoning attacks. CTI systems also use data-sharing frameworks for efficient operation. In [36,37,38,39], the authors propose cyber threat information sharing frameworks, and [40,41] propose CTI sharing frameworks based on blockchain.
2.3. Security Consideration
This subsection describes essential security considerations to design a CTI framework: traceability, Sybil-resistance, and privacy.
2.3.1. Traceability
Traceability, an essential element for verifying the context and validity of data, means the ability to trace data from their generation through the analysis process to their applications [34]. For the traceability of CTI data, metainformation about data-collection channels, environmental characteristics, and threat information should be recorded together, with sufficiently strong integrity protection for the data. Metainformation about a collection channel could be a quantitative scale indicating the reliability of the channel. This composition prevents an attacker from continuously disseminating malicious data and allows data to be tracked to detect abnormal behavior. In addition, the environmental characteristics of the data-collection channel indicate the importance of the information and thus the priority with which the data should be processed. For example, observations of malware on a terminal node and on a central server differ significantly in context and importance. Therefore, CTI information should provide the context of cyber threats, including the environmental characteristics of the data channel.
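The combination of channel metadata and integrity protection described above can be pictured as a hash-linked log. Every CTI record carries its collection-channel metadata and the digest of the previous record, so modifying any earlier record breaks the chain. The Python sketch below is a simplified, single-writer stand-in for what a blockchain provides. The field names (`channel`, `environment`, `channel_reliability`) and the sample values are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class TraceLog:
    """Append-only, hash-linked log of CTI records with channel metadata."""

    def __init__(self):
        self.records = []

    def append(self, data: dict, channel: str, environment: str,
               channel_reliability: float) -> str:
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {"data": data, "channel": channel, "environment": environment,
                "channel_reliability": channel_reliability, "prev": prev}
        self.records.append({**body, "hash": _digest(body)})
        return self.records[-1]["hash"]

    def verify(self) -> bool:
        prev = GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev"] != prev or _digest(body) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True


log = TraceLog()
log.append({"ioc": "198.51.100.23"}, "ids-sensor-1", "central-server", 0.9)
log.append({"ioc": "bad.example"}, "osint-forum", "public-web", 0.3)
print(log.verify())                          # True
log.records[0]["data"]["ioc"] = "192.0.2.1"  # tamper with an old record
print(log.verify())                          # False
```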
2.3.2. Sybil-Resistance
A Sybil attack is an attack in which an attacker sets up multiple nodes for a specific purpose, disguising the attacker’s actions as the actions of a crowd. This attack, which can occur in networks such as social network services (SNS) or blockchains, can be fatal in environments where the identity and owner of terminal nodes cannot be determined. In particular, networks that maintain reputations for specific nodes are more vulnerable to Sybil attacks. An attacker can cause a target node to be blocked from the network by spreading claims that the node is dangerous. If the attacker’s capabilities are sufficient during the CTI data sharing process, the node can be completely excluded from the network, which could compromise the availability of the entire network. Thus, the CTI system requires a fundamental defense mechanism against Sybil attacks.
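The following Python sketch illustrates why reputation weighting raises the bar for Sybil attacks by comparing vote totals weighted by voter reliability. This is one generic mitigation, shown only to make the requirement concrete, and not the mechanism proposed in this paper. The reliability values (0.8 for established voters and 0.05 for fresh identities) are hypothetical. Exact fractions are used to avoid floating-point ties.

```python
from fractions import Fraction


def weighted_verdict(votes) -> str:
    """votes: iterable of (label, reliability); label is 'benign' or 'malicious'."""
    weight = {"benign": Fraction(0), "malicious": Fraction(0)}
    for label, reliability in votes:
        weight[label] += reliability
    return "malicious" if weight["malicious"] > weight["benign"] else "benign"


def sybils_to_flip(honest_weight: Fraction, new_identity_reliability: Fraction) -> int:
    # Smallest n such that n * r0 strictly exceeds the honest weight.
    return honest_weight // new_identity_reliability + 1


established = Fraction(4, 5)   # hypothetical reliability of 12 long-standing voters
newcomer = Fraction(1, 20)     # hypothetical reliability of a fresh identity
votes = [("benign", established)] * 12 + [("malicious", newcomer)] * 13
print(weighted_verdict(votes))                     # benign
print(sybils_to_flip(12 * established, newcomer))  # 193
```

With unweighted votes, 13 fake identities would outvote 12 honest ones. Under this weighting the attacker needs 193, and the cost grows further if every identity must also be funded.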
2.3.3. Privacy
The biggest problem in operating a CTI system is privacy [42]. The CTI system aims to increase the understanding of cyber threats by sharing information about threats and IoCs with other nodes. If the identity of a specific entity is exposed during this information sharing process, participating in information sharing may damage the source’s reputation. Therefore, deidentification techniques need to be applied when sharing data. However, deidentification implies a partial loss of data, which could reduce the usefulness of the information. If the quality of information decreases in the course of preserving privacy, the performance of the CTI system decreases. Thus, CTI systems must balance the trade-off between privacy and the usefulness of the information during data sharing.
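The Python sketch below makes the trade-off concrete with two common deidentification techniques:

- keyed pseudonymization of the reporter's identity with HMAC-SHA256;
- generalization of an internal host address to its network prefix.

Neither is claimed to be the technique used in this paper. The key, the report fields, and the choice of a /24 prefix are assumptions. A shorter prefix protects the source better but tells recipients less, which is exactly the trade-off between privacy and usefulness.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize(identifier: str, key: bytes) -> str:
    # Keyed hash (HMAC-SHA256): the same input maps to the same pseudonym,
    # and parties without the key cannot recompute it to re-identify sources.
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]


def generalize_ipv4(ip: str, prefix: int = 24) -> str:
    # Keep only the network prefix; the host bits are discarded (data loss).
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


key = b"node-local-secret"  # hypothetical key held only by the sharing node
report = {"reporter": "soc@victim.example", "victim_host": "10.20.30.40",
          "attacker_ip": "198.51.100.77"}
shared = {
    "reporter": pseudonymize(report["reporter"], key),
    "victim_net": generalize_ipv4(report["victim_host"]),  # coarser = more private
    "attacker_ip": report["attacker_ip"],  # the IoC itself is kept for usefulness
}
print(shared)
```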
3. System Model
In this section, we describe the system model of the blockchain-based CTI data-collection and sharing framework. The proposed framework consists of nodes and entities with multiple roles.
Table 1 describes the notations used in the proposed framework.
3.1. Blockchain-Based Cyber Threat Intelligence System Model
As shown in Figure 3, the blockchain-based CTI system is composed of five main entities and their interactions: feeds, contributors, consumers, miners, and the blockchain network.
Feeds: Feeds collect security-related data from users and their data-collection channels and reconstruct the data into actionable CTI information. Each CTI feed defines an evaluation and reward method for the observed data to evaluate and encourage users’ contributions. In addition, each CTI feed maintains a user list to which it propagates the analyzed CTI data when a critical threat is reported.
Contributor: A contributor is a user who participates in the CTI system and shares threat-related data observed by their internal systems, such as firewalls or IDSs, with the CTI system. Contributors transmit the observed threat-related log data through a smart contract to the blockchain network of the CTI system and pay a deposit. The data reported by a contributor are evaluated by the evaluation function of each CTI feed, and if the data are determined to be useful to the feed, the feed rewards the contributor. Each contributor has an individual reliability value, which is adjusted by the rewards provided by the CTI feeds.
Consumer: A consumer is a user who participates in the CTI system and is the primary entity that consumes CTI data. Consumers can query the blockchain network for specific CTI data q by paying a cost, and each feed provides the CTI data corresponding to the consumer’s request and receives that payment. In addition, by registering themselves in a CTI feed’s user list, consumers are periodically provided with information related to cyber threats when the CTI feed detects a critical threat.
Blockchain Network: A blockchain network performs the core functions of delivering and storing data in the CTI system. Smart contracts implemented on the blockchain allow users to communicate and share data through reliable procedures. CTI feeds can also manage CTI data through a set of procedures implemented as smart contracts. Blockchain-based CTI frameworks can provide a high level of integrity and traceability for data, thus facilitating the assessment of data validity. To preserve privacy, CTI systems must encrypt data before storing them in a block.
Miner: Using the consensus mechanism of the blockchain network, miners store data request, contribution, and reward transactions in blocks. In the blockchain-based CTI framework, the cryptocurrency mined by miners serves as the means of using the CTI system, and any user can act as both a contributor and a miner simultaneously. Users spend cryptocurrency as the cost of requesting CTI data or receive cryptocurrency as a reward for data contributions. The cost of a data request and the reward for a data contribution vary with the importance of the data.
3.2. Threat Model
In this subsection, we describe the threat models of existing Sybil attacks on CTI systems and the threat models that can arise in the operation of blockchain-based CTI systems.
Malicious Contributor: Attackers can attack CTI systems through malicious contributors that report false data. An attacker can pay a high deposit so that the malicious data they report are stored in the block first. In addition, by reporting a large amount of redundant data, the attacker can increase the probability that malicious data are stored in the block. By falsely associating the target node with malware or malicious behavior, the attacker can reduce the reputation of the target node. When this malicious information is input to the CTI system, a security policy may be generated that blocks access to the target node. In this case, the attacker aims to inject as much malicious data as possible into the system at the least cost e. Meanwhile, the attacker tries to maintain a high level of reliability to increase the probability that the malicious data are injected into the system.
Malicious Miners: To inject malicious data into the CTI system efficiently, an attacker can act as both a malicious contributor and a malicious miner. The attacker aims to inject as much malicious data as possible into the system. At the same time, because continuous malicious data injection requires paying deposits, the attacker aims to recover as much of the deposit e as possible through the mining process.
The proposed framework should be able to block such malicious user behavior systematically.
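A back-of-the-envelope model helps explain why the threat model distinguishes malicious miners. The Python sketch below rests on three hypothetical assumptions:

- each report's deposit goes to the miner who includes it, consistent with miners collecting deposits as described in Section 4.1;
- the attacker's own miner wins with probability equal to its mining share;
- rewards for the few accepted malicious reports are ignored.

Under these assumptions, the expected number of reports a budget can fund is budget / (deposit * (1 - mining_share)).

```python
def expected_reports(budget: float, deposit: float, mining_share: float) -> float:
    """Expected number of malicious reports an attacker can fund.

    Hypothetical model: every report's deposit goes to the miner that includes
    it, the attacker's own miner wins with probability `mining_share`, and
    rewards for accepted reports are ignored, so the expected net cost per
    report is deposit * (1 - mining_share).
    """
    if deposit <= 0 or not 0 <= mining_share < 1:
        raise ValueError("need deposit > 0 and 0 <= mining_share < 1")
    return budget / (deposit * (1 - mining_share))


for share in (0.0, 0.25, 0.5):
    print(share, expected_reports(budget=100.0, deposit=2.0, mining_share=share))
```

Raising the deposit lowers the attacker's reach proportionally, whereas a larger mining share lets the attacker recycle deposits. This is the behavior the framework aims to make costly.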
4. BLOCIS: Blockchain-Based Open Cyber Threat Intelligence System
A reliable open CTI system must ensure the traceability of data as well as the validity of the data and their contributors. We use blockchain to meet these two security considerations. The basic structure and function of the blockchain ensure the traceability of CTI data. In addition, the validity of data and data contributors can be assessed through smart contracts on the blockchain, and the results can be shared transparently through those smart contracts. We also address the privacy issues that may arise when sharing data on the blockchain. In this section, we illustrate the proposed BLOCIS (blockchain-based open cyber threat intelligence system) framework.
4.1. Architecture of BLOCIS
The proposed BLOCIS is a blockchain-based open cyber threat intelligence sharing framework that is resistant to Sybil attacks. To support data interchangeability, BLOCIS divides its layers according to the environments in which data are actually collected and operated. This subsection describes each layer of the BLOCIS framework.
Figure 4 shows the architecture of the proposed framework. Based on the system model described in Section 3, BLOCIS consists of three layers: the user layer, the blockchain network layer, and the feed layer.
User Layer: In the user layer, contributors and consumers act as the actual users. BLOCIS is an open CTI system that collects data from multiple data sources, including the users’ environments. Users may operate internal security systems such as firewalls, IDSs, IPSs, and honeypots, and can benefit from sharing the threat-related data they observe through the BLOCIS framework. Contributors also need to convert the threat-related data that they collect and observe into a standard specification such as STIX. To this end, the client needs a data parser that can collect and preprocess data appropriately for the user’s environment. Users can build this parser as an extension of the security systems in their environment. Consumers request specific threat-related data and periodically receive CTI reports from feeds through the blockchain network.
Blockchain Network Layer: The BLOCIS framework uses blockchain technology for the efficient management and sharing of CTI data. The blockchain network layer is composed of blockchain storage nodes and miner nodes. When observation data are reported from the user layer or when a user requests specific CTI information, the information is transmitted to the blockchain network. Miner nodes obtain rewards by verifying users’ requests and collect the deposits by processing them. All processes, including querying user requests and reporting data to and receiving data from feeds, are conducted through smart contracts. The blockchain network ensures the integrity and traceability of the data in the CTI system by recording the data reported by users, information about each contributor, data-collection procedures, and the history of interactions between users and feeds.
Feed Layer: In the feed layer, various feeds provide CTI services. These are individual web or application services that provide CTI information to the consumers in their user lists. Each feed serves a different purpose and does not need to use all of the CTI data reported from the various channels. Each feed has its own data evaluation function, which it uses to selectively collect data reported to the blockchain network. The feed also determines the validity of the collected data to adjust the reputation of the contributors. The feed generates an alert about the data contributor if the data obtained from the blockchain network are determined to be malicious. This alert lowers the reputation of the contributor. As a result, the contributor’s expected score under the feeds’ evaluation functions decreases, which makes further data contributions by a malicious user more difficult.
4.2. CTI Data Contribution and Sharing Process
In this subsection, we describe the CTI data sharing process of the proposed BLOCIS framework. Inspired by [40], we use a cryptocurrency as a token to represent the reliability and solvency of users. The data sharing process of BLOCIS consists of the following five steps for CTI data sharing and propagation.
Step 1: In the first step, users such as contributors and consumers register their accounts and addresses with the blockchain network. The blockchain network gives them tokens that provide solvency for CTI data requests and reports. The blockchain network sets the initial reliability of each contributor and transmits to the users an encryption key to be used in the data contribution and sharing processes.
Step 2: The user (contributor) converts observed data into STIX-based CTI data and transmits them to the blockchain network. The contributor selects the target feeds to which the data will be provided and executes a smart contract to contribute the data to those feeds. This smart contract takes as input information about the data contributor, the target feeds, the CTI data, and the deposit.
Step 3: The smart contract first validates the reported data. Validation is a prefiltering operation that detects unstructured or noisy data and verifies the data format and integrity. If the reported data are valid, the smart contract proceeds to the next step. If they are not valid, the smart contract adjusts the reliability of the contributor who submitted the abnormal data using the penalty term p.
Step 4: The smart contract executes the data evaluation function of each target CTI feed on the valid input. This function evaluates the validity and importance of the reported CTI data according to the feed’s internal policies and determines whether to accept the data based on the result. If the result of the evaluation function exceeds the feed’s criterion, the feed stores the reported CTI data in its internal database and rewards the contributor by offering cryptocurrency and increasing the contributor’s reliability. If the result falls below the criterion, the smart contract ignores the data and adjusts the contributor’s reliability according to a predetermined policy.
Step 5: After the evaluation, each feed analyzes the new data with its internal strategies, policies, and database to identify any substantial threat that the data represent. If a critical threat is expected based on the accumulated CTI data, the feed broadcasts a report on the expected threat to the users on the feed’s list. The broadcast is delivered directly to the users through a separate, reliable communication channel, and log information about the broadcast data is encrypted and stored on the blockchain.
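Step 3 can be illustrated with a Python sketch of the prefilter. It performs three checks:

- the payload matches the digest supplied by the contributor;
- the payload parses as JSON;
- the payload looks like a STIX 2.1 bundle whose objects carry the common required properties and well-formed identifiers.

On failure, the contributor's reliability is reduced by a penalty p. This is plain Python rather than Solidity. The function names, the default p = 0.1, and the contributor-supplied SHA-256 digest are assumptions of this sketch.

```python
import hashlib
import json
import uuid

REQUIRED = ("type", "spec_version", "id", "created", "modified")


def is_valid_report(payload: bytes, claimed_sha256: str) -> bool:
    """Step 3 prefilter: integrity check plus a minimal STIX 2.1 format check."""
    if hashlib.sha256(payload).hexdigest() != claimed_sha256:
        return False  # payload altered or digest forged
    try:
        bundle = json.loads(payload)
    except ValueError:  # also covers JSONDecodeError and bad UTF-8
        return False  # unstructured data or noise
    if not isinstance(bundle, dict) or bundle.get("type") != "bundle":
        return False
    objects = bundle.get("objects")
    if not isinstance(objects, list) or not objects:
        return False
    for obj in objects:
        if not isinstance(obj, dict) or any(k not in obj for k in REQUIRED):
            return False
        if not isinstance(obj["id"], str):
            return False
        prefix, _, suffix = obj["id"].partition("--")
        try:
            uuid.UUID(suffix)
        except ValueError:
            return False
        if prefix != obj["type"]:
            return False  # ids must look like "<type>--<uuid>"
    return True


def prefilter(payload: bytes, claimed_sha256: str, reliability: float,
              penalty: float = 0.1):
    """Return (passed, new_reliability); a failure costs the contributor p."""
    if is_valid_report(payload, claimed_sha256):
        return True, reliability
    return False, max(0.0, reliability - penalty)


ts = "2024-01-01T00:00:00.000Z"
report = {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": [{
    "type": "indicator", "spec_version": "2.1", "id": f"indicator--{uuid.uuid4()}",
    "created": ts, "modified": ts, "pattern_type": "stix", "valid_from": ts,
    "pattern": "[ipv4-addr:value = '198.51.100.23']"}]}
payload = json.dumps(report).encode()
print(prefilter(payload, hashlib.sha256(payload).hexdigest(), reliability=0.5))  # (True, 0.5)
noise = b"not json"
print(prefilter(noise, hashlib.sha256(noise).hexdigest(), reliability=0.5))
```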
5. Implementation of BLOCIS
In this section, we illustrate the detailed scheme and procedures of BLOCIS using the pseudocode of the algorithm and smart contracts.
5.1. Environments
For the implementation of the proposed BLOCIS framework, we used the Ethereum framework [43]. The functions of BLOCIS are implemented in the Solidity language [44]. We used Ganache for the blockchain environment and the Truffle framework as an integrated development environment for writing and compiling the smart contracts. In addition, we used MetaMask as the wallet interface for users.
5.2. Smart Contract for CTI Data Sharing
In this subsection, we illustrate the detailed content of the smart contracts that make up the proposed framework. The smart contracts are designed based on the interactions between the nodes in the framework. To implement these interactions, we devised three smart contracts: the user management contract (UMC), the data report contract (DRC), and the alert contract (AC).
5.2.1. User Management Contract (UMC)
As an open network, BLOCIS includes various types of client users, and each user takes one of two roles: consumer or contributor. To manage and regulate user behavior, each user must enroll their identity in the blockchain network, where the user’s address is the only means of identifying them. Identifying users only by address helps them preserve their privacy. Each user enrolls their address in the blockchain network and in the broadcast lists of CTI feeds. The broadcast list is used to disseminate alerts when a critical threat is observed. Furthermore, when a contributor joins the blockchain network, the proposed framework assigns them an initial reliability, which is subsequently adjusted through the rewards for data contributions. Algorithm 1 shows the procedure of the UMC. Algorithm 1 initializes user reliability and performs key exchange between users and feeds using a key-exchange scheme based on a public-key cryptographic scheme such as RSAES-OAEP [45]. Each user and each feed has its own public/private key pair, which is used to verify the other party and to share a secret key for encrypted data sharing.
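The registration logic described above can be sketched in Python as an in-memory registry. It does the following:

- rejects duplicate addresses;
- assigns initial tokens and, for contributors, an initial reliability;
- adds users to the broadcast lists of the feeds they subscribe to;
- provisions a per-feed secret key.

This is not the authors' Solidity code. The class and field names, the initial values (100 tokens, reliability 0.5), and the addresses are hypothetical. The RSAES-OAEP key exchange is reduced to a placeholder that only generates the secret (an assumed 256-bit key) that would be wrapped with the recipient's public key.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Account:
    address: str
    role: str                      # "contributor" or "consumer"
    tokens: int
    reliability: Optional[float]   # only contributors carry a reliability
    keys: Dict[str, bytes] = field(default_factory=dict)  # feed -> secret key k


class UserManagement:
    """In-memory stand-in for the UMC (illustrative Python, not Solidity)."""

    def __init__(self, feeds, initial_tokens=100, initial_reliability=0.5):
        self.broadcast_lists = {feed: set() for feed in feeds}
        self.accounts: Dict[str, Account] = {}
        self.initial_tokens = initial_tokens
        self.initial_reliability = initial_reliability

    def register(self, address: str, role: str, subscribe_to=()) -> Account:
        if address in self.accounts:
            raise ValueError("address already registered")
        if role not in ("contributor", "consumer"):
            raise ValueError("role must be 'contributor' or 'consumer'")
        unknown = set(subscribe_to) - set(self.broadcast_lists)
        if unknown:
            raise ValueError(f"unknown feeds: {sorted(unknown)}")
        reliability = self.initial_reliability if role == "contributor" else None
        account = Account(address, role, self.initial_tokens, reliability)
        for feed in self.broadcast_lists:
            # Placeholder for the RSAES-OAEP step: in the described scheme this
            # secret would be wrapped with the recipient's public key.
            account.keys[feed] = secrets.token_bytes(32)
        for feed in subscribe_to:
            self.broadcast_lists[feed].add(address)
        self.accounts[address] = account
        return account


umc = UserManagement(feeds=["feed-A", "feed-B"])
alice = umc.register("0xA11CE", "contributor", subscribe_to=["feed-A"])
print(alice.reliability, sorted(alice.keys), umc.broadcast_lists["feed-A"])
# 0.5 ['feed-A', 'feed-B'] {'0xA11CE'}
```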
Algorithm 1: Pseudocode of User Management Contract (UMC)
5.2.2. Data Report Contract (DRC)
CTI data collected from an end node (user) are encrypted using a preshared secret key k and then reported through a smart contract. However, data validity is a significant issue in an open data-collection channel. To evaluate the validity of reported threat-related data, each feed applies its evaluation function based on its internal policy.
If the result of the evaluation function exceeds the threshold defined by a feed, the feed sends a transaction that rewards the user for the contribution. The reward adjusts the contributor’s reliability and gives the contributor tokens as an incentive. If the result is lower than the threshold, the feed ignores the contribution and adjusts the contributor’s reliability with a penalty term p. This penalty term mitigates the impact and damage of invalid or malicious contributions and can be tuned according to network conditions, such as the number of reported data items, users, and contributors. Algorithm 2 shows the data reporting process and the rewards for the user.
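The reward-and-penalty rule can be sketched in Python to show how it separates honest and malicious contributors over repeated reports. The symbols of the original notation are replaced by descriptive parameters. The concrete values (threshold 0.6, deposit 1, reward 3, reliability gain 0.05, penalty p = 0.1) and the assumption that the deposit is always spent are illustrative choices, not the paper's settings.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    tokens: float
    reliability: float


def settle(c: Contributor, score: float, threshold: float, deposit: float,
           reward: float, gain: float = 0.05, penalty: float = 0.1) -> bool:
    """Apply one DRC outcome to a contributor; returns True if accepted.

    Hypothetical rules: the deposit is always spent, an accepted report
    (score above the feed threshold) earns `reward` tokens and raises
    reliability by `gain`, and a rejected report lowers it by `penalty` (p).
    """
    c.tokens -= deposit
    if score > threshold:
        c.tokens += reward
        c.reliability = min(1.0, c.reliability + gain)
        return True
    c.reliability = max(0.0, c.reliability - penalty)
    return False


honest = Contributor(tokens=10.0, reliability=0.5)
attacker = Contributor(tokens=10.0, reliability=0.5)
for _ in range(5):
    settle(honest, score=0.9, threshold=0.6, deposit=1.0, reward=3.0)
    settle(attacker, score=0.2, threshold=0.6, deposit=1.0, reward=3.0)
print(f"honest:   {honest.tokens:.1f} tokens, reliability {honest.reliability:.2f}")
print(f"attacker: {attacker.tokens:.1f} tokens, reliability {attacker.reliability:.2f}")
```

After five rounds, the honest contributor has doubled its tokens and gained reliability. The attacker has lost half of its tokens and all of its reliability, which is the effect the penalty term p is meant to produce.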
In the process of contributing data, the deposit is usually set smaller than the compensation for the contribution. As a result, users can accumulate assets (currency) through continuous data contributions. These accumulated assets reflect the user’s activity: a large balance means that the user has contributed a large amount of high-quality data, which can serve as another indicator of the user’s reliability.
Furthermore, the user’s deposit is an input parameter of each CTI feed’s data evaluation function, so users can influence the output by adjusting their deposit. By placing higher deposits on their data, users can increase the probability of successful reporting and the priority of their contribution. In other words, users can spend their assets to strengthen their ability to contribute data.
CTI feeds generally serve dedicated purposes, and some may require critical, sensitive, or urgent data. CTI feeds that need only reliable data can restrict who contributes by setting a high minimum deposit. In addition, by intentionally setting high deposits and low compensation (even lower than the deposit), a feed can limit contributions to purely voluntary ones. This operating strategy can improve the reliability of the reported data, and users can access a reliable CTI feed by spending their assets.
Algorithm 2: Pseudocode of Data Report Contract (DRC)
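The deposit mechanics of the DRC (a deposit consumed by every report, a reward usually larger than the deposit, the deposit as an input to the evaluation function, and a feed-specific minimum deposit) can be combined into a toy model. The scoring formula, the names `FeedPolicy`, `evaluate`, and `report`, and all numbers are hypothetical choices for illustration, not the evaluation function of any actual BLOCIS feed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FeedPolicy:
    """Hypothetical per-feed settings; each CTI feed chooses its own values."""
    threshold: float       # acceptance threshold of the evaluation function
    min_deposit: float     # reports with a smaller deposit are refused
    reward: float          # paid on acceptance (usually larger than the deposit)
    deposit_weight: float  # upper bound on how much a deposit can lift the score


def evaluate(quality: float, reliability: float, deposit: float,
             policy: FeedPolicy) -> float:
    """Illustrative evaluation function: estimated data quality scaled by the
    contributor's reliability, plus a bounded, increasing bonus for the deposit."""
    deposit_bonus = policy.deposit_weight * deposit / (deposit + 1.0)
    return quality * reliability + deposit_bonus


def report(balance: float, quality: float, reliability: float, deposit: float,
           policy: FeedPolicy) -> Tuple[float, bool]:
    """Process one report and return (new_balance, accepted)."""
    if deposit < policy.min_deposit:
        raise ValueError("deposit is below the feed's minimum")
    if deposit > balance:
        raise ValueError("insufficient balance for the deposit")
    balance -= deposit  # the deposit is consumed by every report
    accepted = evaluate(quality, reliability, deposit, policy) > policy.threshold
    if accepted:
        balance += policy.reward  # net gain on acceptance is reward - deposit
    return balance, accepted


policy = FeedPolicy(threshold=0.6, min_deposit=1.0, reward=5.0, deposit_weight=0.5)
print(report(10.0, 0.4, 0.8, 1.0, policy))  # (9.0, False)
print(report(10.0, 0.4, 0.8, 4.0, policy))  # (11.0, True)
```

The same report is rejected with the minimum deposit but accepted with a larger one, which is the "spend assets to raise the chance of success" behavior; raising `min_deposit` is the lever a feed would use to shut out low-commitment contributors.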
5.2.3. Alert Contract (AC)
The alert contract disseminates threat-related alerts to consumers. CTI feeds in the BLOCIS framework continuously analyze reported data to build profiles of cyber threats, and each feed has its own analysis mechanism for a specific purpose. If a feed deduces a cyber threat from its analysis results, it alerts the users in its broadcast list. Metadata about these alerts are encrypted with the secret key and stored in the blockchain network, and the full encrypted CTI contents are transmitted to the users of the user layer through the blockchain or other data links (Algorithm 3).
Figure 4 shows the detailed procedures of alert contracts.
Algorithm 3: Pseudocode of Alert Contract (AC)
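A minimal sketch of the alert dissemination step, assuming each user in a feed's broadcast list shares a per-user secret key k with that feed (as obtained through the key exchange of Algorithm 1). The class `AlertChannel`, its fields, and the `toy_xor_cipher` helper are hypothetical. The toy cipher is deliberately insecure and only stands in for whatever symmetric encryption a deployment would use, so that the example runs on the standard library alone.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Cipher = Callable[[bytes, bytes], bytes]  # (key, plaintext) -> ciphertext


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """NOT SECURE: XOR with a SHA-256 keystream, for demonstration only.
    XOR is its own inverse, so the same call also decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class AlertChannel:
    """Toy model of the alert flow: encrypted metadata is appended to an
    on-chain log, and encrypted full contents go to each user's inbox
    over a separate data link."""
    encrypt: Cipher
    on_chain_log: List[dict] = field(default_factory=list)
    inboxes: Dict[str, List[bytes]] = field(default_factory=dict)

    def publish(self, feed: str, broadcast_list: Dict[str, bytes],
                metadata: bytes, contents: bytes) -> None:
        # broadcast_list maps each subscribed user to the key k it shares with the feed
        for user, key in broadcast_list.items():
            self.on_chain_log.append(
                {"feed": feed, "user": user, "metadata": self.encrypt(key, metadata)})
            self.inboxes.setdefault(user, []).append(self.encrypt(key, contents))


channel = AlertChannel(encrypt=toy_xor_cipher)
keys = {"user-1": b"k-user-1", "user-2": b"k-user-2"}
channel.publish("feed-A", keys, b"alert-id=1; severity=high", b"full CTI report body")
received = channel.inboxes["user-1"][0]
print(toy_xor_cipher(keys["user-1"], received))  # b'full CTI report body'
```

Keeping only the small encrypted metadata on-chain and routing the bulky contents through another link is one way to keep ledger growth modest; the `encrypt` parameter is injected so a real cipher can replace the toy one without touching the dissemination logic.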
6. Experiment and Result
This section presents the experiments and simulation results we performed to demonstrate the effectiveness of the proposed framework. The primary purpose of the proposed framework is to resist the Sybil-attack threat model described in Section 3.2. Therefore, in this experiment, we evaluated how far the attacker’s attack capability within the CTI system could be reduced.
To do so, we first defined the attacker’s ability to conduct a Sybil attack. This ability depends on the amount of cryptocurrency the attacker holds and on the attacker’s reliability value. Because an attacker can spend cryptocurrency to increase the probability that a malicious contribution is accepted, the amount held represents the risk of a Sybil attack from a short-term perspective. The attacker’s reliability affects not only the probability of a successful malicious contribution but also the feasibility of a long-term attack: if an attacker can maintain high reliability while making malicious contributions within the CTI system, it can keep contributing malicious data and thereby compromise the reliability of the entire CTI system. Therefore, in this experiment, we measured the attacker’s Sybil-attack ability by the amount of cryptocurrency the attacker holds and the reliability of the attacker’s node.
This experiment shows how the proposed framework can reduce the attacker’s attack ability through smart contracts implemented on the blockchain network.
6.1. Normal and Malicious Contributor
In the experiment, we simulated two types of contributors: normal contributors and malicious contributors. A CTI system that uses open sources as its data-collection channel is exposed to noise data, defined as useless or unusable data or data intentionally modified by an attacker [34]. Under this definition, malicious contributions through noise data can come not only from attackers but also, occasionally, from normal users. Furthermore, whether threat-related data are malicious (i.e., fabricated by an attacker) depends on each CTI system’s operational algorithms and policies, and evaluating the accuracy of data requires additional post-analysis, such as cross-validation. We therefore simulated the behavior of normal and malicious contributors through the probability of noise-data contribution, modeled with a normal distribution: each contributor in the experiments has a mean and a standard deviation for its probability of contributing noise data. In addition, we assumed that the malicious contributor is not naive and imitates the behavior of normal contributors. We assigned each attacker a malicious contribution cycle t: the attacker attempts a malicious contribution after every t normal contributions. This periodic pattern lets malicious contributors retain their reputation while attacking. In the experiment, we configured several malicious contributors with different Sybil attack cycles.
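The contributor model can be written as a small generator. This is an illustrative sketch under assumptions: the name `contributions` is hypothetical, a fresh noise probability is drawn each round, and the sampled probability is clipped to [0, 1] because the text does not say how out-of-range samples from the normal distribution are handled.

```python
import random
from itertools import islice
from typing import Iterator, Optional


def contributions(m: float, sigma: float, sybil_cycle: Optional[int] = None,
                  seed: int = 0) -> Iterator[bool]:
    """Yield True for a valid contribution and False for noise data.

    Each round draws a noise probability from a normal distribution with
    mean m and standard deviation sigma, clipped to [0, 1]. If
    sybil_cycle = t is set, the contributor also submits a deliberately
    invalid report after every t regular contributions.
    """
    rng = random.Random(seed)
    since_attack = 0
    while True:
        if sybil_cycle is not None and since_attack == sybil_cycle:
            since_attack = 0
            yield False  # periodic malicious (Sybil) contribution
            continue
        noise_p = min(1.0, max(0.0, rng.gauss(m, sigma)))
        since_attack += 1
        yield rng.random() >= noise_p  # valid unless noise occurs this round


attacker = list(islice(contributions(0.0, 0.0, sybil_cycle=3), 8))
print(attacker)  # [True, True, True, False, True, True, True, False]
```

With m = 0 and σ = 0 the only invalid reports are the deliberate ones, which isolates the Sybil cycle; a normal contributor would use a small positive m and no `sybil_cycle`.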
6.2. Reliability of Malicious Contributor
Each contributor’s capacity for noise data is modeled as a normal distribution, where m is the mean and σ is the standard deviation of the probability of noise-data contribution. To show the reliability trend of each contributor, we simulated three contributors with different probabilities of noise-data contribution: a normal contributor, a malicious contributor, and a recklessly malicious contributor.
Figure 5 shows the reliability trend of each contributor. The malicious contributor has the same capacity as the normal contributor and performs a Sybil attack with noise data after every t normal contributions. The recklessly malicious contributor is given a much lower capacity and a short Sybil cycle, so that its availability can be contrasted with that of the malicious contributor.
As described in Section 4 and Section 5, each contribution is validated by the CTI feed F. If the output of the validation function is lower than the threshold, the contribution is classified as noise data, and its contributor receives no reward and loses the deposit. In addition, each contributor’s reliability affects the output of the validation function, so a contributor’s reliability determines its availability, i.e., its ability to keep contributing. In the experiment shown in Figure 5, the validation function was applied with a preset threshold. Each solid line shows a contributor’s reliability, and each dashed line shows the amount of cryptocurrency that contributor holds.
The contributions of the normal contributor are judged valid by the CTI feed F, which raises its reliability over the iterations. Thus, the reliability of the normal contributor (solid blue line) increases continuously with each iteration and converges to a high value. In contrast, the reliability of the malicious contributor (solid red line) and of the recklessly malicious contributor (solid black line) converges to a low value over the iterations. Even though the attacker imitates normal contributors by making valid contributions, it cannot retain its reputation, because the penalties for its malicious contributions lower its reputation more than the rewards for its valid contributions raise it. Thus, the results show that the proposed framework can screen out the malicious contributions of a Sybil attack effectively and promptly.
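The qualitative trend in Figure 5 can be reproduced with a deliberately simplified, noise-free simulation. The additive update (a gain of 0.01 per valid report and a loss of p × 0.01 per invalid one), the starting reliability of 0.5, p = 5, and the function name `reliability_trace` are hypothetical and are not the settings used to produce Figure 5.

```python
from typing import List, Optional


def reliability_trace(sybil_cycle: Optional[int], iterations: int,
                      start: float = 0.5, step: float = 0.01,
                      penalty: float = 5.0) -> List[float]:
    """Reliability after each iteration for a contributor whose only invalid
    reports are its periodic Sybil contributions (noise-free simplification)."""
    r, since_attack, trace = start, 0, []
    for _ in range(iterations):
        if sybil_cycle is not None and since_attack == sybil_cycle:
            r = max(0.0, r - penalty * step)  # invalid report: penalty term p
            since_attack = 0
        else:
            r = min(1.0, r + step)  # valid report: reward
            since_attack += 1
        trace.append(r)
    return trace


normal = reliability_trace(None, 200)
attacker = reliability_trace(3, 200)               # Sybil cycle t = 3 < p = 5
patient = reliability_trace(10, 400, penalty=2.0)  # Sybil cycle t = 10 > p = 2
print(normal[-1], attacker[-1], round(patient[-1], 2))  # 1.0 0.0 1.0
```

With t < p the periodic attacker's reliability collapses to zero, while an attacker whose Sybil cycle is long relative to p stays near the maximum; this is the same trade-off between t and p that Figures 6, 7, 8, and 9 explore.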
6.3. Cost of Malicious Contribution
Increasing the cost of a Sybil attack makes the system Sybil-resistant. In the proposed framework, each contribution consumes a default cost (the deposit), so to keep contributing, each contributor needs to earn rewards from the CTI feeds. Because the contributor’s reliability affects the output of the validation function, this mechanism strips the attacker of its ability to contribute by preventing it from earning rewards for malicious contributions. In Figure 5, the cryptocurrency of the normal contributor (dashed blue line) increases continuously because the normal contributor earns rewards through valid behavior. In contrast, the cryptocurrency of the recklessly malicious contributor (dashed black line) drops sharply, because its low reliability causes it to keep losing its deposit. Even though the malicious contributor imitates the behavior of a normal contributor, it could not accumulate a meaningful amount of cryptocurrency and could even lose it. In the experiment shown in Figure 5, the malicious contributor lost all of its cryptocurrency after 170 iterations.
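To see why a low-reliability contributor bleeds cryptocurrency, the deposit and reward loop can be simulated under simple, hypothetical assumptions: every report consumes a deposit of 1.0; a valid report earns a reward of 1.5 only while the contributor's reliability exceeds 0.4 (a stand-in for reliability entering the validation function); and reliability moves additively by +0.01 per valid report and by -p × 0.01 per malicious one. The name `simulate_assets` and all numbers are illustrative, not the settings behind Figure 5.

```python
from typing import Optional, Tuple


def simulate_assets(sybil_cycle: Optional[int], iterations: int,
                    balance: float = 10.0, deposit: float = 1.0,
                    reward: float = 1.5, threshold: float = 0.4,
                    step: float = 0.01, penalty: float = 5.0) -> Tuple[float, Optional[int]]:
    """Return (final_balance, report number at which the contributor could no
    longer afford the deposit, or None if it never ran out)."""
    r, since_attack = 0.5, 0
    for i in range(1, iterations + 1):
        if balance < deposit:
            return balance, i  # cannot pay the deposit for report i
        balance -= deposit  # every report consumes the deposit
        if sybil_cycle is not None and since_attack == sybil_cycle:
            r = max(0.0, r - penalty * step)  # malicious report: no reward, penalty p
            since_attack = 0
        else:
            if r > threshold:  # assumed: the reward also requires enough reliability
                balance += reward
            r = min(1.0, r + step)  # assumed: reliability tracks data validity
            since_attack += 1
    return balance, None


print(simulate_assets(None, 200))  # (110.0, None)
print(simulate_assets(3, 200))     # the periodic attacker runs out of funds
```

The attacker profits only during the first few cycles; once its reliability sinks below the threshold, every report costs a deposit and earns nothing, which is the depletion pattern described for the malicious contributors.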
In our experiment, we assumed that the attacker imitates the behavior of a normal contributor. To disguise themselves as normal users, attackers need to contribute valid data. These valid contributions are the cost of performing the Sybil attack, since they maintain the attacker’s reliability for continued attacks. We therefore performed simulations to show the impact of the attacker’s cost, expressed as the Sybil cycle t.
Figures 6, 7, 8, and 9 show the properties of each contributor. In these experiments, we simulated one normal contributor and four malicious contributors with different Sybil cycles. Every contributor has the same capacity for normal contribution, and each malicious contributor reports invalid data after every t normal contributions.
Based on the output of the CTI feed’s validation function, the penalty p is applied when updating the contributor’s reliability. A higher penalty weight accelerates the decrease in reliability caused by invalid contributions. The figures show the impact of the attacker’s cost (Sybil cycle t) and of the penalty term p. In Figure 6, when the penalty weights are much lower than the attackers’ Sybil cycles (i.e., p ≪ t), the system cannot screen out the tricky contributors with long Sybil cycles, whose reliability becomes similar to that of the normal contributor. However, when the penalty weights are adjusted to match the attacker’s cost, the malicious contributors and their Sybil cycles are revealed, as shown in Figure 7 and Figure 8. When the penalty weight is sufficiently large, the availability of an attacker converges quickly to 0, as shown in Figure 9. Each CTI feed can adjust the penalty term according to the conditions of the whole network. These results show that evaluating the data reported by users through the blockchain network and sharing the evaluation results can effectively block attackers who attempt Sybil attacks. In the proposed CTI framework, attackers quickly lose their cryptocurrency assets while conducting Sybil attacks, and this loss deprives them of the ability to contribute malicious data. Moreover, even if an attacker disguises itself as a normal user to perform a long-term Sybil attack, the results show that it must make a considerable number of normal contributions to reach the average reliability level of normal users. This dramatically reduces the effectiveness of long-term Sybil attacks. Thus, the experimental results show that the proposed framework can effectively mitigate both Sybil threats to the CTI system: the short-term threat tied to the attacker’s cryptocurrency and the long-term threat tied to the attacker’s reliability.
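Under a simplifying assumption, the trade-off between the penalty p and the Sybil cycle t can be made explicit. Suppose, hypothetically, that each valid contribution raises reliability R by a fixed step $\eta$ and each invalid one lowers it by $p\eta$, ignoring the bounds on R. One Sybil cycle consists of t valid contributions followed by one invalid contribution, so after n cycles

$$\Delta R_{\text{cycle}} = t\eta - p\eta = (t - p)\,\eta, \qquad R_n = R_0 + n\,(t - p)\,\eta .$$

If $p < t$, the attacker's reliability still grows and eventually matches that of normal contributors; if $p > t$, it falls by $(p - t)\eta$ per cycle and reaches zero after roughly $n^{*} = R_0 / \big((p - t)\eta\big)$ cycles, i.e., about $n^{*}(t + 1)$ reports. Equivalently, an attacker who wants to keep $\Delta R_{\text{cycle}} \ge 0$ must choose $t \ge p$, so the fraction of its reports that can be malicious is at most

$$\frac{1}{t + 1} \le \frac{1}{p + 1}.$$

Under this model, raising p directly raises the number of valid contributions an attacker has to pay for each malicious one.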
7. Conclusions
CTI technology provides an effective and proactive way to mitigate intelligent and advanced cyber threats. Through data analysis and profiling of cyber threats, a CTI system improves understanding of these threats and provides actionable countermeasures. For this purpose, a CTI system requires a massive dataset containing data of various forms and types. Many CTI feeds, which provide CTI services, use open-source intelligence as a data-collection channel to build such datasets. However, the major problem of this approach is data reliability. Because this data-collection approach permits unconstrained reporting, an attacker can use a Sybil attack to inject maliciously generated or modified data into the system and compromise the reputation of specific nodes. Security policies derived from malicious data can misjudge the reputation of network nodes, which can seriously degrade the availability of the entire network.
The BLOCIS framework introduced in this paper gives the CTI system Sybil-resistance through blockchain-based smart contracts. In our framework, we defined a three-layered architecture for the blockchain-based CTI system. The proposed framework collects CTI data from various sources and evaluates the validity of both the data and the contributors. This approach can effectively distinguish malicious contributors across many data-collection methods. By evaluating all the data and assessing each contributor’s reliability, the framework can isolate nodes that continuously report invalid or malicious data. In this paper, we described in detail how to operate and implement the proposed framework in the form of smart contracts and explained the evaluation model for contributor reliability. Furthermore, to demonstrate the effectiveness and performance of the proposed framework, we performed simulations in terms of the attacker’s reliability and the cost of operating a Sybil attack. The simulation results show that the proposed framework can effectively distinguish malicious contributors without harming normal contributors.
In this research, we evaluated the validity of the data probabilistically. Evaluating and analyzing the meaning and impact of CTI data is a highly domain-specific problem. In future research, we will investigate ways to analyze CTI data that account for the risk of Sybil attacks, in order to extend the domain of the proposed framework.