Article

A Novel Data Sanitization Method Based on Dynamic Dataset Partition and Inspection Against Data Poisoning Attacks

by Jaehyun Lee 1,†, Youngho Cho 1,†, Ryungeon Lee 1, Simon Yuk 2, Jaepil Youn 3, Hansol Park 4 and Dongkyoo Shin 4,*
1 Department of Defense Science (Computer Engineering), Graduate School of Defense Management, Korean National Defense University, Nonsan 33021, Republic of Korea
2 Defense AI Promotion Team, Ministry of National Defense, Seoul 04351, Republic of Korea
3 Department of Defense Cyber Science, Korea Army Academy at Yeongcheon (KAAY), Yeongcheon 38900, Republic of Korea
4 Department of Computer Science and Engineering, Graduate School of Convergence Major for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2025, 14(2), 374; https://doi.org/10.3390/electronics14020374
Submission received: 15 December 2024 / Revised: 10 January 2025 / Accepted: 15 January 2025 / Published: 18 January 2025
(This article belongs to the Special Issue Big Data Analytics and Information Technology for Smart Cities)

Abstract: Deep learning (DL) technology has shown outstanding performance in various fields such as object recognition and classification, speech recognition, and natural language processing. However, it is well known that DL models are vulnerable to data poisoning attacks, where adversaries modify or inject data samples maliciously during the training phase, leading to degraded classification accuracy or misclassification. Since data poisoning attacks keep evolving to avoid existing defense methods, security researchers thoroughly examine data poisoning attack models and devise more reliable and effective detection methods accordingly. In particular, data poisoning attacks can be realistic in an adversarial situation where we retrain a DL model with a new dataset obtained from an external source during transfer learning. Motivated by this, we propose a novel defense method that partitions and inspects the new dataset and then removes malicious sub-datasets. Specifically, our proposed method first divides a new dataset into n sub-datasets either evenly or randomly, inspects them by using the clean DL model as a poisoned dataset detector, and finally removes malicious sub-datasets classified by the detector. For partition and inspection, we design two dynamic defensive algorithms: the Sequential Partitioning and Inspection Algorithm (SPIA) and the Randomized Partitioning and Inspection Algorithm (RPIA). With this approach, a resulting cleaned dataset can be used reliably for retraining a DL model. In addition, we conducted two experiments in Python and DL environments to show that our proposed methods effectively defend against two data poisoning attack models (concentrated poisoning attacks and random poisoning attacks) in terms of various evaluation metrics such as removed poison rate (RPR), attack success rate (ASR), and classification accuracy (ACC). Specifically, the SPIA completely removed all poisoned data under concentrated poisoning attacks in both Python and DL environments. In addition, the RPIA removed up to 91.1% and 99.1% of poisoned data under random poisoning attacks in Python and DL environments, respectively.

1. Introduction

With the rapid advancement of Artificial Intelligence (AI) and computing technologies, deep learning (DL) has demonstrated exceptional performance in various fields, including image classification [1], speech recognition [2], natural language processing [3], and fault detection [4]. However, many studies have shown that DL models are vulnerable to adversarial attacks that can severely compromise their reliability and performance [5,6,7,8,9]. Therefore, it is essential to design and construct robust DL models in the presence of adversarial attacks.
Adversarial attacks are typically categorized into evasion attacks or poisoning attacks [10]. In particular, poisoning attacks involve deliberately introducing malicious data into the training dataset to corrupt the DL model’s integrity, degrade its performance, and induce misclassification [11]. While evasion attacks are performed at the inference phase of a DL model, poisoning attacks are conducted at the training phase of a DL model, especially during transfer learning in which a new training dataset is collected from external untrustful sources [12].
Meanwhile, existing defense methods against poisoning attacks can be categorized into the following three approaches [13]. First, data aggregation integrates data from multiple sources to enhance the reliability of collected data and mitigate the impact of poisoned data through various techniques such as majority voting, weighted averaging, and outlier removal [14,15,16,17,18]. Second, data augmentation dilutes the effect of poisoned data by employing various augmentation methods such as Mixup, CutMix, and Dilution [19,20,21,22]. Third, data sanitization removes the poisoned data included in a new dataset by using outlier detection, data complexity values, influence function, etc. [23,24,25,26,27,28,29,30,31,32,33,34].
Existing representative sanitization methods have the following limitations [25,26,29,30]. First, sub-sampling approach-based methods reduce the number of poisoned samples through sub-sampling data from the transferred dataset and thus enhance the model robustness [25,26]. However, randomly reducing the dataset through sub-sampling poses the risk of removing critical, valuable information that should be considered to improve a model. In addition, this may lead to insufficient data for the model to learn from effectively, resulting in degraded performance. Next, partition and removal approach-based methods focus on improving the model accuracy by dividing a dataset into sub-datasets, using them separately for model training, and removing abnormal sub-datasets that significantly degrade the model accuracy [29,30]. However, this approach focuses on the model accuracy and thus misses the chance of detecting and capturing poisoned data; it also requires repeated training and evaluation for each sub-dataset, which leads to high computational complexity. Section 2.4 further analyzes these limitations and provides detailed discussions and illustrative examples.
In this paper, we study advancing the data sanitization approach. Specifically, we propose a novel data sanitization method based on dynamic dataset partitioning and inspection. The proposed method works in the following three steps. In step 1 (data partition), an input dataset is partitioned into multiple sub-datasets in sequential or randomized ways. In step 2 (sub-dataset inspection and removal), each sub-dataset is inspected by a poisoned model detector and removed when classified as poisoned. In step 3 (clean dataset generation), after removing poisoned sub-datasets, a refined dataset is generated and then used for transfer learning. In this study, to make our research problem simple and clear, we assume that a pre-trained model is not poisoned so that we can use it as a poisoning attack detector in step 2.
The main contributions of this study can be summarized as follows:
  • We propose a novel data sanitization method based on dynamic partitioning and inspection to defend against data poisoning attacks on deep learning (DL) models. Considering two data poisoning attack strategies (concentrated poisoning attacks and random poisoning attacks) [13,35,36], we design and implement two partition and inspection algorithms: the Sequential Partitioning and Inspection Algorithm (SPIA) and the Randomized Partitioning and Inspection Algorithm (RPIA).
  • We conducted two kinds of experiments in the Python environment (Experiment 1) and DL environment (Experiment 2) to validate and evaluate the defensive performance of our two proposed methods under concentrated poisoning attacks and random poisoning attacks. According to our experimental results, the SPIA completely removed all poisoned data under concentrated poisoning attacks in both Python and DL environments. In addition, the RPIA removed up to 91.1% and 99.1% of poisoned data under random poisoning attacks in Python and DL environments, respectively.
The rest of this paper is organized as follows. In Section 2, we overview the background knowledge and related studies. In Section 3, we design our proposed methods based on dynamic dataset partitioning and inspection. In Section 4, we conduct two experiments and then analyze the results. Finally, we conclude with our future research directions in Section 5.

2. Background and Related Works

2.1. Adversarial Attacks

Adversarial attacks (also known as adversarial machine learning) aim to disrupt the reliability and accuracy of AI models. These attacks exploit vulnerabilities in machine learning algorithms and can significantly impact the model’s output.
According to Vorobeychik et al. [37], adversarial attack methods can be classified based on the attack timing, the amount of information the attacker possesses about the model, and the attack’s objective (see Table 1). This classification helps in understanding the attack methods and analyzing their risks, which are discussed below.
First, based on the attack timing, adversarial attacks can be divided into decision-time attacks on algorithms and training-time attacks on models (see Figure 1) [38,39,40]. Specifically, decision-time attacks (also known as evasion attacks) are performed after the model has been trained. These attacks intentionally manipulate input samples by adding perturbation to induce misclassification by the model. Altered inputs are known as adversarial examples. On the other hand, training-time attacks involve deliberately manipulating the training data before the model is trained. These attacks are referred to as poisoning attacks.
Second, based on the attack information, adversarial attacks can be categorized into white-box attacks and black-box attacks [36,40,41]. White-box attacks occur when the attacker has full knowledge of the model’s internal structure (such as algorithms and weights) and training data, allowing for the effective exploitation of specific vulnerabilities. In contrast, black-box attacks occur when the attacker has little or no information about the model. In such cases, the attacker collects data by querying the target model and then builds a pseudo model for further analysis to carry out the attack.
Last, based on the attack’s objective, adversarial attacks can be divided into targeted attacks and reliability attacks [35,36]. Targeted attacks aim to make the model misclassify a specific class i as class j, where j ≠ i. Meanwhile, reliability attacks degrade the overall classification accuracy of the model, thereby undermining the model’s reliability.

2.2. Poisoning Attacks on Deep Learning Models

We further explain the poisoning attacks on which we focus in this study. Poisoning attacks are designed to significantly break or degrade the performance of DL models by maliciously modifying the training data so that the model makes incorrect classifications, ultimately undermining its reliability and accuracy. Poisoning attacks can be categorized along three dimensions: attack objectives, execution methods, and the distribution of poisoned data.
First, based on their objectives, poisoning attacks can be categorized into two types: availability attacks (also known as reliability attacks) and targeted attacks [13,35]. Availability attacks aim to degrade the overall performance of a model by inserting maliciously crafted data into the training dataset. The primary objective of availability attacks is to maximize the model’s test loss, weakening its generalization capabilities. These attacks cause a DL model to produce inaccurate or unstable predictions across all input data, significantly undermining its reliability and robustness. For example, in an autonomous vehicle’s traffic sign recognition system, an availability attack impairs the model’s ability to identify traffic signs through poisoned data or adversarial perturbations, potentially resulting in dangerous driving decisions and undermining the system’s reliability [7]. In contrast, targeted attacks focus on causing the misclassification of specific instances according to the attacker’s intention. This is achieved by injecting deliberately manipulated data into the training dataset, ensuring that the instances of a particular class are misclassified as a different, incorrect class. For example, in a spam filtering system, a targeted attack manipulates the model so that emails containing specific content are not flagged as spam. Unlike general poisoning attacks, targeted attacks achieve specific objectives without significantly degrading the overall performance of the model [42].
Second, based on the execution method, poisoning attacks can also be classified into dirty-label attacks and clean-label attacks [42]. In dirty-label attacks, the attacker intentionally modifies the labels of training data to mislead the model during the training phase. This approach requires the attacker to have explicit authority to alter the labels of the training data. The mislabeled data distort the model training process, causing the model to fail in correctly classifying specific instances. On the other hand, clean-label attacks manipulate the data features while leaving the training data labels unchanged. By altering the features, the attacker induces the model to learn incorrect patterns during training. These attacks often exploit data collection processes. For example, an attacker could upload manipulated images to a website, which are then collected by a web crawler and included in the training dataset. This occurs frequently during transfer learning [43]. Clean-label attacks are particularly effective as they can corrupt the classification results of specific target instances without significantly degrading the model’s overall performance.
Third, based on the distribution of contaminated data in the dataset, poisoning attacks can be categorized into concentrated poisoning attacks and random poisoning attacks [44]. In concentrated poisoning attacks, the attacker poisons data samples in a specific dataset region. In particular, this concentrated strategy is effective in targeting some classes when they are located densely in a certain region. Meanwhile, random poisoning attacks randomly distribute malicious data across the dataset. This random strategy degrades the overall performance of the model and is thus typically used to implement availability attacks.

2.3. Defense Approach Against Poisoning Attacks

Defenses against poisoning attacks can be categorized into three main approaches: data aggregation, data augmentation, and data sanitization. The details of each approach are as follows [13].
Data aggregation mitigates the impact of adversarial data by collecting and combining data from various sources and thus improves the overall reliability of a dataset [14]. Various methods, such as majority voting [15] and weighted averaging [16], are used for the data aggregation approach. In addition, to address the zero-trust concept that no source can be entirely trusted, truth discovery algorithms have been introduced to verify the authenticity of the collected data [18]. Specifically, these algorithms evaluate the reliability of individual contributors, estimate the trustworthiness of data sources, and infer the true values by applying these reliability assessments.
Data augmentation dilutes the effect of poisoned data by employing various augmentation methods. Borgnia et al. [19] used data augmentation methods such as Mixup [20] and CutMix [21] to reduce the success rate of poisoning attacks while maintaining model accuracy. Mixup generates augmented samples through the linear interpolation of input data and corresponding labels, promoting improved generalization and robustness to adversarial examples, while CutMix enhances training efficiency, generalization, and localization by replacing patches of one image with those from another and proportionally mixing their labels. In addition, Park et al. [22] proposed the dilution-based defense method as a data augmentation approach, which mitigates the impact of poisoning attacks by incorporating additional clean data into the training dataset. This method effectively reduces the proportion of contaminated data and enhances the classification accuracy of deep learning models.
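To make the augmentation idea concrete, the following minimal NumPy sketch illustrates the Mixup interpolation described above. It is an illustration only; the function name and the assumption that labels are one-hot encoded are ours, not taken from [19,20].

import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=np.random.default_rng(0)):
    # Draw a mixing coefficient from a Beta(alpha, alpha) distribution
    lam = rng.beta(alpha, alpha)
    # Linearly interpolate both the input samples and their one-hot labels
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix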
Data sanitization focuses on identifying and removing abnormal data that significantly differ from normal data distributions before training the model [23,24]. By training on datasets with reduced or eliminated poisoned data, models can effectively counter poisoning attacks. However, since there is a risk of misclassifying clean data as malicious, leading to information loss, it is critical to design methods that achieve a high removal rate of poisoned data while minimizing false positives. Representative studies on data sanitization techniques are discussed in detail in Section 2.4.

2.4. Existing Methods Using Data Sanitization Approach

There are two representative data sanitization approaches against data poisoning attacks: the sub-sampling approach [25,26,27,28] and the partition and removal approach [29,30,31,32,33,34].
First, sub-sampling approach-based methods reduce the number of poisoned samples by sub-sampling data from the transferred dataset and thus enhance the model robustness [26]. Li et al. [25] proposed sub-sampling by designing a rating matrix sampling method to identify poisoned data. This involves sampling differences from the original matrix and calculating the distance of rating vectors to locate malicious data. Yang et al. [27] identified poisoned data by isolating examples with significantly different gradients during training. In this study, sub-sampling involves iteratively dropping data points from low-density gradient regions using medoid clustering. Poudel et al. [28] proposed optimal sub-sampling by prioritizing data points based on their importance and informativeness, aiming to enhance robustness and interpretability in collaborative filtering systems. In general, data sub-sampling involves randomly selecting a subset of the dataset rather than using the entire dataset for training. This method aims to reduce memory usage and computational costs, making it efficient for large-scale data environments. However, while advantageous in minimizing computational overhead, randomly reducing the dataset through sub-sampling poses the risk of removing critical, valuable information that should be considered to improve a model. In addition, this may lead to insufficient data for the model to learn from effectively, resulting in degraded performance.
Next, partition and removal approach-based methods focus on improving the model accuracy by dividing a dataset into sub-datasets, using them separately for model training, and removing abnormal sub-datasets that significantly degrade the model accuracy [29,30,31,32,33,34]. As a representative method in this approach, RONI [29] evaluates the impact of each subset on the model’s performance to detect and remove poisoned data. Specifically, this method divides the dataset into several subsets, compares the model’s performance with and without each subset, and removes subsets that significantly degrade the model performance. In addition, P. P. Chan et al. [30] proposed an improved data sanitization method based on a metric called data complexity that distinguishes between poisoned data and clean data. However, this approach focuses on the model accuracy and thus misses the chance of detecting and capturing poisoned data, and it also requires repeated training and evaluation for each sub-dataset, which leads to high computation complexity.
In addition to the above studies, we briefly overview a couple of recent related works as follows. P. P. Chan et al. [31] proposed the L2 defense as a data preprocessing method to counter poisoning attacks. This method primarily analyzes the statistical properties of the dataset to identify data points that deviate significantly from the overall distribution as outliers. These outliers are removed or ignored to minimize their impact on model training. Ho et al. [32] addressed clean-label attacks in malware detection systems using a nested training approach. This method involves iteratively augmenting the training dataset based on consensus among multiple classifiers. As a result, poisoned samples are detected by monitoring error rate changes during classifier agreement checks. Seetharaman et al. [33] introduced a defensive method combining the Slab defense and influence function. The Slab defense projects data points onto a line defined by the distances between class centroids to remove abnormally distributed data points and sanitize the training dataset. The influence function quantifies the impact of specific data points on model performance when they are added or removed from the training dataset. This combined approach further refines the dataset to improve its robustness against poisoning attacks by identifying and mitigating the effects of high-impact data points. Biggio et al. [34] proposed gradient ascent optimization to identify and mitigate poisoned data by analyzing the influence of samples on the SVM decision boundary. Poisoned samples are iteratively removed based on their contribution to increased validation errors.
In this study, we advance the data sanitization approach. Specifically, we propose a novel data sanitization method based on dynamic dataset partitioning and inspection. Our dynamic partition strategies effectively remove poisoned data from a new dataset for transfer learning. We explain our defense idea and method in Section 3.

3. Proposed Defense Method

3.1. Basic Idea and Working Steps

To explain the basic idea of our method, we consider a transfer learning scenario where a new training dataset is collected or provided from an outer source. In this case, a part of the incoming dataset is poisoned maliciously.
To defend against poisoning attacks, we propose a novel data sanitization method based on dynamic partition and inspection given a new dataset as follows (see Figure 2). Let us assume that a victim has a DL model Mt1 from training on a clean dataset Dc. To improve the performance of Mt1, the victim collects a new dataset Dt for transfer learning. However, an attacker pre-emptively injects poisoned data Dp into Dt to achieve specific attack objectives, such as degrading reliability or inducing misclassification. Meanwhile, the proposed method mitigates this threat by partitioning and inspecting the poisoned dataset Dt to identify and remove poisoned data. Thus, this defense process yields a cleaner dataset Du with a zero or significantly reduced poison rate. Consequently, the victim can train the model using Du to construct a more robust and reliable model Mt2, thereby mitigating the impact of poisoning attacks during transfer learning.
The proposed method operates in the following three steps:
Step 1 (data partition): The dataset Dt is divided into n sub-datasets; we call this n-way partition. When Dt is partitioned into n sub-datasets, Dt = Dt1 ∪ Dt2 ∪ ⋯ ∪ Dtn. For example, when n = 1, Dt is treated as a single sub-dataset without partitioning, and when n = 2 (i.e., 2-way partition), the dataset is divided into two sub-datasets such that Dt = Dt1 ∪ Dt2. For partitioning Dt into n sub-datasets, we propose two partitioning approaches: sequential partitioning and random partitioning. We explain these approaches in more detail in Section 3.2.2.
Step 2 (sub-dataset inspection and removal): Each sub-dataset is inspected by using the existing model Mt1 as a poisoned model detector. Thus, our method evaluates the accuracy of each sub-dataset given a test dataset. If the measured accuracy is lower than a predefined threshold, the corresponding sub-dataset is considered harmful and thus discarded. Unlike the existing approach [29], this process is performed proactively before conducting transfer learning.
Step 3 (clean dataset generation): After completing step 2, all non-removed sub-datasets are concatenated to form the dataset Du. As a result, Du is used for transfer learning to create a model Mt2. The transfer-learned model Mt2 is more resilient to the effects of poisoned data, thereby preventing performance degradation caused by poisoning attacks and maintaining stability.
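To make the n-way partition in step 1 concrete, the following short Python sketch computes the partition sizes that both of our algorithms use: r partitions of size q + 1 and (n − r) partitions of size q, where q = ⌊m/n⌋ and r = m mod n (see Section 3.2.2). The function name is illustrative.

def partition_sizes(m: int, n: int):
    # Split a dataset of size m into n sub-datasets: r partitions of size q + 1
    # and (n - r) partitions of size q, where q = floor(m / n) and r = m mod n.
    q, r = divmod(m, n)
    return [q + 1] * r + [q] * (n - r)

print(partition_sizes(10000, 3))       # [3334, 3333, 3333]
print(sum(partition_sizes(10000, 3)))  # 10000, so the union of sub-datasets recovers Dt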

3.2. Design

3.2.1. Motivation of n-Way Partition and Inspection

In this section, we explain the necessity of dividing the dataset into n sub-datasets for inspection in our proposed method.
First, we justify the necessity of using a partition and inspection approach to increase the possibility of capturing poisoned data in a dataset. As depicted in Figure 3, consider a dataset Dt with 30 data samples (5 poisoned samples and 25 clean samples). Without applying partitioning (left figure), the poison rate of Dt is approximately 16.7% (=5/30). In this case, if we use a detector whose threshold > 17%, then this poisoned dataset will not be removed and thus used for transfer learning. On the other hand, when the 2-way partition and inspection method is applied (right figure), the poison rate in the left region grows to 33% (=5/15), allowing Mt1 to detect and remove it. As a result, using the final dataset Du for transfer learning can achieve a zero poison rate. While this partition and inspection method effectively eliminates poisoned data, it will also remove some benign data samples in the detected sub-dataset.
Next, we discuss the necessity of using various partitioning approaches. Consider the dataset Dt as shown in Figure 4. In this case, when we use the detection threshold of 20%, the 2-way partition and inspection fails to remove poisoned data because the poison rates (=13%) in both regions are lower than the threshold. However, when we apply the 4-way partition and inspection method, the dataset is divided into four partitions (regions 1, 2, 3, and 4). As a result, the poison rates of region 2 and region 3 grow to 26.7%. Therefore, the poisoned regions 2 and 3 will be detected and eliminated. This example clearly demonstrates that if the distribution of poisoned data is unknown, using multiple partitioning strategies can effectively improve the possibility of capturing and removing poisoned data samples from the dataset.
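The arithmetic behind these two examples can be checked with a few lines of Python; the numbers below are taken directly from the Figure 3 and Figure 4 discussions.

# Figure 3 example: 30 samples, 5 poisoned, all located in the left half
threshold_fig3 = 0.17
print(5 / 30 < threshold_fig3)   # True: the unpartitioned dataset (16.7%) passes inspection
print(5 / 15 < threshold_fig3)   # False: the left half of a 2-way partition (33.3%) is flagged

# Figure 4 example: a 20% detection threshold
threshold_fig4 = 0.20
print(0.13 < threshold_fig4)     # True: both halves of a 2-way partition (13%) pass
print(0.267 < threshold_fig4)    # False: regions 2 and 3 of a 4-way partition (26.7%) are flagged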

3.2.2. Two Partition and Inspection Algorithms: SPIA and RPIA

We designed the following two defense methods (SPIA and RPIA) to address the two types of poisoning attack strategies (concentrated poisoning attacks and random poisoning attacks) introduced in Section 2.2.
  • Sequential partition and inspection algorithm (SPIA)
In the case of concentrated poisoning attacks where poisoned data are concentrated within specific regions of the dataset, the Sequential Partitioning and Inspection Algorithm (SPIA) can effectively identify and remove such poisoned data.
Figure 5 illustrates an example of a dataset affected by a concentrated poisoning attack where the black dots represent benign data and the red dots indicate poisoned data. All poisoned data are located at the beginning of the dataset according to the attacker’s strategy. Our SPIA sequentially divides the dataset into 16 sub-sections, and since the poison rate of the first four sub-sections is 100%, all poisoned data can be removed (see the red box in Figure 5). Meanwhile, the remaining 12 sub-sections are not removed and then used for transfer learning.
Algorithm 1 describes the working steps of the SPIA. The SPIA takes as input the model Mt1, which is pre-trained on the clean dataset Dc, the dataset Dt for transfer learning, the detection threshold δ, and the number of partitions n. In the initial step (line 2), the algorithm calculates the base size q for partitioning the dataset Dt of size m into n partitions; the floor function is used to determine the size of each partition. Following this, line 3 calculates r = m mod n to distinguish between the r partitions of size q + 1 and the (n − r) partitions of size q. Lines 4–6 initialize the variables necessary for partitioning. The loop in lines 7–16 generates the r partitions of size q + 1 and the (n − r) partitions of size q, which runs in O(n) time. The dataset Dt is sequentially divided based on the calculated partition sizes and numbers. Lines 17–21 evaluate each partition’s accuracy using the pre-trained model Mt1. If a partition’s accuracy meets the threshold, defined as Mt1’s accuracy minus δ, it is included in Du; otherwise, it is discarded. Finally, the refined dataset Du is used to perform transfer learning, creating a new model Mt2 that minimizes the influence of incoming poisoned data and results in a more robust model. A minimal Python sketch of the SPIA is provided after Algorithm 1.
Algorithm 1: Sequential Partition and Inspection Algorithm (SPIA)
Input:
Mt1: a model pre-trained on a clean dataset Dc
Dt: a new dataset of size m from an outer source
δ: detection threshold
n: the number of partitions
Output:
Du: a cleaned dataset after inspection
1: load model Mt1 and dataset Dt
2: q ← ⌊m/n⌋ # the base size of each partition (floor function)
3: r ← m mod n
4: Du ← [ ]
5: Partitions ← [ ] # list to store the split sub-datasets
6: index ← 0 # starting index for splitting
7: for i = 1 to r do
8:   Partition ← Dt[index : index + q + 1] # create a partition of size (q + 1)
9:   Partitions.APPEND(Partition)
10:  index ← index + q + 1
11: end for
12: for i = r + 1 to n do
13:  Partition ← Dt[index : index + q] # create a partition of size q
14:  Partitions.APPEND(Partition)
15:  index ← index + q
16: end for
17: for each Partition in Partitions do # evaluate each partition
18:  if accuracy(Partition, Mt1) ≥ (Mt1.accuracy − δ) then
19:    Du ← Du + Partition
20:  end if
21: end for
22: Return Du
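For readers who prefer runnable code, the following Python sketch mirrors Algorithm 1. It is a minimal illustration rather than our exact implementation: the evaluate_accuracy callable (which returns the accuracy of Mt1 on a given sub-dataset) and base_accuracy (Mt1’s own accuracy) are assumed to be supplied by the caller.

from typing import Callable, List, Sequence

def spia(D_t: Sequence, evaluate_accuracy: Callable[[list], float],
         base_accuracy: float, delta: float, n: int) -> List:
    # Lines 2-3: base partition size q and remainder r
    m = len(D_t)
    q, r = divmod(m, n)
    # Lines 4-16: sequentially split D_t into r partitions of size q + 1
    # followed by (n - r) partitions of size q
    partitions, index = [], 0
    for i in range(n):
        size = q + 1 if i < r else q
        partitions.append(list(D_t[index:index + size]))
        index += size
    # Lines 17-21: keep only partitions whose accuracy passes the threshold
    D_u = []
    for partition in partitions:
        if evaluate_accuracy(partition) >= base_accuracy - delta:
            D_u.extend(partition)
    return D_u  # Line 22

Passing the inspection function as a parameter reflects the fact that any clean pre-trained model can serve as the detector.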
  • Random partition and inspection algorithm (RPIA)
In the case of random poisoning attacks where poisoned data are distributed randomly across the dataset, the detection performance of the SPIA is limited since the SPIA cannot capture the randomness of the attack. Instead, we propose the Randomized Partitioning and Inspection Algorithm (RPIA) to defend against the random poisoning attacks (see Algorithm 2).
With our randomized partitioning approach, some sub-datasets are more likely to contain more poisoned data due to the random distribution. This increases the chances of detecting and removing poisoned data from specific sub-datasets, making it an effective strategy for addressing random poisoning attacks. Figure 6 illustrates an example of a dataset poisoned by a random poisoning attack. The black dots represent benign data, while the red dots indicate poisoned data distributed throughout the dataset to degrade overall model performance. Unlike the SPIA, our RPIA randomly divides the dataset into 16 sub-sections and then examines each sub-section. If the accuracy of a sub-section is lower than a detection threshold, then that sub-section is removed (see the red regions in Figure 6). Meanwhile, the remaining 12 sub-sections are not removed and then used for transfer learning.
The RPIA (Algorithm 2), in contrast to the SPIA, randomly partitions the data and repeats this process multiple times to detect and remove poisoned data. Algorithm 2 describes the working steps of the RPIA. The RPIA takes as input the pre-trained model Mt1, trained on a clean dataset Dc, the dataset Dt for transfer learning, the detection threshold δ, the number of partitions n, and the number of iterations k. Starting at line 2, the dataset Dt is randomly divided into n partitions, and this process is repeated k times, resulting in a time complexity of O(k × n). Lines 7–14 create r partitions of size q + 1 and (n − r) partitions of size q, with each partition sampled randomly from Dt. Lines 15–19 evaluate the accuracy of each partition using the model Mt1; partitions that do not satisfy the threshold, defined as Mt1’s accuracy minus δ, are removed from Dt. After k iterations, the remaining data form the final refined dataset Du (line 21), which is used for transfer learning to create a model that minimizes the impact of poisoned data. A minimal Python sketch of the RPIA is provided after Algorithm 2.
Algorithm 2: Randomized Partition and Inspection Algorithm (RPIA)
Input:
Mt1: a model pre-trained on a clean dataset Dc
Dt: a new dataset of size m from an outer source
δ: detection threshold
n: the number of partitions
k: the number of inspection iterations
Output:
Du: a cleaned dataset after inspection
1: load model Mt1 and dataset Dt
2: for iteration = 1 to k do
3:   Partitions ← [ ] # list to store the split sub-datasets
4:   m ← size of Dt
5:   q ← ⌊m/n⌋ # the base size of each partition (floor function)
6:   r ← m mod n
7:   for i = 1 to r do
8:     Partition ← randomly select a subset of size (q + 1) from Dt
9:     Partitions.APPEND(Partition)
10:  end for
11:  for i = r + 1 to n do
12:    Partition ← randomly select a subset of size q from Dt
13:    Partitions.APPEND(Partition)
14:  end for
15:  for each Partition in Partitions do
16:    if accuracy(Partition, Mt1) < (Mt1.accuracy − δ) then
17:      Dt ← Dt − Partition
18:    end if
19:  end for
20: end for
21: Du ← Dt
22: Return Du
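Analogously, the sketch below mirrors Algorithm 2. As with the SPIA sketch, evaluate_accuracy and base_accuracy are assumed inputs; random partitioning is realized here by shuffling the remaining data before splitting, which is one way to implement the random subset selection in lines 7–14.

import random
from typing import Callable, List, Sequence

def rpia(D_t: Sequence, evaluate_accuracy: Callable[[list], float],
         base_accuracy: float, delta: float, n: int, k: int,
         seed: int = 0) -> List:
    rng = random.Random(seed)
    remaining = list(D_t)
    for _ in range(k):                      # Line 2: repeat the random inspection k times
        if not remaining:                   # nothing left to inspect
            break
        rng.shuffle(remaining)              # randomize before splitting (lines 7-14)
        m = len(remaining)
        q, r = divmod(m, n)
        partitions, index = [], 0
        for i in range(n):
            size = q + 1 if i < r else q
            partitions.append(remaining[index:index + size])
            index += size
        kept = []
        for partition in partitions:        # Lines 15-19: drop partitions that fail inspection
            if partition and evaluate_accuracy(partition) >= base_accuracy - delta:
                kept.extend(partition)
        remaining = kept                    # Dt <- Dt minus the removed partitions
    return remaining                        # Line 21: Du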
In sum, the SPIA is more effective when the distribution of poisoned data is concentrated in a certain area. In contrast, the RPIA is anticipated to perform better when the distribution of poisoned data is irregular and random. The rationale for considering various poisoning attacks and partitioning methods lies in the defender’s inability to predict the distribution of poisoned data in advance. Under such uncertainty, selecting the optimal partitioning method is crucial for effectively mitigating the impact of poisoning attacks.

4. Experiments

In this section, we conduct two experiments to validate and evaluate the performance of our proposed methods.
  • Experiment 1: Performance evaluation using Python simulation
  • Experiment 2: Performance evaluation using DL model training

4.1. Experiment 1: Performance Evaluation Using Python Simulation

4.1.1. Experimental Setup and Procedure

The purpose of Experiment 1 is to validate the effectiveness of the proposed method through Python simulations. To this end, four performance metrics are used as follows.
  • Removed poison rate (RPR): RPR represents the ratio (%) of successfully removed poisoned data to all poisoned data after applying the defense method. A higher RPR indicates more effective removal of poisoned data. RPR is the most important metric for evaluating the performance of the defense method.
  • Attack success rate (ASR): ASR measures the success rate of the poisoning attack and is defined as the ratio (%) of non-removed poisoned data to all poisoned data after applying the defense method. Thus, ASR is equal to 100 − RPR. A lower ASR indicates better defense performance.
  • Removed benign data rate (RBR): RBR indicates the proportion of benign data falsely removed during the partition and inspection and is defined as the ratio (%) of falsely removed benign data to all benign data. A lower RBR reflects more accurate detection performance.
  • Accuracy: Accuracy measures how correctly a DL model predicts the class of a given input. Accuracy is calculated as (TP + TN)/(TP + TN + FP + FN), where TP denotes positive samples correctly classified as positive, TN denotes negative samples correctly classified as negative, FP denotes negative samples misclassified as positive, and FN denotes positive samples misclassified as negative. Thus, a higher accuracy indicates a better performance of the DL model. We note that since Experiment 1 is a Python simulation that briefly analyzes the partition and inspection performance of our proposed methods without training a DL model, the accuracy metric is not used in Experiment 1; we use it in Experiment 2. Simple helper functions for these metrics are sketched after this list.
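As a reference, the metrics above can be computed with the following minimal Python helpers (the function names are ours):

def removed_poison_rate(removed_poisoned: int, total_poisoned: int) -> float:
    # RPR (%): share of all poisoned samples that were successfully removed
    return 100.0 * removed_poisoned / total_poisoned

def attack_success_rate(removed_poisoned: int, total_poisoned: int) -> float:
    # ASR (%): poisoned samples that survived sanitization, i.e., 100 - RPR
    return 100.0 - removed_poison_rate(removed_poisoned, total_poisoned)

def removed_benign_rate(removed_benign: int, total_benign: int) -> float:
    # RBR (%): benign samples falsely removed during partition and inspection
    return 100.0 * removed_benign / total_benign

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # ACC: (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)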
For the experimental environment, we used Python 3.9 programming language and a desktop PC with an AMD Ryzen 7 1700 CPU and a GeForce RTX 1080Ti GPU with 11 GB of RAM.
The specific experimental setups are as follows.
  • Data structure and experimental design: To evaluate the feasibility of the proposed methods, we implemented a Python-based simulation and structured a data pool with integers ranging from 1 to 10,000. For each integer, we determined whether it was correctly detected or misclassified by the detection accuracy (81.34%) of a pre-trained model used in the DL experiment described in Section 4.2.
  • Poisoning attack methods: To vary the distribution of poisoned data in a new dataset, we considered two poisoning attacks (concentrated poisoning attacks and random poisoning attacks). For concentrated poisoning attacks, the first 20% of the dataset was designated as poisoned data. In contrast, for random poisoning attacks, we selected and poisoned 20% of the dataset randomly.
  • Partition methods for defense: For partition methods, two partition and inspection algorithms (SPIA and RPIA) were used and implemented based on Algorithm 1 and Algorithm 2, respectively. The detection threshold δ was set to 0.2. For the number of partitions n, we used 2, 4, 8, 10, 50, 100, 200, 500, 1000, 2000, 5000, and 10,000. For the RPIA, the number of iterations k was set to 100. The total number of data samples was 10,000.
  • Inspection methods for defense: Partitioned datasets were inspected by using the pre-trained DL model Mt1 as a detector. To implement this process in the Python experiment, the data samples of each sub-dataset were classified as correctly detected or misclassified according to the accuracy (81.34%) of the pre-trained model used in the DL experiment in Section 4.2. If the evaluated accuracy of a sub-dataset was lower than the detection threshold of 61.34%, it was removed. After the inspection and removal processes were completed, the performance metrics were calculated for the refined dataset combining all remaining sub-datasets. An illustrative Python reconstruction of this simulation setup is sketched after this list.
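The following sketch is an illustrative reconstruction of this simulation setup under our own simplifying assumptions: each benign sample is marked as correctly detected with probability 81.34%, poisoned samples are assumed to count as misclassified during inspection, and a sub-dataset is removed when its measured accuracy falls below the 61.34% threshold. The exact bookkeeping in our implementation may differ.

import random

POOL_SIZE, POISON_RATE = 10000, 0.20
DETECTOR_ACC, THRESHOLD = 0.8134, 0.6134   # 81.34% minus the detection threshold delta = 0.2

def build_pool(attack: str, rng=random.Random(0)):
    # Build a pool of 10,000 samples with a 20% poison rate placed either in the
    # first 20% of the pool (concentrated attack) or uniformly at random (random attack).
    n_poison = int(POOL_SIZE * POISON_RATE)
    if attack == "concentrated":
        poison_idx = set(range(n_poison))
    else:
        poison_idx = set(rng.sample(range(POOL_SIZE), n_poison))
    return [{"poisoned": i in poison_idx,
             "correct": (i not in poison_idx) and rng.random() < DETECTOR_ACC}
            for i in range(POOL_SIZE)]

def passes_inspection(sub_dataset) -> bool:
    # A sub-dataset is kept only if its measured accuracy reaches the threshold.
    acc = sum(s["correct"] for s in sub_dataset) / len(sub_dataset)
    return acc >= THRESHOLD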

4.1.2. Results and Analysis

We explain our experimental results and findings as follows.
First, under concentrated poisoning attacks, both the SPIA and the RPIA effectively removed poisoned data, and the SPIA showed superior defense performance compared to the RPIA (see Table 2, Figure 7). Specifically, when applying the SPIA (from n = 2 to n = 1000), all poisoned data were successfully removed (RPR = 100 and ASR = 0) as indicated in the green cells in Table 2. The best performance of the SPIA was observed when the number of partitions n = 10, 50, 100, and 200 (see the yellow cells in Table 2), where all poisoned data were eliminated (RPR = 100 and ASR = 0) while preserving all benign data (RBR = 0). For comparison, the baseline performance without any defense method showed ASR = 100, RPR = 0, and RBR = 0 (see the gray cells in Table 2). These results demonstrated that our SPIA can completely remove contaminated data samples concentrated in the first 20% of the dataset manipulated by the concentrated poisoning attack. For example, at n = 1000, the RPR was 100%. In this case, 1000 sub-datasets with 10 samples were created and the first 200 sub-datasets were completely removed. However, when n ≥ 2000, the RPR ranges from 81.3% to 99.5% rather than 100%, because each sub-dataset contains relatively few samples, so the detector’s false classifications can place a sub-dataset’s measured accuracy on the wrong side of the detection threshold. On the other hand, the RPIA achieved its best defense performance at n = 1000 (see the blue cells in Table 2) by removing up to 90.8% of the poisoned data (RPR = 90.8 and ASR = 9.2). However, it also removed 60.2% of the benign data (RBR = 60.2).
Second, under random poisoning attacks, both the SPIA and the RPIA effectively removed poisoned data, and the RPIA showed better defense performance than SPIA (see Table 3, Figure 8). The best defense performance of the SPIA was observed at n = 8000 (see the yellow cells in Table 3) where 83.1% of the poisoned data were removed (RPR = 83.1), limiting the attack success rate to 16.9% (ASR = 16.9). However, the RBR was 29.3%. These results indicate that the SPIA was less effective in removing randomly distributed poisoned data under random poisoning attacks. In contrast, the RPIA achieved its highest defense performance at n = 1000 (see the blue cells in Table 3) where it removed 91.1% of the poisoned data (RPR = 91.1), reducing the attack success rate to 8.9% (ASR = 8.9). The RPIA’s superior defense performance against random poisoning attacks can be attributed to random partition and iterative inspections. Thus, grouping sub-datasets iteratively and randomly during the partitioning process increased the chance of capturing many poisoned samples in a section. However, the RBR was 59%, which indicates a significant removal of benign data, and it can negatively affect the accuracy of a DL model.

4.2. Experiment 2: Performance Evaluation Using DL Model Training

4.2.1. Setup and Procedure

The objective of Experiment 2 is to evaluate the performance of our proposed methods during the transfer learning of a DL model. To this end, we use all four evaluation metrics including accuracy, introduced in Section 4.1.1.
For DL training and testing experiments, we used the Anaconda integrated development environment with the TensorFlow 2.10 deep learning library running on Python 3.9.
The specific experimental setups are as follows.
  • Target DL model: The target deep learning model used in this experiment is ResNet18, which is pre-trained on the CIFAR-10 dataset [45]. ResNet18 is a convolutional neural network (CNN) model based on residual blocks. A DL model trained with ResNet18 and CIFAR-10 can be used for developing applications based on computer vision tasks such as object recognition and image classification. It consists of 18 layers and approximately 11.1 million parameters. The CIFAR-10 dataset comprises 60,000 image samples (size: 32 × 32 pixels, the number of classes: 10), representing various objects such as birds, frogs, and airplanes (see Figure 9). For the experiments, an initial clean dataset Dc was created using 30,000 training images and 6000 test images from CIFAR-10. During transfer learning, an additional dataset Dt was used, which consists of 10,000 training images and 2000 test images.
  • Poisoning attack methods: With a poison rate of 20%, we poisoned 2000 out of 10,000 samples in the dataset such that the labels of the poisoned data were flipped (dirty-label attack) and the poisoned data were distributed by using concentrated poisoning attacks and random poisoning attacks. For concentrated poisoning attacks, the goal was to target specific classes (class 0 and class 1); to this end, the entire dataset was sorted by class order, and then we manipulated the labels of the first 2000 samples randomly. Random poisoning attacks are used to degrade the model’s reliability. In this case, 2000 poisoned samples were randomly distributed across all classes such that 200 poisoned samples were allocated to each class. The resulting poisoned dataset Dp was incorporated into the transfer learning dataset Dt.
  • Training and test dataset for performance evaluation: To evaluate the performance of the proposed method, the transfer learning dataset Dt containing 10,000 samples was divided into multiple sub-datasets with varying numbers of partitions based on a logarithmic scale. Each sub-dataset was inspected using the pre-trained model Mt1 and removed according to the inspection result. After completing the inspection process, the remaining sub-datasets were used for transfer learning to create the updated model Mt2. Finally, the performance of Mt2 was evaluated based on a test dataset consisting of 2000 images and the four evaluation metrics. A minimal sketch of the dirty-label poisoning procedure described above is provided after this list.
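Below is a minimal NumPy sketch of the dirty-label (label-flipping) poisoning described above. It is an illustration under our assumptions: for the concentrated attack the label array is assumed to be already sorted by class, and for the random attack the poisoned indices are drawn uniformly rather than enforcing exactly 200 poisoned samples per class as in our actual setup.

import numpy as np

def flip_labels(y: np.ndarray, n_poison: int = 2000, n_classes: int = 10,
                attack: str = "concentrated", seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    if attack == "concentrated":
        idx = np.arange(n_poison)                               # first 2000 samples (class-sorted)
    else:
        idx = rng.choice(len(y), size=n_poison, replace=False)  # 2000 randomly chosen samples
    for i in idx:
        wrong_labels = [c for c in range(n_classes) if c != int(y[i])]
        y_poisoned[i] = rng.choice(wrong_labels)                # flip to a randomly chosen wrong label
    return y_poisoned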

4.2.2. Experimental Results

We explain and analyze our experimental results from the following two aspects.
First, under concentrated poisoning attacks, both the SPIA and the RPIA effectively removed poisoned data, and they improved the performance of the transfer-learned model compared to the model with no defense method (see Table 4, Figure 10). We note that the accuracy of the DL model trained without any defense method was 67.1% (see the gray cells in Table 4). When applying the SPIA, it was possible to completely remove all poisoned data when the number of partitions ranged from n = 2 to n = 8000 (see the green cells in Table 4). Compared to the results in Experiment 1, the range of partitions that successfully removed poisoned data in Experiment 2 was larger because of the difference between classification methodologies (i.e., binary classification in Experiment 1 and multi-class classification in Experiment 2). Due to the nature of multi-class classification, the predictions are more detailed and complex, allowing for the more effective identification of potentially contaminated data. As a result, a higher number of sub-datasets are removed in Experiment 2. The best performance of the SPIA was achieved at n = 100 where all poisoned data were removed (RPR = 100 and ASR = 0). Furthermore, the RBR remained low at 7.5%. Consequently, compared to 67.1% without any defense method, the accuracy (ACC) increased to 81.2% by 14.1 percentage points. In contrast, the best performance of the RPIA was observed at n = 8000 where 98.7% of the poisoned data were removed, resulting in an ACC of 80.7% after transfer learning. Thus, the SPIA showed slightly better performance than the RPIA in defending against concentrated poisoning attacks. Therefore, these results confirm that the proposed method not only effectively removes poisoned data while maintaining a low benign data removal rate but also enhances the accuracy of DL models through transfer learning.
Second, under random poisoning attacks, both the SPIA and the RPIA effectively removed poisoned data and also enhanced the classification performance of the DL model (see Table 5, Figure 11). Compared to Experiment 1, the removal rates of poisoned data increased in both partition methods. Specifically, the best defense performance of the SPIA was observed at n = 8000, where RPR = 97.6 and RBR = 29.9. In addition, the transfer-learned model achieved an ACC of 80.3%, reflecting a 10.9%p improvement compared to the baseline model without any defense method. In contrast, the best defense performance of the RPIA occurred at n = 4000, achieving RPR = 98.8 and RBR = 41.9, with an ACC of 79.9%. This represents a 12.8%p increase compared to the baseline model without any defense method. The RPIA method could remove more than 97% of the poisoned data under random poisoning attacks when n ranges from 500 to 8000 (see the green cells in Table 5).
We now report confusion matrices after testing the best-performing models using no defense method (no sanitization), the SPIA, and the RPIA under concentrated poisoning attacks and random poisoning attacks as shown in Figure 12 and Figure 13, respectively.
First, we examine how the multi-classification results of DL models with no defense, the SPIA, and the RPIA change under concentrated poisoning attacks by using their confusion matrices (see Figure 12). These confusion matrices visualize the evaluation results of the transfer-learned model Mt2 using the test dataset. Figure 12a shows the case of the DL model without any defense method. The red-boxed section indicates the effect of concentrated poisoning attacks. Thus, many data samples whose ground-truth class label is either 0 or 1 are misclassified into other class labels, revealing a low accuracy of approximately 53%. In contrast, Figure 12b,c demonstrate the defense effect of our SPIA and RPIA methods, with classification accuracies for classes 0 and 1 improving to 82.3% and 81.5%. These results highlight a substantial improvement of approximately 30 percentage points regarding the classification accuracy of the targeted classes since our proposed defense methods successfully mitigate the impact of concentrated poisoning attacks.
Next, we explain the case of random poisoning attacks (see Figure 13). Figure 13a shows the confusion matrix of the DL model without any defense method. Unlike the case of concentrated poisoning attacks, the misclassification results are randomly spread over all class labels, and the overall accuracy is 61.7%. In contrast, Figure 13b,c clearly demonstrate the effectiveness of our two defense methods (SPIA and RPIA) with overall accuracies improving to 80.3% and 79.7%.

4.3. Discussion

We explain the main contributions and achievements of our study as follows.
First, unlike previous studies that defend against targeted attacks and reliability attacks [13,35,36], we newly considered concentrated poisoning attacks and random poisoning attacks, which are categorized based on the distribution of contaminated data in the dataset. To address these attacks, we devised and implemented two novel dynamic partition and inspection algorithms: the Sequential Partition and Inspection Algorithm (SPIA) and the Randomized Partition and Inspection Algorithm (RPIA).
Second, through experiments, we demonstrated that our two algorithms successfully detect and remove most of the poisoned data samples under both concentrated poisoning attacks and random poisoning attacks. Specifically, we conducted two kinds of experiments in the Python environment (Experiment 1) and the DL environment (Experiment 2) to validate and evaluate the defensive performance of our two proposed methods under concentrated poisoning attacks and random poisoning attacks. According to our experimental results, the SPIA completely removed all poisoned data under concentrated poisoning attacks in both Python and DL environments. In addition, the RPIA removed up to 91.1% and 99.1% of poisoned data under random poisoning attacks in Python and DL environments, respectively.
Third, our defense methods enhanced the classification accuracy of a DL model by effectively inspecting and removing poisoned samples from an incoming new dataset, and thus they improved the reliability and trustworthiness of the transfer learning procedure. Based on the performance evaluation results in Experiment 2 (DL model training), the SPIA and the RPIA improved the accuracy of a DL model by 14.1 and 13.6 percentage points, respectively, compared to a DL model without using defense methods.
Meanwhile, it is noteworthy to acknowledge and discuss the limitations of our study along with potential ideas and approaches to address them. We intend to address some of these limitations in our future studies, as detailed in Section 5. Each identified limitation can be formulated as a challenging research problem, offering valuable opportunities for exploration by the broader research community.
  • Lowering high false-positive rate: While our methods demonstrated effective defense capabilities by successfully removing most of the poisoned data, they also falsely identified a substantial portion of benign data as poisoned. This limitation could adversely impact the utility of new datasets, particularly in transfer learning applications. In general, there exists an inherent trade-off between the preservation of benign data (RBR) and the removal of poisoned data (RPR), making the simultaneous achievement of low RBR and high RPR a particularly challenging problem. A potential solution to address this issue involves integrating a clustering algorithm with a sanitization method. Specifically, poisoned samples could be grouped into distinct clusters, which are subsequently inspected and removed using a reliable sanitization technique.
  • Lowering high computation cost: In the random partition and inspection algorithm (RPIA), as the iteration k grows, the computation cost also increases; the time complexity of the RPIA is O(kn) where n is the number of partitions, while the time complexity of the SPIA is O(n). Since a larger k improves the possibility of capturing poisoned data samples in a randomly constructed sub-dataset, it is necessary to optimize the RPIA or devise a mechanism that reduces its computational cost. One potential solution to address the latter is to redesign the RPIA by leveraging the parallel processing approach, which could significantly mitigate the computational burden.
  • Finding optimal parameters and mechanisms: This study primarily evaluated the performance of our two algorithms using a fixed detection threshold. The detection threshold was carefully set to 0.2, as preliminary experiments—considering various thresholds such as 0.1 and 0.3 on small datasets—indicated that it provided the best defense performance. However, we did not extensively investigate the identification of optimal thresholds under diverse attack scenarios and conditions. Given the numerous possible combinations of attack scenarios and conditions, determining the optimal detection thresholds in such settings remains a complex and challenging task. Furthermore, exploring alternative partitioning strategies could be a valuable direction for future research. For instance, datasets could be partitioned into multiple heterogeneous sub-datasets of varying sizes or designed to allow overlapping data samples among partitions.
  • Defending against various sophisticated attacks and scenarios: This study focuses on two types of data poisoning attack models—concentrated poisoning attacks and random poisoning attacks—employing the dirty-label flipping approach. However, numerous other sophisticated data poisoning attack methods exist. Furthermore, it is expected that novel and increasingly complex attack models will continue to emerge, designed to circumvent the existing defense mechanisms known to adversaries. Consequently, it is essential to assess the performance and limitations of our proposed methods against such advanced attacks under diverse adversarial scenarios. Examples of these include clean-label attacks [42], dirty-label attacks incorporating complex patterns [46], and backdoor attacks leveraging adversarial examples [47].
  • Conducting various, comparative empirical studies: This study demonstrates the effectiveness of our proposed methods within an experimental framework utilizing the ResNet18 model and the CIFAR-10 dataset as the target deep learning model. However, numerous deep learning models and architectures exist, such as transformers and recurrent neural networks (RNNs), alongside diverse data types including time series, tabular, text, and video. To comprehensively evaluate the scalability and performance of our methods, it is necessary to experiment with significantly larger datasets. Furthermore, given the variety of existing defense approaches, conducting extensive comparative studies under fair and standardized conditions is essential. Such empirical investigations would not only enable a thorough evaluation of the defense models’ performance but also provide valuable insights to further optimize and advance their development.

5. Conclusions and Future Works

In this study, we proposed a novel dynamic dataset partitioning and inspection framework to defend against data poisoning attacks employing concentrated or random poisoning strategies in adversarial scenarios. These scenarios involve attackers maliciously modifying or injecting poisoned data samples into a new dataset for transfer learning purposes. Specifically, we designed and implemented two defense algorithms: the Sequential Partitioning and Inspection Algorithm (SPIA) and the Randomized Partitioning and Inspection Algorithm (RPIA). These algorithms effectively partition a new dataset into multiple sub-datasets and inspect them to remove poisoned sections. Through comprehensive experiments conducted in Python and deep learning (DL) environments, we demonstrated that the proposed methods effectively defend against two data poisoning attack models—concentrated poisoning attacks and random poisoning attacks—based on evaluation metrics such as removed poison rate (RPR), attack success rate (ASR), and classification accuracy (ACC). Notably, the SPIA successfully removed all poisoned data under concentrated poisoning attacks in both Python and DL environments. Similarly, the RPIA achieved removal rates of up to 91.1% and 99.1% for poisoned data under random poisoning attacks in Python and DL environments, respectively.
Our future research directions are as follows. First, we will further study finding an optimal partitioning method to maximize defense performance and minimize the false removal of benign data under various adversarial scenarios. Second, we will improve our proposed method to effectively defend against other types of sophisticated data poisoning attacks employing clean-label attacks or backdoor attacks. Last, we will investigate the integration of complementary defense approaches, such as data aggregation and data augmentation, with our proposed methods to enhance overall defense performance. Readers can find detailed insights and motivations for these future research directions in Section 4.3 of this paper. We believe this study provides a strong foundation for further advancements in defending against data poisoning attacks in transfer learning and related applications.

Author Contributions

Conceptualization, J.L. and Y.C.; methodology, Y.C.; software, J.L. and R.L.; validation, J.L. and Y.C.; formal analysis, J.L. and Y.C.; investigation, J.L., Y.C., R.L., S.Y., J.Y. and H.P.; resources, J.L.; data curation, J.L. and R.L.; writing—original draft preparation, J.L., Y.C. and R.L.; writing—review and editing, Y.C., S.Y., J.Y., H.P. and D.S.; visualization, J.L. and R.L.; supervision, Y.C. and D.S.; project administration, D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

An earlier version of this paper was presented and selected as one of the outstanding papers at the KSII Autumn Conference in October 2024 in the Republic of Korea.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  2. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Process. Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
  3. Andor, D.; Alberti, C.; Weiss, D.; Severyn, A.; Presta, A.; Ganchev, K.; Petrov, S.; Collins, M. Globally Normalized Transition-Based Neural Networks. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016—Long Papers, Berlin, Germany, 7–12 August 2016; Volume 4. [Google Scholar]
  4. Bono, F.M.; Cinquemani, S.; Chatterton, S.; Pennacchi, P. A Deep Learning Approach for Fault Detection and RUL Estimation in Bearings. In Proceedings of the NDE 4.0, Predictive Maintenance, and Communication and Energy Systems in a Globally Networked World, Long Beach, CA, USA, 6 March–11 April 2022. [Google Scholar]
  5. Wang, Y.; Mianjy, P.; Arora, R. Robust Learning for Data Poisoning Attacks. In Proceedings of the Machine Learning Research, Online, 18–24 July 2021; Volume 139. [Google Scholar]
  6. Shayegani, E.; Al Mamun, A.; Fu, Y.; Zaree, P.; Dong, Y.; Abu-Ghazaleh, N. Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks. arXiv 2023, arXiv:2310.10844. [Google Scholar]
  7. Jiang, W.; Li, H.; Liu, S.; Luo, X.; Lu, R. Poisoning and Evasion Attacks against Deep Learning Algorithms in Autonomous Vehicles. IEEE Trans. Veh. Technol. 2020, 69, 4439–4449. [Google Scholar] [CrossRef]
  8. Ren, K.; Zheng, T.; Qin, Z.; Liu, X. Adversarial Attacks and Defenses in Deep Learning. Engineering 2020, 6, 346–360. [Google Scholar] [CrossRef]
  9. Akhtar, N.; Mian, A. Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey. IEEE Access 2018, 6, 14410–14430. [Google Scholar] [CrossRef]
  10. Qiu, S.; Liu, Q.; Zhou, S.; Wu, C. Review of Artificial Intelligence Adversarial Attack and Defense Technologies. Appl. Sci. 2019, 9, 909. [Google Scholar] [CrossRef]
  11. Goldblum, M.; Tsipras, D.; Xie, C.; Chen, X.; Schwarzschild, A.; Song, D.; Madry, A.; Li, B.; Goldstein, T. Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1563–1580. [Google Scholar] [CrossRef] [PubMed]
  12. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  13. Fan, J.; Yan, Q.; Li, M.; Qu, G.; Xiao, Y. A Survey on Data Poisoning Attacks and Defenses. In Proceedings of the 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC 2022), Guilin, China, 11–13 July 2022. [Google Scholar]
  14. Yuan, D.; Li, G.; Li, Q.; Zheng, Y. Sybil Defense in Crowdsourcing Platforms. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM 2017), Singapore, 6–10 November 2017. [Google Scholar]
  15. Miao, C.; Li, Q.; Su, L.; Huai, M.; Jiang, W.; Gao, J. Attack under Disguise: An Intelligent Data Poisoning Attack Mechanism in Crowdsourcing. In Proceedings of the Web Conference 2018—World Wide Web Conference, WWW 2018, Lyon, France, 23–27 April 2018. [Google Scholar]
  16. Li, Y.; Gao, J.; Lee, P.P.C.; Su, L.; He, C.; He, C.; Yang, F.; Fan, W. A Weighted Crowdsourcing Approach for Network Quality Measurement in Cellular Data Networks. IEEE Trans. Mob. Comput. 2017, 16, 300–313. [Google Scholar] [CrossRef]
  17. Levine, A.; Feizi, S. Deep partition aggregation: Provable defenses against general poisoning attacks. In Proceedings of the ICLR 2021—9th International Conference on Learning Representations, Online, 3–7 May 2021. [Google Scholar]
  18. Li, Y.; Gao, J.; Meng, C.; Li, Q.; Su, L.; Zhao, B.; Fan, W.; Han, J. A Survey on Truth Discovery. ACM Sigkdd Explor. Newsl. 2016, 17, 1–16. [Google Scholar] [CrossRef]
  19. Borgnia, E.; Cherepanova, V.; Fowl, L.; Ghiasi, A.; Geiping, J.; Goldblum, M.; Goldstein, T.; Gupta, A. Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks without an Accuracy Tradeoff. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021. [Google Scholar]
  20. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. MixUp: Beyond Empirical Risk Minimization. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018—Conference Track Proceedings, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  21. Yun, S.; Han, D.; Chun, S.; Oh, S.J.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Volume 2019. [Google Scholar]
  22. Park, H.; Cho, Y. A Dilution-Based Defense Method against Poisoning Attacks on Deep Learning Systems. Int. J. Electr. Comput. Eng. 2024, 14, 645–652. [Google Scholar] [CrossRef]
  23. Koh, P.W.; Steinhardt, J.; Liang, P. Stronger Data Poisoning Attacks Break Data Sanitization Defenses. Mach. Learn. 2022, 111, 1–47. [Google Scholar] [CrossRef]
  24. Steinhardt, J.; Koh, P.W.; Liang, P. Certified Defenses for Data Poisoning Attacks. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 2017. [Google Scholar]
  25. Li, M.; Lian, Y.; Zhu, J.; Lin, J.; Wan, J.; Sun, Y. A Sampling-Based Method for Detecting Data Poisoning Attacks in Recommendation Systems. Mathematics 2024, 12, 247. [Google Scholar] [CrossRef]
  26. Kearns, M.; Li, M. Learning in the Presence of Malicious Errors. SIAM J. Comput. 1993, 22, 807–837. [Google Scholar] [CrossRef]
  27. Yang, Y.; Liu, T.Y.; Mirzasoleiman, B. Not All Poisons Are Created Equal: Robust Training against Data Poisoning. In Proceedings of the Machine Learning Research, Baltimore, MD, USA, 17–23 July 2022; Volume 162. [Google Scholar]
  28. Poudel, S. Improving Collaborative Filtering Recommendation Systems via Optimal Sub-Sampling and Aspect-Based Interpretability. Ph.D. Thesis, North Carolina Agricultural and Technical State University, Greensboro, NC, USA, 2022. [Google Scholar]
  29. Barreno, M.; Nelson, B.; Joseph, A.D.; Tygar, J.D. The Security of Machine Learning. Mach. Learn. 2010, 81, 121–148. [Google Scholar] [CrossRef]
  30. Chan, P.P.K.; He, Z.; Hu, X.; Tsang, E.C.C.; Yeung, D.S.; Ng, W.W.Y. Causative Label Flip Attack Detection with Data Complexity Measures. Int. J. Mach. Learn. Cybern. 2021, 12, 103–116. [Google Scholar] [CrossRef]
  31. Chan, P.P.K.; He, Z.M.; Li, H.; Hsu, C.C. Data Sanitization against Adversarial Label Contamination Based on Data Complexity. Int. J. Mach. Learn. Cybern. 2018, 9, 1039–1052. [Google Scholar] [CrossRef]
  32. Ho, S.; Reddy, A.; Venkatesan, S.; Izmailov, R.; Chadha, R.; Oprea, A. Data Sanitization Approach to Mitigate Clean-Label Attacks Against Malware Detection Systems. In Proceedings of the IEEE Military Communications Conference (MILCOM), Rockville, MD, USA, 28 November–2 December 2022. [Google Scholar]
  33. Seetharaman, S.; Malaviya, S.; Vasu, R.; Shukla, M.; Lodha, S. Influence Based Defense Against Data Poisoning Attacks in Online Learning. In Proceedings of the 2022 14th International Conference on Communication Systems and Networks, COMSNETS 2022, Bangalore, India, 4–8 January 2022. [Google Scholar]
  34. Biggio, B.; Nelson, B.; Laskov, P. Poisoning Attacks against Support Vector Machines. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, 26 June–1 July 2012. [Google Scholar]
  35. Barreno, M.; Nelson, B.; Sears, R.; Joseph, A.D.; Tygar, J.D. Can Machine Learning Be Secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, ASIACCS ’06, Taipei, Taiwan, 21–24 March 2006; Volume 2006. [Google Scholar]
  36. Jagielski, M.; Oprea, A.; Biggio, B.; Liu, C.; Nita-Rotaru, C.; Li, B. Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–24 May 2018. [Google Scholar]
  37. Vorobeychik, Y.; Kantarcioglu, M. Adversarial Machine Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning; Springer: Cham, Switzerland, 2018; Volume 12. [Google Scholar] [CrossRef]
  38. Tian, Z.; Cui, L.; Liang, J.; Yu, S. A Comprehensive Survey on Poisoning Attacks and Countermeasures in Machine Learning. ACM Comput. Surv. 2022, 55, 1–35. [Google Scholar] [CrossRef]
  39. Liang, H.; He, E.; Zhao, Y.; Jia, Z.; Li, H. Adversarial Attack and Defense: A Survey. Electronics 2022, 11, 1283. [Google Scholar] [CrossRef]
  40. Demontis, A.; Melis, M.; Pintor, M.; Jagielski, M.; Biggio, B.; Oprea, A.; Nita-Rotaru, C.; Roli, F. Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks. In Proceedings of the 28th USENIX Security Symposium, Santa Clara, CA, USA, 14–16 August 2019. [Google Scholar]
  41. Yerlikaya, F.A.; Bahtiyar, Ş. Data Poisoning Attacks against Machine Learning Algorithms. Expert Syst. Appl. 2022, 208, 118101. [Google Scholar] [CrossRef]
  42. Shafahi, A.; Ronny Huang, W.; Najibi, M.; Suciu, O.; Studer, C.; Dumitras, T.; Goldstein, T. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 2018. [Google Scholar]
  43. Wang, S.; Nepal, S.; Rudolph, C.; Grobler, M.; Chen, S.; Chen, T. Backdoor Attacks Against Transfer Learning with Pre-Trained Deep Learning Models. IEEE Trans. Serv. Comput. 2022, 15, 1526–1539. [Google Scholar] [CrossRef]
  44. Cho, Y. Intelligent On-off Web Defacement Attacks and Random Monitoring-Based Detection Algorithms. Electronics 2019, 8, 1338. [Google Scholar] [CrossRef]
  45. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Science Department, University of Toronto: Toronto, ON, Canada, 2009; Technical Report. [Google Scholar]
  46. Paudice, A.; Muñoz-González, L.; Lupu, E.C. Label Sanitization Against Label Flipping Poisoning Attacks. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Dublin, Ireland, 10–14 September 2018; Springer: Cham, Switzerland, 2019; Volume 11329 LNAI. [Google Scholar]
  47. Li, Y.; Lyu, X.; Koren, N.; Lyu, L.; Li, B.; Ma, X. Anti-Backdoor Learning: Training Clean Models on Poisoned Data. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Volume 18. [Google Scholar]
Figure 1. Classification of adversarial attacks based on attack timing.
Figure 2. Proposed defense method based on dataset partition and inspection.
Figure 3. No partition vs. 2-way partition.
Figure 4. Comparison of 2-way partition and 4-way partition.
Figure 5. An example of the sequential partition and inspection algorithm (SPIA).
Figure 6. An example of random partition and inspection algorithm (RPIA).
Figure 7. Graphical representation of evaluation results for concentrated poisoning attacks (Python simulation).
Figure 8. Graphical representation of evaluation results for random poisoning attacks (Python simulation).
Figure 9. Example images from each class in the CIFAR-10 dataset.
Figure 10. Graphical representation of evaluation results for concentrated poisoning attacks (DL training).
Figure 11. Graphical representation of evaluation results for random poisoning attacks (DL training).
Figure 12. Performance of the transfer-learned model with different defense methods under concentrated poisoning attacks. The accuracy of the pre-trained model (Mt1) is 81.34%. When 20% poisoned data are added to Dt, the following transfer results are observed: (a) No sanitization: without any defense method, the model achieves an accuracy of 61.7% with an RPR of 0%. (b) SPIA defense: applying the SPIA with n = 100 reduces the impact of poisoning and improves the accuracy to 81.2%. (c) RPIA defense: applying the RPIA with n = 8000 achieves an accuracy of 80.7%, effectively mitigating the poisoning attack with an RPR of 98.7%. The regions highlighted in red show that the attack causes a significant number of misclassifications, particularly affecting specific classes (Class 0 and Class 1).
Figure 13. Performance of the transfer-learned model with different defense methods under random poisoning attacks, with experimental conditions consistent with those described in Figure 12: (a) No sanitization: without any defense method, the model achieves an accuracy of 61.7% with an RPR of 0%. (b) SPIA defense: applying the SPIA with n = 8000 brings the model's accuracy to 81.2% with an RPR of 97.6%. (c) RPIA defense: applying the RPIA with n = 4000 achieves an accuracy of 80.7% with an RPR of 98.8%.
Table 1. The three dimensions of attacks on machine learning [37].
Attack Characteristic | Attack Type
Attack Timing | Decision time (evasion attack) vs. training time (poisoning attack)
Attacker Information | White-box attacks vs. black-box attacks
Attack Goals | Targeted attacks vs. reliability attacks
Table 2. Evaluation results of the proposed methods under concentrated poisoning attacks (Python simulation).
Method | Metric | No Defense (n = 1) | n = 2 | n = 4 | n = 8 | n = 10 | n = 50 | n = 100 | n = 200 | n = 500 | n = 1000 | n = 2000 | n = 4000 | n = 8000 | n = 10,000
SPIA | ASR (%) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 8.2 | 3.4 | 18.7
SPIA | RPR (%) | 0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.5 | 91.8 | 96.6 | 81.3
SPIA | RBR (%) | 0 | 37.5 | 6.3 | 6.3 | 0 | 0 | 0 | 0 | 1.3 | 9.3 | 23.1 | 22.1 | 22.5 | 18.7
RPIA | ASR (%) | 100 | 100 | 100 | 100 | 100 | 61.8 | 32.6 | 19.2 | 10.8 | 9.2 | 10.4 | 13.2 | 16.1 | 18.7
RPIA | RPR (%) | 0 | 0 | 0 | 0 | 0 | 38.2 | 67.4 | 80.8 | 89.2 | 90.8 | 89.6 | 86.8 | 83.9 | 81.3
RPIA | RBR (%) | 0 | 0 | 0 | 0 | 0 | 30.4 | 50.3 | 60.7 | 62.6 | 60.2 | 55.9 | 40.5 | 28.5 | 18.7
Table 3. Evaluation results of the proposed methods under random poisoning attacks (Python simulation).
Method | Metric | No Defense (n = 1) | n = 2 | n = 4 | n = 8 | n = 10 | n = 50 | n = 100 | n = 200 | n = 500 | n = 1000 | n = 2000 | n = 4000 | n = 8000 | n = 10,000
SPIA | ASR (%) | 100 | 100 | 100 | 100 | 100 | 97.8 | 91.9 | 86.5 | 65.5 | 46.6 | 28.2 | 39.2 | 16.9 | 18.7
SPIA | RPR (%) | 0 | 0 | 0 | 0 | 0 | 2.2 | 8.1 | 13.5 | 34.5 | 53.4 | 71.8 | 60.8 | 83.1 | 81.3
SPIA | RBR (%) | 0 | 0 | 0 | 0 | 0 | 2 | 6.7 | 9.8 | 24.9 | 35.4 | 44.9 | 28.2 | 29.3 | 18.7
RPIA | ASR (%) | 100 | 100 | 100 | 100 | 100 | 61.1 | 32.2 | 17.9 | 11 | 8.9 | 11 | 13.2 | 16.5 | 18.7
RPIA | RPR (%) | 0 | 0 | 0 | 0 | 0 | 38.9 | 67.8 | 82.1 | 89 | 91.1 | 89 | 86.8 | 83.5 | 81.3
RPIA | RBR (%) | 0 | 0 | 0 | 0 | 0 | 31.9 | 51.5 | 61.9 | 61.9 | 59 | 54.7 | 40.8 | 29.3 | 18.7
Table 4. Evaluation results of the proposed methods under concentrated poisoning attacks (DL training).
Method | Metric | No Defense (n = 1) | n = 2 | n = 4 | n = 8 | n = 10 | n = 50 | n = 100 | n = 200 | n = 500 | n = 1000 | n = 2000 | n = 4000 | n = 8000 | n = 10,000
SPIA | ACC (%) | 67.1 | 80.7 | 80.8 | 80.8 | 80.7 | 81.1 | 81.2 | 80.7 | 80.8 | 80.4 | 80.8 | 80.7 | 80.8 | 80.5
SPIA | ASR (%) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
SPIA | RPR (%) | 0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99
SPIA | RBR (%) | 0 | 37.5 | 6.3 | 6.3 | 12.5 | 10 | 7.5 | 8.1 | 12 | 17.5 | 28.1 | 22.8 | 25.3 | 20.4
RPIA | ACC (%) | 67.1 | 69.1 | 66.8 | 68.2 | 67.3 | 76.3 | 71.4 | 78.9 | 79.8 | 80.3 | 79.6 | 80.4 | 80.7 | 79.8
RPIA | ASR (%) | 100 | 100 | 100 | 73.6 | 69.5 | 19.4 | 8.2 | 3.2 | 1.1 | 0.6 | 0.8 | 0.9 | 1.3 | 1.5
RPIA | RPR (%) | 0 | 0 | 0 | 26.4 | 30.5 | 80.6 | 91.8 | 96.8 | 98.9 | 99.4 | 99.2 | 99.1 | 98.7 | 98.5
RPIA | RBR (%) | 0 | 0 | 0 | 22.7 | 26.2 | 68.1 | 74.8 | 76 | 71.5 | 67.4 | 63.3 | 44.2 | 31.6 | 20.3
Table 5. Evaluation results of the proposed methods under random poisoning attacks (DL training).
Method | Metric | No Defense (n = 1) | n = 2 | n = 4 | n = 8 | n = 10 | n = 50 | n = 100 | n = 200 | n = 500 | n = 1000 | n = 2000 | n = 4000 | n = 8000 | n = 10,000
SPIA | ACC (%) | 67.1 | 66.8 | 67.6 | 67.4 | 66.2 | 63.4 | 64.9 | 67.6 | 69 | 68.3 | 75.2 | 75 | 80.3 | 80.2
SPIA | ASR (%) | 100 | 100 | 100 | 100 | 100 | 90.6 | 82.2 | 68.5 | 49.1 | 36.2 | 20.2 | 28.3 | 2.4 | 2.8
SPIA | RPR (%) | 0 | 0 | 0 | 0 | 0 | 9.4 | 17.8 | 31.5 | 50.9 | 63.8 | 79.8 | 71.7 | 97.6 | 97.2
SPIA | RBR (%) | 0 | 0 | 0 | 0 | 0 | 7.7 | 14.3 | 22.7 | 36.5 | 42.9 | 50 | 29.9 | 29.9 | 18.4
RPIA | ACC (%) | 67.1 | 68.4 | 67.6 | 69.6 | 69.5 | 70.5 | 76.1 | 76.8 | 79.5 | 79.1 | 76.6 | 79.9 | 79.9 | 79.8
RPIA | ASR (%) | 100 | 100 | 100 | 100 | 100 | 35.9 | 17.2 | 7.1 | 1.6 | 1.2 | 0.9 | 1.2 | 2.4 | 2.8
RPIA | RPR (%) | 0 | 0 | 0 | 0 | 0 | 64.1 | 82.8 | 92.9 | 98.4 | 98.8 | 99.1 | 98.8 | 97.6 | 97.2
RPIA | RBR (%) | 0 | 0 | 0 | 0 | 0 | 51.1 | 63.6 | 68.5 | 67.4 | 62.1 | 59.5 | 41.9 | 29.7 | 18.4
