5.1. Security and Privacy Analysis
- (1) Privacy Preservation for Local Models
Users encrypt their local models using the encryption keys of their respective groups to obtain encrypted models, which are then uploaded to the blockchain. The service provider downloads the encrypted models from the blockchain for potential violator detection and model aggregation. Throughout this process, all other members of the blockchain have access to these encrypted models. Therefore, ensuring that users’ encrypted models do not leak sensitive information after being obtained by the service provider and other users becomes the focal point of our research.
We employ secure multi-party computation based on Shamir's secret sharing [29] to negotiate keys under the honest-but-curious assumption. In a group comprising three users, suppose user $u_i$ holds a secret $s_i$. The secret sharing scheme divides $s_i$ into three shares $s_{i,1}$, $s_{i,2}$, and $s_{i,3}$, all three of which are required to reconstruct $s_i$. Throughout the negotiation process, user $u_i$ knows only their own secret $s_i$, one share of the secret sent by each of the other two users, and the negotiation result $k$. From the secure multi-party computation process alone, they cannot deduce the secrets held by the other two users. Therefore, under the honest-but-curious assumption, users cannot infer the secrets held by other users through the secure multi-party computation process.
In the secure multi-party computation process, each secret is a random number generated by user $u_i$, denoted $s_i$. If other parties cannot learn the secrets $s_i$, then the group key $k$ derived from them also remains unknown outside the group. Therefore, the process of encrypting local models is secure in terms of key confidentiality.
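The (3,3) threshold scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field modulus, the share indices, and the additive key-negotiation step (summing shares so the group key is the sum of all users' secrets) are assumptions, since the text does not fix a concrete construction.

```python
import random

P = 2**127 - 1  # prime field modulus (illustrative choice)

def make_shares(secret: int, n: int = 3, k: int = 3) -> list[tuple[int, int]]:
    """Split `secret` into n Shamir shares; k shares reconstruct it."""
    # Random polynomial f(x) = secret + a1*x + ... + a_{k-1}*x^{k-1} mod P
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for xj, yj in shares:
        num, den = 1, 1
        for xm, _ in shares:
            if xm != xj:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

# Each user i contributes a random secret s_i; the negotiated group key is
# their sum, computed without any user revealing s_i to the others.
secrets = [random.randrange(P) for _ in range(3)]
all_shares = [make_shares(s) for s in secrets]
# User j receives one share of every secret and sums them locally ...
summed = [(j + 1, sum(all_shares[i][j][1] for i in range(3)) % P) for j in range(3)]
# ... and the summed shares reconstruct the group key sum(s_i) mod P.
assert reconstruct(summed) == sum(secrets) % P
```

Because Shamir shares are additively homomorphic, each user learns only the negotiated sum, never another user's individual secret, which matches the honest-but-curious guarantee argued above.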
- (2) Potential Violator Detection
Encrypting local models not only protects users' own privacy and guards against malicious attackers attempting to infer sensitive information, but also introduces a challenge: it deprives the service provider of the ability to directly assess the quality of the local models users provide. This opens up more covert channels for violators to engage in malicious behavior, such as data poisoning and free-riding attacks.
To address this issue, we leverage the non-repudiation property of the blockchain to facilitate the detection of user behavior. By recording each user's outputs on the blockchain and synchronizing this information across the entire peer-to-peer (P2P) network, we enable the service provider to indirectly assess the quality of users' local models. This is achieved through multiple rounds of group aggregation of the user outputs recorded on the blockchain. Such an approach allows the service provider to evaluate the presence of data poisoning and free-riding attacks, thereby implementing a system for the detection of potential violators.
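One way to turn the blockchain-recorded group aggregates into per-user quality scores can be sketched as follows. The scoring rule (a user's score is the mean accuracy of every group they appeared in, across rounds with different random groupings) and all names are hypothetical stand-ins for the paper's detection statistic:

```python
from collections import defaultdict

def score_users(rounds: list[dict[frozenset, float]]) -> dict[str, float]:
    """Score each user by the mean accuracy of the group aggregates
    they belonged to, over all recorded rounds (illustrative rule)."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for round_accs in rounds:
        for group, acc in round_accs.items():
            for user in group:
                totals[user] += acc
                counts[user] += 1
    return {u: totals[u] / counts[u] for u in totals}

# Two rounds with different random groupings; "u3" drags its groups down.
rounds = [
    {frozenset({"u1", "u2"}): 0.95, frozenset({"u3", "u4"}): 0.60},
    {frozenset({"u1", "u3"}): 0.70, frozenset({"u2", "u4"}): 0.94},
]
scores = score_users(rounds)
# u3 appears in the two lowest-accuracy groups, so its score is lowest.
assert min(scores, key=scores.get) == "u3"
```

Because groupings change between rounds, a low-quality contributor depresses the accuracy of every group it joins, and intersecting those groups across rounds localizes the culprit without ever decrypting individual models.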
5.2. Experiment
In this section, we will conduct experimental evaluations on the blockchain-based fine-grained potential violator detection algorithm to demonstrate its effectiveness and performance in a blockchain network. This entails ensuring that (i) the algorithm can execute within the constraints of limited computing resources, and (ii) it can effectively detect potential violators.
Experimental setup: We employed eight CPU servers, each equipped with a 16-core Hygon C86 7159 processor (Hygon Information Technology Co., Ltd., Beijing, China), and two GPU servers, each outfitted with eight NVIDIA Tesla T4 GPUs (NVIDIA, Santa Clara, CA, USA) and 16 GB of memory. These servers were used to conduct training for the data owners.
Dataset: We conducted our tests on the MNIST and CIFAR-10 datasets. MNIST consists of grayscale images of handwritten digits curated by the United States National Institute of Standards and Technology; it is widely employed for training and testing in machine learning and serves as a benchmark in the field. CIFAR-10, by contrast, is a color image dataset depicting objects closer to everyday entities. Unlike MNIST, it introduces significant noise and variation in object proportions and features. This realistic complexity poses substantial challenges for recognition tasks and makes CIFAR-10 a valuable resource for assessing model performance under more demanding conditions.
Q1: Can our approach accurately detect data poisoning and free-riding attacks? Furthermore, can it pinpoint the identity of malicious users?
To validate the feasibility of our potential violator detection algorithm, we employ a Convolutional Neural Network (CNN) model for image classification on the MNIST and CIFAR-10 datasets and assess its accuracy. The CNN architecture for MNIST comprises two convolutional layers, two pooling layers, and two fully connected layers; for CIFAR-10, the CNN features eight convolutional layers, four pooling layers, and three fully connected layers. The aggregation algorithm for collaborative training is FedAvg, with a learning rate of 0.01 in each round. The batch size is set to B = 64, and the maximum number of communication rounds (epochs) is set to E = 50. In this experiment, potential violators are not excluded from the collaborative training process.
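The FedAvg aggregation step used here can be sketched as follows; the toy parameter shapes and the use of local sample counts as weights are illustrative assumptions:

```python
def fedavg(client_models: list[dict[str, list[float]]],
           weights: list[int]) -> dict[str, list[float]]:
    """Weighted average of client parameter dicts.
    `weights` stand in for each client's local sample count."""
    total = sum(weights)
    return {
        name: [sum(w * m[name][i] for m, w in zip(client_models, weights)) / total
               for i in range(len(client_models[0][name]))]
        for name in client_models[0]
    }

# Three clients, one toy layer each; the third holds twice as much data.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
global_model = fedavg(clients, weights=[1, 1, 2])
# (1*[1,2] + 1*[3,4] + 2*[5,6]) / 4 = [3.5, 4.5]
assert global_model["w"] == [3.5, 4.5]
```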
In each learning round, nine client machines perform local training, and the resulting models are aggregated by the service provider. We evaluated collaborative training accuracy in the absence of malicious users, covering the accuracies of the global model and of each group model on the MNIST and CIFAR-10 datasets. As shown in Figure 3, under sufficient training the accuracy of the group models closely approaches that of the global model, which serves as the baseline with no malicious users present. After 50 iterations, the global model achieved an accuracy of 98.63% on MNIST and 80.93% on CIFAR-10.
In our simulated experiments on data poisoning attacks, we randomly designate a device as the poison attacker (PAC). This device uploads encrypted local models containing the negated gradient information acquired during training, simulating a data poisoning attack. We assess the impact of the poisoning attack through group aggregation detection, recording the influence of the PAC on the accuracy of the global model and the group models.
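The simulated attack amounts to uploading an update computed from the negated gradient. A minimal sketch, with illustrative learning-rate and parameter values:

```python
def honest_update(weights: list[float], grads: list[float], lr: float = 0.01) -> list[float]:
    """Normal local SGD step: move against the gradient."""
    return [w - lr * g for w, g in zip(weights, grads)]

def poisoned_update(weights: list[float], grads: list[float], lr: float = 0.01) -> list[float]:
    """PAC-style poisoning as simulated in the paper: the uploaded model is
    computed from the negated gradient, pushing the aggregate away from
    the descent direction."""
    return [w + lr * g for w, g in zip(weights, grads)]

w = [0.5, -0.2]
g = [1.0, -2.0]
# The poisoned update mirrors the honest one around the current weights,
# so averaging the two cancels the honest client's progress entirely.
assert all(abs((h + p) - 2 * x) < 1e-12
           for h, p, x in zip(honest_update(w, g), poisoned_update(w, g), w))
```

This mirroring is why the PAC-influenced group aggregates fluctuate and fail to converge: the poisoned contribution systematically opposes the descent direction of the honest members.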
As shown in Figure 4a,b, the simulated data poisoning by the poison attacker (PAC) causes fluctuations in the accuracy of the global model on both the MNIST and CIFAR-10 datasets. These fluctuations hinder convergence and leave the global model with a significantly lower accuracy than the baseline. The impact of the PAC is further evident in the aggregation accuracies of the two groups it influences: these group models not only have lower accuracies than the global model but also show no signs of convergence. Consequently, the PAC receives lower scores in our user detection and is identified as a potential violator, leading to its exclusion from the collaborative training process. These results substantiate the effectiveness of our user detection algorithm in resisting data poisoning attacks and accurately pinpointing attackers within the collaborative training framework.
In our simulated experiments on free-riding attacks, we randomly designate a device as the free-riding attacker (FAC). This device does not perform regular local training and merely uploads zero-gradient parameters to the blockchain in each iteration to simulate a free-riding attack. Group aggregation detection is employed to assess the free-riding behavior, and we record the impact of the FAC on the accuracies of the global model and the group models.
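The free-riding behavior can be sketched as follows; the update-magnitude check is an illustrative heuristic for why such behavior is detectable at all, not the paper's actual group-aggregation statistic:

```python
def free_rider_update(global_weights: list[float]) -> list[float]:
    """FAC-style free riding as simulated: no local training is performed;
    the uploaded 'update' is zero-gradient parameters, i.e., the global
    model returned unchanged."""
    return list(global_weights)

def update_magnitude(old_w: list[float], new_w: list[float]) -> float:
    """Euclidean norm of the parameter change. A magnitude that stays near
    zero across many rounds is one simple tell-tale of free riding
    (illustrative heuristic only)."""
    return sum((n - o) ** 2 for o, n in zip(old_w, new_w)) ** 0.5

w = [0.3, -1.2, 0.7]
assert update_magnitude(w, free_rider_update(w)) == 0.0
```

Unlike a poisoned update, a zero update does not actively degrade the aggregate, which is consistent with the observation below that free riding is harder to detect and requires longer observation windows.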
As shown in Figure 4c,d, on the MNIST dataset the accuracies of the two groups influenced by the FAC are lower than those of the global model and the other groups during the early iterations, with significant fluctuations. Although the accuracies of the FAC-influenced groups gradually increase and approach that of the global model in later iterations, their accuracy remains less stable, with larger fluctuations than observed in the other groups. On the CIFAR-10 dataset, the model accuracies of the FAC-influenced groups do not differ significantly from those of the global model and the other groups in the early iterations; as iteration continues, however, the accuracies of these two groups become consistently lower than those of the global model and the other groups. Free-riding attacks may still converge as the iteration count grows, indicating that detecting free-riding attacks is more challenging than detecting data poisoning attacks. Nevertheless, through extended observation, the abnormal behavior and the underlying FAC can still be identified. This result demonstrates the effectiveness of our user detection algorithm in resisting users' free-riding attacks and accurately pinpointing attackers.
To validate the scalability of our proposed method and examine its ability to identify potential violators in the presence of multiple malicious users, we conducted an assessment with nine participants and without excluding potential violators. We performed four basic groupings per round with epoch = 50 and analyzed the changes in user scores and confidence intervals, as depicted in Figure 5.
In Figure 5, we observe that as the number of malicious users increases, the difference in scores between general users and poisoning or free-riding attackers decreases, and the overlap between the confidence intervals of their scores grows. However, even with three poisoning attackers and three free-riding attackers present, i.e., when two-thirds of the participants are malicious, the scores and confidence intervals of general users remain significantly higher than those of malicious users. This indicates that, in the collaborative training process, malicious users can be identified as potential violators and excluded, demonstrating good scalability.
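One plausible way to form such per-user score intervals is a normal-approximation confidence interval over a user's per-round scores; the z-value, the interval construction, and the sample scores below are all illustrative assumptions rather than the paper's exact procedure:

```python
import statistics

def score_interval(scores: list[float], z: float = 1.96) -> tuple[float, float]:
    """95% normal-approximation confidence interval for the mean of a
    user's per-round scores (illustrative construction)."""
    mean = statistics.mean(scores)
    half = z * statistics.stdev(scores) / len(scores) ** 0.5
    return (mean - half, mean + half)

benign = [0.92, 0.95, 0.90, 0.94, 0.93]
attacker = [0.55, 0.62, 0.50, 0.58, 0.60]
lo_benign, _ = score_interval(benign)
_, hi_attacker = score_interval(attacker)
# Non-overlapping intervals cleanly separate benign users from attackers.
assert lo_benign > hi_attacker
```

As more participants turn malicious, per-round scores become noisier and the intervals widen, which matches the growing overlap observed in Figure 5.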
Q2: Is our proposed method resistant to malicious attacks?
To validate the impact of our user detection algorithm on the accuracy of collaborative training, we compared the accuracy variations among three categories of collaborative training scenarios: collaborative training with benign users (no malicious behavior), collaborative training with malicious attacks and no user detection, and collaborative training with malicious attacks and user detection.
From Figure 6, it is evident that on the MNIST dataset the user detection algorithm enhances the stability of collaborative training processes that include users engaging in poisoning attacks, allowing the global model to reach a more stable state with higher accuracy. For collaborative training involving free-riding users, the user detection algorithm likewise increases model accuracy, bringing it close to that of collaborative training without malicious users. On the CIFAR-10 dataset, our user detection algorithm improves both the accuracy and the convergence speed of the global model in collaborative training scenarios with malicious users. This indicates that our proposed user detection algorithm is effective in identifying potential violators during collaborative training, enhancing both the convergence speed and the final model accuracy of the entire process.
Q3: Does our proposed approach perform efficiently when detecting potential violators over ciphertexts?
In our user detection algorithm, secure multi-party computation based on Shamir's secret sharing and multi-round grouping localization play a crucial role. In this section, we assess their feasibility by examining the encryption and decryption overhead, as well as the communication overhead for users under different numbers of groups. As a baseline, we employ the BCP algorithm [6] for encrypting and decrypting model parameters.
To validate the effectiveness of secure multi-party computation based on Shamir’s secret sharing, we evaluated its overhead. Since the model decryption process aligns with the normal model aggregation process, we have omitted the decryption process and only calculated the overhead incurred by key negotiation and computation during encryption, along with the necessary communication.
The overhead is evaluated over a number of encryption and decryption iterations. As shown in Table 1, the experimental results demonstrate that the BCP algorithm requires 96.35 s for encryption and 175.19 s for decryption over 20 iterations. In contrast, our proposed method incurs no additional overhead during decryption, and the overhead for 20 encryption iterations is only 5.06 s, significantly lower than that of the BCP algorithm.
This is because the only complex computation in our method lies in the key negotiation phase based on Shamir's secret sharing; the computation in the model encryption phase is straightforward, resulting in a lower overall computational load. Additionally, the user detection algorithm constrains the scale of key negotiation within each group, which keeps the communication overhead of the key negotiation process low. As a result, our proposed method incurs a significantly lower overhead than the BCP algorithm.
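The asymmetry behind these numbers (an expensive negotiation once, then cheap per-round encryption) can be illustrated with a key-derived additive mask. The mask construction below is an assumption, since the paper does not specify how the negotiated group key is applied to the model parameters:

```python
import random

def mask_from_key(key: int, length: int) -> list[float]:
    """Derive a pseudorandom mask from the negotiated group key
    (illustrative construction, not the paper's)."""
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]

def encrypt(params: list[float], key: int) -> list[float]:
    """After key negotiation, encryption is a single element-wise addition,
    far cheaper per round than BCP's modular exponentiations. Note the
    ciphertext has the same size as the plaintext."""
    return [p + m for p, m in zip(params, mask_from_key(key, len(params)))]

def decrypt(cipher: list[float], key: int) -> list[float]:
    return [c - m for c, m in zip(cipher, mask_from_key(key, len(cipher)))]

w = [0.1, -0.5, 2.0]
restored = decrypt(encrypt(w, key=123456789), key=123456789)
assert all(abs(a - b) < 1e-12 for a, b in zip(restored, w))
```

The non-expanding ciphertext in this sketch also previews the communication result below: plaintext and ciphertext occupy the same number of bytes.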
To validate the effectiveness of multiple rounds of grouping localization, we evaluated its communication overhead by measuring the data transmitted when uploading group models via the blockchain, together with the communication required during the encryption process.
The communication overhead was assessed across different numbers of groups. As shown in Table 2, the experimental results indicate that the BCP-based encryption algorithm requires the transmission of 170.23 MB of data over 20 rounds of group model uploads, whereas our proposed method requires only 41.94 MB. This is because our grouping encryption method does not expand messages during encryption, so plaintext and ciphertext have the same size. However, the BCP algorithm uploads only one group model per training round, whereas our method uploads multiple groups; in practice, too many groups in one training round can therefore lead to significant communication overhead.
Furthermore, our proposed method requires only one blockchain, whereas the BCP-based scheme requires two. All content uploaded to our blockchain is encrypted, and no key-related material is uploaded. Since blockchain content is readable by all blockchain nodes, this avoids the risk that a single node simultaneously participating in two blockchains could access both keys and ciphertexts and decrypt them to obtain the original local models.