LFL-COBC: Lightweight Federated Learning on Blockchain-Based Device Contribution Allocation †
Abstract
1. Introduction
- This paper presents LFL-COBC, a privacy protection scheme for industrial environments that integrates blockchain and federated learning, easing the central server's aggregation of local models by pruning the models trained by node devices.
- To enhance data privacy and alleviate the blockchain's burden in the industrial Internet environment, a lightweight federated learning model for device anomaly detection is proposed. Since a global model trained on imbalanced data struggles to generalize to each client's data, we make both global and local adjustments to adapt the model to each local data set.
- We further update the blockchain incentive mechanism to achieve distributed multi-party data sharing, narrow the gap between local and global models, and verify data computations, thereby enhancing the fairness and effectiveness of each node device's operation.
2. Paper Organization
3. Lightweight Federated Learning on Blockchain-Based Device Contribution Allocation
3.1. System Model
3.2. The Constituents of the Overall Architecture
- Local devices: The local industrial equipment trains the model based on the collected data and transmits the trained model to the edge cloud.
- Edge cloud: The edge cloud uploads models from local devices to the blockchain, selects the committee through an incentive-based consensus mechanism, designates the highest-scoring member to aggregate the local devices' models, and updates the global model.
- Blockchain: The blockchain is a trusted database that establishes a secure connection between terminal computing devices through its encrypted records. It can use digital encryption and timestamping methods to realize peer-to-peer trustless transactions [19].
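The trusted-database role of the blockchain described above rests on hash chaining and timestamping. The following is a minimal illustrative sketch, not the paper's on-chain data structure (which is described in Section 4.1); all function names here are hypothetical:

```python
import hashlib
import json
import time

def make_block(payload: dict, prev_hash: str) -> dict:
    """Create a timestamped block whose hash covers both the payload
    and the previous block's hash, linking the chain together."""
    block = {
        "timestamp": time.time(),
        "payload": payload,
        "prev_hash": prev_hash,
    }
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

def verify_chain(chain: list) -> bool:
    """Recompute every hash and check the prev_hash links;
    tampering with any recorded payload breaks verification."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Because each hash also covers the previous block's hash, altering any stored record invalidates every later block, which is what makes peer-to-peer trustless exchange possible.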
Algorithm 1 Proposed blockchain-based lightweight federated learning approach
- Node registration. All computing devices need to be registered on the blockchain before joining the network. This can be achieved by a decentralized registration mechanism that uses smart contracts to verify the identity of devices and issue unique identifiers.
- Node incentives. Under the PODQ incentive policy, each node device is scored regularly, and committee members are elected according to these scores. Rewards are then allocated automatically through smart contracts, encouraging nodes to participate actively in network maintenance and data processing.
- Block generation and verification. Each node device mines blocks on the blockchain through the PODQ mechanism. After a block is generated, it must be verified by other nodes in the network; Byzantine Fault Tolerance (BFT) can be applied to ensure that more than 2/3 of the nodes reach consensus on a new block. Once sufficiently verified, the block is added to the blockchain, ensuring the immutability and security of the data.
- Model upload. After each node device is scored on its mining ability, the quality of its local data, and its work efficiency, it either trains a model on its local data or cooperates with other nodes in the federated learning process to train a shared model. Based on its score and computing power, each node chooses to upload its own model or to aggregate the models of other nodes. Uploaded models are verified by the network, via smart contracts, to ensure their validity and security; models that pass verification are added to the blockchain for use by other nodes or for further training.
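The scoring, committee-election, and verification steps above can be sketched as follows. The weighted-sum score and all names here are illustrative assumptions; the paper's actual PODQ formula is defined in Section 4.2:

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    mining_ability: float   # the three PODQ factors named above
    data_quality: float
    work_efficiency: float
    score: float = 0.0

def podq_score(n: Node, w=(1.0, 1.0, 1.0)) -> float:
    # Illustrative weighted sum; the paper's exact formula may differ.
    return w[0] * n.mining_ability + w[1] * n.data_quality + w[2] * n.work_efficiency

def elect_committee(nodes, k=3):
    """Score every node, then elect the k highest-scoring as the committee."""
    for n in nodes:
        n.score = podq_score(n)
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:k]

def block_accepted(votes_for: int, total_nodes: int) -> bool:
    # BFT rule from the text: strictly more than 2/3 of nodes must agree.
    return votes_for > 2 * total_nodes / 3
```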
3.3. Attack Model
- Data tampering: An attacker could manipulate certain samples or features within the data set or might inject counterfeit data samples into the federated learning model to disrupt the training and prediction process of the model.
- Malicious model update: An attacker may modify model parameters or updates to produce erroneous model output results. They can do this by injecting malicious code into the model updates.
- Collusion attack: The attacker may collude with other participants, jointly tampering with or falsifying data to increase the success rate of the attack. This type of attack can affect the results of federated learning more severely.
4. Blockchain-Enabled Federated Learning Approach for Industrial Equipment Inspection
4.1. Data Structure Design for Blockchain
4.2. Incentive: Certificate of Training Quality and Data Management (PODQ)
4.3. Federated Learning
5. Real-Time State Anomaly Detection of Industrial Equipment
5.1. Industrial Equipment Anomaly Detection
5.2. Lightweight Model Pruning
- Drop-Out: As shown on the left of Figure 5, the output of each neuron is randomly set to zero during training.
- Drop-Connect: During training (right of Figure 5), solid lines denote retained input connections and dotted lines with a red x denote dropped, redundant connections. Instead of randomly setting a hidden node's output to 0, Drop-Connect sets individual input weights of the node to 0, while retained input connections keep a mask value of 1.
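The distinction between the two schemes can be illustrated with a small pure-Python sketch (unscaled masks for clarity; the function names are hypothetical):

```python
import random

def dropout_forward(x, p=0.5, rng=None):
    """Drop-Out: each neuron's *output* is zeroed with probability p."""
    rng = rng or random.Random(0)
    return [0.0 if rng.random() < p else xi for xi in x]

def dropconnect_forward(x, w, p=0.5, rng=None):
    """Drop-Connect: individual input *weights* are zeroed instead of
    whole outputs; retained connections keep their original weights."""
    rng = rng or random.Random(0)
    out = []
    for row in w:  # one weight row per output neuron
        masked = [0.0 if rng.random() < p else wi for wi in row]
        out.append(sum(mi * xi for mi, xi in zip(masked, x)))
    return out
```

With Drop-Out the mask is applied per neuron, so a dropped unit contributes nothing to any downstream node; with Drop-Connect the mask is applied per connection, so each downstream node sees a different random subset of inputs.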
- Pre-training. The network model is first trained to construct a baseline model for the pruning algorithm to obtain the original model trained on the specific underlying task.
- Pruning. The magnitude of the weight values is ranked, and connections below a preset pruning threshold or ratio are removed to obtain the pruned network.
- Fine-tuning. Fine-tune the pruned network to recover lost performance, then return to the pruning step, alternating in this order until the termination condition is satisfied, e.g., the accuracy drop stays within a preset range.
Algorithm 2 Model pruning algorithm
Input: Trained model, pruning threshold
Output: Pruned model
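A minimal sketch of the prune/fine-tune loop described above. The `train` and `evaluate` callables stand in for the task-specific fine-tuning and validation steps, which are assumptions here, as are all function names:

```python
def prune_by_magnitude(weights, threshold):
    """Magnitude pruning: connections whose absolute weight falls
    below the threshold are removed (set to zero)."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def iterative_prune(train, evaluate, weights, threshold,
                    max_rounds=5, min_acc=0.9):
    """Alternate prune -> fine-tune, stopping when validation accuracy
    drops below min_acc or the round budget is exhausted."""
    for _ in range(max_rounds):
        weights = prune_by_magnitude(weights, threshold)
        weights = train(weights)          # fine-tune to recover performance
        if evaluate(weights) < min_acc:   # termination condition
            break
    return weights
```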
6. Experiment and Evaluation
6.1. Data Set
- HDFS [26]: HDFS (Hadoop Distributed File System) is a distributed file system that runs on general-purpose hardware and underpins Hadoop, a big data processing framework. There are three groups of HDFS logs: HDFS v1, HDFS v2, and HDFS v3. HDFS v1 was generated with a benchmark workload on a 203-node HDFS deployment and labeled via handcrafted rules that identify anomalies; it also provides specific anomaly-type information, which allows us to study duplicate problem identification. HDFS v2 was collected by aggregating logs from an HDFS cluster in our laboratory environment, comprising one name node and 32 data nodes. HDFS v3 is an open data set from trace-oriented monitoring, collected by instrumenting HDFS systems with MTracer 2.1 in a real IaaS environment.
- BGL [26]: BGL is an open log data set gathered from the BlueGene/L supercomputer system, which has 131,072 processors and 32,768 GB of memory and is located at Lawrence Livermore National Laboratory (LLNL), Livermore, California [27]. The log contains both alert and non-alert messages, identified by alert category labels: in the first column of the log, a "-" indicates a non-alert message, whereas any other value signifies an alert message. This label information supports studies on alert detection and prediction.
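The BGL label convention above can be read off with a one-line parser. This is a sketch: only the first column is interpreted, and the layout of the remaining fields is left as an opaque string:

```python
def parse_bgl_line(line: str):
    """Per the BGL convention: the first column is '-' for non-alert
    messages; any other value is the alert category label."""
    label, _, rest = line.partition(" ")
    if label == "-":
        return False, None, rest   # non-alert message
    return True, label, rest       # alert message with its category
```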
6.2. Security Analysis
- Data tampering: Malicious participants may attempt to obtain sensitive data of other participants, violating users' privacy rights. In this paper, we adopt federated learning techniques, which perform computation without exposing raw data, protecting data privacy while preserving the accuracy of experimental results.
- Malicious model update: Attackers may target communication channels to steal data or inject forged data into transmissions. In this experiment, we use blockchain technology: data can be stored on the blockchain only after more than 2/3 of the clients pass its verification test, which ensures the integrity and security of the data in transit.
- Collusion attack: Malicious participants in blockchain-federated learning experiments may try to corrupt the results or obtain sensitive information about other participants. However, we use a PODQ-based incentive mechanism for committee election, ensuring that only legitimate participants with the highest scores can take part, which safeguards the reliability of the experiment.
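The two defenses against tampered model updates described above, a hash-integrity check on the transmitted model and the 2/3 verification quorum, can be sketched together (the function names and the combined check are illustrative assumptions, not the paper's exact protocol):

```python
import hashlib

def model_digest(model_bytes: bytes) -> str:
    """Digest recorded on-chain when the model is uploaded."""
    return hashlib.sha256(model_bytes).hexdigest()

def accept_model(model_bytes: bytes, onchain_digest: str,
                 approvals: int, clients: int) -> bool:
    """Accept a transmitted model only if (a) its hash matches the
    on-chain digest (no tampering in transit) and (b) strictly more
    than 2/3 of clients passed its verification test."""
    untampered = model_digest(model_bytes) == onchain_digest
    quorum = approvals * 3 > clients * 2
    return untampered and quorum
```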
6.3. The Performance of the Proposed Scheme
6.3.1. Comparison and Evaluation of Incentive Mechanisms
6.3.2. Model Pruning Evaluation
- Comparison of aggregation times before and after model pruning. We compared the aggregation time of the models before and after pruning. As Figure 9 shows, aggregation is much faster after pruning; at the same time, model size was reduced by 87%, improving the efficiency and resource utilization of the overall system architecture.
- Comparison of node device work efficiency before and after model pruning. To investigate the effect of model pruning on the training results of node devices, we compared the training results of the virtual machines before and after pruning, as shown in Figure 10. Lightweighting had almost no effect on model accuracy, and the loss rate of the node devices was lower after pruning. These results suggest that the original model is over-parameterized and that lightweight model operations are essential.
7. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, Y.; Poskitt, C.M.; Sun, J. Learning from Mutants: Using Code Mutation to Learn and Monitor Invariants of a Cyber-Physical System. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–23 May 2018; pp. 648–660.
- Yeom, S.; Giacomelli, I.; Fredrikson, M.; Jha, S. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. In Proceedings of the 2018 IEEE 31st Computer Security Foundations Symposium (CSF), Oxford, UK, 9–12 July 2018; pp. 268–282.
- Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366.
- Qu, X.; Wang, S.; Hu, Q.; Cheng, X. Proof of federated learning: A novel energy-recycling consensus algorithm. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2074–2085.
- Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. A Survey of Model Compression and Acceleration for Deep Neural Networks. arXiv 2017, arXiv:1710.09282.
- Guo, Y.; Wu, Y.; Zhu, Y.; Yang, B.; Han, C. Anomaly Detection using Distributed Log Data: A Lightweight Federated Learning Approach. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Virtual, 18–22 July 2021; pp. 1–8.
- Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl. Based Syst. 2021, 216, 106775.
- Luo, J.; Wu, J. An Entropy-based Pruning Method for CNN Compression. arXiv 2017, arXiv:1706.05791.
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149.
- Lu, Y.; Huang, X.; Dai, Y.; Maharjan, S.; Zhang, Y. Blockchain and federated learning for privacy-preserved data sharing in industrial IoT. IEEE Trans. Ind. Inform. 2019, 16, 4177–4186.
- Singh, S.K.; Yang, L.T.; Park, J.H. FusionFedBlock: Fusion of blockchain and federated learning to preserve privacy in industry 5.0. Inf. Fusion 2023, 90, 233–240.
- Li, Y.; Chen, C.; Liu, N.; Huang, H.; Zheng, Z.; Yan, Q. A Blockchain-Based Decentralized Federated Learning Framework with Committee Consensus. IEEE Netw. 2021, 35, 234–241.
- Ali, S.; Li, Q.; Yousafzai, A. Blockchain and federated learning-based intrusion detection approaches for edge-enabled industrial IoT networks: A survey. Ad Hoc Netw. 2024, 152, 103320.
- Kim, H.; Park, J.; Bennis, M.; Kim, S.L. Blockchained On-Device Federated Learning. IEEE Commun. Lett. 2020, 24, 1279–1283.
- He, Y.; Li, H.; Cheng, X.; Liu, Y.; Yang, C.; Sun, L. A blockchain based truthful incentive mechanism for distributed P2P applications. IEEE Access 2018, 6, 27324–27335.
- Han, R.; Yan, Z.; Liang, X.; Yang, L.T. How Can Incentive Mechanisms and Blockchain Benefit with Each Other? A Survey. ACM Comput. Surv. 2022, 55, 1–38.
- Li, Q.; Sun, Y.; Xi, N. LFL-COBC: Lightweight Federated Learning On Blockchain-based Device Contribution Allocation. In Proceedings of the 2024 International Conference on Networking and Network Applications (NaNA), Yinchuan, China, 9–12 August 2024; pp. 1–7.
- Hewa, T.M.; Hu, Y.; Liyanage, M.; Kanhare, S.S.; Ylianttila, M. Survey on blockchain-based smart contracts: Technical aspects and future research. IEEE Access 2021, 9, 87643–87662.
- Xu, S.; Liu, S.; He, G. A Method of Federated Learning Based on Blockchain. In Proceedings of the 5th International Conference on Computer Science and Application Engineering, Sanya, China, 19–21 October 2021.
- Mohammed, M.A.; Lakhan, A.; Abdulkareem, K.H.; Khanapi Abd Ghani, M.; Abdulameer Marhoon, H.; Nedoma, J.; Martinek, R. Multi-objectives reinforcement federated learning blockchain enabled Internet of things and Fog-Cloud infrastructure for transport data. Heliyon 2023, 9, e21639.
- Nguyen, D.C.; Ding, M.; Pham, Q.V.; Pathirana, P.N.; Le, L.B.; Seneviratne, A.; Li, J.; Niyato, D.; Poor, H.V. Federated learning meets blockchain in edge computing: Opportunities and challenges. IEEE Internet Things J. 2021, 8, 12806–12825.
- Xu, W.; Huang, L.; Fox, A.; Patterson, D.; Jordan, M.I. Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, MT, USA, 11–14 October 2009; pp. 117–132.
- Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318.
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210.
- Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to backdoor federated learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020; pp. 2938–2948.
- He, S.; Zhu, J.; He, P.; Lyu, M.R. Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics. arXiv 2020, arXiv:2008.06448.
- Oliner, A.; Stearley, J. What Supercomputers Say: A Study of Five System Logs. In Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Edinburgh, UK, 25–28 June 2007.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, Q.; Sun, Y.; Gao, K.; Xi, N.; Zhou, X.; Wang, M.; Fan, K. LFL-COBC: Lightweight Federated Learning on Blockchain-Based Device Contribution Allocation. Electronics 2024, 13, 4395. https://doi.org/10.3390/electronics13224395