Communication-Efficient and Private Federated Learning with Adaptive Sparsity-Based Pruning on Edge Computing
Abstract
1. Introduction
Can model pruning and differential privacy be treated as orthogonal techniques? How does pruning affect the calibration of the privacy noise?
- (1)
- We propose a communication-efficient federated learning framework that trains a high-accuracy classification model with a privacy guarantee. The framework jointly considers the dynamic balance between communication efficiency and privacy under non-IID data distributions while maintaining good model performance.
- (2)
- To enhance communication efficiency, we propose an adaptive channel pruning mechanism that uses a Jensen-Shannon (JS) divergence feature to generate the sparsity matrix, reducing the number of communication rounds required to reach equivalent performance by more than a factor of two.
- (3)
- To ensure the privacy guarantee, we estimate the sensitivity after pruning and add Gaussian noise to the aggregated model. In addition, we give a lower bound on the noise variance for adaptive collaborative training. Our method achieves higher model accuracy (82.42%) and a 40% improvement in communication rounds over the baseline methods.
2. Related Works
2.1. Differential Privacy for Federated Learning
2.2. Model Pruning
3. Our Methodology: ASPFL Framework
3.1. Overview and Problem Statement
3.2. ASPFL Algorithm
- (1)
- Global model initialization. The server first initializes the global model parameters (Line 1 in Algorithm 1) and distributes them to each client.
- (2)
- Local model update. In global round t, K clients are randomly selected to participate in local training. Each selected client first initializes its local model parameters with the global model parameters and then performs L steps of local training to update them. We design a gradient clipping process that incorporates a tailoring factor into the stochastic gradient descent (SGD) optimizer to obtain the local model; the tailoring factor controls the magnitude of parameter updates (see the client-side sketch following this list).
- (3)
- Sparsity-based channel pruning. In round t, after finishing L steps of local training and obtaining a new local model, each client reduces the communication overhead of uploading parameters by applying an adaptive sparse matrix that discards the less significant parts of the model update. The sparse matrix, composed of the elements “1” and “0”, is allocated to the client; taking the Hadamard product of the model update with this matrix zeroes out the less important weight parameters. The client combines the accumulated gradients with the mask and uploads the resulting sparse update (as shown in the sketch following this list).
- (4)
- Model aggregation and noise perturbation. The server first receives the model updates from the participating clients and computes their weighted average, yielding a preliminary version of the aggregated global model. To strengthen privacy preservation via differential privacy, the server then adds adaptive Gaussian noise to the preliminary global model, with the noise variance determined by the desired privacy budget (ε and δ) and the sensitivity (a sketch of the server-side aggregation follows Algorithm 1).
- (5)
- Global model broadcasting. Upon completing the global model update in round t, the server broadcasts the newly generated global model to all participating clients, thereby initiating the next round of local training.
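To make steps (2) and (3) concrete, the following is a minimal PyTorch-style sketch of the client side. It assumes the tailoring factor acts as an L2-norm clipping threshold on the stochastic gradient and that the sparse matrix is stored as per-parameter 0/1 tensors keyed like the model's state_dict; the function names (local_update, sparse_update) and the exact clipping rule are illustrative assumptions, not the authors' implementation.

```python
import torch

def local_update(model, loader, gamma, lr, local_steps):
    """Local SGD with gradient clipping by the tailoring factor (sketch)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for step, (x, y) in enumerate(loader):
        if step >= local_steps:
            break
        opt.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        # Assumed clipping rule: bound the L2 norm of the gradient by gamma.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=gamma)
        opt.step()
    return model

def sparse_update(model, global_params, mask):
    """Hadamard product of the accumulated update with a 0/1 mask (sketch)."""
    update = {}
    for name, param in model.state_dict().items():
        # Zero out the less important parameters before uploading.
        update[name] = (param - global_params[name]) * mask[name]
    return update
```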
Algorithm 1 ASPFL
Initialization: tailoring threshold γ, local models, local datasets, privacy configuration (ε, δ), etc.
1  for global round t = 1, 2, …, T do
2    for each participating client i in parallel do ※client side
3      for each local step l = 1, 2, …, L do
4        Clip the local gradient with the tailoring threshold;
5        Update the local model by one SGD step;
6      end
7      Quantize the JS distribution feature;
8      Generate the mask matrix;
9      Conduct sparsity pruning on the accumulated model update;
10     Upload the sparse model update to the server;
11   end
12   Aggregate the updates and calculate the sensitivity; ※server side
13   Add Gaussian noise to the aggregated global model;
14   Download the global model; ※client side
15 end
Output: the global model.
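The server side (lines 12–13 of Algorithm 1) can be sketched as follows. The weighting of updates by local dataset size and the calibration of the noise standard deviation as sigma * sensitivity are plausible illustrative choices rather than the exact rule used in ASPFL, and floating-point parameter tensors are assumed throughout.

```python
import torch

def aggregate_with_noise(global_params, client_updates, client_sizes,
                         sensitivity, sigma):
    """Weighted aggregation of sparse updates plus Gaussian noise (sketch)."""
    total = float(sum(client_sizes))
    new_params = {}
    for name, w in global_params.items():
        # Weighted average of the pruned client updates.
        avg_update = sum((n / total) * upd[name]
                         for upd, n in zip(client_updates, client_sizes))
        # Gaussian perturbation with std proportional to the sensitivity.
        noise = sigma * sensitivity * torch.randn(w.shape)
        new_params[name] = w + avg_update + noise
    return new_params
```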
3.3. Adaptive Sparsity-Based Channel Pruning
- First, set a fixed value;
- Then, specify the layer-preserving rate (LPR), which follows a Bernoulli(p) distribution;
- Finally, prune the cumulative gradients according to the LPR.
- Extraction of Label Distributions: first, the label distribution of each client's local dataset is extracted and quantified.
- Computation of JS Divergence: the JS divergence between the client's label distribution and a predefined uniform distribution is computed to quantify their similarity. Since the JS divergence lies in [0, ln 2] (with the natural logarithm), we apply a linear transformation to this feature (see the sketch following this list).
- Normalization and Weighting: the resulting feature is normalized to ensure comparability across different clients.
- Matrix Construction: the sparse matrix for each client is sampled from a Bernoulli distribution whose parameter is derived from the normalized feature. This matrix reflects the unique statistical characteristics of the client's data, particularly the sparsity patterns that emerge from the label distribution disparities.
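A minimal sketch of this mask construction is given below. The rescaling of the JS divergence by ln 2 and the mapping from the normalized feature to the Bernoulli keep probability (here, more skewed label distributions keep fewer channels) are illustrative assumptions about the adaptive rule; the function names, base_rate parameter, and mask shape are hypothetical.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)   # lies in [0, ln 2]

def sparsity_mask(labels, num_classes, shape, base_rate=0.5, rng=None):
    """Sample a 0/1 mask whose keep probability is adapted by the JS feature."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(labels, minlength=num_classes)
    label_dist = counts / counts.sum()
    uniform = np.full(num_classes, 1.0 / num_classes)
    # Linear transformation mapping the divergence into [0, 1].
    feature = js_divergence(label_dist, uniform) / np.log(2)
    # Illustrative adaptive rule: clients with more skewed (non-IID) label
    # distributions keep a smaller fraction of channels.
    keep_prob = float(np.clip(base_rate * (1.0 - feature), 0.0, 1.0))
    return (rng.random(shape) < keep_prob).astype(np.float32)
```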
3.4. Privacy Guarantee after Pruning
- A.
- L2-norm Sensitivity
- B.
- Privacy Guarantee in Each Round
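As a reference point for this section, the classical Gaussian mechanism calibrates the per-round noise to the L2 sensitivity of the (pruned) aggregate and the per-round budget (ε, δ). The symbols below for the preliminary aggregate, the noisy global model, and the sensitivity are illustrative stand-ins, and the bound shown is the standard one for ε ∈ (0, 1) rather than the paper's adaptive lower bound on the variance.

```latex
\tilde{w}^{\,t+1} \;=\; \bar{w}^{\,t+1} + \mathcal{N}\!\left(0,\, \sigma^{2}\mathbf{I}\right),
\qquad
\sigma \;\ge\; \frac{\Delta s \,\sqrt{2\ln(1.25/\delta)}}{\varepsilon},
\qquad \varepsilon \in (0, 1).
```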
3.5. Computational Complexity Analysis
4. Results and Discussion
4.1. Experimental Settings
4.2. Pruning Performance Comparison
4.3. Privacy Performance Comparison
4.4. Impact on the Number of Participants
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Symbol | Descriptions |
|---|---|
| N | total number of clients |
| T | total number of global epochs |
| t | the index of the t-th global epoch |
| L | number of local epochs for clients in a round |
| l | the index of the l-th local epoch |
|  | gradient of the model |
|  | local dataset of the i-th client |
| B | batch size |
|  | set of participating clients |
|  | local model parameters of the i-th client |
|  | weight matrices of the i-th client |
|  | size of set |
|  | global privacy budget |
|  | the noise level |
|  | the sensitivity of the query function |
|  | model update of the i-th client |
|  | the sparse matrix |
| γ | tailoring factor |