Towards Federated Learning with Byzantine-Robust Client Weighting
Featured Application
Abstract
1. Introduction
2. Materials and Methods
2.1. Problem Setup
2.1.1. Optimization Goal
2.1.2. Collaboration Model
2.1.3. Federated Learning Meta Algorithm
- Preprocessing: receives the possibly Byzantine client-declared sample sizes from the clients and produces secure estimates of them. To the best of our knowledge, previous works ignore this procedure and assume that the declared sizes are correct.
- Client update: the per-client computation. In FedAvg, this corresponds to a few local mini-batch SGD rounds. See Algorithm 2 for the pseudocode.
- Aggregation: the server's strategy for updating w. In FedAvg, this corresponds to the weighted arithmetic mean of the client models, weighted by the (preprocessed) declared sample sizes.
Algorithm 1 Federated Learning Meta-Algorithm. Given procedures: preprocessing, client update, and aggregation.
Algorithm 2 FedAvg. Hyperparameters: learning rate, number of epochs (E), and batch size (B).
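To make the three procedures concrete, the sketch below runs a few rounds of the meta-algorithm on a toy linear-regression task. All names (grad, client_update, aggregate, federated_round) and the identity preprocessing in the usage example are illustrative assumptions rather than the implementation evaluated in this paper; a Byzantine-robust preprocessing procedure would replace the identity function.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the mean squared error of a linear model y ≈ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def client_update(w, X, y, lr=0.05, epochs=1, batch_size=32):
    """Per-client computation: a few rounds of local mini-batch SGD starting from the global model."""
    w = w.copy()
    for _ in range(epochs):
        order = np.random.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            w -= lr * grad(w, X[idx], y[idx])
    return w

def aggregate(models, weights):
    """Server update: weighted arithmetic mean of the client models."""
    p = np.asarray(weights, dtype=float)
    p /= p.sum()
    return sum(pi * m for pi, m in zip(p, models))

def federated_round(w, clients, declared_sizes, preprocess):
    """One round of the meta-algorithm: secure the declared sizes, update clients, aggregate."""
    secured = preprocess(declared_sizes)  # Byzantine-robust estimates of the declared sample sizes
    models = [client_update(w, X, y) for X, y in clients]
    return aggregate(models, secured)

# Toy usage: three honest clients; the preprocessing here simply trusts the declared sizes.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in (40, 80, 120)]
w = np.zeros(3)
for _ in range(5):
    w = federated_round(w, clients, [40, 80, 120], preprocess=lambda sizes: sizes)
```

Because each client's aggregation weight is proportional to its declared sample size, a single Byzantine client can dominate the average simply by declaring an inflated size, which is what the preprocessing procedure is meant to prevent.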
2.2. Preprocessing Client-Declared Sample Sizes
2.2.1. Preliminaries
2.2.2. Truncating the Values of N
Finding the Truncation Threshold Given a Maximal Weight Proportion Bound
Optimality
The Trade-Off between the Truncation Threshold and the Maximal Weight Proportion
Algorithm 3 Plot Trade-Off Pairs. Nested loops iterate over candidate values, repeatedly solving Equation (5) and outputting the resulting pairs.
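As an illustration of the trade-off, the sketch below truncates the declared sizes at a threshold and reports the resulting maximal weight proportion (MWP). The definition used here (largest truncated size divided by the sum of truncated sizes) and the brute-force sweep over thresholds are simplifying assumptions for illustration; Algorithm 3 instead solves Equation (5), which is not reproduced in this sketch.

```python
import numpy as np

def mwp(sizes):
    """Maximal weight proportion: the largest share any single client receives
    when weights are proportional to the given sample sizes."""
    sizes = np.asarray(sizes, dtype=float)
    return sizes.max() / sizes.sum()

def truncate(sizes, threshold):
    """Truncate every declared sample size at the given threshold."""
    return np.minimum(np.asarray(sizes, dtype=float), threshold)

def tradeoff_pairs(sizes, thresholds):
    """For each candidate threshold, report the MWP obtained after truncation."""
    return [(U, mwp(truncate(sizes, U))) for U in thresholds]

# Example: one client declares an implausibly large sample size.
declared = [100, 120, 90, 10_000]
for U, alpha in tradeoff_pairs(declared, [100, 500, 1_000, 10_000]):
    print(f"threshold = {U:>6}: MWP = {alpha:.2f}")
```

Lowering the threshold tightens the bound on any single client's influence but also down-weights genuinely large honest clients, which is the trade-off this subsection examines.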
2.2.3. Truncation Given a Partial View of N
2.2.4. Convergence Analysis
Objective Function Skewness Bound
Convergence Guarantees
3. Experimental Evaluation
3.1. Experimental Setup
3.1.1. The Machine Learning Tasks and Models
Shakespeare: Next-Character-Prediction Partitioned by Speaker
MNIST: Digit Recognition with Synthetic Client Partitioning
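For a rough picture of what a synthetic client partitioning could look like, the sketch below assigns MNIST-sized index sets to clients with heavily skewed per-client sample counts. The log-normal size distribution, the client count, and the name synthetic_partition are hypothetical choices made only for illustration; they are not the partitioning parameters used in the experiments.

```python
import numpy as np

def synthetic_partition(num_samples, num_clients, rng, sigma=1.5):
    """Randomly assign sample indices to clients so that per-client sample sizes
    are heavily skewed (a hypothetical synthetic partitioning scheme)."""
    raw = rng.lognormal(mean=0.0, sigma=sigma, size=num_clients)
    probs = raw / raw.sum()                                   # skewed share of the data per client
    assignment = rng.choice(num_clients, size=num_samples, p=probs)
    return [np.where(assignment == c)[0] for c in range(num_clients)]

rng = np.random.default_rng(0)
parts = synthetic_partition(60_000, num_clients=100, rng=rng)  # 60,000 = MNIST training set size
print(sorted(len(p) for p in parts)[-5:])                      # the largest clients hold many samples
```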
3.1.2. The Server
3.1.3. The Clients and Attackers
3.2. Experimental Results
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
- FL: Federated Learning
- ERM: Empirical Risk Minimization
- FedAvg: Federated Averaging
- SGD: Stochastic Gradient Descent
- MWP: Maximal Weight Proportion
- trunc: truncation
Appendix A. Proofs
Appendix A.1. Truncation Optimality Proof
Appendix A.2. Truncation Given a Partial View of N Proof
- : The collection of largest values in .
- : The sum of all elements in .
Appendix B. MNIST Experimental Results
Appendix C. Convergence of FedAvg in the Absence of Byzantine Clients
Appendix D. Proof of Lemma 1