1. Introduction
Federated learning (FL) is an innovative approach to decentralized machine learning, particularly relevant in an era in which mobile devices gather vast amounts of user data. As highlighted by McMahan, these devices can significantly enhance the user experience by using such data to train models, for example, improving speech recognition or photo selection. However, because the data are both voluminous and privacy-sensitive, central storage and traditional training methods are often impractical. FL proposes a paradigm whereby the data remain on the device and a shared model is developed by aggregating locally computed updates. This method effectively handles unbalanced and non-IID data distributions while reducing communication costs, making it an innovative solution in machine learning.
The core of FL is a framework that allows devices or nodes, such as computers and mobile phones, to learn autonomously from their locally stored data. Instead of sending raw data, they perform on-site training and share only model updates with a central server. The server aggregates these updates to refine a global model, which is then shared back with the devices for further improvement.
Federated learning is a major advance over traditional machine learning, in which data had to be centralized: model training now happens directly at the data source, preserving privacy and reducing transfer costs. FL is not only theoretical; it is enabling advances in industries like healthcare and industrial IoT, where data privacy and bandwidth limitations are critical. However, with great innovation, new challenges emerge. FL confronts privacy issues directly, but it also raises questions about model security and optimization. The models previously relied on in centralized systems are not suitable for a federated environment, so new algorithms and strategies are needed.
FL’s decentralized approach is not focused merely on security; it also considers efficiency. It reduces the data transfer overhead, making it a powerful solution in environments with limited bandwidth and computing power. Moreover, it is in line with strict data protection laws like GDPR, offering practical solutions for cross-border data transfer. Economically, it brings significant reductions in data storage, transfer, and processing costs.
This study examines the domain of federated learning, highlighting its complexities, potential, and challenges. We start by defining FL and its operation, providing a strong foundation. Then, we explore how FL is already being used across mobile environments, organizations, and the IoT, and we address the challenges that come with FL’s decentralized nature. Next, we contrast FL’s unique features with those of traditional centralized learning. We delve into data privacy tools like differential privacy and homomorphic encryption and explore personalization strategies. We consider non-IID data in federated systems, along with the design space of FL and data distribution strategies, as well as ensuring robustness in distributed learning. We examine how FL interacts with traditional machine learning, breaking down different FL structures like horizontal and vertical FL, and we discuss federated transfer learning. Finally, we present potential threats and a full view of both the opportunities and the vulnerabilities.
IID refers to independent and identically distributed data, commonly contrasted with non-IID data in the context of federated learning: each data point is independent of the others and follows the same probability distribution. Handling non-IID data is a significant challenge in FL, as the data held by different devices or clients are often not uniformly distributed, leading to potential biases in the global model and slower convergence rates.
Finally, we give details regarding the algorithms and frameworks that enable FL to operate, like the Federated Averaging (FedAvg) algorithm and tools like TensorFlow and PySyft. We also address the risks—model inversion, data poisoning, and Sybil attacks—and the defenses needed to keep FL systems secure.
This survey offers a comprehensive examination of federated learning (FL), a decentralized machine learning approach that enhances data privacy and reduces communication overheads. It covers both the theoretical underpinnings and practical applications of FL in sectors like healthcare and IoT, addressing challenges such as data security, regulatory compliance, and model optimization. This paper stands out by exploring FL algorithms, frameworks, and security risks while proposing mitigation strategies. Its holistic approach caters to both beginners and experts, providing valuable insights to leverage FL’s potential in real-world applications and guiding future research in this evolving field.
Our paper’s major goal is to present a comprehensive overview of federated learning (FL), including the current achievements, problems, and future applications in areas such as IoT, healthcare, and data privacy. While survey articles typically summarize the state of the art, our purpose goes beyond simply gathering information. Our paper’s specific goal is to examine and highlight existing difficulties in FL and to present a road map for future study. The research questions (RQ) that we examine are as follows.
RQ1: How can federated learning (FL) strike a balance between data privacy, security, and communication costs while retaining model performance?
This point is critical since FL aims to keep data decentralized while maintaining high accuracy and security standards. Understanding how to ensure this equilibrium can help to guide future algorithmic advancements.
RQ2: What security problems, such as adversarial attacks, must be addressed to ensure the viability of FL deployment in real-world applications?
With the rising number of security threats in decentralized systems, particularly in vital sectors such as healthcare, this paper aims to identify the limitations of the present security solutions and consider how future work can overcome them. These questions not only align with this paper’s primary thesis but also shape the survey’s structure by focusing on crucial areas of FL that have yet to be fully examined.
This paper is structured as follows. In
Section 2, we introduce federated learning by covering its background and foundational concepts.
Section 3 explores the opportunities, challenges, and applications of FL across various domains.
Section 4 categorizes the types of FL, such as horizontal and vertical FL.
Section 5 examines the key attributes and characteristics of FL, like privacy and scalability.
Section 6 discusses the algorithms and frameworks that underpin FL.
Section 7 addresses potential attack strategies and mitigation techniques.
Section 8 provides a critical discussion of the findings, and
Section 9 concludes with future implications and directions for research.
2. Background Material—Introduction to FL
Federated learning (FL) changes machine learning by training models on devices like smartphones and IoT sensors instead of a central server. This method enhances data privacy, reduces data transfer costs, and meets real-time processing needs [
1]. It improves privacy in healthcare and overcomes bandwidth issues in industrial IoT. By drawing on data from diverse user populations, FL can also reduce biases and produce more inclusive models. However, ensuring the security of decentralized systems remains a challenge. FL helps organizations to comply with privacy regulations like GDPR, and its efficiency could lower data management costs. As edge computing grows, FL is key to a privacy-focused, efficient future.
2.1. FL Basics
FL pioneers a unique approach: instead of transferring data to a central point for analysis and model building [
2,
3], the training itself takes place where the data reside; see
Figure 1.
FL allows multiple users in a network to utilize their local data to contribute to an integrated, global model. This process, while ensuring data privacy, also uses the diverse and expansive nature of the data residing on these devices, leading to more robust and generalizable models.
The FL process consists of three core parts.
Learning Algorithm and Training: The server trains a model over repeated rounds, involving steps such as designing the learning algorithm, selecting clients, distributing the model, updating the client models locally, and performing server-side aggregation and model updates.
Privacy Protection Mechanism: FL can protect users’ data privacy by employing data encryption training, ensuring that the model does not reveal the original data. Additionally, encrypting the data transmission process ensures that only intermediary results without extra information are relayed during the training process.
Incentive Mechanism: Since FL relies on participation, it is essential to offer adequate incentives to participants. The incentive mechanism strives to equitably share the benefits of FL, motivating users to participate consistently and deterring malicious actors from dominating the process.
A more analytical outline of the FL process can be provided as follows [
4].
Initialization: On a server, a global model is created.
Model Distribution: This global model is dispatched to the various participating nodes or devices.
Local Training: Each node dedicates itself to training this model using its reservoir of local data.
Model Update Sharing: After this training, the nodes send back model updates to the central server.
Aggregation: The central entity aggregates these updates, often employing techniques like Federated Averaging, as proposed by [
5].
Iteration: The previous three steps (distribution, training, and aggregation) are reiterated until the model achieves an acceptable level of performance.
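The loop above can be condensed into a short sketch. The snippet below is a minimal, framework-free illustration using NumPy; the `local_train` routine, the linear-regression loss, and the synthetic client data are assumptions made for the example, not part of any specific FL system.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """Local Training: a few epochs of gradient descent on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
        w -= lr * grad
    return w

# Synthetic data split across 3 clients (illustrative setup).
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]

global_w = np.zeros(4)                       # Initialization
for round_ in range(20):                     # Iteration
    updates, sizes = [], []
    for X, y in clients:                     # Model Distribution + Local Training
        updates.append(local_train(global_w, X, y))
        sizes.append(len(y))
    # Aggregation: average client models, weighted by local dataset size
    global_w = np.average(updates, axis=0, weights=sizes)
```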
Despite FL’s design, issues persist, such as ensuring the twin objectives of secure and efficient communication, navigating the heterogeneity of systems, handling skewed and non-IID data, and combating malicious threats. In [
6], the authors specifically underscore the importance of data security and the need for efficient communication channels that prioritize the transfer of only the most pivotal model updates, minimizing data breach risks. Several distinct concerns associated with the more conventional centralized learning approach are addressed seamlessly by FL; these concerns can be categorized into the following three classes.
Regulatory Hurdles: Globally, there has been a surge in regulations, such as GDPR in Europe, CCPA in California, and PDPB in India. These laws present strict rules regarding the movement and mixture of sensitive data, especially when the data protection standards vary across regions. FL addresses these regulatory complexities by minimizing data movement.
User Privacy: Users are increasingly concerned about the privacy of their data. Applications that involve inputting sensitive information like passwords or credit card details raise expectations that such details remain on the device, shielded from external servers. FL addresses this expectation, reinforcing users’ trust.
Data Volume and Bandwidth: A growing number of devices, especially those like surveillance cameras, generate vast data volumes. Transferring this vast amount of data centrally is not only a logistical challenge but is often not economically viable. By enabling on-site training, FL substantially reduces the bandwidth demands, making it a more scalable option.
2.2. Definitions and Alternative Nomenclature in Distributed Learning
Federated learning is a rapidly evolving topic in machine learning, marked by the emergence of various terminologies that often overlap with or resemble FL. Understanding these terms is crucial for scholars, practitioners, and industry experts to ensure clear communication and accurate knowledge dissemination. By definition, FL is a distributed machine learning approach where models are trained across multiple devices or nodes without the need to centralize data, preserving data privacy and reducing the data transfer overheads, particularly with large, decentralized datasets [
7,
8,
9].
Key related terms include the following.
Collaborative Learning: Often used synonymously with FL, emphasizing cooperative model training without sharing raw data.
Distributed Machine Learning (DML): A broader concept encompassing FL; DML focuses on distributing computation across machines, while FL adds an explicit emphasis on privacy and decentralized data.
Edge Learning/Training: Aligns with FL but focuses on training at the network’s periphery, such as on mobile devices or IoT endpoints [
10].
On-Device Learning: Highlights training on devices like smartphones without offloading data, central to FL’s paradigm.
Decentralized Machine Learning: Encompasses any machine learning technique without a central authority, with FL focusing on decentralized data and privacy.
Privacy-Preserving Machine Learning: Includes methods like homomorphic encryption, SMPC, and differential privacy, with FL as a key pillar.
2.3. Integrating Security and Storage
Here, we clarify which parts of the process described above deal with security and storage considerations, which are discussed later in the paper. FL prioritizes local data storage on devices (e.g., cellphones, IoT devices) to safeguard privacy and reduce the reliance on centralized servers. This local storage solution is critical in complying with data privacy rules (such as GDPR) and minimizing the risk of exposing sensitive data. However, it poses new issues in controlling storage restrictions on edge devices, which we will describe later. During the local training phase, the FL process uses local storage to compute updates, rather than sending raw data.
Security is a critical component during the sharing of model updates. When devices send their locally trained updates back to the central server, privacy and security risks emerge. Malicious entities could intercept the updates or manipulate them to corrupt the global model (via model poisoning or adversarial attacks). Therefore, secure aggregation techniques and differential privacy mechanisms must be integrated into the FL process to ensure that updates cannot be reverse-engineered to expose sensitive data or be compromised by attackers. This security challenge relates directly to the sharing and aggregation stages of the process.
In the aggregation phase, the central server aggregates the updates from multiple devices to update the global model. To secure this process, homomorphic encryption and secure multi-party computation (SMPC) are commonly used to ensure that the updates are encrypted and privacy is preserved during aggregation. These measures address both security (protecting the updates from interception) and storage efficiency (since raw data are never transferred, and only model updates are sent, minimizing the storage load on the server).
3. General Opportunities, Challenges, and Applications
Recognizing the opportunities, understanding the challenges, and identifying new applications are critical in today’s technological landscape. We investigate the broad prospects of these emerging technologies, as well as their specialized applications in cellphones, organizational infrastructures, and the IoT, while also addressing the challenges that come with these advances.
3.1. Opportunities
Federated learning allows model training across multiple devices while keeping data local. This decentralized approach offers major benefits in data privacy and training efficiency, making FL valuable for various industries. Research highlights FL’s importance in machine learning and data science [
11,
12]. Experts like Kairouz in [
13] stress the need to address issues like data handling, communication costs, and security.
3.1.1. Enhanced Security and Reduced Data Transfer Costs
With increasing data breaches, protecting data privacy is crucial, especially in sensitive fields like healthcare and personal communications [
14]. FL’s decentralized approach ensures that raw data remain on users’ devices, minimizing security risks and enhancing trust. This infrastructure is particularly beneficial in sectors like healthcare and banking, where data-driven insights are needed without exposing raw data [
15,
16,
17,
18]. FL also reduces the costs and environmental impacts by minimizing data transfer and processing data locally [
19,
20].
3.1.2. Improved Model Personalization
Today’s consumers, equipped with a plethora of choices, desire tailored experiences. FL is perfectly positioned to respond to this demand. By continually refining models based on individual device data, FL crafts a learning model that is inextricably linked to user behavior. As these models evolve, they intuitively align with the users’ preferences, resulting in experiences that are not only personalized but deeply resonant. This level of customization is paramount in ensuring user retention and satisfaction in an increasingly competitive market [
21,
22].
3.1.3. Scalability
At its core, FL is designed for expansion. Its structure, which encourages the addition of devices to its network, is inherently scalable. Each device that becomes part of this network augments the collective intelligence without ever directly sharing its raw data with a centralized entity. This architecture ensures that, as the network grows, so does its diversity and richness in data, all while maintaining a lightweight and efficient structure [
23].
3.1.4. Utilization of Edge Devices and Real-Time Learning
Federated learning utilizes the computing capabilities of edge devices like smartphones and IoT tools, incorporating them into its learning structure. This method helps to maximize resource use and promotes the widespread, more inclusive adoption of machine learning [
24]. In dynamic sectors like autonomous driving and emergency medical response, real-time data processing is critical. FL excels in these areas by enabling models to adapt rapidly to new data, avoiding the latency associated with traditional batch updates. This capability is vital when even slight delays can have significant consequences.
3.1.5. Decentralization, Robustness, and Optimized Network Traffic
Unlike centralized systems, FL’s decentralized nature improves resilience by eliminating single points of failure. Despite failures in individual nodes, the system remains functioning, demonstrating resilience [
25,
26]. FL distinguishes itself in a regulatory context centered on data protection by guaranteeing compliance without sacrificing data-driven insights. Its emphasis on data localization enables organizations to function within stringent regulatory constraints. FL optimizes network traffic by transmitting model updates instead of substantial raw data, ensuring effective bandwidth use and decreasing the latency [
5]. The practical impact of FL across diverse sectors, from healthcare to smart cities, underscores its transformative role in data analytics.
3.2. Areas of Application
There are a number of areas of application for FL (see
Table 1).
3.3. Main Challenges
The decentralized design of FL, which emphasizes data privacy and distributed learning, differs from that of traditional, centralized machine learning models, introducing novel challenges at every stage. Pioneering research has been instrumental in outlining the numerous challenges that affect FL, and it is imperative to understand these in detail if we are to exploit the full potential of FL.
While federated learning offers exciting possibilities, it represents a significant shift from traditional methodologies. This innovation, albeit promising, introduces new layers of complexity, especially in implementation. Techniques that worked well in centralized systems might not be directly applicable any longer. The expansion of FL across large and diverse networks brings challenges related to synchronization, data standardization, and real-time collaboration. Moreover, the varying capabilities and potential failures of devices, as previously mentioned, add to these complexities. To fully leverage FL, researchers and practitioners need to innovate, adapt, and create new strategies suited to this model.
The inherent design of FL promotes data security. However, the very nature of distributed systems introduces multiple points of potential vulnerability. Every interaction and every data exchange, regardless of whether it is minimal, needs to be fortified against breaches. Integrating secure multi-party computation (SMC) with FL is one way to enhance the privacy and security in federated settings. However, consistently encrypting these interactions without affecting the performance poses a challenge [
44].
The delivery of consistent and accurate results is challenging. Variability in data quality, distribution, and computational capabilities can influence models’ performance. The statistical variation among data sources adds to this complexity. While techniques like data preprocessing and augmentation offer some solutions, maintaining privacy while achieving large-scale optimization in FL is a complex task [
45,
46,
47]. Challenges such as adapting methodologies to specific project demands, managing non-independent and identically distributed (non-IID) data, and optimizing communication are critical tasks in FL. However, as research advances, solutions to these challenges are expected to emerge, paving the way for the broader adoption of FL [
48,
49,
50,
51].
3.3.1. Communication and System Heterogeneity
Communication is fundamental to every distributed system, including federated learning (FL). In the age of big data and artificial intelligence, FL stands out due to its ability to protect data privacy while minimizing the need to transfer and process massive volumes of data, all while maintaining the advantages of machine learning. Unlike typical centralized training, FL uses collaborative model training that shares only parameter updates. However, its wider implementation is hampered by significant communication and processing overheads, resulting from high computing demands and the large parameter updates transmitted. This problem is particularly acute in Internet of Things (IoT) applications, where edge and fog devices frequently have limited computing power, bandwidth, and data capacities. This survey addresses this gap by reviewing a number of recent works that aim to improve the communication and/or computational efficiency of FL [
6,
44].
Communication costs are the primary restriction; the original FedAvg experiments [5] demonstrated a 10–100× reduction in communication rounds compared to synchronized stochastic gradient descent. However, FL operates in an environment marked by diverse systems and protocols. Managing the complicated process of interaction within such a heterogeneous landscape is challenging. The variability in storage, processing power, and communication capabilities among federated components adds to this complexity. Each system might have its own limitations, capacities, and peculiarities, which can affect the overall synchronization and efficiency. While solutions like model compression and decentralized training [
45] provide some benefit, implementing these solutions seamlessly across a diverse network remains a challenge.
3.3.2. Threats and Adversarial Attacks
Most digital systems have vulnerabilities, and FL, despite its advanced architecture, is not exempt. FL faces risks from adversarial attacks [
52,
53]. These attacks are sophisticated attempts to deceive or manipulate the learning process. The decentralized nature of FL makes it imperative to have robust defense mechanisms in place. Techniques such as secure function evaluation, homomorphic encryption, and differential privacy can be applied in federated settings to provide privacy and combat threats. These methods offer potential defenses [
2].
3.3.3. Infrastructure and Bandwidth Issues
A solid infrastructure is required for FL to work properly. However, difficulties such as unreliable networks and restricted bandwidths, particularly in edge devices, might pose substantial impediments. These limits might delay model updates and training, thereby impacting real-time decision-making and overall system performance [
24].
4. Types of FL
Different situations call for specialized FL strategies, leading us to identify distinct categories. In the literature, FL is usually categorized into three primary categories, which we expand upon here, together with a novel one: federated reinforcement learning (FRL). Each of these types of FL (see
Figure 2) utilizes a variety of methods and techniques for model training, drawing upon the shared information sources available. The comprehension of these categories allows researchers to examine the unique characteristics and applications of each type, which in turn aids in the advancement of FL strategies [
54].
4.1. Horizontal FL (HFL)
Machine learning traditionally relies on a centralized training paradigm, where data from various sources are pooled into a single repository. This method simplifies preprocessing, data access, and model evaluation, leading to robust performance and iterative improvements due to the data’s homogeneity and volume. However, as the data distribution becomes more complex and concerns about privacy and security grow, new methodologies have emerged. One such approach is horizontal federated learning (HFL), or “sample-based” FL, designed for decentralized data. HFL is suitable for scenarios where entities collect data with shared features but from diverse samples [
55].
The foundational aspect of HFL is its consistent feature space, maintained across all participating nodes or devices, while individual sample spaces or data records differ. This framework ensures that data integrity and privacy are preserved. In essence, while the data features are consistent among the participants, the need for direct data sharing is obviated, allowing entities to collaboratively refine a shared model [
54]. For clearer insight into HFL’s applicability, consider two geographically separated hospitals. Both institutions might capture analogous parameters for their patients—age, weight, and blood pressure, to name a few. The divergence arises from the distinct patient demographics that each caters to. In this scenario, HFL acts as a nexus, enabling these hospitals to synergistically enhance their diagnostic models. A notable advantage of this model is the implicit assurance that patient-specific information remains contained within its originating institution, supporting the principles of data privacy, fairness, and accuracy [
56]. In this type, data samples are split horizontally across devices: each device holds the same feature space and label definitions but different samples. The objective is to train a model cooperatively while protecting each participant’s data. The conventional approach, which concentrates data on a central server, presents several challenges, such as high communication costs, excessive battery usage, and risks to the privacy and security of user data. FL has gained considerable attention for its ability to build powerful models in a decentralized way, without direct access to user data, thus preserving security. Unlike conventional centralized AI, federated learning addresses the difficulties presented by the non-IID and imbalanced data found in real-world applications, for example, mobile phone applications and transportation mode identification from non-IID GPS traces. With the increasing complexity of data collection and its division among organizations, especially when sensitive information is involved, isolated data silos maintained by individual data owners have become prevalent. This calls for AI models that can be trained without centralizing all of the data.
Despite the potential of HFL, it also faces challenges. One major concern is the potential imbalance of data among participants: variations in dataset volume or diversity between entities could bias the model’s performance. Addressing this requires strategies that ensure fair representation and learning, thereby upholding the integrity of the HFL methodology.
Regarding security, we note the following. In HFL, data are dispersed among numerous edge devices, like mobile phones or IoT devices, that train local models independently before providing changes to a central server. This decentralized structure provides an opportunity for hostile actors to exploit flaws by inserting poisoned or modified data into the local updates, possibly damaging the global model. Securing the communication connections between the central server and the local devices is crucial in reducing these risks. We stress the use of secure aggregation techniques, which ensure that individual device updates are encrypted and aggregated without disclosing raw data. This provides an additional degree of security against update interception or manipulation. Moreover, the importance of differential privacy, which limits the ability of adversaries to reconstruct sensitive data from the aggregated model, is highlighted. By introducing noise into local updates, differential privacy further reduces the likelihood of an attacker inferring individual data points, thereby enhancing the robustness of the overall system against adversarial attacks while maintaining privacy.
Consider a scenario in which mobile devices collaborate to train a shared model for predictive text. If an attacker compromises a few devices and injects poisoned data into the system, the global model could learn incorrect patterns, reducing its accuracy and even spreading misinformation. This example highlights the importance of detecting malicious updates and implementing robust defense mechanisms, such as Byzantine-tolerant aggregation algorithms.
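As one concrete example of such a defense, the sketch below shows coordinate-wise median aggregation, a simple Byzantine-tolerant alternative to plain averaging; the client updates and the poisoned vector are synthetic values chosen only to illustrate the effect.

```python
import numpy as np

def median_aggregate(updates):
    """Coordinate-wise median: robust to a minority of outlier updates."""
    return np.median(np.stack(updates), axis=0)

honest = [np.array([0.9, 1.1, 1.0]), np.array([1.0, 0.9, 1.1]),
          np.array([1.1, 1.0, 0.9])]
poisoned = [np.array([50.0, -50.0, 50.0])]   # attacker-crafted update

print(np.mean(np.stack(honest + poisoned), axis=0))  # mean is dragged far off
print(median_aggregate(honest + poisoned))           # median stays near 1.0
```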
4.2. Vertical FL (VFL)
Vertical federated learning (VFL) is a specialized variant of federated learning designed for scenarios where local parties have diverse attributes within the same user cohort, unlike general FL systems that deal with numerous records sharing the same feature space. In VFL, there are two main types of participants: the active party, which initiates the training task and holds the primary label for data samples, and the passive participants, which contribute additional features for the same user set. For instance, a bank with limited transactional features can use VFL with another entity possessing complementary data to predict default risks and credit scores [
34,
57].
VFL differs from horizontal FL (HFL) by focusing on scenarios where entities possess complementary feature sets for the same users, while HFL deals with entities holding the same types of features but for different users. In VFL, parties target the same user set but with significantly different features, such as one entity holding demographic data and another focusing on online behavior. A common identifier, like user IDs, allows collaboration without compromising data privacy. Throughout this process, the feature values remain undisclosed, ensuring data confidentiality. A key feature of VFL is its emphasis on combining different features of the same user set, in contrast to HFL’s approach, where the same features are distributed across numerous users.
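A naive illustration of this alignment step appears below: both parties hash their user IDs with a shared salt and intersect only the hashes, so records can be matched without exchanging raw identifiers. The names, IDs, and salt are made up, and real deployments use cryptographic private set intersection rather than bare salted hashes, which are vulnerable to dictionary attacks.

```python
import hashlib

def hashed_ids(ids, salt=b"shared-secret-salt"):
    """Map a salted hash of each user ID back to the local record key."""
    return {hashlib.sha256(salt + uid.encode()).hexdigest(): uid for uid in ids}

bank_users = ["alice", "bob", "carol"]   # party A: financial features
shop_users = ["bob", "carol", "dave"]    # party B: behavioral features

a, b = hashed_ids(bank_users), hashed_ids(shop_users)
common = set(a) & set(b)                 # only hashes cross the boundary
print(sorted(a[h] for h in common))      # ['bob', 'carol']
```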
This cooperative approach in VFL aids in assembling AI models that use each participant’s contribution, streamlining the model training process in settings where traditional averaging methods might fall short [
58]. An illustrative real-world application of VFL is the following. Consider a scenario in which a financial institution, with abundant transactional data, partners with an e-commerce platform that tracks users’ browsing and purchasing behaviors. Historically, insights from either of these entities would remain in isolated repositories. However, VFL introduces the potential for a transformative, symbiotic relationship. Using common user identifiers, these entities can collaboratively design predictive models, yielding insights such as predictions of purchasing likelihood based on a combination of financial history and online behavior [
57].
To further illustrate this concept, consider a multi-party, multi-class VFL (MMVFL) system. This framework takes into account the distribution of labels across VFL participants while ensuring data privacy. It empowers multiple entities to collaborate and share labels, thereby enhancing the overall learning experience. A more simplified approach reduces the complexity of the VFL system architecture and coordination requirements by eliminating the need for a central coordinator. Such advancements elevate the efficiency and adaptability of VFL frameworks. Nonetheless, VFL, despite being innovative, has its own complexities.
Challenges in vertical FL include aligning data across different parties, upholding privacy standards, and integrating models trained on varied datasets. Hu et al. [
59] stress that the goal is not merely combining data but creating actionable insights. To handle issues such as diverse and unreliable network connections, asynchronous VFL architectures are being used. These allow model updates without synchronizing data sharing. Modern VFL approaches balance protection, competition, and efficiency [
60], shifting away from solely cryptographic methods.
Traditional VFL uses a central coordinator to securely aggregate the results, preventing data breaches. However, new methods propose removing this coordinator to improve the efficiency, with direct communication between the participants. While this could enhance the efficiency, it requires careful attention to security to prevent breaches and resist attacks.
In VFL, entities (such as businesses or institutions) hold diverse aspects of the same data subjects, such as demographic data from a bank paired with purchasing activity data from an e-commerce platform. This collaborative procedure poses novel security issues related to data privacy and model inversion attacks, in which attackers seek to reverse-engineer sensitive data from shared model updates. Because the entities in VFL frequently work with complementary datasets, it is critical that no sensitive information is exposed during model aggregation.
Homomorphic encryption and secure multi-party computation (SMPC) are often used to safeguard data in VFL, allowing organizations to perform calculations on encrypted data while never disclosing the raw data. The security vulnerabilities in VFL, especially surrounding data leakage during model updates, underline the necessity for advanced cryptographic methods, which are discussed in later sections.
In VFL, consider a bank collaborating with an insurance company, where both parties share different attributes of the same users to create a credit risk model. If either party’s data are not fully encrypted or anonymized, there is a risk of model inversion, where an attacker might infer sensitive customer details from the shared model. This emphasizes the need for homomorphic encryption or secure multi-party computation (SMPC) to ensure that the shared updates do not expose any private information.
4.3. Federated Transfer Learning
Federated transfer learning (FTL) combines transfer learning and federated learning to adapt models trained in one context to a related task efficiently. It addresses differences in data distributions, advancing beyond traditional models that handle limited data [
61,
62]. FTL leverages transfer learning to apply knowledge across domains and federated learning to fine-tune models on diverse datasets without direct data sharing. This approach enhances the privacy and reduces the computational demands, marking a significant shift from traditional methods that centralize data or operate within fixed boundaries [
63].
To clarify the potential of FTL, consider the domain of medical diagnostics, an area where data sensitivity and specificity are paramount. Let us assume a pre-existing machine learning model that is adept in diagnosing skin diseases based on a European demographic dataset. Historically, to adapt this model to an Asian demographic, a complete retraining process would ensue, often necessitating the transfer of high-volume, sensitive data across locations. However, with FTL, the model can be efficiently recalibrated using data from Asian hospitals, without the actual datasets ever leaving their respective institutions. This not only upholds data privacy but also capitalizes on the foundational knowledge embedded within the initial model, as shown in [
64]. As with any advanced methodology, federated transfer learning (FTL) faces its own set of challenges.
A major issue in federated transfer learning (FTL) is negative transfer, where knowledge from the source domain harms the model’s performance in the target domain. This requires careful validation and refinement to ensure that the transferred knowledge is beneficial. Traditional machine learning models often struggle with tasks outside their specific training data, showing poor performance with minor changes in the data distribution or objectives. FTL, however, offers dynamic adaptability, allowing models to use previous knowledge and adjust to new contexts without sharing raw data. This makes FTL more resilient and versatile compared to traditional models.
4.4. Federated Reinforcement Learning
Federated reinforcement learning (FRL) represents a complex extension of reinforcement learning (RL), where multiple agents in distinct environments are not only optimized locally but also share insights across a network. For instance, self-driving cars in different cities can learn from each other’s experiences, improving their overall driving skills. However, FRL is still evolving, with challenges in creating effective communication systems and ensuring the accuracy of shared experiences across varying environments. While FRL combines RL and FL, it remains an area of ongoing research and development.
4.5. Differential Privacy and Homomorphic Encryption in FL
Differential privacy and homomorphic encryption are closely related to all forms of federated learning because they solve two of the most important issues in FL: data privacy and security.
In horizontal and vertical FL (described in the previous subsections), numerous entities or devices collaborate to train models while protecting the local data’s privacy. To prevent data leaks during this procedure, differential privacy assures that individual data points are not rebuilt from shared model updates. Furthermore, homomorphic encryption enables calculations on encrypted data, which is critical when the data between different institutions must remain secure (for example, between hospitals or banks collaborating in a vertical FL environment). These techniques safeguard each participant’s privacy.
Federated transfer learning (FTL) tries to adapt models across disparate datasets, such as healthcare and finance, when the data are sparse or from distinct domains. Differential privacy and homomorphic encryption are critical in allowing enterprises to exchange insights and models without disclosing the underlying private data. The safe transmission of model updates enabled by these techniques assures compliance with privacy rules such as GDPR, while also protecting the data’s proprietary nature.
Some practical examples for each type of FL, demonstrating how differential privacy and homomorphic encryption are used in real-world scenarios, are as follows. For instance, in vertical FL (where two organizations may share different aspects of the same users), homomorphic encryption can provide secure computation over encrypted data without the need for decryption. In horizontal FL, differential privacy ensures that user-specific data on mobile devices cannot be reconstructed from the global model.
5. Attributes and Characteristics of FL
This section explores the abilities of federated learning, with a focus on data privacy and methods such as differential privacy, secure multi-party computation, and homomorphic encryption. It also includes customization in FL, the influence of non-IID data, and the complete design area involving the data distribution, privacy, security, and robustness.
5.1. Data Privacy
FL places a heightened emphasis on data privacy. As highlighted by McMahan in [
5], with the prevalence of sensitive data and the tightening of data privacy regulations, there is increased interest in innovative solutions that respect data privacy. FL aptly responds to this call, allowing machine learning models to be trained without centralizing or exposing the raw data.
5.1.1. Introduction to Data Privacy in FL
Safeguarding data privacy is crucial in today’s digital life. Federated learning is at the forefront of this effort, dedicated to protecting user data through advanced techniques like secure multi-party computation (SMC) and differential privacy [
39]. As the digital environment evolves with challenges like quantum computing and deepfakes, FL must enhance its privacy measures. Communication within FL, especially across numerous devices, is vital. Almanifi et al., in [
45], highlight the importance of efficient data transmission and resource management. Compression techniques can help to reduce the data transmission volumes, making FL feasible even in bandwidth-limited scenarios. Building on [
23], FL is promising for decentralized data frameworks and extends to ensemble models, complex collaborative filtering, and more intricate architectures. The decentralized nature of FL means that models are trained directly on edge devices or nodes, minimizing the risk of data exposure and breaches by sending only model updates or essential information to the central model.
5.1.2. Motivations for Enhanced Data Privacy
Data sovereignty is a significant aspect of FL [
13]. Data remain at their origin, whether generated or stored, which ensures that data owners maintain control over their sensitive information, reducing risks and legal liabilities. Mechanisms such as secure aggregation, which uses cryptographic protocols to mix model updates from different nodes [
65], help to maintain privacy during the aggregation process. Differential privacy, another key technique, ensures that the output remains statistically indistinguishable regardless of the individual data’s inclusion, providing strong mathematical privacy guarantees. Homomorphic encryption and SMC further emphasize FL’s commitment to privacy [
66]. Enhanced data privacy in FL is driven by technological, ethical, and legal considerations, particularly for sectors like healthcare and finance.
5.1.3. Differential Privacy in FL
Federated learning’s decentralized approach trains models directly at the data sources, reducing the data’s exposure and risks. Differential privacy adds controlled randomness to the data or model training process, ensuring that the outputs are indistinguishable, whether individual data are included or not [
16]. Secure aggregation, as highlighted by [
67], uses cryptographic methods to combine model updates while protecting individual contributions. Combining FL with differential privacy offers robust privacy guarantees and effective data utilization. However, balancing data privacy with model accuracy remains challenging, and ongoing research aims to refine such techniques to meet stringent privacy standards while maintaining model performance. This balance is crucial as FL evolves to address contemporary regulatory requirements [
5].
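A minimal sketch of the clip-and-noise step that differentially private FL typically applies to each client update is shown below. The clipping norm and noise multiplier are illustrative values only; a real deployment would calibrate them with a formal (ε, δ) privacy accounting, which this sketch omits.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update's L2 norm, then add Gaussian noise (DP-SGD style)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))  # bound sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw = np.array([0.8, -2.4, 1.5])   # a client's model delta
print(privatize_update(raw))       # what the server actually receives
```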
5.2. Secure Multi-Party Computation (SMPC or SMC)
Secure multi-party computation (SMPC or SMC) is a critical concept within federated learning. With FL’s decentralized approach, which involves training models directly at the data sources (such as edge devices or nodes), there is a growing demand for advanced techniques that can safeguard data privacy while maintaining model efficiency. SMPC directly meets this requirement. In essence, SMPC allows multiple participants, each with their private inputs, to collaboratively conduct computations and obtain a result without revealing their individual inputs to each other. In the context of FL, where data remain decentralized, SMPC’s importance cannot be overstated. It allows for the collaborative training of models across multiple nodes, without the necessity to expose individual datasets [
37,
68]. In FL, models are trained at the data’s source, ensuring minimal risks associated with data breaches and exposure. Within this framework, SMPC operates by facilitating collaborative computation across these decentralized nodes. Rather than sharing raw data, which could compromise privacy, each participant or node shares encrypted fragments of computations. These fragments, when combined, can provide a result (like a model update) without ever revealing individual data points.
While secure aggregation focuses on securely combining model updates, SMPC goes one step further by allowing entire computations to be performed jointly, without the nodes having to reveal their proprietary datasets to one another. Given FL’s application in data-sensitive domains like healthcare and finance, the utility of SMPC becomes even more relevant. These sectors can use SMPC within the FL framework to derive insights collaboratively from multiple sources, all while ensuring that individual data remain confidential [
69]. However, SMPC, despite its strengths, is not without limitations. The complexity of collaborative computation across multiple nodes can introduce latency, especially as the number of participating nodes increases. Moreover, the cryptographic techniques underlying SMPC, while ensuring privacy, can sometimes be computationally intensive, demanding more resources and potentially affecting the efficiency of the FL process.
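The flavor of SMPC can be conveyed with additive secret sharing, one of its basic building blocks. In the sketch below, each party splits its private value into random shares that sum to the value modulo a large prime, so the parties can jointly compute a sum while no single share reveals anything; the three-party setup and input values are illustrative.

```python
import secrets

P = 2**61 - 1  # a large prime modulus

def share(value, n_parties):
    """Split `value` into n random shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

inputs = [42, 17, 99]                       # each party's private value
all_shares = [share(v, 3) for v in inputs]  # party i distributes all_shares[i]

# Each party locally sums the shares it holds (one per column); combining
# these partial sums reveals only the total, never the individual inputs.
partial = [sum(col) % P for col in zip(*all_shares)]
print(sum(partial) % P)  # 158 == 42 + 17 + 99
```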
5.3. Homomorphic Encryption in FL
Homomorphic encryption (HE) is crucial in enhancing data security, especially within federated learning. HE allows computations on encrypted data without decryption, ensuring that the results are consistent with those from the original data once decrypted. In FL, where models are trained directly at the data sources, like edge devices or nodes, HE is essential in preserving data privacy. It enables computations and model training on encrypted data while ensuring that any information sent back to the central model remains secure and encrypted [
70].
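For intuition, the sketch below uses the open-source python-phe (Paillier) library, assuming it is installed (`pip install phe`). Paillier is additively homomorphic, so a server can sum and rescale encrypted client updates without ever seeing the plaintexts; the values and key length here are illustrative.

```python
from phe import paillier  # python-phe, assumed available

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Two clients encrypt their local update values with the shared public key.
enc_a = public_key.encrypt(0.75)
enc_b = public_key.encrypt(-0.25)

enc_sum = enc_a + enc_b   # server adds ciphertexts without decrypting
enc_avg = enc_sum * 0.5   # scalar multiplication also works homomorphically

print(private_key.decrypt(enc_avg))  # 0.25, recovered only by the key holder
```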
Benefits of Using Homomorphic Encryption
Enhanced Data Privacy: HE ensures that raw data remain encrypted throughout the computation process. This means that the data, even when being used for training or analysis, are never exposed in their raw form, thus significantly mitigating the risks associated with data breaches or unauthorized access.
Flexibility in Computation: Despite the data being encrypted, HE allows for meaningful computations, ensuring that the insights or model updates derived from such computations are accurate and reflective of the original data.
Compatibility with FL: Given the decentralized nature of FL, where data do not need to be centralized, HE fits seamlessly, ensuring that data privacy is maintained without hampering the training process.
Trade-Offs (e.g., Computational Costs)
While HE offers a robust solution to many challenges posed by data privacy, it comes with its set of trade-offs.
Computational Intensity: The very processes that allow computations on encrypted data can be computationally intensive, necessitating more robust computational resources and potentially extending the training or analysis time.
Increased Latency: Especially in real-time applications or scenarios where a rapid response is crucial, the additional time required for HE computations can introduce delays.
Complexity: Implementing and managing HE can be complex, requiring specialized knowledge and potentially complicating the deployment of FL solutions in certain environments.
5.4. Federated Identity and Access Management (FIAM)
Federated identity and access management (FIAM) is crucial for cyber-security in federated learning (FL), where sensitive data are processed across many devices. FIAM adapts traditional systems to the federated context [
71,
72].
Authentication is vital in FL to prevent data tampering or malicious introduction. FIAM ensures that nodes are verified and authorized. Role-based access control (RBAC) within FIAM manages nodes’ permissions based on their role, function, and trust level [
73,
74,
75]. For example, a healthcare node may be restricted to sharing only aggregated insights, enhancing data control and preventing breaches.
5.5. Privacy-Preserving Data Aggregation
Differential privacy adds “noise” during data aggregation to protect individual data points [
76,
77]. k-Anonymity ensures that each record is indistinguishable from at least k−1 others, while data masking alters data values to obscure the true information. Secure aggregation protocols use encryption and integrity checks to maintain the privacy and accuracy of aggregated data.
5.6. Privacy Challenges and Limitations in FL
Model inversion attacks pose a risk by allowing adversaries to reverse-engineer training data from model outputs. Despite protecting the raw data, the model parameters or outputs in FL may still be vulnerable.
Section 8 will discuss these attacks and countermeasures in detail.
Balancing user privacy with model accuracy is challenging. Techniques like differential privacy [
66], which introduce noise to protect individual data points, can obscure genuine data patterns and reduce the predictive precision. This raises questions about the value of an extremely private model if its accuracy is compromised, and vice versa. This balance is crucial in sectors like healthcare, where both privacy and precise predictions are essential [
38]. Additionally, privacy safeguards can introduce computational and communication burdens, potentially slowing model training and increasing the resource demands in constrained environments.
5.7. Best Practices and Standards
Below, we summarize the best practices and standards and give guidelines to ensure privacy in FL deployment.
Local Data Storage: At its core, FL ensures that data remain on the user’s device (like edge devices or nodes). This practice should be maintained, ensuring that raw data are never transmitted, reducing the potential for data exposure.
Data Minimization: Only the essential data required for model training should be processed. This reduces the volume of data at risk and lessens the potential harm in the event of a breach.
Regular Model Audits: Frequent validation and evaluation of the FL models help in identifying and addressing any potential privacy vulnerabilities.
Differential Privacy: One should ensure the implementation of techniques like differential privacy to introduce calibrated noise into the data, ensuring that individual contributions are masked during aggregation.
Encryption: One should always encrypt data during transmission. Techniques like homomorphic encryption, which allows computation on encrypted data without needing decryption, can be pivotal.
Emerging standards and protocols for data privacy in FL:
ISO/IEC Standards: Various standards, particularly those of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), address the privacy aspects of cloud computing and distributed systems, which are relevant to FL.
NIST Guidelines: The National Institute of Standards and Technology (NIST) offers guidelines and practices on securing data during transmission and storage [
78].
Community-Driven Initiatives: Open-source communities and academic consortia are collaborating to define standards and protocols specific to FL, given its growing significance [
79].
5.8. Non-IID Data in FL
In FL, the way in which data are distributed remains crucial for the effectiveness of the resulting models. Specifically, the challenge posed by non-IID data, which introduce unique complications in a decentralized training environment, requires careful consideration.
5.8.1. Understanding Non-IID Data
In an FL framework, non-IID data are characteristic of situations whereby the participating devices, whether clients or nodes, possess data distributions that are not a balanced representation of the broader population. Several factors can lead to this imbalance. For instance, different devices might adopt varying data collection procedures, reflecting the diversity and variability in devices. Further complicating factors include user-specific behaviors, which can create unique data environments. Challenges like device-specific characteristics and interactions could further skew the data.
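In experiments, such label-skewed (non-IID) client data are often simulated with a Dirichlet partition, a common recipe in the FL literature. The sketch below is a minimal version; the class counts, client number, and concentration parameter α are illustrative, with smaller α producing more skewed splits.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.5, seed=0):
    """Assign sample indices to clients with Dirichlet-distributed label skew."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(n_clients))  # class share per client
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for i, part in enumerate(np.split(idx, cuts)):
            client_idx[i].extend(part.tolist())
    return client_idx

labels = np.repeat(np.arange(3), 100)          # 3 classes, 100 samples each
parts = dirichlet_partition(labels, n_clients=4, alpha=0.1)
print([np.bincount(labels[p], minlength=3) for p in parts])  # skewed per-client counts
```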
5.8.2. Implications and Challenges of Non-IID Data
Model Convergence Issues: Navigating model convergence becomes challenging with non-IID data. Wang et al. [
48] shed light on this issue, highlighting that an uneven data distribution among nodes leads to slower convergence rates. This prolongs the training process and could diminish the effectiveness of the model.
Performance Degradation: A model’s real-world effectiveness is compromised when it is dominantly trained on non-representative data. In scenarios where such data are prominent, the model, when exposed to varied real-world scenarios, might underperform or yield unpredictable results.
Biased Outcomes: A skewed data source, if dominant during training, can lead to models that are biased towards particular user groups or scenarios. Such bias not only reduces the model’s broad application but also raises ethical concerns. As Briggs et al. note, data biases can inadvertently promote certain groups while marginalizing others, thus compromising the fairness and ethical nature of the model [
80].
5.8.3. Non-IID: Potential Strategies
Addressing non-IID challenges typically involves two complementary approaches.
Data Preprocessing: A foundational step towards countering data imbalances is the use of preprocessing measures. Implementing techniques like data augmentation, stratified sampling, and synthetic data generation can equalize the data distribution across nodes, ensuring a more balanced training environment.
Model Aggregation Approaches: Aggregation techniques bear significance in a non-IID environment. Recent methodologies, such as those echoing the principles of FedAvg and FedProx, show promise. These methods aim to address the challenges posed by skewed data, enhancing both model performance and convergence; a minimal sketch of a FedProx-style local update follows this list.
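To make the FedProx idea concrete, the sketch below adds the proximal term μ/2·‖w − w_t‖² to each client's local objective, so local models are pulled back toward the current global model when the local data are skewed. The linear-regression loss and the value of μ are assumptions made for illustration.

```python
import numpy as np

def fedprox_local_train(global_w, X, y, mu=0.1, lr=0.1, epochs=5):
    """Local gradient descent on the MSE loss plus the FedProx proximal term."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the local loss
        grad += mu * (w - global_w)        # proximal pull toward the global model
        w -= lr * grad
    return w
```

Larger μ keeps clients closer to the global model (more stability, less local adaptation); μ = 0 recovers plain FedAvg local training.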
5.8.4. Real-World Implications Across Sectors
The practicality of non-IID data settings in FL is seen in numerous sectors. First, the decentralized nature of FL, combined with non-IID data considerations, permits hospitals to collaboratively train models without compromising patients’ confidentiality. Moreover, in the finance sector, FL offers a mechanism to analyze distributed financial datasets securely, ensuring that proprietary and confidential information remains within its respective entities. Lastly, IoT, with its vast array of heterogeneous devices, produces diverse datasets. FL becomes indispensable in such settings, allowing for collaborative model training that respects the unique data’s origins.
5.8.5. Synchronous vs. Asynchronous Updates
Synchronous and asynchronous mechanisms are central to federated learning’s model updates. In synchronous FL, a central server waits for updates from all nodes before aggregating the model. Owing to device differences and unreliable networks, this approach suffers when some devices lag behind (stragglers), while asynchronous variants, which aggregate updates as they arrive, must instead cope with “stale updates” [
81].
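Asynchronous schemes typically mix each update into the global model as it arrives, down-weighting stale ones. The sketch below follows a FedAsync-style rule, w ← (1 − α_s)·w + α_s·w_client, where the mixing weight decays with staleness s; the simple 1/(1+s) decay and the base α are illustrative choices, not the only ones used in practice.

```python
import numpy as np

def async_update(global_w, client_w, staleness, alpha=0.6):
    """Mix in a client model, shrinking its weight as its staleness grows."""
    alpha_s = alpha / (1.0 + staleness)  # polynomial staleness decay (assumed)
    return (1.0 - alpha_s) * global_w + alpha_s * client_w

w = np.array([1.0, 1.0])
fresh = async_update(w, np.array([2.0, 0.0]), staleness=0)   # strong influence
stale = async_update(w, np.array([2.0, 0.0]), staleness=10)  # weak influence
print(fresh, stale)
```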
5.8.6. Privacy and Security in Non-IID Settings
Federated learning focuses on data privacy and efficient communication, but security concerns persist. These include risks to user privacy, model integrity, and data attributes, particularly when the data are unevenly distributed across nodes. The management of non-IID data in FL is complex and requires tailored solutions to address these security and privacy challenges effectively.
6. Algorithms and Frameworks in Federated Learning
Federated learning transforms machine learning by allowing model training across many devices while keeping users’ data private. This section explores the algorithms and frameworks driving FL, crucial for its success and future development. FL algorithms are key in managing decentralized, unevenly distributed data and optimizing communication, unlike traditional models that rely on centralized, consistent datasets. With the growing volume of data from connected devices, FL algorithms, such as Federated Averaging (FedAvg) and Federated Stochastic Gradient (FedSGD), handle data locally and transmit only compressed model updates, enhancing the efficiency and privacy. Understanding these algorithms is vital in leveraging FL’s full potential [82].
6.1. Federated Averaging (FedAvg)
Federated Averaging, or FedAvg, addresses the need for effective decentralized machine learning [5]. As data become more distributed across devices, centralizing them for training becomes impractical due to bandwidth and privacy issues. FedAvg provides a solution by allowing local model training on each device, while synchronizing through a global shared model, denoted as $w_t$, where $t$ indicates the communication round. Each client uses methods like Stochastic Gradient Descent (SGD) for local updates, and these updates are averaged at the server to iteratively improve the global model. Recent evaluations of FL algorithms highlight FedAvg’s performance. Nilsson et al. utilized the MNIST dataset, a standard for digit classification, to benchmark algorithms in a client–server setup [82]. Their comparative analysis, focusing on communication efficiency and convergence, demonstrated FedAvg’s strong performance under IID data conditions, particularly when client communications were capped at 10,000. This empirical evidence supports FedAvg’s effectiveness in various FL scenarios.
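To make the mechanics concrete, the following minimal sketch runs FedAvg rounds on a toy linear model; it is our own illustration (the helper names and synthetic client data are assumptions), not the configuration benchmarked in [82]:

```python
import numpy as np

def local_sgd(w, X, y, lr=0.01, epochs=5):
    """Client side: a few epochs of SGD on local data (squared-error loss)."""
    w = w.copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= lr * (xi @ w - yi) * xi   # per-sample gradient step
    return w

def fedavg_round(w_global, clients):
    """Server side: collect locally trained models and average them."""
    local_models = [local_sgd(w_global, X, y) for X, y in clients]
    return np.mean(local_models, axis=0)   # w_{t+1} = (1/N) * sum_i w_t^i

# Toy federation: 4 clients with synthetic local datasets.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=20)))

w = np.zeros(3)
for t in range(10):                        # 10 communication rounds
    w = fedavg_round(w, clients)
print(w)                                   # approaches true_w
```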
6.1.1. Potential Benefits and Use Cases of FedAvg
FedAvg combines theoretical elegance with practical use by allowing decentralized devices to learn locally and contribute to a global model. This method reduces the need for large data transfers to a central server, saving the bandwidth and speeding up model training. It keeps raw data on devices, enhancing users’ privacy compared to centralized models. FedAvg is ideal for scenarios with many IoT devices, smartphones, and medical devices, where it supports local learning while maintaining data privacy.
6.1.2. Limitations and Challenges of FedAvg
FedAvg faces challenges like differing data distributions across devices and asynchronous updates, which can impact model accuracy and introduce biases. Hyperparameter tuning is critical; misconfiguration can hinder model performance. Additionally, the distributed nature of FedAvg raises security concerns, as sharing model updates across networks can expose the system to adversarial attacks [83].
6.2. Federated Stochastic Gradient
In decentralized learning, Federated Stochastic Gradient (FSG) is a key technique, combining federated learning (FL) with stochastic gradient methods. It enhances the computational efficiency and supports resilient model training across distributed devices.
Federated Stochastic Gradient Descent (FedSGD) adapts the classical Stochastic Gradient Descent (SGD) for FL, addressing privacy and data distribution issues. In FedSGD, each device performs local SGD computations and sends updates to a central server, which aggregates them to refine the global model [84,85]. This approach avoids centralizing the data while improving the model performance.
6.2.1. Algorithmic Steps
The algorithmic procedure can be described in natural language by the following steps [86].
Initialization: A global model is initialized, and its parameters are shared with all participating clients.
Local Computation: Each client computes the gradient of the model on its local data using SGD.
Model Aggregation: Clients send their computed gradients or model parameters to the central server.
Global Model Update: The central server aggregates these updates (typically by averaging) to adjust the global model.
Distribution: The updated global model is shared back with all clients.
Iterations: Steps 2–5 are repeated until convergence or for a predefined number of communication rounds.
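A compact sketch of these six steps, assuming a simple squared-error loss and synthetic client data (both our own illustrative choices):

```python
import numpy as np

def local_gradient(w, X, y):
    """Step 2: a client computes the loss gradient on its local data."""
    return (X @ w - y) @ X / len(y)        # mean squared-error gradient

def fedsgd_round(w, clients, lr=0.1):
    """Steps 3-5: clients send gradients; the server averages and updates."""
    grads = [local_gradient(w, X, y) for X, y in clients]
    return w - lr * np.mean(grads, axis=0)

# Toy federation with synthetic local datasets (our assumption).
rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ true_w))

w = np.zeros(3)                            # Step 1: initialize the global model
for t in range(100):                       # Step 6: repeat for fixed rounds
    w = fedsgd_round(w, clients)           # Steps 2-5 (incl. redistribution)
```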
While this mechanism may seem straightforward, several sensitivities, such as handling communication inefficiencies and stragglers and ensuring privacy, play a critical role in its actual implementation.
6.2.2. Challenges
The FedSGD algorithm is central to decentralized machine learning, with enormous potential but practical limitations. Federated learning relies on repeated exchanges of model gradients between clients and a central server, and network variability can create delays, particularly for larger models. “Stragglers,” or slower nodes, further impede global model updates. Another risk concerns privacy, as transmitted model updates may inadvertently reveal sensitive information. Although techniques like differential privacy and encrypted computation offer viable solutions, integrating them with FedSGD while retaining its efficiency and accuracy is challenging [87].
6.2.3. Comparison with FedAvg
While both FedSGD and FedAvg are techniques in FL that aim to aggregate local model updates, the primary distinction is the method of aggregation. FedSGD focuses more on gradient updates, sending these to the central server for aggregation. In contrast, FedAvg sends the model parameters themselves after local training. Essentially, while FedSGD is gradient-centric, FedAvg is model-centric in its aggregation approach.
6.3. Model Aggregation Methods
In federated learning, model aggregation plays a crucial role by bringing together local updates from different devices or nodes into a single global model. The essence of FL lies in distributing the training of models across numerous devices, followed by consolidating these contributions centrally through model aggregation. This section explores various model aggregation techniques, emphasizing their strengths, challenges, and suitable contexts for application.
6.3.1. Simple Averaging
The most straightforward method of aggregation is simple averaging, prominently used in the Federated Averaging (FedAvg) algorithm [88]. In this method, updates from each client (e.g., gradient updates or model parameters) are averaged to produce the global update. Mathematically, for $N$ clients,

$\theta_{t+1} = \frac{1}{N} \sum_{i=1}^{N} \theta_{t}^{i},$

where $\theta_{t}^{i}$ represents the model parameters from the $i$th client.
Advantages:
Simple to implement, with negligible computational and communication overhead beyond transmitting the updates themselves.
Challenges:
Assumes IID data across clients, which is often not the case.
Susceptibility to adversarial attacks, as malicious clients can skew the average.
6.3.2. Weighted Averaging
An evolution of the simple averaging method is weighted averaging, where each client’s updates are weighted by the number of samples that it possesses. This approach gives more importance to clients with more data, which can be beneficial in non-IID settings:

$\theta_{t+1} = \frac{\sum_{i=1}^{N} w_i\, \theta_{t}^{i}}{\sum_{i=1}^{N} w_i},$

where $w_i$ is the weight (often the number of samples) of the $i$th client.
Advantages:
Accounts for unequal data volumes, typically improving accuracy when clients hold unbalanced datasets.
Challenges:
Relies on honestly reported sample counts, which malicious clients can inflate to gain influence; clients with little data may be under-represented in the global model.
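A minimal numerical sketch contrasting the two aggregation rules above; the toy update vectors and sample counts are our own illustrative assumptions:

```python
import numpy as np

def simple_average(client_weights):
    """FedAvg-style aggregation: theta = (1/N) * sum_i theta_i."""
    return np.mean(client_weights, axis=0)

def weighted_average(client_weights, sample_counts):
    """Weight each client's parameters by its number of local samples."""
    w = np.asarray(sample_counts, dtype=float)
    return np.average(client_weights, axis=0, weights=w)

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
print(simple_average(updates))                  # [3. 4.]
print(weighted_average(updates, [10, 10, 80]))  # skewed toward client 3
```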
6.3.3. Geometric Median Aggregation
Instead of averaging, some aggregation techniques aim to find the geometric median of the model updates. The geometric median offers robustness against adversarial or Byzantine attacks, as it is less susceptible to outliers.
Advantages:
Robust to outliers and Byzantine updates, since extreme contributions have limited influence on the median.
Challenges:
Computationally more expensive than averaging, as the geometric median has no closed form and must be approximated iteratively (see the sketch below).
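A common approximation is Weiszfeld’s algorithm, which repeatedly re-weights updates by the inverse of their distance to the current estimate. The following sketch is our own illustration (the toy honest/Byzantine updates are assumptions):

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld's algorithm: iteratively re-weight points by inverse distance.

    Outlying (e.g., Byzantine) updates receive small weights, so a few
    malicious clients cannot drag the aggregate arbitrarily far.
    """
    points = np.asarray(points, dtype=float)
    z = points.mean(axis=0)                     # start from the plain average
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - z, axis=1), eps)
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < eps:
            break
        z = z_new
    return z

honest = [np.array([1.0, 1.0])] * 4
byzantine = [np.array([100.0, -100.0])]         # one malicious update
print(geometric_median(honest + byzantine))     # stays near [1, 1]
```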
6.3.4. Personalized Aggregation
Recent advances have led to methods for personalized model aggregation, where the global model is fine-tuned or adjusted based on individual clients’ characteristics. This approach acknowledges the heterogeneity of the data and user behaviors across clients.
Advantages:
Yields models better matched to each client’s data and behavior, improving local performance under heterogeneity.
Challenges:
Adds computational and coordination complexity, and risks overfitting to individual clients at the expense of global generalization.
6.3.5. Other Aggregation Methods
Federated learning involves various model aggregation techniques that unify individual learning experiences into a global model. Basic methods like average aggregation compute the mean of the updates from client nodes, serving as a foundation for more advanced techniques. Clipped average aggregation refines this by limiting the updates within a set range to reduce distortions from outliers. Secure aggregation uses cryptographic methods like homomorphic encryption to protect client data during aggregation. Differential privacy average aggregation enhances the privacy by adding noise to the updates, balancing confidentiality with accuracy. Momentum aggregation includes past model shifts to accelerate convergence, while Bayesian aggregation applies Bayesian inference to address model parameter uncertainties. Adversarial aggregation detects and mitigates malicious updates through outlier detection and anomaly recognition. Quantization aggregation optimizes communication by compressing the model updates, and hierarchical aggregation organizes aggregation in tiers to reduce the overhead by aggregating locally before higher-level synthesis.
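As an illustration of two of these variants, the sketch below implements clipped average aggregation and a differential-privacy-style average; the clipping bound and noise scale are assumptions, and a formal privacy guarantee would require noise calibrated to the privacy budget:

```python
import numpy as np

def clipped_average(updates, clip_norm=1.0):
    """Clipped average aggregation: bound each update's L2 norm, then average."""
    clipped = []
    for u in updates:
        scale = min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
        clipped.append(u * scale)
    return np.mean(clipped, axis=0)

def dp_average(updates, clip_norm=1.0, noise_std=0.1, seed=None):
    """Differential-privacy-style average: clip, average, add Gaussian noise.

    The noise scale here is illustrative; an (epsilon, delta) guarantee
    requires calibrating noise_std to clip_norm and the privacy budget.
    """
    rng = np.random.default_rng(seed)
    avg = clipped_average(updates, clip_norm)
    return avg + rng.normal(scale=noise_std, size=avg.shape)
```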
Model aggregation is crucial in FL in determining the effectiveness and robustness of the global model. Choosing the appropriate method depends on the federated data’s characteristics, the network architecture, and the deployment needs, with new techniques continuing to be introduced as FL advances [88].
In Table 2, we provide a comparison of the key federated learning algorithms: FedAvg vs. alternatives.
6.4. Adaptive Learning Rates in Federated Contexts
In machine learning, the learning rate plays a crucial role as a hyperparameter, determining the size of the steps that an optimization algorithm takes in pursuit of a minimum. If too large, it risks overshooting; if too small, it may become trapped in local minima or converge very slowly. In federated learning contexts, these challenges are accentuated due to the decentralized nature of the data and the necessity to combine diverse model updates from numerous devices. This section examines the adaptive learning rates in federated settings, exploring their importance, the algorithms customized for them, and the associated advantages and obstacles.
6.4.1. Federated Adaptive Algorithms and the Importance of Adaptive Learning in a Federated Setting
Traditional centralized learning benefits from consistent, IID data, but federated learning encounters challenges with non-IID data from diverse devices. This variability requires adaptive learning rates to handle each client’s unique data and model progress. Adaptive mechanisms are essential for quicker and more stable convergence in federated settings [89,90]. Two notable adaptive algorithms in FL are as follows.
Federated Adagrad: Enhances the Adagrad algorithm by adjusting the learning rates based on unique client data gradients, ensuring personalized optimization [91].
Federated Adam: Builds on the Adam optimizer, which combines Adagrad’s and RMSprop’s advantages, maintaining running averages of the gradients and their squares for more balanced learning rate adaptation.
These adaptive methods promise improved convergence and stability but face challenges, such as computational overhead and synchronization issues, particularly with non-IID data. They require careful management to avoid divergence and still necessitate hyperparameter tuning. Research highlights the effectiveness of Adagrad, especially in communication-efficient strategies and on diverse datasets. Future FL advancements will need to balance the benefits of adaptive learning rates with the management of computational and communication complexities [91].
6.4.2. Federated Adagrad
The Adaptive Gradient Algorithm (Adagrad) is an algorithm specifically designed to improve the convergence performance of machine learning models. It achieves this by adapting the learning rate for each parameter individually. Parameters that are updated infrequently receive larger updates, while frequently updated parameters receive smaller updates. This approach helps to address the issue of diminishing gradients, which can slow down or even prevent convergence during training. Federated Adagrad extends this adaptive learning rate mechanism to the decentralized realm of federated learning. In FL, training data are distributed across multiple devices or servers, and models are trained collaboratively without directly sharing the underlying data. Federated Adagrad enables efficient training in such scenarios by allowing each participating device to adapt the learning rate to its local model parameters.
Challenges in Federated Settings:
The potential for greater communication overheads as both gradients and squared gradient accumulations might need to be communicated.
Differences in the magnitude and direction of client updates, due to non-IID data, can cause issues in global aggregation.
6.4.3. Federated Adam
Adaptive Moment Estimation (Adam) is an optimization algorithm that builds upon the foundation of Stochastic Gradient Descent (SGD) by introducing adaptive learning rates for each parameter. This adaptation is achieved by considering the first and second moments of the gradients, which provide estimates of the mean and variance, respectively. Adam utilizes these moments to dynamically adjust the learning rate for each parameter, leading to potentially faster convergence compared to standard SGD.
The federated version of Adam, known as Federated Adam, incorporates this adaptive learning rate methodology into the distributed setting of federated learning. In FL, the training data are split across multiple devices or servers, and models are collaboratively trained without directly sharing the underlying data. Federated Adam enables efficient training in such scenarios by allowing each participating device to estimate the local first and second moments of the gradients for its model parameters. These local estimates are then aggregated across devices to compute global estimates, which are subsequently used to update the learning rates for all participating devices. This approach facilitates adaptive learning in a decentralized manner, promoting faster convergence and improved model performance in FL settings.
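One common formulation applies the Adam-style moment updates at the server to the averaged client delta; the sketch below is our own illustration of that server-side pattern, with assumed hyperparameter values:

```python
import numpy as np

def fed_adam_server_step(w, delta, m, v, lr=0.01, b1=0.9, b2=0.99, tau=1e-3):
    """One server-side Adam-style step on the aggregated client update.

    `delta` is the average of (local_model - global_model) over the round's
    clients; m and v are the server's running first/second moment estimates.
    """
    m = b1 * m + (1 - b1) * delta
    v = b2 * v + (1 - b2) * delta**2
    w = w + lr * m / (np.sqrt(v) + tau)
    return w, m, v
```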
Challenges in Federated Settings:
Maintaining or communicating the first- and second-moment estimates adds memory and communication overheads.
Moment estimates computed from non-IID client data can be biased, complicating global aggregation and requiring careful hyperparameter tuning.
6.4.4. Asynchronous Methods and Algorithms
FL inherently relies on utilizing data from numerous devices spread across different geographies and functionalities. Crucially, these devices are not always synchronized, which calls for methods that enable devices or clients to update the global model asynchronously. This section explores the intricacies of asynchronous methods in FL, including their rationale, detailed explanations, a comparison with synchronous approaches, and the associated benefits and challenges.
The diverse array of devices involved in FL necessitates mechanisms capable of accommodating varied network conditions, computational capacities, and availability. Asynchronous methods provide flexibility in this regard, allowing updates to flow more organically, without a rigid structure. Here, we examine the complexities of asynchronous strategies in FL.
6.4.5. Motivation for Asynchronous Updates in FL
The complexities of real-world distributed systems present unique challenges that asynchronous methods can address.
Network Disparities: In environments where devices span a wide geographical area, the network speeds and stability vary drastically. A smartphone in an urban setting with 5G connectivity contrasts starkly with an IoT sensor in a remote location that is reliant on low-bandwidth connections. Asynchronous updates respect these disparities, allowing each device to communicate based on its optimal conditions [1].
Computational Constraints: The devices participating in FL range from powerful servers to resource-constrained sensors. Expecting them to process and deliver updates concurrently becomes unreasonable. Asynchronous methods provide an avenue for each device to contribute based on its computational pace.
Intermittent Availability: Devices might have sporadic connectivity or might be set to participate only at specific intervals (e.g., during low-usage times). Asynchronous updates accommodate such varied participation patterns, ensuring that every device, regardless of its schedule, contributes to the global model.
6.4.6. Explanation of Asynchronous Algorithms
Asynchronous approaches in federated learning focus on flexibility and adaptability. Two defining characteristics are as follows.
Staleness: Updates from slower clients may become outdated by the time they reach the central server, raising questions about how to weigh these older updates against newer ones.
Decentralized Aggregation: Unlike traditional FL, which aggregates client updates in batches after receiving all or most updates, asynchronous methods continuously update the model as new data arrive.
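As a minimal illustration of these two ideas, the following FedAsync-style rule merges each arriving client model immediately, with a mixing weight that decays with staleness; the decay schedule is one simple assumed choice among several proposed in the literature:

```python
import numpy as np

def async_merge(w_global, w_client, staleness, alpha=0.6):
    """Merge one client's model the moment it arrives (no batching).

    The mixing weight shrinks with staleness, so a model trained against an
    old global state influences the current model less.
    """
    alpha_t = alpha / (1.0 + staleness)    # one simple decay schedule
    return (1 - alpha_t) * w_global + alpha_t * w_client
```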
6.5. Comparison with Synchronous Updates
Latency: Asynchronous methods aim to reduce the latency by using the available data immediately, promoting faster model evolution.
Communication Efficiency: They avoid waiting for stragglers, leading to potentially fewer communication rounds and quicker learning.
Convergence: Synchronous updates offer stable convergence patterns, whereas asynchronous updates may face challenges in achieving consistent convergence.
Scalability: Asynchronous methods support large-scale deployment better, handling more devices without significant changes [92].
6.6. Benefits and Challenges
Benefits
Real-Time Learning: The global model adapts quickly to changes, enhancing its responsiveness and accuracy.
Efficiency: Continuous updates optimize server resource use and minimize idle times.
Scalability: Asynchronous methods easily manage growing networks.
Challenges
Staleness Management: Managing outdated updates requires complex strategies.
Overhead Complexity: Integrating diverse updates can lead to high computational overheads and complex aggregation.
Inconsistency: The global model might reflect transient states rather than providing a consistent network-wide representation.
Embracing asynchronous updates can enhance federated learning, especially in large and diverse networks, but robust strategies are required to address its inherent challenges.
6.6.1. State-of-the-Art FL Algorithms: An Indicative Comparative Benchmark Against FedAvg
This section examines two algorithms: CO-OP and Federated Stochastic Variance Reduced Gradient (FSVRG). We discuss their core ideas, the ways in which they work, and their most important aspects. We also present improved versions of the classic algorithms discussed earlier.
Understanding the theory is only one aspect; what matters most is how well these algorithms perform in practice. To obtain a clear picture, we explore a benchmarking analysis of real-world studies in which CO-OP and FSVRG are compared head-to-head with FedAvg, a popular baseline algorithm in FL.
We explore how well they scale, the types of problems that they are best suited for, and how they compare against FedAvg in real-world scenarios. The goal is to show the strengths and weaknesses of each approach, so that one can determine which of them might be the best fit for one’s needs.
Federated Stochastic Variance Reduced Gradient (FSVRG)
FSVRG, introduced by [93] and inspired by the Stochastic Variance Reduced Gradient (SVRG) method, aims to address the inherent challenges of FL, particularly the distributed nature of the data. The core concept behind FSVRG is the combination of an initial full gradient evaluation with subsequent iterative stochastic updates on each participating client. By doing so, it captures the global data structure while considering the specific intricacies of each client’s dataset, offering a balance between generalization and specialization.
Both FedSGD and FSVRG share the general goal of refining a global model through local client computations. However, the sensitivities of their methodologies offer different perspectives. While FedSGD focuses on synchronizing local updates to shape a centralized model, FSVRG integrates a preliminary variance reduction phase via comprehensive gradient calculations. This strategy enables FSVRG to handle datasets characterized by pronounced variability. Nonetheless, this additional computational layer implies that FSVRG could demand more computational resources than its FedSGD counterpart, especially during the initial gradient assessment and the ensuing stochastic rounds.
The benchmarking studies by Nilsson et al. [82] underscore the efficiency of FSVRG in federated environments. The results showed that FSVRG eventually outperformed FedAvg in terms of execution, highlighting its adaptability and capability. This empirical evidence demonstrates the strengths of FSVRG in particular contexts.
CO-OP Algorithm
The asynchronous CO-OP protocol arises as a novel approach to the complexities of distributed machine learning, particularly in dynamic environments. Instead of relying on static datasets, as is common with many existing FL frameworks, CO-OP is designed to adapt to dynamically generated data. As users interact with their mobile devices, the individual datasets of these users grow in real time, introducing an added layer of complexity.
Consider a system whereby K mobile clients are actively participating in a cooperative learning process. Each client, governed by the unique interaction patterns of its user, gathers its own set of data. Once a pre-defined threshold (B samples, known as the local batch size) is reached, the client begins the process of refining its local model, employing gradient descent techniques based on these recently accumulated data.
Traditional FL frameworks often employ a more centralized approach, wherein the server periodically prompts specific clients to contribute to model updates. CO-OP, however, champions decentralization by empowering clients to initiate model updates asynchronously. This autonomy allows clients to choose the best time and environment for local training, such as during optimal network conditions. After refining their local models, clients can then liaise with the central server, merging their updates and subsequently obtaining the revised global model.
The global model parameters on the server are denoted as $w$, with each client’s parameters represented as $w_k$ for client $k$. Within the CO-OP framework, a critical metric is the age of the global model, which reflects how frequently clients have merged their updates. Clients monitor their own model’s age, recalibrated using the CO-OP protocol. Starting with a common model $w$, clients gather data, fine-tune their models, and synchronize them with the server. CO-OP’s age filter ensures timely and relevant client integration, avoiding outdated updates. As FL evolves, CO-OP addresses both IID and non-IID data, but empirical studies show that it faces strong competition from established algorithms like FSVRG.
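As a rough illustration of such an age filter (the acceptance rule and thresholds here are our own assumptions, not the exact CO-OP specification):

```python
def coop_age_filter(global_age, client_model_age, b_low=2, b_high=10):
    """Accept a merge only if the client's model lag is within bounds.

    Too small a lag means little new information; too large a lag means the
    update was computed against a badly outdated global model.
    """
    lag = global_age - client_model_age
    return b_low <= lag <= b_high
```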
6.7. Frameworks and Tools
The implementation and deployment of federated learning have undergone significant improvements thanks to the introduction of specialized frameworks and tools. These tools, designed specifically to handle the complexities of federated architectures, are crucial not only in simplifying the process but also in ensuring strong security and reliability. In the following sections, we examine the three notable frameworks that have played a key role in shaping the FL landscape: TensorFlow Federated, PySyft, and FATE.
6.7.1. TensorFlow Federated (TFF)
TensorFlow Federated arises from the lineage of the TensorFlow framework. As a specialized extension, TFF brings the strengths of TensorFlow into the domain of FL. The framework provides a comprehensive ecosystem, meaning that developers have access to a vast library of functions and tools specifically crafted for federated scenarios. One of the standout features of TFF is its local simulation environment. This environment is invaluable for developers, providing them with a sandbox in which to iteratively refine and test their federated models without the overhead of full deployment.
The advantages of TFF are numerous. Its seamless integration with TensorFlow means that developers familiar with TensorFlow can easily transition into the federated realm with minimal difficulty. Additionally, due to the widespread adoption of TensorFlow in the machine learning community, TFF benefits from extensive documentation, community-driven content, and numerous tutorials. However, every tool has its challenges. The richness of TFF can also pose challenges, especially for newcomers. The vastness of the TensorFlow ecosystem can sometimes lead to a steep learning curve. Nonetheless, the rewards upon overcoming this learning curve are substantial.
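As an indicative sketch of the TFF workflow (the API names follow older TFF releases, e.g., tff.learning.build_federated_averaging_process, and may differ in current versions; the synthetic data are our assumption):

```python
import tensorflow as tf
import tensorflow_federated as tff

def make_client_dataset():
    # Tiny synthetic client dataset: 784-dim features, 10 classes.
    x = tf.random.normal([32, 784])
    y = tf.random.uniform([32], maxval=10, dtype=tf.int32)
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

federated_train_data = [make_client_dataset() for _ in range(3)]

def model_fn():
    keras_model = tf.keras.Sequential(
        [tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))]
    )
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=federated_train_data[0].element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    )

process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
)
state = process.initialize()
state, metrics = process.next(state, federated_train_data)  # one FL round
```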
6.7.2. PySyft
In federated learning, PySyft distinguishes itself by providing tools not only for FL but also for encrypted and private machine learning. This framework expands upon popular deep learning platforms like PyTorch and TensorFlow, facilitating multi-party computation. What makes PySyft unique is its emphasis on decentralized deep learning. Its versatile API is designed with a focus on differential privacy and encrypted computation, ensuring that the privacy concerns inherent to FL are effectively addressed [94].
PySyft’s versatility is a standout feature. Beyond FL, it serves as a flexible tool for practitioners interested in privacy-preserving machine learning. Its integration with PyTorch, a favorite in the deep learning community, provides a familiar environment, which is always advantageous. However, PySyft, being relatively new compared to TensorFlow, may occasionally exhibit features that are in beta or still under development [95]. Nonetheless, the PySyft community, led by OpenMined, is vibrant, passionate, and continually pushing boundaries.
6.7.3. Federated AI Technology Enabler (FATE)
FATE is an innovation in federated learning. It handles all aspects, from preparing data to deploying the trained model. A unique feature of FATE is that it understands that data in FL can come from anywhere and take any form. Thus, FATE is built to work with all types of data sources and computing environments, regardless of their differences.
The scalability of FATE is commendable. Whether it is applied to large-scale datasets or complex models, FATE is constructed to handle them with ease. An emphasis on secure exchange protocols ensures that data security, especially during inter-party exchanges, is never compromised. However, as with any sophisticated tool, FATE comes with its own set of challenges. The broad range of features that it offers can sometimes be daunting for beginners, translating to a steeper learning curve. Nonetheless, as the community around FATE grows, and as more practitioners adopt it, it is expected that the collective knowledge will make the adoption of FATE smoother.
In Table 3, we provide a comparison of these federated learning frameworks: TFF vs. PySyft vs. FATE.
7. Attack Strategies
Federated learning offers a promising approach to model training across multiple devices while maintaining data privacy by keeping them localized. It aims to address the privacy concerns associated with centralized machine learning models. However, FL is vulnerable to attacks, particularly model inversion attacks, which pose significant risks. Understanding these vulnerabilities and potential exploits is essential in securely deploying and scaling FL systems. This section examines various attack strategies, the risks that they present in the context of FL, and mitigation techniques to strengthen FL systems against these threats.
7.1. Contextualizing Attacks in FL
FL has revolutionized machine learning by decentralizing data processing to enhance privacy and reduce large-scale data transfer. While it keeps data local and shares model updates, this decentralized approach brings its own set of challenges. FL’s multiple nodes introduce new vulnerabilities, expanding the risk of adversarial attacks. Malicious nodes can disrupt the training process by sending biased or erroneous updates, a tactic exemplified by Byzantine attacks [96]. Despite these risks, tools like Varma et al.’s “Legato” algorithm offer solutions, focusing on layerwise gradient aggregation to mitigate the impact of malicious nodes [97].
7.2. Significance of Addressing Attack Vectors
Federated learning allows the training of machine learning models on many devices without sharing raw data, enhancing the privacy but introducing new security challenges. For instance, deep learning, commonly used in FL, is susceptible to adversarial attacks, where small changes to data can mislead models, such as altering a stop sign to a yield sign in self-driving car training data [98]. As FL evolves, so must the defenses against vulnerabilities like white-box attacks, where attackers exploit model information. Researchers are developing protective measures, such as methods to detect and remove poisoned data. While FL safeguards privacy, ongoing security improvements are crucial.
7.3. Model Inversion Attacks
One of the predominant concerns in the security of FL revolves around model inversion attacks. These attacks exploit the outputs of a machine learning model to infer and reconstruct its training data. As FL often deals with sensitive data distributed over multiple nodes, understanding and mitigating such attacks becomes especially vital.
7.3.1. Implications of Model Inversion in FL
One of the defining features of federated learning is data localization: each node retains its data and only shares model updates for aggregation, thus enhancing privacy by keeping the raw data local. However, this decentralized approach introduces vulnerabilities, particularly through model inversion attacks. Even without direct access to raw data, attackers can infer details from aggregated model updates, like piecing together a puzzle. For instance, in a healthcare FL model, such attacks could reveal sensitive patient information, undermining trust in FL. Additionally, vulnerabilities in individual nodes could expose broader insights, emphasizing the need to address the potential security risks in FL [98].
7.3.2. Mitigating Model Inversion Attacks
Model inversion attacks, which seek to reverse-engineer and reconstruct private training data from trained models, highlight the critical need for robust mitigation strategies to protect federated learning systems. These attacks exploit the detailed information retained by overfitted models, which capture not only general patterns but also the specific sensitivities and irregularities of the training data, making them particularly vulnerable.
To mitigate these risks, it is essential to prevent models from overfitting [99]. Key strategies include the following.
Regularization Techniques: Methods such as L1 and L2 regularization add a penalty to the loss function, discouraging models from focusing too much on any single feature and promoting a more generalized understanding of the data.
Validation Protocols: Implementing rigorous validation checks during training helps to monitor model performance on unseen data, allowing the early detection and prevention of overfitting.
Data Augmentation: By increasing the diversity of the training data through augmentation techniques, models are less likely to memorize specific data points and more likely to learn generalized patterns.
Recent advancements, such as the “ResSFL” framework, offer sophisticated strategies tailored to FL scenarios [100]. Integrating these strategies with fundamental modeling principles bolsters FL systems’ defenses, safeguarding data privacy and resilience. Ongoing innovations will require vigilance and adaptability to stay ahead of potential threats.
7.4. Membership Inference Attacks
7.4.1. The Nature of Membership Inference
Membership inference attacks exploit machine learning models’ tendency to unintentionally memorize training data. These attacks probe models with specific inputs and analyze confidence levels. High-confidence responses suggest that the input data were part of the training set, revealing a privacy vulnerability in complex models like deep neural networks [101,102].
7.4.2. Threats to Data Privacy and Impacts
These attacks pose significant privacy risks, particularly in sensitive sectors like healthcare and finance, potentially leading to bias, societal exclusion, reputational damage, and loss of trust. They also erode public confidence in technology, emphasizing the need for responsible data use [103,104].
7.4.3. Strategies to Counter Membership Inference Attacks
In machine learning, membership inference attacks have become a significant threat to data privacy. Addressing these attacks requires a strong understanding of their nature and a comprehensive defensive strategy [103].
Embracing Differential Privacy: One of the most effective defenses against membership inference attacks is differential privacy. Based on a strong mathematical framework, differential privacy introduces calibrated noise to a model’s outputs. This could be during its training phase or even after training has been completed. The main idea is to add a degree of randomness such that attackers find it nearly impossible to determine whether a specific data point was in the training set.
Data Sanitization: Before any training begins, it is crucial to rigorously clean and sanitize the data. This ensures that patterns or identifiable markers are eliminated. Data anonymization techniques, such as k-anonymity, help in masking specific attributes, making datasets more homogeneous. The l-diversity approach, on the other hand, ensures that sensitive attributes in the data are diverse enough to prevent the singling out of individual records. Together, these techniques make the identification of individual data points a much more complex task.
Model Auditing: Data security is an ongoing challenge. To stay ahead of attackers, regular checkups are essential. These can be performed by one’s own security team or outside experts. They can be considered as practice rounds against membership inference attacks. By simulating these attacks in a safe environment, we can find weaknesses in our machine learning models before they are exploited in reality. This provides us with time to fix the problems and make our defenses even stronger.
Regularization Techniques: A model that fits well to its training data is a model that is suitable for exploitation. Regularization techniques, like L1 and L2 regularization, add penalty terms to the model during the training process, discouraging it from becoming overly reliant on any single attribute. Techniques like dropout, where random neurons are “dropped out” during training, ensure that the model generalizes better, reducing the risk of overfitting and the consequent susceptibility to attacks.
Output Aggregation: Instead of relying on a single model’s prediction, aggregating the outputs from multiple models can be an effective strategy. Ensemble methods, which combine predictions from different models, tend to generalize better. By pooling insights, the resulting predictions are not only more accurate but also less revealing about the specific sensitivity of the training data.
Reduced Model Precision: In the complicated process of machine learning, sometimes less can be more. Reducing the precision of the model weights and outputs introduces an element of controlled uncertainty, which can be used to enhance the privacy. Techniques like quantization, which limit the precision of the model parameters, create an environment where attackers, even if they gain access to the model, face significant challenges in making precise inferences about the underlying data.
Here, “precision” refers to the number of bits used to represent a number. By reducing the precision of the model weights and outputs, we introduce a degree of vagueness into the model’s calculations. This vagueness acts as a form of protection, as it makes it more difficult for attackers to extract sensitive information from the model. Even if an attacker were to gain access to the model itself, the reduced precision would blur the results, making it a challenging task to draw accurate conclusions about the training data.
Model Architectural Decisions: The architecture of a machine learning model can be a source of vulnerability or strength. By choosing architectures that are leaner, with fewer parameters or layers, there is less room for data leakage. The choice of architecture can serve as the first line of defense against membership inference attacks.
User Awareness and Education: Beyond algorithms and architectures lies the human factor. Keeping users in the loop, educating them about the risks, and ensuring that they are well informed can be as crucial as any technical measure. A well-informed user is a vigilant user, and their understanding and cooperation can be instrumental in creating robust defense against data breaches.
Together, these strategies form a multi-layered defense system against membership inference attacks, emphasizing both technical excellence and ethical responsibility. The counteraction of membership inference attacks represents a mixture of technical rigor, strategic planning, and ethical responsibility.
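As a small illustration of the reduced-precision idea discussed above, the following sketch uniformly quantizes model weights to a fixed number of bits; the bit width and the quantization scheme are our own assumptions:

```python
import numpy as np

def quantize_weights(weights, num_bits=8):
    """Uniformly quantize weights to 2**num_bits levels over their range.

    Coarser parameters blur per-example confidence signals, making precise
    membership inferences harder even for an attacker holding the model.
    """
    w = np.asarray(weights, dtype=float)
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    step = (hi - lo) / (2**num_bits - 1)
    return lo + np.round((w - lo) / step) * step

print(quantize_weights([0.123456, -0.98765, 0.5], num_bits=4))
```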
7.5. Eavesdropping and Man-in-the-Middle Attacks
The continuous exchange of data across networks is inherently associated with risks as attackers aim to intercept and exploit this information. Among the most significant of these risks are those posed by eavesdropping and man-in-the-middle (MitM) attacks, which are especially concerning in scenarios involving sensitive data. Due to its decentralized structure and dependence on communication between numerous nodes, federated learning is particularly vulnerable to such security threats.
7.5.1. Characterizing Eavesdropping in a Federated Setting
The unauthorized interception of communications in cyber-security, especially within a federated learning setting, is known as eavesdropping. In contrast to centralized systems, FL allocates learning among local devices and only sends model updates to a central server. This design presents various communication points, all susceptible to interception. Experienced attackers might be able to deduce sensitive information from intercepted communications, thereby compromising FL’s privacy objectives. Furthermore, man-in-the-middle (MitM) attacks present an active risk, as attackers are able to intercept, change, or reroute communications, leading to major threats to the reliability and safety of the FL procedure [105]. Specifically, a successful MitM attacker could do the following.
Alter Model Updates: Introducing slight biases or modifications in the model updates being sent to the central server. Over time, these could skew the global model in malicious or unintended ways.
Introduce Malicious Instructions: For federated systems that rely on dynamic model structures, attackers could potentially alter the model architectures or parameters, leading to compromised nodes.
Data Deception: By modifying the communicated updates, attackers could deceive the central server about the nature of the data at the edge, leading to models that might be ineffectual or even counterproductive.
In federated settings, eavesdropping and man-in-the-middle (MitM) attacks present a dual threat. On one hand, there is the ongoing risk of data inference through eavesdropping, which is passive yet persistent. On the other hand, MitM attacks pose an active and potentially catastrophic threat that could undermine the core principles of FL. Effectively safeguarding against these threats necessitates an in-depth understanding of their complexities, followed by the implementation of strong countermeasures [106].
7.5.2. Potential Damages and Consequences
As federated learning becomes increasingly popular for its decentralized approach, concerns about eavesdropping and man-in-the-middle (MitM) attacks have gained significance. FL allows multiple edge devices to collaboratively learn a shared model while keeping the data local, but this distributed setup introduces vulnerabilities. Eavesdropping in FL involves the unauthorized interception of communications between client nodes and the central server. While raw data are not shared, model weights, gradients, and updates are transmitted, which an eavesdropper could exploit to infer information about the original training data. Over time, a persistent eavesdropper could develop a shadow model, approximating a client’s data distribution.
MitM attacks are even more severe. An attacker could intercept, alter, and forward communications between clients and the server, manipulating gradients or model updates. This could poison the global model, causing it to degrade or behave unpredictably, potentially leading to global model drift. Such attacks can also undermine aggregation mechanisms like Federated Averaging, skewing the model in unintended directions, especially when differential privacy is involved. These attacks pose significant risks to data privacy, potentially revealing sensitive information like personal identifiable information (PII). Moreover, MitM attacks can disrupt consensus protocols in federated settings, leading to delays or halting the learning process. To protect FL systems, advanced cryptographic techniques, secure aggregation methods, and robust consensus protocols are essential [105].
7.6. Secure Communication Protocols
FL, with its decentralized design, promises new horizons in innovation and collaboration, but it also exposes systems to vulnerabilities, especially from eavesdropping and MitM threats. To counteract these vulnerabilities, the importance of secure communication protocols cannot be overstated. End-to-end encryption forms the foundation of these protocols, promising foundational privacy. By encrypting data directly at their source and only decrypting them at their destination, this mechanism ensures that intercepted communications remain unintelligible. Furthermore, the dynamic nature of modern encryption algorithms ensures that they can adapt to and combat emerging threats, providing a continually evolving protective barrier. However, encryption alone is not sufficient. This is where the Public Key Infrastructure (PKI) becomes relevant, introducing a robust framework of trust. PKI operates based on the validation provided by certificate authorities (CAs) to verify the authenticity of participants. This added layer of trustworthiness means that MitM attackers find it exceptionally difficult to impersonate genuine nodes, thanks to the rigorous verification steps that PKI entails [107].
Another crucial aspect is secure aggregation. Instead of transmitting raw model updates, which could inadvertently reveal patterns to attackers, nodes employ cryptographic techniques. They send aggregated and encrypted summaries, which remain indecipherable until they reach the central server. There, the processes of aggregation and decryption unfold. This approach ensures that individual node updates remain shielded from potential eavesdroppers. However, the pursuit of security does not end here. Continuous authentication protocols introduce dynamic trust verification. Regular authentication checks throughout communication sessions improve the security by discovering anomalies early on and preventing man-in-the-middle (MitM) attacks. Using virtual private networks (VPNs) or secure tunnels enhances the data security by encrypting and encapsulating packets, lowering the risk of eavesdropping. The human factor is also important: participants can become vulnerabilities if they are not educated about secure communication and threat recognition. Regular training and an emphasis on cyber-hygiene can transform participants into watchful defenders, preventing them from becoming vulnerabilities. A complete methodology that combines modern technology with proactive human involvement maintains the integrity and security of federated learning systems.
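To illustrate the flavor of secure aggregation, the following sketch uses pairwise additive masks that cancel in the server’s sum; real protocols add key agreement, secret sharing, and dropout recovery, all omitted here, and the helper names are our own assumptions:

```python
import numpy as np

def masked_update(update, client_id, peer_ids, round_seed, dim):
    """Pairwise additive masking (in the spirit of secure aggregation).

    Each pair of clients derives the same mask from a shared seed; one adds
    it, the other subtracts it, so all masks cancel in the server's sum and
    individual updates stay hidden.
    """
    masked = update.astype(float).copy()
    for peer in peer_ids:
        pair = (round_seed, min(client_id, peer), max(client_id, peer))
        mask = np.random.default_rng(hash(pair) % (2**32)).normal(size=dim)
        masked += mask if client_id < peer else -mask
    return masked

dim, ids = 4, [0, 1, 2]
updates = {i: np.full(dim, float(i + 1)) for i in ids}
masked = [masked_update(updates[i], i, [j for j in ids if j != i], 42, dim)
          for i in ids]
print(np.sum(masked, axis=0))   # equals the sum of raw updates: [6. 6. 6. 6.]
```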
7.7. Data Poisoning
7.7.1. Introduction to Data Poisoning in FL
Data poisoning poses a significant threat in machine learning, particularly within federated learning (FL), where model training is decentralized. FL enhances privacy and efficiency but lacks centralized data quality controls, making it susceptible to malicious nodes injecting corrupted data or model updates. Attackers often employ subtle manipulations, making minor changes that evade detection yet degrade the model integrity over time [108]. For example, in facial recognition, slight alterations to genuine images can misguide the model. Coordinated attacks by multiple compromised nodes further amplify the risks, allowing attackers to drive the central model toward harmful objectives and undermining FL’s resilience.
7.7.2. Defense Mechanisms Against Poisoning
Defending against poisoning attacks in FL demands a blend of sophisticated techniques and human vigilance. One of the primary defense pillars is model validation combined with anomaly detection. Regularly measuring the global model’s performance against trusted benchmarks or validation datasets ensures that anomalies in accuracy or behavior are swiftly found. Simultaneously, by utilizing advanced statistical or machine-learning-driven anomaly detection methods, the system can monitor each node’s updates. Nodes that consistently display unexpected or erratic contributions can be singled out for deeper scrutiny [109].
The Byzantine fault tolerance concept arises as a crucial asset. Initially crafted to uphold system reliability even in the face of malicious nodes, its principles can be seamlessly integrated into FL. This ensures that, even if certain nodes are compromised, the overall system remains resilient, preventing malicious inputs from derailing the global model [110].
Directly overseeing the nature of node updates offers another layer of defense. Techniques such as gradient clipping place a limit on the magnitude of updates, ensuring that no single node exerts a disproportionate influence over the model’s trajectory. In the same way, normalization practices guarantee that updates from all nodes adhere to an anticipated range, negating the possibility of the model taking a drastic turn due to tainted data [111].
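A minimal sketch of this kind of update screening, combining a median-based norm bound with clipping (the clip factor is an assumed choice):

```python
import numpy as np

def filter_and_clip(updates, clip_factor=2.0):
    """Norm-based screening plus clipping for incoming client updates.

    Updates whose L2 norm exceeds `clip_factor` times the median norm are
    scaled down to that bound, so no single node dominates the aggregate.
    """
    norms = np.array([np.linalg.norm(u) for u in updates])
    bound = clip_factor * np.median(norms)
    clipped = [u * min(1.0, bound / max(n, 1e-12))
               for u, n in zip(updates, norms)]
    return np.mean(clipped, axis=0)
```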
However, trust within a decentralized setup does not need to be all-or-nothing. Introducing reputation systems can transform trust into a dynamic attribute. Nodes earn scores reflecting their historical behaviors and contributions. Those persistently associated with questionable updates might witness a decline in their trustworthiness ratings [111]. As this system matures, nodes with dwindling trust scores could face increased scrutiny and, in severe cases, might even be sidelined to avoid influencing the global model.
7.8. Backdoor Attacks
7.8.1. Understanding the Nature of Backdoor Attacks
Backdoor attacks in cyber-security are stealthy intrusions that avoid disrupting machine learning models’ performance under normal conditions. Instead, they introduce latent triggers, only activating under specific circumstances to serve an attacker’s goals while undetected [112]. For instance, a facial recognition system could be subtly compromised with a backdoor that responds to a particular pattern, like a specific hat. When the trigger pattern appears, the model misidentifies the individual, potentially enabling unauthorized access. These attacks are especially dangerous because they are difficult to detect and can be triggered at critical moments, facilitating targeted disruptions and security breaches.
7.8.2. Implications of Backdoor Attacks in FL
Federated learning uses a decentralized approach to enhance the data privacy and computational efficiency. However, this decentralization also introduces vulnerabilities, particularly to backdoor attacks. In FL, nodes independently train models on local data and send updates to a centralized server, assuming that each node operates with integrity. This contrasts with centralized systems, where backdoor attacks typically involve manipulating a central dataset. FL’s structure, with numerous independent nodes, increases the attack surface as each node could potentially be compromised.

A malicious node can subtly introduce tainted updates, affecting the global model without immediate detection. The aggregation process in FL, notably federated averaging, is susceptible to sophisticated attackers who can craft precise updates that blend with legitimate ones. These malicious updates can embed backdoors into the global model, which then functions normally for most inputs but triggers malicious behavior for specific inputs.

FL’s commitment to keeping raw data localized enhances its privacy but complicates backdoor detection. Unlike centralized systems that have access to raw data for comprehensive auditing, FL’s central server only receives aggregated updates, lacking the granularity needed to detect anomalies. This opacity hinders the identification of tainted contributions from individual nodes. In summary, while FL’s decentralized nature offers significant benefits, it also presents unique challenges, particularly in defending against backdoor attacks. Recognizing these challenges is crucial in strengthening FL systems against such covert threats [113].
Centralized learning systems, with their unrestricted access to raw data, can employ a variety of auditing and anomaly detection tools. In contrast, FL faces limitations in this area. The central server, limited to aggregated updates, lacks the detailed insight provided by raw data. This lack of transparency complicates the identification of the origin of each update, making it difficult to determine whether a node’s contribution has been compromised by an embedded backdoor [36].
7.8.3. Overcoming the Threat of Backdoor Attacks
Federated learning faces significant challenges due to backdoor attacks, necessitating robust countermeasures. The FL community is actively pursuing strategies to enhance the security, using technological advancements and collaboration. Differential privacy is a key method, introducing statistical noise to node updates to obscure data and reduce the effectiveness of backdoor triggers, thereby complicating attackers’ efforts. Robust aggregation methods further strengthen FL by vetting node updates, considering both content and historical reliability. This diminishes the impact of suspicious contributions.

Current countermeasures often exclude benign models from aggregation, particularly those with diverse data distributions, leading to underperformance. To address this, DeepSight was proposed, which uses unique methodologies to analyze the data distribution and identify variations in the model structure and outputs. DeepSight detects and removes tainted model clusters, while the current defenses combat any leftover poisoned models.

Anomaly detection is critical, requiring complex algorithms to detect irregularities in node updates. These technologies function as early warning systems for potential backdoors. Routine validation against trustworthy datasets ensures that the global model’s behavior matches expectations, identifying unusual variations as potential backdoor signs. Model interpretability gives an additional degree of transparency by assisting in identifying unexplainable behaviors or biases that indicate the presence of a backdoor.

FL relies heavily on a culture of transparency and collaboration. Encouraging nodes to share observations and insights while maintaining data privacy is crucial. This networked vigilance ensures a united defense against backdoors, using the collective expertise of the FL ecosystem. With innovative techniques, shared vigilance, and continuous refinement, FL can navigate the complex interaction of decentralized data processing and backdoor threats, balancing its decentralized nature with protective mechanisms [114,115,116].
7.9. Sybil Attacks
Sybil attacks received their name from John Douceur’s 2002 work, introducing the concept within peer-to-peer networks. The term references the book Sybil, representing the idea of one entity assuming multiple identities [117]. Historically, these attacks have challenged decentralized systems, from early file sharing to blockchain. In distributed systems, where trust and authenticity are critical, Sybil attacks exploit these by creating numerous fake identities. This allows attackers to dominate decision making, sway consensus, and distort information sharing. Such manipulation degrades the trust that is essential to distributed systems, undermining their cooperative nature [118,119].
Defense Protocols Against Sybil Attacks in FL
Navigating the FL landscape necessitates not only the appreciation of its decentralized nature but also keen awareness of the underlying threats, which include Sybil attacks. As FL systems grow in complexity and adoption, ensuring the integrity and authenticity of the participating nodes becomes paramount. With this goal in mind, several defense protocols have emerged to counter the threat of Sybil attacks.
One of the foundational approaches in this regard is the implementation of strong identity verification protocols. In an FL environment, the very act of onboarding a node carries huge significance. Ensuring that every node, before it can participate and contribute updates, undergoes a rigorous identity verification process can serve as the first line of defense. This involves not only traditional authentication mechanisms but can also include cryptographic methods, hardware attestations, or even third-party verifications. The primary objective is clear: to validate the legitimacy of a node before it becomes an active participant in the learning process.
Identity verification, while vital, might not suffice on its own, especially if an attacker manages to bypass this first gate. This necessitates a second layer of defense in the form of rate limiting. By controlling and limiting the number of updates or the participation rate for newly onboarded or untrusted nodes, FL systems can diminish the potential impact of malicious entities. For instance, a new node’s updates could be given lesser weight in the global model until it establishes a track record of consistent and genuine contributions. This ensures that, even if a Sybil node were to infiltrate the network, its influence would be curtailed, at least until it gains undue credibility [120].
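As a simple illustration of such down-weighting, the following sketch scales each client’s contribution by a trust score; the scoring scheme itself is assumed and would be maintained by the behavioral analysis described below:

```python
import numpy as np

def reputation_weighted_average(updates, reputation):
    """Weight client updates by a trust score in [0, 1].

    New or suspicious nodes (low reputation) contribute less to the global
    model until they build a track record; scores would be updated after
    each round based on observed behavior.
    """
    r = np.asarray(reputation, dtype=float)
    r = r / r.sum()
    return np.sum([w * u for w, u in zip(r, updates)], axis=0)

updates = [np.array([1.0, 1.0]), np.array([1.2, 0.9]), np.array([9.0, -9.0])]
print(reputation_weighted_average(updates, reputation=[1.0, 1.0, 0.05]))
# the low-reputation outlier barely moves the aggregate
```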
While these proactive measures build a strong defense, they are complemented by reactive strategies centered around behavioral analysis. In FL, where data remain localized but updates are shared, monitoring the nature, pattern, and timing of these updates can reveal important information. By employing advanced analytics and machine learning techniques, it is possible to discern patterns consistent with Sybil attacks, such as suspiciously synchronized updates from multiple nodes. Detecting such anomalies in real time can allow for immediate intervention, either in the form of isolating the suspected nodes or subjecting their updates to further scrutiny.
8. Discussion
This article examines the fundamentals, methods, advancements, and uses of federated learning, with a specific emphasis on its incorporation in IoT settings. The goal was to examine the benefits that FL provides with its decentralized structure and also to recognize the difficulties and weaknesses that come with this new approach. Our paper emphasizes FL’s ability to improve data privacy and efficiency in machine learning and also highlights the significant security threats linked to decentralized systems. It explores the equilibrium between customization and privacy in federated learning applications, the requirement for collaboration and standardization among FL networks, and the significance of continued vigilance toward evolving cyber-threats.
The answers to the research questions stated in the Introduction are given below.
RQ1: Balancing the trade-offs between data privacy, security, and communication costs while retaining model performance calls for a comprehensive strategy in federated learning (FL). Privacy-preserving approaches, such as differential privacy and homomorphic encryption, ensure that sensitive data are protected during the training process, although they sometimes entail a cost in terms of the computing overhead or model accuracy. To address this, secure aggregation methods are used to protect model updates while not dramatically increasing the communication costs. On the other hand, methods such as FedAvg are frequently used to reduce the communication costs by reducing the frequency of data exchange, although these techniques may struggle with extremely heterogeneous, non-IID data. Integrating advanced privacy-preserving approaches with communication-efficient protocols provides a pathway to retaining robust model performance while ensuring security and cost-effectiveness in large-scale FL systems.
RQ2: Federated learning (FL) faces significant security challenges that must be addressed to ensure robust deployment in real-world applications. One of the primary concerns is adversarial attacks, where malicious actors can manipulate local model updates or inject poisoned data into the training process, ultimately degrading the global model’s performance. Model poisoning and data poisoning are critical threats that expose vulnerabilities in the aggregation phase. Additionally, inference attacks, such as model inversion and membership inference, threaten the privacy of individuals by attempting to reconstruct sensitive data from shared model updates. To mitigate these risks, techniques such as homomorphic encryption, differential privacy, and secure multi-party computation (SMPC) are employed, ensuring that data remain protected even in the presence of adversaries. However, balancing these security measures with system scalability and performance remains a crucial challenge for real-world FL deployments.
Considering previous works, we point out the following. In [121], it is noted that FL is vulnerable to inference and poisoning attacks, with challenges in balancing efficiency and security, as privacy-preserving protocols often compromise performance while protecting sensitive data from breaches and malicious servers. In [122], it is shown that FL faces vulnerabilities due to inference and poisoning attacks, compromising data privacy and performance; a malicious server can further threaten system integrity, and balancing efficiency with strong security remains a challenge due to the trade-offs in privacy-preserving protocols. In [123], the PEFL framework presented faces challenges in balancing privacy and security, with privacy-preserving methods hindering poisoning attack detection. Its reliance on homomorphic encryption creates a large computational overhead, and, while it is effective against specific attacks, it may not cover all attack vectors; its implementation is also complex. In [124], the FedPD algorithm addresses cross-client interference in open-set recognition but lacks a focus on privacy or security mechanisms, a key concern in FL applications. It introduces the LPD and GDCA techniques but does not cover adversarial attacks or detailed IoT use cases. The study in [125] addresses challenges in training complex models, including biases due to the architecture and data. Its federated learning focus limits its generalizability to broader contexts. Techniques like active learning and knowledge distillation may show variable effectiveness, and the efficacy analysis does not consider all relevant variables. In [126], it is shown that FL enhances privacy by keeping data local but faces challenges in ensuring complete security. Communication bottlenecks can occur, especially with large models, and performance may lag behind that of centralized models due to data heterogeneity. The need for GDPR compliance further complicates FL's implementation. Our paper offers a specific focus on advanced cryptographic techniques like homomorphic encryption, differential privacy, and SMPC to address security and privacy in FL. It also extensively covers IoT applications, GDPR compliance, and defenses against adversarial attacks. While not as broad as other surveys, its strength lies in addressing the critical intersections among security, privacy, and legal frameworks, making it highly relevant to real-world FL challenges.
The results indicate that FL offers a promising solution to the privacy issues found in centralized machine learning models, but it also brings security issues of its own. Among these challenges are vulnerabilities to eavesdropping and Sybil attacks, which may undermine the integrity of the learning process. Our paper supports the view that proactive defense mechanisms are crucial in preserving trust and security in FL networks. FL's decentralized nature naturally ensures data privacy by storing data on individual devices; however, it also demands strong security measures to manage the attendant risks. This study highlights the significance of consistent surveillance, identity verification, and a culture of cooperation and openness in preventing and uncovering security breaches. Furthermore, the balance between providing tailored experiences and protecting data privacy is a key area of attention. With the increasing sophistication of FL algorithms, it is crucial to maintain a careful balance between these two objectives to prevent sensitive information from being compromised.
In contrast to traditional centralized models, FL distributes security across several nodes, each of which may be susceptible to attack. Prior research has identified comparable security concerns but has frequently emphasized reactive defensive tactics. This study builds on that discussion by promoting a proactive stance on security in FL, stressing the importance of ongoing monitoring and adaptive defense mechanisms in light of the ever-changing cyber-security environment. The results help to explain the value of decentralized machine learning, especially for IoT applications. FL disrupts traditional assumptions about centralized data and enables the exploration of secure, privacy-focused machine learning models. Our study underscores the importance of gaining deeper insight into how personalization and privacy intersect in FL, a delicate equilibrium that has the potential to reshape the limits of data-driven technologies. FL's decentralized structure is ideal for industries like healthcare and urban planning, where data privacy is crucial. However, to exploit the full potential of FL, standardized protocols and frameworks are needed to guarantee compatibility among various systems. The proactive defense tactics discussed in our paper are crucial for safeguarding the security of FL networks, especially as they grow in popularity and attract more advanced threats.
The ethical implications of FL, particularly concerning data privacy and the potential for bias in decentralized models, warrant careful consideration. Ensuring that FL systems are designed with fairness and transparency in mind will be critical for their acceptance and success. FL’s impact extends beyond the technical realm, influencing how data are handled and protected in a wide range of industries. Its success could lead to a shift in how society views data privacy, potentially driving new standards and regulations that prioritize user consent and data security.
Future Implications and Directions
In the future, it is important for research to focus on creating adaptive FL models that can respond to changing security threats. More research is necessary to examine the relationship between service personalization and privacy protection, especially in critical settings such as healthcare. Moreover, exploring standardized protocols for FL systems may promote increased collaboration and compatibility, minimizing the risk of fragmentation in the FL environment. Numerous open questions persist about achieving ideal personalization while safeguarding data privacy and ensuring that FL systems can withstand advanced cyber-threats; answering them will be essential for the ongoing progress of FL. Further investigation into FL's wider use cases, like smart cities and autonomous systems, has the potential to reveal new possibilities for decentralized machine learning. Furthermore, it will be crucial to conduct research on the regulatory consequences of FL, specifically with respect to data protection legislation, in order to guarantee its long-term sustainability.
This paper has brought attention to FL's dual role as a valuable tool to improve data privacy and as a possible target for advanced cyber-attacks. It is essential to implement proactive defense strategies, collaboration, and standardization in order to fully exploit FL's capabilities. FL's capacity to safeguard data confidentiality while facilitating decentralized machine learning proves to be an essential advancement for big data and IoT technologies. However, its success will depend on the ongoing efforts of researchers, developers, and policymakers to address its inherent challenges.
More efficient and scalable privacy-preserving mechanisms, such as homomorphic encryption and secure multi-party computation (SMPC), continue to be a major research focus, as the current methods are either computationally intensive or degrade the model performance. The future of FL is anticipated to involve lighter encryption methods and more efficient, safer aggregation systems that can operate in real time, particularly in resource-constrained situations such as IoT. Many existing FL techniques assume IID data, although this is rarely the case in real-world applications; because non-IID data remain one of the most significant challenges, future research should focus on adaptive aggregation approaches and innovative algorithms that can manage skewed data distributions across a wide range of devices. Future FL systems must also be resistant to adversarial attacks such as model poisoning and model inversion while still preserving participants' trust; research on federated trust mechanisms, such as reputation-based systems and blockchain for federated learning, could be critical to safeguarding FL settings. Finally, communication constraints continue to be among FL's most significant issues. Future research should concentrate on asynchronous communication protocols, model compression approaches, and adaptive learning rate optimization in order to reduce the volume of data transferred while maintaining model accuracy and fairness across nodes. A sketch of one such compression approach follows.
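As one example of the model compression approaches mentioned above, the following is a minimal sketch of top-k sparsification, in which each client transmits only the largest-magnitude entries of its update (indices plus values); the 1% retention ratio is an illustrative assumption.

```python
# Minimal sketch of top-k gradient sparsification: each client sends only
# the k largest-magnitude update entries, cutting the per-round payload.
import numpy as np

def topk_sparsify(update, ratio=0.01):
    """Keep the top ratio of entries by magnitude; return (indices, values)."""
    k = max(1, int(ratio * update.size))
    idx = np.argpartition(np.abs(update), -k)[-k:]
    return idx, update[idx]

def densify(indices, values, size):
    """Server-side reconstruction of the sparse update."""
    dense = np.zeros(size)
    dense[indices] = values
    return dense

update = np.random.randn(100_000)
idx, vals = topk_sparsify(update)        # ~1,000 of 100,000 entries sent
recovered = densify(idx, vals, update.size)
```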
As FL evolves, there is an opportunity to change how machine learning is implemented in various industries. By acknowledging the potential benefits and limitations, the FL community can lead the development of a safe, distributed, and cooperative machine learning setting.
9. Conclusions
Our paper addresses various FL topics but is constrained by the rapid, ongoing development of research in this area. We suggest that future work focus on developing new FL algorithms that protect privacy and are resilient to adversarial attacks. Furthermore, there is a clear need for more research on the relationship between personalization and confidentiality, as well as for standardized procedures that support cooperation among various FL platforms. While our paper provides an extensive review of federated learning, the goal was also to identify research gaps that future work can address.
Based on our analysis of the literature and the current trends in FL, we identify the following gaps.
Gap 1: Insufficient Resistance to Adversarial Attacks. Although numerous strategies, including differential privacy and secure aggregation, have been proposed, there is still no consensus on how to properly secure FL models against adversarial attacks. Research in this field remains dispersed, and existing solutions frequently compromise performance or scalability.
Gap 2: Difficulties in Managing Non-IID Data. Non-IID data present substantial obstacles to FL's performance. Most established algorithms, such as FedAvg, assume IID data distributions, and solutions for non-IID settings are still in their early stages (a common way to simulate such settings for benchmarking is sketched below, after Gap 3).
Gap 3: Scalability Issues in Real-World Applications. While FL has been successfully deployed in small-scale settings, there is little research on its scalability across large networks, particularly in resource-constrained situations such as IoT devices.
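As referenced in Gap 2, the following is a minimal sketch of Dirichlet-based label-skew partitioning, a standard way to simulate non-IID client data when benchmarking FL algorithms; smaller values of alpha yield more skewed per-client label distributions, and the concrete values here are illustrative.

```python
# Minimal sketch of Dirichlet label-skew partitioning for non-IID FL
# benchmarks: each class is split across clients in Dirichlet proportions.
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.1, rng=None):
    """Split sample indices across clients with Dirichlet label shares."""
    rng = rng or np.random.default_rng(0)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        shares = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

# Example: 10,000 samples over 10 classes split across 20 clients.
labels = np.random.default_rng(1).integers(0, 10, size=10_000)
parts = dirichlet_partition(labels, n_clients=20)
```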
This paper mainly targeted current FL applications and recognized security risks, possibly neglecting upcoming trends and future obstacles. These limitations indicate that the findings may not fully capture the future directions of FL, especially as new technologies and threats emerge. This study's emphasis on present difficulties could also restrict its relevance to upcoming advancements in the field. To overcome these constraints, future work should integrate new studies and investigate innovative uses of FL as they gain popularity. Furthermore, research on proactive defense strategies in actual FL implementations could offer useful perspectives.