Article

FedBeam: Reliable Incentive Mechanisms for Federated Learning in UAV-Enabled Internet of Vehicles

1 School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
2 School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu 611130, China
* Authors to whom correspondence should be addressed.
Drones 2024, 8(10), 567; https://doi.org/10.3390/drones8100567
Submission received: 9 September 2024 / Revised: 2 October 2024 / Accepted: 7 October 2024 / Published: 10 October 2024

Abstract

Unmanned aerial vehicles (UAVs) can be utilized as airborne base stations to deliver wireless communication and federated learning (FL) training services for ground vehicles. However, most existing studies assume that vehicles (clients) and UAVs (model owners) offer services voluntarily. In reality, participants (FL clients and model owners) are selfish and will not engage in training without compensation. Meanwhile, due to the heterogeneity of participants and the presence of free-riders and Byzantine behaviors, the quality of vehicles’ model updates can vary significantly. To incentivize participants to engage in model training and ensure reliable outcomes, this paper designs a reliable incentive mechanism (FedBeam) based on game theory. Specifically, we model the cooperation problem between model owners and clients as a two-layer Stackelberg game and prove the existence and uniqueness of the Stackelberg equilibrium (SE). For the cooperation among model owners, we formulate the problem as a coalition game and based on this, analyze and design a coalition formation algorithm to derive the Pareto optimal social utility. Additionally, to achieve reliable FL model updates, we design a weighted-beta (Wbeta) reputation update mechanism to incentivize FL clients to provide high-quality model updates. The experimental results show that compared to the baselines, the proposed incentive mechanism improves social welfare by 17.6% and test accuracy by 5.5% on simulated and real datasets, respectively.

1. Introduction

With the rapid advancement of artificial intelligence (AI) and unmanned aerial vehicle (UAV) technology, traditional remote sensing and data collection methods are progressively transforming into UAV-assisted intelligent sensing systems (UASs) [1]. A UAS represents an integrated system that leverages the mobility, coverage, and deployment capabilities of UAVs, along with their built-in sensors, communication systems, and edge computing resources, to facilitate data-driven applications in environmental monitoring, disaster response, and agriculture [2]. Furthermore, advancements in machine learning algorithms, coupled with the extensive data collected by UAVs and enhanced computational capabilities, enable the development of AI-based intelligent systems, thereby improving the efficiency and effectiveness of various applications [3]. In light of the increasing demands of contemporary smart city applications, aerial platforms, particularly UAVs, have become essential. UAVs are extensively utilized for data collection and computational offloading in urban settings due to their high mobility, flexible deployment, and cost-effectiveness, providing superior coverage compared to ground-based systems [4]. Within the context of the Internet of vehicles (IoV), UAVs play a pivotal role [5]. They facilitate real-time data collection, serve as communication relays in areas with weak signals, and enhance road monitoring and safety. In emergency situations, UAVs offer rapid site access and real-time data support to emergency responders. Overall, UAVs significantly improve data collection, communication, and traffic management within intelligent transportation systems.
Data sharing plays a vital role in enhancing model performance during the model training process for drone companies. However, the increasing emphasis on data privacy and the enforcement of stringent regulations, such as the General Data Protection Regulation (GDPR), present significant challenges for data sharing [6]. The GDPR codifies data protection principles, user consent requirements, and the rights of access and erasure. To address these issues and enable effective collaborative model training while ensuring privacy protection, federated learning (FL) has emerged as a solution [7]. FL enables drone companies to train a global model collaboratively without exchanging raw data, thereby improving the performance of machine learning models while preserving data privacy. This approach complies with privacy protection requirements and promotes data collaboration and innovation among drone companies. When UAVs are integrated with FL, a UAV acts as the central server and vehicles serve as clients for model training. Additionally, multiple UAVs can collaborate [8].
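To make the FL workflow concrete, the sketch below shows the kind of weighted model averaging (FedAvg-style) a UAV server could perform over vehicle updates. It is a minimal illustration only: the linear model, data shapes, and all function names are our own assumptions, not the system described in the cited works.

```python
import numpy as np

def local_update(w_global, x, y, lr=0.01, epochs=10):
    """Toy client step: gradient descent on a linear least-squares model."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = x.T @ (x @ w - y) / len(y)   # MSE gradient
        w -= lr * grad
    return w

def fedavg(updates, sizes):
    """Server step: average client models weighted by their data quantities."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

rng = np.random.default_rng(0)
w_global = np.zeros(5)
# Three vehicles holding different amounts of local data.
clients = [(rng.normal(size=(n, 5)), rng.normal(size=n)) for n in (20, 50, 30)]
for _ in range(3):  # three communication rounds
    updates = [local_update(w_global, x, y) for x, y in clients]
    w_global = fedavg(updates, [len(y) for _, y in clients])
```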
However, current research on applying FL within the domain of drones largely operates under the assumption that all participants willingly contribute their data [4,9,10]. This voluntary participation is presumed consistent and cooperative, with little consideration given to scenarios where participants may be reluctant or have competing interests. This assumption simplifies the implementation of FL but may not fully reflect real-world conditions where incentives, trust, and data ownership concerns can significantly impact the willingness of entities to share their data. In [10], the authors proposed an energy-efficient FL algorithm utilizing UAVs and wireless-powered communications to address the challenges of terrestrial communication unavailability and battery limitations, demonstrating superior energy efficiency through comprehensive resource allocation. In [4], the authors proposed a UAV-assisted FL system that optimizes device scheduling, UAV trajectory, and time allocation to minimize FL training completion time, leveraging the UAV’s mobility to enhance communication efficiency and mitigate the communication straggler effect, with simulation results demonstrating significant improvements in the tradeoff between completion time and test accuracy. The studies mentioned above optimistically assume that each participant voluntarily contributes to the model training without compensation, which clearly does not reflect the actual situation. Therefore, it is essential to design an incentive mechanism to encourage participation.
Consequently, further research is needed to address these complexities and develop robust incentive mechanisms that encourage and manage participation in FL for drone applications. The design of robust incentives is hindered by three challenges: C1: Heterogeneity. Due to the characteristics of FL itself, heterogeneity is prevalent in the FL process [11]. Different FL clients hold data of differing quality and quantity, and their costs of collecting and processing each data sample may also differ. C2: Selfishness and rationality. In FL, participants are often self-interested and rational, seeking to maximize their benefits while minimizing resource expenditure. C3: Free-riding and Byzantine behavior. In FL, free-riding and Byzantine behavior pose significant challenges that can undermine the effectiveness and reliability of the learning process.
Inspired by game theory and reputation mechanisms, we designed a robust incentive mechanism for UAV-enabled FL to enhance the efficiency and reliability of the training process. Game theory is a mathematical framework for analyzing strategic interactions between rational decision-makers: it studies how individuals or groups make choices that affect one another, focusing on the outcomes of these interactions. A Stackelberg game is a strategic game in which players decide sequentially: one player, the leader, moves first, and the other player(s), the followers, move afterward based on the leader's choice. In our design, the Stackelberg game addresses the cooperation between model owners (UAVs) and clients (vehicles), while coalition games resolve cooperation challenges among model owners (UAVs). In addition, the Wbeta reputation mechanism evaluates and manages the reputation of participants within the system: it assesses the trustworthiness and performance of individuals, encouraging positive behavior and discouraging negative actions.
The following are the paper’s significant contributions:
(1)
We propose a reliable incentive mechanism (FedBeam) based on game theory, which improves the performance and reliability of the trained model.
(2)
We model the cooperation between model owners (UAVs) and FL clients (vehicles) as a two-stage Stackelberg game and prove the existence and uniqueness of the Stackelberg equilibrium (SE). Among UAVs, we formulate the collaboration as a coalitional game. Based on these game-theoretic models, we investigate how to balance the benefits of FL clients and model owners in order to achieve the Pareto optimum of social utility.
(3)
The weighted-beta reputation mechanism (Wbeta) is designed as an effective measure to select reliable clients for FL, preventing clients from submitting unreliable model updates.
(4)
The experimental results show that, compared to the baselines, the proposed incentive mechanism improves social welfare by 17.6% and test accuracy by 5.5% on simulated and real datasets, respectively.
The remainder of this article is organized as follows. In Section 2, we present the related work. In Section 3, we focus on the modeling and formalization of the problem. Next, in Section 4, we present the analysis and design of the incentive mechanism. In Section 5, we conduct a series of experiments on the designed mechanism. Finally, in Section 6, we summarize our work and future research directions.

2. Related Works

This section reviews the literature on FL in the context of the UAV-enabled Internet of vehicles and incentive mechanisms for FL in the IoV.
UAV-enabled FL in the IoV: Integrating UAVs into FL frameworks offers several advantages, such as enhanced mobility, flexible deployment, and improved communication coverage. UAVs can act as mobile edge servers, facilitating data aggregation and model training in dynamic IoV environments. Recent works have explored UAV-assisted FL systems, focusing on optimizing UAV trajectories, resource allocation, and communication efficiency [2,4,10,12,13,14,15,16]. In [2], the authors propose a blockchain-based hierarchical FL framework for UAV-enabled IoT networks, addressing data volume and privacy concerns by mitigating model impairment from imbalanced data distribution, enhancing trust in decentralized model aggregation, and optimizing device association, resource allocation, and UAV deployment, with extensive experiments demonstrating superior performance over state-of-the-art alternatives. In [13], the authors propose a secure and efficient FL framework for UAV-enabled networks, featuring a secure UE selection scheme to prevent eavesdropping and a joint UAV placement and resource allocation optimization to minimize training time and energy consumption, using an LSTM-DDPG algorithm for real-time decision making, with simulations demonstrating superior efficiency and security over state-of-the-art methods. In [14], the authors propose an air–ground integrated FL framework for 6G networks, leveraging UAVs to supplement terrestrial base stations and jointly optimizing UAV location and resource allocation to minimize energy consumption and balance energy usage with training latency, using successive convex approximation techniques, with simulations and real-world experiments demonstrating superior cost reduction and learning accuracy over benchmarks. While these papers investigate the enhancement of model performance and resource allocation in UAV-enabled FL for the Internet of vehicles, they assume that participants willingly and freely join the FL process. This assumption does not align with real-world scenarios, where incentives and motivations are crucial in participant engagement. Effective FL systems must consider the need for robust incentive mechanisms to ensure active and fair participation from all entities involved.
Incentive mechanisms for FL in the IoV: Incentive mechanisms motivate participants to contribute their resources, such as computational power and data, to the FL process. Existing research has explored various incentive schemes, including monetary rewards, reputation-based systems, and gamification techniques [17,18,19,20,21,22,23]. In [17], the authors proposed a UAV-assisted FL framework for the Internet of vehicles to address challenges of intermittent connectivity, low proactivity, and limited resources of vehicle users, demonstrating a reduction in energy consumption and an improvement in model convergence speed compared to baseline algorithms. In [18], the authors presented a prospect theory (PT)-based incentive mechanism for the FL-enabled Internet of UAVs, addressing issues of information asymmetry and bounded rationality and demonstrating superior performance over baseline methods by modeling risk-awareness behavior and optimizing subjective utility through FL for crowd-sensing tasks. In [19], the authors propose an incentive mechanism for FL involving a base station and multiple mobile users, modeled as an auction game where the BS acts as the auctioneer and mobile users as sellers, using a primal–dual greedy auction mechanism to ensure truthfulness, individual rationality, and efficiency, with numerical results demonstrating its effectiveness. In [20], the authors propose an FL approach for UAV-based service providers in IoV applications to enable privacy-preserving collaborative machine learning, using a multi-dimensional contract to ensure truthful reporting of UAV types and the Gale–Shapley algorithm for efficient matching, with simulations validating the incentive compatibility and efficiency of their design. In [21], the authors propose using UAVs as wireless relays to enhance FL in the Internet of vehicles by addressing communication link failures and node losses through a joint auction–coalition formation framework that maximizes UAVs’ profits and accounts for IoV components’ preferences, with simulations showing that UAVs’ profit-maximizing behavior can make grand coalitions unstable and that higher cooperation costs lead UAVs to prefer independent operation. In [22], the authors propose a fair and robust federated learning scheme utilizing edge computing and an optimal incentive mechanism to ensure participation and address information asymmetry, alongside Byzantine-robust aggregation and reputation mechanisms to enhance efficiency and prevent free-riding. In [23], to encourage user participation, the authors propose a hierarchical incentive mechanism based on contract theory, addressing data volume, quality, and costs under information asymmetry, with experiments showing improved utility for model owners compared to benchmarks. However, the above studies do not fully consider the cooperative potential of UAVs or address malicious behaviors like Byzantine attacks. Therefore, our research is necessary and timely.

3. System Model and Problem Formulation

3.1. Basic Setting

As shown in Figure 1, we consider an FL network in which rotary-wing UAVs act as parameter servers (model owners) and vehicles act as clients. A UAV n releases sensing tasks to the FL vehicles $P_n = \{1, 2, \ldots, p_n\}$ that are willing to participate. Each UAV serves a fixed set of vehicles, and each vehicle is associated with exactly one UAV. N is the set of all UAVs, and S is a coalition of UAVs, so $S \subseteq N$. UAVs seek to maximize their profits from the formed coalition, while vehicles seek to maximize the reward $\gamma_n$ given by the FL UAV n. Because training data are often insufficient in reality, these UAVs will inevitably cluster around their profit-maximizing goals. Each FL UAV in a coalition constructs a global model using FL without exchanging raw data.
The process of our FedBeam is as follows: First, each UAV needs to publish a training task to the corresponding vehicles, telling them what kinds of data they need to collect (step ➀). Next, each UAV needs to inform the corresponding vehicle about the total payoff that can be obtained (step ➁). Then, after knowing the training task and the total reward, each vehicle formulates a data contribution strategy that maximizes their revenue, taking into account their situation (e.g., the cost of data collection) (step ➂). Then, each vehicle sends the corresponding local model to the UAV (step ➃). Finally, each UAV assigns each vehicle a payout based on the quantity of data received (step ➄). Once the UAVs have established coalitions based on their situations and strategic considerations, they proceed to train collaboratively within their respective coalitions. The important notations are listed in Table 1.
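Read as a protocol, steps ➀–➄ map onto a small amount of orchestration code. The sketch below simulates one FedBeam round under toy assumptions; the task format, the placeholder contribution policy in step ➂ (the actual best response is derived in Section 4), and the random "model" are all illustrative.

```python
import random

def publish_task():                                   # step 1
    return {"kind": "image-classification", "classes": 10}

def announce_reward():                                # step 2: total payoff gamma_n
    return 10.0

def choose_contribution(gamma, cost):                 # step 3
    # Placeholder policy only; Section 4 derives the actual best response.
    return max(gamma / (2.0 * cost) - 1.0, 0.0)

def train_local(task, quantity):                      # step 4: toy local model
    return [random.gauss(0.0, 1.0) for _ in range(4)]

def pay(gamma, quantities):                           # step 5: proportional payout
    total = sum(quantities)
    return [gamma * q / total if total else 0.0 for q in quantities]

costs = [0.5, 0.8, 1.0]                               # per-unit data costs c_n^i
task, gamma = publish_task(), announce_reward()
qs = [choose_contribution(gamma, c) for c in costs]
models = [train_local(task, q) for q in qs]
payments = pay(gamma, qs)
```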

3.2. Utility Model

The UAV generates one training task by announcing the payment amount $\gamma_n > 0$. Then, depending on the payment, the associated vehicles determine their degree of participation. Without loss of generality, each vehicle $i \in P_n$ is willing to contribute a quantity of data $q_n^i$. If $q_n^i = 0$, vehicle i does not participate in the task. The rewards given by the UAV are proportional to the quantity of data used for the task.
Definition 1.
(Vehicle’s utility) The utility of vehicle i providing services to the UAV n is defined as
$$U_n^i\left(q_n^i, q_n^{-i}\right) = \begin{cases} \dfrac{q_n^i}{\sum_{m=1}^{p_n} q_n^m}\,\gamma_n - q_n^i c_n^i, & \text{if } q_n^i > 0,\\[4pt] 0, & \text{if } q_n^i = 0. \end{cases} \qquad (1)$$
Here, $q_n^{-i} = \left(q_n^1, q_n^2, \ldots, q_n^{i-1}, q_n^{i+1}, \ldots, q_n^{p_n}\right)$ is the strategy set of all vehicles except vehicle i in UAV n's task. Following [24], the cost $c_n^i$ of vehicle i is
$$c_n^i = \zeta \left(f_n^i\right)^2.$$
Here, $f_n^i$ is the CPU frequency of vehicle i when assisting UAV n in training its model. Since a UAV's contribution to the coalition cannot be measured directly from the loss function, we must convert contributions into benefits: we regard the quantity of data as the contribution and the FL test accuracy as the benefit.
The coalition can then sell the global model as a service for a profit that is contingent on the performance of the model. As we all know, a model that performs better or is more accurate provides greater revenues for the coalition. Inspired by [22,25], we characterize the model performance of the coalition in the following.
Definition 2.
(Coalition’s performance) The performance of a coalition S is defined by
$$V(S) = \ln(1 + \alpha Q).$$
Here, $Q = \sum_{n=1}^{|S|} X_n$ with $X_n = \sum_{i \in P_n} q_n^i$; α is a fixed constant, and $X_n$ is the quantity of data contributed by UAV n.
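Definitions 1 and 2 translate directly into code. The sketch below is a minimal transcription, with α = 5 taken from Table 2; the function names are our own.

```python
import math

def vehicle_utility(q_i, q_others_total, gamma_n, c_i):
    """Definition 1: a share of gamma_n proportional to contributed data,
    minus the cost of collecting that data; zero for non-participants."""
    if q_i <= 0:
        return 0.0
    return q_i / (q_i + q_others_total) * gamma_n - q_i * c_i

def coalition_performance(uav_totals, alpha=5.0):
    """Definition 2: V(S) = ln(1 + alpha * Q) with Q the coalition's total
    data (alpha = 5 as in Table 2); the log captures diminishing returns."""
    return math.log(1.0 + alpha * sum(uav_totals))
```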
Communication energy: For the model transmission between different entities in the system model, an orthogonal frequency-division multiple access (OFDMA) strategy is adopted.
The UAV requires energy to transmit data packets to the vehicles. As in [21], the achievable transmission rate by UAV n can be represented by
$$\nu_n = b_n \log_2\left(1 + \frac{P_n h_n}{I_n + b_n N_0}\right).$$
Here, $b_n$ is the bandwidth allocated to UAV n, $h_n$ is the channel gain, $I_n$ is the interference caused by other UAVs, $P_n$ is the transmission power of UAV n, and $N_0$ is the power spectral density of the Gaussian noise.
The cost required by UAV n to receive local model parameters from all the vehicles is
$$e_n^c = p_n \sum_{w=1}^{W} \frac{z_w}{\nu_n},$$
where $z_w$ is the data size of the local model parameters of a vehicle.
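For intuition, the following sketch evaluates the rate formula and a receive-cost proxy with the simulation parameters of Table 2; the channel gain and interference values are arbitrary assumptions, since the text does not fix them.

```python
import math

def transmission_rate(b_n, p_n, h_n, i_n, n0):
    """nu_n = b_n * log2(1 + P_n h_n / (I_n + b_n N_0)), in bit/s."""
    return b_n * math.log2(1.0 + p_n * h_n / (i_n + b_n * n0))

def receive_cost(num_vehicles, model_bits, rate):
    """Proxy for e_n^c: cost of collecting one local model per vehicle."""
    return num_vehicles * model_bits / rate

b = 180e3                            # bandwidth: 180 kHz (Table 2)
p_tx = 10 ** (10 / 10) * 1e-3        # 10 dBm -> 0.01 W (Table 2)
n0 = 10 ** (-152 / 10) * 1e-3        # -152 dBm/Hz -> W/Hz (Table 2)
h = 1e-7                             # channel gain: arbitrary assumption
rate = transmission_rate(b, p_tx, h, i_n=0.0, n0=n0)   # ~2.4 Mbit/s
cost = receive_cost(num_vehicles=10, model_bits=0.1e6, rate=rate)
```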
Hovering cost: The cost required for a UAV to remain stationary near a cell is referred to as hovering energy. The hovering cost of UAV n is denoted by e n h .
Circuit cost: Circuit cost refers to the energy consumed by on-board circuits, including computational chips, rotors, and gyroscopes. This energy consumption is crucial and must be considered, as it impacts the overall energy efficiency of the network. The circuit cost of UAV n is denoted by e n t .
Therefore, we define the sum of the communication and flight costs for each UAV as:
$$E_n = e_n^c + e_n^h + e_n^t.$$
Following [26], we introduce θ n to represent the UAV’s assessment of the model. Then, we define the utility of UAV n.
Definition 3.
(UAV’s utility) The utility of a UAV n is defined as
$$U_n = \theta_n V(S) - E_n - \gamma_n - |S|\,G.$$
G represents the unit cooperation cost: when $|S|$ UAVs participate in the cooperation, each UAV pays $|S|G$ to collaborate with the others. The communication cost of each owner can be regarded as the same because the size of uploaded gradients and the number of iterations are fixed [27].
Next, we define the utility of a coalition.
Definition 4.
(Coalition’s utility) The utility of a coalition S is defined by
$$U(S) = \sum_{n \in S} U_n + \sum_{n \in S} \sum_{i \in P_n} U_n^i.$$
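Definitions 3 and 4 are likewise straightforward to evaluate; a minimal sketch (function names ours, G = 1 as in Table 2):

```python
def uav_utility(theta_n, v_s, e_n, gamma_n, coalition_size, g_cost=1.0):
    """Definition 3: model valuation minus energy, reward, and cooperation
    costs (cooperation cost G = 1 as in Table 2)."""
    return theta_n * v_s - e_n - gamma_n - coalition_size * g_cost

def coalition_utility(uav_utilities, vehicle_utilities):
    """Definition 4: sum of all UAV and all vehicle utilities in S."""
    return sum(uav_utilities) + sum(vehicle_utilities)
```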

3.3. Weighted-Beta Reputation Model

During FL training, the system is vulnerable to various unreliable model updates, which can make the model difficult to converge or leave the final training accuracy low. Here, we mainly consider two types of unreliable model updates [24], which we also regard as attacks:
  • Intentional attack: In general, several kinds of intentional attacks exist, e.g., data poisoning attacks and free-ride attacks. On the one hand, data poisoning techniques such as label flipping change the training data's original labels to other random labels and then compute gradients on these poisoned data. On the other hand, a free-rider may not actually execute local model training and simply uploads a random gradient. Both behaviors seriously damage the accuracy of the final FL model.
  • Unintentional attack: Because each UAV holds only part of the data, the local distribution may be inconsistent with the overall distribution, i.e., the data are non-independent and identically distributed (non-IID). Although very common in practice, this has a negative influence on the accuracy of the final model. Here, we quantify differences in local data distributions using the widely used relative entropy (KL divergence) [28], defined below (a short numeric sketch follows the definition). The difference in the distribution between the data of UAV n and the overall data can be defined as:
    Definition 5.
    The degree of non-IID is measured by the relative entropy (KL divergence):
    $$\mathrm{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}.$$
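As the numeric sketch promised above, the snippet below measures the non-IID degree of one participant's label distribution against a uniform global distribution; the distributions and the smoothing constant eps are illustrative assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum_x P(x) * log(P(x)/Q(x)); eps guards empty classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

global_dist = [0.1] * 10                              # balanced 10-class data
local_dist = [0.4, 0.3, 0.1, 0.1, 0.1] + [0.0] * 5    # skewed local data
non_iid_degree = kl_divergence(local_dist, global_dist)   # ~0.88 nats
```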
Due to the existence of these unreliable model updates or attacks, only the vehicles with high reputations can participate in the training and obtain better benefits. A weighted-beta reputation mechanism is proposed, which can be used to predict the future performance of each participant through the beta probability distribution function.
In contrast to most other reputation systems, which are intuitive and ad hoc, the beta reputation system has a strong foundation in statistical theory [29]. Although we describe a centralized approach, the beta reputation system can also be used in a distributed environment. Its most useful property is that two shape factors, α and β, determine the form of the probability distribution.
We define a K-dimensional rating vector to represent the aggregation server's evaluation of the vehicles at each model aggregation. Zero stands for a good update and one for a bad update. The rating vector is exemplified as follows:
$$R_n = \{0, 1, 1, 0, \ldots, 1\} \in \{0, 1\}^K. \qquad (10)$$
The common beta probability density function is defined as follows:
Definition 6.
(Beta probability density function):
$$f(p \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}, \qquad (11)$$
where $\alpha > 0$, $\beta > 0$, and $0 \le p \le 1$.

3.4. Problem Formulation

To motivate the active participation and service contribution of UAVs and vehicles, we introduce game theory and reputation management to solve this incentive problem. Under our system, each vehicle decides how much data to contribute in exchange for rewards for delivering a local model, which we call its data provisioning plan (DPP). Each UAV decides its data reward plan (DRP) according to the situation in its coalition. After each FL training round, we update the reputation value of each vehicle in time, which we call the reputation update method (RUM). Since training samples are always inadequate, a number of UAVs are interested in participating in FL, so we design a coalition formation algorithm (CFA).
From the perspective of FedBeam, the optimal strategy of UAVs and vehicles needs to satisfy two fundamental properties: individual rationality and Stackelberg equilibrium (SE). Having these properties, the UAVs and vehicles have no motivation to deviate from their current decisions. The formal definitions are given below.
Definition 7.
Individual rationality (IR): UAVs and vehicles are only willing to engage in collaboration if the benefits to them are non-negative:
$$U_n^i\left(q_n^i, q_n^{-i}\right) \ge 0, \qquad U_n \ge 0. \qquad (12)$$
Definition 8.
Stackelberg equilibrium (SE): a set of strategies $q_n^* = \left(q_n^{1*}, q_n^{2*}, \ldots\right)$ and $\gamma^* = \left(\gamma_1^*, \gamma_2^*, \ldots\right)$ constitutes the SE of the DPP and the DRP if:
$$U_n^i\left(q_n^{i*}, q_n^{-i*}\right) \ge U_n^i\left(q_n^i, q_n^{-i*}\right), \qquad U_n\left(\gamma_n^*, \gamma_{-n}^*\right) \ge U_n\left(\gamma_n, \gamma_{-n}^*\right). \qquad (13)$$
Here, we regard the SE as a combination of two Nash equilibria (NE). In what follows, $\pi_n$ denotes UAV n's decision to join a coalition. The NE is significant because no single participant is willing to deviate from it unilaterally; otherwise, its utility would decrease.
The objective of developing the DPP and DRP for UAVs and vehicles is to maximize their utility while satisfying the IR and SE constraints.
Definition 9.
The optimal design of the vehicle’s utility is formalized as follows:
$$\max_{q_n^i} U_n^i \quad \text{s.t.} \quad (12), (13), \quad 0 \le q_n^i \le d_n^i. \qquad (14)$$
Here, $d_n^i$ is the maximum quantity of data of vehicle i in UAV n; each vehicle's contribution is bounded by its data capacity. Similarly, each UAV desires to maximize its utility at the DRP stage, i.e., to determine its own best payoff, given that a coalition has been established.
Definition 10.
The optimal design of the UAV’s utility is formalized as follows:
$$\max_{\gamma_n} U_n \quad \text{s.t.} \quad (12), (13), \quad 1 + \alpha Q \ge \Omega, \quad 0 \le \gamma_n \le \Gamma_n. \qquad (15)$$
Here, Ω denotes a usefulness benchmark that a feasible model must satisfy: in practical scenarios, a model whose performance falls below a certain level is rejected by consumers. $\Gamma_n$ is the maximum reward a UAV can pay.
Finally, we maximize the overall social welfare by designing a coalition formation algorithm (CFA). Coalition formation refers to the process by which individuals or groups come together to form a coalition, typically to achieve common goals that would be difficult to accomplish alone. This concept is widely studied in game theory and social sciences.
Definition 11.
Maximizing social welfare is shown below:
$$\max_{\pi, q, \gamma} \; \sum_{n} U_n + \sum_{n}\sum_{i} U_n^i \quad \text{s.t.} \quad (12), (13), (14), (15). \qquad (16)$$

4. Optimal Design of FedBeam

In this section, we first show that in the DPP, for any given $\gamma_n$, there exists a unique NE solution $q_n^{i*}$ of the second-stage Stackelberg game. Then, in the DRP, we show that there exists a unique NE solution $\gamma_n^*$ under a deterministic coalition structure. Next, we discuss how the CFA constructs a Pareto-optimal coalition structure. Finally, we study the RUM, designing the reputation mechanism that enables reliable FL updates and proving its properties.

4.1. Analysis of the DPP

The DPP and DRP are modeled as a two-stage Stackelberg game. In the first stage, UAV n declares a total payment of $\gamma_n$. In the second stage, each vehicle decides on a plan that optimizes its utility. In this Stackelberg game, the UAV is the leader and the vehicles are the followers. Following backward induction, we first solve the vehicles' utility maximization in the second stage.
To prove the uniqueness of the NE, we take the first derivative of $U_n^i\left(q_n^i, q_n^{-i}\right)$ with respect to $q_n^i$:
$$\frac{\partial U_n^i\left(q_n^i, q_n^{-i}\right)}{\partial q_n^i} = -\frac{\gamma_n q_n^i}{\left(\sum_{m=1}^{p_n} q_n^m\right)^2} + \frac{\gamma_n}{\sum_{m=1}^{p_n} q_n^m} - c_n^i.$$
Then, we proceed to derive the second-order derivative:
$$\frac{\partial^2 U_n^i\left(q_n^i, q_n^{-i}\right)}{\partial \left(q_n^i\right)^2} = -\frac{2\gamma_n \sum_{m \ne i} q_n^m}{\left(\sum_{m=1}^{p_n} q_n^m\right)^3} < 0.$$
Since the second-order derivative is negative, each vehicle's utility is strictly concave in its own strategy, which ensures a unique Nash equilibrium by the following lemma.
Lemma 1.
The second-stage game has a unique NE for every value of γ n when the following two requirements are met [25]:
  • The strategy sets are convex, bounded, and closed.
  • The utility functions in the strategy space are quasi-concave and continuous.
Theorem 1.
For any $i \in P_n$ in the DPP (i.e., $q_n^{i*} \in \left[0, d_n^i\right]$), the optimal response is:
$$q_n^{i*} = \frac{(p_n - 1)\gamma_n}{\sum_{m=1}^{p_n} c_n^m}\left(1 - \frac{(p_n - 1)\,c_n^i}{\sum_{m=1}^{p_n} c_n^m}\right).$$
Proof. 
We maximize the utility of each vehicle while satisfying the NE. According to individual rationality, we know that $U_n^i \ge 0$. As a consequence, from (1), we obtain:
$$q_n^i \le \frac{\gamma_n}{c_n^i}. \qquad (20)$$
We have shown that the second stage of the game has an NE. Setting $\partial U_n^i\left(q_n^i, q_n^{-i}\right) / \partial q_n^i = 0$, we have:
$$-\frac{\gamma_n q_n^i}{\left(\sum_{m=1}^{p_n} q_n^m\right)^2} + \frac{\gamma_n}{\sum_{m=1}^{p_n} q_n^m} - c_n^i = 0. \qquad (21)$$
By solving Equation (21), we obtain the expression of $q_n^i$:
$$q_n^i = \sqrt{\frac{\gamma_n \sum_{m \ne i} q_n^m}{c_n^i}} - \sum_{m \ne i} q_n^m. \qquad (22)$$
Because of the concavity of $U_n^i$, this is vehicle i's optimal strategy whenever $q_n^i$ is nonnegative; otherwise, vehicle i does not participate in the task. Moreover, if $q_n^i$ exceeds $d_n^i$ (the maximum quantity of data owned by vehicle i), vehicle i participates with its optimal strategy by setting $q_n^{i*} = d_n^i$. Therefore, we have:
$$q_n^{i*} = \begin{cases} 0, & \text{if } \frac{\gamma_n}{c_n^i} \le \sum_{m \ne i} q_n^m,\\[4pt] \sqrt{\dfrac{\gamma_n \sum_{m \ne i} q_n^m}{c_n^i}} - \sum_{m \ne i} q_n^m, & \text{if } q_n^{i*} \in \left(0, d_n^i\right),\\[4pt] d_n^i, & \text{otherwise.} \end{cases} \qquad (23)$$
According to Equation (23), for any $i \in P_n$ with an interior solution, we have:
$$\sum_{m=1}^{p_n} q_n^m = \sqrt{\frac{\gamma_n \sum_{m \ne i} q_n^m}{c_n^i}}.$$
By setting $X_n = \sum_{i=1}^{p_n} q_n^i$, we can derive that
$$q_n^1 = X_n - \frac{X_n^2 c_n^1}{\gamma_n}, \quad q_n^2 = X_n - \frac{X_n^2 c_n^2}{\gamma_n}, \quad \ldots, \quad q_n^{p_n} = X_n - \frac{X_n^2 c_n^{p_n}}{\gamma_n}. \qquad (25)$$
Therefore,
$$X_n = p_n X_n - \frac{X_n^2 \sum_{m=1}^{p_n} c_n^m}{\gamma_n}. \qquad (26)$$
Based on Equation (26), we have:
$$X_n = \frac{(p_n - 1)\gamma_n}{\sum_{m=1}^{p_n} c_n^m}. \qquad (27)$$
By plugging Equation (27) into Equation (25), we can derive
$$q_n^{i*} = \frac{(p_n - 1)\gamma_n}{\sum_{m=1}^{p_n} c_n^m}\left(1 - \frac{(p_n - 1)\,c_n^i}{\sum_{m=1}^{p_n} c_n^m}\right).$$
□

4.2. Analysis of the DRP

Based on the derivation above, we can express $q_n^{i*}$ as a function of $\gamma_n$:
$$q_n^{i*} = \sigma_n^i \gamma_n, \qquad \sigma_n^i = \frac{p_n - 1}{\sum_{m=1}^{p_n} c_n^m}\left(1 - \frac{(p_n - 1)\,c_n^i}{\sum_{m=1}^{p_n} c_n^m}\right).$$
Theorem 2.
For UAV n, the optimal strategy for determining $\gamma_n$ is:
$$\gamma_n^* = \frac{\theta_n \alpha W - 1}{\alpha W}, \qquad W = \sum_{i=1}^{p_n} \sigma_n^i.$$
Proof. 
To find the optimal reward $r = \gamma_n$, we maximize the following objective function:
$$f(r) = \theta_n \ln(1 + \alpha r W) - r,$$
where $\theta_n$, $\alpha$, and $W$ are constants.
To find the optimal value of r, we first take the derivative of the objective function with respect to r:
$$f'(r) = \frac{d}{dr}\left[\theta_n \ln(1 + \alpha r W) - r\right].$$
Using the chain rule and the property of the derivative of the natural logarithm, we obtain:
$$f'(r) = \theta_n \frac{d}{dr}\ln(1 + \alpha r W) - \frac{d}{dr}r = \theta_n \cdot \frac{\alpha W}{1 + \alpha r W} - 1.$$
To find the critical points, we set the derivative f ( r ) to zero:
$$\theta_n \cdot \frac{\alpha W}{1 + \alpha r W} - 1 = 0.$$
Solving for r yields:
$$r^* = \frac{\theta_n \alpha W - 1}{\alpha W}.$$
□
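The two closed forms can be sanity-checked numerically. The sketch below computes $\sigma_n^i$, the optimal reward of Theorem 2, and the contributions of Theorem 1, then verifies that each vehicle's first-order condition vanishes at the resulting profile; the cost values and $\theta_n$ are arbitrary test inputs.

```python
def optimal_contribution(gamma, costs, i):
    """Theorem 1: q_i* = (p-1)*gamma/sum(c) * (1 - (p-1)*c_i/sum(c))."""
    p, s = len(costs), sum(costs)
    return (p - 1) * gamma / s * (1.0 - (p - 1) * costs[i] / s)

def optimal_reward(theta, alpha, w):
    """Theorem 2: gamma* = (theta*alpha*W - 1) / (alpha*W), W = sum_i sigma_i."""
    return (theta * alpha * w - 1.0) / (alpha * w)

costs = [0.5, 0.8, 1.0]                 # arbitrary test costs c_n^i
p, s = len(costs), sum(costs)
sigmas = [(p - 1) / s * (1.0 - (p - 1) * c / s) for c in costs]
gamma_star = optimal_reward(theta=1.5, alpha=5.0, w=sum(sigmas))
q_star = [optimal_contribution(gamma_star, costs, i) for i in range(p)]

# Each vehicle's first-order condition vanishes at the equilibrium profile.
x_total = sum(q_star)
for q_i, c_i in zip(q_star, costs):
    foc = -gamma_star * q_i / x_total**2 + gamma_star / x_total - c_i
    assert abs(foc) < 1e-9
```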

4.3. Analysis of the CFA

Assume there are N UAVs, $C = \{1, 2, \ldots, N\}$; a coalition S is a group of UAVs (the players of the coalitional game) interested in participating in the same federated training task [30]. A coalition partition is a set $\Pi = \{S_1, S_2, \ldots, S_K\}$ that partitions the UAVs: for $k = 1, \ldots, K$, each $S_k \subseteq C$ is a disjoint coalition such that $\bigcup_{k=1}^{K} S_k = C$ and $S_j \cap S_k = \emptyset$ for $j \ne k$.
Theorem 3.
A grand coalition is not always formed through a coalition game; instead, separate small disjoint coalitions ultimately emerge.
Proof. 
Following the theory in [30,31], we first prove that the coalitional game is non-superadditive; as a consequence, the core of the proposed game is empty. Suppose there exist two disjoint coalitions $S_1, S_2 \subseteq N$. They have two choices: either remain disjoint or merge into the grand coalition $S_1 \cup S_2$.
A coalitional game is referred to as superadditive if the condition $v(S_1 \cup S_2) \ge v(S_1) + v(S_2)$ is met. However, the equations above show that the marginal contribution decreases as $|S|$ (the number of UAVs in the coalition) increases. The marginal benefit of a UAV joining the coalition therefore becomes smaller and smaller, and can even turn negative, because the cost terms $-|S|G - \gamma_n - E_n$ are negative and grow in magnitude with $|S|$. This implies that the coalitional game is non-superadditive, since adding members eventually yields $v(S_1 \cup S_2) < v(S_1) + v(S_2)$. □
We propose two kinds of coalition formation rules in Figure 2. They are listed below:
  • Split rule: a coalition S may split into two smaller coalitions $S'$ and $S''$, i.e., $S \to S' \cup S''$.
  • Joining rule: a player n in coalition S may leave S and join another coalition $S'$, i.e., $\{S, S'\} \to \{S \setminus \{n\},\; S' \cup \{n\}\}$.
Figure 2. Split rule and joining rule.
To establish the coalition formation process, a preference relationship must be defined. Each player can then compare and assess two distinct coalitions based on the preference criterion. We define the partial order ≻ over coalition partitions; this relation is complete, reflexive, and transitive. Specifically, a change in coalition structure is allowed only when it improves overall social utility under this order: the split rule is applied only if $v(S') + v(S'') \ge v(S)$, and the joining rule only if $v(S \setminus \{n\}) + v(S' \cup \{n\}) \ge v(S) + v(S')$.
The formation of the coalition is a sequence of transfer operations. Following the rules of the coalition game, each current state $\Pi_c$ can be transferred to the next state $\Pi_{c+1}$, and each transfer must yield a Pareto improvement in social utility. From the initial state $\Pi_0$, our algorithm produces the following sequence of transitions [32]:
$$\Pi_0 \to \Pi_1 \to \cdots \to \Pi_c \to \Pi_{c+1} \to \cdots$$
Here, the → symbol indicates one application of a shift (split or join) operation. Since the number of possible partitions is finite (bounded by the Bell number), the transformation sequence inevitably terminates and converges to a particular partition $\Pi_f$. The specific process of coalition formation is given in Algorithm 1.
According to [33], we can prove the convergence of the model training.
Theorem 4.
If the learning rate η satisfies
$$1 - \eta L - 8\eta^2 L^2 V_2 \ge 0, \qquad 1 - 16\eta^2 L^2 V_3 > 0,$$
then
$$\frac{1}{K}\sum_{k=1}^{K} \mathbb{E}\left\|\nabla F(u^k)\right\|_2^2 \le \frac{2\Delta}{\eta K} + \eta L \sum_{i \in C} m_i^2 \sigma^2 + 2\eta^2 L^2 V_1 \sigma^2 + 8\eta^2 L^2 V_2 \kappa^2,$$
where $\Delta \triangleq \mathbb{E}[F(u^1)] - \mathbb{E}[F(u^*)]$ and $u^* \in \arg\min_w F(w)$.
Algorithm 1: The coalition formation algorithm (CFA)
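Algorithm 1 appears in the original only as a figure, so the sketch below is our reconstruction of its core loop under stated assumptions: it applies the joining rule (a move to a fresh singleton doubling as a one-player split) and accepts a move only when total utility strictly improves, which guarantees termination. The value function, data amounts, and cooperation cost are illustrative.

```python
import math

def coalition_formation(players, value_fn, max_iters=100):
    """Hill-climb over partitions with the joining rule (moving a player to
    another coalition or to a fresh singleton). A move is applied only if
    total utility strictly improves, so the sequence of partitions
    terminates (their number is bounded by the Bell number).
    value_fn must return 0 for the empty coalition."""
    partition = [{p} for p in players]
    for _ in range(max_iters):
        best_gain, best_move = 1e-12, None
        for si, s in enumerate(partition):
            for n in s:
                for ti in [t for t in range(len(partition)) if t != si] + [None]:
                    t = partition[ti] if ti is not None else set()
                    gain = (value_fn(s - {n}) + value_fn(t | {n})
                            - value_fn(s) - value_fn(t))
                    if gain > best_gain:
                        best_gain, best_move = gain, (n, si, ti)
        if best_move is None:
            break
        n, si, ti = best_move
        partition[si] = partition[si] - {n}
        if ti is None:
            partition.append({n})
        else:
            partition[ti] = partition[ti] | {n}
        partition = [s for s in partition if s]
    return partition

# Toy run: every member enjoys the shared coalition model (theta_n = 1),
# while cooperation costs |S|*G accrue per member.
data, g_cost = {"uav1": 3.0, "uav2": 2.0, "uav3": 0.5, "uav4": 0.4}, 0.2
def v(s):
    if not s:
        return 0.0
    return len(s) * math.log(1 + 5 * sum(data[n] for n in s)) - g_cost * len(s) ** 2

final_partition = coalition_formation(list(data), v)
```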

4.4. Analysis of RUM

According to function (11), the beta distribution has the following expected value:
$$E(p) = \frac{\alpha}{\alpha + \beta}.$$
As mentioned in formula (10), the evaluation of each vehicle update is binary: good or bad; we do not consider an intermediate reputation score. From the $R_i$ vector, we count the numbers of good and bad updates of vehicle i in the current training, recorded as g and b. To emphasize the negative impact of a bad model update on the global model, we set a weight parameter $\delta > 1$, meaning that each bad update carries more weight than each good update. Based on previous experience, we can set the values of α and β as
$$\alpha = g + 1, \qquad \beta = \delta b + 1, \qquad \text{where } g, b \ge 0. \qquad (38)$$
With these α and β, the beta function (11) can be rewritten as
$$f_i\left(p \mid g_i, b_i\right) = \frac{\Gamma(g_i + \delta b_i + 2)}{\Gamma(g_i + 1)\,\Gamma(\delta b_i + 1)}\, p^{g_i} (1 - p)^{\delta b_i}. \qquad (39)$$
Therefore, in the current model training, according to formulas (38) and (39), the reputation is:
$$r_i = E(p) = \frac{g_i + 1}{g_i + \delta b_i + 2}.$$
To defend the global model from assaults, we discard malicious vehicles that report malicious updates using poison data. In order to determine which model vehicles to exclude and which to include in the training process, we establish a minimum reputation score threshold. We use the following formula to convert the current reputation value of vehicle i into the [ 1 , 1 ] interval.
$$r_i = \left[E(p) - 0.5\right] \times 2 = \frac{g_i - \delta b_i}{g_i + \delta b_i + 2}.$$
Considering the historicity of reputation, the current reputation cannot represent the final reputation of vehicles. At the same time, we should also refer to its previous reputation. In order to combine the current reputation with the previous reputation, we design a gradually decreasing function:
$$\Theta_i = \lambda \Theta_i + (1 - \lambda)\, r_i, \qquad 0 \le \lambda \le 1.$$
A vehicle i is permitted to participate in the FL only if $\Theta_i$ is greater than or equal to a reputation threshold Δ. After each training task, the reputation system updates the reputation $\Theta_i$ of vehicle i according to its performance $r_i$ in the current task. Therefore, to obtain higher benefits, each vehicle must participate in training earnestly and improve the quality of its training and model updates so as to maintain a high reputation value. Next, we present the reputation update process in Algorithm 2.
Algorithm 2: The reputation update algorithm (RUA)
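Algorithm 2 likewise appears only as a figure; the following sketch captures the per-task Wbeta rating and the decayed update defined above, using the threshold Δ = 0.5 and decay λ = 0.3 from Table 2 and an assumed weight δ = 2 (the text requires only δ > 1).

```python
def wbeta_rating(good, bad, delta=2.0):
    """Per-task rating in [-1, 1]: r = (g - delta*b) / (g + delta*b + 2),
    i.e. the beta mean (g+1)/(g+delta*b+2) rescaled via r = (E(p)-0.5)*2."""
    return (good - delta * bad) / (good + delta * bad + 2.0)

def update_reputation(theta_prev, r_current, lam=0.3):
    """Decayed blend of history and the current task: lam*Theta + (1-lam)*r."""
    return lam * theta_prev + (1.0 - lam) * r_current

def eligible(theta, threshold=0.5):
    """Only vehicles whose reputation meets the threshold may join training."""
    return theta >= threshold

theta = 0.9                                   # previously well-behaved vehicle
r = wbeta_rating(good=8, bad=2)               # (8 - 4) / (8 + 4 + 2) ~= 0.286
theta = update_reputation(theta, r)           # 0.3*0.9 + 0.7*0.286 ~= 0.47
print(eligible(theta))                        # False: excluded from next task
```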

5. Experimental Results

5.1. Experiment Setting

(1)
Datasets:
  • Simulated datasets: The noise power $N_0$ was −152 dBm/Hz. The bandwidth of each UAV ($B$) was 180 kHz. The transmitted model size ($\omega$) was 0.1 Mbits. The maximum transmission power $p_m^t$ was 10 dBm. The maximum CPU frequency $f_n^i$ was 3 GHz. The fixed constant α was 5. The computation energy conversion coefficient Φ ranged from 0.01 to 0.05. The cooperation cost G was 1. The hovering cost $e_n^h$ was within [0.5, 1.5], and the circuit cost $e_n^t$ was within [0.1, 0.2]. The specific parameters can be found in Table 2.
  • Real-world datasets: four standard real-world datasets, namely MNIST [34], EMNIST [35], SVHN [36], and CIFAR-10 [37], were utilized to evaluate the performance.
(2)
Training setup:
  • Participant settings: There were 2 to 20 UAVs and 5 to 50 vehicles. We configured 10% to 30% of the vehicles to exhibit free-riding and Byzantine behaviors (equally divided). Each UAV had between 2 and 10 vehicles. The reputation threshold was set to 0.5, and the reputation update parameter λ was 0.3. Each UAV's valuation of the model, $\theta_n$, was drawn from [1, 2].
  • Training parameters: The batch size was set to 32. The number of local epochs e was 10. The number of global training rounds E was 100. The learning rate η was set to 0.01, and the SGD momentum was 0.05.
(3)
Baselines: we compared our mechanism with a random algorithm (RA), which randomly selects contributions from each vehicle within FedBeam; FLBE, which is FedBeam without the reputation mechanism; and FLIM [17].
  • RA: in the random algorithm, within the FedBeam mechanism, each vehicle randomly selected its contribution without choosing the optimal strategy based on the competitive situation.
  • FLBE: in FLBE, the incentive method from FedBeam was still used, but without the reputation mechanism.
  • FLIM: in [17], the authors proposed a contract-based incentive mechanism but did not consider cooperation among UAVs.
(4)
Evaluation metrics: We used three metrics to evaluate our mechanism: reputation value, social utility, and test accuracy. The results were derived from the average of multiple experiments.
  • Reputation value: how the reputation value evolves under our mechanism is also an important metric.
  • Social utility: Social utility encompasses the utilities of all UAVs and all vehicles. The objective of our mechanism was to maximize social utility.
  • Test accuracy: test accuracy is a crucial metric for model training in FL, representing the performance that UAVs can achieve with their trained models.

5.2. Performance of the Proposed Wbeta

Here, we assumed that many vehicles participated in the training of multiple tasks. For each task, vehicles whose reputation was above a certain threshold were required to participate. This ensured the quality of training and the speed of convergence. After each task, we reevaluated the reputation of the current vehicles based on their performance in model training and their past reputation value.
We primarily considered two types of attacks. The first was an intentional attack, where some flipped labels existed in the vehicle’s local dataset. For malicious participants conducting poisoning attacks, they could possess 10 types of training data at random. However, the labels of some training samples were deliberately modified to mislead the global model training. The attack strength of a poisoning attack was expressed as the percentage of modified labels in the training sample. The second type was the unintentional attack (non-IID attack), a very common issue in FL, where the data of vehicles are not independent and identically distributed (non-IID). To quantify the data quality of local non-IID training data in unreliable players, we used the KL divergence as an index. KL divergence measures the similarity between two probability distributions.
Figure 3a shows the change in the reputation value of a vehicle throughout 20 training tasks. The vehicle’s reputation value significantly decreased due to its random execution of model assaults, such as free-riding or partial label flipping. For comparison, we used two benchmarks: the unmodified beta reputation mechanism and the subjective logic model (TSL) [38]. Additionally, we established a system without reputation protection. As long as the vehicle uploaded a local model update, the system assumed it had successfully fulfilled its duty and could enhance its reputation. Due to the weight impacts of harmful interaction effects and freshness, Figure 3a demonstrates that the Wbeta reputation algorithm we devised showed a larger and quicker reputation decline compared to the beta reputation algorithm and TSL. After 10 learning tasks, the reputation value in the system without reputation protection had reached 1, while the reputation value of TSL remained above 0.5. Therefore, our Wbeta reputation mechanism could detect these abnormal behaviors faster and reflect them in the reputation value.
Next, in Figure 3b, we evaluated the impact of unreliable vehicles on the model’s accuracy and compared the scenarios with and without the reputation mechanism. Here, we used the MNIST dataset. It was assumed that there were 10 vehicles for joint training, with approximately 30% being attackers. These attackers were trained and evaluated according to different attack intensities (sample label turnover ratio) and different KL divergence values. We found that the training model with the reputation mechanism (green) could significantly improve the model’s training accuracy compared to the model without the reputation mechanism (yellow). With the Wbeta reputation mechanism, the accuracy of our model could be improved by 9.7% over the case without the reputation mechanism, even with 20% of attackers.
To investigate the effect of reputation thresholds on FL accuracy, we evaluated the variation in accuracy under different thresholds. Figure 3c shows the results using two different datasets, MNIST and CIFAR-10. The accuracy of model training improved as the threshold value increased. However, when the reputation threshold was relatively low, e.g., below 0.2, the schemes failed to detect unreliable vehicles fully. The reason was that in some FL tasks, unreliable vehicles could temporarily disguise themselves through good behaviors, thus they could not be detected in a short time.

5.3. Impact of Our Proposed Mechanism on Social Utility

Table 3 illustrates the changes in social welfare under the presence of 10% attackers and the FedBeam mechanism. The table shows that as the number of UAVs or vehicles increased, social welfare gradually improved. However, it also reveals a diminishing marginal utility as the increase in the number of participants intensified competition. Figure 4 illustrates the changes in social welfare in the presence of 10% attackers as the number of UAVs, vehicles, the upper limit of model evaluation values, and costs varied. In Figure 4a, with 30 vehicles, social welfare increased as the number of UAVs rose. Our proposed FedBeam mechanism achieved the highest social welfare, with an average improvement of 17.6% compared to the other three mechanisms. In Figure 4b, with six UAVs, social welfare increased as the number of vehicles rose. Similarly, our proposed FedBeam mechanism attained the best welfare outcomes. In Figure 4c, with 6 UAVs and 30 vehicles, we examined how social welfare changed with the model evaluation value $\theta$, which ranged from 1 to 1.9. Evidently, as $\theta$ increased, social welfare significantly improved, and our mechanism showed a clear advantage over others. In Figure 4d, with 6 UAVs and 30 vehicles, we investigated how social welfare changed as costs increased. Here, costs ranged from 0.10 to 0.19. We observed that as costs rose, social welfare significantly declined, yet there was a diminishing marginal utility effect. Our mechanism still achieved the best welfare outcomes due to its effective balance of incentive and reputation management, as well as its efficiency in identifying attackers.

5.4. Impact of Our Proposed Mechanism on Test Accuracy

In Figure 5, under the presence of 10% attackers, with 6 UAVs and 30 vehicles, the changes in test accuracy as global training rounds increase are shown. In Figure 5a, as the training rounds increased, test accuracy improved across all mechanisms. Our proposed FedBeam mechanism achieved the highest test accuracy, with an average improvement of 5.5% compared to the other three mechanisms. This was because our mechanism better incentivized cooperation between vehicles and UAVs and quickly excluded attackers from the training process. The RA mechanism, which randomly selected data contributors and performed local model training each time, was less effective than ours. FLBE, lacking a reputation mechanism, failed to identify attackers promptly, resulting in lower model training effectiveness. FLIM, which did not account for cooperation among UAVs, also fell short in test accuracy compared to ours. Figure 5b shows similar trends. In Figure 5c, due to the more challenging nature of the SVHN dataset, the test accuracy of all mechanisms decreased, yet our proposed FedBeam still achieved the best results. In Figure 5d, CIFAR-10, being the most challenging among the four datasets, revealed that our FedBeam still achieved the best performance for reasons similar to those mentioned earlier.

6. Conclusions

This paper proposed an incentive mechanism, FedBeam, specifically designed to tackle the challenges associated with FL in the UAV-assisted Internet of vehicles. The research modeled the interaction between UAVs, acting as model owners, and vehicles through a two-layer Stackelberg game framework. The study successfully demonstrated the existence and uniqueness of the Stackelberg equilibrium (SE) within this model. To further enhance participation, a weighted-beta (Wbeta) reputation update mechanism was implemented, encouraging participants to contribute high-quality model updates. Experimental results indicated that this mechanism significantly improved both social welfare and test accuracy, highlighting its effectiveness in fostering collaboration among participants.
In terms of future research, there is potential to investigate adaptive incentive mechanisms that can respond to variations in participant behavior across different scenarios. Additionally, it will be essential to explore how to effectively integrate multimodal data within federated learning frameworks. Future studies should also examine emerging trends such as multi-task learning in federated learning, cross-domain federated learning, and zero-trust architectures in the Internet of vehicles (IoV). Understanding how these trends intersect with the current work will demonstrate that FedBeam is capable of adapting to evolving technologies, ensuring its relevance in a rapidly changing landscape.

Author Contributions

Conceptualization, G.H. and D.Z.; methodology, G.H. and J.S.; software, G.H.; validation, D.Z., J.H. (Jialing Hu), J.H. (Jianmin Han) and T.L.; investigation, T.L.; writing—original draft preparation, G.H., J.H. (Jialing Hu) and D.Z.; writing—review and editing, G.H., D.Z. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Ministry of Education of Humanities and Social Science Project, China (grant no. 19YJAZH047).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, C.; Song, M.; Luo, Y. Federated learning based on Stackelberg game in unmanned-aerial-vehicle-enabled mobile edge computing. Expert Syst. Appl. 2024, 235, 121023. [Google Scholar] [CrossRef]
  2. Tong, Z.; Wang, J.; Hou, X.; Chen, J.; Jiao, Z.; Liu, J. Blockchain-Based Trustworthy and Efficient Hierarchical Federated Learning for UAV-Enabled IoT Networks. IEEE Internet Things J. 2024. [Google Scholar] [CrossRef]
  3. Gad, G.; Farrag, A.; Aboulfotouh, A.; Bedda, K.; Fadlullah, Z.M.; Fouda, M.M. Joint self-organizing maps and knowledge distillation-based communication-efficient federated learning for resource-constrained UAV-IoT systems. IEEE Internet Things J. 2024, 11, 15504–15522. [Google Scholar] [CrossRef]
  4. Fu, M.; Shi, Y.; Zhou, Y. Federated learning via unmanned aerial vehicle. IEEE Trans. Wirel. Commun. 2023, 23, 2884–2900. [Google Scholar] [CrossRef]
  5. Zheng, J.; Xu, J.; Du, H.; Niyato, D.; Kang, J.; Nie, J.; Wang, Z. Trust Management of Tiny Federated Learning in Internet of Unmanned Aerial Vehicles. IEEE Internet Things J. 2024, 11, 21046–21060. [Google Scholar] [CrossRef]
  6. Ke, T.T.; Sudhir, K. Privacy rights and data security: GDPR and personal data markets. Manag. Sci. 2023, 69, 4389–4412. [Google Scholar] [CrossRef]
  7. Liu, Y.; Kang, Y.; Zou, T.; Pu, Y.; He, Y.; Ye, X.; Ouyang, Y.; Zhang, Y.Q.; Yang, Q. Vertical federated learning: Concepts, advances, and challenges. IEEE Trans. Knowl. Data Eng. 2024, 36, 3615–3634. [Google Scholar] [CrossRef]
  8. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Poor, H.V. Federated learning for internet of things: A comprehensive survey. IEEE Commun. Surv. Tutor. 2021, 23, 1622–1658. [Google Scholar] [CrossRef]
  9. Qu, Y.; Dai, H.; Zhuang, Y.; Chen, J.; Dong, C.; Wu, F.; Guo, S. Decentralized federated learning for UAV networks: Architecture, challenges, and opportunities. IEEE Netw. 2021, 35, 156–162. [Google Scholar] [CrossRef]
  10. Pham, Q.V.; Le, M.; Huynh-The, T.; Han, Z.; Hwang, W.J. Energy-efficient federated learning over UAV-enabled wireless powered communications. IEEE Trans. Veh. Technol. 2022, 71, 4977–4990. [Google Scholar] [CrossRef]
  11. Lu, J.; Liu, H.; Zhang, Z.; Wang, J.; Goudos, S.K.; Wan, S. Toward Fairness-Aware Time-Sensitive Asynchronous Federated Learning for Critical Energy Infrastructure. IEEE Trans. Ind. Inform. 2022, 18, 3462–3472. [Google Scholar] [CrossRef]
  12. Fadlullah, Z.M.; Kato, N. HCP: Heterogeneous computing platform for federated learning based collaborative content caching towards 6G networks. IEEE Trans. Emerg. Top. Comput. 2020, 10, 112–123. [Google Scholar] [CrossRef]
  13. Fan, X.; Chen, Y.; Liu, M.; Sun, S.; Liu, Z.; Xu, K.; Li, Z. UAV-Enabled Federated Learning in Dynamic Environments: Efficiency and Security Trade-off. IEEE Trans. Veh. Technol. 2023, 73, 6993–7006. [Google Scholar] [CrossRef]
  14. Jing, Y.; Qu, Y.; Dong, C.; Ren, W.; Shen, Y.; Wu, Q.; Guo, S. Exploiting UAV for air–ground integrated federated learning: A joint UAV location and resource optimization approach. IEEE Trans. Green Commun. Netw. 2023, 7, 1420–1433. [Google Scholar] [CrossRef]
  15. Qureshi, K.I.; Wang, L.; Xiong, X.; Lodhi, M.A. Asynchronous Federated Learning for Resource Allocation in Software Defined Internet of UAVs. IEEE Internet Things J. 2023, 11, 20899–20911. [Google Scholar] [CrossRef]
  16. Wang, Y.; Su, Z.; Zhang, N.; Benslimane, A. Learning in the air: Secure federated learning for UAV-assisted crowdsensing. IEEE Trans. Netw. Sci. Eng. 2020, 8, 1055–1069. [Google Scholar] [CrossRef]
  17. Lin, S.; Li, Y.; Han, Z.; Zhuang, B.; Ma, J.; Tianfield, H. Joint Incentive Mechanism Design and Energy-Efficient Resource Allocation for Federated Learning in UAV-Assisted Internet of Vehicles. Drones 2024, 8, 82. [Google Scholar] [CrossRef]
  18. Fu, F.; Wang, Y.; Li, S.; Yang, L.T.; Zhao, R.; Dai, Y.; Yang, Z.; Zhang, Z. Incentive Mechanism Against Bounded Rationality for Federated Learning-Enabled Internet of UAVs: A Prospect Theory-Based Approach. IEEE Internet Things J. 2024, 11, 20958–20969. [Google Scholar] [CrossRef]
  19. Le, T.H.T.; Tran, N.H.; Tun, Y.K.; Nguyen, M.N.; Pandey, S.R.; Han, Z.; Hong, C.S. An incentive mechanism for federated learning in wireless cellular networks: An auction approach. IEEE Trans. Wirel. Commun. 2021, 20, 4874–4887. [Google Scholar]
  20. Lim, W.Y.B.; Huang, J.; Xiong, Z.; Kang, J.; Niyato, D.; Hua, X.S.; Leung, C.; Miao, C. Towards federated learning in uav-enabled internet of vehicles: A multi-dimensional contract-matching approach. IEEE Trans. Intell. Transp. Syst. 2021, 22, 5140–5154. [Google Scholar] [CrossRef]
  21. Ng, J.S.; Lim, W.Y.B.; Dai, H.N.; Xiong, Z.; Huang, J.; Niyato, D.; Hua, X.S.; Leung, C.; Miao, C. Joint auction-coalition formation framework for communication-efficient federated learning in UAV-enabled internet of vehicles. IEEE Trans. Intell. Transp. Syst. 2020, 22, 2326–2344. [Google Scholar] [CrossRef]
  22. Wang, Y.; Su, Z.; Luan, T.H.; Li, R.; Zhang, K. Federated learning with fair incentives and robust aggregation for UAV-aided crowdsensing. IEEE Trans. Netw. Sci. Eng. 2021, 9, 3179–3196. [Google Scholar] [CrossRef]
  23. He, G.; Li, C.; Song, M.; Shu, Y.; Lu, C.; Luo, Y. A hierarchical federated learning incentive mechanism in UAV-assisted edge computing environment. Ad Hoc Netw. 2023, 149, 103249. [Google Scholar] [CrossRef]
  24. Kang, J.; Xiong, Z.; Niyato, D.; Xie, S.; Zhang, J. Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory. IEEE Internet Things J. 2019, 6, 10700–10714. [Google Scholar] [CrossRef]
  25. Zhan, Y.; Li, P.; Qu, Z.; Zeng, D.; Guo, S. A Learning-Based Incentive Mechanism for Federated Learning. IEEE Internet Things J. 2020, 7, 6360–6368. [Google Scholar] [CrossRef]
  26. Huang, G.; Chen, X.; Ouyang, T.; Ma, Q.; Chen, L.; Zhang, J. Collaboration in participant-centric federated learning: A game-theoretical perspective. IEEE Trans. Mob. Comput. 2022, 22, 6311–6326. [Google Scholar] [CrossRef]
  27. Lim, W.Y.B.; Xiong, Z.; Miao, C.; Niyato, D.T.; Yang, Q.; Leung, C.; Poor, H.V. Hierarchical Incentive Mechanism Design for Federated Machine Learning in Mobile Networks. IEEE Internet Things J. 2020, 7, 9575–9588. [Google Scholar] [CrossRef]
  28. Su, Z.; Wang, Y.; Luan, T.H.; Zhang, N.; Li, F.; Chen, T.; Cao, H. Secure and Efficient Federated Learning for Smart Grid With Edge-Cloud Collaboration. IEEE Trans. Ind. Inform. 2021, 18, 1333–1344. [Google Scholar] [CrossRef]
  29. Reza, S.M.S.; Hossain, D.A. Malicious vehicle detection based on beta reputation and trust management for secure communication in smart automotive cars network. TELKOMNIKA (Telecommun. Comput. Electron. Control.) 2021, 19, 1688–1696. [Google Scholar]
  30. Hasan, C. Incentive Mechanism Design for Federated Learning: Hedonic Game Approach. ArXiv 2021, arXiv:2101.09673. [Google Scholar]
  31. Saad, W.; Han, Z.; Debbah, M.; Hjørungnes, A.; Başar, T. Coalitional Games for Distributed Collaborative Spectrum Sensing in Cognitive Radio Networks. In Proceedings of the IEEE INFOCOM 2009, Rio de Janeiro, Brazil, 19–25 April 2009; pp. 2114–2122. [Google Scholar]
  32. Guazzone, M.; Anglano, C.; Sereno, M. A Game-Theoretic Approach to Coalition Formation in Green Cloud Federations. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, IL, USA, 26–29 May 2014; pp. 618–625. [Google Scholar]
  33. Sun, Y.; Shao, J.; Mao, Y.; Wang, J.H.; Zhang, J. Semi-decentralized federated edge learning for fast convergence on non-IID data. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 1898–1903. [Google Scholar]
  34. LeCun, Y. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 2 October 2024).
  35. Cohen, G.; Afshar, S.; Tapson, J.C.; van Schaik, A. EMNIST: Extending MNIST to handwritten letters. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2921–2926. [Google Scholar]
  36. Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 12–17 December 2011; pp. 301–304. [Google Scholar]
  37. Ayi, M.; El-Sharkawy, M. RMNv2: Reduced Mobilenet V2 for CIFAR10. In Proceedings of the 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January 2020; pp. 287–292. [Google Scholar]
  38. Liu, Y.; Li, K.; Jin, Y.; Zhang, Y.; Qu, W. A Novel Reputation Computation Model Based on Subjective Logic for Mobile Ad Hoc Networks. In Proceedings of the 2009 Third International Conference on Network and System Security, Gold Coast, Australia, 19–21 October 2009; pp. 294–301. [Google Scholar]
Figure 1. Federated learning in UAV-enabled wireless networks.
Figure 3. (a) Reputation change; (b) impact of the attacker; (c) impact of the threshold.
Figure 4. The four figures above illustrate the variations in social utility when using simulated data: (a) number of UAVs; (b) number of vehicles; (c) upper bound of $\theta$; (d) upper bound of $c_n^i$.
Figure 5. The four figures above illustrate the variations in test accuracy when using simulated data: (a) MNIST; (b) EMNIST; (c) SVHN; (d) CIFAR10.
Table 1. Notation summary for this article.
$w$: Model parameters
$\gamma_n$: Rewards given by UAV n
$P_n$: Set of vehicles of UAV n
$p_n$: Number of vehicles in UAV n
$U_n^i$: Utility function of vehicle i in UAV n
$x_n^i$: Data contribution of vehicle i in UAV n
$S$: Coalition of UAVs
$N$: Total number of UAVs
$\kappa$: Conversion parameter from model performance to profits
$\alpha$: Fixed constant
$\Phi$: Computation energy conversion coefficient
$G$: Cooperation cost
$U_n$: Profit function of UAV n
$q_n^i$: Quantity of data held by vehicle i in UAV n
$q_n^{-i}$: Strategy set of all vehicles except i in UAV n
$c_n^i$: Cost of vehicle i in UAV n
$V(S)$: Performance of coalition S
$Q$: Total data contribution, $Q = \sum_{n \in S} X_n$
$X_n$: Data contribution of UAV n, $X_n = \sum_{i \in P_n} q_n^i$
$r_i$: Reputation value of vehicle i in one task
$\delta$: Weight parameter of negative interactions
$\Theta_i$: Cumulative reputation value of vehicle i
$\Delta$: Reputation threshold
$\lambda$: Decay coefficient for reputation updates
$\Gamma_n$: Maximum reward that UAV n can give
$g$: Number of good updates
$b$: Number of bad updates
$\eta$: Learning rate
$L$: Lipschitz constant of the loss function
$\sigma$: Noise level
$K$: Total number of training rounds
$u^k$: Model parameters at round k
$u^*$: Optimal model parameters, $u^* = \arg\min_w F(w)$
Table 2. Parameter values.
Number of vehicles $p_n$: [2, 10]
Batch size: 32
Number of local epochs $e$: 10
Number of global training rounds $E$: 100
Learning rate $\eta$: 0.01
SGD momentum: 0.05
Hovering cost $e_n^h$: [0.5, 1.5]
Circuit cost $e_n^t$: [0.1, 0.2]
Noise power $N_0$: −152 dBm/Hz
Bandwidth of each UAV $B$: 180 kHz
Transmitted model size $\omega$: 0.1 Mbits
Maximum transmission power $p_m^t$: 10 dBm
Maximum CPU frequency $f_n^i$: 3 GHz
Fixed constant $\alpha$: 5
Computation energy conversion coefficient $\Phi$: [0.01, 0.05]
Cooperation cost $G$: 1
Reputation threshold $\Delta$: 0.5
Decay coefficient for reputation updates $\lambda$: 0.3
Model estimation parameter $\theta_n$: [1, 2]
Table 3. Social utility with varying numbers of vehicles and UAVs.

Vehicles \ UAVs      2      4      6      8     10     12     14     16     18     20
5                 9.12  15.43  20.28  23.97  27.24  29.75  31.68  33.43  34.53  36.19
10               17.99  27.10  32.60  36.78  39.48  41.66  43.32  43.89  45.18  46.14
15               25.30  36.67  43.53  47.71  50.03  51.54  52.19  53.48  53.42  54.71
20               31.53  43.56  50.97  55.13  58.12  60.01  61.77  62.50  63.37  63.91
25               36.45  49.76  56.60  60.69  64.14  67.36  69.14  70.03  71.09  71.82
30               41.29  54.94  62.21  66.43  70.13  71.42  74.02  75.01  75.63  77.41
35               45.17  59.27  66.79  69.81  74.36  76.15  77.90  79.30  81.12  82.65
40               48.70  63.09  70.51  72.81  78.01  80.21  83.31  83.04  84.82  87.33
45               52.46  66.11  73.94  76.77  81.52  83.65  85.61  87.73  88.93  89.51
50               54.93  68.43  76.55  79.58  84.87  87.91  88.69  91.10  92.39  94.02