1. Introduction
The Internet of Things (IoT) is undergoing significant changes with the rapid advancement of the big data era [
1]. Given the latest developments in IoT, especially with the introduction of new infrastructure in 2020, four out of seven keywords are closely associated with the IoT industry [
2]. This represents an unprecedented opportunity for growth and an important milestone for business transformation. Unfortunately, owing to the unique characteristics of IoT, such as resource constraints, self-organization, and short-range communication, IoT systems have relied on the cloud for outsourced storage and computation, which introduces a new set of challenging security and privacy threats. These challenges are ubiquitous and often lead to the fragmentation of information systems, preventing enterprises from achieving truly intelligent management. Solving these problems requires substantial human, financial, and time resources, a burden that many companies cannot afford. To address this dilemma, IOTOS IoT centers are envisioned as a technology platform product [
3]. Broadly speaking, these IoT centers can be categorized into six different types: data centers [
4], business centers, algorithm centers, technology centers, R&D centers, and organizational centers. The concept of IoT centers was first introduced by IOTOS in August 2020 [
5]. In this specific context, the term “entity” encompasses not only physical assets, such as equipment and facilities, but also a broader range of abstract concepts. Connected components involve individuals, devices, systems, algorithms, and services. An IoT center is more abstract and advanced in nature than a data center, as it typically encompasses all aspects of a collection platform, a communication center platform, and a data center platform. In addition to supporting abstract business services such as data analytics, processing, and transactions, it also involves data collection and communication. It is essential for the data collection platform to perform protocol analysis and heterogeneous data processing from system facilities. Moreover, the communication center platform should ensure seamless transmission of data between local, public, and hybrid networks without location-specific constraints so that the collected data can be applied smoothly. Finally, with the advent of the digital transformation (DT) [
6] era, well-known enterprises such as Alibaba and Huawei have been discussing and exploring the digital transformation process [
7], proposing the use of the “data center” mechanism to optimize data utilization.
Data centers play a vital role in enterprises by supplying the necessary data services for various business operations. With their platform and business capabilities, data centers continuously enhance and nurture data, transforming it into an efficient and reliable set of data assets and service capabilities [
8], thereby enabling an agile response to new market changes and facilitating the rapid development of new front-end applications. It is crucial to note that the business generates the data, while the data center acts as its support system. In this symbiosis, business represents the yang aspect, symbolizing action and movement, while data represents the yin aspect, representing stability and substance. Collectively, they comprise a harmonious, self-sustaining, closed-loop system. The data center is, in essence, a set of mechanisms for making data usable, mainly consisting of three parts: a big data platform, a data asset system, and data service capabilities. In particular, the data labeling category [
9,
10,
11,
12] system is an essential part of the data resource system. The label data layer is an object-oriented data model that can unify the various identities of the same object across different business modules and data fields and consolidate the data of that object at the same granularity to form a comprehensive and rich object labeling system. With the arrival of the big data era, data labeling is gradually being upgraded to intelligent labeling. By deploying intelligent labeling models in each data middleware, data can be labeled in a more adaptive way that matches the needs of ever-changing business scenarios.
The abovementioned algorithms require the nodes of the communication network to communicate with their neighboring nodes at each iteration of a distributed system. In reality, however, many of these communications are unnecessary, and the algorithms can converge well with fewer communications under certain conditions. This paper therefore introduces an event-driven mechanism into the communication between nodes, where each node communicates with its neighboring nodes only when the driving condition is met. This event-driven communication mechanism can greatly reduce the number of communications between nodes and thus save network resources.
In addition, regarding the distributed data center intelligent annotation scenario considered in this paper, ensuring data security poses challenges, including concerns about privacy leakage, data leakage, and network security isolation. The existence of data barriers between the training models of different local databases leads to the formation of data “silos” that cannot be shared securely [
13]; we therefore adopt a federated learning framework. By keeping the data storage and model training phases of machine learning local to users, the privacy of each local secondary data center can be effectively protected, since only model updates are exchanged with the central server. Certainly, there are challenges and threats to federated learning [
14], including the apparent shortcomings in communication efficiency, the increasing difficulty of privacy and security issues, and the lack of trust and incentives [
15]. In the meantime, security threats to the federated learning process should not be underestimated [
16]. There are mainly three types: firstly, malicious clients alter model updates to disrupt global model aggregation; secondly, malicious analyzers inspect model update information to infer private properties of the source data; and thirdly, malicious servers try to obtain source data from clients.
Consequently, to address these threats while maintaining high communication efficiency, optimizing the number of information interactions, and reducing the computational and communication overhead in practical scenarios, and to achieve consistency in a secure environment for distributed smart labels, we use a GCN model for local label training within a credit consensus mechanism under a federated learning framework, and deliver the result in the form of a recommender system. The contributions of the proposed method are as follows:
Communication problem: To address the limited communication bandwidth and the high computational cost incurred when locally trained models exchange model parameters, we propose a credit mechanism that determines, through adaptive parameter settings, when models are fused and information is exchanged. This enables effective interaction during local model optimization while minimizing the communication cost and the computational resources consumed during transmission.
Recommendation effect problem: This paper uses a modified GCN model to design the label recommendation model, so that the designed federated label recommendation system attains label consistency.
Security issues: To address the data security and privacy issues in the system, this paper applies blockchain technology in the cloud (primary data center) to achieve consistency of data annotation across the secondary data centers while preserving security and privacy.
3. Method
In this section, we present a federated learning approach that aims to achieve consistency among distributed training models for intelligent labeling recommendation systems and to ensure secure data transfer and privacy preservation during the intelligent labeling and recommendation process. To accomplish this goal, we combine the prediction problem with node classification by creating local graph convolutional network (GCN) models. These models use 10 different node features to determine the interconnectivity between nodes. Through the utilization of proprietary data, we construct private models while employing public data as a training benchmark. The node interactions can be facilitated by matrix similarity computation. Eventually, we compare the trained private models with the public data and perform a variance analysis to identify appropriate instances of information exchange. The summary of the symbols is listed in
Table 1.
3.1. Explainable Distributed Semi-Supervised Model Fusion Reputation Consensus Mechanism
Throughout this paper, the private data accumulated by each subsidiary entity exhibits local correlation with its respective private model, including its spatial extent. The proposed distributed training framework for federated learning based on graph convolutional networks is shown in
Figure 2. To facilitate model understanding and enhance data relevance, model similarity is considered in the process of sharing data between models. Utilizing its powerful explainability, a fusion mechanism is developed to perform consistent relationship analysis and importance aggregation through relational attention. A higher model similarity means a higher relevance of the shared data received from the data provider, which ultimately leads to higher quality, accuracy, and reliability of data sharing. The reputation value assigned to each subsidiary, denoted as reputation = {attention, parameter, mean absolute error (MAE)}, is determined by corresponding coefficients,
$\alpha$, $\beta$, and $\gamma$, whereby the sum of these coefficients equals 1 ($\alpha + \beta + \gamma = 1$). In the case of subsidiary 1, the similarity between said subsidiary and its own previous temporal state model is defined as:
1. Attention 1: The degree of attention can be determined by considering the interconnections formed by the 10 feature weights derived from the private data, in particular the topology of the 10 nodes that resembles a connectivity mechanism between them. It is worth noting that a higher degree of attention corresponds to a more favorable reputation.
2. Parameter 1: The reputation of subsidiary 1 depends on the parameters of the private model derived from its proprietary dataset. Significantly, the higher the parameter values, the better the reputation.
3. By substituting the data from the validation set into the previously trained private model, one can ascertain the disparity by calculating the MAE.
4. Therefore, reputation 1 is the normalized difference between subsidiary 1 and its previous state, defined as:
Now, the reputation2 consists of the following parts:
1. The MAE of the private data obtained by substituting the private data 1 into the private model 2.
2. The maximum mean discrepancy (MMD) between public data 1 and public data 2.
3. The difference between parameter 1 and parameter 2.
Similarly, the similarity between subsidiary 1 and subsidiary 2 is:
Therefore, reputation 2 is the normalized difference between subsidiary 1 and subsidiary 2, defined as:
By the same token, reputation n consists of the following parts:
1. The MAE generated by substituting private data 1 into private model n.
2. The MMD between public data 1 and public data n.
3. The difference between parameter 1 and parameter n.
Therefore, reputation n is the normalized difference between subsidiary 1 and subsidiary n, which is defined as:
Within these entities, the importance of each subsidiary’s recommended information when interacting with entity 1 must be emphasized. Therefore, the subjective views from the different subsidiaries (neighboring nodes) are merged into a global viewpoint model, referred to as the global model of the subsidiaries, which contains the weights assigned to each individual viewpoint.
For subsidiary 1, the final reputation is:
During the integration and interaction between the subsidiaries, the private data is constantly updated, leading to the evolution of the private model trained on said data. The determination of reputation is influenced not only by the current moment but also by the public data. Hence, when calculating the reputation value, it is important to evaluate the variance of the private model and the difference of the public data. Furthermore, it is essential to integrate one’s own private data into the private models of the other interacting subsidiaries in order to identify the resulting variance values. Eventually, these variance values are merged together to construct a comprehensive global reputation. This methodology takes into account the various factors affecting reputation, resulting in a more persuasive assessment, improved integration between models, and consequently, increased consistency of annotations between subsidiaries.
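As a minimal illustrative sketch of this computation (the coefficient values, function names, and normalization below are our own assumptions rather than the paper’s exact formulation), the pairwise differences can be combined by the weighting coefficients and then normalized into a global reputation:

```python
import numpy as np

def pairwise_reputation(attention_diff, param_diff, mae, alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted combination of the three difference terms (alpha + beta + gamma = 1)."""
    return alpha * attention_diff + beta * param_diff + gamma * mae

def global_reputation(diffs):
    """diffs: list of (attention_diff, param_diff, mae) tuples, one per neighboring subsidiary.
    Returns the normalized pairwise reputations and their mean as the global reputation."""
    raw = np.array([pairwise_reputation(*d) for d in diffs])
    normalized = (raw - raw.min()) / (raw.max() - raw.min() + 1e-12)  # normalize the differences
    return normalized, normalized.mean()

# Example: subsidiary 1 compared against three neighboring subsidiaries
diffs = [(0.12, 0.30, 0.05), (0.40, 0.22, 0.18), (0.08, 0.15, 0.02)]
per_pair, rep_global = global_reputation(diffs)
```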
To visualize the content of
Section 3.1 more intuitively, we present it in pseudo-code form in Algorithm 1.
Algorithm 1: Explainable distributed semisupervised model fusion reputation consensus mechanism
Definitions (for a subsidiary i paired with subsidiary 1):
  MAE_i = error obtained by substituting private data 1 into private model i
  MMD_i = maximum mean discrepancy between public data 1 and public data i
  Δparameter_i = difference between parameter 1 and parameter i
  reputation_i = normalized difference between subsidiary 1 and subsidiary i
1. for each data center i = 1, 2, …, n do
2.     Compute MAE_i, MMD_i, and Δparameter_i
3.     Compute reputation_i
4. end for
5. Fuse the reputations of all subsidiaries  // update global model
The flexible reputation mechanism based on a private model, public data, private data, and updated data is proposed in detail, as shown in
Figure 3. The state of the six subsidiaries at the current time t is summarized as follows: the first row contains only the private data model; the second row contains all public data (together, the first two rows form the established global model); and the third row represents the private data at time t. For time t + n, the state consists of private models 1 to n (where the appropriate time for the next interaction is determined only from the information available at time t, i.e., the value of n indicates the amount of update required), public data 1 to n, and the subsidiary’s own private data 1.
We recommend expanding the credit value to include a variety of factors related to the intelligent labeling training of subsidiaries. These include consideration of the relationship between the model and its own prior state, as well as the differences between the model and other submodels. In particular, we consider the impact of incorporating private models into the training process of other private models, thus affecting the overall structure of pairwise model training.
3.2. Proof of Reputation-Based Fusion Mechanism
For non-Euclidean data, the connections between vertices and edges are different in each topological graph. For this reason, GCN has been chosen as a means to efficiently extract spatial features for machine learning purposes. Various techniques aimed at explaining deep models using saliency graphs have been demonstrated in recent survey studies. By this means, this paper proposes a reputation-based interpretable fusion graph based on the attention GCN network, as shown in
Figure 4. The method utilizes the parameters and feature roots of the model to obtain the saliency graph. Since the underlying model represents a relational graph, we opt for a straightforward GCN as the base model. The two-layer GCN structure designed in this paper consists of one layer with 10 nodes as inputs and another layer with the parameters of the private model. The sparse matrices used in the GCN provide explainability and help in interpreting any black-box models.
We utilize the training data to train 10 feature weights that are the same as the input parts. Together, these feature weights form a feature map, with each weight representing a different feature. By incorporating different proportions of input feature weights during training, the global reputation of the system (private model) can be efficiently established, thus solving the problem of model type saturation. Furthermore, since the convolutional layer retains spatial information, only 10 input feature weights are required to generate the model map. Unique feature maps are generated for each model structure by different weighting ratios. Nevertheless, the different feature weights among the six subsidiaries and the unique connections between the nodes lead to differences in the generated feature graphs, resulting in inaccurate training models. In addition, given the existence of information interaction and fusion dynamics between subsidiaries, the establishment of a fusion mechanism can improve the accuracy and consistency of the training model based on the information on node interactions between different subsidiaries.
In graphical data, nodes can represent objects or concepts, which are called labels. Edges represent relationships between nodes, including labels. A graph neural network (GNN) employs a state vector to express the state of a node. The computation for node v requires the consideration of four components: the feature vector of node v, the feature vectors of the neighboring nodes, the state vectors of the neighboring nodes, and the feature vectors of the edges connected to v. These components are combined in each layer using degree-normalized aggregation, resulting in a node representation that captures task-relevant features and structural insights that can benefit a variety of machine learning tasks.
The input phase involves combining information from neighboring nodes and edges to derive a state vector for the current node. Afterwards, the node features and state vectors are merged into the output vector. The graph convolutional network (GCN) utilizes the concept of convolution in graphs where the convolution operation of a particular point is considered as a weighted sum of its neighboring points. Eventually, node classification is performed using node characteristics.
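For concreteness, the following PyTorch sketch illustrates a two-layer GCN of the kind described above, with 10 input node features and degree-normalized aggregation; the hidden size, helper names, and random example inputs are illustrative assumptions rather than the paper’s exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalized_adjacency(adj):
    """Degree-normalized adjacency with self-loops: D^(-1/2) (A + I) D^(-1/2)."""
    a_hat = adj + torch.eye(adj.size(0))
    deg = a_hat.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

class TwoLayerGCN(nn.Module):
    def __init__(self, in_features=10, hidden=16, num_classes=10):
        super().__init__()
        self.w1 = nn.Linear(in_features, hidden, bias=False)
        self.w2 = nn.Linear(hidden, num_classes, bias=False)

    def forward(self, x, adj):
        a_norm = normalized_adjacency(adj)   # weighted sum over degree-normalized neighbors
        h = F.relu(a_norm @ self.w1(x))      # first graph convolution
        return a_norm @ self.w2(h)           # second graph convolution -> class logits

# Example: 10 nodes with 10 features each and a random symmetric adjacency
x = torch.rand(10, 10)
adj = (torch.rand(10, 10) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()
logits = TwoLayerGCN()(x, adj)
```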
Assuming a pretrained model and 10 input feature weights, we attempt to explain the model by elucidating the relationships between the input features at the node level. These interpretations should ideally capture the features and corresponding labels that are relevant to the model’s predictions, ensuring that when an interpretation is fed back into the pretrained model, it generates predictions that are the same as, or similar to, those for the original input. Interpreting relational data and relational models, however, presents a significant challenge, as the correct relational structure needs to be learned when interpreting the predicted nodes of interest.
In addition, the prediction output of the model is denoted Pr. We also consider the prediction on the unannotated data and calculate the loss through the MAE, which serves as the loss function to be minimized. It can be seen that the proposed explainable strategy comprises three parts within the GCN framework: (i) accurate prediction, (ii) regularization terms (to avoid overfitting), and (iii) smoothness.
The model accuracy, also known as model precision, represents the ability of a predictive model to correctly assign instances to their respective categories. It quantifies how close the model’s predicted values are to the true values. Higher accuracy indicates higher precision of the model’s predictions, while lower accuracy indicates lower reliability.
To achieve accurate model annotation, intelligent annotation is implemented in this context. Distributed models require asynchronous communication between subsidiaries through information exchange to ensure consistency of intelligent annotation across models. This contributes to more accurate annotation at the global model level.
In the machine learning and statistics fields, a regularization term is a mechanism for controlling model complexity. The term is usually attached to the loss function and linked directly to the model parameters. Its main goal is to mitigate overfitting. Commonly used regularization techniques include L1 regularization and L2 regularization, which impose penalties on model parameters using the L1 norm and L2 norm, respectively. L1 regularization facilitates feature selection by encouraging certain model parameters to shrink to exactly zero, whereas L2 regularization reduces model complexity by bringing model parameters closer to zero. The selection of regularization terms and their fine-tuning should be based on the characteristics of the particular problem and dataset at hand.
Integrating regularization terms helps enhance the model’s generalization ability and promotes its superior performance on unseen data. By delicately controlling the complexity of the model, it effectively alleviates the overfitting challenge and achieves a well-calibrated balance between model fitting and generalization capabilities. To ensure sparsity and promote improved interpretability, we incorporate regularization measures into our objective function to ensure that the learned masks maintain their desired properties.
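A minimal sketch of such a regularized objective is given below; the MAE prediction term follows the loss described above, while the lambda weights are hypothetical tuning parameters, not values given in the paper.

```python
import torch

def regularized_loss(pred, target, params, lambda_l1=1e-4, lambda_l2=1e-4):
    """MAE prediction loss plus L1 (sparsity) and L2 (weight decay) penalties."""
    mae = torch.mean(torch.abs(pred - target))    # accuracy term
    l1 = sum(p.abs().sum() for p in params)       # encourages sparse, interpretable parameters
    l2 = sum((p ** 2).sum() for p in params)      # keeps parameters small to limit complexity
    return mae + lambda_l1 * l1 + lambda_l2 * l2
```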
Originally, the problem of oversmoothing manifested itself through the similarities observed between node embeddings. The ultimate goal of embedding learning is to utilize them in classification tasks to predict labels. Such oversmoothing, however, leads to generating similar embeddings for nodes that do not share the same label, resulting in misclassification. Since convolution inherently involves aggregation, it produces a smoothing effect when the convolution kernel employs specific values. Considering that parameter sharing plays a crucial role in convolutional operations, its emphasis within the graph convolutional network (GCN) framework becomes even more pronounced due to the variation of vertex degrees in the graph.
After an extended investigation into the root cause of the smoothing problem, modified convolutional kernel regularization methods were found to have the potential to alleviate it, although they demand considerable computing power. An alternative approach is to retain the results of the lower layers and merge them with other features to alleviate the problem of excessive smoothing. This involves using convolutions at different scales and then fusing the results of these different scales to capture different features of the node.
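A minimal sketch of this multi-scale fusion idea, assuming a jumping-knowledge-style concatenation of layer outputs (the layer count, sizes, and names are our own illustrative choices, not the paper’s design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleGCN(nn.Module):
    """Keeps lower-layer outputs and fuses them to reduce oversmoothing."""
    def __init__(self, in_features=10, hidden=16, num_classes=10, layers=3):
        super().__init__()
        dims = [in_features] + [hidden] * layers
        self.convs = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1], bias=False) for i in range(layers)])
        self.out = nn.Linear(in_features + hidden * layers, num_classes)

    def forward(self, x, a_norm):
        scales = [x]                               # scale 0: raw node features
        h = x
        for conv in self.convs:
            h = F.relu(a_norm @ conv(h))           # each layer adds one more hop of smoothing
            scales.append(h)
        return self.out(torch.cat(scales, dim=1))  # fuse all scales before classification

x = torch.rand(10, 10)
a_norm = torch.eye(10)  # placeholder; use a degree-normalized adjacency in practice
logits = MultiScaleGCN()(x, a_norm)
```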
3.3. Information Interaction Conditions
At the beginning of moment t + 1, when the first company updates its private data, its private model is updated accordingly, resulting in an updated reputation 1. This adjustment stems from calculating the mean absolute error (MAE) between the private data of the different companies and their respective private models, while also taking into account the difference in MinHash values between the private model of company 1 and the private models of the other companies. In addition, the maximum mean discrepancy (MMD) between company 1’s public data and that of the other companies is calculated to further inform the real-time update of reputation 1. Even in the absence of direct interaction, the focus shifts to assessing the extent of the reputational change and the desire for interaction. Identifying the exact conditions under which such interactions are initiated remains critical.
The reputation value at the next moment is compared with the original reputation to judge the degree of difference, where the reputation value at the next moment and the current reputation values of subsidiary 2, subsidiary 3, and so on are all taken into account; the updated reputation value is thus related both to the reputation value of each subsidiary and to the reputation value at the next moment.
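A minimal sketch of such a driving condition, assuming a simple tolerance threshold (the threshold value and function name are our own illustrative assumptions):

```python
def should_interact(reputation_next, reputation_last, epsilon=0.05):
    """Event-driven condition: interact only if the reputation change is large enough."""
    return abs(reputation_next - reputation_last) > epsilon

# Example: subsidiary 1 checks its updated reputation against the value at the last exchange
if should_interact(0.91, 0.83):
    pass  # request model fusion / information interaction with neighboring subsidiaries
```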
3.4. Asynchronous Federated Learning
3.4.1. Node Optimization Selection
The heterogeneous computing resources and variable communication conditions of each local data center hinder the efficiency of model fusion and information interaction, as well as the aggregation of the global model. Choosing when to exchange information is therefore effective in reducing unnecessary aggregation and waiting time. On the other hand, the selection of nodes is also crucial for improving communication efficiency. Therefore, the aim is to select a subset of nodes so as to minimize the number of interactions while maximizing the precision of the aggregated model.
We introduce $x^t$ at time $t$ as the selection-state vector of the intelligent annotation models in the local data centers, where $x_i^t = 1$ indicates that node $i$ is selected for interaction and $x_i^t = 0$ indicates the stopped state. The time cost of node selection depends on the training data $D_i$ of local model $i$, the number of CPU cycles $c_i$ required to train model $i$ on a secondary data center, and the local parameters $w_i$ learned at node $i$. The communication cost of local data center $i$ is expressed in terms of the volume of transmitted parameters and the available transmission rate, while the quality of learning (QoL) describes the accuracy of the local model parameters learned by the local data center at time $t$; the learning accuracy loss is expressed as the corresponding loss in QoL.
Deep reinforcement learning (DRL) is used to address the node selection problem. The model is learned by interacting with the other local training models (secondary data centers). Specifically, we formulate the problem as a Markov decision process over states, actions, and rewards.
3.4.2. Implementation Process of Asynchronous Federated Learning
During federated learning, the time is recorded as $t$, and the system status includes the available computing resources of the local data centers, the data transmission rate between individual local data center stations, and the selected state of the nodes; these three components together form the system state $s_t$. The action $a_t$ at time $t$ is the selection strategy for the information interaction node in the data center. The mapping from the state space to the action space is called the policy $P$; in each time period, the action can be computed from the current state according to this policy, and the local data center network status is transmitted according to the node selection action.
The effect of an action is evaluated by the system using the reward function $r$. In epoch $t$, the agent tasked with node selection takes action $a_t$ at state $s_t$, and this behavior is evaluated against the specified reward function. The reward $r_t$ quantifies the action $a_t$ taken at time $t$. The total cumulative reward is $R = \sum_{t} \gamma^{t} r_t$, where $\gamma \in [0,1]$ is the discount factor. Next state: after the system is updated, the state transitions from $s_t$ to $s_{t+1}$, in which the available computing resources, the transmission rates, and the node selection states are updated accordingly.
Through the selection of nodes and the method of judging when to exchange information, the total cost of federated learning can be minimized. For the DRL model, the goal is to find the policy that maximizes the cumulative reward, i.e., $\pi^{*} = \arg\max_{\pi} \mathbb{E}\big[\sum_{t} \gamma^{t} r_{t}\big]$.
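As a simplified, hedged sketch of this DRL component, the following tabular Q-learning agent (a stand-in for the deep model, with illustrative hyperparameters and a hypothetical reward composition) selects which node interacts next:

```python
import random
from collections import defaultdict

class NodeSelectionAgent:
    """Toy Q-learning agent for choosing which local data center interacts next."""
    def __init__(self, n_nodes, lr=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_nodes)
        self.n_nodes, self.lr, self.gamma, self.eps = n_nodes, lr, gamma, eps

    def act(self, state):
        if random.random() < self.eps:          # explore
            return random.randrange(self.n_nodes)
        return max(range(self.n_nodes), key=lambda a: self.q[state][a])  # exploit

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[next_state])
        td_target = reward + self.gamma * best_next
        self.q[state][action] += self.lr * (td_target - self.q[state][action])

# The reward could, for instance, combine accuracy improvement and communication cost:
# reward = accuracy_gain - cost_weight * communication_cost
```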
3.5. Data Sharing Process
Within federated learning, it is possible for each device or data source to train the model locally and transmit the updated model parameters to a central server for aggregation, which can update the global model. Autonomy: first, an initialized model is installed on the corresponding terminals of two or more participants, each with the same model. Afterwards, participants can use their local data to train the model. Since the participants have different datasets, the model parameters on the final terminals are different.
Global federated training requires the different model parameters to be uploaded to the cloud simultaneously. The cloud then aggregates and updates the model parameters and returns the updated parameters to the participants’ endpoints, where each endpoint initiates the next iteration. The process is repeated until the entire training process converges. Consequently, in the scenario of adapting distributed data centers to train a machine learning model with n secondary data centers, a federated learning architecture is introduced, in which the business systems have their own user data and, at the same time, the secondary data centers hold labeled data that the model needs to predict. Due to data privacy and security considerations, data cannot be exchanged directly between the secondary platforms. At this point, a federated learning system can be employed to build a distributed intelligent labeling recommender system model, and the system architecture is shown in
Figure 5.
As each local secondary data center (middle platform) uses different categories of data according to its business system, as well as different data within the same category, GCN models are used to train the local intelligent labeling models. Simultaneously, each local model is calibrated with encrypted samples.
During the sharing request process, the data requester shares and uploads only the generic model to the cloud, whereas the private model and private data are stored in the secondary data center.
In the process of uploading the local model, under encryption protection, it is constantly interactive with the intermediate calculation results, such as gradient, step size, and so on.
In the modeling process, an intermediate party is involved as a coordinator. The proposed credit consensus mechanism in this paper can optimize the number of information interactions for locally training model parameters and minimize the communication overhead and computational resources.
Constantly update model parameters according to the interaction results and encrypt model training.
Each participant aggregates the global model and calculates the final structure combination as the final model.
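The following minimal sketch summarizes one such round under our own simplifying assumptions: the encryption, coordinator, and credit-based triggering are abstracted away into callback functions whose names are hypothetical.

```python
import copy

def federated_round(global_state, local_datasets, train_local, aggregate):
    """One round: each secondary data center trains locally, then the cloud aggregates."""
    local_states, sizes = [], []
    for data in local_datasets:
        state = train_local(copy.deepcopy(global_state), data)  # local GCN training
        local_states.append(state)                              # uploaded (under encryption)
        sizes.append(len(data))
    return aggregate(local_states, sizes)                       # new global model
```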
4. Experimentation
In this section, the performance of our presented blockchain-authorized asynchronous federated smart labeling recommendation system is assessed. A baseline for the experiments is first defined, and we provide a comprehensive description of the dataset and experimental details. Then, the feasibility of the proposed asynchronous federated learning recommendation algorithm based on the reputation consensus mechanism is verified, and the performance of the federated intelligent labeling recommendation system based on the adaptive GCN algorithm is evaluated.
4.1. Datasets
We evaluate the proposed asynchronous federated label recommendation algorithm on the NGSIM dataset. The NGSIM (next generation simulation) dataset consists of comprehensive US highway traffic data collected by the FHWA. The dataset covers the driving conditions of vehicles traversing the US 101 and I-80 highways during specified time frames. Obtained through camera-based techniques, the data were carefully processed into individual track points. In order to evaluate the effectiveness of the proposed method, three well-known trajectory prediction datasets were selected: NGSIM I-80 [
28], US-101 [
29], and the Apollo landscape trajectory dataset [
30].
The NGSIM (next generation simulation) I-80 dataset contains real-world traffic data collected from the I-80 highway in the United States. NGSIM is a project by the Federal Highway Administration aimed at providing detailed, high-quality data for traffic research [
28]. The data are collected from the I-80 highway in Emeryville, California, and include vehicle trajectories and motion patterns. Trajectory information includes position, speed, acceleration, and lane changes for each vehicle. Data are collected over multiple time intervals, often covering a 15-minute period, with high temporal and spatial resolution: vehicle positions are recorded at 0.1 s intervals. The scenario is urban highway traffic, which is particularly useful for analyzing vehicle interactions in dense traffic conditions, such as lane changing, car following, and merging.
The NGSIM US-101 dataset is similar to the I-80 dataset, but the data comes from the US-101 highway in Los Angeles, California. This highway has different geometric and traffic characteristics compared to I-80, providing additional diversity in the dataset [
29]. The data are collected from the US-101 highway in Los Angeles, California, and contain vehicle trajectories and kinematic data. Trajectory information includes position, velocity, and lane-changing behavior. Compared with I-80, the US-101 highway features higher traffic volume and more complex on/off ramps, introducing different challenges such as more frequent merges and exits.
The Apollo Landscape dataset is part of Baidu’s Apollo autonomous driving platform [
30]. This dataset provides high-quality trajectory data collected in both urban and highway driving scenarios. The trajectory data consist of carefully captured camera-based imagery, LiDAR-scanned point clouds, and carefully labeled trajectories. Collected in the bustling city of Beijing, China, this comprehensive dataset encompasses a wide range of lighting conditions and varying levels of traffic density. In particular, it covers complex, interwoven traffic patterns, seamlessly integrating vehicle movements with passengers and pedestrians. The data include sensor data (LiDAR, radar, cameras) along with vehicle trajectories, providing richer information compared to the NGSIM datasets. The trajectory information consists of vehicle positions, velocities, accelerations, and interactions with surrounding vehicles and infrastructure.
These datasets are invaluable in trajectory prediction research. The NGSIM datasets (I-80 and US-101) are heavily focused on highway traffic scenarios, providing detailed data on vehicle behavior in dense traffic conditions. On the other hand, the Apollo Landscape dataset is broader, covering a variety of driving scenarios, making it highly relevant for autonomous vehicle development and testing across different environments.
Initially, the example dataset (n = 10) was divided into separate sets that may intersect. In order to maintain temporal correlation, the insertion time t for each dataset was precisely recorded, and the differences between the datasets were calculated. For each insertion and extraction, two key parameters of the program were used: the structural parameters and the significance parameters P1 and P2, which were shared among the five local subplatforms.
4.2. Baselines
Throughout this subsection, we provide a comprehensive comparison of our well-designed scenarios against a series of baselines consistent with the methodology outlined in the cited ref. [
31]. Additionally, we evaluate various existing solutions employing NGSIM datasets, including V-LSTM, C-VGMM+VIM, GAIL-GRU, CS-LSTM, and GRIP++.
Vanilla long short-term memory (V-LSTM) represents a specialized variant of recurrent neural network (RNN) meticulously crafted to handle the intricacies of sequential data within the realm of computer vision. Distinguishing itself from the conventional LSTM (long short-term memory) framework, V-LSTM seamlessly integrates visual information into its network architecture.
Within the V-LSTM framework, the input sequence not only contains temporal data but also integrates visual features extracted from images or video frames. This fusion enables the network to skillfully capture temporal dependencies while understanding and reviewing the sequential data through a visual lens.
Incorporating the power of LSTM with visual capabilities, V-LSTM has become a powerful force in various computer vision domains covering action recognition, video captioning, video generation, and video prediction. This fusion-based approach exhibits higher performance when compared to traditional LSTM models, especially in the field of visual sequential learning tasks.
C-VGMM + VIM combines class-specific variational Gaussian mixture models (C-VGMM) with a Markovian vehicle interaction module (VIM), providing a probabilistic, maneuver-based baseline for vehicle trajectory prediction [
31].
GAIL-GRU stands for generative adversarial imitation learning with gated recurrent units. It is a machine learning algorithm that combines generative adversarial networks (GANs) with gated recurrent units (GRUs) to learn a policy from expert demonstrations in an unsupervised manner. The GAIL-GRU algorithm is commonly used in reinforcement learning and imitation learning tasks [
32].
CS-LSTM (convolutional social pooling LSTM) is a neural network model for vehicle trajectory prediction. It extends the traditional LSTM encoder–decoder with a convolutional social pooling layer that aggregates the hidden states of surrounding vehicles on a spatial grid, helping the model capture inter-vehicle dependencies and improving trajectory prediction performance.
GRIP++ serves as an improved version of GRIP, which utilizes both fixed and dynamic graphs to capture complex interactions between different types of traffic agents, improving trajectory prediction accuracy [
33].
4.3. Experiment Settings and Evaluation Criteria
The execution of our scheme is on a desktop running Ubuntu 16.04 with a 4.0 GHz Intel Core i7 CPU, 32 GB of RAM, and an NVIDIA Titan Xp graphics card. Every dataset is randomly partitioned, allocating 70% of the data for training, 20% for validation, and 10% for testing. Each method employs two hidden graph neural network layers (e.g., GCN, GraphSAGE, etc.) with layer sizes matching the number of classes in the dataset. Concretely, the NGSIM dataset is partitioned into 100 segments: 70 segments are used as the training dataset, 20 segments for validation, and 10 segments for testing. The task of sharing edge data involves the propagation of computational results from each data provider’s local data. Ten nodes representing 10 labels are used for model training, and global aggregation is performed after each iteration as a continuous optimization process.
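For illustration, a minimal sketch of the random 70/20/10 segment partition described above (the seed and function name are arbitrary choices):

```python
import numpy as np

def split_segments(n_segments=100, seed=0):
    """Randomly partition dataset segments into 70% train, 20% validation, 10% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_segments)
    return idx[:70], idx[70:90], idx[90:]

train_idx, val_idx, test_idx = split_segments()
```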
4.3.1. Asynchronous Federated Learning Fusion Optimization Accuracy
From
Figure 6, it can be concluded that as the number of experimental cycles increases, the accuracy obtained by our test set continues to improve, eventually leveling off and remaining above 95%. With this observation, we confirm the feasibility and validity of our experimental approach. It is clear that our proposed reputation-based fusion mechanism does improve the accuracy of intelligent model labeling.
4.3.2. Asynchronous Federated Learning Fusion Optimization Loss Function
Figure 7 shows the loss values for model training. There is a decreasing loss function with the increasing number of iterations, which eventually converges to a steady state. For further illustration of the efficiency of the proposed reputation,
Table 2 and
Table 3 give the loss values for different participants at epochs 1 and 30. As can be seen from
Table 2 and
Table 3, the accuracy of the model is greatly improved for each subject as the number of epochs increases.
5. Result
Afterwards, we continued to validate the correlation between our utilization of reputation values and the importance given to the different graphical snapshots in the GCN. Although reputation values can be compared to attention weights, it is important to determine whether these reputation values are truly consistent with the importance assigned to various graphical snapshots. It is notable that attention weights may not always serve as a strictly interpretable metric. Additionally, we successfully demonstrate the applicability of the proposed GCN by elucidating its ability to explain the behavior of distributed graph models in real-world application scenarios where the importance of graph snapshots is expected to fluctuate over time.
In graph neural networks, the establishment of connections between nodes relies on an attention mechanism that attributes varying degrees of importance to different node connection patterns. As a consequence, for datasets where node labels are uniform across all time intervals, we have the ability to assign higher connection probabilities to nodes belonging to a specific class at a given time step, which fundamentally enhances their relevance in predicting labels.
It has been established by the experiments presented in the GCN-SE paper that there is a strong correlation between attention and importance. Additionally, it has been established that the importance of graph topology and node attributes may fluctuate depending on the model and data used. In order to validate this correlation, a “perturbation” was introduced in the graph snapshots, and the link between the accuracy fluctuations and the attention weights within GCN-SE was investigated. Such an adjustment greatly helps to confirm the relationship between attention and importance.
These are graphs of the consistency results of the epoch communication in the distributed scenario in
Figure 8. As the epoch progresses from 0 to 30, the overall accuracy gradually changes. During this period, the model undergoes five fusion and interaction processes over 30 iterations but fails to maintain an accuracy level above 95%. These results show that under normal conditions (per-epoch communication), as the number of epochs increases, the accuracy of model training not only fluctuates but also fails to guarantee a consistency of more than 95%, or even close to 90%, which cannot satisfy the need for intelligent labeling consistency.
In comparison with
Figure 8, the dotted line in
Figure 9 is relatively stable, and although it still fluctuates up and down, the accuracy of model training can be maintained above 95%. More importantly, the number of model fusions and information interactions is reduced due to the role of the reputation-based model fusion mechanism we designed. The reason for this is that the mechanism determines when to interact based on the difference, which reduces the computational cost of the unnecessary information interaction process and effectively improves the model utilization. As the epoch progresses from zero to thirty, the consensus graph fluctuates during the fusion process but always ensures that it remains above the 95% threshold. Through the utilization of the fusion mechanism, the model only requires four iterations of fusion and interaction to achieve its goal.
Figure 9 shows the fused information interactions between subsidiaries, by which the consistency can be maintained above 95% while the communication is updated only four times, thus satisfying the condition of consistency above 95%, enhancing the consistency accuracy, reducing the update time, and reducing the computational overhead.
Figure 10 illustrates the correlation between the number of updates and the number of iterations for the four different datasets. It is clear that all four datasets exhibit remarkable consistency, maintaining over 95% accuracy. This achievement is accompanied by a synchronized reduction in the frequency of information exchange, thus mitigating the computational and communication overhead. Notably, the fourth dataset reduces this overhead to only three interaction instances.
To further validate the effectiveness of the proposed method, three other federated learning frameworks (distributed learning algorithms) were included in the experimental comparison, given as follows:
FedAvg method: FedAvg is a classic federated learning method that proposes a distributed training approach. The core idea is to distribute the training process of the model to multiple clients (such as user devices); each client uses local data for model training and then sends the locally updated model parameters to the server. The server computes a weighted average of these parameters to generate a new global model.
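A minimal sketch of this FedAvg-style aggregation step (the dictionary-of-parameters representation and function name are our own simplification):

```python
def fedavg(local_states, sizes):
    """Size-weighted average of client model parameters (FedAvg-style aggregation)."""
    total = float(sum(sizes))
    return {key: sum(state[key] * (n / total) for state, n in zip(local_states, sizes))
            for key in local_states[0]}

# Example with scalar "parameters": two clients holding weights 1.0 and 3.0
new_global = fedavg([{"w": 1.0}, {"w": 3.0}], sizes=[100, 300])  # -> {"w": 2.5}
```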
FedProx method: FedProx is an improvement on FedAvg specifically designed to address the issue of data and client computing power heterogeneity in federated learning. It limits the offset of local model updates on the client side by introducing a proximal term in the optimization objective, preventing it from deviating too far from the global model. The main steps are similar to FedAvg, with the difference being that the client includes a regularization term during local optimization so that the local model does not differ too much from the global model.
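A minimal sketch of the FedProx proximal term added to the local objective (the value of mu is a hypothetical hyperparameter, not one reported here):

```python
import torch

def fedprox_local_loss(task_loss, local_params, global_params, mu=0.01):
    """Add (mu/2) * ||w_local - w_global||^2 so local updates stay near the global model."""
    # local_params / global_params: matching lists of parameter tensors
    prox = sum(((p_l - p_g) ** 2).sum() for p_l, p_g in zip(local_params, global_params))
    return task_loss + 0.5 * mu * prox
```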
Proof of training quality blockchain-based federated learning (PoTQBFL): It combines model training with the consensus process, which can make better use of the users’ computing resources. For a specific data sharing request, members of the consensus committee are selected by retrieving related users for the request in the blockchain. The committee is responsible for driving the consensus process, as well as for learning the model.
Table 4 presents a comparison of average displacement error (ADE) and final displacement error (FDE) for our method against three baseline methods. Although our model did not achieve state-of-the-art results overall, it demonstrates notable improvements compared to specific methods. While our model does not lead in every metric, its practical alignment with real-world scenarios and unique contributions to data-sharing security make it a valuable approach.
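For reference, ADE and FDE can be computed from a predicted and ground-truth trajectory as in the following sketch (the example coordinates are arbitrary):

```python
import numpy as np

def ade_fde(pred, truth):
    """pred, truth: arrays of shape (T, 2) holding predicted and ground-truth (x, y) positions.
    ADE = mean displacement over all timesteps; FDE = displacement at the final timestep."""
    dist = np.linalg.norm(pred - truth, axis=1)
    return dist.mean(), dist[-1]

pred = np.array([[0.0, 0.0], [1.0, 1.1], [2.0, 2.3]])
truth = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ade, fde = ade_fde(pred, truth)
```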
In
Figure 11, it is evident that the total number of interactions in
Figure 9 is aligned with the corresponding number of iterations. For example, in
Figure 9a, it corresponds to the initial black mark “+” in
Figure 8, and so on. Also, it is worth noting that the model consistently shows improvement as training progresses. Remarkably, in
Figure 9d, the fourth experiment achieved effective results using only three interaction instances, thus saving resources and communication costs while increasing the value of data utilization.
The comparative results for the baselines (given in Section 4.2) are shown in
Figure 12. By conducting the comparison experiments, it is clear that the results obtained from the various datasets consistently show an upward trajectory. Moreover, the results obtained from the alternative datasets show a commendable level of consistency. The method introduced in this paper clearly outperforms other methods, not only achieving more than 95% label training consistency within a distributed data platform but also exceeding the expected results. These results validate the feasibility and effectiveness of the method proposed in this paper.
6. Conclusions
In this paper, we presented a novel and sophisticated reputation-based interpretable distributed semisupervised fusion mechanism. This mechanism enhances distributed intelligent labeling systems by reducing computational resources and communication overhead while improving labeling consistency and accuracy. Through the integration of multiple perspectives, it skillfully captures structural information and ensures coherence in the task of intelligent annotation. Experimental results clearly demonstrate the superiority of the approach. Remarkably, the annotation information obtained from different subsidiaries is complementary, thus emphasizing the potential for improving system performance through a well-designed fusion mechanism.
However, the proposed method may increase computational overhead, because the proof of reputation adds extra steps for verifying the quality of local training on client devices. This verification process can increase computational requirements, especially for resource-constrained devices such as mobile phones or IoT devices. In addition, scalability is another potential limitation: as the number of clients increases, managing and verifying training quality proofs from all participating clients can become computationally and logistically challenging, which affects the scalability of the system for large-scale deployments. The method proposed in this paper also still needs improvement in combating malicious nodes; although the proposed reputation mechanism aims to ensure the integrity of local training, vulnerabilities may remain if malicious clients find ways to manipulate the training process or proof generation. Based on these limitations, the first direction for future work is optimizing computational efficiency: future research can focus on developing lightweight verification methods that reduce the computational burden on clients while maintaining accurate proof of training quality. Second, a scalable proof mechanism is necessary; new methods to scale reputation mechanisms efficiently, such as batching proofs or creating hierarchical verification systems, will be critical for applying this approach to large-scale federated learning systems with thousands of clients. In addition, security enhancements and an improved incentive structure are needed to defend against malicious nodes. Research into more robust mechanisms for secure and tamper-proof proof generation will help strengthen the reputation mechanism against adversarial attacks or malicious client behavior, and incorporating better incentive mechanisms to encourage clients to provide high-quality contributions and proofs can enhance overall model performance and foster more reliable participation.