Article

Improved Design and Application of Security Federation Algorithm

School of Information Science and Engineering, Yanshan University, Qinhuangdao 066000, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(6), 1375; https://doi.org/10.3390/electronics12061375
Submission received: 17 February 2023 / Revised: 8 March 2023 / Accepted: 11 March 2023 / Published: 13 March 2023
(This article belongs to the Special Issue Intelligent Analysis and Security Calculation of Multisource Data)

Abstract

(1) Background: The geographical and environmental factors involved in each local model can affect the accuracy and practicability of the security federation model. To avoid this, a corresponding weight is set for each local model, and the local model parameters and the weights participate in the calculation together. (2) Methods: The improved model is applied to the income evaluation of taxi drivers. Multiple linear regression is used to fit the local model parameters, and the loss function value is calculated. Then, according to the improved security federation algorithm, the model parameters and the local model weights are encrypted with the Paillier homomorphic encryption algorithm, and the encrypted parameter information is uploaded to the aggregation server for the aggregate average. (3) Results: The experimental results show that after 1000 iterations, the accuracy curve converges in the interval [0.93, 0.97]; the mean accuracy is 94.27%, and the mean loss function value is 1.0886. Under the same conditions, the mean loss function value calculated by the traditional model is 1.9910. (4) Conclusions: Both the model and the data show that the accuracy of the improved model is higher and that it better reflects the income of taxi drivers.

1. Introduction

With the continuous evolution of new business forms and new service modes, large Internet companies have accumulated a large amount of data through collection, retention, exchange, and derivation in the process of serving users. In the ubiquitous network environment, frequent cross-border, cross-system, and cross-ecosystem interaction of data has become the norm, driven by information services. This increases the intentional or unintentional retention of private information in different information systems and results in information leakage [1]. If data are continually exposed, people gradually become “transparent people”. To prevent information leakage, many enterprises, organizations, and individuals use data only within their own domains; they isolate the large amounts of data they store and gradually form “data islands” [2]. This increasingly common phenomenon inhibits progress toward “intelligent interconnection of all things and ubiquitous sharing of information”. Therefore, the problem of data security urgently needs to be solved [3]. Trusted federated learning addresses the data protection problem first. Trusted federated learning is a secure and trusted multiparty machine learning paradigm that protects privacy well when solving practical engineering problems. However, within the federated learning framework, each participant or local model operates in a different scenario (economic environment, geographical location, population density, etc.), which creates differences between the models and gives them certain “external characteristics”; this poses a threat and a hidden danger to data protection. Therefore, how to adjust the federated learning algorithm to ensure that the model information is not attacked while the loss function is minimized has become the bottleneck of research.
In recent years, scholars at home and abroad have become increasingly enthusiastic about federated learning, and advocacy for it continues to grow. In 2019, Professor Yang Qiang proposed the concept of federated learning, including horizontal federated learning, vertical federated learning, and federated transfer learning [4]. To improve the accuracy of ground vehicle GPS positioning, Devon established a horizontal federated learning model and compared the averaging method, the maximum likelihood estimation method, the covariance trajectory optimization method, and the covariance intersection method; as the number of systems increases, the position error decreases (the loss function converges) and higher robustness is obtained [5]. Baranda used horizontal federated learning to describe and categorize athletes’ spine postures and to identify types of sagittal spine morphologies [6]. Because 5G networks process edge computing resources under power supply constraints, Baghban proposed a joint multidimensional fractional knapsack algorithm to reduce operating costs, saving 50% of energy consumption compared with nonfederated algorithms [7]. Chen Shaoqi proposed FL-QSAR, a prototype platform for collaborative drug discovery QSAR modeling based on federated learning; using this collaborative privacy-preserving learning framework, he broke down barriers between pharmaceutical institutions and promoted cooperative drug development [8]. Liu Yijing used the distributed learning model of federated learning to propose an efficient RAN slicing device association scheme that improves network throughput while reducing switching costs; the scheme achieves significant performance improvements in network throughput and communication efficiency [9]. Topaloglu proposed a model to assess data contributions in federated learning implementations, seeking convenience and adaptability with respect to computational constraints and costs [10]. Zhu Hangyu proposed a neural architecture search method based on reinforcement learning, evolutionary algorithms, and gradients [11]. Rajendran showed that the specific processes for improving performance vary with the ML model and with how federated learning is implemented, and demonstrated that the order of institutions during training does affect the overall performance improvement [12]. Subramanya designed an exploratory visual analysis system for the horizontal federated learning process that assesses each client’s contribution, and verified the effectiveness of the system [13]. Zhu Xiao proposed a horizontal federated PCA differential privacy data publishing algorithm, which effectively protects the privacy of local data and published data and has higher usability than similar algorithms [14]. Lau proposed a multicriteria risk assessment system based on federated learning, which combines federated learning with the best–worst method (BWM), measures enterprise cold chain risk under a proposed risk hierarchy, and enhances autonomy in assessing cold chain risk [15]. Liu Yinghui proposed an asynchronous convergent federated learning method that considers staleness coefficients; a blockchain network was used instead of the classic central server to aggregate a global model with high accuracy [16].
Ge Ning designed federated support vector machine and random forest algorithm models, built a production line fault prediction model based on federated learning, and showed that federated learning can replace centralized learning for fault prediction [17]. The above studies all illustrate the differences between edge computing and federated computing and demonstrate applications of federated learning.
To study the income of yellow taxi drivers in New York City, the traditional horizontal federated learning algorithm is selected. However, because there are certain differences among the local models, letting each local model participate directly in the global evaluation would reduce the universality of the model. How to adjust the security federation model to minimize the loss function has therefore become a bottleneck problem in the research.
Based on horizontal federated learning, this paper improves the security federation algorithm. Before the local model parameters are uploaded to the aggregation server, a weight is added for each participant, which not only eliminates the differentiation of the model parameters but also provides a “secondary protection” for the model data. The weight is encrypted together with the local model parameters and uploaded to the aggregation server. The weighted average of the global model parameters is based on the reweighting of the local model parameters, realizing a standardized description of the practical problem. The improved security federation model is then applied to the income evaluation of yellow taxi drivers in New York City, and its advantages are demonstrated by comparing its loss function value with that of the traditional model.

2. Horizontal Federated Learning Framework

Horizontal federated learning refers to the case in which the data sets of all participants share the same feature space but different sample spaces, similar to the horizontal partitioning of data in a table view. For example, two urban commercial banks in different regions have customer groups with only a small intersection: the IDs in their data sets differ, but the business types are very similar, so the feature spaces of their data sets are the same [4]. The formal description is
$$x_i = x_j,\quad y_i = y_j,\quad I_i \neq I_j,\qquad \forall D_i, D_j,\; i \neq j$$
where $D_i$ and $D_j$ represent the data sets owned by party $i$ and party $j$, respectively. We assume that the feature space and label space pairs of the two parties, namely $(x_i, y_i)$ and $(x_j, y_j)$, are the same, whereas the customer ID spaces of the two parties do not intersect, or their intersection is very small. The specific feature space and sample space of horizontal federated learning are shown in Figure 1:

2.1. Horizontal Federated Learning Training Process

A typical client–server architecture for a horizontal federated learning system is shown in Figure 2.
The training process of a horizontal federated learning system is usually divided into four steps [18]:
Step 1: Each participant calculates the gradient of the model locally, uses encryption technologies such as homomorphic encryption, differential privacy, or secret sharing to mask the gradient information, and sends the masked results to the aggregation server.
Step 2: The server performs security aggregation operations (such as using a weighted average based on homomorphic encryption).
Step 3: The server sends the aggregated results to all participants.
Step 4: Each participant decrypts the gradient after receiving it and updates their model parameters with the decrypted gradient results.
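To make these four steps concrete, the following Python sketch runs one such training round for a simple squared-error model. The Client class, the plain averaging that stands in for secure aggregation, and the omission of the masking step are simplifying assumptions for illustration, not the protocol itself.

import numpy as np

class Client:
    def __init__(self, X, y):
        self.X, self.y = X, y
    def local_gradient(self, w):
        # Step 1: gradient of a squared-error model on this client's local data
        err = self.X @ w - self.y
        return self.X.T @ err / len(self.y)

def training_round(clients, w, eta=0.1):
    # Step 1: each client computes its gradient (masking with homomorphic encryption,
    # differential privacy, or secret sharing is omitted in this plain sketch)
    grads = [c.local_gradient(w) for c in clients]
    # Step 2: the server aggregates; a simple average stands in for secure aggregation
    agg = np.mean(grads, axis=0)
    # Steps 3-4: the aggregate is broadcast and every client applies the same update
    return w - eta * agg

# Synthetic example data, purely for demonstration
rng = np.random.default_rng(0)
clients = [Client(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(100):
    w = training_round(clients, w)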

2.2. Global Model

The performance of the horizontal federated learning model is obtained by testing all participants on the test data set. The model performance can be expressed as precision, accuracy, recall, etc. [19].
During and after the model training of horizontal federated learning, the global model performance is obtained as follows:
Step 1: The $k$-th participant uses its local test data set to evaluate the existing horizontal federated learning model. For a binary classification task, this step produces the local model test result $(N_{TP}^k, N_{FP}^k, N_{TN}^k, N_{FN}^k)$; every party $k = 1, 2, \ldots, K$ performs this operation.
Step 2: The $k$-th participant sends the local model prediction result $(N_{TP}^k, N_{FP}^k, N_{TN}^k, N_{FN}^k)$ to the coordinator; every party $k = 1, 2, \ldots, K$ performs this operation.
Step 3: After collecting the local model prediction results $\{(N_{TP}^k, N_{FP}^k, N_{TN}^k, N_{FN}^k)\}_{k=1}^{K}$ of the $K$ participants, the coordinator can calculate the global model performance. For example, for a binary classification task, the global recall rate can be calculated as $\sum_{k=1}^{K} N_{TP}^k \big/ \sum_{k=1}^{K}\left(N_{TP}^k + N_{FN}^k\right)$.
Step 4: The coordinator sends the calculated global model performance to all participants.
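As an illustration of Step 3, the snippet below combines per-participant confusion-matrix counts into the global recall; the counts themselves are made-up example values, not experimental data.

# Made-up example counts per participant (TP, FP, TN, FN)
local_results = [
    {"TP": 80, "FP": 10, "TN": 95, "FN": 15},   # participant 1
    {"TP": 60, "FP": 12, "TN": 70, "FN": 8},    # participant 2
    {"TP": 55, "FP": 9,  "TN": 88, "FN": 11},   # participant 3
]
global_recall = (sum(r["TP"] for r in local_results)
                 / sum(r["TP"] + r["FN"] for r in local_results))
print(f"global recall = {global_recall:.4f}")   # ~0.8515 for these example counts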

2.3. Federated Averaging

The federated averaging algorithm applies to any finite-sum loss function of the following form:
$$\min_{w \in \mathbb{R}^d} f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$$
where $n$ represents the number of training data points, and $w \in \mathbb{R}^d$ represents the model parameters of dimension $d$.
For machine learning, $f_i(w) = l(x_i, y_i; w)$ is generally selected, where $l(x_i, y_i; w)$ represents the loss obtained by predicting sample $(x_i, y_i)$ with the given model parameter $w$, and $x_i$ and $y_i$ represent the $i$-th training data point and its label, respectively.
Suppose there are $K$ participants. In horizontal federated learning, let $D_k$ represent the data set owned by the $k$-th participant and $P_k$ represent the index set of data points held by client $k$.
Let $n_k = |P_k|$ represent the cardinality of $P_k$, so the $k$-th participant holds $n_k$ data points. When there are $K$ participants in total, the loss function is
$$f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w)$$
$$F_k(w) = \frac{1}{n_k}\sum_{i \in P_k} f_i(w)$$
For many models, the computational cost is small compared with the communication cost, so additional computation can be used to reduce the number of communication rounds required to train the model. There are two main ways to increase computation [4]:
Increase parallelism: let more participants train the model independently between communication rounds.
Increase computation at each participant: each participant performs more complex computation between two communication rounds, such as multiple local model update iterations, rather than just a simple computation such as a gradient calculation on a single batch.
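The following FedAvg-style sketch illustrates both ideas: each participant runs several local epochs of gradient descent between communication rounds, and the server averages the returned parameters weighted by $n_k/n$. The function names, learning rate, and epoch count are illustrative assumptions rather than the algorithm's prescribed settings.

import numpy as np

def local_update(X, y, w, eta=1e-2, epochs=5):
    # Extra local computation between communication rounds (multiple local iterations).
    # eta and epochs are illustrative hyperparameters.
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - eta * grad
    return w

def fedavg_round(datasets, w_global, eta=1e-2, epochs=5):
    # datasets: list of (X_k, y_k) pairs, one per participant
    n = sum(len(y) for _, y in datasets)
    w_new = np.zeros_like(w_global)
    for X, y in datasets:
        w_k = local_update(X, y, w_global.copy(), eta, epochs)
        w_new += (len(y) / n) * w_k             # weight each participant by n_k / n
    return w_new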

3. Scheme Design Based on Weighted Federated Averaging

3.1. Objectives and Requirements

To better protect local data, the improved model described in this section takes into account the user data quality and potential security threats in the aggregation process, introduces weight coefficients into the traditional federated average algorithm, and proposes a new client–server architecture and scheme based on a weighted federated average.
In theory, the performance of a model, and whether it will be adopted, is evaluated from two aspects: accuracy and recall. Considering the particularity of the model and its application scenarios, the improved client–server framework of horizontal federated learning based on a weighted average should pay attention to privacy, efficiency, and security.

3.2. Specific Scheme Design

The newly introduced weight is calculated according to the contribution of the model parameters trained by the user and is modified according to the number of training iterations and the learning rate of each iteration. Therefore, the weight parameters obtained for the same data trained under different models differ. The specific improvement scheme is as follows.

3.2.1. Identification and Authentication

Identity authentication is an important prerequisite for the whole training process and accompanies it throughout. The key information issued by a third party to the participants and the aggregation server allows the server to determine which clients are participating in the training and to avoid malicious and adversarial attacks that would affect the global model training. The participant’s user identity information or data characteristics will exhibit multiple groups of characteristic attributes. The key pair used between the user and the server is generated by the key generation center under an identity-based cryptosystem [16].

3.2.2. Model Initialization

After identity authentication and key-pair distribution are completed, the aggregation server broadcasts the initialized local model parameters to all authenticated participants and determines the training model and goal of each user. Each participant trains its model with the gradient descent method to complete the local update of the parameters. In the data aggregation process, because of differences in data quality, the weight parameters act as a “seasoning agent” that lets the model parameters play their full role.
This scheme involves protecting the participants’ data. Because the whole process requires the data to be encrypted and then the encrypted ciphertexts to be added or multiplied, the Paillier algorithm is adopted in this section. The improved scheme proposed here adds weights, and the aggregation method is a weighted summation of the data parameters; the Paillier algorithm is exactly such an additively homomorphic encryption algorithm.
The encryption and decryption key for Paillier homomorphic encryption needs to be generated during initialization:
(a)
Key generation
Select two large prime numbers $p, q$.
Calculate $N = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$.
Select an integer $g \in \mathbb{Z}_{N^2}^{*}$ such that $\gcd\left(L(g^{\lambda} \bmod N^2),\, N\right) = 1$, that is, the two are coprime, where $L(u) = \frac{u-1}{N}$.
The public key is $(N, g)$; the private key is $\lambda$.
(b)
Encryption (using public key)
Select a random number; r Z N * is the random source of probabilistic encryption.
Plaintext m corresponds to ciphertext:
$$c = \mathrm{Enc}(m, r) = g^{m} r^{N} \bmod N^2$$
(c)
Decryption (using private key)
The plaintext $m$ corresponding to ciphertext $c$ is
$$m = \mathrm{Dec}(c) = \frac{L\left(c^{\lambda} \bmod N^2\right)}{L\left(g^{\lambda} \bmod N^2\right)} \bmod N$$
The above is the specific process of key generation, encryption, and decryption with the Paillier additively homomorphic encryption algorithm. $\mathrm{lcm}(a, b)$ denotes the least common multiple of $a$ and $b$, and $\gcd(a, b)$ denotes the greatest common divisor of $a$ and $b$.
The security of the algorithm rests on the difficulty of factoring large integers, that is, on the infeasibility of factoring the large integer $N$ to obtain its two prime factors $p, q$.
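The following is a minimal, illustrative Python sketch of the key generation, encryption, and decryption steps above. The tiny primes, the choice g = N + 1, and all function names are assumptions made for readability; a real deployment would use large random primes and a vetted cryptographic library rather than this toy code (Python 3.8+ is assumed for the modular inverse via pow).

import math
import random

def lcm(a, b):
    return a * b // math.gcd(a, b)

def L(u, N):
    return (u - 1) // N

def keygen(p=101, q=113):
    # Toy primes for illustration only; real keys use primes of 1024 bits or more.
    N = p * q
    lam = lcm(p - 1, q - 1)
    g = N + 1                                   # a common valid choice of g in Z*_{N^2}
    mu = pow(L(pow(g, lam, N * N), N), -1, N)   # modular inverse of L(g^lambda mod N^2)
    return (N, g), (lam, mu)

def encrypt(m, pub):
    N, g = pub
    while True:
        r = random.randrange(1, N)              # random source of probabilistic encryption
        if math.gcd(r, N) == 1:
            break
    return (pow(g, m, N * N) * pow(r, N, N * N)) % (N * N)   # c = g^m * r^N mod N^2

def decrypt(c, pub, priv):
    N, _ = pub
    lam, mu = priv
    return (L(pow(c, lam, N * N), N) * mu) % N  # m = L(c^lambda mod N^2) / L(g^lambda mod N^2) mod N

pub, priv = keygen()
assert decrypt(encrypt(42, pub), pub, priv) == 42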

3.2.3. Training Local Model

After the model is initialized, the participants use their local data sets to compute the local optimum of the model according to the model parameters broadcast by the aggregation server. Before local model training, the aggregation server broadcasts to all participants the initialization parameter group $(W_g^0, \eta, T)$ and the objective function $f(x)$:
$$f_{w,b}(x) = \sum_{k=1}^{K}\left(w_k x_k + b_0\right)$$
where $K$ represents the number of features in the user's data, $b_0$ is the intercept (threshold), $W_g^0$ represents the initialization parameters, $\eta$ is the learning rate, and $T$ is the number of iterations. The loss function $L(f_{w,b}(x), y)$ is
$$L(f_{w,b}(x), y) = \frac{1}{2n}\sum_{i=1}^{n}\left(f_{w,b}(x_i) - y_i\right)^2 = \frac{1}{2n}\sum_{i=1}^{n}\left(\sum_{k=1}^{K}\left(w_k x_{i,k} + b_0\right) - y_i\right)^2$$
To make the model optimal, multiple iterations are needed to minimize the value of the loss function; at that point, the difference between the objective function value and the true value is smallest, the accuracy of the model is highest, and the local model parameters are optimal. The objective and its solution process are [17]
$$\arg\min_{w,b} L(f_{w,b}(x), y) = \arg\min_{w,b}\frac{1}{2n}\sum_{i=1}^{n}\left(f_{w,b}(x_i) - y_i\right)^2$$
Find the partial derivative and the gradient of the loss function:
$$g_{w_k} = \frac{\partial L(f_{w,b}(x), y)}{\partial w_k} = \frac{\partial}{\partial w_k}\left(\frac{1}{2n}\sum_{i=1}^{n}\left(\sum_{k=1}^{K}\left(w_k x_{i,k} + b_0\right) - y_i\right)^2\right) = \left(f_{w,b}(x) - y\right)\frac{\partial f_{w,b}(x)}{\partial w_k} = \left(f_{w,b}(x) - y\right)x_k$$
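As a concrete illustration of this local step, the following Python sketch fits the multiple linear regression by gradient descent on one participant's data. The function name, learning rate, and epoch count are illustrative assumptions, not the authors' implementation.

import numpy as np

def local_train(X, y, w, b, eta=1e-3, epochs=10):
    # X: (n, K) feature matrix; y: (n,) targets; w: (K,) weights; b: scalar intercept b_0
    # eta and epochs are illustrative hyperparameters
    n = len(y)
    for _ in range(epochs):
        pred = X @ w + b               # f_{w,b}(x)
        err = pred - y                 # f_{w,b}(x) - y
        w = w - eta * (X.T @ err) / n  # gradient step for each w_k: (f - y) * x_k, averaged
        b = b - eta * err.mean()       # gradient step for the intercept
    loss = 0.5 * np.mean((X @ w + b - y) ** 2)   # L(f_{w,b}(x), y)
    return w, b, loss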

3.2.4. Upload Local Model Parameters

Because the participants’ data quality differs, it is necessary to assign a weight to each user so that the “parameter advantage” can be brought into play in the federated aggregation of the global model. The weight is calculated as
$$\psi_i = \delta\sum_{j=1}^{d}\left(\frac{w_j^i - w_{g,j}^i}{\sigma_j}\right)^2$$
where $\psi_i$ represents the parameter weight of the user in the $i$-th iteration, $d$ represents the number of parameters in each group of data, $w_j^i - w_{g,j}^i$ represents the difference between the $j$-th local parameter and the $j$-th global parameter in the $i$-th iteration, $\sigma_j$ represents the standard deviation of the $j$-th parameter, and $\delta$ is used to adjust the weight value.
Next, using the key group received in the initialization stage, the user encrypts the weighted parameters with the Paillier homomorphic encryption algorithm to generate the ciphertext [20]:
$$[\![w_{i,j}^k]\!] = \mathrm{Enc}\left(\psi_i w_{i,j}^k,\, (g, N)\right)$$
Then, the ciphertext is uploaded to the aggregation server, and the aggregation server calculates the global model parameters. The aggregation server needs to verify the message source and record the time used by each user in this model training to select the aggregation object. After calculating the ciphertext, the participants need to construct a secure message group to complete the interaction of ciphertext [21]. The detailed structure is as follows:
$$M = S_K\left([\![w_{i,j}^k]\!] \,\|\, ID_i \,\|\, nonce\right)$$
Among them, $[\![w_{i,j}^k]\!]$ is the ciphertext of the weighted parameter, $ID_i$ is the user label, and $nonce$ is the fresh random number generated in the authentication and key negotiation stage.
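The client-side procedure in this subsection can be sketched as follows in Python, reusing the toy Paillier functions from Section 3.2.2. The fixed-point scaling factor, the message layout, and all names are illustrative assumptions (Paillier encrypts integers, so the weighted parameters are scaled and rounded before encryption).

import numpy as np

SCALE = 10**6   # fixed-point scaling factor (assumption); Paillier operates on integers

def client_weight(w_local, w_global, sigma, delta=1.0):
    # psi_i = delta * sum_j ((w_j - w_g,j) / sigma_j)^2
    return delta * float(np.sum(((w_local - w_global) / sigma) ** 2))

def build_message(w_local, psi, pub, client_id, nonce):
    # Encrypt each weighted parameter psi_i * w_{i,j} and bundle it with ID_i and the nonce
    # (message layout is an assumption made for illustration).
    cipher = [encrypt(int(round(psi * wj * SCALE)), pub) for wj in w_local]
    return {"cipher": cipher, "id": client_id, "nonce": nonce}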

3.2.5. Update Global Model Parameters

The aggregation server aggregates the ciphertext parameters $[\![w_{i,j}^t]\!]$ uploaded by the users, which were encrypted with the Paillier homomorphic encryption algorithm, and completes the calculation of the global parameters:
$$[\![w_{g,j}^t]\!] = \prod_{i=1}^{S}\mathrm{Enc}\left(\psi_i w_{i,j}^t\right) = \mathrm{Enc}\left(\psi_1 w_{1,j}^t\right)\cdot\mathrm{Enc}\left(\psi_2 w_{2,j}^t\right)\cdots\mathrm{Enc}\left(\psi_S w_{S,j}^t\right) = \mathrm{Enc}\left(\psi_1 w_{1,j}^t + \psi_2 w_{2,j}^t + \cdots + \psi_S w_{S,j}^t\right) = \mathrm{Enc}\left(\sum_{i=1}^{S}\psi_i w_{i,j}^t\right)$$
After this calculation, the global model has been trained for the round. To deal with differences in data quality, the weight parameter adjusts each participant’s proportion in the aggregate average. The aggregation server broadcasts the final global model results to the users; each user decrypts the ciphertext with the key distributed by the third party and updates the local model. If the training reaches the maximum time, the accuracy reaches the preset range, or the number of iterations reaches the upper limit, the training stops; otherwise, training continues. The specific flow chart is shown in Figure 3.
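A minimal server-side sketch of this aggregation, again reusing the toy Paillier functions and the message layout assumed above: multiplying the ciphertexts of the weighted parameters yields the encryption of their sum, which a participant holding the private key can later decrypt and rescale.

def aggregate_ciphertexts(messages, pub, j):
    # prod_i Enc(psi_i * w_{i,j}) = Enc(sum_i psi_i * w_{i,j})  (additive homomorphism)
    # messages are the dicts produced by build_message in the sketch above
    N = pub[0]
    acc = 1
    for msg in messages:
        acc = (acc * msg["cipher"][j]) % (N * N)
    return acc

# A participant later recovers the plaintext global parameter (up to the scaling factor):
# w_g_j = decrypt(aggregate_ciphertexts(messages, pub, j), pub, priv) / SCALE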

4. Experimental Analysis

4.1. Experimental Data

The data were obtained from the official website of the Taxi & Limousine Commission (TLC) in the United States. They consist of the order and trajectory data of yellow taxis in New York City during the first 14 days of January 2022; the yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data content is shown in Table 1.

4.2. Data Preprocessing

The original data were format-converted to eliminate erroneous, invalid, and redundant records. The “Total_amount” column was sorted and the rows with non-positive values in this column were deleted, because a total cost lower than 0 is invalid data. The row containing the abnormally high taxi fare (value: 401,095.62) was also deleted to ensure that the total cost lies within a reasonable range. Orders are indexed by their start time, and records whose order start time is not between 1 January 2022 and 14 January 2022 were excluded. After data preprocessing, the original 1,048,576 records were reduced to 1,036,791. The overall income of yellow taxis in New York City is shown in Figure 4.
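A small pandas sketch of these preprocessing rules, assuming a CSV export with the column names listed in Table 1; the file name is hypothetical.

import pandas as pd

df = pd.read_csv("yellow_taxi_jan2022.csv", parse_dates=["lpep_pickup_datetime"])  # hypothetical file name
df = df[df["Total_amount"] > 0]                      # drop rows with non-positive total cost
df = df[df["Total_amount"] != 401095.62]             # drop the single abnormally high fare
start_ok = (df["lpep_pickup_datetime"] >= "2022-01-01") & (df["lpep_pickup_datetime"] < "2022-01-15")
df = df[start_ok]                                    # keep orders that started 1-14 January 2022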

4.3. Building Local Models

The data used in this paper are the yellow taxi records of 14 days in New York City. Each day’s data constitute one local model, equivalent to one “client” in the federated computation, and the boundary between clients is determined by the order start time. The data for the whole day of 1 January were extracted, and the drivers’ income is shown in Figure 5.
Unlike in China, passengers in New York City pay various fees when taking taxis, such as tips, tolls, taxes, additional surcharges, congestion fees, and improvement fees. Because tips are highly uncertain, they are difficult to model; the other fees are small, fixed amounts that rarely affect the overall income level. Therefore, when establishing the local model, the order time, the number of passengers, the trip distance, and the actual taxi cost are fitted by multiple linear regression. The parameter information of the local models is shown in Table 2.
Among them, the intercept value in Table 2 is $b_0$, and the other values are the local model parameter weights $w_k$. The gradient descent method is adopted for model training because the gradient is
$$g_{w_k} = \left(f_{w,b}(x) - y\right)\frac{\partial f_{w,b}(x)}{\partial w_k} = \left(f_{w,b}(x) - y\right)x_k$$
Therefore, when the value of $f_{w,b}(x) - y$ approaches 0, the model reaches its optimal state. For the above local model parameters, the loss function $L(f_{w,b}(x), y)$ of each model is calculated by substituting the specific parameters into the loss function formula; the loss value of each model is shown in Table 3.

4.4. Local Parameter Upload

To overcome the differences between the local models and better protect the parameter information from being leaked, the server completes the training with stable participants and defines a dedicated weight for each participant so that every participant can take part in the federated aggregation on an equal footing. According to the weight calculation formula, the weight value of the first iteration of each client is obtained, as shown in Table 4.
According to the calculation formula of global model parameters:
$$[\![w_{g,j}^t]\!] = \prod_{i=1}^{S}\mathrm{Enc}\left(\psi_i w_{i,j}^t\right) = \mathrm{Enc}\left(\psi_1 w_{1,j}^t\right)\cdot\mathrm{Enc}\left(\psi_2 w_{2,j}^t\right)\cdots\mathrm{Enc}\left(\psi_S w_{S,j}^t\right) = \mathrm{Enc}\left(\sum_{i=1}^{S}\psi_i w_{i,j}^t\right),\qquad w_{g,j}^t \leftarrow \psi_1 w_{1,j}^t + \psi_2 w_{2,j}^t + \cdots + \psi_S w_{S,j}^t = \sum_{i=1}^{S}\psi_i w_{i,j}^t$$
the plaintext weighted parameters participating in the global model are obtained. After adjustment with $\delta$, the results are shown in Table 5.
For plaintexts $m_1$ and $m_2$, additive homomorphic encryption gives
$$E(m_1) = g^{m_1} r_1^{N} \bmod N^2$$
$$E(m_2) = g^{m_2} r_2^{N} \bmod N^2$$
$$E(m_1)\cdot E(m_2) = g^{m_1 + m_2}\left(r_1 r_2\right)^{N} \bmod N^2$$
$$E(m_1 + m_2) = g^{m_1 + m_2} r_1^{N} \bmod N^2$$
$$D\left(E(m_1)\cdot E(m_2)\right) = m_1 + m_2 = D\left(E(m_1 + m_2)\right)$$
The above is the additive homomorphic property. The participants’ weights and local model parameters in this paper are uploaded to the aggregation server, so according to the above property:
$$[\![w_i^k]\!] = E\left(\psi_i w_{i,j}^k,\, (g, N)\right) = g^{\psi_i w_i^t} r_i^{N} \pmod{N^2}$$
$$[\![w_i^k]\!]^{\lambda} = g^{\psi_i w_i^t \lambda}\, r_i^{\lambda N} \pmod{N^2}$$
where $\lambda$ is the private key. From the definition of the Paillier key, the plaintext is obtained:
$$w_i^k = \frac{L\left([\![w_i^k]\!]^{\lambda}\right)}{L\left(g^{\lambda}\right)} = \frac{N\alpha\,\psi_i w_i^t\,\lambda}{N\alpha\,\lambda} = \psi_i w_i^t$$
Because the public and private keys change in each iteration, the encrypted ciphertext values also change each time. In addition, the ciphertexts are very long, so only the ciphertexts of two parameter values are listed here, as shown in Table 6.
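As a small sanity check of this property, the toy Paillier sketch from Section 3.2.2 can be used to verify that the product of two ciphertexts decrypts to the sum of the plaintexts; the values below are illustrative only.

c1 = encrypt(7, pub)                      # illustrative plaintexts
c2 = encrypt(35, pub)
product = (c1 * c2) % (pub[0] ** 2)       # multiply ciphertexts modulo N^2
assert decrypt(product, pub, priv) == 42  # D(E(m1) * E(m2)) = m1 + m2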

4.5. Update Model Parameters

After the local model parameters and client weights are uploaded to the aggregation server, the aggregation server performs normal aggregation operations according to the protocol requirements and completes the aggregation calculation of local model parameters in the ciphertext state.
During each iteration, the algorithm selects a fraction $\rho$ of the participants. When $\rho = 1$, the gradients of all data held by all participants are used as a full batch. In the $t$-th iteration of the global model weight update, each selected participant $k$ calculates $g_k = \nabla F_k(w_t)$, that is, the average gradient of its local data under the model with current parameters $w_t$; the server then aggregates the gradients and calculates the loss function. The aggregation server completes the aggregate averaging operation and, according to the trusted key pair from the trusted third party, broadcasts the averaged parameters to each participant for updating each local model:
$$w_{t+1} \leftarrow w_t - \eta\sum_{k=1}^{K}\frac{n_k}{n}\, g_k$$
Among them, η represents the learning rate.
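A minimal sketch of this server-side update, assuming the per-client gradients g_k and data sizes n_k have already been collected; the function name is illustrative.

import numpy as np

def global_update(w_t, grads, sizes, eta):
    # w_{t+1} = w_t - eta * sum_k (n_k / n) * g_k
    n = sum(sizes)
    weighted = sum((n_k / n) * np.asarray(g_k) for g_k, n_k in zip(grads, sizes))
    return w_t - eta * weighted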

4.6. Iteration Results and Analysis

Firstly, the plaintext data participate in the calculation, and the plaintext data are then encrypted with Paillier homomorphic encryption. After encryption, the decrypted results of the calculation on the ciphertext correspond 100% to the plaintext results, which demonstrates the availability and accuracy of Paillier and is in line with the original idea of privacy computing. The aggregated and averaged ciphertexts are (in part)
21595496292932768166433183401932295686888324907433148045378518021694888033094776148514 **** 224136040722433222096253194275188298103856990576077698.
314223982137955375407799678571118500191580455433313129110711964229401836733344558 **** 6051796337913784636444060936490979695777088055406290942001076.
691412498812284842471944703714880581308999024752208726862352964760298736105 *** 48251591476989509945605198810645098425292042433130466743904791.
The plaintext data after decryption of the above ciphertext are 0.00020918338242142856, 0.11335807857142856, 2.330568062142857.
Next, in the whole iterative calculation process, 10 and 1000 iterations are selected for study, and their convergence, accuracy, and loss are discussed. Because ciphertext computation is expensive and the cost of data communication is high, the number of iterations is first set to 10. The iteration is shown in Figure 6.
The stars in Figure 6 represent the accuracy values of each iteration of the model. Convergence is clearly difficult to achieve within 10 iterations, but the accuracy of the global model is rising; by the 10th iteration, the accuracy has reached 72.31%. To see the convergence of the whole model intuitively, the number of iterations is changed to 1000; the result is shown in Figure 7.
In Figure 7, the stars represent the accuracy values of each iteration of the model; the model accuracy converges and reaches a very high level. Over the iterations, the accuracy fluctuates within the range [0.93, 0.97], with a maximum value of 96.24%, and the accuracy after the 1000th iteration is 94.27%. At the beginning of the iterations, the accuracy curve again shows an upward trend, which corresponds to Figure 6. To study the convergence rate of the model, the first 100 iterations of Figure 7 are shown in Figure 8.
As can be seen in Figure 8, where the stars again represent the accuracy values of each iteration, the accuracy of the model generally rises during the first 100 iterations, reflecting an increasingly close fit between the experimental values and the real values. The model calculation is judged to have converged after approximately 80 iterations.
Finally, calculate the loss function value, which represents the numerical loss of experimental data after iterative experiments. In the process of 1000 iterations, each iteration produces a loss value, and the distribution of all loss values is shown in Figure 9.
As can be seen from Figure 9, where the stars represent the loss value of each iteration, most of the loss values are distributed between 0.5 and 1.5, which is lower than that of the local models. To illustrate this advantage more intuitively, the average of the 1000 loss values is taken as the performance index after the model calculation. The traditional security federation model is calculated in the same way, that is, each local model is not assigned a weight and the local model parameters are directly encrypted to participate in the aggregate averaging. Since Figure 9 shows the good stability of the model, the average value of the loss function represents the loss level. The final average loss of the traditional model is 1.9910, and the average loss of the improved model is 1.0886, both lower than the loss values of the local model parameters. See Figure 10 for details.
In Figure 10, the hollow dots represent the average loss of 1.9910 calculated by the traditional security federation model, the hollow boxes represent the average loss of 1.0886 calculated by the improved security federation model, and the blue solid line represents the actual loss value of each local model. Numerically, the weighted, improved security federation model is lower than the traditional model and lower than every local model. By contrast, the loss of the traditional federated model fluctuates above and below the local model losses, and its average is close to theirs.
The experiments prove that, by adding and setting local model weights, the model under double encryption is closer to the optimal level of federated computing than traditional horizontal federated learning, is more helpful for predicting and evaluating the income of yellow taxi drivers in New York City, and better reflects the robustness and adaptability of the improved model.

5. Conclusions

This paper improved the security federation model within the horizontal federated learning framework: a weight value is added to the local model of each participant, the local model parameters are encrypted, and they are uploaded to the aggregation server for the subsequent calculation. Setting the weight value provides a “secondary protection” for the data and model parameters of each client and also eliminates the differences between local models. The improved secure federated averaging algorithm was presented in five parts: identification and authentication, model initialization, local model training, local parameter upload, and model parameter update. The algorithm was applied to the income of yellow taxi drivers in New York City: a global model was built for the 14-day income of the city’s taxi drivers, a local model was built for each day’s income, the aggregate average was computed on the aggregation server, and the gradient descent iterative method was adopted. The accuracy of the model reached its highest level and the loss reached its lowest value, so the model can better reflect the income of yellow taxi drivers in New York City. However, the research in this paper is not comprehensive enough. For example, when taking a taxi in the United States, passengers pay not only the fare but also tips, taxes, extra fees, and so on; modeling these is too uncertain to study the actual amount paid by passengers, so the research takes the driver’s income as its object. In addition, the original data contain many invalid and relatively discrete records, which may introduce some deviation into the research results; this will be further improved in the future.

Author Contributions

Conceptualization, X.Y. and J.X.; methodology, X.Y.; software, X.Y. and J.X.; validation, X.Y.; formal analysis, X.Y.; investigation, X.Y.; resources, Y.L.; data curation, X.Y.; writing—original draft preparation, X.Y.; writing—review and editing, Y.L.; visualization, X.Y. and T.H.; supervision, Y.L.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 61972334).

Data Availability Statement

The data were selected from a public data set on the Kaggle website.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (No. 61972334).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Luan, Z.X. Research on the security of cloud computing data in the Internet era. Comput. Era 2021, 2021, 35–37+41. [Google Scholar]
  2. Yang, J.G. The Generative Mechanism and Management of Ethical Dilemmas of Privacy Protection in the Era of Big Data. Jiangsu Soc. Sci. 2021, 2021, 142–150+243. [Google Scholar]
  3. Xiong, C.L.; Tong, Y.Q. Boundary and balance: Data governance path and thinking in personal privacy information protection. Sci. Technol. Commun. 2021, 13, 64–68. [Google Scholar]
  4. Yang, Q.; Liu, Y.; Chen, T.J.; Tong, Y.X. Federated Machine Learning. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–19. [Google Scholar] [CrossRef]
  5. Devon, D.; Holzer, T.; Sarkani, S. Innovation-Based Fusion of Multiple Satellite Positioning Systems for Minimizing Uncertainty. IEEE Syst. J. 2019, 13, 928–939. [Google Scholar] [CrossRef]
  6. Sainz De Baranda, P.; Cejudo, A.; Jesus Moreno-Alcaraz, V.; Teresa Martinez-Romero, M.; Aparicio-Sarmiento, A.; Santonja-Medina, F. Sagittal spinal morphotype assessment in 8 to 15 years old Inline Hockey players. PeerJ 2020, 8, E8229. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Baghban, H.; Huang, C.; Hsu, C. Resource provisioning towards OPEX optimization in horizontal edge federation. Comput. Commun. 2020, 158, 39–50. [Google Scholar] [CrossRef]
  8. Chen, S.Q.; Xue, D.Y.; Chuai, G.H.; Yang, Q.; Liu, Q. FL-QSAR: A federated learning-based QSAR prototype for collaborative drug discovery. Bioinformatics 2020, 36, 5492–5498. [Google Scholar] [CrossRef] [PubMed]
  9. Liu, Y.J.; Feng, G.; Sun, Y.; Qin, S.; Liang, Y.C. Device Association for RAN Slicing Based on Hybrid Federated Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2020, 69, 15731–15745. [Google Scholar] [CrossRef]
  10. Topaloglu, M.Y.; Morrell, E.M.; Rajendran, S.; Topaloglu, U. In the Pursuit of Privacy: The Promises and Predicaments of Federated Learning in Healthcare. Front. Artif. Intell. 2021, 4, 746497. [Google Scholar] [CrossRef]
  11. Zhu, H.Y.; Zhang, H.Y.; Jin, Y.C. From federated learning to federated neural architecture search: A survey. Complex Intell. Syst. 2021, 7, 639–657. [Google Scholar] [CrossRef]
  12. Rajendran, S. Cloud-Based Federated Learning Implementation Across Medical Centers. JCO Clin. Cancer Inform. 2021, 5, 1–11. [Google Scholar] [CrossRef] [PubMed]
  13. Subramanya, T.; Riggio, R. Centralized and Federated Learning for Predictive VNF Autoscaling in Multi-Domain 5G Networks and Beyond. IEEE Trans. Netw. Serv. Manag. 2021, 18, 63–78. [Google Scholar] [CrossRef]
  14. Zhu, X.; Yang, G. PCA differential privacy data publishing algorithm in horizontal federated learning. Appl. Res. Comput. 2022, 39, 236–239+248. [Google Scholar]
  15. Lau, H.; Henry, T.; Yung, P.N.; Dilupa, L.; Caeman, K.M. Risk quantification in cold chain management: A federated learning-enabled multi-criteria decision-making methodology. Ind. Manag. Data Syst. 2021, 121, 1684–1703. [Google Scholar] [CrossRef]
  16. Liu, Y.H.; Qu, Y.Y.; Xu, C.H.; Hao, Z.C.; Gu, B. Blockchain-Enabled Asynchronous Federated Learning in Edge Computing. Sensors 2021, 21, 3335. [Google Scholar] [CrossRef] [PubMed]
  17. Pan, Y.H.; Zhou, P.; Agrawal, A.; Wang, Y.H. New insights into the methods for predicting ground surface roughness in the age of digitalisation. Precis. Eng. 2021, 67, 393–418. [Google Scholar] [CrossRef]
  18. Zhu, P.K. Research on Federated Learning Model and Algorithm based on Mobile Edge Computing. Master’s Thesis, Nanjing University of Posts and Telecommunications, Nanjing, China, 2021. [Google Scholar]
  19. Lu, Y.L. Research on data Privacy Protection and Secure Data Sharing Methods. Ph.D. Dissertation, Beijing University of Posts and Telecommunications, Beijing, China, 2020. [Google Scholar]
  20. Alloghani, M.; Alani, M.M.; Al-Jumeily, D. A systematic review on the status and progress of homomorphic encryption technologies. J. Inform. Secur. Appl. 2019, 48, 2019. [Google Scholar] [CrossRef]
  21. Zhou, Q.; Lu, S.; Cui, Y. Quantum Search on Encrypted Data Based on Quantum Homomorphic Encryption. Sci. Rep. 2020, 10, 5135. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Transverse federated learning feature space and sample space.
Figure 2. Horizontal federation learning client–server architecture.
Figure 3. Horizontal federation learning client–server architecture.
Figure 4. Revenue of yellow taxis in New York City in January.
Figure 5. Driver income in New York City on 1 January.
Figure 6. Accuracy variation diagram of 10 iterations.
Figure 7. Accuracy variation diagram of 1000 iterations.
Figure 8. Accuracy variation diagram of the first 100 iterations.
Figure 9. Loss value of 1000 iterations.
Figure 10. Comparison between the average loss of the improved model and the loss of the traditional model.
Table 1. Order Data and Track Data Information.
Order Data Label | Description | Example
lpep_pickup_datetime | The date and time when the meter was engaged. | 1 January 2022 0:00:08
lpep_dropoff_datetime | The date and time when the meter was disengaged. | 1 January 2022 0:14:14
Passenger_count | The number of passengers in the vehicle. | 1
Trip_distance | The elapsed trip distance in miles reported by the taximeter. | 7.94
Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before sending it to the vendor. | N
PULocationID | TLC Taxi Zone in which the taximeter was engaged. | 138
DOLocationID | TLC Taxi Zone in which the taximeter was disengaged. | 255
RateCodeID | The final rate code in effect at the end of the trip. | 1
Payment_type | A numeric code signifying how the passenger paid for the trip. | 1
Fare_amount | The time-and-distance fare calculated by the meter. | 23
Extra | Miscellaneous extras and surcharges. | 0.5
MTA_tax | USD 0.50 MTA tax that is automatically triggered based on the metered rate in use. | 0.5
Improvement_surcharge | USD 0.30 improvement surcharge assessed on hailed trips at the flag drop. | 0.3
Tip_amount | This field is automatically populated for credit card tips. Cash tips are not included. | 4.86
Tolls_amount | Total amount of all tolls paid in the trip. | 0
Total_amount | The total amount charged to passengers (does not include cash tips). | 30.41
congestion_surcharge | Automatically allocated according to the measurement rate in use. | 2.5
Table 2. Parameter table of local training model.
i | Intercept Value | Intercept Std. Error | Time Value | Time Std. Error | Passenger_Count Value | Passenger_Count Std. Error | Trip_Distance Value | Trip_Distance Std. Error | Adjusted R Square
1 | 4.4963 | 0.0491 | 7.90296 × 10−5 | 6.58141 × 10−6 | 0.10346 | 0.02346 | 2.48404 | 0.00528 | 0.78987
2 | 4.99378 | 0.06705 | 6.31535 × 10−5 | 9.45205 × 10−6 | 0.19667 | 0.03366 | 2.45348 | 0.00633 | 0.72942
3 | 6.84776 | 0.06752 | 3.54939 × 10−4 | 1.34833 × 10−5 | 0.23423 | 0.03719 | 1.81088 | 0.00609 | 0.57277
4 | 4.59964 | 0.04311 | 9.01216 × 10−5 | 7.48631 × 10−6 | 0.09247 | 0.02371 | 2.51129 | 0.00495 | 0.78597
5 | 6.72315 | 0.06082 | 5.35508 × 10−4 | 1.53895 × 10−5 | 0.19601 | 0.034 | 1.71246 | 0.00624 | 0.53545
6 | 7.22593 | 0.05737 | 4.27444 × 10−4 | 1.23054 × 10−5 | 0.23779 | 0.03221 | 1.49912 | 0.00592 | 0.47019
7 | 6.06 | 0.05529 | 2.35914 × 10−4 | 1.08658 × 10−5 | 0.16382 | 0.03021 | 1.85406 | 0.0064 | 0.5601
8 | 4.49096 | 0.03552 | 6.85826 × 10−5 | 6.55283 × 10−6 | 0.03827 | 0.01807 | 2.52072 | 0.00454 | 0.79646
9 | 4.42007 | 0.05028 | 6.21412 × 10−5 | 9.73679 × 10−6 | 0.03261 | 0.02572 | 2.52484 | 0.00563 | 0.76983
10 | 5.19199 | 0.04877 | 2.51099 × 10−4 | 1.22026 × 10−5 | 0.08173 | 0.02748 | 2.31687 | 0.00609 | 0.68499
11 | 4.66912 | 0.03696 | 1.96368 × 10−4 | 8.35207 × 10−6 | 0.04615 | 0.02082 | 2.48739 | 0.00515 | 0.76779
12 | 5.04285 | 0.04058 | 1.85677 × 10−4 | 9.20457 × 10−6 | 0.03056 | 0.02266 | 2.40991 | 0.00571 | 0.70677
13 | 5.0685 | 0.03517 | 1.50399 × 10−4 | 7.11874 × 10−6 | 0.04386 | 0.01937 | 2.42489 | 0.00491 | 0.75656
14 | 4.87388 | 0.03509 | 1.51473 × 10−4 | 6.93594 × 10−6 | 0.02742 | 0.01925 | 2.51615 | 0.00463 | 0.79363
Table 3. Loss values for each local model.
Client | 1 | 2 | 3 | 4 | 5
Loss | 1.983 | 1.769 | 3.673 | 1.977 | 2.790
Client | 6 | 7 | 8 | 9 | 10
Loss | 2.611 | 3.168 | 1.428 | 2.013 | 2.014
Client | 11 | 12 | 13 | 14
Loss | 1.342 | 1.849 | 1.491 | 1.386
Table 4. The weight value of the first iteration of each client.
Client | 1 | 2 | 3 | 4 | 5 | 6 | 7
Weight | 1.107 | 1.097 | 0.987 | 0.990 | 1.018 | 0.999 | 1.148
Client | 8 | 9 | 10 | 11 | 12 | 13 | 14
Weight | 0.957 | 1.135 | 1.066 | 1.073 | 1.022 | 0.947 | 0.941
Table 5. Global model plaintext weight parameters.
i | ψ_i | w^t_{i,1} | w^t_{i,2} | w^t_{i,3}
1 | 0.595 | 0.047 | 0.062 | 1.477
2 | 0.862 | 0.054 | 0.169 | 2.114
3 | 1.367 | 0.485 | 0.320 | 2.475
4 | 0.756 | 0.068 | 0.070 | 1.899
5 | 1.513 | 0.810 | 0.296 | 2.590
6 | 1.232 | 0.527 | 0.293 | 1.847
7 | 0.947 | 0.223 | 0.155 | 1.755
8 | 0.685 | 0.047 | 0.026 | 1.726
9 | 0.858 | 0.053 | 0.028 | 2.166
10 | 1.145 | 0.288 | 0.094 | 2.653
11 | 0.779 | 0.153 | 0.036 | 1.936
12 | 0.901 | 0.167 | 0.028 | 2.171
13 | 0.752 | 0.000 | 0.033 | 1.823
14 | 0.737 | 0.000 | 0.020 | 1.855
Table 6. Ciphertext of the parameter value.
Ciphertext
Plaintext: 7.90296 × 10−5, Local model parameters
467955223102456212020705609866270348510893239149668473328424631046163690822437508711878985196197980450966500960003705334974280170339655750810219089129396085992598258672579610836765861787210473675224154209738857222440023315300692968047087231394512980730723977907690271899363662246165478289615153218232574561057949086114385110846695015350995138765746145605015279777950573110994544008683384022603114608651756658952592661266736411408161426646791173899631148278778420886130542466141732139434402701109610305834327986770394754422293491129830004959597471528240922862992766963399797494953223572922119780921343929602123516226546764815183789251169773407702535753226691046130282688248672437264190969732509417383856881643956027630275658489463231296913649226445921957970809319310748381576204501832811777209686852160599382374030694637282942599348941011907550986218863974727800258866828106664495631023416369971144863957435198236651427580289916915911244629558444858595888149638159997776130063235855390386100640660678856046849103703733840941266568709071793346814024807420936630963924283997628396208879073040125624428043948679461395648474879664719912865166134464521421720547985762205981822665782223794666807841734568143717078244584656216644190890183947
Plaintext: 1.107, Client weight
542668593891279026805624626146758625291153100936160670826053091065537229106182546052530540548774787072469124285407047736078487187066518276308290145578375312427125064570317400308530975919384618178036639860621593561533653322880977773419073086270457962531239215185735242020884636600391253929303829227690160632055355635447880958627580397656941016377197145408585526438953020193468957510929216689654165366329926337133784266205374661291090120351504336537932240149806720969730646953841277831552412094932985865479498608414619775181793358986748834334437846079939485846482496495629944011541746380639671509684239095754321959820318102974126215154739058291879812942742334500648651994495566709143482896700506958257481598793473675550399220882623673283793892416770530776917018796048658611303595729306564189333242635729140252800198129123397698494506725979944622028635443835992201975763460443431013747412861376819344647015089996916136618367623998327507099656125843558450496905169872123196108506956250733148418802328952590503257118421329674172634168153904874295407870248711968545420876320203448027987466871897975189509054334654474673284693407101626929792802446523404210423849205563988241679250362814657747157532701181352125841949375211847222624206208994
p: 142562846575319296 ** 7506178382976203844105070937
q: 14439136290904674701 ** 44497097022268651346246713
g: 20584843717203680702 *** 1969859265641225568080082
Where p, q, and g are key information.
