Article

Differentially Private Clustered Federated Load Prediction Based on the Louvain Algorithm

Tingzhe Pan, Jue Hou, Xin Jin, Chao Li, Xinlei Cai and Xiaodong Zhou
1 CSG Science Research Institute Co., Ltd., Guangzhou 510640, China
2 Power Dispatching Control Center of Guangdong Power Grid Co., Ltd., Guangzhou 510060, China
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(1), 32; https://doi.org/10.3390/a18010032
Submission received: 8 October 2024 / Revised: 25 December 2024 / Accepted: 25 December 2024 / Published: 8 January 2025
(This article belongs to the Special Issue Intelligent Algorithms for High-Penetration New Energy)

Abstract

Load forecasting plays a fundamental role in the new type of power system. To address the data heterogeneity and security issues encountered in load forecasting for smart grids, this paper proposes a load-forecasting framework suitable for residential energy users, which allows users to train personalized forecasting models without sharing load data. First, the similarity of user load patterns is calculated under privacy protection. Second, a complex network is constructed, and a federated user clustering method is developed based on the Louvain algorithm, which divides users into multiple clusters based on load pattern similarity. Finally, a personalized and adaptive differentially private federated learning Long Short-Term Memory (LSTM) model for load forecasting is developed. A case study analysis shows that the proposed method can effectively protect user privacy and improve model prediction accuracy when dealing with heterogeneous data. The framework can train load-forecasting models with a fast convergence rate and better prediction performance than current mainstream federated learning algorithms.

1. Introduction

The new power system is a complex intelligent network that primarily integrates new energy, along with components of information, physics, society, and big data [1,2]. The continuous introduction of technologies, such as demand response, electricity retailers, load aggregators, digital twins, and virtual power plants, has led to increasingly complex and variable load characteristics [3]. Load forecasting is foundational in this new power system, essential for the planning, operation, and scheduling of future smart grids.
Limited local data hinder effective load forecasting, making collaborative modeling essential. Recently, many load-forecasting schemes based on machine learning have been proposed. Reference [4] presented a short-term load-forecasting model based on linear regression. Reference [5] combined decision trees with convolutional neural networks to predict user power consumption, and the results showed that the prediction error was significantly reduced. In [6], a series of multiobjective predictive models for wind power were created utilizing a range of cutting-edge machine learning (ML) methodologies. For renewable energy communities, a deep learning-driven self-conscious distributed cyber-physical system was proposed in [7]. However, retail providers often refuse to share data due to privacy concerns. Federated learning (FL) [8,9] offers a distributed machine learning framework that keeps data local, presenting a novel approach to load forecasting while ensuring privacy [10,11]. Reference [12] compared FedAvg, single-meter prediction, multi-meter centralized prediction, and federated stochastic gradient descent (FedSGD) methods and verified the superiority of FedAvg in load-prediction performance. Studies [13] have indicated that FedAvg can achieve smaller household load-prediction errors while protecting user privacy. In FL, users train load data locally and only upload the trained model parameters to the server. Although the data remain local, curious servers or attackers may still infer private information from a shared model or gradient parameters [14], highlighting the need for stronger privacy mechanisms. Differential privacy (DP) [15] is a widely used privacy protection technique. Ref. [16] proposed a differentially private stochastic gradient descent algorithm to train neural networks within a moderate privacy budget, while Ref. [17] introduced a differentially private federated learning algorithm and analyzed its performance. Implementing DP requires gradient clipping, but the fixed-threshold clipping method has significant limitations [18,19]. If the clipping threshold is too small, clipped gradients may become overly distorted, losing valuable information from local updates. Conversely, a threshold set too large can introduce excessive random noise, negatively affecting the algorithm's performance. Therefore, both excessively high and excessively low clipping thresholds can reduce model accuracy and practicality. In addition to data privacy issues, other network security issues, such as data poisoning and evasion attacks, are also receiving increasing attention (see [20,21] for detailed discussions).
Moreover, data distribution inconsistency among clients, known as non-IID (Non-Independently and Identically Distributed) data, can introduce significant bias during model training, resulting in client drift and slower convergence [22]. This inconsistency poses challenges for achieving optimal model performance with a single global model. To tackle this issue, Ref. [23] introduced the clustered federated learning (CFL) framework, which uses clustering algorithms to group clients with similar data distributions, allowing for the training of clustered personalized models to improve performance and specialization. Ref. [24] proposed an adaptive clustering federated learning algorithm that accelerates similarity assessment. Clients are assigned to different community clusters based on their data distribution similarities, enabling those within the same cluster to share a federated model. This method effectively reduces the negative impact of non-IID data on federated learning models. However, existing clustered federated learning algorithms have limitations. Classic clustering methods like K-Means [25], the iterative federated clustering algorithm (IFCA) [26], and CFL often require significant computational resources, which can hinder their practicality.
To address the challenges of high computational complexity, performance limitations, and privacy concerns in traditional clustered federated learning, this paper proposes a load-forecasting framework specifically designed for residential energy users. It introduces an innovative federated clustering method and explores a personalized adaptive differentially private clustered federated learning algorithm. The framework selects weather and time factors as key load-related variables, constructs a user dataset, and establishes a load-forecasting model. Case studies demonstrate that the proposed algorithm surpasses existing mainstream federated algorithms in predictive performance. The key contributions of this paper are as follows:
(1)
A federated learning load-forecasting framework tailored for residential energy users with heterogeneous data is proposed.
(2)
A method for calculating the similarity of load patterns while ensuring privacy is introduced. This involves constructing a complex network and developing a novel federated clustering method based on the Louvain algorithm, which offers lower computational costs and does not require pre-specifying the number of clusters (K).
(3)
An adaptive gradient-clipping differentially private clustered federated averaging algorithm, called pADP-FedAvg, is proposed, along with a differential privacy analysis.
(4)
Weather and time factors are identified as key load-related variables, a user dataset is constructed, and a federated learning load-forecasting LSTM model is established. The effectiveness and advantages of the proposed methods are validated through case studies.

2. Related Technology

2.1. Federated Learning Model

The standard workflow for federated learning is as follows: A central server initializes a global model and distributes it to each client. Clients then perform multiple rounds of training on the model using their local data, generating locally updated models that are subsequently transmitted back to the central server. The server aggregates the local models it receives to form an updated global model, which is then broadcast to all clients. By repeating this process, the global model is progressively refined. Classical federated learning utilizes a star topology, as depicted in Figure 1, comprising a central server and multiple clients. In this setup, $\theta_i^k$ denotes the model parameters of the i-th client during the k-th communication round, while $\theta^{k+1}$ represents the aggregated global model parameters and the client model parameters during the $(k+1)$-th communication round.
The local loss function of the local client i is
$$\min_\theta J_i(\theta) = \frac{1}{|D_i|}\sum_{j=1}^{|D_i|} F(\theta, \xi_j) \qquad (1)$$
where $\theta$ represents the model parameters, $\xi_j \in D_i$ is a data sample, $D_i$ is the client's dataset, and F is the loss function. During the model training process, it is necessary to minimize the global loss function $J(\theta)$, as defined in Equation (2):
$$\min_\theta J(\theta) = \sum_{i=1}^{|S|} \frac{|D_i|}{\sum_{j=1}^{|S|}|D_j|} J_i(\theta) \qquad (2)$$
where S represents the set of clients selected to participate in the training. The client updates during the training process are provided by Equation (3):
$$\Delta\theta_i^{k+1} = \mathrm{SGD}(\theta^k, D_i, \eta_l) - \theta^k \qquad (3)$$
where $\eta_l$ denotes the learning rate for the local model and SGD refers to the stochastic gradient descent method. Equation (4) illustrates a single update applied by the server to the global model:
$$\theta^{k+1} = \theta^k + \sum_{i=1}^{|S|} \frac{|D_i|}{\sum_{j=1}^{|S|}|D_j|} \Delta\theta_i^{k+1} \qquad (4)$$
Without loss of generality, we assume that the number of clients is n. The local dataset of client $i \in [n]$, denoted as $D_i = \{\xi_1^i, \ldots, \xi_q^i\}$, consists of q data points.
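For illustration, the following Python sketch implements one FedAvg communication round with the $|D_i|$-weighted aggregation of Equations (3) and (4). It is a minimal example under simplifying assumptions: the local model is a linear least-squares regressor, and the names (local_sgd, fedavg_round, clients) are illustrative rather than part of the proposed framework.

```python
import numpy as np

def local_sgd(theta, X, y, lr, steps):
    """Local SGD on a linear least-squares loss (a stand-in for F in Equation (1))."""
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)        # gradient of the mean squared loss
        theta = theta - lr * grad
    return theta

def fedavg_round(theta_global, clients, lr=0.05, steps=10):
    """One FedAvg communication round with |D_i|-weighted aggregation (Equations (3)-(4))."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    weights = sizes / sizes.sum()                    # |D_i| / sum_j |D_j|
    delta = np.zeros_like(theta_global)
    for w, (X, y) in zip(weights, clients):
        theta_i = local_sgd(theta_global.copy(), X, y, lr, steps)
        delta += w * (theta_i - theta_global)        # weighted local update
    return theta_global + delta                      # server update, Equation (4)

# toy usage: three clients holding different amounts of data
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in (50, 80, 120)]
theta = np.zeros(3)
for _ in range(20):
    theta = fedavg_round(theta, clients)
```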

2.2. Differential Privacy

Differential privacy protects data by adding noise that follows a certain distribution, thus providing privacy guarantees. It rigorously defines the strength of privacy protection. If two datasets D and D′ differ by only one record, they are referred to as neighboring datasets. Differential privacy ensures that the results of the same query operation on any two neighboring datasets are nearly indistinguishable. Client-level differential privacy FL integrates privacy protection mechanisms into FL, ensuring that the learning model does not reveal whether a specific client participated in the training. This means that the entire dataset of the client is protected. The formal definition of differential privacy is shown below.
Definition 1 
(Differential Privacy [15]). Let M be a randomized mechanism, where $Dom(M)$ denotes its domain and $Range(M)$ its range. If, for any two neighboring datasets $D, D' \in Dom(M)$ and any subset $O \subseteq Range(M)$, the following holds:
$$\Pr[M(D) \in O] \le e^{\epsilon}\Pr[M(D') \in O] + \delta, \qquad (5)$$
then M is said to satisfy $(\epsilon, \delta)$-DP, where the parameter $\epsilon$ represents the privacy budget.
Definition 2 
($l_2$ Sensitivity [15]). For a query function $f: D \to \mathbb{R}^d$ on a given dataset, where D and D′ are two neighboring datasets, the $l_2$ sensitivity of f is defined as
$$\Delta(f) = \max_{D, D'} \|f(D) - f(D')\|_2. \qquad (6)$$
Zero-Concentrated Differential Privacy (zCDP) is a new relaxed form of differential privacy. Compared to the standard $(\epsilon, \delta)$-DP, it offers a clearer and more precise analysis of the privacy loss over multiple iterative computations. The definition of zCDP is shown below.
Definition 3
(zCDP [27]). For any two neighboring datasets D and D′, and for any $\alpha > 1$, a randomized mechanism M satisfies ρ-zCDP if and only if
$$D_\alpha\big(M(D) \,\|\, M(D')\big) = \frac{1}{\alpha - 1}\log \mathbb{E}\big[e^{(\alpha - 1)L^{(O)}}\big] \le \rho \qquad (7)$$
where $D_\alpha\big(M(D) \,\|\, M(D')\big)$ denotes the α-Rényi divergence between $M(D)$ and $M(D')$, and $L^{(O)}$ represents the privacy loss incurred by M between the neighboring datasets D and D′ when the outcome is O, that is,
$$L_{M(D) \,\|\, M(D')}(O) = \log\frac{\Pr(M(D) = O)}{\Pr(M(D') = O)} \qquad (8)$$
The following propositions hold for zCDP:
Lemma 1
([27]). The Gaussian mechanism, which returns $f(D) + N(0, \Delta^2(f)\sigma^2 I)$, satisfies $\frac{1}{2\sigma^2}$-zCDP.
Lemma 2
([27]). Suppose there are k mechanisms $M_1, \ldots, M_k$, and each $M_i$ (for $i = 1, \ldots, k$) satisfies $\rho_i$-zCDP. Then, the composition of these mechanisms satisfies $\sum_{i=1}^{k}\rho_i$-zCDP.
Lemma 3
([28]). Let M be composed of adaptive randomized mechanisms $M_1, \ldots, M_E$, where each $M_i$ (for $i = 1, \ldots, E$) satisfies $\rho_i$-zCDP. When the dataset D is randomly partitioned into $D_1, \ldots, D_E$, then $M(D) = (M_1(D_1), M_2(D_2), \ldots, M_E(D_E))$ satisfies $\max_i \rho_i$-zCDP.
Lemma 4
([27]). If M satisfies ρ-zCDP, then for any $\delta > 0$, M satisfies $\big(\rho + 2\sqrt{\rho\log(1/\delta)},\ \delta\big)$-DP.
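As a numerical illustration of Lemmas 1 and 4, the following sketch adds Gaussian noise calibrated to the $l_2$ sensitivity of a query and converts the resulting ρ-zCDP guarantee into an $(\epsilon, \delta)$-DP guarantee. The function names and the parameter values in the example call are illustrative assumptions, not values used in the paper.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, sigma, rng=None):
    """Release value + N(0, (Delta_2(f) * sigma)^2 I); satisfies 1/(2 sigma^2)-zCDP (Lemma 1)."""
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.normal(0.0, l2_sensitivity * sigma, size=np.shape(value))
    return np.asarray(value, dtype=float) + noise

def zcdp_to_dp(rho, delta):
    """Convert a rho-zCDP guarantee into an (epsilon, delta)-DP guarantee (Lemma 4)."""
    return rho + 2.0 * np.sqrt(rho * np.log(1.0 / delta))

# illustrative example: a query with unit l2 sensitivity and noise multiplier sigma = 4
noisy = gaussian_mechanism(42.0, l2_sensitivity=1.0, sigma=4.0)
rho = 1.0 / (2 * 4.0 ** 2)
print(noisy, zcdp_to_dp(rho, delta=1e-5))
```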

3. Problem Description

This paper addresses the collaborative challenges and privacy protection issues faced by multiple data collectors during the process of power load forecasting. In the power grid, each user independently holds their own power load data, and due to privacy concerns, it is difficult to directly share their original data. Federated learning is a reasonable choice to solve this problem. In the k-th round of federated learning, each client i performs a local model update:
$$\Delta\theta_i^{k+1} = \mathrm{SGD}(\theta^k, D_i, \eta_l) - \theta^k, \quad i = 1, 2, \ldots, n \qquad (9)$$
The global model is updated as
$$\Delta\theta^{k+1} = \sum_{i=1}^{n} \frac{|D_i|}{|D|}\Delta\theta_i^{k+1} = \kappa_1\Delta\theta_1^{k+1} + \kappa_2\Delta\theta_2^{k+1} + \cdots + \kappa_n\Delta\theta_n^{k+1} \qquad (10)$$
where $D = D_1 \cup D_2 \cup \cdots \cup D_n$ and $\kappa_i = \frac{|D_i|}{|D|}$. When federated training reaches the stable optimal solution $\theta^*$, the global model update approaches zero, i.e., $\Delta\theta = 0$. At this point, two situations may occur:
(1)
$\Delta\theta_1 = \Delta\theta_2 = \cdots = \Delta\theta_n = 0$, indicating that all clients have consistent data distributions and the optimal solution is achieved simultaneously.
(2)
The local model updates $\Delta\theta_i$ are not all zero, which is caused by inconsistent data distributions during the training process. This scenario more accurately reflects practical applications of federated learning. In this case, the local model updates may not guide the global model toward optimization, resulting in generally lower performance and accuracy for the global model.
Data quality is a critical foundation for load-forecasting modeling in power systems. Existing methods often assume that user load data are independently and identically distributed. In reality, while the power consumption data of different users are independent, the assumption of identical distribution does not hold true. Moreover, although the FL model keeps the training data locally, it requires uploading model parameter updates. Attackers can infer users’ sensitive information from these uploaded model parameters, making it challenging to effectively ensure data privacy.

4. Research Motivation

Due to the non-uniformity of user load data, a single global model is insufficient to meet personalized user requirements. This paper considers using clustering algorithms to aggregate clients with similar data distributions, group electricity users into different clusters, and learn personalized models for each cluster. Providing suitable differentiated models for each user can effectively prevent client drift, thereby enhancing prediction accuracy and model specialization.
Federated clustering can be divided into one-time clustering and iterative clustering based on timing. Ref. [25] adopted a one-time clustering method, which requires specifying the number of clusters K in advance. IFCA uses iterative clustering without the need to predefine K; however, it employs a centralized clustering algorithm, resulting in a significant increase in computational overhead at the central server. CFL also adopts iterative clustering, requiring substantial computational resources to confirm the cluster identities of clients.
To address the issues present in the aforementioned federated clustering methods, this paper constructs a complex network and considers the modularity function. Based on the classical Louvain algorithm for community detection in complex networks, we propose a new federated clustering method that can effectively protect user data privacy, does not require pre-specifying the number of clusters K, and achieves a well-performing differentially private federated clustering model with lower computational cost.

5. System Architecture and Design Scheme

5.1. System Architecture

To address the inconsistency of user load data and privacy protection issues, this paper proposes a differential privacy clustering federated learning method based on the Louvain algorithm for heterogeneous data. The Louvain algorithm is a community detection method for complex networks. The system architecture, illustrated in Figure 2, is divided into three modules.
Module One involves all clients conducting differential privacy federated learning until convergence. During this phase, the server calculates the similarity of the local update vectors from the clients for the current round. Module Two constructs a complex network and employs the Louvain algorithm for community clustering. Module Three focuses on personalized adaptive differential privacy clustering for federated learning load forecasting.

5.2. Similarity Calculation

Since user load data are not exchanged, federated learning cannot directly cluster the data at the nodes; instead, it clusters the model parameters of the clients. Similar clients are identified by analyzing their model parameters. To evaluate the similarity of the clients’ model parameters while ensuring privacy, all clients first perform differential privacy federated learning until convergence. Subsequently, the server calculates the similarity of the local update vectors from the clients for the current round. The specific steps are as follows:
Step 1
Clients use the differential privacy federated averaging (DP-FedAvg) algorithm  [29,30] to train until convergence.
Step 2
Equation (11) is used to calculate the cosine similarity between any two clients’ local updates. The cosine similarity evaluates the similarity of two vectors by calculating the cosine of the angle between them. The closer the value is to 1, the more similar the two vectors are:
$$e_{ij} = e_{ji} = \frac{\langle \Delta\theta_i, \Delta\theta_j \rangle}{\|\Delta\theta_i\| \, \|\Delta\theta_j\|} \qquad (11)$$

5.3. Complex Network Construction

As an abstract model for understanding complex systems, complex networks are a powerful tool for solving clustering problems. Each client is regarded as a network node, and the network edge set and edge weights between nodes are constructed based on the learned similarity between client parameters, establishing an undirected weighted complex network $G = (V, E, A)$, where V is the set of network nodes, E is the set of edges, and A is the adjacency matrix. When $e_{ij} < 0$, the similarity of clients i and j is very small and $(i, j) \notin E$; when $e_{ij} > 0$, $(i, j) \in E$. The adjacency matrix is specifically defined as follows:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \qquad (12)$$
where, when $i \neq j$, $a_{ij} = \begin{cases} e_{ij}, & e_{ij} \ge 0 \\ 0, & e_{ij} < 0 \end{cases}$, and when $i = j$, $a_{ij} = 0$; that is, self-loops are not considered. A is a symmetric matrix.
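A minimal NumPy sketch of the similarity computation in Equation (11) and the adjacency matrix construction above is given below; it assumes each client's local update is available to the server as an array, which is the case after the DP-FedAvg warm-up phase of Module One.

```python
import numpy as np

def build_adjacency(updates):
    """Pairwise cosine similarity e_ij (Equation (11)) -> adjacency matrix A of the client graph."""
    U = np.stack([np.ravel(u) for u in updates])       # one flattened local update per client
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    E = (U @ U.T) / (norms @ norms.T)                  # e_ij = <du_i, du_j> / (||du_i|| ||du_j||)
    A = np.where(E > 0.0, E, 0.0)                      # negative similarity: no edge (weight 0)
    np.fill_diagonal(A, 0.0)                           # no self-loops
    return A

# illustrative usage with random "updates" for 6 clients
rng = np.random.default_rng(1)
A = build_adjacency([rng.normal(size=100) for _ in range(6)])
```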
Figure 2. The framework for clustered federated learning.

5.4. Louvain Algorithm

The Louvain algorithm is a community detection method for complex networks based on modularity maximization [31]. It has approximately linear time complexity and is regarded as one of the best-performing clustering algorithms. The modularity function Q is defined by the following equation:
$$Q = \frac{1}{2\tau}\sum_{i,j}\left[a_{ij} - \frac{k_i k_j}{2\tau}\right]\delta(c_i, c_j) \qquad (13)$$
where $k_i = \sum_j a_{ij}$ is the sum of the weights of all edges connected to node i, $\tau = \frac{1}{2}\sum_{i,j} a_{ij}$, and $c_i$ is the cluster containing node i. When $c_i = c_j$, $\delta(c_i, c_j) = 1$; otherwise, $\delta(c_i, c_j) = 0$. The value of Q reflects the quality of the community clustering: the closer it is to 1, the more distinct the clustering structure.
The basic idea of the algorithm is to first treat each node in the network as its own cluster. Then, node i is moved into the cluster c of a neighboring node j, and the modularity increment $\Delta Q$, expressed by Equation (14), is calculated to determine the move that has the greatest positive impact on modularity:
$$\Delta Q = \left[\frac{\Sigma_{in} + k_{i,in}}{2\tau} - \left(\frac{\Sigma_{tot} + k_i}{2\tau}\right)^2\right] - \left[\frac{\Sigma_{in}}{2\tau} - \left(\frac{\Sigma_{tot}}{2\tau}\right)^2 - \left(\frac{k_i}{2\tau}\right)^2\right] \qquad (14)$$
where $\Sigma_{in}$ is the sum of the weights of the internal edges within cluster c, $k_{i,in}$ is the sum of the weights of the edges connecting node i to the nodes within cluster c, and $\Sigma_{tot}$ is the sum of the weights of all edges connected to the nodes within cluster c. Because the similarity measure is derived from the distribution of the load data, grouping users by load-pattern similarity manifests as community clustering at the level of the complex network.
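For community detection itself, an off-the-shelf Louvain implementation can be applied to the weighted graph built from A. The sketch below assumes the networkx package (recent versions provide louvain_communities); it is an illustrative realization rather than the exact implementation used in the experiments.

```python
import networkx as nx
import numpy as np

def louvain_clusters(A, seed=0):
    """Partition clients into communities by (approximately) maximizing the modularity Q."""
    G = nx.from_numpy_array(np.asarray(A))              # weighted, undirected client graph
    communities = nx.community.louvain_communities(G, weight="weight", seed=seed)
    return [sorted(c) for c in communities]             # each community is one client cluster

# clusters = louvain_clusters(A)   # A from build_adjacency above
```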

5.5. Adaptive Differential Privacy Personalized Clustering Federated Learning Algorithm

In the implementation of DP, gradient clipping is essential. The fixed-threshold clipping method has notable drawbacks: setting the clipping threshold too high or too low can lead to decreased model accuracy and practicality. This paper proposes an adaptive gradient-clipping method, where the clipping threshold is defined by Equation (15):
$$C_i^{t+1} = \left\|g(y_i^{t,L-1})\right\|_2 \times \min\!\left(1, \frac{C_i^t}{\left\|g(y_i^{t,L-1})\right\|_2}\right) + N\!\left(0, (C_i^t\sigma)^2\right) \qquad (15)$$
where $C_i^t$ is the clipping threshold for client i during the t-th round of training, $g(y_i^{t,L-1})$ is the gradient at the final (L-th) local iteration of the t-th round of training for client i, and $\sigma$ is the noise parameter. The idea behind this method is to use the clipped gradient norm from the t-th round to estimate the clipping threshold for the (t+1)-th round, with Gaussian noise added to ensure privacy protection.
Building on adaptive gradient clipping, this paper proposes a personalized federated averaging algorithm, pADP-FedAvg. First, the server initializes a global model and distributes it to each client. Next, clients sample mini-batch local data to compute gradients, apply adaptive gradient clipping, and incorporate DP to update the local model. After L local iterations, the local updates are sent back to the central server. The server then aggregates the received local models to generate a new global model, which is broadcast to all clients. By repeating these steps, the global model is gradually optimized. The specific process is summarized in Algorithm 1 below.
Algorithm 1 pADP-FedAvg algorithm.
Input: Number of communication rounds T; local iteration period L; initial value of the personalized model $y^0$; local update step size $\eta_l$; noise parameter $\sigma$; initial values of the adaptive clipping thresholds $\{C_i^0\}_{i=1}^n$; number of clients selected in each round r
Output: Personalized model $y^T$
Server executes:
  1:   initialize $y^0$
  2:   for each round $t = 0, \ldots, T-1$ do
  3:      Sample a set of r clients at random without replacement, denoted by $W_t$
  4:      Broadcast $y^t$ to all clients in $W_t$
  5:      for each client $i \in W_t$ in parallel do
  6:         $y_i^t \leftarrow \mathrm{ClientUpdate}(i, y^t)$
  7:      end for
  8:      $y^{t+1} = (1/r)\sum_{i \in W_t} y_i^t$
  9:   end for
  10:  Return $y^T$
ClientUpdate$(i, y^t)$:
  11:  $y_i^{t,0} \leftarrow y^t$
  12:  for $l = 0, \ldots, L-1$ do
  13:     Sample a mini-batch $X_i \subseteq D_i$ of size $\lambda$ and compute the gradient
            $g(y_i^{t,l}) \leftarrow (1/\lambda)\sum_{\xi_i \in X_i}\nabla F(y_i^{t,l}, \xi_i)$
  14:     Adaptive gradient clipping:
  15:        $\tilde{g}(y_i^{t,l}) \leftarrow g(y_i^{t,l}) \times \min\big(1, C_i^t / \|g(y_i^{t,l})\|_2\big)$
  16:     Add DP noise to the gradient:
  17:        $\bar{g}(y_i^{t,l}) \leftarrow \tilde{g}(y_i^{t,l}) + N\big(0, (C_i^t\sigma)^2\big)$
  18:        $y_i^{t,l+1} = y_i^{t,l} - \eta_l\,\bar{g}(y_i^{t,l})$
  19:  end for
  20:  Compute the clipping threshold $C_i^{t+1}$ by Equation (15)
  21:  Upload $y_i^{t,L}$ to the server
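A minimal NumPy sketch of the ClientUpdate routine (lines 11 to 21 of Algorithm 1) is given below, covering the adaptive clipping of line 15, the Gaussian noise injection of line 17, and the threshold update of Equation (15). The gradient oracle grad_fn and the mini-batch iterator batches are placeholders assumed for illustration.

```python
import numpy as np

def client_update(y_t, grad_fn, batches, C, eta, sigma, rng):
    """One ClientUpdate call of Algorithm 1: L noisy, clipped local SGD steps plus Eq. (15)."""
    y = np.array(y_t, dtype=float, copy=True)
    g_norm = 1e-12
    for X_batch in batches:                                        # the L mini-batches of this round
        g = grad_fn(y, X_batch)                                    # mini-batch gradient (line 13)
        g_norm = np.linalg.norm(g) + 1e-12
        g_clip = g * min(1.0, C / g_norm)                          # adaptive clipping (line 15)
        g_noisy = g_clip + rng.normal(0.0, C * sigma, g.shape)     # Gaussian DP noise (line 17)
        y = y - eta * g_noisy                                      # local update (line 18)
    C_next = g_norm * min(1.0, C / g_norm) + rng.normal(0.0, C * sigma)  # Equation (15)
    return y, C_next
```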

5.6. Privacy Analysis

The introduction of differential privacy aims to address differential attacks and prevent honest but curious servers or participants from inferring and stealing sensitive private information of clients based on shared data.
Theorem 1. 
In Algorithm 1, for the local dataset $D_i$ of client i participating in training, a small batch sample of size $\lambda$ is randomly selected without replacement. Assuming that client i participates in training and uploads updates $T_i$ times during the T rounds of learning, Algorithm 1 satisfies $(\varepsilon_i, \delta)$-DP for client i, where
$$\varepsilon_i = \frac{2T_iL}{qr\lambda\sigma^2} + 2\sqrt{\frac{2T_iL}{qr\lambda\sigma^2}\log\frac{1}{\delta}} \qquad (16)$$
Proof. 
For client i, assume that two adjacent datasets $X_i$ and $X_i'$ differ only in the j-th data point, i.e., $\xi_j \neq \xi_j'$. Then
$$\left\|\tilde{g}(y_i^{t,l})(X_i) - \tilde{g}(y_i^{t,l})(X_i')\right\|_2 = \frac{1}{\lambda}\left\|\tilde{g}(y_i^{t,l})(\xi_j) - \tilde{g}(y_i^{t,l})(\xi_j')\right\|_2 \le \frac{2C_i^t}{\lambda} \qquad (17)$$
Therefore, $\Delta_2\big(\tilde{g}(y_i^{t,l})\big) \le \frac{2C_i^t}{\lambda}$. Note that, in the absence of added noise,
$$y_i^{t,L} = y_i^{t,L-1} - \eta_l\tilde{g}(y_i^{t,L-1}) = y_i^{t,L-2} - \eta_l\tilde{g}(y_i^{t,L-2}) - \eta_l\tilde{g}(y_i^{t,L-1}) = \cdots = y_i^{t,0} - \eta_l\tilde{g}(y_i^{t,0}) - \eta_l\tilde{g}(y_i^{t,1}) - \cdots - \eta_l\tilde{g}(y_i^{t,L-1}) \qquad (18)$$
By the definition of sensitivity and Equations (17) and (18), we have
$$\Delta(y_i^{t,L}) = \eta_l\left\|\big[\tilde{g}(y_i^{t,0})(X_i^{t,0}) - \tilde{g}(y_i^{t,0})(X_i^{t,0\prime})\big] + \big[\tilde{g}(y_i^{t,1})(X_i^{t,1}) - \tilde{g}(y_i^{t,1})(X_i^{t,1\prime})\big] + \cdots + \big[\tilde{g}(y_i^{t,L-1})(X_i^{t,L-1}) - \tilde{g}(y_i^{t,L-1})(X_i^{t,L-1\prime})\big]\right\| \le \frac{2\eta_l L C_i^t}{\lambda} \qquad (19)$$
From Equation (17) and Lemma 1, each participating client satisfies $\frac{2}{\lambda^2\sigma^2}$-zCDP during a single local iteration. During one round of local iterations, the participating client makes $\frac{L\lambda}{q}$ passes over its local dataset. According to Lemmas 2 and 3, the participating client therefore satisfies $\frac{2L}{q\lambda\sigma^2}$-zCDP during one round of local iterations. Combining Equation (19) and Lemma 1, Algorithm 1 satisfies $\frac{2L}{qr\lambda\sigma^2}$-zCDP in the t-th round of training. If client i participates in training and uploads updates $T_i$ times during the T rounds of learning, then client i achieves $\frac{2T_iL}{qr\lambda\sigma^2}$-zCDP throughout the learning process. By Lemma 4, the theorem is proved. □
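To make the guarantee concrete, the per-client privacy loss of Equation (16) can be evaluated numerically. The sketch below assumes the symbols of Theorem 1 (rounds participated $T_i$, local iterations L, local dataset size q, clients per round r, batch size λ, and noise parameter σ); the values in the example call are purely illustrative and are not the settings used in the case study.

```python
import math

def epsilon_client(T_i, L, q, r, lam, sigma, delta):
    """Privacy loss of client i according to Theorem 1 / Equation (16)."""
    rho = 2.0 * T_i * L / (q * r * lam * sigma ** 2)      # accumulated zCDP of client i
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

# purely illustrative parameter values
print(epsilon_client(T_i=60, L=8, q=5000, r=5, lam=128, sigma=4.0, delta=1e-5))
```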

6. Electricity Load-Forecasting Model

6.1. Load Influencing Factors

Electric load is influenced by various factors, including weather, location, time, electricity prices, user consumption habits, and unexpected events. The load curve can be decomposed into regular components that change periodically, uncertain components that change non-periodically, and noise components from unexplainable physical factors [32]. Key weather factors, such as temperature, humidity, and atmospheric pressure, significantly impact short-term load variations.

6.2. Construction of User Dataset

In this paper, weather and time factors are selected as correlated influences on load. The input data feature types include weather, load, and time. Weather factors, such as temperature, humidity, and atmospheric pressure, are used to capture the impact of weather on load. Time factors, including year, month, day, and hour, reflect the periodicity of the load. The feature selection and transformation process for the dataset is detailed in Table 1.
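A possible realization of the transformations in Table 1 is sketched below using pandas: min-max normalization of the continuous features, one-hot encoding of the year, and sin/cos cyclic encoding of month, day, and hour. The raw column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def encode_features(df):
    """Apply the Table 1 transformations to a raw frame with weather, load, and time columns."""
    out = pd.DataFrame(index=df.index)
    for col in ["temperature", "humidity", "pressure", "load"]:       # continuous features
        x = df[col].astype(float)
        out[col] = (x - x.min()) / (x.max() - x.min())                # min-max normalization
    out = out.join(pd.get_dummies(df["year"], prefix="oh", prefix_sep="-"))  # one-hot year
    for col, period in [("month", 12), ("day", 30), ("hour", 24)]:    # cyclic time features
        angle = 2.0 * np.pi * df[col] / period
        out[f"{col}-sin"] = np.sin(angle)
        out[f"{col}-cos"] = np.cos(angle)
    return out
```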

6.3. Federated Learning Model Based on LSTM

The LSTM (Long Short-Term Memory) network excels at handling time-series problems, and the electricity load of each user is inherently a type of time series. Figure 3 shows the internal unit structure of LSTM, which consists of input gates, forget gates, and output gates.
Table 1. Feature selection and feature transformation of datasets.
Feature Type | Feature Name | Data Type | Unit/Range | Transformation Method | Transformed Feature
Weather | Temperature | Continuous | °F | Normalization | temperature
Weather | Humidity | Continuous | % | Normalization | humidity
Weather | Pressure | Continuous | kPa | Normalization | pressure
Load | Load | Continuous | kW | Normalization | load
Time | Year | Discrete | 2016–2018 | One-hot encoding | oh-2016, oh-2017, oh-2018
Time | Month | Discrete | 1–12 | sin/cos cyclic encoding | month-sin, month-cos
Time | Day | Discrete | 1–30 | sin/cos cyclic encoding | day-sin, day-cos
Time | Hour | Discrete | 0–23 | sin/cos cyclic encoding | hour-sin, hour-cos
In Figure 3, $X_t$ is the input, $h_{t-1}$ and $h_t$ are the outputs, and $C_{t-1}$ and $C_t$ are the storage cell states. The calculation formulas for these variables are as follows:
$$f_t = \sigma\big(W_f X_t + U_f h_{t-1} + b_f\big)$$
$$i_t = \sigma\big(W_i X_t + U_i h_{t-1} + b_i\big)$$
$$o_t = \sigma\big(W_o X_t + U_o h_{t-1} + b_o\big)$$
$$\tilde{C}_t = \tanh\big(W_c X_t + U_c h_{t-1} + b_c\big)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh(C_t)$$
where σ is the sigmoid function; $f_t$, $i_t$, and $o_t$ are the outputs of the forget gate, input gate, and output gate, respectively; $W_c$, $U_c$, $W_o$, $U_o$, $W_i$, $U_i$, $W_f$, and $U_f$ are the corresponding weights; and $b_c$, $b_o$, $b_i$, and $b_f$ are the bias terms. This paper adopts the framework of federated learning, using LSTM as the load-forecasting model and leveraging the three gates mentioned above to utilize long-term user historical data for load forecasting. Each participating user acts as an independent client, and the network topology is illustrated in Figure 1.
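A PyTorch sketch of a forecasting network in the spirit of Table 2 (two stacked LSTM layers with 256 and 128 hidden states followed by 64- and 32-neuron fully connected layers) is shown below; the sequence handling and exact layer wiring are assumptions of this sketch rather than the precise architecture used in the experiments.

```python
import torch
import torch.nn as nn

class LoadLSTM(nn.Module):
    """Load-forecasting network loosely following Table 2 (256/128 LSTM units, 64/32 FC neurons)."""

    def __init__(self, n_features=27, horizon=15, dropout=0.2):
        super().__init__()
        self.lstm1 = nn.LSTM(n_features, 256, batch_first=True)
        self.lstm2 = nn.LSTM(256, 128, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, horizon),
        )

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        h, _ = self.lstm1(x)
        h, _ = self.lstm2(h)
        return self.head(h[:, -1, :])     # forecast from the last time step

# example: a batch of 4 sequences of 24 hourly steps with 27 features -> output shape (4, 15)
y_hat = LoadLSTM()(torch.randn(4, 24, 27))
```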

6.4. Clustered Federated Learning Load-Forecasting Process Based on Louvain Algorithm

The entire process of personalized adaptive differential privacy clustering federated learning for electricity load forecasting using the Louvain algorithm consists of the following five steps:
Step 1
All clients participate in training with the classical differential privacy federated averaging algorithm until convergence.
Step 2
The central server calculates the cosine similarity based on the local update vectors at the convergence of the model from local clients.
Step 3
Each client serves as a network node, constructing a network edge set and edge weights between nodes based on the similarity of learned client parameters, forming an undirected weighted complex network.
Step 4
Community clustering is performed using the Louvain algorithm.
Step 5
The pADP-FedAvg algorithm is utilized for personalized adaptive differential privacy clustering federated learning for electricity load forecasting.

7. Case Study Analysis

This section evaluates the effectiveness of the proposed scheme through case studies. The experimental environment is as follows: the CPU is an Intel Core i7-12700K, the GPU is an NVIDIA RTX 3080 Ti with 12 GB of memory, the RAM is 64 GB, the operating system is Ubuntu 21.04, and the software versions include PyTorch 2.0.0, CUDA 11.8, and Python 3.8.16. The experiments are conducted in a single-computer deployment mode. The parameter settings for the LSTM model used for load forecasting are detailed in Table 2.

7.1. Data Sources

The dataset used is HUE [33], which contains hourly energy use data and meteorological data for 22 homes in British Columbia over the past three years, with a time granularity of 24 points per day. The algorithm selects data from 1 June 2016 to 29 January 2018, specifically for houses with IDs 3–14 and 18–20, totaling 15 users. These users are numbered in descending order by their ID numbers, designated as user 1, 2, …, 15. For each responding user, 80% of the dataset is used for training, 10% is used for testing, and 10% is used for validation.

7.2. Data Pre-Processing

After selecting the appropriate data, the data need to be pre-processed. For missing data, this paper utilizes averaging or interpolation methods as follows:
$$\gamma_i = \begin{cases} \dfrac{\gamma_{i-1} + \gamma_{i+1}}{2}, & \gamma_i \in \mathrm{Null},\ \gamma_{i-1}, \gamma_{i+1} \notin \mathrm{Null}; \\ 0, & \gamma_i \in \mathrm{Null},\ \gamma_{i-1}\ \text{or}\ \gamma_{i+1} \in \mathrm{Null} \end{cases}$$
where γ i is the value of electricity load at a certain time period. In addition, in order to accelerate the convergence speed during the model training process and improve training efficiency, the data of continuous values are normalized:
$$\gamma_{\mathrm{nom}} = \frac{\gamma - \gamma_{\min}}{\gamma_{\max} - \gamma_{\min}}$$
where γ is the original data, γ nom is the normalized data, and γ max and γ min are the maximum and minimum values, respectively.
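A small NumPy sketch of this pre-processing, assuming missing readings are marked as NaN in the load array, is as follows:

```python
import numpy as np

def fill_missing(load):
    """Average the two neighbouring readings if both exist; otherwise fall back to 0."""
    y = load.copy()
    for i in np.flatnonzero(np.isnan(load)):
        has_prev = i > 0 and not np.isnan(load[i - 1])
        has_next = i < len(load) - 1 and not np.isnan(load[i + 1])
        y[i] = (load[i - 1] + load[i + 1]) / 2.0 if (has_prev and has_next) else 0.0
    return y

def min_max_normalize(x):
    """Scale a continuous series to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())
```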

7.3. Evaluation Metrics

To evaluate the accuracy of the algorithm, three metrics are employed: the mean-square error (MSE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE).
The MSE is calculated as follows:
$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$
The RMSE is calculated as follows:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$
The MAPE is calculated as follows:
$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\%$$
where N is the number of test samples and $y_i$ and $\hat{y}_i$ are the actual and predicted readings in kWh, respectively.
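These metrics can be computed directly from the test predictions; a brief NumPy sketch:

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAPE (in %) over the N test samples."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err / np.asarray(y_true))) * 100.0
    return mse, rmse, mape
```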

7.4. Analysis of Simulation Results

After calculating the similarity, the Louvain algorithm is applied to detect the communities in the weighted network, and the results are displayed in Figure 4.
During the operation, the Q-values are as follows: −0.0832, 0.1569, −0.0275, 0.3124, 0.2017, 0.4753, 0.3695, 0.5936, 0.4821, 0.6427, 0.5339, 0.7052, 0.6185, 0.7514, 0.6793, 0.7836, 0.7253, 0.7453, 0.7292, and 0.7217. After reaching a peak value of 0.7836, the Q-value begins to decline. At this point, the optimal division yields one community cluster containing users 1, 2, 8, 10, and 11 and another containing users 3, 4, 5, 6, 7, 9, 12, 13, 14, and 15. Compared to commonly used clustering algorithms, such as K-Means, CFL, and IFCA, our proposed method has lower computational costs and does not require pre-specifying the number of clusters.
The comparison results of the DP-FedAvg and pADP-FedAvg algorithms across the three evaluation metrics (MSE, RMSE, and MAPE) are presented in Table 3. With a privacy budget of $\epsilon = 0.6$, it is clear that the pADP-FedAvg algorithm significantly improves upon the accuracy of the classical DP-FedAvg algorithm and demonstrates superior performance. Although the random noise introduced by DP inevitably degrades model performance, the comparison with the traditional DP strategy suggests that our approach significantly mitigates these negative effects.
The model performance comparison, as shown in Table 4, evaluates three models under different privacy budgets: the traditional global model, personalized cluster model 1, and personalized cluster model 2. The results indicate that the traditional global model exhibits the lowest accuracy, while the personalized cluster models demonstrate high performance at the same privacy budget. As the privacy budget ϵ increases, the loss metrics (MSE, RMSE, and MAPE) decrease, leading to higher model accuracy; however, this also results in reduced privacy protection. Conversely, a smaller privacy budget ϵ provides greater privacy protection but results in lower prediction accuracy.
Figure 5 illustrates a comparison of the prediction curves under different privacy-preserving budgets. The experimental results indicate that both the cluster 1 and cluster 2 models exhibit strong prediction performance. When the privacy budget is set higher, their prediction curves align more closely with the true value curves, although this comes at the expense of privacy protection. These findings highlight the trade-off between the privacy budget and prediction accuracy: a smaller privacy budget provides stronger privacy protection but lower prediction accuracy. In practical applications, it is therefore essential to balance privacy risks and model performance based on the specific scenario when setting the privacy budget. Moreover, under the same privacy budget, the clustered federated load prediction achieves better prediction performance than the global model, which means that the proposed system achieves a better trade-off between privacy protection and prediction performance.

8. Conclusions

Accurate forecasting of residential electricity loads is crucial for integrating demand-side resources and fulfilling demand-side response requirements. By addressing the data heterogeneity and security issues in load forecasting for smart grids, this paper presents a load-forecasting framework for residential energy users that enables collaborative training of personalized forecasting models without the need to exchange load data. Weather and time factors are identified as key influences on user loads, and the case study analysis demonstrates that the proposed algorithm outperforms current mainstream federated learning algorithms in prediction performance. This approach enhances load-prediction accuracy while ensuring the privacy and security of user data, which is essential for smart grid planning, operation, control, and dispatch. Future research may focus on other cyber security issues, such as data poisoning and evasion attacks, the influence of unexpected events on power load forecasting, further optimization of the federated learning algorithm, additional application scenarios, and the continued development of smart grids. The trade-off between model complexity and interpretability is also attracting increasing attention and is critical for the broader adoption of predictive models in real-world settings. While complex models such as deep learning algorithms can provide high accuracy, their “black-box” nature makes it difficult to understand how they make predictions. This lack of interpretability can be a significant barrier in practical settings, where understanding the rationale behind a decision is crucial. Therefore, in future work we will focus on the interpretability of the models to enhance user trust in smart grid systems.

Author Contributions

Methodology, T.P., J.H., C.L. and X.C.; Project administration, J.H.; Software, X.J., C.L. and X.C.; Writing—original draft, T.P.; Writing—review and editing, J.H., X.J. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the China Southern Power Grid Corporation Technology Project under grant GDKJXM20222125 (036000KK52222009).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study.

References

  1. Voropai, N. Electric Power System Transformations: A Review of Main Prospects and Challenges. Energies 2020, 13, 5639. [Google Scholar] [CrossRef]
  2. Zhang, L.; Ren, J.; Mu, Y.; Wang, B. Privacy-Preserving Multi-Authority Attribute-Based Data Sharing Framework for Smart Grid. IEEE Access 2020, 8, 23294–23307. [Google Scholar] [CrossRef]
  3. Hou, H.; Liu, C.; Wang, Q.; Wu, X.; Tang, J.; Shi, Y.; Xie, C. Review of load forecasting based on artificial intelligence methodologies, models, and challenges. Electr. Power Syst. Res. 2022, 210, 108067. [Google Scholar] [CrossRef]
  4. Dudek, G. Pattern-based local linear regression models for short-term load forecasting. Electr. Power Syst. Res. 2016, 130, 139–147. [Google Scholar] [CrossRef]
  5. Walser, T.; Sauer, A. Typical load profile-supported convolutional neural network for short-term load forecasting in the industrial sector. Energy AI 2021, 5, 100104. [Google Scholar] [CrossRef]
  6. Karaman, O.A. Prediction of Wind Power with Machine Learning Models. Appl. Sci. 2023, 13, 11455. [Google Scholar] [CrossRef]
  7. Cicceri, G.; Tricomi, G.; D’Agati, L.; Longo, F.; Merlino, G.; Puliafito, A. A Deep Learning-Driven Self-Conscious Distributed Cyber-Physical System for Renewable Energy Communities. Sensors 2023, 23, 4549. [Google Scholar] [CrossRef]
  8. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Lauderdale, FL, USA, 20–22 April 2017; Singh, A., Zhu, J., Eds.; Volume 54, pp. 1273–1282. [Google Scholar]
  9. Niknam, S.; Dhillon, H.S.; Reed, J.H. Federated Learning for Wireless Communications: Motivation, Opportunities, and Challenges. IEEE Commun. Mag. 2020, 58, 46–51. [Google Scholar] [CrossRef]
  10. Husnoo, M.A.; Anwar, A.; Hosseinzadeh, N.; Islam, S.N.; Mahmood, A.N.; Doss, R. A Secure Federated Learning Framework for Residential Short-Term Load Forecasting. IEEE Trans. Smart Grid 2024, 15, 2044–2055. [Google Scholar] [CrossRef]
  11. Fernández, J.D.; Menci, S.P.; Lee, C.M.; Rieger, A.; Fridgen, G. Privacy-preserving federated learning for residential short-term load forecasting. Appl. Energy 2022, 326, 119915. [Google Scholar] [CrossRef]
  12. Fekri, M.N.; Grolinger, K.; Mir, S. Distributed load forecasting using smart meter data: Federated learning with Recurrent Neural Networks. Int. J. Electr. Power Energy Syst. 2022, 137, 107669. [Google Scholar] [CrossRef]
  13. Savi, M.; Olivadese, F. Short-Term Energy Consumption Forecasting at the Edge: A Federated Learning Approach. IEEE Access 2021, 9, 95949–95969. [Google Scholar] [CrossRef]
  14. Zhu, L.; Liu, Z.; Han, S. Deep leakage from gradients. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA, 2019. [Google Scholar]
  15. Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
  16. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, Vienna, Austria, 24–28 October 2016; pp. 308–318. [Google Scholar] [CrossRef]
  17. Hu, R.; Guo, Y.; Gong, Y. Concentrated Differentially Private Federated Learning with Performance Analysis. IEEE Open J. Comput. Soc. 2021, 2, 276–289. [Google Scholar] [CrossRef]
  18. Sun, X.; Yuan, Z.; Kong, X.; Xue, L.; He, L.; Lin, Y. Communication-Efficient and Privacy-Preserving Aggregation in Federated Learning with Adaptability. IEEE Internet Things J. 2024, 11, 26430–26443. [Google Scholar] [CrossRef]
  19. Xue, R.; Xue, K.; Zhu, B.; Luo, X.; Zhang, T.; Sun, Q.; Lu, J. Differentially Private Federated Learning with an Adaptive Noise Mechanism. IEEE Trans. Inf. Forensics Secur. 2024, 19, 74–87. [Google Scholar] [CrossRef]
  20. Liu, M.; Teng, F.; Zhang, Z.; Ge, P.; Sun, M.; Deng, R.; Cheng, P.; Chen, J. Enhancing Cyber-Resiliency of DER-Based Smart Grid: A Survey. IEEE Trans. Smart Grid 2024, 15, 4998–5030. [Google Scholar] [CrossRef]
  21. Zhang, Z.; Liu, M.; Sun, M.; Deng, R.; Cheng, P.; Niyato, D.; Chow, M.Y.; Chen, J. Vulnerability of Machine Learning Approaches Applied in IoT-Based Smart Grid: A Review. IEEE Internet Things J. 2024, 11, 18951–18975. [Google Scholar] [CrossRef]
  22. Ma, X.; Zhu, J.; Lin, Z.; Chen, S.; Qin, Y. A state-of-the-art survey on solving non-IID data in Federated Learning. Future Gener. Comput. Syst. 2022, 135, 244–258. [Google Scholar] [CrossRef]
  23. Sattler, F.; Müller, K.R.; Samek, W. Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3710–3722. [Google Scholar] [CrossRef]
  24. Gao, Z.; Xiong, Z.; Zhao, C.; Feng, F. Clustered Federated Learning Framework with Acceleration Based on Data Similarity. In Algorithms and Architectures for Parallel Processing; Tari, Z., Li, K., Wu, H., Eds.; Springer: Singapore, 2024; pp. 80–92. [Google Scholar]
  25. Ghosh, A.; Hong, J.; Yin, D.; Ramchandran, K. Robust Federated Learning in a Heterogeneous Environment. arXiv 2019, arXiv:1906.06629. [Google Scholar]
  26. Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. An Efficient Framework for Clustered Federated Learning. IEEE Trans. Inf. Theory 2022, 68, 8076–8091. [Google Scholar] [CrossRef]
  27. Bun, M.; Steinke, T. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. In Theory of Cryptography; Hirt, M., Smith, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; pp. 635–658. [Google Scholar]
  28. Yu, L.; Liu, L.; Pu, C.; Gursoy, M.E.; Truex, S. Differentially Private Model Publishing for Deep Learning. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 332–349. [Google Scholar] [CrossRef]
  29. Geyer, R.C.; Klein, T.; Nabi, M. Differentially Private Federated Learning: A Client Level Perspective. arXiv 2017, arXiv:1712.07557. [Google Scholar]
  30. McMahan, H.B.; Ramage, D.; Talwar, K.; Zhang, L. Learning Differentially Private Recurrent Language Models. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  31. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [CrossRef]
  32. Shi, H.; Xu, M.; Li, R. Deep Learning for Household Load Forecasting—A Novel Pooling Deep RNN. IEEE Trans. Smart Grid 2018, 9, 5271–5280. [Google Scholar] [CrossRef]
  33. Makonin, S. HUE: The hourly usage of energy dataset for buildings in British Columbia. Data Brief 2019, 23, 103744. [Google Scholar] [CrossRef]
Figure 1. Star topology.
Figure 3. Internal structure of LSTM.
Figure 4. The community division results of the Louvain algorithm.
Figure 5. Comparison of forecasting results. (a) Cluster 1. (b) Cluster 2.
Table 2. Model parameters.
Parameter | Value
Number of epochs of training on clients | 8
Total communication rounds | 60
LSTM hidden layers | 2
Model structure | LSTM layers with 256 hidden states; LSTM layers with 128 hidden states; fully connected layers with 64 neurons; fully connected layers with 32 neurons
Input feature dimensions | 27
Batch size | 128
Learning rate η | 0.0015
Dropout probability | 0.2
Output feature dimensions | 15
Privacy budget ϵ | 0.4, 0.6, 0.8
Loss | MSE
Table 3. Comparison of algorithm performance.
Algorithm | MSE | RMSE | MAPE (%)
DP-FedAvg | 0.278 | 0.527 | 33.5
pADP-FedAvg | 0.205 | 0.452 | 29.3
Table 4. Comparison of model performance.
Model | Evaluation Metric | ϵ = 0.4 | ϵ = 0.6 | ϵ = 0.8
Global | MSE | 0.251 | 0.205 | 0.192
Global | RMSE | 0.501 | 0.452 | 0.438
Global | MAPE (%) | 35.5 | 29.3 | 19.7
Cluster 1 | MSE | 0.042 | 0.017 | 0.009
Cluster 1 | RMSE | 0.205 | 0.130 | 0.095
Cluster 1 | MAPE (%) | 12.2 | 6.2 | 2.9
Cluster 2 | MSE | 0.031 | 0.021 | 0.015
Cluster 2 | RMSE | 0.176 | 0.145 | 0.122
Cluster 2 | MAPE (%) | 15.1 | 8.5 | 3.3
