1. Introduction
In modern power systems, accurate and efficient state estimation is of great significance for ensuring the safe and stable operation of the grid and dispatching power resources [1]. However, with the increasing scale of active distribution networks (ADNs) and the growing presence of distributed generators (DG), the accuracy of state estimation is increasingly affected by measurement deficiencies and random fluctuations in DG outputs [2]. One of the most promising approaches to address this challenge is forecasting-aided state estimation (FASE) [3]. FASE consists of two stages: prediction and filtering. In the filtering stage, FASE combines forecasts and measurements to estimate the state using filtering algorithms based on the Kalman framework, such as the extended Kalman filter (EKF) [4], unscented Kalman filter (UKF) [5], ensemble Kalman filter (EnKF) [6], and covariance particle filter (CPF) [7]. The CPF is adopted here because its importance density function incorporates the latest measurements, enhancing filtering accuracy. The outputs of DG and active loads are influenced by various factors, such as weather variations, seasonal changes, and special events [8], resulting in complex nonlinear relationships. Traditional methods struggle to capture these nonlinear characteristics precisely, which in turn degrades the accuracy of FASE.
The modal decomposition technique is a signal processing method that decomposes the original state data into multiple intrinsic mode functions (IMFs), representing sub-signals with different frequency components and trend information, thus capturing the diversity and time-varying characteristics of the state signal [9]. In recent years, modal decomposition techniques have gained considerable attention in the field of load forecasting in power systems [10]. In [11], photovoltaic power data were decomposed into a series of more stable subsequences using the empirical mode decomposition (EMD) technique to deal with fluctuations in the data. In [12], the CEEMDAN-SE-LSTM model, which takes meteorological and holiday factors into account, was proposed to predict the short-term electricity load in Changsha, China. However, modal decomposition techniques have rarely been applied to forecasting-aided state estimation. In this study, we introduce modal decomposition into FASE to decompose the ADNs state and extract components at different scales and frequencies, enabling a more accurate representation of the changing trends and periodic features of the ADNs state. However, the modal aliasing caused by the EMD and CEEMDAN methods can degrade decomposition performance and reduce forecasting accuracy. In [13], this problem was addressed using variational mode decomposition (VMD) to effectively decompose the power system load while avoiding modal aliasing. However, selecting appropriate decomposition parameters and determining the optimal number of modes often depend on subjective experience and lack clear criteria. This can lead to less accurate and less stable decomposition results, which affect the performance of subsequent forecasting models.
After modal decomposition of the ADNs state data, each mode represents different operating patterns or characteristics. Information from each mode must be considered, and a combined model with an appropriate weighting configuration must be constructed to improve forecasting accuracy and stability [14]. Common approaches for combined-model prediction include averaging methods [15], weighted averaging methods [16], and voting methods [17], which combine the predictions of individual models such as support vector regression (SVR) [18], artificial neural networks (ANN) [19], or long short-term memory (LSTM) [20]. However, combined forecasting models based on average weights or voting do not effectively exploit the performance differences between forecasting models and may suffer reduced accuracy when these differences are significant [21]. To construct an optimal combined forecasting model, researchers have proposed various ensemble learning methods. In [22], an extreme weather identification and short-term load forecasting model based on the bagging–XGBoost algorithm was proposed, which can provide advance warning of peak load periods and detailed load values. However, the bagging method only averages the predictions of the base learners and may not fully exploit the potential correlations among them, limiting forecasting performance. In [23], a deep learning-based ensemble stacking (DSE-XGB) method for solar photovoltaic power forecasting was proposed to improve forecasting accuracy. However, the stacking method requires multiple levels of model combination, involving complex model selection and parameter tuning, which increases the difficulty and computational complexity of model construction. In contrast, the extreme random trees (ERT) ensemble method [24] uses random features and thresholds for node splitting and can integrate the predictions of multiple decision trees, effectively addressing the limitations of the bagging and stacking methods.
In addition, owing to the complex, high-dimensional nature of weather features and historical data in ADNs, traditional forecasting models struggle to capture the relationships between features, resulting in lower performance on high-dimensional datasets. To incorporate the influence of multiple features on the changing state of ADNs, researchers have combined feature selection techniques with forecasting models. A commonly used feature selection method is the maximum information coefficient (MIC) [25]. Compared to measures such as the Pearson coefficient, MIC can assess the strength of the correlation between two variables without in-depth analysis and can capture complex nonlinear relationships. To improve the computational efficiency of the MIC algorithm for two variables, the RapidMIC algorithm was proposed in [26]. However, the existing RapidMIC algorithm cannot simultaneously consider the impact of weather features on both DG output buses and load buses to obtain the information coefficient between them. In addition, it does not fully consider the compatibility between feature selection and prediction models.
To solve the above problems, this paper proposes an optimal ERT ensemble-based FASE method for ADNs, using VMD state decomposition based on maximum average energy concentration (MAEC). The contributions of this paper are summarized as follows:
- (1) By introducing the concept of energy concentration and using the adaptive sessile organisms optimization algorithm (ASSA) [27], suitable modal numbers and decomposition parameters are determined to improve the accuracy and stability of the decomposition results;
- (2) A multivariate RapidMIC method based on Schmidt orthogonal decomposition (OMVRapidMIC) is proposed to analyze the correlation between the calendar rules, weather features, and the ADNs state data while retaining highly correlated features;
- (3) An ERT ensemble state prediction model is constructed based on LSTNet [28], Transformer [29], XGBoost [30], and Prophet [31]. An ensemble strategy optimization model based on root mean square error (RMSE) and mean absolute error (MAE) is established, and the optimal ensemble strategy is obtained using ASSA. After generating the forecast values, the final estimation results are obtained by combining them with the system measurement values using the CPF method.
The paper is structured as follows. Section 2 presents the mathematical model for FASE. Section 3 introduces the VMD state decomposition method based on MAEC. Section 4 describes the OMVRapidMIC method. Section 5 provides the specific steps of the optimal ERT ensemble FASE method based on VMD state decomposition with MAEC for ADNs. In Section 6, the proposed method is validated, and the results are discussed using the IEEE 118-bus standard distribution system as a test system. Finally, Section 7 concludes this paper and identifies opportunities for future developments.
4. Feature Dimensionality Reduction
The influence of weather and date features on DG power output and load power variation in ADNs varies significantly. Among weather features, changes in wind speed have a strong impact on wind turbine output, while rising temperature affects the power consumption of air-conditioning loads. Among date features, mornings and evenings during holidays are peak periods for residential electricity demand, while load power may be relatively low in other periods. As a result, DG units may need to provide additional power during peak periods and appropriately reduce output at other times to save energy and reduce costs. If these differentiated influences of weather and date features on DG output and load demand are not properly accounted for, the accuracy of FASE decreases.
Therefore, this paper proposes a multivariate maximum information coefficient based on Schmidt orthogonalization (OMVRapidMIC) for analyzing the correlation between multiple variables, aiming to reduce the complexity of the input features. A three-variable dataset D = {(G, L), H} is defined, with H as the regional weather feature variable (or calendar rule variable), G as the state variable of DG buses in the ADNs, and L as the state variable of load buses in the ADNs. The weather feature variable H is assigned to the X-axis, while the DG bus and load bus state variables {G, L} are assigned to the Y-axis. The X and Y axes are divided into grids, each cell representing a subset of data points. The ratio of the number of points falling into a cell to the total number of points is defined as the approximate probability density of that cell, from which the mutual information I({G, L}; H) between the weather feature variable H and the DG bus state G and load bus state L is obtained. Normalization is then performed to obtain the MVRapidMIC between H and {G, L} under dataset D. The specific steps for calculating MVRapidMIC are as follows:
- (1) For a given three-variable dataset D = {(G, L), H} and positive integers r and s, where r, s ≥ 2, when the regional weather feature variable H and the DG bus and load bus state variables {G, L} are divided into r and s blocks, respectively, the mutual information between them can be calculated as follows:
$$I(D, r, s, \{G, L\}; H) = \sum_{i=1}^{r} \sum_{j=1}^{s} P(ij) \log_2 \frac{P(ij)}{P_{\mathrm{ind}}(i)\, P_{\mathrm{d}}(j)} \tag{10}$$
where Pind(i) represents the marginal probability density of the regional weather feature variable H divided into r grids, Pd(j) represents the marginal probability density of the DG bus state variables and load bus state variables {G, L} divided into s grids, and P(ij) represents the joint probability density of the dataset D divided into r × s grids.
- (2) The MVRapidMIC between the regional weather feature variable H and the DG bus state variables and load bus state variables {G, L} is provided by:
$$\mathrm{MVRapidMIC}(D, \{G, L\}; H) = \max_{r,\, s < B(n)} \frac{I(D, r, s, \{G, L\}; H)}{\log_2(\min\{r, s\})} \tag{11}$$
where I(D, r, s, {G, L}; H) represents the mutual information between the regional weather feature variable H and the DG bus state variables and load bus state variables {G, L}, log2(min{r, s}) is the normalization factor, and B(n) is the maximum number of grid partitions, where r, s < B(n).
- (3) Let H = {h1, h2, h3, ..., hn} be the set of regional weather feature variables, and let the target variables be G and L. Let Hγ = {h1, h2, h3, ..., hγ} be the selected variable set after choosing γ variables, and hγ+1 be the (γ + 1)-th candidate variable, where 2 ≤ γ + 1 ≤ n. The importance score of OMVRapidMIC is defined as:
$$\mathrm{OMVRapidMIC}(h_{\gamma+1}) = \mathrm{MVRapidMIC}\big(\mathrm{GSO}(h_{\gamma+1}, H_{\gamma}), \{G, L\}\big) \tag{12}$$
$$h_{\gamma+1}^{*} = \underset{h_{\gamma+1} \in H \setminus H_{\gamma}}{\arg\max}\; \mathrm{OMVRapidMIC}(h_{\gamma+1}) \tag{13}$$
where MVRapidMIC(GSO(hγ+1, Hγ), {G, L}) calculates the multivariate RapidMIC value between the independent feature information of hγ+1 with respect to the selected variable set Hγ and the target variables {G, L}, and GSO(hγ+1, Hγ) represents the Gram–Schmidt orthogonalization of the candidate variable hγ+1 relative to the selected variable set Hγ. Using the Gram–Schmidt orthogonalization method, irrelevant and redundant variables are effectively eliminated, ensuring that they do not contribute to the OMVRapidMIC score. The calculation of GSO(hγ+1, Hγ) is provided by:
$$V_{\gamma+1} = h_{\gamma+1} - \sum_{i=1}^{\gamma} \frac{\langle h_{\gamma+1}, h_i \rangle}{\langle h_i, h_i \rangle}\, h_i \tag{14}$$
$$\mathrm{GSO}(h_{\gamma+1}, H_{\gamma}) = \frac{V_{\gamma+1}}{\|V_{\gamma+1}\|} \tag{15}$$
The "argmax(·)" in (13) selects the variable with the highest OMVRapidMIC value, which is chosen as the next variable. In (15), ‖Vγ+1‖ represents the norm of the vector Vγ+1.
Then, the set Hγ after k variable selections is defined as follows:
$$H_{k} = H_{k-1} \cup \{h_{k}^{*}\}, \quad k = 1, 2, \ldots, n \tag{16}$$
After all variables have been selected, the variable importance ranking is obtained by placing the variables in the order in which they were selected, accounting for correlation and eliminating irrelevant and redundant variables.
Based on (10) to (16), the OMVRapidMIC values between the calendar rules, weather features, and the ADNs state data are calculated. Calendar and weather features with OMVRapidMIC values greater than 0.51 are retained, while the remaining features are filtered out.
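A minimal Python sketch of the scoring loop in (10) to (16) is given below, under stated assumptions: an exhaustive grid search stands in for RapidMIC's accelerated search, B(n) = n^0.6 is the usual MIC partition bound, the joint target {G, L} is collapsed into a single target vector, and previously selected features are kept as an orthonormal basis; all names are illustrative rather than the paper's implementation.

```python
import numpy as np

def grid_mutual_information(x, y, r, s):
    """Grid estimate of the mutual information in (10): P(ij) from an
    r-by-s histogram, with Pind(i) and Pd(j) as its marginals."""
    pxy, _, _ = np.histogram2d(x, y, bins=(r, s))
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # Pind(i)
    py = pxy.sum(axis=0, keepdims=True)   # Pd(j)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def mv_rapid_mic(h, target):
    """Normalized maximum over grid partitions, as in (11); exhaustive
    search here, unlike RapidMIC's accelerated variant."""
    b = max(4, int(len(h) ** 0.6))        # common B(n) choice (assumption)
    best = 0.0
    for r in range(2, b + 1):
        for s in range(2, b + 1):
            if r * s > b:                 # restrict the total grid size
                continue
            mi = grid_mutual_information(h, target, r, s)
            best = max(best, mi / np.log2(min(r, s)))
    return best

def gso(candidate, basis):
    """Gram-Schmidt residual of a candidate feature, as in (14)-(15);
    `basis` holds the already-orthonormalized selected features."""
    v = candidate.astype(float).copy()
    for q in basis:
        v -= (v @ q) * q                  # remove the projection onto q
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def select_features(H, target, top_k):
    """Greedy selection by the importance score in (12)-(13).
    H: (n_samples, n_features) candidate feature matrix."""
    selected, basis = [], []
    remaining = list(range(H.shape[1]))
    for _ in range(top_k):
        scores = {j: mv_rapid_mic(gso(H[:, j], basis), target) for j in remaining}
        j_star = max(scores, key=scores.get)
        selected.append(j_star)
        basis.append(gso(H[:, j_star], basis))
        remaining.remove(j_star)
    return selected
```

In the paper's pipeline, the features whose OMVRapidMIC score exceeds 0.51 would then be retained, as stated above.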
5. Optimal ERT Ensemble FASE Method Based on MAEC VMD State Decomposition
5.1. Optimal ERT Ensemble State Forecasting Based on ASSA
ADNs features are diverse, and the data are complex. A single prediction model is therefore limited in fitting performance and prediction accuracy and cannot fit all state sample sets satisfactorily. Ensemble learning is a machine learning approach that draws on the strengths of multiple models by training several base learners and combining their predictions into a strong learner. The ERT-based ensemble learning method consists of many decision trees. The branching features and split thresholds in the ERT algorithm are selected at random rather than optimized over subsets, which helps avoid overfitting. Moreover, each decision tree in the ERT algorithm uses all the data samples in the training set, enabling the efficient construction of a strong learner through parallel training.
Therefore, this paper adopts an ERT-based ensemble learning method for ADNs state prediction. For mode components with significant periodic patterns, the LSTNet algorithm is chosen for its strong time-series modeling capability. For mode components with significant randomness, the Transformer algorithm is chosen because it is suited to exploring nonlinear relationships in random state sequences. For mode components with significant seasonality, the Prophet algorithm is used because it captures seasonal patterns and trends well. For mode components with significant trends, the XGBoost algorithm is chosen because it captures complex trend changes and makes more accurate predictions. After determining these four machine learning algorithms as base learners, a bootstrapping method is used to obtain differentiated training samples and train the base learners. The final strong learner prediction model is obtained by combining the base learners. This study proposes an optimal ERT ensemble and, based on RMSE and MAE, constructs a model for optimizing the ensemble weights of the base learners. The model is solved with the ASSA algorithm to obtain the optimal ensemble weighting strategy and the final prediction results. This method makes full use of the performance differences and complementary abilities among the base learners and, through a simple optimization for each dataset, yields the optimal ensemble strong learner model, outperforming traditional average ensemble methods. The model for optimizing the ensemble weights of the base learners based on the RMSE and MAE performance metrics can be expressed as:
$$\min_{\varpi_1, \varpi_2, \varpi_3, \varpi_4} \; e_{\mathrm{RMSE}} + e_{\mathrm{MAE}} \quad \text{s.t.} \; \sum_{k=1}^{4} \varpi_k = 1, \; 0 \le \varpi_k \le 1 \tag{17}$$
with
$$e_{\mathrm{RMSE}} = \sqrt{\frac{1}{N_s} \sum_{i=1}^{N_s} \left(\hat{v}_i - v_{\mathrm{real},i}\right)^2}, \qquad e_{\mathrm{MAE}} = \frac{1}{N_s} \sum_{i=1}^{N_s} \left|\hat{v}_i - v_{\mathrm{real},i}\right|$$
$$\hat{v}_i = \varpi_1 v_{\mathrm{LSTNet},i} + \varpi_2 v_{\mathrm{Transformer},i} + \varpi_3 v_{\mathrm{XGBoost},i} + \varpi_4 v_{\mathrm{Prophet},i}$$
where eRMSE and eMAE represent the root mean square error and mean absolute error of the ensemble prediction, respectively; smaller values of eRMSE and eMAE indicate better state prediction performance. ϖ1, ϖ2, ϖ3, and ϖ4 are the ensemble weights of the LSTNet, Transformer, XGBoost, and Prophet base learners, respectively, each ranging over [0, 1]. Ns represents the number of samples. vLSTNet,i, vTransformer,i, vXGBoost,i, and vProphet,i are the predicted values of the i-th state sample by the LSTNet, Transformer, XGBoost, and Prophet base learners, respectively, and vreal,i represents the true value of the i-th state sample.
To solve the ensemble weight optimization model in Equation (17), Equations (A5) to (A10) are used to obtain the optimal ensemble strategy for the state ensemble prediction model in the ADNs, yielding the optimal ERT ensemble state prediction method for the ADNs.
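A minimal sketch of solving the model in Equation (17), assuming SciPy's SLSQP as a stand-in for the ASSA solver of Equations (A5) to (A10) and an equally weighted sum of the two error terms as the scalar objective (the paper's exact aggregation may differ):

```python
import numpy as np
from scipy.optimize import minimize

def fit_ensemble_weights(preds, v_real):
    """preds: (Ns, 4) array whose columns are the LSTNet, Transformer,
    XGBoost, and Prophet predictions; v_real: (Ns,) true state values."""
    def objective(w):
        v_hat = preds @ w
        e_rmse = np.sqrt(np.mean((v_hat - v_real) ** 2))
        e_mae = np.mean(np.abs(v_hat - v_real))
        return e_rmse + e_mae              # assumed equal weighting of metrics

    w0 = np.full(4, 0.25)                  # start from the average ensemble
    res = minimize(objective, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * 4,
                   constraints=[{"type": "eq",
                                 "fun": lambda w: w.sum() - 1.0}])
    return res.x
```

For only four weights, starting from the average ensemble (ϖk = 0.25) typically converges in a handful of iterations; the learned weights then define the optimal ensemble combination used for prediction.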
5.2. Overall FASE Method
To address the weak theoretical basis for FASE state decomposition in ADNs and the underutilization of performance differences among ensemble prediction models, an optimal ERT ensemble short-term FASE method based on MAEC state decomposition is proposed. The specific framework is shown in Figure 1 and can be divided into four modules.
Module 1: Construct the optimization model for VMD decomposition parameters based on MAEC. Solve the optimization problem using the ASSA algorithm in Module 2 to obtain the optimal number of decomposition modes K and the quadratic penalty factor α for the VMD decomposition of the ADNs state. Then, the decomposed dominant periodic state components (IMFs) and the residue are transferred to Module 4.
Module 2: In the nested relationship with Module 1, the ASSA algorithm in Module 2 is used to solve the VMD decomposition parameter optimization problem and obtain the optimal number of decomposition modes K and the quadratic penalty factor α. In the nested relationship with Module 4, the ASSA algorithm in Module 2 is used to solve the optimization problem for ensemble learning weights and obtain the optimal ensemble strategy for the base learners.
Module 3: Calculate the OMVRapidMIC among calendar rules, weather features, and ADNs state data. The weather features comprise temperature, relative humidity, visibility, atmospheric pressure, wind speed, cloud coverage, wind direction, and precipitation intensity (eight categories), and the calendar rules comprise month, day, hour, week type, and holiday (five categories). For the ADNs state variables, highly correlated features are retained and the remaining features are filtered out. The input feature sequences are defined in Table 1. Transfer the processed feature information to Module 4.
Module 4: The processed IMF states, residual components, calendar rules, weather features, and ADNs state data are combined to form the ADNs sample dataset. LSTNet, Transformer, XGBoost, and Prophet models are built as basic learning models for state prediction. Basic learning models are trained using the bootstrap sampling method. A weighted optimization model for integrating basic learning models is constructed based on RMSE and MAE and is solved using the ASSA algorithm from Module 2. This yields the optimal ensemble strategy and the ensemble strong learner model. Finally, the prediction results are obtained, and combined with the measurement values, the state estimation is performed using CPF.
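As an illustration of the Module 1/Module 2 nesting, the sketch below runs VMD over a small (K, α) grid and scores each decomposition. Assumptions: a plain grid search stands in for ASSA, the spectral-concentration proxy stands in for the MAEC objective defined in Section 3, and the VMD routine comes from the vmdpy package.

```python
import numpy as np
from vmdpy import VMD  # pip install vmdpy

def energy_concentration(modes, band=5):
    """Proxy objective (assumption, not the paper's MAEC): fraction of each
    mode's spectral energy within +/- `band` FFT bins of its dominant
    frequency, averaged over modes."""
    scores = []
    for u in modes:
        spec = np.abs(np.fft.rfft(u)) ** 2
        k = int(np.argmax(spec))
        lo, hi = max(0, k - band), min(len(spec), k + band + 1)
        scores.append(spec[lo:hi].sum() / spec.sum())
    return float(np.mean(scores))

def search_vmd_params(signal, k_grid=range(3, 12),
                      alpha_grid=(500, 1000, 1500, 2000)):
    """Grid-search (K, alpha) for the highest average energy concentration."""
    best_k, best_alpha, best_score = None, None, -np.inf
    for K in k_grid:
        for alpha in alpha_grid:
            # tau=0 (no noise slack), DC=0, init=1 (uniform), tol=1e-7
            u, _, _ = VMD(signal, alpha, 0, K, 0, 1, 1e-7)
            score = energy_concentration(u)
            if score > best_score:
                best_k, best_alpha, best_score = K, alpha, score
    return best_k, best_alpha, best_score
```

Section 6.5 reports the optima found by the paper's ASSA search (e.g., K = 8, α = 1500 for voltage magnitude); a single VMD call at those parameters yields the IMFs and residue passed to Module 4.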
6. Simulated Examples
To validate the superiority of the proposed algorithm in large-scale power systems, simulation tests are conducted on the IEEE 118-bus system, as shown in Figure 2. The load data are obtained from a real-world household electricity dataset provided by the Massachusetts Institute of Technology (MIT), covering the period from 1 January 2016, 00:00, to 14 December 2016, 23:00, with a time granularity of 15 min [33]. The load data are multiplied by the reference load values of each load bus, and the multi-section power flow true values are obtained using power flow calculations. Gaussian white noise is added to the true power flow values to generate historical multi-section measurement data, forming a historical measurement database. Weather data are taken from the publicly available local meteorological information on the website of the National Renewable Energy Laboratory (NREL) in the United States [34]. Training is performed on an Intel Core i7-10800 CPU @ 3.2 GHz with an Nvidia GeForce GTX 1650 Ti (4 GB) GPU (Santa Clara, CA, USA) and 16 GB of memory.
To ensure full coverage of monthly feature information in the training samples and improve the fitting performance of the prediction model, the data of each month are divided into training, validation, and testing sets in a 7:2:1 ratio, which are then combined to form the annual training, validation, and testing sets. The 7:2:1 ratio maintains a reasonable sample distribution among the three sets:
Training set (70%): The training set constitutes 70% of the total data and is used for model training. A larger training set helps the model learn data features and patterns, improving its fitting performance.
Validation set (20%): The validation set accounts for 20% of the total data and is utilized for model tuning and performance validation. Different parameter tests and selections are performed on the validation set to optimize the model’s hyperparameters, prevent overfitting, and enhance its generalization ability.
Test set (10%): The test set represents 10% of the total data and is employed to assess the model’s final performance. The model’s performance on the test set provides an objective evaluation of its predictive efficacy in real-world scenarios, thereby validating the model’s accuracy and reliability.
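A minimal sketch of this monthly 7:2:1 split, assuming the 15-min series is held in a pandas DataFrame with a DatetimeIndex (names illustrative):

```python
import pandas as pd

def monthly_split(df, ratios=(0.7, 0.2, 0.1)):
    """Split each calendar month 7:2:1 in time order, then concatenate the
    per-month pieces into annual training/validation/test sets."""
    train, val, test = [], [], []
    for _, month in df.groupby([df.index.year, df.index.month]):
        n = len(month)
        a = int(ratios[0] * n)
        b = a + int(ratios[1] * n)
        train.append(month.iloc[:a])
        val.append(month.iloc[a:b])
        test.append(month.iloc[b:])
    return pd.concat(train), pd.concat(val), pd.concat(test)
```

Splitting within each month before concatenating keeps every month represented in all three sets, which is the stated goal of the 7:2:1 scheme.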
The first half of the validation set is used for hyperparameter optimization of the prediction model, while the second half is used for ensemble weight optimization. The proposed prediction methods are validated using a rolling prediction approach with a 15-min prediction step, resulting in a total of 33,504 data samples available for simulation. The input features of the proposed state forecasting model include the state IMF set, calendar rules, and meteorological features, as detailed in Appendix C.
6.1. Data Preprocessing
The presence of outliers in the original ADNs state data can significantly affect the accuracy of state estimation. Therefore, the random forest algorithm is used to identify and remove outlier data, and missing values are filled in using the average of neighboring samples.
Random forest consists of multiple decision trees, each based on different data subsets and feature subsets. For each sample data, a prediction is made in the random forest model, resulting in an anomaly score for that sample. The anomaly score reflects the sample’s abnormality relative to other samples. Typically, the anomaly score can be obtained by calculating the average path length or the standardized average path length of the sample across all decision trees. Based on the distribution of anomaly scores, an anomaly threshold can be set to determine anomalies. Samples with scores exceeding this threshold are considered anomalies. For new samples, anomaly detection is performed using the set threshold. If the anomaly score of a new sample exceeds the threshold, it is labeled as an anomaly. Specifically, let
Y be a sample in the dataset and RF be the random forest model comprising κ decision trees; the anomaly score AS_Y of the sample can then be calculated as follows:
$$AS_Y = 2^{-\frac{1}{\kappa} \sum_{k=1}^{\kappa} \frac{RL(Y,\, RF_k)}{c(n)}}$$
where RL(Y, RF_k) represents the path length of sample Y in the k-th decision tree RF_k, and c(n) is the standardization term, i.e., the average path length for a sample set of size n. The higher the anomaly score, the more abnormal the sample. In this paper, 200 decision trees are used, and the subsample size is 512. The identified measurement anomalies are confirmed by a simple vote across the trees, ranked, and replaced with pseudo-measurements generated by the predictive model.
To avoid the adverse effects of dimensional differences between input features on model training, the Z-Score algorithm is applied to normalize the model input features uniformly.
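The path-length scoring described above matches isolation-style trees; the following is a minimal sketch assuming scikit-learn's IsolationForest as the scorer (200 trees and a subsample size of 512, as stated; the contamination threshold and the neighbor-average fill are illustrative choices):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def clean_states(X, contamination=0.01):
    """X: (n_samples, n_features) state history. `contamination` sets the
    anomaly threshold and is an illustrative value, not from the paper."""
    iso = IsolationForest(n_estimators=200, max_samples=512,
                          contamination=contamination, random_state=0)
    outliers = iso.fit_predict(X) == -1     # -1 marks anomalies
    Xc = X.astype(float).copy()
    Xc[outliers] = np.nan
    # fill each removed value with the average of its temporal neighbors
    for j in range(Xc.shape[1]):
        col = Xc[:, j]
        for i in np.where(np.isnan(col))[0]:
            lo = col[i - 1] if i > 0 else np.nan
            hi = col[i + 1] if i + 1 < len(col) else np.nan
            col[i] = np.nanmean([lo, hi])   # stays NaN if both neighbors missing
    # Z-Score normalization of the cleaned features
    return (Xc - np.nanmean(Xc, axis=0)) / np.nanstd(Xc, axis=0)
```

In the paper, confirmed measurement anomalies are instead replaced with pseudo-measurements generated by the predictive model; the neighbor-average fill above corresponds to the missing-value handling mentioned at the start of this subsection.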
6.2. Feature Dimensionality Reduction
Massachusetts has warm summers, cold winters, and evenly distributed precipitation, characteristic of a temperate continental climate. According to Table 2, the ADNs state variables are highly correlated with meteorological features such as temperature, visibility, wind speed, cloud cover, and precipitation, as well as with calendar-rule features such as month, hour, and holiday. These results are consistent with the actual climate conditions and geographic location of Massachusetts, validating the rationality and effectiveness of the proposed OMVRapidMIC feature dimensionality reduction method.
6.3. Comparison of Model Parameter Settings and Evaluation Metrics
To thoroughly validate the performance of the proposed FASE method for ADNs, we conducted comparative experiments. For state prediction models, we compared LSTNet, Transformer, XGBoost, Prophet, and the average ensemble prediction model. For state decomposition methods, we compared no decomposition, traditional VMD decomposition, EMD decomposition, and CEEMDAN decomposition. For overall estimation models, we compared traditional FASE models such as Holt's method and deep neural networks (DNN). We verified the accuracy, applicability, and effectiveness of the proposed method from the perspectives of prediction models, decomposition methods, and estimation models. The parameters for each comparison model can be found in Appendix D.
6.4. Comparison with Different Prediction Models
To validate the accuracy of the proposed optimal extreme random tree ensemble prediction model, we selected samples from the test set for comparative analysis with different prediction models. The average ensemble prediction model is denoted M-extra tree, and the proposed optimal ensemble prediction model is denoted O-extra tree. Figure 3 compares the voltage magnitude prediction results for the March, June, September, and December 2016 test sets. The corresponding voltage phase prediction results can be found in Appendix E. Evaluation metrics for the year-round test set predictions and the representative seasonal test set predictions are provided in Table 3 and Appendix E, respectively.
Compared with the four individual prediction models, XGBoost, Prophet, Transformer, and LSTNet, the proposed optimal ensemble prediction model optimizes and adjusts the ensemble weights of each individual model based on evaluation metrics, achieving the optimal combination of individual models. This approach effectively avoids the problem of insufficient fitting performance of individual models and further improves prediction accuracy. As shown in Figure 3, in the voltage magnitude prediction of test sets for four different months, the predictions of the three individual models are similar, while the optimal ensemble prediction model fully exploits the potential of ensemble integration among the individual models, resulting in superior prediction results. According to Appendix E Table A3, the proposed optimal ensemble forecasting model achieves MAE values of 9.1078 × 10⁻⁵, 1.7872 × 10⁻⁴, 9.3772 × 10⁻⁵, and 1.8911 × 10⁻⁴ in voltage magnitude forecasting for March, June, September, and December, respectively. In voltage phase forecasting, the MAE values are 3.5369 × 10⁻³, 3.7371 × 10⁻³, 3.6911 × 10⁻³, and 3.7017 × 10⁻³, lower than any individual forecasting model. Moreover, the RMSE values in voltage magnitude and phase prediction for March, June, September, and December are also the smallest, validating the effectiveness and accuracy of the proposed optimal ensemble prediction model compared to individual models.
Compared with the average ensemble prediction model, the proposed optimal ensemble prediction model has a more flexible weighting scheme, avoiding the adverse effects of poorly performing individual models on the ensemble prediction. As shown in Figure 3, in voltage magnitude forecasting, the predictions of the XGBoost and Prophet models deviate significantly from the true values, while the predictions of the Transformer and LSTNet models are closer to the true values, with MAE values of 2.6661 × 10⁻³, 1.9451 × 10⁻³, 1.7239 × 10⁻³, and 1.0920 × 10⁻³, respectively. The traditional average ensemble forecasting model assigns equal weights to the four models, so its overall forecasting performance is heavily influenced by the XGBoost and Prophet models, with MAE and RMSE values of 4.7705 × 10⁻⁴ and 5.7173 × 10⁻⁴, respectively. In contrast, the proposed optimal ensemble forecasting model effectively mitigates the adverse effects of the poorly performing XGBoost and Prophet models, yielding MAE and RMSE values of 9.5620 × 10⁻⁵ and 1.4902 × 10⁻⁴, respectively. The MAE is reduced by 79.9559% and the RMSE by 73.9352% compared to the average ensemble prediction model, further confirming the effectiveness and accuracy of the proposed optimal ensemble prediction model.
6.5. Comparison with Different State Decomposition Methods
Optimizing the number of decomposition modes and the quadratic penalty factor of VMD using ASSA, with the objective of maximizing the MAEC, yields VMD decomposition parameters of [K = 8, α = 1500] for voltage magnitude and [K = 8, α = 1240] for voltage phase. The corresponding maximum average energy concentrations are 0.8691 and 0.7438, respectively. The detailed decomposition results for both states are provided in Appendix F.
For comparison, the state decomposition methods considered include no decomposition, traditional VMD decomposition, EMD decomposition, and CEEMDAN decomposition. The corresponding evaluation metrics for the forecast results are shown in Table 4.
Due to the mode mixing problem of the EMD decomposition method, the state change features cannot be extracted well. In addition, compared to the non-decomposition approach, the EMD method introduces more complexity in the input features of the corresponding prediction model, making model fitting more challenging. As shown in Table 4, the EMD method has higher MAE and RMSE values in state prediction than the non-decomposition approach. Although the non-decomposition approach avoids the mode mixing problem of the EMD method, its limited input features prevent the prediction model from effectively capturing the randomness and fluctuations in the original load signal. Therefore, the performance of the non-decomposition approach is limited and only outperforms the prediction model corresponding to the EMD method.
As an improvement over the EMD method, the CEEMDAN method effectively reduces the mode mixing phenomenon by adding matched positive and negative Gaussian white noise. Thus, compared to the non-decomposition approach, the CEEMDAN method achieves better forecasting performance. In state forecasting, the CEEMDAN method combined with the O-extra tree yields MAE and RMSE values of 9.7525 × 10⁻⁵ and 1.6113 × 10⁻⁴, respectively, which are very close to those of the proposed state decomposition method.
However, the proposed VMD state decomposition method based on the maximum average energy concentration achieves better prediction performance than the other four state decomposition approaches when combined with different prediction models, validating the applicability of the proposed state decomposition method.
6.6. Comparisons with Other Forecasting-Aided State Estimation Methods
To validate the effectiveness of the proposed FASE method, a comparative analysis is conducted with other FASE methods. The results are shown in Figure 4 and Table 5.
Based on Figure 4 and Table 5, it can be observed that the proposed method achieves the highest estimation accuracy, with evaluation metrics one order of magnitude lower than those of the other algorithms. This is because the proposed method incorporates the maximum average energy concentration into state decomposition and optimal model ensemble prediction, allowing better handling of data complexity and uncertainty and thus improving the accuracy and stability of state estimation.