4.2. Hierarchical Division of Influencing Indicators
In practice, official data sometimes diverge from observed data, which may be related to data instability and the time lag in the release of government data. For instance, the value reported for self-generated electricity in 2021 was the same as that in 2020, but the 2022 value was 19.5% higher than in 2020 and 2021. More generally, some indicators remain relatively stable for several consecutive years before increasing or decreasing sharply in a particular year. This phenomenon may stem from inconsistent data update frequencies: government agencies or data providers might release data irregularly, resulting in significant updates in some years. Additionally, changes in data collection methods or definitions can cause data discontinuity and nonlinearity.
To address issues such as policy effect lag, lack of historical data support, and data discontinuity in influencing factors, as well as to determine the quantitative relationship between long-term electricity demand, policy objectives, and meteorological factors, we organize the hierarchy of policy influencing factors by tracing the paths to policy objectives and calculating the relationships among influencing factors, following the classification of meteorological influences by Ahmed et al. [13].
(1) Primary Indicators
Primary indicators are factors closely related to electricity demand. Although their data may not be frequently updated, their relationship with electricity demand is more pronounced. The electrification rate is chosen as an indicator of the proportion of electricity in end-use energy consumption, reflecting the trend of energy consumption shifting towards electricity, directly influenced by energy conservation and emission reduction policies. Gross Domestic Product (GDP) and energy consumption intensity are selected to assess energy efficiency in economic activities. Energy consumption intensity, focusing on the amount of energy used per unit of GDP, is a key indicator for measuring the effectiveness of policies like energy conservation and emission reduction. The amount of self-generated electricity reflects the acceptance of self-generation systems by consumers and their capacity to interact with the grid, directly related to the effectiveness of distributed energy and demand response policies.
(2) Secondary Indicators
Secondary indicators, as an intermediate layer, link primary and tertiary indicators. They are reasonably feasible in terms of data updates and have a certain relationship with electricity demand. For instance, the electricity substitution rate describes the proportion of electricity substituting other energy sources (such as coal, oil), reflecting the depth of the energy structure adjustment driven by policies. The renewable energy penetration rate shows the share of renewable energy in the total energy supply. The proportion of the secondary industry in GDP reflects the impact of industrial upgrading policies and measures to enhance energy consumption intensity. Proportions such as distributed power installation, fuel cell installation, and power storage installation capacity serve as manifestations of self-generated electricity in policy objectives. According to research by Ahmed, temperature is the most significant factor affecting electricity consumption, impacting evaporation, humidity, and atmospheric pressure. Therefore, correlations exist between temperature and these three weather variables.
Consequently, the electricity substitution rate, renewable energy penetration rate, proportion of the secondary industry in GDP, proportion of distributed power installed capacity, proportion of fuel cell installations, proportion of energy storage installed capacity, humidity, and atmospheric pressure are selected as secondary micro indicators.
(3) Underlying/Tertiary Indicators
Tertiary indicators are more specific and operational factors, with relatively frequent data updates, and they significantly capture changes in electricity demand. The price of new energy storage, a key factor influencing the proliferation of new energy storage devices, profoundly affects the distributed power installation proportion, fuel cell installation proportion, and power storage installation capacity. The proportion of the facility’s energy-saving retrofit investment in GDP, the proportion of the power distribution network construction investment in GDP, and the proportion of low-carbon technology R&D investment in GDP are major measures to promote low-carbon transition and industrial structure optimization and upgrading. Power storage installation subsidies, distributed power installation subsidies, and fuel cell installation subsidies directly relate to enhancing the renewable energy coverage rate and are specific manifestations of government policies to promote energy transformation. According to Ahmed’s research, humidity affects rainfall, and atmospheric pressure affects wind speed.
Therefore, new energy storage prices, energy-saving retrofit investments in facilities as a proportion of GDP, investments in power distribution network construction as a proportion of GDP, R&D investments in low-carbon technology as a proportion of GDP, energy storage installation subsidies, distributed power installation subsidies, fuel cell installation subsidies, rainfall, and sunshine duration are selected as underlying indicators.
This multilevel analytical framework helps us to understand changes in policy indicators and examine data anomalies from a more comprehensive perspective, reducing fluctuations caused by single factors.
4.3. Historical Patterns of Total Electricity Demand and Selected Key Indicators
Historical monthly electricity demand was plotted against selected primary indicators. If demand had a linear relationship with a single primary indicator, a simple linear regression would be adequate and multiple regression unnecessary. Using historical data from 2015 to 2020, the Pearson correlation coefficients [52] between the primary indicators and electricity demand were calculated. To calculate the Pearson correlation coefficients, we ranked these indices based on their impact on electricity demand, as detailed in Section 4.2. We used these rankings to assess how the indices interact with each other and evolve over time, rather than focusing on their specific numerical values.
The relationships between electricity demand and various primary indicators are depicted in Figure 5. Using bivariate correlation analysis, we highlighted the relationships between different variables, noting that none displayed a linear relationship with demand. However, cooling degree days (CDD), heating degree days (HDD), electrification rates, energy consumption intensity, and gross product value seem to have a positive correlation with electricity demand. On the other hand, self-generated electricity appears to have a negative correlation with electricity demand.
Table 4 presents the specific Pearson correlation coefficient matrix. If the correlation coefficient between two factors is close to 1, it indicates a strong positive correlation between them; if the correlation coefficient is close to −1, it indicates a strong negative correlation, meaning one factor increases as the other decreases. If the correlation coefficient is close to 0, it suggests a weak relationship, possibly with no significant linear correlation. Based on the results of the correlation coefficients, those factors with strong linear correlations are removed to avoid the issue of multicollinearity. From Table 4, it can be seen that multiple variables are needed to establish a relationship with electricity demand. Multiple regression analysis will help in connecting only those variables that are significantly related to demand and in excluding non-significant variables. It will also assist in quantifying the impact of individual variables on the demand pattern.
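The correlation screening described above can be illustrated with a minimal sketch. The series below are made-up values chosen only to show the mechanics; they are not the data behind Table 4, and the helper names `pearson` and `correlation_matrix` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up illustrative values, NOT the study's data.
data = {
    "demand": [100.0, 104.0, 109.0, 113.0, 120.0, 124.0],
    "electrification_rate": [0.20, 0.21, 0.23, 0.24, 0.26, 0.27],
    "self_generated": [30.0, 29.0, 27.5, 27.0, 25.0, 24.5],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

In this toy data the electrification rate rises with demand and self-generated electricity falls, so the signs of the coefficients mirror the qualitative pattern reported for Figure 5.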
Apart from the correlations between the main variables and power demand, the pairwise correlation coefficients among the explanatory variables are all below 0.4, indicating that the correlations among the variables are not high. Therefore, multicollinearity is avoided in the model. A multivariate regression analysis is then conducted on power demand by constructing a multivariate linear regression model. This study employs logarithmic variables, and the model is defined by the following equation:
In this equation, the dependent variable is the power demand of the industrial sector in year t, and the explanatory variables are the gross output value, the electrification rate, the energy consumption intensity, heating degree days, cooling degree days, and self-generated electricity; c is the intercept, and the random error term reflects the impact of other factors not considered.
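A log-linear model with the six primary indicators listed above could be written as in the sketch below. The symbol names ($E_t$ for demand, $G_t$ for gross output value, $ER_t$, $EI_t$, $HDD_t$, $CDD_t$, $SG_t$, the coefficients $\beta_k$, and $\varepsilon_t$) are assumptions chosen for illustration, and it is also an assumption that every variable enters in logarithms; only the intercept $c$ and the index $t$ come from the text.

```latex
\ln E_t = c + \beta_1 \ln G_t + \beta_2 \ln ER_t + \beta_3 \ln EI_t
        + \beta_4 \ln HDD_t + \beta_5 \ln CDD_t + \beta_6 \ln SG_t + \varepsilon_t
```

Under this form each $\beta_k$ would read as an elasticity: the approximate percentage change in demand associated with a 1% change in the corresponding indicator, holding the others fixed.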
Subsequently, a difference equation that reflects the changes of the dynamic system between time points is constructed as follows [53]:
where $X_t$ is the state of the system at time $t$, and $X_{t-1}$ is the state of the system at the previous time $t-1$. The integral represents the cumulative effect of changes in the state $X$ over time, with the integrand being the change in $X$ at time $t$. This integral term can be considered as the cumulative effect of the system state in the time series.
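One way to write a difference equation matching this description is sketched below. The notation $\Delta X(\tau)$ for the change in the state and the integration limits from $t-1$ to $t$ are assumptions, not taken from the source.

```latex
X_t = X_{t-1} + \int_{t-1}^{t} \Delta X(\tau)\, \mathrm{d}\tau
```

With annual data the integral collapses to a single yearly increment, so the state in year $t$ is the previous state plus that year's adjustment.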
According to Equation (3), introducing the difference equations for each indicator, the calculation formula for power demand can be further expressed as:
where,
The modes of influence of policy and meteorological factors may differ: policy factors have a cumulative effect, meaning their impact accumulates over time, while meteorological factors are immediate and transient, with their impact occurring as soon as meteorological conditions change. Therefore, meteorological factors are treated as instantaneous variables and are not subjected to cumulative treatment. The adjustment terms in the formula represent the adjustment amounts for the four policy-related primary indicators in year t, while the adjustment rate of each factor will be derived through the established secondary and underlying regression equations.
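The distinction between cumulative and instantaneous variables can be sketched as follows. This is a simplified illustration that assumes a one-year time step and an additive adjustment; the function names and the numbers are hypothetical and do not come from the study.

```python
def cumulative_path(initial_value, adjustments):
    """Policy-type indicator: each year's state is last year's state plus
    that year's adjustment, so effects accumulate over time."""
    path = [initial_value]
    for delta in adjustments:
        path.append(path[-1] + delta)
    return path

def instantaneous_path(observations):
    """Meteorological-type indicator: the observed value is used as-is,
    with no carry-over from earlier years."""
    return list(observations)

# Hypothetical numbers for illustration only.
electrification = cumulative_path(0.25, [0.01, 0.02, 0.0])
cooling_degree_days = instantaneous_path([1200, 1350, 1100])

print(electrification)
print(cooling_degree_days)
```

A year with zero adjustment leaves the policy indicator at its accumulated level, whereas the meteorological series simply takes whatever value is observed that year.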
4.4. Analysis of Correlation of Influencing Indicators
Past studies often did not incorporate both policy factors and meteorological factors into the power demand forecasting model. To ensure the effectiveness of the influencing indicators and of the electricity demand forecasting model, we therefore conduct a correlation analysis to further explore the relationships between these indicators. The correlation coefficients between the secondary indicators and primary indicators are shown in Table 5, where all secondary indicators are significantly correlated with at least one primary indicator.
The correlation coefficients between the underlying indicators and the selected secondary indicators are shown in Table 6, where all underlying indicators are significantly correlated with at least one secondary indicator.
In multivariate regression analysis, in addition to calculating correlations, it is also necessary to check for collinearity. Indicators at the same level should not exhibit collinearity, as it inflates the variance of the estimated regression coefficients and thus reduces the accuracy of the model parameters. To quantitatively assess this collinearity, we used the variance inflation factor (VIF) to measure the degree of multicollinearity for each variable. The VIF of each variable is calculated as follows:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination of the regression model in which the $i$-th variable is treated as the dependent variable and all other variables as independent variables.
Through the VIF analysis of the indicators, we found that the VIF values for the fuel cell installation proportion and energy storage installed capacity proportion, as well as for fuel cell installation subsidies and energy storage installation subsidies, are significantly higher than 5. This indicates a strong collinearity between these pairs of variables.
Based on these findings, we decided to exclude the fuel cell installation proportion and fuel cell installation subsidy from the regression model. This decision is based on reducing collinearity in the model to improve the accuracy of the model estimates. The filtered hierarchy of electricity demand indicators is shown in Figure 6.
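The VIF screening can be illustrated with a small, dependency-free sketch. The three series are made-up: two are nearly collinear by construction, standing in for a pair such as the fuel cell and energy storage proportions, and the third is unrelated. All function names are hypothetical, and the study's own data and tooling are not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def r_squared(y, predictors):
    """R^2 of an OLS regression of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[k] for p in predictors] for k in range(len(y))]
    size = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(size)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF_i = 1 / (1 - R_i^2) for every column in a dict of name -> values."""
    result = {}
    for name, values in columns.items():
        others = [v for other, v in columns.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(values, others))
    return result

# Made-up illustrative values, NOT the study's data.
data = {
    "storage_share": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "fuel_cell_share": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1],
    "substitution_rate": [2.0, 5.0, 1.0, 6.0, 6.0, 1.0, 5.0, 2.0],
}

full = vif(data)
# Drop one member of the collinear pair and recompute.
reduced = vif({k: v for k, v in data.items() if k != "fuel_cell_share"})
print(full)
print(reduced)
```

In this toy example both members of the collinear pair exceed the threshold of 5, and removing one of them brings the remaining values back below it, which is the same remedy applied in the text.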
4.5. Electricity Demand Regression Model
The hierarchical structure of the influencing factors can be observed from Figure 6, where each upper-level influencing factor is influenced by multiple underlying factors. Based on historical data, multivariate regression analysis on the screened indicators quantifies the impact weights of the different influencing factors, thus yielding a quantitative predictive equation with explanatory power for reality. The quantitative relationship model is as follows:
The symbols in this model denote, respectively: the initial data of the i-th indicator in year t; the initial data of the i-th indicator of the next-level indicator in year t; the regression coefficient of the i-th indicator; and the constant term of the multivariate regression equation. The model also uses a calculation function for the rate of change of an indicator in year t, with the formula as follows:
Taking the electrification rate regression equation as an example, based on the indicator correlation screening in Section 4.4, it is found that the primary indicator, electrification rate, is correlated with the secondary indicators electricity substitution rate, proportion of energy storage installed capacity, and proportion of distributed power installed capacity. Therefore, the regression equation for the electrification rate can be expressed as follows:
where the variables are the electrification rate, the electricity substitution rate, the proportion of distributed power installed capacity, and the proportion of energy storage installed capacity, each for the industrial sector in year t.
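The structure of such a level-to-level regression can be sketched as below. Two points are assumptions rather than statements from the source: the rate of change is taken to be the year-over-year relative change, and the upper-level rate is taken to be a linear combination of the lower-level rates plus a constant. The coefficient values, series, and function names are hypothetical.

```python
def rate_of_change(series):
    """Year-over-year relative change: (x_t - x_{t-1}) / x_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]

def predict_rate(coefficients, constant, lower_level_rates):
    """Linear combination of lower-level rates of change plus a constant."""
    return constant + sum(
        coefficients[name] * rate for name, rate in lower_level_rates.items()
    )

# Hypothetical secondary-indicator series (two consecutive years each).
secondary = {
    "electricity_substitution_rate": [0.30, 0.33],
    "distributed_power_share": [0.10, 0.12],
    "energy_storage_share": [0.05, 0.06],
}
# Hypothetical regression coefficients and constant term.
coefficients = {
    "electricity_substitution_rate": 0.5,
    "distributed_power_share": 0.2,
    "energy_storage_share": 0.1,
}
constant = 0.001

latest_rates = {name: rate_of_change(values)[-1] for name, values in secondary.items()}
electrification_rate_change = predict_rate(coefficients, constant, latest_rates)
print(electrification_rate_change)
```

The predicted rate would then feed the adjustment amount of the primary indicator, while the coefficients themselves would be estimated from historical data rather than set by hand as they are here.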