1. Introduction
With the intensifying global warming problem, CDE management has become a pivotal issue in the global battle against climate change and the advancement of sustainable development. Accurate CDE predictions are crucial for formulating effective emission reduction policies, optimizing energy structures, adjusting industrial frameworks, and fostering technological innovation. Consequently, CDE prediction is at the forefront of environmental science research and is an indispensable component of global environmental governance and sustainable development planning.
In models based on neural networks, the selection of key influencing factors significantly impacts both the accuracy and reliability of the prediction results. Therefore, a comprehensive analysis of the correlation between CDE and relevant influencing factors, as well as an exploration of their underlying mechanisms, is essential for improving the accuracy of the CDE predictions. In terms of CDE management, elucidating the intricate relationships between CDE and its influencing factors is also of great significance for the formulation and implementation of carbon management policies.
This study selects the Bohai Rim region in China as the research subject, which primarily comprises four provinces and cities: Tianjin, Hebei, Shandong, and Liaoning. The main reasons for choosing this region are as follows:
- (1)
Economic Significance: the Bohai Rim region is one of China’s key economic areas, characterized by developed industries and a significant economic output, which exerts a notable influence on the country’s carbon emissions.
- (2)
Industrial Structural Similarity: these four provinces and cities share similar industrial structures, especially in heavy industry and the chemical industry. Studying them together therefore helps to characterize the overall carbon emission profile of the Bohai Rim region.
- (3)
Data Availability: Given the feasibility of the research and the completeness of the data, we utilized panel data spanning from 1999 to 2021 for this region, relying on reference data sources from Guan, Y. et al. [1] and websites [2,3]. This ensures that sufficient data are available to support the analysis throughout the research process.
To investigate the intricacies of provincial CDE predictions in China’s Bohai Rim region, influenced by a multitude of complex factors, we have adopted a dynamic multi-factor correlation analysis method. This approach aims to improve the prediction accuracy and address regional variations. Additionally, it meets the scientific requirements for informing carbon reduction strategies. With this method, we seek to contribute to the existing literature on CDE prediction and management, ultimately facilitating practical applications in environmental science and policy formulation. To achieve this objective, the following questions will be addressed:
- (1)
What are the mechanisms by which different categories of influencing factors affect CDE?
- (2)
How can the impact of influencing factors on CDE be dynamically described?
- (3)
How can the differences in the effects of the mechanisms affecting CDE across regions be quantified?
- (4)
How can key influencing factors be selected to improve prediction accuracy?
The remainder of this paper is organized as follows:
Section 2 reviews the influence mechanism of different categories of factors on CDE and their representation.
Section 3 describes the dynamic correlation features between CDE and its influencing factors.
Section 4 analyzes the consistency of influencing factors according to different features.
Section 5 compares the prediction accuracy of models using influencing factors with different features.
Section 6 presents the discussion, and
Section 7 concludes the paper.
2. Literature Review
The current prediction of CDE often incorporates various categories of factors, including economic development, urbanization, and technological advancements. As highlighted in [4,5,6], exploring the mechanisms of the influencing factors on CDE and conducting in-depth identification of key influencing factors significantly impact the accuracy of CDE predictions.
The relationship between economic development and CDE is evident in the correlation observed between economic growth and CDE across various regions and stages of development, further manifesting in regional disparities and individual differences in CDE levels. Multiple factors, such as economic growth, industrial structure, energy composition, and policy formulation, influence this correlation. Abid, M. [
7] investigated the positive correlation between economic development and CDE. Liao, H. et al. [
8] validated the nonlinear relationship between economic growth and CDE. The Environmental Kuznets Curve (EKC) hypothesis suggests that as levels of economic development increase, environmental pollution initially rises and then declines, following an inverted U-shaped trajectory. However, Mikayilov, J.I. et al. [
9] analyzed the long-term impact of economic growth in Azerbaijan using various cointegration methods while proposing that the EKC hypothesis may not be applicable in that specific context. Furthermore, the relationship between economic development and CDE demonstrates significant variations across different regions and individuals. An analysis by Li, W. et al. [
10] showed that countries form distinct higher-order clusters in terms of the relationship between economic development and CDE, reflecting differences in stages of economic development, energy structures, policy frameworks, and other factors among nations. Both industrial structure and energy composition have a substantial impact on CDE; a higher proportion of heavy industry and high-energy-consuming sectors often results in increased CDE. Nie, Y. et al. [11] examined the disparities in industrial structure among the eastern, central, and western regions of China, which lead to divergent trends in CDE.
Urbanization significantly influences CDE by driving infrastructure development, population aggregation, and the expansion of economic activities. On the one hand, the increase in energy consumption and changes in the industrial structure resulting from urbanization often lead to an increase in CDE [
12,
13]. On the other hand, urbanization also fosters technological innovation and enhances energy efficiency, which can help mitigate the growth of CDE to some extent [
14,
15]. Zhang, Y. et al. [
16] conducted research using Beijing as a case study to examine the effects of policy interventions during the urbanization process on CDE, highlighting the critical role of strategic urban planning and effective governance practices in reducing CDE. Wang, S. et al. [
17] examined the various effects of urbanization on CDE using panel data analysis. Furthermore, Abdallh, A.A. et al. [
18] highlighted the mediating role of energy consumption in shaping the relationship between urbanization and CDE, emphasizing that improving energy efficiency is essential for curbing the growth of CDE. Meanwhile, Musah, M. et al. [
19] revealed the importance of optimizing the industrial structure and pursuing low-carbon transformations for reducing CDE. Notably, significant differences exist in how urbanization relates to CDE across various regions and countries. Li, J. et al. [
20] confirmed that factors such as economic structure, energy efficiency, and policy environment have different impacts on CDE across regions, while Wang, Y. et al. [
21] identified a nonlinear relationship between levels of urbanization and CDE among different countries.
Technological innovation serves as an effective strategy to address the increase in CDE by reducing energy consumption, enhancing energy efficiency, and effectively driving economic growth activities. However, the impact of technological innovation on CDE exhibits regional disparities. For instance, in China, effects vary across eastern, central, and western regions [
22]; similarly, in Malaysia, although specific regions are not explicitly distinguished, it can be inferred that differing levels of technological innovation and environmental conditions may result in variations in emission reduction outcomes [
23]. Consequently, the relationship between technological innovation and CDE is complex and multidimensional, particularly in developing economies. Research conducted by Cheng, S. et al. [
24] indicated that technological innovations in renewable energy positively affect CDE intensity in low quantile regions while exerting a negative impact in high quantile areas; conversely, the effect of fossil fuel-related innovation is the opposite. Zhang, M. et al. [
25] performed studies utilizing provincial panel data from China to investigate how technological innovation influences CDE. Their findings revealed that such innovations indirectly contribute to reductions in CDE through improvements in energy efficiency and exhibit spatial spillover effects on neighboring provinces. Ali, W. et al. [
23] identified a bidirectional causal relationship between energy consumption and economic growth, as well as between economic growth and technological innovation in the short term. Furthermore, Erdogan, S. [
26] emphasized that CDE is a cumulative variable influenced by historical values, thus requiring consideration of dynamic effects. Eslamipoor, R. et al. [
27] proposed a green supply chain model under a carbon cap and highlighted the critical role of policymakers and the importance of setting allowable emission limits.
In summary, economic development and industrial structure are the primary determinants of CDE, particularly in developing countries [28,29]. Meanwhile, the advancement of urbanization, accompanied by increased energy consumption and transport-related activities, has a significant impact on CDE [30]. Furthermore, technological advancements have become a crucial factor influencing CDE by optimizing industrial and energy structures [31]. The intricate interactions between these factors, along with other relevant elements, collectively determine the trajectory and magnitude of CDE trends [32].
Currently, the primary approaches for describing the relationship between CDE and its influencing factors predominantly encompass statistical analysis, decomposition analysis, grey relational analysis, and artificial intelligence analysis. Statistical analysis commonly employs techniques such as correlation, regression, factor, and principal component analysis. Raihan, A. et al. [
33] utilized Dynamic Ordinary Least Squares (DOLS) and Canonical Cointegrating Regression (CCR) to analyze the dynamic implications of factors such as economic growth and energy consumption on CDE, revealing the causal relationships among these factors. Wang, Z. et al. [
34] identified the principal influencing factors and their interactions using factor analysis (FA) and subsequently employed a Bayesian Neural Network (BNN) to capture the nonlinear relationship between the inputs and CDE. Chang, L. et al. [
35] exploited the capability of Projection Pursuit Regression (PPR) in handling high-dimensional data and extracted the most critical information for predicting CDE from large-scale datasets. Yang, H. et al. [
36] transformed the complex CDE time series into manageable components through decomposition and reconstruction, and subsequently employed a deep learning model to capture the patterns of each component and make predictions. Ding, Y.K. et al. [
37] quantified the influence of technology-upgrading policies on CDE through factor analysis and expressed the flows of CDE between different regions and industries based on Graph Representation Learning (GRL), thereby predicting CDE. Chen, Y.X. et al. [
38] captured the spatiotemporal correlations of CDE by means of a hybrid deep learning model integrating a Gated Recurrent Unit (GRU) and Graph Convolutional Network (GCN), thereby accomplishing CDE prediction.
Based on the above analysis, the potential limitations of the current research are as follows:
- (1)
Regarding the studied regions and datasets: Some research focuses on carbon emission studies in specific areas, such as in Egypt or certain urban agglomerations in China. These studies often rely on limited datasets, potentially failing to adequately capture the variations and similarities across diverse regions.
- (2)
Regarding the depth and breadth of influencing factor analysis: some studies may primarily concentrate on the impact of a few key influencing factors (e.g., economic growth and energy use) on carbon emissions, resulting in a somewhat oversimplified comprehension of the matter.
- (3)
With respect to models and methodologies: although some existing methods may excel at capturing simple or linear relationships, they may face limitations when dealing with complex, nonlinear carbon emission data.
- (4)
In terms of the practicality and specificity of policy recommendations: the policy suggestions proposed in some research may be relatively high-level or general, lacking specific implementation plans tailored to particular regions or situations.
The primary objective of this study is to comprehensively analyze the dynamic evolution relationship between various influencing factors and CDE, specifically manifested in the following aspects:
- (1)
Dynamism: distinct from static correlation analysis, this study employs a sliding window technique to capture the time-varying relationships between carbon emissions and various influencing factors.
- (2)
Multi-factor Analysis and New Indicators: This study simultaneously considers multiple influencing factors, thereby providing a more comprehensive understanding of the drivers of carbon emissions. Furthermore, we introduce two novel indicators: the Consistency Index of Influencing Factors (CIIF) and the Accurate Predictive Capability Indicator (APCI), which together form a robust framework for evaluating and comparing prediction models.
- (3)
Comprehensive Analysis: by analyzing the consistency and comparing the prediction accuracy of influencing factors across different feature categories, our method offers deeper insights into the complex mechanisms of carbon emissions.
3. Dynamic Characterization of Correlation Features between CDE and Influencing Factors
The most prominent issue encountered in the panel data of the Bohai Rim region is the presence of missing values. To address this, the following data preprocessing measures were adopted in this study:
- (1)
Data Cleaning: Variables with excessive missing values or outliers were removed. To ensure the scientific validity of the research findings, missing values were filled by interpolation when they accounted for less than 10% of a variable's observations; otherwise, the variable was excluded from the analysis. This inevitably led to certain differences among the datasets of the various provinces.
- (2)
Data Standardization: To eliminate the influence of different variable dimensions, all variables were standardized. In this study, the Min–Max standardization method was primarily employed.
Additionally, the partitioning of the dataset into training and testing sets is introduced in Section 5.
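The two preprocessing measures above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, linear interpolation is an assumption (the interpolation method is not specified here), and outlier removal is omitted.

```python
from typing import Dict, List, Optional

Series = List[Optional[float]]  # None marks a missing observation


def missing_ratio(values: Series) -> float:
    """Share of missing observations in one variable."""
    return sum(v is None for v in values) / len(values)


def interpolate_linear(values: Series) -> List[float]:
    """Fill gaps linearly (assumed method); edge gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def min_max(values: List[float]) -> List[float]:
    """Min-Max standardization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series carries no scale information
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(panel: Dict[str, Series], max_missing: float = 0.10) -> Dict[str, List[float]]:
    """Drop variables with >= 10% missing values, interpolate the rest, then scale."""
    cleaned = {}
    for name, values in panel.items():
        if missing_ratio(values) >= max_missing:
            continue  # excluded from the analysis
        cleaned[name] = min_max(interpolate_linear(values))
    return cleaned
```

Because variables are dropped per province, the resulting dictionaries can contain different keys for different provinces, which matches the dataset differences noted above.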
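By employing the Pearson correlation coefficient, as outlined in Equation (1), we define the correlation between various influencing factors and CDE within a sliding window, as illustrated in Equation (2).

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \sigma_Y} \tag{1}$$

In Equation (1), $X$ and $Y$ represent the two random variables. $\rho_{X,Y}$ denotes the Pearson correlation coefficient between these two random variables. $\operatorname{cov}(\cdot,\cdot)$ represents the covariance, $\sigma$ indicates the standard deviation, $\mu$ signifies the expected value of a specified random variable, and $E[\cdot]$ denotes the mathematical expectation function.

$$r_i = \frac{\sum_{j=p_i}^{p_i+w-1}(x_j-\bar{x}_i)(y_j-\bar{y}_i)}{\sqrt{\sum_{j=p_i}^{p_i+w-1}(x_j-\bar{x}_i)^2}\,\sqrt{\sum_{j=p_i}^{p_i+w-1}(y_j-\bar{y}_i)^2}} \tag{2}$$

In Equation (2), $r_i$ represents the Pearson correlation coefficient of the random variables within the i-th sliding window. $x_j$ and $y_j$ denote the values of the random variables within the i-th sliding window, $\bar{x}_i$ and $\bar{y}_i$ are their means over that window, and $p_i$ is the index of the first value in the window. $n$ indicates the number of values taken by the random variable (so that $p_i + w - 1 \le n$), and $w$ indicates the width of the window.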
The process of the dynamic multi-factor correlation analysis is summarized as follows:
- (1)
Obtain the length $n$ of the time series data for both CDE and its influencing factors. Set the width of the sliding window $w$ and define the moving step size as $s$. Calculate the number of sliding windows, denoted as $m$. In this case, set $m = \lfloor (n-w)/s \rfloor + 1$.
- (2)
Let $Y$ represent the CDE time series data. Identify and denote the number of influencing factors as $K$. Initialize the variable $k = 1$ to represent the sequence number of the influencing factors.
- (3)
Represent the time series of the k-th influencing factor with $X_k$. Initialize the variable $i = 1$, which corresponds to a specific sliding window.
- (4)
Calculate the starting position, denoted as $p_i$, and the ending position, denoted as $q_i$, for the i-th sliding window. Here, $p_i = (i-1)s + 1$, and $q_i = p_i + w - 1$.
- (5)
Apply Equation (2) to calculate the correlation coefficient $r_{k,i}$ for the variables $X_k$ and $Y$ within the interval $[p_i, q_i]$.
- (6)
Increase the value of i by 1. If $i \le m$, go to step (4).
- (7)
At this step, the correlation curve between the k-th influencing factor and the CDE, which is represented by the time series $\{r_{k,1}, \dots, r_{k,m}\}$, has been obtained.
- (8)
Increase the value of k by 1. If $k \le K$, proceed to step (3).
- (9)
Finish the process.
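The procedure above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, indices are zero-based, and the window count is assumed to follow the usual rule of (length minus width) divided by step, rounded down, plus one.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant window
    return sxy / math.sqrt(sxx * syy)


def sliding_correlation(x: Sequence[float], y: Sequence[float],
                        width: int = 10, step: int = 1) -> List[float]:
    """Correlation curve of one factor x against CDE y over sliding windows."""
    n = len(y)
    if len(x) != n:
        raise ValueError("factor and CDE series must have the same length")
    if width > n or step < 1:
        raise ValueError("window width must not exceed the series length")
    window_count = (n - width) // step + 1
    curve = []
    for i in range(window_count):
        start = i * step          # first index of the i-th window (zero-based)
        end = start + width       # one past the last index of the window
        curve.append(pearson(x[start:end], y[start:end]))
    return curve


def correlation_curves(cde: Sequence[float], factors: Dict[str, Sequence[float]],
                       width: int = 10, step: int = 1) -> Dict[str, List[float]]:
    """Outer loop over all influencing factors."""
    return {name: sliding_correlation(series, cde, width, step)
            for name, series in factors.items()}
```

With 23 annual observations, a width of 10, and a step of 1, each factor yields a curve of 14 coefficients.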
Using Tianjin as a case study, the correlation between various influencing factors and CDE was calculated based on Equation (1). The representative results of this calculation are presented in Table 1.
Using Equation (2), the dynamic correlation curves for each influencing factor presented in Table 1 were computed individually, as illustrated in Figure 1. The window width employed in these calculations was set to 10 units, with a moving step of 1.
Through a comparative analysis of Table 1 and Figure 1, the following conclusions can be drawn:
- (1)
Variables that exhibit a high overall correlation in Table 1 may demonstrate relatively low correlation within specific sliding windows illustrated in Figure 1. These variables may exhibit various fluctuation features, such as initially high then low, initially low then high, low in the middle with high values on both sides, or high in the middle with low values on both sides.
- (2)
Conversely, variables exhibiting low overall correlation in Table 1 may show relatively high correlation within certain sliding windows presented in Figure 1, also displaying diverse features similar to those mentioned above.
Thus, by utilizing Equation (2), the sliding window descriptive method offers a more comprehensive understanding of the intricate correlation characteristics between CDE and its various influencing factors. Within this analytical framework, the correlation between independent and dependent variables is categorized as follows: a correlation coefficient exceeding 0.6 is classified as High (“H”), a coefficient ranging from 0 to 0.6 is considered Low (“L”), a coefficient falling between −0.6 and 0 is labeled as Shallow (“S”), and a coefficient below −0.6 is denoted as Deep (“D”).
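As a small illustration, the four-level categorization can be written as a threshold function. The function names are hypothetical, and the handling of values lying exactly on a boundary (0.6, 0, or −0.6) is an assumption, since the text does not state which side they fall on. The sketch stops at one label per window; how such labels are condensed into feature strings is not assumed here.

```python
import math
from typing import Iterable, List


def classify(r: float, threshold: float = 0.6) -> str:
    """Map one correlation coefficient to H, L, S, or D.

    Boundary handling is assumed: exactly 0.6 and 0 count as "L",
    exactly -0.6 counts as "S".
    """
    if math.isnan(r):
        raise ValueError("correlation coefficient is undefined")
    if r > threshold:
        return "H"   # High: above 0.6
    if r >= 0:
        return "L"   # Low: from 0 to 0.6
    if r >= -threshold:
        return "S"   # Shallow: between -0.6 and 0
    return "D"       # Deep: below -0.6


def label_curve(curve: Iterable[float]) -> List[str]:
    """Label every window of a dynamic correlation curve."""
    return [classify(r) for r in curve]
```

For example, `label_curve([-0.8, 0.9, -0.7])` gives `["D", "H", "D"]`, the pattern of an inverted U-shaped curve.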
Based on these classifications, the correlation features illustrated in Figure 1 can be further represented using the format presented in Table 2. This representation aligns with the “inverted U-shaped relationship between environmental pollution and per capita income” described in reference [9]. In Table 2, the correlation characteristic between “Tertiary Industry Value-added Share in GDP” and CDE in Tianjin is classified as DHD, indicating an inverted U-shaped feature that can be seen in Figure 1m. When Figure 1 and Table 2 are considered together, it becomes evident that the relationship between CDE and its influencing factors also encompasses shapes such as “V” (Figure 1i), “L” (Figure 1s), and other more complex forms. These features are generally summarized as “complexity” and “diversity,” and they are manifested through the aforementioned “HLSD” combinations. This provides a reference for the dynamic description and in-depth analysis of correlation mining, modeling, and visualization.
In practical applications of the dynamic multi-factor correlation analysis method, the step size of the sliding window primarily influences the density of data points along the resulting curve. The width of the sliding window is pivotal in determining the accuracy of the data analysis, which in turn affects its robustness and reproducibility. Here, taking Tianjin as an example, Table 3 presents the correlation features between CDE and different influencing factors across various sliding window widths. The sliding window widths considered are 6, 8, 9, 10, 11, 12, 14, and 16. Notably, a sliding window width of 10 was utilized for the analysis presented in this article.
As shown in Table 3, when the sliding window width is narrow, the correlation analysis captures features with more frequent fluctuations. This is exemplified by the impact of “Crude Salt Production” on CDE. Specifically, when the sliding window width is set to 6, the correlation features are characterized as “SHDLSL”. Consequently, an excessively narrow sliding window width may result in an abundance of features, potentially diminishing the practical significance of the results. Conversely, when the sliding window width is broad, some important correlation features may be overlooked. For instance, considering the influencing factor “Urban Population”, when the sliding window width is 10, its impact on CDE is “HDS”. However, with a width of 16, the result changes to “HH”. It is evident that a wider window may lead to an incomplete capture of certain intermediate process features and distort the view of recent trends, particularly given the limited length of the panel data.
Therefore, when setting the parameters of the sliding window, it is recommended to consider two primary factors. The first is the length of the panel data’s time series. When the data length is limited, the sliding window step size should be set to 1 to maximize the use of available data. If the data length is substantial, this value can be appropriately increased to balance computational efficiency and feature capture. The second important factor is the policy-making cycle. In China, major plans are typically formulated every 5 years, accompanied by corresponding policy adjustments. Consequently, in this article, the sliding window width is set at 10, which encompasses two consecutive 5-year plans and the policy adjustments that occur within those periods. When applying this method to different datasets, this factor should also be taken into account to ensure that the sliding window parameters align with the relevant policy cycles.
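To make the trade-off concrete, the short sketch below counts how many correlation coefficients each window width leaves for one variable. It assumes that all 23 annual observations from 1999 to 2021 are available for that variable and that windows are counted as (length minus width) divided by step, rounded down, plus one; the function name is hypothetical.

```python
def window_count(n: int, width: int, step: int = 1) -> int:
    """Number of sliding windows that fit into a series of length n."""
    if width < 2 or width > n or step < 1:
        raise ValueError("need 2 <= width <= n and step >= 1")
    return (n - width) // step + 1


n_years = 2021 - 1999 + 1  # 23 annual observations (assumed complete)
counts = {w: window_count(n_years, w) for w in (6, 8, 9, 10, 11, 12, 14, 16)}
print(counts)
```

Under these assumptions a width of 6 yields 18 points on the correlation curve, a width of 10 yields 14, and a width of 16 yields only 8, which is consistent with wide windows capturing fewer intermediate features on a short panel.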
4. Consistency Analysis of Influencing Factors According to Different Features
The influencing factors of CDE exhibit diverse correlation features across different provinces and cities. In this paper, four provinces and cities in the Bohai Rim region of China, including Tianjin, Hebei, Shandong, and Liaoning, have been selected, and the features of some influencing factors mentioned in previous references are compared, as illustrated in Table 4.
As illustrated in Table 4, even within the Bohai Rim region, which shares certain similarities in comprehensive structures, such as economy and energy, the correlation features of identical influencing factors can exhibit significant variations across different areas. This observation indicates that generalizing the impact of factors with identical nomenclature across diverse regions and time periods using a single correlation feature is not feasible. For instance, economic growth, represented by GDP, has an influence on CDE. At the end of the last century in the Bohai Rim region, this influence displayed a consistent H-type correlation feature. Thereafter, Shandong consistently maintained a high level of correlation, while Liaoning’s correlation gradually weakened. In contrast, Tianjin and Hebei’s correlations transitioned from positive to negative over time. Regarding industrial growth’s impact on CDE, Tianjin and Hebei exhibited similar correlation characteristics, influenced by regional integration and development initiatives within the Beijing–Tianjin–Hebei region. Both regions demonstrated a gradual transition from positive to negative correlations. In contrast, Shandong, distinguished by its prominent heavy industrial capabilities nationwide, and Liaoning, a pivotal center of heavy industry in Northeast China, consistently exerted a positive influence on CDE through industrial expansion, manifesting HH and HLH correlation features, respectively. The disparate correlation features of the identical influencing factors across various provinces and cities inherently affect the identification of primary CDE drivers, highlighting the importance of this aspect for accurate predictions.
In this study, after accounting for data completeness, the four aforementioned provinces and cities encompass nearly 400 common influencing factors, which are categorized into approximately 50 distinct correlation feature categories. To further quantify the consistency of these influencing factors with varying features among provinces and cities, the Consistency Index of Influencing Factors (CIIF), as illustrated in Equation (3), is defined.
In Equation (3), CIIFi_AB represents the CIIF value for the i-th correlation feature from province B relative to province A, where nC denotes the total number of correlation feature categories and i is the index of the correlation feature. The variables Ci_A and Ci_B represent the sets of influencing factors corresponding to the i-th correlation feature for provinces A and B, respectively, and card(·) is the counting function for set elements.
The CIIF calculation process involves the following steps:
- (1)
Identify the number (nC) of shared correlation features between two provinces, such as A and B, and initialize the variable i to 1, where i corresponds to a specific feature category;
- (2)
Calculate the number of influencing factors associated with correlation feature i for A and B, respectively, thus obtaining the values of card(Ci_A) and card(Ci_B);
- (3)
Assess the intersection of influencing factors within feature category i for both A and B. The count of these factors is the value of card(Ci_A∩Ci_B), then CIIF is calculated using Equation (3);
- (4)
Increase i by 1. If i ≤ nC, go to step (2); otherwise, finish the process.
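The steps above can be sketched with Python sets. The exact form of Equation (3) is not reproduced in this section, so the sketch assumes that the CIIF of province B relative to province A for feature i is card(Ci_A∩Ci_B) divided by card(Ci_A); this normalization is an assumption chosen because it makes the index asymmetric between the two provinces. The function name and the factor names in the usage note are hypothetical.

```python
from typing import Dict, Set

FeatureMap = Dict[str, Set[str]]  # correlation feature -> set of factor names


def ciif(features_a: FeatureMap, features_b: FeatureMap) -> Dict[str, float]:
    """CIIF of province B relative to province A for every shared feature.

    Assumed form: card(Ci_A & Ci_B) / card(Ci_A).
    """
    shared = sorted(set(features_a) & set(features_b))  # the nC shared features
    result = {}
    for feature in shared:
        c_a = features_a[feature]
        c_b = features_b[feature]
        if not c_a:
            continue  # no factors in A for this feature, index undefined
        result[feature] = len(c_a & c_b) / len(c_a)
    return result
```

With hypothetical inputs `a = {"HD": {"f1", "f2", "f3", "f4"}}` and `b = {"HD": {"f1", "f2"}}`, `ciif(a, b)` gives 0.5 for HD while `ciif(b, a)` gives 1.0: both are non-zero, but they are not equal.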
Based on Equation (3) and the above steps, the CIIF was calculated separately for the commonly shared correlation features among the four provinces and cities. This analysis specifically targeted representative features such as DH, DHD, DHS, DL, DLS, DS, HD, HDH, HDL, HDS, HH, HL, HLH, HLHS, HS, HSH, HSL, LDH, LDL, LDS, LHD, LHDS, LHS, LSH, LSHS, SDSD, SHD, and SHSL. The results of these calculations are illustrated in Figure 2.
In each graph of Figure 2, the curve values represent the magnitude of the CIIF based on each correlation feature within a specific province or city. This indicator does not exhibit a clear numerical distribution. From the CIIF curves of the correlation features in each province and city, it is evident that the LSHS feature demonstrates higher values in Tianjin, Liaoning, and Hebei, indicating that the influencing factors associated with this feature are similar across these regions. Conversely, the CIIF values corresponding to the DHS and DL features are zero in all provinces and cities, suggesting that the influencing factors related to these correlation features are entirely distinct.
In Figure 2a, the CIIF for features such as HD, HDH, and HDL from Hebei relative to Tianjin is non-zero, which consequently indicates that these features will also exhibit non-zero CIIF values for Tianjin in relation to Hebei in Figure 2b. However, the corresponding values do not necessarily equal one another.
From this perspective, the CIIF reflects, to some extent, the transfer capability of various influencing factors on CDE, as manifested through correlation features. It is evident that the relationships between different influencing factors and CDE are complex and exhibit significant variations among provinces and cities, despite potential deviations in the size of effective datasets among them.
5. Comparison of CDE Prediction Accuracy Based on Influencing Factors with Different Features
In this study, we assume that the accuracy of neural network prediction depends on the correlation between the selected factors and the predicted value, which is highlighted in Jebli, I. et al. [39].
Given the diversity of correlated features, predicting CDE inherently presents complex challenges, particularly concerning the identification of key influencing factors. In this section, we evaluate the accuracy of CDE predictions based on influencing factors with distinct features. Taking the aforementioned four provinces and cities as case studies, we conduct a comparative analysis of the prediction accuracy for CDE, employing correlation feature categories of influencing factors as the primary analytical unit. The comparisons include both single and multiple categories of influencing factors.
For the prediction testing, a Long Short-Term Memory (LSTM) neural network was utilized with the following baseline parameters: sequence length = 3, hidden size = 5, learning rate = 0.003, batch size = 4, and number of epochs = 300. Considering the variations in the correlation between influencing factors and CDE across different years, the years 2006, 2011, 2016, and 2021 were selected as test years. This means that the accuracy calculation for each set of influencing factors includes four separate training and testing cycles.
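The baseline settings and the four test years can be collected in a small configuration sketch. Only the hyperparameter values and the test years come from the text; everything else is a loudly labeled assumption: the class and function names are hypothetical, each sample is assumed to use the three preceding years to predict the next year's CDE, and training is assumed to use only samples whose target year precedes the test year. The LSTM itself is not implemented here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LstmConfig:
    """Baseline parameters reported in the text."""
    sequence_length: int = 3
    hidden_size: int = 5
    learning_rate: float = 0.003
    batch_size: int = 4
    epochs: int = 300


TEST_YEARS = (2006, 2011, 2016, 2021)

# (factor values for each year in the window, target CDE, target year)
Sample = Tuple[List[List[float]], float, int]


def make_samples(years: Sequence[int], factors: Sequence[Sequence[float]],
                 cde: Sequence[float], sequence_length: int) -> List[Sample]:
    """Assumed framing: the previous `sequence_length` years predict the next year."""
    samples = []
    for t in range(sequence_length, len(years)):
        window = [list(row) for row in factors[t - sequence_length:t]]
        samples.append((window, cde[t], years[t]))
    return samples


def split_for_test_year(samples: Sequence[Sample], test_year: int):
    """Assumed split: train on earlier target years, test on the test year only."""
    train = [s for s in samples if s[2] < test_year]
    test = [s for s in samples if s[2] == test_year]
    return train, test
```

Looping `split_for_test_year` over `TEST_YEARS` gives the four separate training and testing cycles per set of influencing factors.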
To assess the capacity of influencing factors belonging to distinct correlation feature categories and combinations of multiple categories in predicting CDE accurately, the Accurate Predictive Capability Indicator (APCI) is defined as presented in Equation (4).
In Equation (4), TPi represents the number of samples correctly predicted for the i-th category, while N denotes the total number of samples.
After the CDE prediction, the APCI can be calculated using statistical methods based on a specific prediction accuracy level setting.
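One way to read this calculation is sketched below. Equation (4) is not reproduced in this section, so the sketch rests on assumptions: a prediction counts as correct when its accuracy, taken as one minus the absolute relative deviation, reaches the chosen accuracy level; TPi is the number of such predictions in category i; and the APCI is the sum of TPi over all categories divided by N, expressed as a percentage. The function name and the sample deviations are hypothetical.

```python
from typing import Dict, Sequence


def apci(deviations_by_category: Dict[str, Sequence[float]],
         accuracy_level: float) -> float:
    """Percentage of predictions meeting the accuracy level (assumed form)."""
    total = sum(len(devs) for devs in deviations_by_category.values())  # N
    if total == 0:
        raise ValueError("no predictions supplied")
    tolerance = 1.0 - accuracy_level + 1e-12  # small slack for float rounding
    correct = 0
    for devs in deviations_by_category.values():
        # TPi: predictions of this category within the allowed deviation
        correct += sum(1 for d in devs if abs(d) <= tolerance)
    return 100.0 * correct / total


# Hypothetical relative deviations, for illustration only.
example = {"HD": [0.05, 0.15, 0.30], "HS": [0.25]}
print(apci(example, 0.80), apci(example, 0.90))
```

Raising the accuracy level can only keep the indicator equal or lower it, so an APCI of 0 at a given level implies 0 at every stricter level under this reading.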
In this study, predictions were primarily categorized into two scenarios: first, predictions based on influencing factors exhibiting a single category of correlation features; second, predictions derived from a combination of influencing factors across multiple categories of correlation features. In the former scenario, predictions were made by selecting either one factor or multiple factors (in this case, three) from a single category of correlation features. In the latter scenario, predictions were executed by selecting one factor from each of three entirely distinct categories of correlation features.
5.1. Prediction Based on Influencing Factors with One Correlation Feature Category
Table 5 presents the results of the prediction deviations for CDE using a single influencing factor. For comparison, only categories common to all provinces and cities are included. The maximum prediction deviation and the corresponding year for each category among the aforementioned four provinces and cities are listed. The last column of the table provides the overall deviation for each province and city, representing the maximum value observed within them.
Table 6 presents the APCI values corresponding to the calculation results displayed in Table 5, including both the individual and comprehensive results for the four provinces and cities. The APCI values offer a quantitative assessment of the predictive capability of each influencing factor and their combinations, facilitating a comparison of their effectiveness in accurately predicting CDE.
The prediction deviations for CDE based on a single influencing factor, as illustrated in Table 5, are generally substantial. Factors belonging to different categories exhibit varying degrees of accuracy, and even factors within the same category demonstrate significant performance disparities across different provinces and cities. Specifically, the smallest deviations for Tianjin and Liaoning are observed in the HDS correlation feature category, with values of 0.217 and 0.037, respectively. For Hebei and Shandong, the minimum deviations are found in the HD and LDS categories, with values of 0.195 and 0.166, respectively. Considering the overall maximum deviation across the four provinces and cities, the HD correlation feature category exhibits the smallest deviation at 0.243.
Table 6 presents the APCI values for each province and city, corresponding to varying prediction accuracies. Specifically, at a prediction accuracy threshold of 80%, Tianjin demonstrates an APCI of 0. When the threshold is increased to 85%, both Hebei and Shandong also exhibit an APCI of 0. In contrast, at a prediction accuracy level of 90%, Liaoning achieves an APCI of 2.22%.
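The threshold-based reading of these values can be sketched in code. The formal APCI definition is given earlier in the paper and is not reproduced here; the sketch below assumes, purely for illustration, that the index is the percentage of candidate factor sets whose maximum deviation does not exceed 1 minus the accuracy threshold. The function name `share_meeting_accuracy` and the deviation values are hypothetical.

```python
def share_meeting_accuracy(max_deviations, accuracy):
    """Percentage of candidates whose maximum deviation is <= 1 - accuracy.

    Assumed reading of an APCI-style index; not the paper's formal definition.
    """
    if not max_deviations:
        return 0.0
    tolerance = 1.0 - accuracy
    # A tiny slack guards against floating-point rounding in 1 - accuracy.
    hits = sum(1 for d in max_deviations if d <= tolerance + 1e-12)
    return 100.0 * hits / len(max_deviations)

# Synthetic maximum deviations for five candidate factor sets.
deviations = [0.04, 0.12, 0.18, 0.25, 0.31]

at_80 = share_meeting_accuracy(deviations, 0.80)
at_85 = share_meeting_accuracy(deviations, 0.85)
at_90 = share_meeting_accuracy(deviations, 0.90)
```

Under this reading, a value of 0 at a given threshold means that no candidate reaches that accuracy, and the index can only fall or stay constant as the threshold is raised.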
It is evident that relying on a single influencing factor is insufficient for achieving universally high-accuracy predictions of CDE. This limitation primarily arises from the complex interplay of multiple factors affecting CDE, which makes it impossible for any single factor to fully capture the intricate and dynamic variation characteristics of CDE. Furthermore, prediction models that depend solely on individual influencing factors tend to exhibit heightened sensitivity to hyperparameters, necessitating stringent conditions to achieve satisfactory prediction accuracy. Consequently, this paper does not pursue further optimization of single-factor prediction models.
Table 7 presents the CDE prediction results based on three influencing factors that belong to the same category. These factors have been selected to investigate the potential of combining multiple influencing factors from a single category to enhance the prediction accuracy. The table outlines the prediction results for various combinations of these factors along with their corresponding deviation values. Additionally, Table 8 displays the APCI values associated with the calculation results presented in Table 7.
The analysis of Table 7 indicates significant prediction deviation across different provinces and cities, exhibiting similar variations to those observed in Table 5. For Tianjin and Hebei, the minimum deviations are noted in the HSH and HS categories, with values of 0.105 and 0.093, respectively. For Shandong and Liaoning, the lowest deviations are recorded in the DHS and DH categories, with values of 0.162 and 0.075, respectively. When assessing the overall maximum deviation across the four provinces and cities, the HD correlation feature category consistently demonstrates the smallest deviation at 0.244.
The comparison between Table 7 and Table 5 indicates that, although the minimum prediction deviations for each province and city have decreased, the correlation feature categories associated with these deviations have changed. Nevertheless, the overall maximum deviation is essentially unchanged (0.244 versus 0.243), and its corresponding correlation feature category (HD) remains the same.
As presented in Table 8, when the prediction accuracy threshold is established at 80%, Shandong Province exhibits an APCI of 16.67%, indicating a significant increase. For Tianjin, the APCI rises to 3.03% with a prediction accuracy of 85%. When the prediction accuracy criterion is set at 90%, Hebei and Liaoning provinces demonstrate respective APCI enhancements of 3.7% and 2.94%. Furthermore, the overall APCI across the four provinces and cities also shows improvement.
The comparison of results between Table 7 and Table 5, as well as Table 8 and Table 6, indicates that utilizing three influencing factors from a single correlation feature category for CDE prediction leads to a reduction in prediction deviations across the four provinces and cities. This suggests that by concentrating on one category of correlation features and incorporating multiple influencing factors, there is potential to enhance the prediction accuracy to a certain degree. However, it also underscores the limitation that a combination of factors within the same correlation feature category cannot fully capture the diversity of CDE variations.
5.2. Combined Prediction Based on Multiple Correlation Feature Categories
Table 9 presents the statistical deviations associated with CDE predictions, derived from a model that incorporates three influencing factors, each originating from distinct categories of correlation features. Given the large number of potential combinations arising from these diverse categories, only the top ten combinations exhibiting the smallest deviations across the provinces and cities are displayed, ensuring consistent representation of these combinations across all provinces and cities. Subsequently, Table 10 illustrates the APCI values corresponding to the computational results presented in Table 9.
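One way to shortlist combinations consistently across regions is sketched below. The exact ranking criterion behind the table is not stated here, so this sketch assumes that each combination is ranked by its worst deviation over all regions; the function name `top_combinations` and all values are hypothetical.

```python
def top_combinations(deviation_table, n=10):
    """Rank combinations by their worst deviation over all regions; keep the n best.

    deviation_table maps a combination of categories to a dict of
    region -> maximum deviation. The ranking rule is an assumption.
    """
    ranked = sorted(deviation_table.items(),
                    key=lambda item: max(item[1].values()))
    return [(combo, max(devs.values())) for combo, devs in ranked[:n]]

# Synthetic deviations for three category combinations in two regions.
deviation_table = {
    ("HD", "HS", "LDS"): {"Tianjin": 0.05, "Hebei": 0.12},
    ("HD", "HDS", "LDS"): {"Tianjin": 0.09, "Hebei": 0.08},
    ("HDS", "HS", "LDS"): {"Tianjin": 0.03, "Hebei": 0.20},
}

shortlist = top_combinations(deviation_table, n=2)
```

In this synthetic example, the combination with the smallest deviation for Tianjin is excluded from the shortlist because of its large deviation for Hebei, which mirrors the observation that a combination effective in one region may perform poorly overall.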
The data presented in Table 9 demonstrate that, by incorporating combinations of influencing factors from diverse correlation feature categories, the accuracy of CDE predictions has been significantly enhanced across various provinces and cities, compared to the results shown in Table 7 and Table 5. Notably, certain feature combinations achieve high prediction accuracy specifically for Tianjin; however, the overall maximum deviation may still be considerable. This observation is also applicable to combinations that demonstrate effectiveness in other provinces and cities, reinforcing the notion that key influencing factors for CDE predictions differ among various administrative regions. Therefore, a comprehensive analysis tailored to the specific correlation features of influencing factors within each province and city is essential.
Table 10 indicates that at a 90% prediction accuracy threshold, all four provinces and cities exhibit non-zero APCI values, suggesting unique combinations of correlation feature categories that can achieve notably high accuracy in CDE predictions. The relatively modest APCI values reported, both individually and collectively, can be attributed to the extensive range of potential category combinations inherent within each provincial or city context.
In summary, the integration of multiple influencing factors from several distinct correlation feature categories significantly enhances the accuracy of CDE predictions. This suggests that combining various factor categories provides a more comprehensive understanding of the multifaceted drivers behind CDE. Consequently, this approach mitigates the model’s sensitivity to hyperparameters, resulting in improved prediction accuracy. However, it is essential to note that not all combinations achieve optimal prediction accuracy; therefore, the analysis of effective combinations must be specifically tailored to different provinces and cities, along with varying categories of influencing factors.
6. Discussion
This study employed a comprehensive dynamic multi-factor correlation analysis methodology to investigate CDE in China’s Bohai Rim region. We computed and analyzed the dynamic correlation curves between CDE and various influencing factors, thereby elucidating their intricate interrelationships and dynamic characteristics. To quantify these features, we implemented the CIIF and the APCI indices. Among these indicators, the CIIF serves as a valuable tool for policymakers to assess the applicability of successful emission reduction strategies across various regions. Nevertheless, despite its advantages, the proposed method does present certain limitations. A constructive discussion is provided to address these shortcomings and improve future research methodologies.
(1) Policy Implications and Regional Variations: The determinants influencing the variations in relevant characteristics among provinces and cities are intricate and multifaceted, encompassing factors that may have been elaborated upon in the literature review or remain unaddressed in this paper. Despite the geographical proximity of Tianjin, Hebei, Shandong, and Liaoning within the Bohai Rim region, the correlation features identified in this analysis exhibit significant disparities, as illustrated in Table 4 and Figure 2. In Table 4, with the exception of “Government Public Expenditure” and “Foreign Direct Investment”, the dynamic correlations between the other factors and CDE begin at a high level (“H”) across all provinces. This phenomenon is partially attributable to early regional influences; however, as reforms deepen and economic globalization progresses, diversification in correlation features becomes inevitable. Taking the factor “Economic Growth (GDP)” as an example, the reasons for the differences in its correlation with CDE among different provinces are as follows: the negative correlation observed in Hebei and Tianjin is mainly attributed to the optimization and improvement of industrial structures, policy guidance, and controls on CDE. Additionally, coordinated governance in the Beijing–Tianjin–Hebei area has also played a significant role. In Shandong, the strong correlation is primarily due to its heavy industrial structure, limited variety in its energy consumption mix, and expanding economic scale. The weak correlation in Liaoning Province is mainly due to its heavy industrial structure, reliance on coal-based energy, rapid economic development, and unresolved issues related to pollution emissions. It is therefore recommended that future research efforts undertake a more comprehensive survey encompassing a greater number of provinces and cities nationwide. In doing so, more suitable indicators should be established, referencing the CIIF, to accurately reflect both the synergies and heterogeneities among CDE drivers. Furthermore, the use of clustering methodologies should be considered a viable strategy, and the development of prediction models tailored to specific regions and industrial sectors could significantly enhance their practical applicability.
(2) Multi-Factor Consideration in Policy Formulation: Our findings indicate that prediction models based on multiple influencing factors with diverse correlation features outperform those based on a single factor. This highlights the complexity of CDE dynamics and suggests that policymakers should consider the combined effects of multiple factors when devising emission reduction strategies. The CIIF presented in this study provides a quantitative metric to identify factors that exhibit consistent influence across regions, facilitating the formulation of targeted policies. When devising emission reduction policies, governments and policymakers can leverage the CIIF to prioritize coordinated actions on consistently influential driving factors, thus improving the efficacy of policy execution. Furthermore, insights from the CIIF regarding regional variations in driving factors can optimize resource allocation, enabling a more precise and effective distribution of resources for emission reduction measures. Specifically, Table 4 presents the influencing factors and their correlation characteristics derived from panel data. Tianjin and Hebei both exhibit a downward trend in these characteristics, indicating a need to strengthen the implementation of existing CDE-related policies. Conversely, Shandong demonstrates an upward trend, suggesting that enhancement of corresponding CDE control measures is necessary. In Liaoning, industrial growth, urbanization rate, and investment in pollution control are strongly correlated with CDE, thereby indicating the need for targeted CDE management strategies that account for the varying stages of industrialization and urbanization.
(3) Implications for Neural Network-Based Prediction Models: This paper establishes the APCI index by evaluating the prediction accuracy of various influencing factors and their combinations through an LSTM neural network. The selection of the LSTM architecture is due to its remarkable memory capabilities, proficiency in managing long-term dependencies within sequential data, and adaptability to dynamic changes in influencing factors. Simultaneously, the limitations of LSTM neural networks have also been recognized: firstly, the model’s complexity, characterized by a relatively intricate structure containing a substantial number of parameters, leading to a time-consuming training process; secondly, the risk of overfitting, which occurs when the training data volume is insufficient; and thirdly, parameter sensitivity, as the model’s performance is notably sensitive to the choice of hyperparameters. To mitigate these limitations, corresponding measures were adopted during the research. For instance, cross-validation was employed to prevent overfitting, and grid search was utilized to optimize the hyperparameters. To further enhance the prediction performance, future research could integrate this model with other neural network architectures, such as Gated Recurrent Units (GRUs), Convolutional Neural Networks (CNNs), Feedforward Neural Networks (FFNNs), and Transformers, thereby improving both the prediction accuracy and stability. Although different neural network models can significantly influence CDE predictions and the APCI, whether these impacts lead to transformative conclusions depends on several factors, such as model characteristics, task complexity, data quality and attributes, as well as the degree of parameter tuning and optimization. In practical applications, it is advisable to empirically compare the performance of diverse models through cross-validation while developing tailored APCIs for each specific model.
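The combination of cross-validation and grid search mentioned above can be sketched as follows. The details of the procedure used in this study are not given in this section, so the sketch makes two explicit assumptions: validation folds respect time order (an expanding window), and a simple moving-average forecaster stands in for the LSTM so that the example stays self-contained. All function names and the data series are hypothetical.

```python
from itertools import product

def expanding_window_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs that never train on the future."""
    fold_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        yield (list(range(train_end)),
               list(range(train_end, train_end + fold_size)))

def moving_average_forecast(history, window):
    """Stand-in model (not an LSTM): mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def cv_error(series, window, n_folds=3, min_train=4):
    """Mean relative error of one-step-ahead forecasts over all folds."""
    errors = []
    for train_idx, test_idx in expanding_window_splits(len(series), n_folds, min_train):
        history = [series[i] for i in train_idx]
        for i in test_idx:
            pred = moving_average_forecast(history, window)
            errors.append(abs(pred - series[i]) / abs(series[i]))
            history.append(series[i])  # reveal the true value before the next step
    return sum(errors) / len(errors)

def grid_search(series, grid):
    """Score every hyperparameter combination; return (best error, best params)."""
    names = list(grid)
    scored = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scored.append((cv_error(series, **params), params))
    return min(scored, key=lambda pair: pair[0])

# Synthetic, steadily increasing series, for illustration only.
series = [float(v) for v in range(10, 20)]
best_error, best_params = grid_search(series, {"window": [1, 2, 3]})
```

Replacing `moving_average_forecast` with a trained network and extending the grid to its hyperparameters would follow the same pattern, at a correspondingly higher computational cost.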
(4) Insights into Factor Interplay and Prediction Reliability: This study elucidates the intricate interplay of factors influencing CDE and their associated processes. It provides valuable insights into the accuracy and reliability of CDE predictions across various application scenarios, which have profound implications for environmental protection, policy formulation, and resource management. Given the multidimensional nature of these influencing factors, the correlation features between them and CDE are classified in practical applications. Next, an APCI index is constructed to analyze the predictive ability of different categories of influencing factors on CDE. Moreover, given the complexity and multidimensionality of the CDE impact mechanism, selecting an appropriate combination of influencing factors is crucial. Ultimately, by acknowledging regional variations in key influencing factors, strategies for mutual learning and transfer learning across prediction models in different provinces and cities can be formulated using the CIIF, enhancing both the learning efficiency and prediction accuracy.
(5) Temporal Trend Analysis and Future Enhancements: The current dynamic correlation analysis model established between influencing factors and CDE primarily reflects instances in which variable values traverse multiple threshold regions; nevertheless, it does not incorporate rigorous temporal trend information. Future research can implement several enhancements. Firstly, additional dynamic feature indicators should be integrated alongside time series analysis techniques to effectively capture long-term trends, short-term fluctuations, and random variations within the data sequences. This approach will provide a more comprehensive understanding of the temporal dynamics inherent in the data. Secondly, refining the temporal resolution of the data, for example by changing the data collection frequency from annual to quarterly or monthly intervals, will enable a more detailed observation of how correlations evolve over time, thereby allowing for a more precise analysis of temporal variations.
Collectively, these improvements will further enhance both the precision of the dynamic correlation analysis and the overall predictive capabilities of the model, particularly in capturing long-term trends and short-term fluctuations in the data.
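As one simple illustration of separating long-term trends from short-term fluctuations, a series can be split with a centered moving average. This is a generic technique offered as a sketch, not a method used in this study; the function names and the data are hypothetical.

```python
def centered_moving_average(series, window):
    """Trend estimate; positions without a full window are None."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend

def split_trend_and_fluctuation(series, window=3):
    """Return (trend, fluctuation), where fluctuation = series - trend."""
    trend = centered_moving_average(series, window)
    fluctuation = [None if t is None else x - t
                   for x, t in zip(series, trend)]
    return trend, fluctuation

# Synthetic annual values, for illustration only.
values = [10, 12, 11, 13, 12, 14]
trend, fluctuation = split_trend_and_fluctuation(values, window=3)
```

With quarterly or monthly data, a longer window (for example, one covering a full year) would be needed to keep seasonal effects out of the trend component.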
7. Conclusions
Based on panel data from 1999 to 2021, this study conducted an in-depth analysis of the dynamic correlations between provincial CDE and various influencing factors in the Bohai Rim region of China, including Tianjin, Hebei, Shandong, and Liaoning provinces and cities. The main findings and their implications are summarized as follows:
(1) Limitations of Single Influencing Factor: The study reveals that CDE prediction models relying solely on individual influencing factors often fail to achieve high accuracy. This underscores the complexity of CDE dynamics, where multiple factors interact to drive CDE. Therefore, single-factor models cannot fully capture the complex changes in CDE.
(2) Advantages and Limitations of Combining Similar Factors: While combining multiple factors with similar correlation features can slightly improve the prediction accuracy, this approach remains limited in fully capturing the breadth of CDE variations. In contrast, integrating factors with diverse correlation features offers a more comprehensive solution. This highlights the need for a more diverse set of influencing factors to improve the predictive performance.
(3) Benefits of Integrating Multiple Feature Categories: Integrating multiple types of influencing factors with different correlation features significantly enhances the accuracy of CDE predictions. This approach more comprehensively captures the multifaceted driving mechanisms behind CDE, thus improving the robustness and reliability of the predictions.
Implications for Future Research and Policy:
(1) Methodological Innovation: The dynamic multi-factor correlation analysis method introduced in this study provides a novel perspective and tool for advancing future research on CDE prediction. It enhances the understanding of the intricate mechanisms underlying CDE dynamics and offers a pathway for refining predictive models.
(2) Data Integration and Advanced Techniques: As data accessibility and technological advancements continue to evolve, future research should aim to integrate a broader spectrum of dynamic feature indicators and sophisticated time series analysis methodologies. This will further refine the accuracy and reliability of CDE predictions, enabling more informed decision making.
(3) Policy Implications: The research findings emphasize the importance of considering the cumulative impacts of multiple factors when developing emission reduction strategies. Policymakers should utilize the Consistency Index of Influencing Factors (CIIF) as a quantitative tool to pinpoint factors that consistently impact CDE across regions, thereby enabling the development of targeted and coordinated emission reduction strategies.
In summary, this study not only uncovers the complexity of factors influencing CDE but also provides robust support for optimizing CDE prediction models, scientifically formulating emission reduction policies, refining energy structures, adjusting industrial frameworks, and nurturing technological innovations. The insights gained from this research hold significant potential for guiding future environmental policy and sustainable development initiatives.