1. Introduction
With the increasing global emphasis on sustainable development, corporate green innovation has emerged as a crucial driver for economic growth and environmental protection. However, accurately assessing the efficiency of corporate green innovation faces numerous challenges, among which the issue of data asymmetry is particularly prominent. The diversity and difficulty in quantifying the input–output indicators of green innovation, the nonlinearity of the innovation process, as well as the complexity of firm heterogeneity and external environments have all led to difficulties in data collection, processing, and analysis for measuring green innovation efficiency.
The asymmetry of corporate data stems from factors such as differing business areas, market competition, economic cycle fluctuations, differences in management levels, and external environments, which result in unbalanced performance across indicators. This complexity reflects the diversity of the business environment, and it must be taken into account in order to understand and address the asymmetry of corporate data and thereby support effective decision making and management. In dealing with asymmetric data, the DEA (Data Envelopment Analysis) model exhibits significant advantages, making it a powerful tool for evaluating efficiency. The DEA model is applicable to situations with multiple inputs and outputs. As a non-parametric method, it does not require assumptions about the probability distribution of the data or the form of the production function, and is thus suitable for handling various types of data, including asymmetric data. DEA assesses the performance of units by comparing their relative efficiency rather than by relying on a prespecified functional form. This allows DEA to handle asymmetric data effectively and provide judgments of relative efficiency, making it flexible and widely applicable in practical settings.
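To make the idea of relative efficiency concrete, the following minimal Python sketch scores three units in the simplest possible setting of one input and one output. The data, the unit names, and the function `relative_efficiency` are hypothetical and chosen purely for illustration; a real DEA application with multiple inputs and outputs requires solving a linear program for each unit, which this sketch does not attempt.

```python
# Hypothetical data: each unit uses one input (e.g. R&D spending)
# to produce one output (e.g. green patents).
dmus = {
    "A": (2.0, 4.0),  # (input, output)
    "B": (4.0, 6.0),
    "C": (5.0, 5.0),
}


def relative_efficiency(units):
    """Score each unit by its output/input ratio relative to the best ratio.

    With one input, one output, and constant returns to scale, this ratio
    comparison coincides with the CCR efficiency score.
    """
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


scores = relative_efficiency(dmus)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

No production function or distributional assumption enters the calculation: each unit is judged only against the best observed practice in the sample, which is the property the paragraph above relies on.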
DEA is a non-parametric method for evaluating the relative efficiency of decision-making units (DMUs). It was first proposed by Charnes, Cooper, and Rhodes in 1978 [1], leading to the CCR model, which has since been used in numerous applications [2,3,4]. Subsequently, Banker, Charnes, and Cooper (1984) extended the DEA method by introducing the BCC model [5], which relaxes the constant-returns-to-scale assumption of the CCR model. Fried et al. [6] pointed out that enterprises’ inefficiency is affected not only by internal mismanagement but also by the external environment and random errors, and accordingly proposed a three-stage DEA model. Nevertheless, the traditional three-stage DEA model also has shortcomings. Its first-stage efficiency measurement operates under the premise that every input contracts by the same proportion [7]. In reality, different inputs exhibit different elasticities and do not decrease proportionally, so a radial measure of this kind ignores the slack in resource utilization. This can bias the evaluation results and fail to fully reflect the efficiency level of the DMUs. The slacks-based measure (SBM) model introduced by Tone [8] can effectively address this deficiency.
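For orientation, the basic SBM can be sketched in its commonly cited form as follows. The notation is an assumption made here for illustration ($m$ inputs $x$, $s$ outputs $y$, $n$ DMUs, intensity weights $\lambda$, input slacks $s^-$, output slacks $s^+$, and the evaluated unit indexed by $0$), and this basic version omits undesirable outputs, so it may differ from the specification used later in this article.

```latex
\rho^{*} = \min_{\lambda,\, s^{-},\, s^{+}}
  \frac{1 - \dfrac{1}{m}\sum_{i=1}^{m} \dfrac{s_i^{-}}{x_{i0}}}
       {1 + \dfrac{1}{s}\sum_{r=1}^{s} \dfrac{s_r^{+}}{y_{r0}}}
\quad \text{s.t.} \quad
\begin{cases}
x_{i0} = \sum_{j=1}^{n} \lambda_j x_{ij} + s_i^{-}, & i = 1, \dots, m, \\
y_{r0} = \sum_{j=1}^{n} \lambda_j y_{rj} - s_r^{+}, & r = 1, \dots, s, \\
\lambda_j \ge 0, \quad s_i^{-} \ge 0, \quad s_r^{+} \ge 0.
\end{cases}
```

Whereas a proportional-contraction (radial) model shrinks all inputs by one common factor, here each input and output has its own slack, so a unit is rated fully efficient ($\rho^{*} = 1$) only when every slack is zero.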
According to the above analysis, this article embeds the SBM model into the three-stage DEA model, adopting a non-parametric, non-oriented SBM model in the first and third stages. By accounting for slack in resource utilization, it evaluates the efficiency level of DMUs more comprehensively and provides more accurate evaluation results. In practical applications, many regression problems exhibit nonlinear relationships, and traditional linear regression models often have difficulty capturing the complex patterns in the data. As a powerful ensemble learning method, the random forest has significant advantages in handling nonlinear regression problems owing to its non-parametric, flexible, and robust nature. This article applies the random forest to study the factors affecting green innovation efficiency, aiming to provide decision makers with more detailed and comprehensive information to support appropriate action.
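The second stage, which sits between the two SBM runs, can be sketched in the form commonly attributed to Fried et al. The notation below is an assumption made for illustration and is not taken from this article: $s_{ni}$ is the first-stage slack of input $n$ for DMU $i$, $Z_i$ the environmental variables, $v_{ni}$ the random noise, and $\mu_{ni}$ the managerial inefficiency.

```latex
s_{ni} = f(Z_i; \beta_n) + v_{ni} + \mu_{ni},
\qquad
X_{ni}^{A} = X_{ni}
  + \Bigl[\max_{i} f(Z_i; \hat{\beta}_n) - f(Z_i; \hat{\beta}_n)\Bigr]
  + \Bigl[\max_{i} \hat{v}_{ni} - \hat{v}_{ni}\Bigr].
```

Under this adjustment every DMU is placed in the least favorable observed environment and given the least favorable observed luck, so the efficiency differences that remain in the third stage are attributable to management rather than to external conditions or noise.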
The remainder of this paper is arranged as follows. Section 2 summarizes previous research on green innovation efficiency. Section 3 presents the formulas for the three-stage DEA model and the random forest model used in this article. Section 4 examines green innovation efficiency as measured by the three-stage DEA model and discusses the factors that influence it. Section 5 outlines the study’s findings, along with the paper’s limitations and future directions.
2. Literature Review
The rapid evolution of the modern economy has brought environmental concerns to the forefront, prompting a heightened societal focus on ecological issues. Green innovation has emerged as a critical research area attracting growing scholarly attention. The concept of green innovation, rooted in the idea of “sustainable development,” first appeared in the 1980 “World Conservation Strategy Report” [9]. Subsequently, the literature on sustainable innovation [10,11], eco-innovation [12,13], green innovation [14,15,16,17], and environmental innovation [18] has gradually grown. Scholars hold different views on what green innovation means. Chen et al. [19] defined “green innovation” as advancements in hardware or software that contribute to eco-friendly products or processes; these innovations encompass areas such as sustainable product design and environmentally responsible corporate management practices. Wu et al. [20] regarded green innovation as the product of combining innovation theory with ecological views, aiming to maximize economic benefits while obtaining new knowledge and technologies that reduce environmental pollution. Rennings [21] argued that green innovation has a “double externality,” with spillover effects in both the production stage and the diffusion stage, resulting in a certain reduction in internal costs and external environmental costs. Bernauer et al. [22] placed green innovation in the same category as environmental innovation and eco-innovation. Zhang [23] and Schiederig [14], among other scholars, conducted detailed literature reviews and comparative analyses of definitions, revealing that green innovation, eco-innovation, environmental innovation, and sustainable innovation share a high degree of consistency in their core concerns and goals. Setting aside the subtle differences in their definitions, the terms are often used interchangeably or even equated in the literature. Currently, there are three major interpretations of the definition of green innovation in academia: equating green innovation with innovations that contribute positively to the ecological environment, equating green innovation with innovations that introduce environmental performance, and equating green innovation with environmental innovation or the optimization and innovation of environmental performance [23].
When evaluating green innovation efficiency and its influencing factors, scholars have adopted diverse strategies. Most studies measure efficiency from input–output indicators using Stochastic Frontier Analysis (SFA) [24,25], DEA [26,27,28,29], and related methods. Some scholars have also assessed green innovation efficiency through spatial econometrics [30] or evaluated it using the entropy method [31]. Xiao et al. [32] used an improved SFA model to conduct a thorough assessment of green innovation efficiency in the Yangtze River Economic Belt. However, applying the SFA model requires a production function to be specified in advance, which, to a certain extent, increases the subjectivity of the evaluation. In contrast, the DEA model requires no assumptions about the form of the production function, which can make evaluation results more objective and accurate. Thus, DEA has become the mainstream method for studying green innovation efficiency. Table 1 presents relevant studies that use the DEA model to measure green innovation efficiency.
Regarding the evaluation index system, the existing literature primarily constructs such a system from two aspects, input and output, covering three dimensions: input variables, desirable outputs, and undesirable outputs, as illustrated in Table 2. Tian et al. [39] divided the input–output indicators into two stages: scientific and technological research and development (R&D) and achievement transformation. For the R&D stage, the indicators include the number of R&D personnel, whereas for the achievement transformation stage, the indicators encompass technology introduction and transformation expenditure, sales revenue of new products, etc. Zhang et al. [40] categorized inputs into human, material, and financial resources, and broadly divided the selected innovation output indicators into two types: scientific and technological outcomes and economic benefits. Ma and Zhu [41] divided innovation inputs into R&D investment and production investment. R&D investment is represented by R&D funding and personnel, while production investment is represented by employee compensation. For output indicators, they selected the number of patent applications and intangible assets.
When examining factors affecting green innovation efficiency, scholars have primarily focused on two levels: macro-environment and micro-level factors. The macro-environment encompasses the institutional landscape [50,51,52], market and industry conditions [34,53], and related international trade relations [54]. At the micro-level, internal enterprise factors mainly include the awareness level of enterprise personnel [55], enterprise costs [56,57], and social responsibility [58,59]. Empirical research on green innovation efficiency varies with the research question. Hong et al. [60] analyzed the influencing factors of innovation efficiency in China’s pharmaceutical manufacturing industry and found that two external macro-factors, namely market competition intensity and government policy support, as well as the internal micro-factor of enterprise size, are essential for achieving higher levels of innovation efficiency. Wenbo [61] studied the impact of production factors, economic benefits, internal management, and the social environment on green innovation. Kang et al. [62] examined whether and how environmental regulations drive green innovation, aiming to explore the influencing mechanism of green innovation efficiency. Yalabik [63] found that factors such as market competition, consumption, and environmental protection pressure can significantly affect firms’ green technology innovation efficiency. Gong et al. [64] provided a detailed analysis of how factors such as the agglomeration effect of outward foreign direct investment influence industrial green innovation efficiency. Kuang et al. [65] tested the influencing mechanism of green innovation efficiency from the perspective of the shadow economy, exploring potential pathways to enhance green innovation efficiency.
Regarding research methods for influencing factors, the random forest model, as an ensemble learning method, exhibits good robustness and generalization capabilities and is suitable for various types of datasets and problems. In 1995, Ho [66] first proposed the concept of random decision forests. He suggested building a classifier from multiple decision trees, combined in a complementary or weighted manner, to form a new classifier, namely, the random decision forest. Random decision forests mitigate the overfitting that can occur with single decision trees. In 2001, Breiman [67] integrated bagging algorithms, random subspace algorithms, and classification and regression trees to propose the traditional random forest. Subsequently, the traditional random forest has been widely applied in numerous fields such as ecology [68,69,70], medicine [71,72,73], management [74,75], and economics [76,77], and has achieved good results in solving routine classification or regression problems. Xu et al. [78] applied the random forest to data from gastric cancer patients to predict their postoperative survival status and assist doctors in assessing treatment decisions. Xie et al. [75] integrated sampling techniques and cost penalties into the random forest and used bank customer data as an example to predict customer churn. Susana et al. [79] applied the random forest method to unbalanced samples to identify groups of enterprises, enabling public institutions to direct public investment subsidies to them.
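The two ingredients named above, bootstrap aggregation and random feature subsets, can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of any cited work: each "tree" is only a one-split stump rather than a full regression tree, the feature subset is drawn once per tree in the spirit of the random subspace method, and all names (`fit_stump`, `fit_forest`, `predict`) and data are hypothetical.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Find the single split (feature, threshold) with the lowest squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((t - lmean) ** 2 for t in left)
                   + sum((t - rmean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lmean, rmean)
    if best is None:  # no valid split: predict the sample mean everywhere
        mean = sum(targets) / len(targets)
        return (None, None, mean, mean)
    return best[1:]


def fit_forest(rows, targets, n_trees=50, n_features=1, seed=0):
    """Fit stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (bagging)
        feats = rng.sample(range(d), n_features)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Average the predictions of all stumps."""
    total = 0.0
    for f, threshold, lmean, rmean in forest:
        if f is None or row[f] <= threshold:
            total += lmean
        else:
            total += rmean
    return total / len(forest)


# Hypothetical data: the target jumps from 0 to 1 when feature 0 reaches 0.5;
# feature 1 is an unrelated scrambled sequence.
rows = [(i / 10, (i * 7 % 10) / 10) for i in range(10)]
targets = [0.0 if x0 < 0.5 else 1.0 for x0, _ in rows]
forest = fit_forest(rows, targets)
lo = predict(forest, (0.1, 0.5))
hi = predict(forest, (0.9, 0.5))
print(f"prediction at x0=0.1: {lo:.2f}, at x0=0.9: {hi:.2f}")
```

Averaging many weak, decorrelated learners is what lets the ensemble follow a threshold effect that a single straight line would smooth away, while reducing the variance of any individual tree.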
The traditional DEA model has limitations in efficiency evaluation: it does not consider the impact of environmental variables and random factors on green innovation efficiency, which biases the evaluation results. The measurement of green innovation efficiency has mainly remained at the macro-level, such as the province and industry, and there are few studies at the enterprise level. Although new-energy companies are an important force in promoting the green, low-carbon transformation and achieving sustainable development, research measuring their green innovation efficiency with the DEA model remains limited. Research on the influencing factors of green innovation efficiency is mainly based on linear regression models, which cannot effectively analyze nonlinear relationships. Nonlinear influence relationships therefore remain under-studied, making it difficult to accurately evaluate the factors affecting green innovation efficiency.
Against this backdrop, this paper establishes a research framework that combines a three-stage DEA model with an SBM model, removing the influence of environmental and random factors, to provide a more accurate measure of green innovation efficiency. Considering the advantages of the random forest model in exploring influencing factors, this paper uses it to analyze the influencing factors of green innovation efficiency. The main contributions of this paper are as follows:
Firstly, by embedding the SBM model into the three-stage DEA model, this paper evaluates the efficiency level of DMUs more comprehensively by accounting for slack in resource utilization, providing more accurate evaluation results. By combining the parametric SFA model with the non-parametric DEA model, it draws on the advantages of both to better handle asymmetric data, thereby assessing the efficiency level of units more comprehensively and supporting suggestions for improvement.
Secondly, unlike linear regression methods, the random forest model adopted in this paper can not only rank the influencing factors but also visually demonstrate their nonlinear effects on green innovation efficiency through partial dependence plots. This facilitates a deeper understanding of how various factors influence green innovation efficiency.
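As an illustration of how a partial dependence curve is obtained, the following Python sketch computes one by hand. The "fitted model" is a hypothetical stand-in function with a built-in inverted-U effect, not the random forest estimated in this paper, and the data and names (`partial_dependence`, `toy_model`) are assumptions made for the example.

```python
def partial_dependence(model, rows, feature, grid):
    """For each grid value, fix one feature at that value for every
    observation and average the model's predictions."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    # Hypothetical fitted model: inverted-U effect of feature 0,
    # linear effect of feature 1.
    return 1.0 - (row[0] - 0.5) ** 2 + 0.2 * row[1]


rows = [(0.1, 0.0), (0.4, 1.0), (0.8, 0.5), (0.6, 0.5)]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = partial_dependence(toy_model, rows, feature=0, grid=grid)
for value, avg in zip(grid, curve):
    print(f"feature 0 = {value:.2f} -> average prediction {avg:.4f}")
```

Plotting `curve` against `grid` gives the partial dependence plot for that feature. Because the other features are averaged out, the shape of the curve (here rising and then falling) shows the marginal, possibly nonlinear, effect that a single regression coefficient could not express.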
5. Conclusions and Discussion
5.1. Conclusions
The three-stage DEA model reveals that, after the second-stage SFA adjustments, technical efficiency (TE), pure technical efficiency (PTE), and scale efficiency (SE) all show some improvement. However, significant potential for further enhancement remains, highlighting the impact of external environmental constraints on new-energy companies’ green innovation efficiency. Even after adjustment, SE consistently surpasses PTE, and enterprises in the low-tech, high-scale state form the largest group. Therefore, improving green innovation efficiency requires a focus on increasing pure technical efficiency through advancements in technology and management practices.
The factors affecting the green innovation efficiency of new-energy companies are studied using the random forest model. To further improve the interpretability of the model, the important influencing factors are analyzed with partial dependence plots. The study finds that ownership concentration, R&D personnel structure, and operational capacity rank as the three most important factors, exerting a substantial influence on new-energy companies’ green innovation efficiency.
Improving green innovation efficiency requires joint effort at the following levels:
At the enterprise level, companies should build technological innovation capacity, increase investment in research and development, strengthen key core technologies, and develop more efficient, clean, and low-carbon new-energy technologies and products. They should optimize their energy management systems, actively promote clean production, and reduce pollutant emissions. They should also build their talent teams by recruiting and cultivating green innovation talent, thereby strengthening the talent base that supports green innovation.
At the government level, policymakers should improve the policy support system and increase support for the green innovation of new-energy enterprises. They should strengthen industry supervision, establish a sound green innovation standard system, supervise and manage green innovation activities, and guide the green and healthy development of enterprises. They should also foster a favorable environment for innovation, strengthen intellectual property protection, and create a market environment for fair competition.
5.2. Discussion
In the field of new energy, green innovation efficiency serves as a pivotal indicator of enterprises’ capacity for sustainable development and of their competitiveness. With the growth of global environmental awareness and the transformation of energy structures, the new-energy industry faces unprecedented opportunities and challenges, and improving green innovation efficiency is crucial to promoting its high-quality development. Green innovation efficiency is the key for new-energy enterprises to achieve both economic and environmental benefits. China’s new-energy industry is developing rapidly, but it also faces challenges such as tight resource and environmental constraints and the need for breakthroughs in core technologies. Improving green innovation efficiency can steer the new-energy industry toward high-end, intelligent, and green development, thereby reducing dependence on traditional resources and achieving sustainable development.
The managerial implications of this study for new-energy companies are as follows. (1) Identifying directions for improvement and raising green innovation performance: The research results can help management find the enterprise’s shortcomings in green innovation, such as low resource allocation efficiency and poor control of undesirable outputs, and take targeted measures to improve green innovation performance. (2) Providing a scientific basis for strategy formulation: The research results can help management understand in depth the factors affecting green innovation efficiency, identify the enterprise’s strengths and weaknesses, and formulate green transformation and upgrading strategies, thereby optimizing resource allocation and enhancing enterprise competitiveness. (3) Promoting change in management concepts and strengthening awareness of green development: This study emphasizes the importance of green innovation and sustainable development, and encourages enterprises to fully integrate green development principles into management practices.
This research has limitations that warrant further exploration. (1) Because green data, such as enterprises’ comprehensive energy consumption and greenhouse gas emissions, are available for only a limited number of years, this study selects only 2022 as the research period. Future studies should extend the time span to explore the dynamic evolution of enterprises’ green innovation efficiency. (2) The selected index system is not sufficiently complete, which limits the depth and breadth of the conclusions. Future studies can add qualitative indicators to the quantitative ones and combine the two to further improve the index system.