An Exploration of Prediction Performance Based on Projection Pursuit Regression in Conjunction with Data Envelopment Analysis: A Comparison with Artificial Neural Networks and Support Vector Regression
Abstract
1. Introduction
- (1)
- To present a new DEA-PPR combined model for performance measurement and prediction, thus bridging the research gap through methodological advancement;
- (2)
- To provide empirical support for the proposed model on two datasets by streamlining the sequential processes of DEA measurement and DEA-PPR prediction;
- (3)
- To offer an approach, via the DEA-PPR combined model, for improving managerial efficiency and enhancing administrative flexibility in selecting actionable options from theoretically and practically feasible alternatives and in monitoring potential progress;
- (4)
- To discuss the advantages and disadvantages of machine learning models such as BPNN, SVR, PPR, and RF;
- (5)
- To put forward basic principles, and some matters needing attention, for building machine learning models such as BPNN, PPR, and SVR.
2. Literature Review
2.1. DEA and Its Combined Models
- (1)
- The model’s generalization ability and prediction accuracy are difficult to guarantee without a validation dataset. Among the studies establishing MLs such as BPNN, only [8] divided the samples into training, validation, and test datasets with similar properties. No study discusses using the error changes on the validation dataset, together with early-stopping or regularization, to prevent over-training, or using trial-and-error to determine a reasonable number of hidden-layer neurons that ensures generalization ability and practical value [6,39,40,41,42,43]. Most studies state (though some do not) that they randomly divide the samples into training and test datasets with similar properties at ratios of 8:2, 7:3, or 3:1 (or via some algorithm); in some of these studies, the "validation" dataset is in fact a test dataset. Some studies also note that over-training should be avoided, but since no validation dataset monitors the training process, it is impossible to judge whether over-training has occurred. Over-training arises easily when the number of training samples is less than three times the number of connection weights, as happens with extensive network topologies. In that case, even if the error on the training dataset is minimal and the RMSE on the test dataset happens to be small, the established model has no generalization ability or practical value. Scholars should pay more attention to this problem.
- (2)
- It is difficult to establish reliable and effective DEA-ML combined models for DEA modeling problems, which usually involve only small and medium-sized samples, or for frontier modeling with only a few DEA-efficient samples. Most DEA efficiency modeling uses small or medium-sized samples, and modeling DEA-efficient frontier functions usually uses small samples. For the DEA-BPNN combined model, most of the literature does not meet even the basic requirement that the number of training samples exceed the number of connection weights. Ref. [42] puts forward a rule of thumb: one should have at least five times as many training samples as connection weights, and preferably ten times as many, to establish a reliable and effective BPNN model. Under this rule, a reasonable number of hidden-layer neurons can be determined through trial-and-error. To combine DEA with SVR, RF, etc., one should likewise determine the model's parameters carefully to avoid over-training and overfitting on small and medium-sized samples.
- (3)
- It is not easy to judge which ML model is better than the others. Among the ML models currently used for DEA combined modeling, some studies find that BPNN performs better [10,12,30], while others find that SVR and other models are better [13]. Which model performs better therefore remains a question worthy of study.
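The sample-size conditions discussed in (1) and (2) are easy to operationalize. The sketch below (function names are ours, not from the paper) counts the connection weights of a single-hidden-layer topology, biases included, and grades a training-sample count against the rule of thumb of Ref. [42]:

```python
def n_connection_weights(n_in, n_hidden, n_out):
    """Weights plus biases of an n_in - n_hidden - n_out feedforward network."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def sample_size_verdict(n_train, n_in, n_hidden, n_out):
    """Apply the rule of thumb of Ref. [42]: >= 5x weights, preferably 10x."""
    ratio = n_train / n_connection_weights(n_in, n_hidden, n_out)
    if ratio >= 10:
        return "preferable"
    if ratio >= 5:
        return "acceptable"
    if ratio > 1:
        return "risky: over-training likely"
    return "invalid: fewer samples than weights"

# Example: a 5-10-1 topology (as in Wu et al. [5]) has 71 weights,
# so 142 training samples give a ratio of only 2.
print(n_connection_weights(5, 10, 1), sample_size_verdict(142, 5, 10, 1))
```

This reproduces the "no" verdicts in the literature table: most cited studies fall below the five-fold threshold.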
2.2. Projection Pursuit Regression (PPR) Model
- (1)
- We are the first to establish the DEA-PPR combined model, which effectively and reliably solves the input–output efficiency problem for small and medium-sized samples that do not obey the normal distribution and overcomes the disadvantage of MLs such as BPNN and SVR that they can only be applied to large samples.
- (2)
- We proposed modeling principles and steps for establishing a BPNN model with good generalization ability. Through empirical research, reliable and effective DEA-BPNN, DEA-SVR, and DEA-PPR combined models were established and used to verify one another; among them, the DEA-PPR model has relatively better generalization ability and prediction accuracy.
- (3)
- We established the DEA-PPR combined model to simulate the production function by setting the efficiency score to one, adopting optimization techniques to obtain the frontier function directly, and thereby unifying the production function and the frontier surface function.
- (4)
- We established the DEA-PPR combined model of the frontier function of the DEA by generating virtual samples from the DEA-efficient samples. With the input-oriented DEA-PPR combined model, the optimal input quantity can be obtained; conversely, with the output-oriented DEA-PPR combined model, the optimal output quantity can be obtained. This provides a decision-making basis and technical paths for DMUs to organize production, strengthen management, improve efficiency, and reduce costs.
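The virtual-sample idea in (4) can be sketched as follows. Under input-oriented radial projection, an inefficient DMU is moved toward the frontier by scaling every input by its efficiency score θ while holding outputs fixed. Note that this is only an approximation of the paper's construction: the projections in the hospital example (e.g., virtual sample B-C) also absorb input slacks, so they do not coincide exactly with the radial point.

```python
def virtual_frontier_sample(inputs, outputs, theta):
    """Input-oriented radial projection: shrink all inputs by the
    efficiency score theta while holding outputs fixed, yielding a
    virtual DMU on (or near) the DEA-efficient frontier."""
    return [theta * x for x in inputs], list(outputs)

# Hospital DMU C from the paper: inputs (25 doctors, 160 nurses),
# outputs (160, 55), efficiency score 0.883.
v_in, v_out = virtual_frontier_sample([25, 160], [160, 55], 0.883)
print(v_in, v_out)
```

The radial nurse figure (160 × 0.883 = 141.28) is close to the paper's slack-adjusted B-C value of 141.2, while the doctor figure differs because of a nonzero input slack.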
3. Methodology and Principle
3.1. DEA-CCR Model
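The DEA-CCR model of Charnes, Cooper, and Rhodes [1] underlies all efficiency scores in this paper. As a hedged sketch (our illustration, not the authors' code; it assumes SciPy is available), the input-oriented envelopment form, min θ subject to Σ_j λ_j x_ij ≤ θ x_i0 for every input i, Σ_j λ_j y_rj ≥ y_r0 for every output r, λ ≥ 0, can be solved as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR (constant returns to scale) efficiency of DMU j0.
    X: (n, m) input matrix, Y: (n, s) output matrix.
    Decision variables: [theta, lambda_1, ..., lambda_n]."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                       # minimize theta
    # inputs:  sum_j lambda_j x_ij - theta * x_i0 <= 0
    A_in = np.hstack([-X[j0].reshape(m, 1), X.T])
    # outputs: -sum_j lambda_j y_rj <= -y_r0  (i.e., outputs at least y_r0)
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[j0]],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun

# Toy example: DMU B (1 input, 1 output) dominates DMU A (2 inputs, 1 output),
# so A's efficiency is 0.5 and B's is 1.
X = np.array([[2.0], [1.0]])
Y = np.array([[1.0], [1.0]])
print(round(ccr_efficiency(X, Y, 0), 4), round(ccr_efficiency(X, Y, 1), 4))
```

One such LP is solved per DMU; a score of 1 marks a DEA-efficient unit, as in the hospital and provincial tables below.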
3.2. Machine Learning
3.2.1. BPNN Model
3.2.2. PPR Model
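PPR (Friedman and Stuetzle [25]) approximates y ≈ Σ_k g_k(α_kᵀx), where each g_k is a smooth ridge function of a one-dimensional projection. The sketch below is a deliberately minimal one-term version for two inputs: it scans candidate unit directions and fits a cubic polynomial ridge function to each projection. The full algorithm instead uses a supersmoother with iterative direction optimization and adds further terms to the residuals, so treat this only as an illustration of the projection-pursuit idea:

```python
import numpy as np

def fit_one_ridge(X, y, n_dirs=180, degree=3):
    """One-term projection pursuit: y ~ g(alpha^T x), with g a cubic
    polynomial and alpha chosen by scanning unit directions in the plane."""
    best = None
    for t in np.linspace(0.0, np.pi, n_dirs, endpoint=False):
        alpha = np.array([np.cos(t), np.sin(t)])
        z = X @ alpha                       # 1-D projection of the inputs
        coefs = np.polyfit(z, y, degree)    # polynomial ridge function g
        resid = np.polyval(coefs, z) - y
        score = float(np.sum(resid ** 2))
        if best is None or score < best[0]:
            best = (score, alpha, coefs)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (0.6 * X[:, 0] + 0.8 * X[:, 1]) ** 2    # a pure ridge function
sse, alpha, coefs = fit_one_ridge(X, y)
print(alpha, sse)
```

Because the target is itself a ridge function, the recovered direction aligns with (0.6, 0.8) and the residual sum of squares is near zero.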
4. Empirical Research
4.1. Empirical Illustration Using Hospital Data
- (1)
- To analyze the relationship between the efficiency and the DEA input–output indicators, judge the importance of the inputs and outputs, and predict the efficiency of new hospital data;
- (2)
- To study how to improve the hospitals’ managerial efficiency, and to provide the lower bounds for the inputs (the numbers of doctors and nurses) for each inefficient DMU to produce or service its current level of outputs (the numbers of inpatients and outpatients).
4.1.1. To Establish the DEA-PPR Combined Model of the DEA Production Function
4.1.2. To Establish the DEA-PPR Combined Model of the DEA-Efficient Frontier Function
4.1.3. To Determine the Optimal Number of Doctors and Nurses
4.1.4. To Compare the Performance of the Different Models
4.1.5. To Analyze the Robustness and Reliability of the DEA-PPR Combined Models
4.1.6. To Establish the DEA-SVR Combined Model of the DEA-Efficient Frontier Function
4.1.7. To Establish the DEA-BPNN Combined Model of the DEA-Efficient Frontier Function
- (1)
- We randomly divide the samples into training, validation, and test subsets with similar properties at a ratio of 2:1:1 (the validation and test subsets should each account for at least 15% of the samples);
- (2)
- We use the trial-and-error method and keep the BPNN topology as compact as possible (usually one hidden layer, with as few hidden neurons as possible). According to the rule of thumb, the ratio of the number of training samples to the number of connection weights must be greater than one, should exceed five, and is preferably ten or more;
- (3)
- We use the training subset to adjust the connection weights so as to reduce the sum of squared errors (SSE) on the training subset, and use the validation subset to monitor the training process. As training proceeds, the SSE of the training subset gradually decreases, while the SSE of the validation subset first falls to a minimum and then begins to rise again, a sure sign that over-training is occurring. At that point we stop training (the early-stopping method) and establish the BPNN model with the network weights taken from just before the validation SSE began to rise;
- (4)
- We use the test subset to measure the prediction ability and performance of the BPNN model. If the SSE of the test subset is reasonably close to, or only slightly larger than (generally less than 1.3 times), the SSE of the training and validation subsets, the established BPNN model has good generalization ability, reliability, robustness, and prediction ability, as well as practical value. Otherwise, we restart the process from (3) until the BPNN model has good generalization and prediction abilities.
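Steps (1)–(4) can be sketched as a training loop. The following minimal NumPy version is our illustration, not the authors' implementation: it uses a 2-3-1 network on toy data, the 2:1:1 split of step (1), and the validation-SSE early-stopping rule of step (3).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (120, 2))
y = np.tanh(X[:, 0] - X[:, 1]).reshape(-1, 1)   # toy target

# (1) random 2:1:1 split into training, validation, and test subsets
idx = rng.permutation(len(X))
tr, va, te = idx[:60], idx[60:90], idx[90:]

# (2) compact topology 2-3-1: (2+1)*3 + (3+1)*1 = 13 connection weights,
#     so 60 training samples give a ratio of about 4.6
W1 = rng.normal(0.0, 0.5, (3, 3))   # hidden layer (bias in last column)
W2 = rng.normal(0.0, 0.5, (4, 1))   # output layer (bias in last row)

def forward(Xb, W1, W2):
    Xb1 = np.hstack([Xb, np.ones((len(Xb), 1))])
    H1 = np.hstack([np.tanh(Xb1 @ W1.T), np.ones((len(Xb), 1))])
    return Xb1, H1, H1 @ W2

def sse(Xb, yb, W1, W2):
    return float(np.sum((forward(Xb, W1, W2)[2] - yb) ** 2))

# (3) adjust weights on the training subset; monitor the validation SSE
v0 = sse(X[va], y[va], W1, W2)              # validation SSE before training
best, bad, lr, patience = (np.inf, W1.copy(), W2.copy()), 0, 0.01, 20
for epoch in range(2000):
    Xb1, H1, out = forward(X[tr], W1, W2)
    err = out - y[tr]
    gW2 = H1.T @ err
    dH = (err @ W2[:3].T) * (1.0 - H1[:, :3] ** 2)
    gW1 = dH.T @ Xb1
    W2 -= lr * gW2 / len(tr)
    W1 -= lr * gW1 / len(tr)
    v = sse(X[va], y[va], W1, W2)
    if v < best[0]:
        best, bad = (v, W1.copy(), W2.copy()), 0
    else:
        bad += 1
        if bad >= patience:                 # early stopping: SSE rose again
            break
v_best, W1, W2 = best                       # weights from the best epoch

# (4) judge generalization ability on the untouched test subset
print(sse(X[tr], y[tr], W1, W2), v_best, sse(X[te], y[te], W1, W2))
```

The 1.3-times comparison of step (4) would then be a simple check of the printed test SSE against the training and validation SSEs.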
4.1.8. To Establish the Combined Models of the DEA-Efficient Frontier Function for Large Samples
4.2. Empirical Illustration Using China’s Provincial Carbon Dioxide Emission Quotas
- (1)
- To build a relationship between the carbon emissions efficiency and the DEA inputs and outputs, judge the importance of the inputs and outputs, and predict the efficiency of new carbon-allocating methods and the quotas in 2030;
- (2)
- To study ways to improve China’s provincial carbon emissions efficiency and help to implement the “dual carbon” target, and provide lower bounds for the inputs (the carbon emissions) for each inefficient DMU to produce its current level of outputs (the GDP and population).
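The lower bounds in (2) follow directly from the input-oriented logic: an inefficient province's carbon quota (the minimum emissions input needed to produce its current GDP and population) is its observed emissions scaled by its efficiency score. A one-line sketch (the function name is ours), checked against figures from the provincial table later in the paper:

```python
def input_quota(emissions_mt, efficiency_score):
    """Input-oriented lower bound: observed CO2 input (Mt) scaled by the
    efficiency score, holding outputs (GDP, population) fixed."""
    return emissions_mt * efficiency_score

# Values from the paper's provincial table:
print(round(input_quota(158.5, 0.417), 1))   # Tianjin: 158.5 Mt * 0.417 = 66.1
print(round(input_quota(914.2, 0.3133), 1))  # Hebei: 914.2 Mt * 0.3133 = 286.4
```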
4.2.1. Data Resource
4.2.2. To Build the Combined Model Characterizing the DEA Production Function
- (1)
- To build a DEA-PPR combined model
- (2)
- To build the DEA-SVR and DEA-BPNN combined models
4.2.3. To Build the Combined Models Characterizing the DEA Frontier Function
- (1)
- To build a DEA-PPR combined model
- (2)
- To build the DEA-SVR and DEA-BPNN combined models
- (3)
- To build the DEA-BPNN and DEA-RF combined models
5. Results and Concluding Remarks
5.1. The PPR and DEA Models Have Similarities in Frontier Morphology and Theoretical Consistency
5.2. The Characterization Ability of the DEA-PPR Combined Model to the DEA Production Function
5.3. The Characterization Ability of the DEA-PPR Combined Model to DEA-Efficient Frontier Function
6. Limitations and Future Research
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Charnes, A.; Cooper, W.W.; Rhodes, E. Measuring the efficiency of decision making units. Eur. J. Oper. Res. 1978, 2, 429–444.
- Cheng, G. Data Envelopment Analysis: Methods and MaxDEA Software; Intellectual Property Press: Beijing, China, 2014; Available online: http://www.maxdea.cn/ (accessed on 12 June 2023).
- Cooper, W.; Seiford, L.; Tone, K. Data Envelopment Analysis—A Comprehensive Text with Models, Applications, References and DEA-Solver Software; Kluwer Academic Publishers: Boston, MA, USA, 2007.
- Panwar, A.; Olfati, M.; Pant, M.; Snasel, V. A review on the 40 years of existence of data envelopment analysis models: Historic development and current trends. Arch. Comput. Methods Eng. 2022, 29, 5397–5426.
- Wu, D.; Yang, Z.; Liang, L. Using DEA-neural network approach to evaluate branch efficiency of a large Canadian bank. Expert Syst. Appl. 2006, 31, 108–115.
- Bishop, C. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, UK, 1995.
- Lewis, H.F.; Sexton, T.R. Network DEA: Efficiency analysis of organizations with complex internal structure. Comput. Oper. Res. 2004, 31, 1365–1410.
- Athanassopoulos, A.D.; Curram, S.P. A comparison of data envelopment analysis and artificial neural networks as tools for assessing the efficiency of decision making units. J. Oper. Res. Soc. 1996, 47, 1000–1016.
- Na, J.; Zhou, Z.; Zhou, H. A decision-analysis approach to determine the total staff employ in government organization—Establishment of local government organizations. Chin. J. Manag. Sci. 1997, 5, 7–17.
- Ma, C.; Wu, C.; Zhang, S.; Yang, Y.; Li, H.; Han, Y. Decision making method for variable-rate fertilization based on data envelopment analysis and artificial neural network. Trans. CSAE 2004, 20, 152–155.
- Zhu, N.; Zhu, C.; Emrouznejad, A. A combined machine learning algorithms and DEA method for measuring and predicting the efficiency of Chinese manufacturing listed companies. J. Manag. Sci. Eng. 2021, 6, 435–448.
- Olanrewaju, O. Integrated index decomposition analysis-artificial neural network-data envelopment analysis (IDA-ANN-DEA)—Implementation guide. Energy Effic. 2021, 14, 71–78.
- Zhong, K.; Wang, Y.; Pei, J. Super efficiency SBM-DEA and neural network for performance evaluation. Inf. Process. Manag. 2021, 58, 102728.
- Anouze, A.L.M.; Bou-Hamad, I. Data envelopment analysis and data mining to efficiency estimation and evaluation. Int. J. Islam. Middle East. Financ. Manag. 2019, 12, 169–190.
- Bose, A.; Patel, G.N. “NeuralDEA”—A framework using neural network to re-evaluate DEA benchmarks. OPSearch 2015, 52, 18–41.
- Kwon, H. Exploring the predictive potential of artificial neural networks in conjunction with DEA in railroad performance modeling. Int. J. Prod. Econ. 2017, 183, 159–170.
- Hong, H.K.; Ha, S.H.; Shin, C.K.; Park, S.C.; Kim, S.H. Evaluating the efficiency of system integration projects using data envelopment analysis (DEA) and machine learning. Expert Syst. Appl. 1999, 16, 283–296.
- Tsaples, G.; Papathanasiou, J.; Georgiou, A.C. An exploratory DEA and machine learning framework for the evaluation and analysis of sustainability composite indicators in the EU. Mathematics 2022, 10, 2277.
- Mirmozaffari, M.; Shadkam, E.; Khalili, S.M.; Kabirifar, K.; Yazdani, R.; Gashteroodkhani, T.A. A novel artificial intelligent approach: Comparison of machine learning tools and algorithms based on optimization DEA Malmquist productivity index for eco-efficiency evaluation. Int. J. Energy Sect. Manag. 2021, 25, 523–550.
- Yang, X.; Dimitrov, S. Data envelopment analysis may obfuscate corporate financial data: Using support vector machine and data envelopment analysis to predict corporate failure for nonmanufacturing firms. INFOR Inf. Syst. Oper. Res. 2017, 55, 295–311.
- Emrouznejad, A.; Shale, E. A combined neural network and DEA for measuring efficiency of large scale datasets. Comput. Ind. Eng. 2009, 56, 249–254.
- Saeidi, S.; Jouybanpour, P.; Mirvakilli, A.; Iranshahi, D.; Klemeš, J.J. A comparative study between modified data envelopment analysis and response surface methodology for optimisation of heterogeneous biodiesel production from waste cooking palm oil. J. Clean. Prod. 2016, 136, 23–30.
- Kwon, H.; Lee, J.; Roh, J. Best performance modeling using complementary DEA-ANN approach—Application to Japanese electronics manufacturing firms. Benchmarking Int. J. 2016, 23, 704–721.
- Farahmand, M.; Desa, M.; Nilashi, M. A combined data envelopment analysis and support vector regression for efficiency evaluation of large decision making units. Int. J. Eng. Technol. 2014, 6, 2310–2321.
- Friedman, J.; Stuetzle, W. Projection pursuit regression. J. Am. Stat. Assoc. 1981, 76, 817–823.
- Lou, W. The Projection Pursuit Theory Based on Swarm Intelligence Optimization Algorithms—New Developments, Applications, and Software; Fudan University Press: Shanghai, China, 2021.
- Hwang, T.; Lay, S.; Maechler, M. Regression modeling in back-propagation and projection pursuit learning. IEEE Trans. Neural Netw. 1994, 5, 342–353.
- Zhan, H.R.; Zhang, M.K.; Xia, Y.C. Ensemble projection pursuit for general nonparametric regression. arXiv 2022, arXiv:2210.14467.
- Ren, H.; Ma, X.R.; Li, H.B. Improvement of input evaluation for giant projects based on GA-BP neural network. Syst. Eng.—Theory Pract. 2015, 35, 1474–1481.
- Tsolas, I.E.; Charles, V.; Gherman, T. Supporting better practice benchmarking: A DEA-ANN approach to bank branch performance assessment. Expert Syst. Appl. 2020, 160, 113599.
- Zhang, Z.; Xiao, Y.; Niu, H. DEA and Machine Learning for Performance Prediction. Mathematics 2022, 10, 1776.
- Fallahpour, A.; Kazemi, N.; Molani, M.; Nayyeri, S.; Ehsani, M. An Intelligence-Based Model for Supplier Selection Integrating Data Envelopment Analysis and Support Vector Machine. Iran. J. Manag. Stud. 2018, 11, 209–241.
- Yazdanparast, R.; Tavakkoli-Moghaddam, R.; Heidari, R.; Aliabadi, L. A hybrid Z-number data envelopment analysis and neural network for assessment of supply chain resilience: A case study. Cent. Eur. J. Oper. Res. 2021, 29, 611–631.
- Sreekumar, S.; Mahapatra, S. Performance modeling of Indian business schools: A DEA-neural network approach. Benchmarking Int. J. 2011, 18, 221–239.
- Kao, H.; Huang, C.; Chen, J. Classification using DEA and SVM approaches: The empirical study of higher education. Information 2013, 16, 7801–7810.
- Barros, C.; Wanke, P. Insurance companies in Mozambique: A two-stage DEA and neural networks on efficiency and capacity slacks. Appl. Econ. 2014, 46, 3591–3600.
- Sanei, R.; Hosseinzadeh Lotfi, F.; Fallah, M.; Sobhani, F. An estimation of an acceptable efficiency frontier having an optimum resource management approach, with a combination of the DEA-ANN-GA technique (A case study of branches of an insurance company). Mathematics 2022, 10, 4503.
- Liu, Q.; Shang, J.; Wang, J.; Niu, W.; Qiao, W. Evaluation and prediction of the safety management efficiency of coal enterprises based on a DEA-BP neural network. Resour. Policy 2023, 83, 103611.
- Lou, W.G. Evaluation and prediction of soil quality based on artificial neural network in the Sanjiang Plain. Chin. J. Manag. Sci. 2002, 10, 79–83.
- Haykin, S. Neural Networks and Learning Machines; China Machine Press: Beijing, China, 2009.
- Zhang, G.; Patuwo, E.; Hu, M. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast. 1998, 14, 35–62.
- StatSoft, Inc. STATISTICA Neural Networks; StatSoft, Inc.: Tulsa, OK, USA, 2011.
- Zhong, C.; Lou, W.G.; Wang, C. Neural Network-Based Modeling for Risk Evaluation and Early Warning for Large-Scale Sports Events. Mathematics 2022, 10, 3228.
- Singhee, A. SiLVR: Projection Pursuit for Response Surface Modeling. In Machine Learning in VLSI Computer-Aided Design; Elfadel, I., Boning, D., Li, X., Eds.; Springer Nature: Cham, Switzerland, 2019.
- Hall, P. On projection pursuit regression. Ann. Stat. 1989, 17, 573–588.
- Yu, X.H.; Xu, H.Y.; Lou, W.G.; Xu, X.; Shi, V. Examining energy eco-efficiency in China’s logistics industry. Int. J. Prod. Econ. 2023, 258, 108797.
- Banker, R.D.; Charnes, A.; Cooper, W.W. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Manag. Sci. 1984, 30, 1078–1092.
- Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366.
- Chen, C.; Tuo, R. Projection Pursuit Gaussian Process Regression. IISE Trans. 2023, 55, 901–911.
- Tao, H.; Tao, J.; Li, Q.; Aihemaiti, M.; Jiang, Y.; Yang, W.; Wei, J. Average relative flow of single-wing labyrinth drip irrigation tape based on projection pursuit regression. Sci. Rep. 2022, 12, 8543.
- Mohamed, A.A.A.; Hassan, S.A.; Hemeida, A.M.; Alkhalaf, S.; Mahmoud, M.; Eldin, A.M.B. Parasitism–Predation algorithm (PPA): A novel approach for feature selection. Ain Shams Eng. J. 2020, 11, 293–308.
- Athey, S.; Imbens, G. Machine learning methods that economists should know about. Annu. Rev. Econ. 2019, 11, 685–725.
- Marsland, S. Machine Learning: An Algorithmic Perspective; CRC Press: Boca Raton, FL, USA; Taylor & Francis Group: Abingdon, UK, 2015.
- Valero-Carreras, D.; Aparicio, J.; Guerrero, N.G. Support vector frontiers: A new approach for estimating production functions through support vector machines. Omega 2021, 104, 102490.
- Gomes, E.G.; Lins, M.P.E. Modelling undesirable outputs with zero sum gains data envelopment analysis models. J. Oper. Res. Soc. 2008, 59, 616–623.
- Zhou, X.; Niu, A.Y.; Lin, C.X. Optimizing carbon emission forecast for modelling China’s 2030 provincial carbon emission quota allocation. J. Environ. Manag. 2023, 325, 116523.
- Tang, Q.; Zhang, C. Data processing system (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research. Insect Sci. 2013, 20, 254–260.
References | DEA Model | Topology Employed in a Combined Model | Combined Model | Number of Samples (Validation, Test) | To Obey the Rule of Thumb | Model for Efficiency Score (ES) and Efficient Frontier (EF) |
---|---|---|---|---|---|---|
Wu et al. [5] | CCR (2, 3) | 5-10-1, 5-4-1 | two BPNNs | 142 (84) | no | ES |
Athanassopoulos et al. [8] | CCR (2, 1) *, DEA (4, 3) | 2-3-1 **, 4-10-3 | BPNN | 250 (50) | yes | ES, DEA > BPNN; efficiency rank, BPNN > DEA |
Na et al. [9] | CCR (5, 2) | 5-3-2 | BPNN | 13N, N | no | ES |
Ma et al. [10] | CCR (6, 1) | 4-4-3 | BPNN | 38 (5) | no | ES |
Zhu et al. [11] | CCR (2, 2) | 4-?-1 | BPNN, GANN, SVR, ISVR | 948 (48) | yes | ES, GANN > BPNN > ISVM > SVM |
Olanrewaju [12] | CCR (1, 1) | 1-5-1 | BPNN | 8 | no | ES |
Zhong et al. [13] | SESBM (3, 1) | 3-?-1 | 15 MLs such as CART, CIT, Bagging, RF, BPNN, etc. | 710 (3:1) | no | EF, BPNN > ET > RF > GBR |
Anouze et al. [14] | CCR (5, 4), and 15 environmental variables | 9-5-1 | 15 MLs such as CART, CIT, Bagging, RF, BPNN, etc. | 151 (2:1) | no | The bagging and RF are better than BPNN, CIT, etc. |
Bose et al. [15] | CCR (2, 2) | 2-3-2-2, 5-4-3-2 | Two BPNNs | 12, 99 | no | EF |
Kwon [16] | two DEAs, CCRs (4, 1) | 5-7-1, 5-8-1, 5-8-1, 5-7-1; 4-3-1, 4-2-1, 4-4-1, 4-5-1 | eight BPNNs | 56 (17) | no | ES, EF |
Hong et al. [17] | CCR (4,4) | / | SOMc | 50 | / | / |
Yang et al. [20] | SBM (5, 5) | trial-and-error | SVMc | creating at most 500 instances, 10-fold CV | / | ES |
Saeidi et al. [22] | CCR (4, 2) | 6-?-1 | BPNN | 26 | no | ES and EF |
Kwon et al. [23] | CCR (3, 3), CCR (3, 1) | 15-9-1, 6-30-1, 15-3-1, 15-3-1 | four BPNNs | 181 (37, 36) # | no | ES, EF |
Ren et al. [29] | CCR (5, 5) | 10-21-1 | BPNN | 5N, N | no | ES |
Tsolas et al. [30] | eight DEAs, CCR (3, 2) | 3-3-1, 6-?-1 | two BPNNs | 160 (4:1) | no | ES, EF |
Zhang et al. [31] | SESBM (3, 2) | 5-10-1; 5-10-20-1; 5-10-10-10-1; 5-10-10-20-1; 5-10-20-30-1 | 11 MLs such as BPNN, SVR, etc. | 420 (30), 5-fold CV | no | ES, BPNN is the optimal |
Fallahpour et al. [32] | CCR (3, 3) | trial-and-error | SVM | 48 (12) | / | ES |
Yazdanparast et al. [33] | Z-DEA (1, 17) | 17-?-1 | BPNN | 150 (45) | no | EF |
Sreekumar et al. [34] | CCR (3, 8), BCC (3, 8) | 11-?-1 | GRNN | 49 | / | ES |
Kao et al. [35] | CCR (10, 2) | 12-?-1 | two SVMsc | 91 | / | ES |
Barros et al. [36] | PCA-CCR (2, 2) | 4-20-1 | BPNN | 50 | no | ES |
Sanei et al. [37] | SBM (3, 3) | 5-6-1 | three BPNNs | 155 (46) | no | EF |
Liu et al. [38] | BCC (4, 4) | 7-8-4 | BPNN | 120 (20) | no | ES
DMUs | I1 * | I2 | O1 | O2 | ES | Efficient/Inefficient | PPR-1 | PPR-2 | PPR | ESN
---|---|---|---|---|---|---|---|---|---|---
A | 20 | 151 | 100 | 90 | 1 | Efficient | 0.996 | 0.018 | 1.014 | 1 |
B | 19 | 131 | 150 | 50 | 1 | Efficient | 0.974 | 0.017 | 0.992 | 1 |
C | 25 | 160 | 160 | 55 | 0.883 | Inefficient | 0.902 | −0.012 | 0.891 | 0.874 |
D | 27 | 168 | 180 | 72 | 1 | Efficient | 0.995 | −0.008 | 0.987 | 0.941 |
E | 22 | 158 | 94 | 66 | 0.764 | Inefficient | 0.787 | −0.011 | 0.777 | 0.748 |
F | 55 | 255 | 230 | 90 | 0.835 | Inefficient | 0.807 | 0.023 | 0.831 | 0.791 |
G | 33 | 235 | 220 | 88 | 0.902 | Inefficient | 0.930 | −0.002 | 0.928 | 0.902 |
H | 31 | 206 | 152 | 80 | 0.796 | Inefficient | 0.806 | −0.024 | 0.782 | 0.752 |
I | 30 | 244 | 190 | 100 | 0.960 | Inefficient | 0.897 | 0.022 | 0.919 | 0.960 |
J | 50 | 268 | 250 | 100 | 0.871 | Inefficient | 0.905 | −0.028 | 0.877 | 0.819 |
K | 53 | 306 | 260 | 147 | 0.955 | Inefficient | 1.002 | −0.035 | 0.968 | 0.873 |
L | 38 | 284 | 250 | 120 | 0.958 | Inefficient | 0.979 | 0.008 | 0.984 | 0.958 |
B-C ** | 20.9 | 141.2 | 160 | 55 | 1 | Efficient | 0.980 | 0.012 | 0.993 | 1(M) # |
B-E | 16.8 | 120.6 | 94 | 66 | 1 | Efficient | 0.982 | 0.015 | 0.997 | 0.963(N) |
B-F | 33.8 | 212.9 | 230 | 90 | 1 | Efficient | 1.002 | −0.012 | 0.990 | |
B-G | 29.8 | 208.6 | 220 | 88 | 1 | Efficient | 1.001 | 0.013 | 1.013 | |
B-H | 24.7 | 164 | 152 | 80 | 1 | Efficient | 0.996 | 0.001 | 0.997 | |
B-I | 28.8 | 207.1 | 190 | 100 | 1 | Efficient | 1.002 | 0.014 | 1.016 | |
B-J | 37.5 | 233.3 | 250 | 100 | 1 | Efficient | 1.002 | −0.018 | 0.985 | |
B-K | 43.3 | 292.3 | 260 | 147 | 1 | Efficient | 0.984 | −0.012 | 0.972 | |
B-L | 36.4 | 259.5 | 250 | 120 | 1 | Efficient | 1.000 | 0.011 | 1.011 |
DMUs | I1 | I2 | O1 | O2 | DPPR | NPPR | DB | NB | DPPR-27 | NPPR-27 | DPPR-8 | NPPR-8 | DNN-729 | NNN-729 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 20 | 151 | 100 | 90 | 20.04 | 150.95 | 20.00 | 150.19 | 20.02 | 150.98 | 20.00 | 151.00 | 20.26 | 149.17 |
B | 19 | 131 | 150 | 50 | 20.08 | 129.91 | 19.01 | 131.31 | 20.07 | 129.92 | 20.04 | 129.95 | 20.19 | 129.94 |
D | 27 | 168 | 180 | 72 | 25.57 | 169.45 | 26.93 | 167.17 | 25.55 | 169.47 | 25.52 | 169.50 | 25.29 | 168.20 |
C | 25 | 160 | 160 | 55 | 21.63 | 140.44 | 21.06 | 140.16 | 21.62 | 140.45 | 21.59 | 140.48 | 21.47 | 139.30 |
E | 22 | 158 | 94 | 66 | 16.61 | 120.82 | 14.68 | 122.47 | 16.60 | 120.83 | 16.58 | 120.86 | 17.97 | 122.48 |
F | 55 | 255 | 230 | 90 | 32.51 | 214.18 | 33.84 | 194.51 | 32.48 | 214.20 | 32.45 | 214.23 | 32.56 | 214.94 |
G | 33 | 235 | 220 | 88 | 31.31 | 207.04 | 32.97 | 191.43 | 31.29 | 207.07 | 31.26 | 207.10 | 31.33 | 207.61 |
H | 31 | 206 | 152 | 80 | 23.84 | 164.90 | 24.86 | 163.26 | 23.82 | 164.92 | 23.79 | 164.95 | 23.58 | 163.41 |
I | 30 | 244 | 190 | 100 | 29.86 | 206.06 | 30.86 | 186.89 | 29.84 | 206.09 | 29.81 | 206.12 | 29.85 | 206.51 |
J | 50 | 268 | 250 | 100 | 35.61 | 235.25 | 35.42 | 200.64 | 35.59 | 235.27 | 35.55 | 235.31 | 35.69 | 236.11 |
K | 53 | 306 | 260 | 147 | 42.15 | 293.42 | 33.42 | 196.15 | 42.12 | 293.46 | 42.09 | 293.48 | 42.05 | 292.72 |
L | 38 | 284 | 250 | 120 | 37.99 | 257.93 | 34.96 | 200.38 | 37.96 | 257.96 | 37.93 | 257.99 | 38.03 | 258.24 |
M | 25 | 150 | 170 | 79 | 25.45 | 272.52 | / | / | 22.43 | 172.54 | 25.40 | 172.57 | 25.18 | 171.38 |
N | 30 | 300 | 90 | 130 | 23.85 | 191.45 | / | / | 23.81 | 191.49 | 23.79 | 191.51 | 23.73 | 191.02 |
MAE | MAPE (%) | RMSE | R | EA-max | ER-max (%)
---|---|---|---|---|---
2.346 | 8.0 | 3.695 | 0.898 | 9.880 | 29.6 | |
1.068 | 5.3 | 1.190 | 0.989 | 1.589 | 7.9 | |
1.083 | 5.4 | 1.193 | 0.957 | 1.827 | 9.1 | |
1.163 | 4.2 | 1.281 | 0.983 | 2.262 | 6.8 | |
1.068 | 3.7 | 1.188 | 0.989 | 1.882 | 5.6 | |
1.065 | 3.7 | 1.182 | 0.989 | 1.820 | 5.3 | |
1.070 | 3.66 | 1.192 | 0.989 | 1.894 | 5.8 | |
1.209 | 4.39 | 1.283 | 0.988 | 1.807 | 7.0 | |
1.621 | 6.47 | 1.736 | 0.983 | 3.163 | 17.0 | |
27.48 | 14.1 | 40.62 | 0.895 | 96.15 | 49.0 | |
1.081 | 0.7 | 1.202 | 1 | 1.946 | 1.3 | |
1.081 | 0.7 | 1.212 | 0.999 | 1.999 | 1.3 | |
1.111 | 0.6 | 1.253 | 0.999 | 2.379 | 1.0 | |
1.078 | 0.6 | 1.199 | 1 | 1.939 | 0.9 | |
1.075 | 0.6 | 1.193 | 1 | 1.672 | 0.9 | |
1.083 | 0.56 | 1.204 | 1 | 1.952 | 0.9 | |
1.298 | 0.75 | 1.504 | 1 | 2.807 | 1.6 | |
2.648 | 1.65 | 3.230 | 0.999 | 6.629 | 4.4 |
Model | Training Subset | Validation Subset | Test Subset | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
MAE | MAPE | EA-max | ER-max | MAE | MAPE | EA-max | ER-max | MAE | MAPE | EA-max | ER-max | |
0.496 | 1.52 | 1.597 | 11.68 | 0.513 | 1.49 | 1.515 | 4.35 | 0.527 | 1.33 | 1.822 | 6.75 | |
2.170 | 2.06 | 3.245 | 8.60 | 2.417 | 2.17 | 6.13 | 9.47 | 2.649 | 2.66 | 10.06 | 10.29 | |
0.429 | 1.13 | 2.250 | 14.00 | 0.429 | 1.31 | 3.28 | 24.9 | 0.430 | 1.21 | 2.02 | 13.26 | |
0.833 | 0.37 | 15.75 | 15.40 | 0.787 | 0.30 | 9.32 | 2.13 | 0.880 | 0.39 | 8.55 | 6.37 |
Provinces | CO2 (Mt) | Population (104 Persons) | GDP (CNY 100 Million) | ES | Quotas | ES-q | Quotas-y | ES-y | Quotas-2030 | ES-2030 | SVR |
---|---|---|---|---|---|---|---|---|---|---|---
Beijing | 89.2 | 2154 | 35,371 | 1 | 89.2 | 0.6906 | 426.5 | 0.6516 | 84.3 | 0.9578 | 0.9877 |
Tianjin | 158.5 | 1562 | 14,104 | 0.417 | 66.1 | 0.7636 | 179.2 | 0.6503 | 157.7 | 0.6094 | 0.4351 |
Hebei | 914.2 | 7592 | 35,105 | 0.3133 | 286.4 | 0.3407 | 453.0 | 0.8803 | 910.0 | 0.4506 | 0.3410 |
Shanxi | 564.9 | 3729 | 17,027 | 0.2562 | 144.7 | 0.5744 | 207.5 | 0.8229 | 727.0 | 0.3222 | 0.2874 |
Inner Mongolia | 794.3 | 2540 | 17,213 | 0.1280 | 101.7 | 0.6701 | 547.6 | 0.2707 | 712.2 | 0.3484 | 0.1726 |
Liaoning | 533.4 | 4352 | 24,909 | 0.3146 | 167.8 | 0.5214 | 361.3 | 0.6240 | 504.0 | 0.2491 | 0.3401 |
Jilin | 203.7 | 2691 | 11,727 | 0.5245 | 106.8 | 0.6702 | 218.6 | 0.5558 | 213.5 | 0.5236 | 0.5353 |
Heilongjiang | 278.2 | 3751 | 13,613 | 0.5231 | 145.5 | 0.5779 | 290.6 | 0.5776 | 262.9 | 0.4580 | 0.5302 |
Shanghai | 192.9 | 2428 | 38,155 | 0.5581 | 107.7 | 0.6301 | 374.8 | 0.7978 | 173.0 | 0.7542 | 0.5641 |
Jiangsu | 804.6 | 8070 | 99,632 | 0.6410 | 515.8 | 1.2084 | 787.1 | 0.9723 | 760.2 | 0.8652 | 0.6389 |
Zhejiang | 381.4 | 5850 | 62,352 | 0.7201 | 274.7 | 0.4306 | 546.3 | 0.8834 | 379.7 | 0.8558 | 0.7108 |
Anhui | 408.1 | 6366 | 37,114 | 0.5923 | 241.7 | 0.4090 | 366.2 | 0.9428 | 427.8 | 0.4555 | 0.5950 |
Fujian | 278.1 | 3973 | 42,395 | 0.5922 | 164.7 | 0.5287 | 352.6 | 0.9392 | 285.6 | 0.6615 | 0.5945 |
Jiangxi | 242.3 | 4666 | 24,758 | 0.7393 | 179.1 | 0.4996 | 276.8 | 0.8492 | 267.5 | 0.6407 | 0.7402 |
Shandong | 937.1 | 10,070 | 71,068 | 0.4823 | 452.0 | 0.4691 | 824.4 | 0.7480 | 885.4 | 0.5838 | 0.4947 |
Henan | 460.6 | 9640 | 54,259 | 0.9059 | 417.3 | 0.3018 | 528.1 | 1 | 458.5 | 0.8236 | 0.8966 |
Hubei | 354.8 | 5927 | 45,828 | 0.6636 | 235.4 | 0.4297 | 417.2 | 0.8935 | 371.9 | 0.5774 | 0.6600 |
Hunan | 310.6 | 6918 | 39,752 | 0.8430 | 261.9 | 0.3874 | 385.3 | 0.9731 | 325.7 | 0.7527 | 0.8359 |
Guangdong | 569.1 | 11,521 | 107,671 | 1 | 569.1 | 1.2367 | 826.4 | 1 | 510.3 | 1.4683 | 0.9877 |
Guangxi | 246.7 | 4960 | 21,237 | 0.7689 | 189.7 | 0.4799 | 232.6 | 1 | 258.7 | 0.7177 | 0.7617 |
Hainan | 43.1 | 945 | 5309 | 1 | 43.1 | 0.8542 | 74.5 | 0.6900 | 67.8 | 0.8221 | 0.9860 |
Chongqing | 156.3 | 3124 | 23,606 | 0.7891 | 123.3 | 0.6135 | 238.1 | 0.8199 | 155.5 | 0.8515 | 0.7736 |
Sichuan | 315.2 | 8375 | 46,616 | 1 | 315.2 | 0.3491 | 572.2 | 0.7915 | 313.7 | 0.9902 | 0.9877 |
Guizhou | 261.1 | 3623 | 16,769 | 0.5394 | 140.9 | 0.5831 | 165.6 | 1 | 288.3 | 0.5035 | 0.5414 |
Yunnan | 186.0 | 4858 | 23,224 | 1 | 186.0 | 0.4869 | 367.0 | 0.6421 | 216.1 | 0.7992 | 0.9881 |
Shaanxi | 296.3 | 3876 | 25,793 | 0.5085 | 150.7 | 0.5553 | 308.7 | 0.7174 | 294.9 | 0.5161 | 0.5184 |
Gansu | 164.5 | 2647 | 8718 | 0.6397 | 105.2 | 0.6820 | 108.8 | 1 | 172.4 | 0.6670 | 0.6323 |
Qinghai | 51.8 | 608 | 2966 | 0.8322 | 43.1 | 0.8170 | 118.2 | 0.3213 | 48.9 | 0.8353 | 0.8346 |
Ningxia | 212.4 | 695 | 3748 | 0.2028 | 43.1 | 0.8257 | 38.0 | 1 | 222.7 | 0.3959 | 0.2404 |
Xinjiang | 455.3 | 2523 | 13,597 | 0.2214 | 100.8 | 0.6800 | 271.3 | 0.4695 | 408.2 | 0.2431 | 0.2574 |
Tianjin b | 1562 | 14,104 | 1 | 60.7 | 0.9783 | 0.6401 | |||||
Hebei b | 7592 | 42,258 | 1 | 285.7 | 0.9669 | 0.7387 | |||||
Shanxi b | 3729 | 20,756 | 1 | 140.3 | 0.9747 | 0.6924 | |||||
Inner Mongolia b | 2540 | 17,213 | 1 | 96.7 | 0.9757 | 0.6219 | |||||
Liaoning b | 4352 | 24,909 | 1 | 164.0 | 0.9673 | 0.8048 | |||||
Jilin b | 2691 | 14,978 | 1 | 101.3 | 0.9869 | 0.6233 | |||||
Heilongjiang b | 3751 | 20,878 | 1 | 141.2 | 0.9745 | 0.6931 | |||||
Shanghai b | 2515 | 38,155 | 1 | 107.7 | 0.9107 | 0.8277 | |||||
Jiangsu b | 10,479 | 99,632 | 1 | 515.8 | 0.9770 | 0.622 | |||||
Zhejiang b | 5850 | 62,352 | 1 | 274.7 | 0.8479 | 0.6197 | |||||
Average | / | / | / | 0.6239 | / | 0.6089 | / | 0.7828 | / | 0.6567 | / |
Training Subset | Validation Subset | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Model | MAE | MAPE (%) | RMSE | (%) | MAE | MAPE (%) | RMSE | (%) | ||
12.30 | 3.29 | 15.53 | 43.20 | 36.41 | 8.37 | 5.05 | 14.66 | 35.00 | 27.37 | |
10.92 | 5.03 | 15.00 | 45.78 | 106.3 | 42.54 | 27.43 | 53.41 | 132.2 | 49.06 | |
12.08 | 1.89 | 14.69 | 45.33 | 41.81 | 9.52 | 5.73 | 14.86 | 37.51 | 22.82 | |
11.40 | 2.10 | 14.67 | 51.29 | 119.1 | 36.45 | 24.33 | 44.44 | 106.2 | 44.77 | |
11.74 | 1.60 | 14.51 | 52.78 | 9.27 | 22.77 | 14.03 | 30.83 | 69.65 | 52.94 | |
7.39 | 1.88 | 10.54 | 71.50 | 166.0 | 34.35 | 28.80 | 37.13 | 59.44 | 97.98 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yu, X.; Lou, W. An Exploration of Prediction Performance Based on Projection Pursuit Regression in Conjunction with Data Envelopment Analysis: A Comparison with Artificial Neural Networks and Support Vector Regression. Mathematics 2023, 11, 4775. https://doi.org/10.3390/math11234775