Dynamic Nearest Neighbor: An Improved Machine Learning Classifier and Its Application in Finances
Abstract
1. Introduction
2. Related Works on Machine Learning for Financial Business
3. Proposed Method
- In a given pattern classification problem, $m$, $n$, and $k$ are fixed positive integers greater than 1: the number of training patterns, the number of attributes, and the number of classes, respectively.
- There is a training (or learning) set $T$ of cardinality $m$, made up of the patterns $x^1, x^2, \ldots, x^m$.
- Each pattern in $T$ is made up of $n$ attributes.
- The $j$-th component of the $\mu$-th training pattern is denoted by $x_j^{\mu}$.
- The patterns contain only numeric values: there are no categorical, mixed, or missing values.
- Each pattern in the training set belongs to exactly one of the $k$ classes.
- A parameter is set, (typically ).
- For each admissible pair of attribute indices $i$ and $j$, do:
- 2.1 Create a new learning set from the training patterns, restricted to the attributes selected by $i$ and $j$.
- 2.2 Apply the 1-NN classifier to the pattern to be classified, using the restricted learning set from step 2.1.
- 2.3 Store the class returned by the 1-NN classifier in step 2.2.
- Assign to the pattern to be classified the most frequent class among all the stored classes.
- (i) For $i = 1$, the index $j$ takes the values 3, 4, 5, and 6, so four subsets of attributes are formed.
- (ii) For $i = 2$, the index $j$ takes the values 4, 5, and 6, so three subsets of attributes are formed.
- (iii) For $i = 3$, the index $j$ takes the values 5 and 6, so two subsets of attributes are formed.
- (iv) Finally, for $i = 4$, the index $j$ takes only the value 6, so a single subset of attributes is formed.
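The steps and the six-attribute example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each attribute subset is the contiguous block of attributes from the first index to the second, that the distance is Euclidean, and that the parameter is the smallest allowed block size (called `min_width` here, a hypothetical name with a hypothetical default of 3 chosen only because it reproduces the index ranges of the example). The toy data are invented.

```python
from collections import Counter
from math import dist


def one_nn(train_X, train_y, query):
    """Return the label of the training pattern closest to query (Euclidean)."""
    nearest = min(range(len(train_X)), key=lambda r: dist(train_X[r], query))
    return train_y[nearest]


def d1nn_predict(train_X, train_y, query, min_width=3):
    """Majority vote of 1-NN over attribute subsets.

    ASSUMPTIONS (not confirmed by the text): every subset is a contiguous
    block of attributes, and min_width is the smallest block size allowed.
    """
    n = len(query)
    votes = []
    for i in range(n - min_width + 1):          # 0-based first attribute
        for j in range(i + min_width - 1, n):   # 0-based last attribute, inclusive
            # Step 2.1: restrict every training pattern to attributes i..j.
            sub_train = [row[i:j + 1] for row in train_X]
            # Steps 2.2 and 2.3: run 1-NN on the restricted data, store the class.
            votes.append(one_nn(sub_train, train_y, query[i:j + 1]))
    # Final step: most frequent stored class (ties go to the class seen first).
    return Counter(votes).most_common(1)[0][0], votes


# Invented toy data with six numeric attributes and two classes.
train_X = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
]
train_y = ["A", "B", "A", "B"]
query = [0.9, 0.8, 1.0, 0.9, 1.0, 0.7]

label, votes = d1nn_predict(train_X, train_y, query)
print(label, len(votes))
```

With six attributes and `min_width=3`, the loops visit 4 + 3 + 2 + 1 = 10 attribute subsets, matching cases (i) to (iv). Each subset costs one full 1-NN search, so the work grows roughly quadratically with the number of attributes.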
4. Results
4.1. Datasets
4.2. State of the Art Classifiers for Comparison
4.3. Performance and Comparative Analysis
5. Discussion
6. Conclusions
Dataset | Number of Patterns | Number of Features | Imbalance Ratio (IR) | Number of Classes |
---|---|---|---|---|
Australian credit approval | 690 | 14 | 1.24 | 2 |
Bank | 4521 | 16 | 7.67 | 2 |
Bank additional | 4119 | 20 | 8.13 | 2 |
Banknote authentication | 1372 | 4 | 1.24 | 2 |
Credit approval | 690 | 15 | 1.24 | 2 |
German credit data | 1000 | 24 | 2.33 | 2 |
Iranian credit | 1583 | 28 | 1.50 | 2 |
Qualitative bankruptcy | 250 | 6 | 1.33 | 2 |
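
The IR column can be illustrated with a short computation. The sketch below assumes that IR denotes the usual imbalance ratio, that is, the size of the majority class divided by the size of the minority class. The table does not list class counts, so the 700/300 split used here is a hypothetical input; it yields 2.33, the value listed for a dataset of 1000 patterns.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class size divided by minority-class size (assumed IR definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical class labels: 700 patterns of one class, 300 of the other.
labels = ["good"] * 700 + ["bad"] * 300
print(round(imbalance_ratio(labels), 2))
```

An IR close to 1 indicates nearly balanced classes, while larger values indicate that plain accuracy is increasingly dominated by the majority class.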

Algorithm | Conceptual Basis |
---|---|
Naïve Bayes | Bayesian theory |
kNN | Instance-based |
Logistic | Statistic-based |
MLP | Artificial Neural Networks |
SVM | Finding a kernel-based hyperplane |
AdaBoost | Ensemble of classifiers |

Dataset | Naïve Bayes | Logistic | kNN | MLP | SVM | AdaBoost | D1-NN |
---|---|---|---|---|---|---|---|
Australian credit approval | 77.10 | 86.52 | 80.72 | 84.34 | 55.50 | 86.08 | 82.89 |
Bank | 76.70 | 80.08 | 85.58 | 85.33 | 60.34 | 83.98 | 86.56 |
Bank additional | 76.80 | 85.76 | 86.25 | 88.01 | 76.63 | 86.58 | 91.38 |
Banknote authentication | 75.79 | 99.12 | 80.57 | 84.49 | 55.36 | 84.63 | 83.18 |
Credit approval | 83.89 | 86.52 | 99.85 | 100.00 | 100.00 | 94.09 | 99.92 |
German credit data | 71.78 | 77.00 | 73.58 | 71.61 | 63.37 | 75.55 | 71.69 |
Iranian credit | 50.72 | 75.93 | 91.34 | 88.37 | 61.59 | 85.59 | 95.13 |
Qualitative bankruptcy | 98.00 | 99.20 | 99.60 | 99.20 | 99.60 | 99.60 | 99.60 |
Average Accuracy | 76.35 | 86.26 | 87.19 | 87.67 | 71.55 | 87.01 | 88.79 |
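
The Average Accuracy row can be checked directly from the per-dataset values in the table. The sketch below copies the table entries and recomputes each classifier's arithmetic mean over the eight datasets; the variable names are illustrative only. Because the entries are given to two decimals, a recomputed mean may differ from the reported one by about 0.01.

```python
# Accuracy per classifier, in the row order of the table above.
accuracy = {
    "Naïve Bayes": [77.10, 76.70, 76.80, 75.79, 83.89, 71.78, 50.72, 98.00],
    "Logistic": [86.52, 80.08, 85.76, 99.12, 86.52, 77.00, 75.93, 99.20],
    "kNN": [80.72, 85.58, 86.25, 80.57, 99.85, 73.58, 91.34, 99.60],
    "MLP": [84.34, 85.33, 88.01, 84.49, 100.00, 71.61, 88.37, 99.20],
    "SVM": [55.50, 60.34, 76.63, 55.36, 100.00, 63.37, 61.59, 99.60],
    "AdaBoost": [86.08, 83.98, 86.58, 84.63, 94.09, 75.55, 85.59, 99.60],
    "D1-NN": [82.89, 86.56, 91.38, 83.18, 99.92, 71.69, 95.13, 99.60],
}

# Averages as reported in the last row of the table.
reported = {
    "Naïve Bayes": 76.35, "Logistic": 86.26, "kNN": 87.19, "MLP": 87.67,
    "SVM": 71.55, "AdaBoost": 87.01, "D1-NN": 88.79,
}

averages = {name: sum(vals) / len(vals) for name, vals in accuracy.items()}
for name, avg in averages.items():
    print(f"{name:12s} {avg:6.2f} (reported {reported[name]:.2f})")

# Classifier with the highest mean accuracy across the eight datasets.
best = max(averages, key=averages.get)
print("highest average:", best)
```

A mean over datasets weights every dataset equally, regardless of its size or imbalance ratio, so it summarizes the table rather than replacing a per-dataset comparison.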
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Camacho-Urriolagoitia, O.; López-Yáñez, I.; Villuendas-Rey, Y.; Camacho-Nieto, O.; Yáñez-Márquez, C. Dynamic Nearest Neighbor: An Improved Machine Learning Classifier and Its Application in Finances. Appl. Sci. 2021, 11, 8884. https://doi.org/10.3390/app11198884