Modeling Credit Risk: A Category Theory Perspective
Abstract
1. Introduction
2. Modeling Framework
2.1. Categorical Equivalence
2.2. Model Combination
2.3. Shannon’s Information Entropy
2.4. Enriched Categories
2.5. The Stacking Process
2.6. Base Models
2.7. Method of Comparison
3. Data
4. Empirical Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Key Definitions in Category Theory
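A category $\mathcal{C}$ consists of the following elements:
- A collection of objects, denoted as $\mathrm{Ob}(\mathcal{C})$;
- For every two objects $c$ and $d$, a set $\mathcal{C}(c, d)$ that consists of the morphisms from $c$ to $d$; a morphism $f \in \mathcal{C}(c, d)$ is written $f \colon c \to d$;
- For every object $c$, a morphism $\mathrm{id}_c \colon c \to c$, called the identity morphism on $c$. For convenience, $\mathrm{id}$ is used instead of $\mathrm{id}_c$;
- For every three objects $c, d, e$ and morphisms $f \colon c \to d$ and $g \colon d \to e$, a morphism $g \circ f \colon c \to e$, called the composite of $f$ and $g$.
These elements are required to satisfy the following conditions:
- For any morphism $f \colon c \to d$, $f \circ \mathrm{id}_c = f$ and $\mathrm{id}_d \circ f = f$; this is called the unitality condition;
- For any three morphisms $f \colon c_0 \to c_1$, $g \colon c_1 \to c_2$ and $h \colon c_2 \to c_3$, the following are equal: $h \circ (g \circ f) = (h \circ g) \circ f$. This is called the associativity condition.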
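The category $\mathbf{Set}$ is defined as follows:
- $\mathrm{Ob}(\mathbf{Set})$ is the collection of all sets;
- If $S$ and $T$ are sets, then $\mathbf{Set}(S, T) = \{f \mid f \colon S \to T\}$, where $f$ is a function;
- For each set $S$, the identity function $\mathrm{id}_S \colon S \to S$ is given by $\mathrm{id}_S(s) = s$ for each $s \in S$;
- Given $f \colon S \to T$ and $g \colon T \to U$, their composite function is $g \circ f \colon S \to U$, defined by $(g \circ f)(s) = g(f(s))$.
Since these elements satisfy the unitality and associativity conditions, $\mathbf{Set}$ is indeed a category.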
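Because unitality and associativity are universally quantified, running code cannot prove them for $\mathbf{Set}$, but for small finite sets they can be checked exhaustively. The sketch below encodes a function between finite sets as a Python dict whose keys form the domain (an illustrative assumption, not the paper's notation) and verifies both conditions for every composable triple over four hypothetical example sets.

```python
from itertools import product

# Illustrative encoding (an assumption): a function between finite sets
# is a dict whose keys are the domain and whose values lie in the codomain.


def identity(s):
    """Identity function id_S on a finite set S."""
    return {x: x for x in s}


def compose(g, f):
    """Composite g ∘ f, with (g ∘ f)(s) = g(f(s)); assumes every f(s) is a key of g."""
    return {x: g[f[x]] for x in f}


def all_functions(s, t):
    """Enumerate the hom-set Set(S, T) for finite sets S and T."""
    domain = sorted(s)
    return [dict(zip(domain, images)) for images in product(sorted(t), repeat=len(domain))]


S, T, U, V = {0, 1}, {"a", "b"}, {False, True}, {0, 1, 2}

# Unitality: f ∘ id_S = f and id_T ∘ f = f for every f: S -> T.
for f in all_functions(S, T):
    assert compose(f, identity(S)) == f and compose(identity(T), f) == f

# Associativity: h ∘ (g ∘ f) = (h ∘ g) ∘ f for every composable f, g, h.
for f in all_functions(S, T):
    for g in all_functions(T, U):
        for h in all_functions(U, V):
            assert compose(h, compose(g, f)) == compose(compose(h, g), f)

print(len(all_functions(S, T)))  # 4, i.e. |T| ** |S|
```

The dict encoding keeps composition a one-line comprehension; the exhaustive loops cover 4 × 4 × 9 = 144 triples here.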
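A functor $F \colon \mathcal{C} \to \mathcal{D}$ between categories $\mathcal{C}$ and $\mathcal{D}$ consists of:
- For every object $c \in \mathrm{Ob}(\mathcal{C})$, an object $F(c) \in \mathrm{Ob}(\mathcal{D})$;
- For every morphism $f \colon c \to d$ in $\mathcal{C}$, a morphism $F(f) \colon F(c) \to F(d)$ in $\mathcal{D}$.
These elements are required to satisfy the following conditions:
- For every object $c \in \mathrm{Ob}(\mathcal{C})$, $F(\mathrm{id}_c) = \mathrm{id}_{F(c)}$;
- For any three objects $c, d, e \in \mathrm{Ob}(\mathcal{C})$ and two morphisms $f \colon c \to d$ and $g \colon d \to e$, the equation $F(g \circ f) = F(g) \circ F(f)$ holds in $\mathcal{D}$.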
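A familiar functor from $\mathbf{Set}$ to itself is the "list" functor, used here purely as an illustration (it does not appear in the paper): it sends a set $S$ to the set of finite lists with elements in $S$, and a function $f$ to the function that applies $f$ to each element. The sketch checks the two functor conditions on a few sample lists; sampling tests the laws rather than proving them.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Illustrative functor (an assumption, not from the paper): the list functor on Set.
# Objects: S is sent to the set of finite lists over S.
# Morphisms: f: S -> T is sent to F(f), which applies f element-wise.


def fmap(f: Callable[[A], B]) -> Callable[[List[A]], List[B]]:
    """F(f): F(S) -> F(T)."""
    return lambda xs: [f(x) for x in xs]


def compose(g, f):
    """Composite g ∘ f."""
    return lambda x: g(f(x))


def identity(x):
    return x


def double(n: int) -> int:  # f: int -> int
    return n * 2


def to_text(n: int) -> str:  # g: int -> str
    return str(n)


for xs in ([], [1], [3, 1, 4, 1, 5]):
    # First condition: F(id) = id on F(S).
    assert fmap(identity)(xs) == identity(xs)
    # Second condition: F(g ∘ f) = F(g) ∘ F(f).
    assert fmap(compose(to_text, double))(xs) == compose(fmap(to_text), fmap(double))(xs)
```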
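Let $F, G \colon \mathcal{C} \to \mathcal{D}$ be functors. A natural transformation $\alpha \colon F \Rightarrow G$ is defined as follows:
- For each object $c \in \mathrm{Ob}(\mathcal{C})$, there is a morphism $\alpha_c \colon F(c) \to G(c)$ in $\mathcal{D}$, called the $c$-component of $\alpha$, that satisfies the following naturality condition:
- For every morphism $f \colon c \to d$ in $\mathcal{C}$, the equation $\alpha_d \circ F(f) = G(f) \circ \alpha_c$ holds.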
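The naturality condition is a real constraint, not a formality. The sketch below takes $F = G$ to be the list functor (an illustrative choice, not the paper's) and tests the square $\alpha_d \circ F(f) = G(f) \circ \alpha_c$ for two candidate families of components: reversing a list, which is natural, and sorting it, which fails for a value-dependent $f$ such as negation.

```python
# Illustrative setting (an assumption): F = G = the list functor on Set.


def fmap(f):
    """List functor on morphisms: apply f element-wise."""
    return lambda xs: [f(x) for x in xs]


def reverse_component(xs):
    """Component alpha_c of 'reverse': the same rule at every object c."""
    return list(reversed(xs))


def sort_component(xs):
    """A candidate family that is NOT natural: its output depends on the element values."""
    return sorted(xs)


def square_commutes(alpha, f, xs):
    """Check alpha_d(F(f)(xs)) == G(f)(alpha_c(xs)) on one sample input."""
    return alpha(fmap(f)(xs)) == fmap(f)(alpha(xs))


def negate(n):
    return -n


sample = [1, 2, 3]
print(square_commutes(reverse_component, negate, sample))  # True
print(square_commutes(sort_component, negate, sample))     # False
```

Sorting fails because negation reverses the order of the elements, so sorting before and after applying $f$ gives different lists.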
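A preorder relation on a set $P$ is a binary relation $\le$ on $P$ that satisfies:
- Reflexivity: $p \le p$ for every $p \in P$; and
- Transitivity: if $p \le q$ and $q \le r$, then $p \le r$.
The preorder can be denoted as $(P, \le)$.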
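On a finite set, both preorder conditions can be checked directly. The sketch below is a minimal illustration with hypothetical relations: comparing strings by length is a preorder even though two different strings of equal length are related both ways (so it is not antisymmetric), whereas "strictly shorter" fails reflexivity.

```python
from itertools import product


def is_preorder(elements, leq):
    """Return True if `leq` is reflexive and transitive on the finite set `elements`."""
    reflexive = all(leq(p, p) for p in elements)
    transitive = all(
        leq(p, r)
        for p, q, r in product(elements, repeat=3)
        if leq(p, q) and leq(q, r)
    )
    return reflexive and transitive


def by_length(x, y):
    """Hypothetical preorder: x <= y when x is no longer than y."""
    return len(x) <= len(y)


def strictly_shorter(x, y):
    """Not reflexive, hence not a preorder."""
    return len(x) < len(y)


words = ["a", "b", "ab", "abc"]
print(is_preorder(words, by_length))         # True
print(is_preorder(words, strictly_shorter))  # False
```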
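A symmetric monoidal structure on a preorder $(P, \le)$ consists of:
- An element $I \in P$, called the monoidal unit;
- A function $\otimes \colon P \times P \to P$, called the monoidal product.
These elements must satisfy the following four properties:
- Monotonicity: for all $x_1, x_2, y_1, y_2 \in P$, if $x_1 \le y_1$ and $x_2 \le y_2$, then $x_1 \otimes x_2 \le y_1 \otimes y_2$;
- Unitality: for all $x \in P$, the equations $I \otimes x = x$ and $x \otimes I = x$ hold;
- Associativity: for all $x, y, z \in P$, $(x \otimes y) \otimes z = x \otimes (y \otimes z)$ holds;
- Symmetry: for all $x, y \in P$, $x \otimes y = y \otimes x$ holds.
This structure is called a symmetric monoidal preorder and is denoted as $(P, \le, I, \otimes)$.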
As an example, let $\mathbb{B} = \{\text{false}, \text{true}\}$, preordered by $\text{false} \le \text{true}$. The structure $(\mathbb{B}, \le, \text{true}, \wedge)$ can be developed, with $\wedge$ representing the AND operation defined in the following matrix:
∧ | false | true |
---|---|---|
false | false | false |
true | false | true |
It is straightforward to show that this structure forms a symmetric monoidal preorder.
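Because the Boolean structure (false ≤ true, unit true, product AND) has only two elements, the four properties of a symmetric monoidal preorder can be verified by brute force. The sketch below is a generic checker for finite structures, applied to that Boolean example; the contrasting OR-with-unit-true case is added only for illustration and fails unitality.

```python
from itertools import product


def is_symmetric_monoidal_preorder(P, leq, unit, tensor):
    """Brute-force check of monotonicity, unitality, associativity and symmetry on finite P."""
    monotone = all(
        leq(tensor(x1, x2), tensor(y1, y2))
        for x1, x2, y1, y2 in product(P, repeat=4)
        if leq(x1, y1) and leq(x2, y2)
    )
    unital = all(tensor(unit, x) == x and tensor(x, unit) == x for x in P)
    associative = all(
        tensor(tensor(x, y), z) == tensor(x, tensor(y, z))
        for x, y, z in product(P, repeat=3)
    )
    symmetric = all(tensor(x, y) == tensor(y, x) for x, y in product(P, repeat=2))
    return monotone and unital and associative and symmetric


B = [False, True]  # Python bools stand in for {false, true}


def leq(x, y):
    """false <= true (the implication order)."""
    return (not x) or y


def tensor(x, y):
    """Monoidal product: AND."""
    return x and y


print(is_symmetric_monoidal_preorder(B, leq, True, tensor))                # True
print(is_symmetric_monoidal_preorder(B, leq, True, lambda x, y: x or y))   # False
```

OR does give a symmetric monoidal preorder on the same set, but only with false as its unit; the check makes the role of the unit explicit.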
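Let $\mathcal{V} = (V, \le, I, \otimes)$ be a symmetric monoidal preorder. A $\mathcal{V}$-category $\mathcal{X}$ consists of:
- A set $\mathrm{Ob}(\mathcal{X})$, elements of which are called objects;
- For every two objects $x, y$, an element $\mathcal{X}(x, y) \in V$, called the hom-object.
These elements must satisfy the following two properties:
- For every object $x \in \mathrm{Ob}(\mathcal{X})$, $I \le \mathcal{X}(x, x)$;
- For every three objects $x, y, z \in \mathrm{Ob}(\mathcal{X})$, $\mathcal{X}(x, y) \otimes \mathcal{X}(y, z) \le \mathcal{X}(x, z)$.
Hence, it can be said that $\mathcal{X}$ is enriched in $\mathcal{V}$.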
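When $\mathcal{V}$ is the Boolean structure (false ≤ true, unit true, product AND), the two properties say that $\mathcal{X}(x, x)$ is always true and that $\mathcal{X}(x, y)$ and $\mathcal{X}(y, z)$ together imply $\mathcal{X}(x, z)$. In other words, a category enriched in Booleans is the same thing as a preorder. The sketch checks both properties for a table of hom-objects; the rating grades A, B and C are a hypothetical example, not data from the paper.

```python
from itertools import product


def leq(a, b):
    """Order on Booleans: false <= true."""
    return (not a) or b


def is_bool_category(objects, hom):
    """Check the two enrichment properties for V = (Booleans, <=, true, AND).

    hom[(x, y)] holds the hom-object X(x, y) as a Python bool.
    """
    unit_property = all(leq(True, hom[(x, x)]) for x in objects)
    composition_property = all(
        leq(hom[(x, y)] and hom[(y, z)], hom[(x, z)])
        for x, y, z in product(objects, repeat=3)
    )
    return unit_property and composition_property


# Hypothetical example: X(x, y) = "grade x is at most as risky as grade y".
grades = ["A", "B", "C"]
rank = {"A": 0, "B": 1, "C": 2}
hom = {(x, y): rank[x] <= rank[y] for x, y in product(grades, repeat=2)}
print(is_bool_category(grades, hom))  # True

broken = dict(hom)
broken[("A", "C")] = False  # A->B and B->C hold, but A->C does not
print(is_bool_category(grades, broken))  # False
```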
Actual class | Predicted Default | Predicted Non-Default |
---|---|---|
Default | TP | FN |
Non-Default | FP | TN |
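The counts in this confusion matrix determine the two performance measures reported in the results tables. Assuming the standard definitions with Default as the positive class, accuracy is $(TP + TN)/(TP + FN + FP + TN)$ and the Matthews correlation coefficient (Matthews 1975) is $\mathrm{MCC} = \dfrac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. The sketch computes both; the counts are hypothetical, and returning 0 when the denominator vanishes is a common convention (scikit-learn's `matthews_corrcoef` uses it), not necessarily the paper's.

```python
import math

POSITIVE = "Default"  # the confusion matrix treats Default as the positive class


def confusion_counts(actual, predicted, positive=POSITIVE):
    """Return (TP, FN, FP, TN); rows of the matrix are actual classes, columns predictions."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient; 0.0 when any row or column total is zero (a convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts for 100 loans, 16 of which default.
always_non_default = (0, 16, 0, 84)  # predicts Non-Default for every loan
some_skill = (8, 8, 12, 72)

print(accuracy(*always_non_default), mcc(*always_non_default))  # 0.84 0.0
print(accuracy(*some_skill), round(mcc(*some_skill), 3))        # 0.8 0.327
```

The first hypothetical classifier reaches 0.84 accuracy on a 16% default sample while its MCC is 0, which illustrates why MCC is often preferred over accuracy on imbalanced credit data.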
References
- Abdou, Hussein A., and John Pointon. 2011. Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intelligent Systems in Accounting, Finance and Management 18: 59–88.
- Abellán, Joaquin, and Carlos J. Mantas. 2014. Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring. Expert Systems with Applications 41: 3825–30.
- Ala’raj, Maher, and Maysam F. Abbod. 2016a. Classifier’s consensus system approach for credit scoring. Knowledge-Based Systems 104: 89–105.
- Ala’raj, Maher, and Maysam F. Abbod. 2016b. A new hybrid ensemble credit scoring model based on classifiers consensus system approach. Expert Systems with Applications 64: 36–55.
- Altman, Edward I. 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance 23: 589–609.
- Altman, Edward I., and Gabriele Sabato. 2007. Modeling credit risk for SMEs: Evidence from the U.S. Market. Abacus 43: 332–57.
- Asquith, Paul, David W. Mullins, and Eric D. Wolff. 1989. Original issue high yield bonds: Aging analyses of defaults, exchanges, and calls. The Journal of Finance 44: 923–52.
- Aster, Richard C., Brian Borchers, and Clifford H. Thurber. 2018. Parameter Estimation and Inverse Problems, 3rd ed. Amsterdam: Elsevier Publishing Company.
- Barboza, Flavio, Herbert Kimura, and Edward Altman. 2017. Machine learning models and bankruptcy prediction. Expert Systems with Applications 83: 405–17.
- Breiman, Leo. 2001. Random forests. Machine Learning 45: 5–32.
- Brown, Peter F., Vincent J. Della Pietra, Peter V. Desouza, Jenifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics 18: 467–80.
- Chang, Shunpo, Simon D-O Kim, and Genki Kondo. 2015. Predicting default risk of lending club loans. CS229: Machine Learning, 1–5. Available online: http://cs229.stanford.edu/proj2018/report/69.pdf (accessed on 24 January 2021).
- Chen, Tianqi, and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. Paper presented at the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17; New York, NY, USA: Association for Computing Machinery, pp. 785–794.
- Cortes, Corinna, and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning 20: 273–97.
- Dastile, Xolani, Turgay Celik, and Moshe Potsane. 2020. Statistical and machine learning models in credit scoring: A systematic literature survey. Applied Soft Computing 91: 106263.
- Doumpos, Michael, and Constantin Zopounidis. 2007. Model combination for credit risk assessment: A stacked generalization approach. Annals of Operations Research 151: 289–306.
- Eilenberg, Samuel, and Saunders MacLane. 1945. General theory of natural equivalences. Transactions of the American Mathematical Society 58: 231–94.
- Finlay, Steven. 2011. Multiple classifier architectures and their application to credit risk assessment. European Journal of Operational Research 210: 368–78.
- Gradojevic, Nikola, and Marko Caric. 2016. Predicting systemic risk with entropic indicators. Journal of Forecasting 36: 16–25.
- Henley, William, and David J. Hand. 1996. A k-nearest-neighbour classifier for assessing consumer credit risk. Journal of the Royal Statistical Society: Series D (The Statistician) 45: 77–95.
- Hsieh, Nan-Chen, and Lun-Ping Hung. 2010. A data driven ensemble classifier for credit scoring analysis. Expert Systems with Applications 37: 534–45.
- Jonkhart, Marius J. L. 1979. On the term structure of interest rates and the risk of default: An analytical approach. Journal of Banking & Finance 3: 253–62.
- Joy, Maurice O., and John O. Tollefson. 1978. Some clarifying comments on discriminant analysis. Journal of Financial and Quantitative Analysis 13: 197–200.
- Kelly, G. Max. 2005. Basic Concepts of Enriched Category Theory. London Mathematical Society Lecture Note Series 64; Cambridge: Cambridge University Press. Reprinted as Reprints in Theory and Applications of Categories 10. First published 1982.
- Lawrence, Edward C., Douglas L. Smith, and Malcolm Rhoades. 1992. An analysis of default risk in mobile home credit. Journal of Banking & Finance 16: 299–312.
- Lending Club. 2020. Peer-to-Peer Loans Data. Available online: https://www.kaggle.com/wordsforthewise/lending-club (accessed on 24 November 2020).
- Lessmann, Stefan, Bart Baesens, Hsin-Vonn Seow, and Lyn C. Thomas. 2015. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research 247: 124–36.
- Li, Wei, Shuai Ding, Yi Chen, and Shanlin Yang. 2018. Heterogeneous ensemble for default prediction of peer-to-peer lending in China. IEEE Access 6: 54396–406.
- Lupu, Radu, Adrian C. Călin, Cristina G. Zeldea, and Iulia Lupu. 2020. A bayesian entropy approach to sectoral systemic risk modeling. Entropy 22: 1371.
- Malekipirbazari, Milad, and Vural Aksakalli. 2015. Risk assessment in social lending via random forests. Expert Systems with Applications 42: 4621–31.
- Martin, Daniel. 1977. Early warning of bank failure: A logit regression approach. Journal of Banking & Finance 1: 249–76.
- Matthews, Ben W. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure 405: 442–51.
- McLeay, Stuart, and Azmi Omar. 2000. The sensitivity of prediction models to the non-normality of bounded and unbounded financial ratios. British Accounting Review 32: 213–30.
- Ohlson, James A. 1980. Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research 18: 109–31.
- Pichler, Alois, and Ruben Schlotter. 2020. Entropy based risk measures. European Journal of Operational Research 285: 223–36.
- Rish, Irina. 2001. An empirical study of the naive Bayes classifier. Paper presented at the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, August 4–6; vol. 3, pp. 41–46.
- Safavian, Stephen R., and David Landgrebe. 1991. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21: 660–74.
- Santomero, Anthony M., and Joseph D. Vinso. 1977. Estimating the probability of failure for commercial banks and the banking system. Journal of Banking & Finance 1: 185–205.
- Shannon, Claude E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27: 379–423.
- Tam, Kar Yan. 1991. Neural network models and the prediction of bank bankruptcy. Omega 19: 429–45.
- Teply, Petr, and Michal Polena. 2020. Best classification algorithms in peer-to-peer lending. North American Journal of Economics and Finance 51: 100904.
- Tsai, Ming-Chun, Shu-Ping Lin, Ching-Chan Cheng, and Yen-Ping Lin. 2009. The consumer loan default predicting model. An application of DEA–DA and neural network. Expert Systems with Applications 36: 11682–90.
- Vassalou, Maria, and Yuhang Xing. 2004. Default Risk in Equity Returns. The Journal of Finance 59: 831–68.
- Wang, Maoguang, Jiayu Yu, and Zijian Ji. 2018. Personal credit risk assessment based on stacking ensemble model. Paper presented at the 10th International Conference on Intelligent Information Processing (IIP), Nanning, China, October 19–22.
- Wolpert, David H. 1992. Stacked generalization. Neural Networks 5: 241–59.
- Xia, Yufei, Chuanzhe Liu, Bowen Da, and Fangming Xie. 2018. A novel heterogeneous ensemble credit scoring model based on stacking approach. Expert Systems with Applications 93: 182–99.
- Yeh, I-Cheng. 2006. Default of Credit Card Clients Data Set. Department of Information Management, Chung Hua University, Taiwan. Available online: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients (accessed on 24 November 2020).
- Yeh, I-Cheng, and Che-hui Lien. 2009. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36: 2473–80.
Descriptive | Lending Club Peer-to-Peer Loans Dataset | Taiwanese Credit Card Clients Dataset |
---|---|---|
Period | 2007–2013 | 2005 |
Original Sample Size | 226,151 | 30,000 |
Filtered Samples | 13,871 | 0 |
Filtering Criteria | Current loans, loans in the grace period, and loans with missing training features or abnormal values | None |
Final Sample Size | 212,280 | 30,000 |
Non-Default Sample | 178,500 | 23,364 |
Default Sample | 33,780 (16%) | 6,636 (22%) |
Classes | Default and Non-Default | Default and Non-Default |
Number of Original Features | 115 | 25 |
Number of Final Features | 22 | 24 |
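The sample-size rows in this table are linked by simple arithmetic: the final sample size should equal the original size minus the filtered samples, and also the sum of the default and non-default samples, and the percentage in the Default Sample row is the default count divided by the final size. The sketch recomputes these figures from the counts copied from the table.

```python
# Counts copied from the descriptive table; the checks are plain consistency arithmetic.
datasets = {
    "Lending Club": {"original": 226_151, "filtered": 13_871, "non_default": 178_500, "default": 33_780},
    "Taiwan": {"original": 30_000, "filtered": 0, "non_default": 23_364, "default": 6_636},
}


def summarize(counts):
    """Return (final sample size, default rate), checking that the rows add up."""
    final = counts["original"] - counts["filtered"]
    assert final == counts["non_default"] + counts["default"], "rows do not add up"
    return final, counts["default"] / final


for name, counts in datasets.items():
    final, rate = summarize(counts)
    print(f"{name}: final = {final:,}, default rate = {rate:.1%}")
# Lending Club: final = 212,280, default rate = 15.9%
# Taiwan: final = 30,000, default rate = 22.1%
```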
Number of Base Models | Performance | Nearest Neighbours | Markov Model | Gradient Boosted Trees | Naive Bayes | Support Vector Machine | Decision Tree | Neural Network | Random Forest | Logistic Regression | Stacking Model |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | MCC Mean | 0.05 */* | 0.00 */* | 0.15 | |||||||
MCC Std | 62% | 499% | 26% | ||||||||
Accuracy Mean | 0.83 | 0.84 | 0.79 | ||||||||
Extreme Bias | 0 | 96 | 0 | ||||||||
3 | MCC Mean | 0.00 */* | 0.14 */* | 0.07 */* | 0.19 | ||||||
MCC Std | 35% | 509% | 67% | 15% | |||||||
Accuracy Mean | 0.84 | 0.80 | 0.84 | 0.74 | |||||||
Extreme Bias | 96 | 0 | 4 | 0 | |||||||
4 | MCC Mean | 0.06 */* | 0.06 */* | 0.14 */* | 0.08 */* | 0.19 | |||||
MCC Std | 46% | 48% | 82% | 32% | 15% | ||||||
Accuracy Mean | 0.83 | 0.84 | 0.80 | 0.84 | 0.76 | ||||||
Extreme Bias | 0 | 17 | 0 | 0 | 0 | ||||||
5 | MCC Mean | 0.00 */* | 0.06 */* | 0.06 */* | 0.14 */* | 0.07 */* | 0.19 | ||||
MCC Std | 59% | 32% | 689% | 59% | 73% | 14% | |||||
Accuracy Mean | 0.84 | 0.83 | 0.84 | 0.80 | 0.84 | 0.77 | |||||
Extreme Bias | 95 | 0 | 15 | 0 | 1 | 0 | |||||
6 | MCC Mean | 0.07 */* | 0.14 */* | 0.00 */* | 0.08 */* | 0.06 */* | 0.07 */* | 0.17 | |||
MCC Std | 69% | 75% | 35% | 56% | 37% | 499% | 20% | ||||
Accuracy Mean | 0.84 | 0.81 | 0.84 | 0.76 | 0.84 | 0.84 | 0.80 | ||||
Extreme Bias | 12 | 0 | 81 | 0 | 4 | 1 | 0 | ||||
7 | MCC Mean | 0.00 */* | 0.05 */* | 0.06 */* | 0.00 */* | 0.07 */* | 0.06 */* | 0.06 */* | 0.15 | ||
MCC Std | 67% | 45% | 78% | 1316% | 436% | 85% | 72% | 23% | |||
Accuracy Mean | 0.84 | 0.83 | 0.84 | 0.84 | 0.77 | 0.84 | 0.84 | 0.80 | |||
Extreme Bias | 97 | 0 | 20 | 74 | 0 | 4 | 2 | 0 | |||
8 | MCC Mean | 0.00 */* | 0.06 */* | 0.06 */* | 0.13 */* | 0.07 */* | 0.09 */* | 0.06 */* | 0.07 */* | 0.18 | |
MCC Std | 55% | 842% | 57% | 74% | 46% | 48% | 32% | 86% | 17% | ||
Accuracy Mean | 0.84 | 0.83 | 0.84 | 0.81 | 0.76 | 0.83 | 0.84 | 0.84 | 0.79 | ||
Extreme Bias | 95 | 0 | 11 | 0 | 0 | 0 | 7 | 4 | 0 | ||
9 | MCC Mean | 0.00 */* | 0.06 */* | 0.07 */* | 0.14 */* | 0.01 */* | 0.07 */* | 0.09 */* | 0.06 */* | 0.07 */* | 0.19 |
MCC Std | 64% | 53% | 80% | 34% | 322% | 44% | 43% | 62% | 59% | 17% | |
Accuracy Mean | 0.84 | 0.83 | 0.84 | 0.80 | 0.84 | 0.77 | 0.84 | 0.84 | 0.84 | 0.79 | |
Extreme Bias | 94 | 0 | 13 | 0 | 75 | 0 | 0 | 1 | 0 | 0 |
Number of Base Models | Performance | Nearest Neighbours | Markov Model | Gradient Boosted Trees | Naive Bayes | Support Vector Machine | Decision Tree | Neural Network | Random Forest | Logistic Regression | Stacking Model |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | MCC Mean | 0.11 */* | 0.31 | 0.32 | |||||||
MCC Std | 109% | 40% | 38% | ||||||||
Accuracy Mean | 0.78 | 0.80 | 0.77 | ||||||||
Extreme Bias | 45 | 6 | 5 | ||||||||
3 | MCC Mean | 0.08 */* | 0.33 | 0.30 */* | 0.34 | ||||||
MCC Std | 22% | 26% | 143% | 20% | |||||||
Accuracy Mean | 0.79 | 0.80 | 0.80 | 0.79 | |||||||
Extreme Bias | 54 | 0 | 0 | 0 | |||||||
4 | MCC Mean | 0.31 */* | 0.23 */* | 0.29 */* | 0.31 | 0.33 | |||||
MCC Std | 28% | 35% | 27% | 41% | 21% | ||||||
Accuracy Mean | 0.80 | 0.74 | 0.79 | 0.80 | 0.78 | ||||||
Extreme Bias | 0 | 0 | 0 | 8 | 0 | ||||||
5 | MCC Mean | 0.08 */* | 0.27 */* | 0.18 */* | 0.27 */* | 0.32 | 0.34 | ||||
MCC Std | 34% | 25% | 77% | 35% | 146% | 21% | |||||
Accuracy Mean | 0.79 | 0.76 | 0.79 | 0.78 | 0.80 | 0.78 | |||||
Extreme Bias | 50 | 0 | 25 | 0 | 5 | 0 | |||||
6 | MCC Mean | 0.09 */* | 0.31 */* | 0.21 */* | 0.22 */* | 0.28 */* | 0.31 */* | 0.35 | |||
MCC Std | 40% | 28% | 126% | 30% | 70% | 42% | 21% | ||||
Accuracy Mean | 0.78 | 0.80 | 0.79 | 0.74 | 0.80 | 0.80 | |||||
Extreme Bias | 48 | 0 | 19 | 0 | 0 | 8 | 0 | ||||
7 | MCC Mean | 0.12 */* | 0.03 */* | 0.28 */* | 0.24 */* | 0.24 */* | 0.30 */* | 0.31 | 0.32 | ||
MCC Std | 41% | 28% | 26% | 109% | 273% | 27% | 59% | 24% | |||
Accuracy Mean | 0.79 | 0.66 | 0.75 | 0.80 | 0.74 | 0.78 | 0.80 | 0.77 | |||
Extreme Bias | 44 | 1 | 0 | 16 | 0 | 0 | 0 | 0 | |||
8 | MCC Mean | 0.08 */* | 0.03 */* | 0.32 */* | 0.28 */* | 0.21 */* | 0.22 */* | 0.28 */* | 0.33 | 0.36 | |
MCC Std | 65% | 26% | 37% | 29% | 232% | 32% | 127% | 26% | 20% | ||
Accuracy Mean | 0.78 | 0.66 | 0.80 | 0.76 | 0.80 | 0.74 | 0.78 | 0.80 | |||
Extreme Bias | 51 | 0 | 0 | 0 | 20 | 0 | 0 | 4 | 0 | ||
9 | MCC Mean | 0.10 */* | 0.04 */* | 0.30 */* | 0.28 */* | 0.23 */* | 0.25 */* | 0.28 */* | 0.33 | 0.30 */* | 0.35 |
MCC Std | 124% | 171% | 38% | 23% | 57% | 36% | 30% | 33% | 27% | 19% | |
Accuracy Mean | 0.79 | 0.66 | 0.80 | 0.76 | 0.80 | 0.75 | 0.78 | 0.80 | 0.80 | 0.79 | |
Extreme Bias | 47 | 0 | 6 | 0 | 12 | 0 | 0 | 5 | 0 | 0 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tran, C.S.; Nicolau, D.; Nayak, R.; Verhoeven, P. Modeling Credit Risk: A Category Theory Perspective. J. Risk Financial Manag. 2021, 14, 298. https://doi.org/10.3390/jrfm14070298