Predictive Fraud Analysis Applying the Fraud Triangle Theory through Data Mining Techniques
Abstract
1. Introduction
1.1. Contribution
- Evaluate the performance of text mining techniques, such as latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and latent semantic analysis (LSA), on the fraud-related data set. The goal is to select the technique that provides an integral representation of the analyzed documents through clusters (i.e., topics) that are as well separated as possible.
- Once the appropriate topic analysis technique is selected, we use each document's probability for its assigned topic to determine, with supervised machine learning models, whether a text can be identified as fraud related. For this purpose, we conduct experiments on seven classification methods, namely logistic regression (LR), random forest (RF), gradient boosting (GB), Gaussian naive Bayes (GNB), decision tree (DT), k-nearest neighbor (kN), and support vector machines (SVM), using the synthetically generated data set.
- Furthermore, we perform the same experiment using deep learning techniques, such as convolutional neural network (CNN), dense neural network (DNN), and long short-term memory (LSTM), and compare their performance with that of the traditional ML classification methods using receiver operating characteristic (ROC) curves and the area under the curve (AUC). The goal is to show which technique works best with topic modeling to detect suspicious behavior related to fraud.
1.2. Related Work
2. Materials and Methods
2.1. Fraud Triangle Theory (FTT)
2.2. Topic Modeling (TM)
2.2.1. Latent Semantic Analysis (LSA)
2.2.2. Non-Negative Matrix Factorization (NMF)
2.2.3. Latent Dirichlet Allocation (LDA)
2.3. Classification Methods
2.3.1. Logistic Regression (LR)
2.3.2. k-Nearest Neighbor (kN)
2.3.3. Decision Tree (DT)
2.3.4. Random Forest (RF)
2.3.5. Gaussian Naïve Bayes (GNB)
2.3.6. Gradient Boosting Decision Tree (GBDT)
2.3.7. Support Vector Machines (SVM)
2.4. Neural Networks
2.4.1. Deep Learning (DL)
Dense Neural Networks
Convolutional Neural Networks
Long Short-Term Memory
3. Methodology for Predicting Fraud based on the Fraud Triangle Components
3.1. Data Set Generation
3.2. Data Preprocessing
3.2.1. Tokenization
3.2.2. Homogenization
- Change all tokens to lower case. This is done with the Python lower string method.
- Remove non-alphanumeric items. To identify non-alphanumeric characters, we use the Python isalnum string method.
- Obtain the word lexeme (lemmatization). Lemmatization turns words into their lemma/lexeme form (for example, “runs”, “running”, and “ran” are all forms of the word run, and therefore “run” is the lemma of all these words). Once lexemes are obtained, each set of related word forms is represented by a single lexeme, so the semantic meaning of those words is associated with that lexeme. For this, we use the lemmatize [69] function of WordNetLemmatizer from NLTK.
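
The three homogenization steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `homogenize` and the lookup table `TOY_LEMMAS` are hypothetical, and the table only stands in for a real lemmatizer so that the example runs without downloading the WordNet corpus. It also assumes that "removing non-alphanumeric items" means dropping whole tokens for which isalnum returns False.

```python
# Hypothetical stand-in for a real lemmatizer; it only covers the
# "run" example from the text so the sketch needs no external data.
TOY_LEMMAS = {"runs": "run", "running": "run", "ran": "run"}


def homogenize(tokens, lemma_lookup=TOY_LEMMAS):
    """Lower-case tokens, drop non-alphanumeric ones, and map each to its lemma."""
    homogenized = []
    for token in tokens:
        token = token.lower()              # step 1: lower case
        if not token.isalnum():            # step 2: drop non-alphanumeric tokens (assumption)
            continue
        # step 3: lemmatize; unknown words are kept unchanged
        homogenized.append(lemma_lookup.get(token, token))
    return homogenized


tokens = ["Runs", "running", "RAN", "!", "debt", "--"]
homogenized = homogenize(tokens)
print(homogenized)  # ['run', 'run', 'run', 'debt']
```

With NLTK installed and the WordNet corpus available, the lookup could be replaced by `WordNetLemmatizer().lemmatize(token, pos="v")`. Note that lemmatize treats words as nouns by default, so verb forms such as "running" are only reduced to "run" when the part of speech is passed explicitly.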
3.2.3. Cleaning
3.2.4. Vectorization
3.3. Quantitative Evaluation of Topic Modeling Algorithms
3.4. Selection of the Topic Modeling Algorithm
3.5. Methodology of Evaluation
4. Results and Discussion
4.1. Probability Distribution Generation
4.1.1. Optimal Number of Topics
4.1.2. Application of LDA Model
Algorithm 1. Algorithm to find the value of k that maximizes the coherence of an LDA model by testing different values of hyperparameters.
Require: Function that computes coherence values
Input: min_topics = 4, max_topics = 10, step_size = 1
Output: format file containing results
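
A minimal Python sketch of the search that Algorithm 1 describes is given below. Several things here are assumptions rather than details from the source: the names `find_best_k` and `toy_coherence` are hypothetical, the range of k is treated as inclusive of max_topics, only k is swept (the algorithm also mentions other hyperparameters), and the coherence function is injected as a parameter. In practice it would train an LDA model with k topics and return its coherence score.

```python
def find_best_k(compute_coherence, min_topics=4, max_topics=10, step_size=1):
    """Evaluate coherence for each candidate k and return the best one.

    compute_coherence: callable taking k and returning a coherence score
    (assumed here; in practice it would fit an LDA model with k topics).
    """
    results = []
    # Assumption: max_topics itself is included in the search.
    for k in range(min_topics, max_topics + 1, step_size):
        results.append((k, compute_coherence(k)))
    best_k, best_score = max(results, key=lambda row: row[1])
    return best_k, best_score, results


def toy_coherence(k):
    """Made-up coherence curve that peaks at k = 6, for illustration only."""
    return 1.0 - abs(k - 6) * 0.1


best_k, best_score, results = find_best_k(toy_coherence)
print(best_k)        # k with the highest toy coherence
print(len(results))  # number of candidate values tested
```

The `results` list holds one (k, coherence) row per candidate, which is the kind of table that could then be written to a results file.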
4.2. Detection of Phrases Related to Fraud
4.2.1. Classification Algorithms
4.2.2. Comparison of Classification Models
- We preprocessed the information by dominant topic, importing the LDA data and labeling the documents, which were later transformed into CSV format.
- Training was carried out after selecting one portion of the data for testing (20%) and another for training (80%). The data set was divided into four subsets: the first contains the attributes used to train the algorithm; the second contains the attributes used to test it; the third contains the labels of the training set; and the fourth contains the labels of the test set.
- Finally, we evaluated and compared different classifiers (linear and non-linear algorithms vs. neural networks).
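
The 80/20 division into four subsets described above can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions: the function name `split_dataset`, the fixed seed, and the shuffling before the split are not taken from the source, and the feature values below are made up.

```python
import random


def split_dataset(features, labels, test_fraction=0.2, seed=42):
    """Split features and labels into train/test attribute and label subsets."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)   # assumption: rows are shuffled first
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]   # subset 1: training attributes
    X_test = [features[i] for i in test_idx]     # subset 2: test attributes
    y_train = [labels[i] for i in train_idx]     # subset 3: training labels
    y_test = [labels[i] for i in test_idx]       # subset 4: test labels
    return X_train, X_test, y_train, y_test


# Made-up data: one topic probability per document and a binary fraud label.
features = [[i / 10.0] for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_dataset(features, labels)
print(len(X_train), len(X_test), len(y_train), len(y_test))  # 8 2 8 2
```

scikit-learn's `train_test_split` with `test_size=0.2` returns the same four subsets in the same order and would normally be used instead of a hand-written split.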
4.2.3. Classifier Performance
4.2.4. Deep Learning
4.2.5. Comparative Analysis
5. Conclusions
Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Sanchez, M.; Torres, J.; Zambrano, P.; Flores, P. FraudFind: Financial fraud detection by analyzing human behavior. In Proceedings of the 2018 IEEE 8th Annual Computing And Communication Workshop And Conference (CCWC), Las Vegas, NV, USA, 8–10 January 2018.
- PwC. (This Link Contains Information about FRAUD). Available online: https://www.pwc.com (accessed on 8 September 2021).
- Abdullahi, R.; Mansor, N. Fraud Triangle Theory and Fraud Diamond Theory. Understanding the Convergent and Divergent for Future Research. Int. J. Acad. Res. Account. Financ. Manag. Sci. 2015, 5, 10.
- Ravisankar, P.; Ravi, V.; Rao, G.; Bose, I. Detection of financial statement fraud and feature selection using data mining techniques. Decis. Support Syst. 2011, 50, 491–500.
- Guan, J.; Li, R.; Yu, S.; Zhang, X. A Method for Generating Synthetic Electronic Medical Record Text. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 173–182.
- Talib, R.; Kashif, M.; Ayesha, S.; Fatima, F. Text Mining: Techniques, Applications and Issues. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 414–418. Available online: https://thesai.org (accessed on 8 September 2021).
- Kozbagarov, O.; Mussabayev, R.; Mladenovic, N. A New Sentence-Based Interpretative Topic Modeling and Automatic Topic Labeling. Symmetry 2021, 13, 837.
- Hoyer, S.; Zakhariya, H.; Sandner, T.; Breitner, M. Fraud Prediction and the Human Factor: An Approach to Include Human Behavior in an Automated Fraud Audit. In Proceedings of the 2012 45th Hawaii International Conference On System Sciences, Maui, HI, USA, 4–7 January 2012.
- Holton, C. Identifying disgruntled employee systems fraud risk through text mining: A simple solution for a multi-billion dollar problem. Decis. Support Syst. 2009, 46, 853–864.
- Jans, M.; Lybaert, N.; Vanhoof, K. Internal fraud risk reduction: Results of a data mining case study. Int. J. Account. Inf. Syst. 2010, 11, 17–41.
- Jans, M.; Lybaert, N.; Vanhoof, K. A framework for internal fraud risk reduction at it integrating business processes. Int. J. Digit. Account. Res. 2009, 9, 1–29.
- Kumar, V.; Sriganga, B. A review on data mining techniques to detect insider fraud in banks. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2014, 4, 370–380.
- Panigrahi, P. A Framework for Discovering Internal Financial Fraud Using Analytics. In Proceedings of the 2011 International Conference On Communication Systems And Network Technologies, Katra, India, 3–5 June 2011.
- Jayabrabu, R.; Saravanan, V.; Tamilselvi, J. A framework for fraud detection system in automated data mining using intelligent agent for better decision making process. In Proceedings of the 2014 International Conference On Green Computing Communication And Electrical Engineering (ICGCCEE), Coimbatore, India, 6–8 March 2014.
- Yue, D.; Wu, X.; Wang, Y.; Li, Y.; Chu, C. A Review of Data Mining-Based Financial Fraud Detection Research. In Proceedings of the 2007 International Conference On Wireless Communications, Networking And Mobile Computing, Shanghai, China, 21–25 September 2007.
- Phua, C.; Lee, V.; Smith, K.; Gayler, R. A comprehensive survey of data mining-based fraud detection research. arXiv 2010, arXiv:1009.6119.
- Wang, S. A Comprehensive Survey of Data Mining-Based Accounting-Fraud Detection Research. In Proceedings of the 2010 International Conference on Intelligent Computation Technology and Automation, Changsha, China, 11–12 May 2010.
- Al-Jumeily, D.; Hussain, A.; MacDermott, A.; Tawfik, H.; Seeckts, G.; Lunn, J. The Development of Fraud Detection Systems for Detection of Potentially Fraudulent Applications. In Proceedings of the International Conference on Developments of E-Systems Engineering (DeSE), Dubai, United Arab Emirates, 13–14 December 2015.
- Lopez-Rojas, E.; Axelsson, S. Social Simulation of Commercial and Financial Behaviour for Fraud Detection Research. In Proceedings of the 10th Social Simulation Conference, Barcelona, Spain, 1–5 September 2014.
- Lopez-Rojas, E.; Gorton, D.; Axelsson, S. Using the RetSim Simulator for Fraud Detection Research. Int. J. Simul. Process Model. 2015, 10, 144–155.
- Lopez-Rojas, E.; Axelsson, S. A review of computer simulation for fraud detection research in financial datasets. In Proceedings of the 2016 Future Technologies Conference (FTC), San Francisco, CA, USA, 6–7 December 2016.
- Cappelli, D.; Moore, A.; Trzeciak, R.; Shimeall, T. Common Sense Guide to Prevention and Detection of Insider Threats; CERT, Software Engineering Institute, Carnegie Mellon University: Pittsburgh, PA, USA, 2009; Available online: https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=50275 (accessed on 23 January 2022).
- ACFE. (ACFE—Association of Certified Fraud Examiners). Available online: https://www.acfe.com/rttn-introduction.aspx (accessed on 8 September 2021).
- Mui, G.; Mailley, J. A tale of two triangles: Comparing the Fraud Triangle with criminology’s Crime Triangle. Account. Res. J. 2015, 28, 45–58.
- Vu, H.; Li, G.; Law, R. Discovering implicit activity preferences in travel itineraries by topic modeling. Tour. Manag. 2019, 75, 435–446.
- Daume, S.; Albert, M.; Gadow, K. Assessing Citizen Science Opportunities in Forest Monitoring Using Probabilistic Topic Modelling. For. Ecosyst. 2014, 1, 11. Available online: https://forestecosyst.springeropen.com (accessed on 8 September 2021).
- Islam, T. Yoga-Veganism: Correlation Mining of Twitter Health Data. arXiv 2019, arXiv:1906.07668.
- Tresnasari, N.; Adji, T.; Permanasari, A. Social-Child-Case Document Clustering based on Topic Modeling using Latent Dirichlet Allocation. IJCCS Indonesian J. Comput. Cybern. Syst. 2020, 14, 179.
- Schneider, P. App Ecosystem Out of Balance: An Empirical Analysis of Update Interdependence between Operating System and Application Software. Master’s Thesis, Technical University of Munich, Garching, Germany, 2020.
- Wu, Y.; Ding, Y.; Wang, X.; Xu, J. A comparative study of topic models for topic clustering of Chinese web news. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Chengdu, China, 9–11 July 2010.
- Alghamdi, R.; Alfalqi, K. A Survey of Topic Modeling in Text Mining. Int. J. Adv. Comput. Sci. Appl. 2015, 6. Available online: https://thesai.org (accessed on 8 September 2021).
- O’Callaghan, D.; Greene, D.; Carthy, J.; Cunningham, P. An analysis of the coherence of descriptors in topic modeling. Expert Syst. Appl. 2015, 42, 5645–5657.
- Kuang, D.; Brantingham, P.; Bertozzi, A. Crime Topic Modeling. Crime Sci. 2017, 6, 12. Available online: https://crimesciencejournal.biomedcentral.com (accessed on 8 September 2021).
- Hidayatullah, A.F.; Aditya, S.K.; Gardini, S.T. Topic modeling of weather and climate condition on twitter using latent dirichlet allocation (LDA). IOP Conf. Ser. Mater. Sci. Eng. 2019, 482, 012033.
- Jacobi, C.; Atteveldt, W.; Welbers, K. Quantitative analysis of large amounts of journalistic texts using topic modelling. Digit. J. 2015, 4, 89–106.
- Blei, D. Probabilistic topic models. Commun. ACM 2012, 55, 77.
- Cosovic, M.; Amelio, A.; Junuz, E. Classification Methods in Cultural Heritage. In Proceedings of the Visual Pattern Extraction and Recognition for Cultural Heritage Understanding (VIPERC2019), Pisa, Italy, 30 January 2019; Available online: http://ceur-ws.org (accessed on 8 September 2021).
- EntezariMaleki, R.; Rezaei, A.; MinaeiBidgoli, B. Comparison of Classification Methods Based on the Type of Attributes and Sample Size. J. Converg. Inf. Technol. 2009, 4, 94–102.
- Fawcett, T. Introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
- Novakovic, J.; Veljovic, A.; Ilic, S.; Papic, M. Experimental study of using the k-nearest neighbour classifier with filter methods. In Proceedings of the Computer Science and Technology, Varna, Bulgaria, 30 April 2016.
- Zhang, Z. Introduction to machine learning: K-nearest neighbors. Ann. Transl. Med. 2016, 4, 218.
- Basha, S.; Rajput, D. Chapter 9—Survey on Evaluating the Performance of Machine Learning Algorithms: Past Contributions and Future Roadmap. Deep. Learn. Parallel Comput. Environ. Bioeng. Syst. 2019, 153–164.
- Mashat, A.; Fouad, M.; Yu, P.; Gharib, T. A Decision Tree Classification Model for University Admission System. J. Adv. Comput. Sci. Appl. 2012, 3. Available online: https://thesai.org (accessed on 8 September 2021).
- Oshiro, T.; Perez, P.; Baranauskas, J. How Many Trees in a Random Forest. In International Workshop on Machine Learning and Data Mining in Pattern Recognition; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7376, Available online: https://www.researchgate.net (accessed on 8 September 2021).
- Ali, J.; Khan, R.; Ahmad, N.; Maqsood, I. Random Forests and Decision Trees. Int. J. Comput. Sci. Issues 2012, 9, 272. Available online: http://ijcsi.org/papers/IJCSI-9-5-3-272-278.pdf (accessed on 23 January 2022).
- Kamel, H.; Abdulah, D.; Al-Tuwaijari, J. Cancer Classification Using Gaussian Naive Bayes Algorithm. Int. J. Intell. Eng. Syst. 2019, 14, 134–146.
- Yang, T.; Chen, W.; Cao, G. Automated classification of neonatal amplitude-integrated EEG based on gradient boosting method. Biomed. Signal Process. Control. 2016, 28, 50–57.
- Ding, C.; Cao, X.; Næss, P. Applying gradient boosting decision trees to examine non-linear effects of the built environment on driving distance in Oslo. Transp. Res. Part Policy Pract. 2018, 110, 107–117.
- Cervantes, J.; García-Lamont, F.; Rodríguez, L.; Lopez-Chau, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215.
- Amatriain, X.; Pujol, J. Data Mining Methods for Recommender Systems. In Recommender Systems Handbook; Springer: Boston, MA, USA, 2015; pp. 227–262.
- Liang, J.; Xue, L.; Lin, X.; Shen, X. Verifiable and Secure SVM Classification for Cloud-Based Health Monitoring Services. IEEE Internet Things J. 2021, 8, 17029–17042.
- Zhang, Z. A gentle introduction to artificial neural networks. Ann. Transl. Med. 2016, 4, 370.
- Nhu, V.; Hoang, N.; Nguyen, H.; Thao, N.; Bui, T.; Hoa, P.; Samui, P.; Bui, D. Effectiveness assessment of Keras based deep learning with different robust optimization algorithms for shallow landslide susceptibility mapping at tropical area. Catena 2020, 188.
- Benuwa, B.; Zhan, Y.; Ghansah, B.; Wornyo, D.; Banaseka, F. A Review of Deep Machine Learning. Int. J. Eng. Res. Afr. 2016, 24, 124–136.
- Volz, B.; Behrendt, K.; Mielenz, H.; Gilitschenski, I.; Siegwart, R.; Nieto, J. A data-driven approach for pedestrian intention estimation. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016.
- Nazari, F.; Yan, W. Convolutional versus Dense Neural Networks: Comparing the Two Neural Networks Performance in Predicting Building Operational Energy Use Based on the Building Shape. arXiv 2018, arXiv:2108.12929.
- Yamashita, R.; Nishio, M.; Do, R.; Togashi, K. Convolutional Neural Networks: An Overview and Application in Radiology. Insights Into Imaging 2018, 9, 611–629. Available online: https://insightsimaging.springeropen.com (accessed on 8 September 2021).
- Islam, M.; Islam, M.; Asraf, A. A Combined Deep CNN-LSTM Network for the Detection of Novel Coronavirus (COVID-19) Using X-ray Images. Informatics Med. Unlocked 2020, 20, 100412.
- Li, W.; Tao, W.; Qiu, J.; Liu, X.; Zhou, X.; Pan, Z. Densely Connected Convolutional Networks With Attention LSTM for Crowd Flows Prediction. IEEE Access 2019, 7, 140488–140498.
- Ozyirmidokuz, E. Mining Unstructured Turkish Economy News Articles. Procedia Econ. Financ. 2014, 16, 320–328.
- dos Santos Brito, Y.P.; dos Santos, C.G.R.; de Paula Mendonça, S.; Aráujo, T.D.; de Freitas, A.A.; Meiguins, B.S. A Prototype Application to Generate Synthetic Datasets for Information Visualization Evaluations. In Proceedings of the 2018 22nd International Conference Information Visualisation (IV), Fisciano, Italy, 10–13 July 2018.
- Redpath, R.; Srinivasan, B. Criteria for a Comparative Study of Visualization Techniques in Data Mining. In Intelligent Systems Design and Applications; Springer: Berlin/Heidelberg, Germany, 2003; Volume 23, pp. 609–620.
- Audinet. (Using Key Word Analysis of an Organization’s Big Data For Error and Fraud Detection). Available online: https://www.auditnet.org/key-word-analytics (accessed on 8 September 2021).
- Randomwordgenerator. (Random Word Generator). Available online: https://www.randomwordgenerator.org (accessed on 8 September 2021).
- Reverso. (Reverso Context). Available online: https://context.reverso.net/traduccion/ingles-espanol (accessed on 8 September 2021).
- Sentencedict. (Sentence Dict). Available online: https://sentencedict.com/ (accessed on 8 September 2021).
- Kastrati, Z.; Kurti, A.; Imran, A. WET: Word Embedding-Topic Distribution Vectors for MOOC Video Lectures Dataset. Data Brief. 2020, 28, 105090. Available online: http://www.sciencedirect.com (accessed on 23 January 2022).
- Maldonado, M.; Alulema, D.; Morocho, D.; Proano, M. System for monitoring natural disasters using natural language processing in the social network Twitter. In Proceedings of the 2016 IEEE International Carnahan Conference on Security Technology (ICCST), Orlando, FL, USA, 24–27 October 2016.
- Maier, D.; Waldherr, A.; Miltner, P.; Wiedemann, G.; Niekler, A.; Keinert, A.; Pfetsch, B.; Heyer, G.; Reber, U.; Häussler, T.; et al. Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Commun. Methods Meas. 2018, 12, 93–118.
- Schofield, A.; Magnusson, M.; Mimno, D. Pulling Out the Stops: Rethinking Stopword Removal for Topic Models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers; Association for Computational Linguistics: Valencia, Spain, 2017.
- Rehurek, R.; Sojka, P. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, 22 May 2010.
- Kherwa, P.; Bansal, P. Topic Modeling: A Comprehensive Review. EAI Endorsed Trans. Scalable Inf. Syst. 2020, 7, e2. Available online: https://eudl.eu (accessed on 8 September 2021).
- Albalawi, R.; Yeap, T.; Benyoucef, M. Using topic modeling methods for short-text data: A comparative analysis. Front. Artif. Intell. 2020, 3, 42.
- George, S. Comparison of LDA and NMF Topic Modeling Techniques for Restaurant Reviews. Indian J. Nat. Sci. 2020, 10. Available online: https://www.researchgate.net (accessed on 8 September 2021).
- Mifrah, S.; Benlahmar, E. Topic modeling coherence: A comparative study between LDA and NMF models using COVID-19 corpus. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 5756–5761.
- Merino, S.; Atzmueller, M. Multimodal Behavioral Mobility Pattern Mining and Analysis Using Topic Modeling on GPS Data. Behav. Anal. Soc. Ubiquitous Environ. 2019, 11406, 68–88.
- Zhao, Y.; Zhang, J.; Wu, M. Finding Users’ Voice on Social Media: An Investigation of Online Support Groups for Autism-Affected Users on Facebook. Int. J. Environ. Res. Public Health 2019, 16, 4804.
- Jain, N. Data mining techniques: A survey paper. Int. J. Res. Eng. Technol. 2013, 2, 116–119.
- AUC. Available online: https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-au (accessed on 15 July 2021).
- Straube, S.; Krell, M. How to Evaluate an Agent’s Behavior to Infrequent Events?—Reliable Performance Estimation Insensitive to Class Distribution. Front. Comput. Neurosci. 2014, 8, 43. Available online: https://www.frontiersin.org/article/10.3389/fncom.2014.00043 (accessed on 23 January 2022).

Models | LSA | NMF | LDA |
---|---|---|---|
Coherence Values | 0.4735 | 0.9143 | 0.6164 |

LSA

T1 | T2 | T3 | T4 |
---|---|---|---|
problem | debt | be | job |
economic | public | scare | lose |
debt | problem | job | be |
social | economic | lose | scare |
political | country | go | ill |
face | private | know | would |
solve | service | get | scared |
country | include | care | want |
serious | reduction | think | work |
issue | stock | people | earning |
people | total | deserve | get |

NMF

T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
---|---|---|---|---|---|---|---|
debt | economic | tom | system | scared | review | job | easily |
public | problem | mary | failure | people | period | lose | accessible |
external | problem | big | error | know | currently | get | hotel |
countries | social | think | file | got | keep | want | public |
sustainability | political | want | data | really | kept | temporary | transport |
private | issue | know | case | something | matter | steal | information |
restructuring | serious | going | power | think | committee | work | car |
total | countries | told | due | away | earnings | deserve | bus |
reduction | people | help | event | look | countries | going | foot |
management | country | thought | computer | get | board | need | city |

LDA

T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 |
---|---|---|---|---|---|---|---|---|
steal | review | poor | want | people | big | make | economic | problem |
later | think | child | deadline | know | use | care | weakness | debt |
support | time | need | failure | evacuation | exploitation | job | ill | fair |
say | fix | inadequate | year | deserve | right | work | life | abuse |
just | help | insufficient | temporary | unethical | labor | compensation | leave | easily |
tell | come | country | day | issue | family | lose | feel | accessible |
woman | look | supervision | man | cause | friend | good | face | case |
live | scare | really | old | situation | different | earning | thing | car |
currently | like | money | ask | away | girl | way | great | information |
period | world | school | change | abuse | hope | new | social | food |

Topics

T1 | T2 | T3 | T4 |
---|---|---|---|
review | debt | problem | want |
care | think | economic | know |
poor | later | make | job |
steal | fix | big | work |
temporary | just | people | lose |
say | tell | abuse | support |
new | inadequate | fair | deadline |
man | look | compensation | help |
really | failure | child | come |
insufficient | weakness | good | time |
state | ill | earning | exploitation |
money | unethical | easily | deserve |
issue | life | accessible | scare |
evacuation | world | country | right |
leave | try | need | like |
woman | let | way | day |
year | talk | pay | use |
long | old | school | scared |
change | feel | home | ask |
period | place | thing | car |

Predictive accuracy (AUC) of each classification method per topic (T1–T4) and its mean

Classification Method | T1 | T2 | T3 | T4 | Mean |
---|---|---|---|---|---|
Logistic Regression: AUC | 0.83 | 0.64 | 0.68 | 0.65 | 0.70 |
Random Forest: AUC | 0.88 | 0.77 | 0.80 | 0.79 | 0.81 |
GNB: AUC | 0.86 | 0.70 | 0.74 | 0.73 | 0.76 |
Gradient Boosting: AUC | 0.89 | 0.77 | 0.79 | 0.79 | 0.81 |
k-NN: AUC | 0.86 | 0.72 | 0.76 | 0.74 | 0.77 |
Decision Tree: AUC | 0.80 | 0.71 | 0.73 | 0.75 | 0.74 |
SVM: AUC | 0.86 | 0.70 | 0.75 | 0.74 | 0.76 |
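
The Mean column of the table above can be recomputed from the per-topic AUC values with a short Python sketch. The dictionary name `auc_by_topic` is illustrative. Because the per-topic values are themselves rounded to two decimals, a recomputed mean may differ from the reported column in the last digit.

```python
from statistics import fmean

# Per-topic AUC values (T1..T4) copied from the table above.
auc_by_topic = {
    "Logistic Regression": [0.83, 0.64, 0.68, 0.65],
    "Random Forest": [0.88, 0.77, 0.80, 0.79],
    "GNB": [0.86, 0.70, 0.74, 0.73],
    "Gradient Boosting": [0.89, 0.77, 0.79, 0.79],
    "k-NN": [0.86, 0.72, 0.76, 0.74],
    "Decision Tree": [0.80, 0.71, 0.73, 0.75],
    "SVM": [0.86, 0.70, 0.75, 0.74],
}

# Mean AUC per classifier, then classifiers ordered from best to worst.
mean_auc = {name: fmean(scores) for name, scores in auc_by_topic.items()}
ranking = sorted(mean_auc, key=mean_auc.get, reverse=True)

for name in ranking:
    print(f"{name}: {mean_auc[name]:.4f}")
```

On these values, random forest and gradient boosting share the highest mean AUC and logistic regression has the lowest.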
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sánchez-Aguayo, M.; Urquiza-Aguiar, L.; Estrada-Jiménez, J. Predictive Fraud Analysis Applying the Fraud Triangle Theory through Data Mining Techniques. Appl. Sci. 2022, 12, 3382. https://doi.org/10.3390/app12073382