Risk Management in E-Commerce—A Fraud Study Case Using Acoustic Analysis through Its Complexity
Abstract
1. Introduction
2. Material and Methods
2.1. Dataset
2.2. Acoustic Indexes
2.3. Classification Models
Performance Statistics
- MCC (Matthews Correlation Coefficient): correlation measure for binary classifications, where 1 corresponds to a perfect classifier, 0 to a random prediction and −1 to a completely wrong classifier. True and false positives and negatives all enter the calculation [15].
- ROC (Receiver Operating Characteristic curve): graphical method to evaluate, organize and select diagnostics and predictions. It plots the probability of detection (true positive rate, y axis) against the probability of false alarm (false positive rate, x axis) [16].
- AUC (Area Under the ROC curve): the fraction of the unit square that lies under the ROC curve. It equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative example [16].
- ACCURACY: the fraction of correct predictions made by the adjusted model. Sum of true positives and true negatives over the sum of true and false positives and negatives.
- SENSIBILITY (sensitivity): the fraction of actual positives correctly identified by the adjusted model. True positives over the sum of true positives and false negatives.
- SPECIFICITY: the fraction of actual negatives correctly identified by the adjusted model. True negatives over the sum of true negatives and false positives.
- PRECISION: the fraction of predicted positives that are actual positives. True positives over the sum of true positives and false positives.
- NVP (Negative Predictive Value): the fraction of predicted negatives that are actual negatives. True negatives over the sum of true negatives and false negatives.
- KS (Kolmogorov-Smirnov statistic): measures how far apart the empirical cumulative distributions of the predicted and observed data are, taken as their maximum distance.
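The statistics listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names and the example counts are hypothetical, and the KS helper assumes the credit-scoring convention of comparing the score distributions of the two classes, which the text does not spell out.

```python
import math
from bisect import bisect_right


def confusion_metrics(tp, fp, tn, fn):
    """Threshold-based statistics from the four confusion-matrix counts."""
    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Maximum gap between the empirical score CDFs of the two classes (assumed convention)."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    gap = 0.0
    for t in sorted(set(scores)):
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap


# Hypothetical counts and scores, for illustration only.
stats = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
example_ks = ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise AUC is quadratic in the number of examples, which keeps the definition visible; a rank-based formula would be the usual choice for large samples.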
3. Related Works
4. Results
5. Discussion
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Bhatla, T.P.; Prabhu, V.; Dua, A. Understanding Credit Card Frauds. In Cards Business Review #2003–01; Tata Group: Mumbai, Maharashtra, India, 2003.
- Jain, N.; Khan, V. Credit Card Fraud Detection using Recurrent Attributes. Int. Adv. Res. J. Sci. Eng. Technol. 2018, 5.
- Sueur, J. Sound Analysis and Synthesis with R; Springer: Berlin/Heidelberg, Germany, 2018.
- Kolmogorov, A.N. Sur l’interpolation et extrapolation des suites stationnaires. CR Acad. Sci. 1939, 208, 2043–2045.
- Whittle, P. The analysis of multiple stationary time series. J. R. Stat. Soc. Ser. B Methodol. 1953, 15, 125–139.
- Llanos, F.; Alexander, J.M.; Stilp, C.E.; Kluender, K.R. Power spectral entropy as an information-theoretic correlate of manner of articulation in American English. J. Acoust. Soc. Am. 2017, 141, EL127–EL133.
- Shannon, C.E. Communication theory of secrecy systems. Bell Syst. Tech. J. 1949, 28, 656–715.
- Pieretti, N.; Farina, A.; Morri, D. A new methodology to infer the singing activity of an avian community: The Acoustic Complexity Index (ACI). Ecol. Indic. 2011, 11, 868–873.
- Villanueva-Rivera, L.J.; Pijanowski, B.C.; Doucette, J.; Pekin, B. A primer of acoustic analysis for landscape ecologists. Landsc. Ecol. 2011, 26, 1233.
- Depraetere, M.; Pavoine, S.; Jiguet, F.; Gasc, A.; Duvail, S.; Sueur, J. Monitoring animal diversity using acoustic indices: Implementation in a temperate woodland. Ecol. Indic. 2012, 13, 46–54.
- Sueur, J.; Pavoine, S.; Hamerlynck, O.; Duvail, S. Rapid acoustic survey for biodiversity appraisal. PLoS ONE 2008, 3, e4065.
- Friedman, J.; Hastie, T.; Tibshirani, R. The Elements of Statistical Learning; Series in Statistics; Springer: New York, NY, USA, 2001.
- Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Methodol. 1958, 20, 215–242.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Et Biophys. Acta (BBA)-Protein Struct. 1975, 405, 442–451.
- Spackman, K.A. Signal detection theory: Valuable tools for evaluating inductive learning. In Proceedings of the Sixth International Workshop on Machine Learning; Elsevier: Amsterdam, The Netherlands, 1989; pp. 160–163.
- Hand, D.J.; Henley, W.E. Statistical classification methods in consumer credit scoring: A review. J. R. Stat. Soc. Ser. A Stat. Soc. 1997, 160, 523–541.
- Chan, P.K.; Fan, W.; Prodromidis, A.L.; Stolfo, S.J. Distributed data mining in credit card fraud detection. IEEE Intell. Syst. Their Appl. 1999, 14, 67–74.
- Louzada, F.; Ara, A.; Fernandes, G.B. Classification methods applied to credit scoring: Systematic review and overall comparison. Surv. Oper. Res. Manag. Sci. 2016, 21, 117–134.
- Bolton, R.J.; Hand, D.J. Statistical fraud detection: A review. Stat. Sci. 2002, 17, 235–249.
- Jiang, H.; Hirose, K.; Huo, Q. Robust speech recognition based on a Bayesian prediction approach. IEEE Trans. Speech Audio Process. 1999, 7, 426–440.
- Xu, M.; Xu, C.; Duan, L.; Jin, J.S.; Luo, S. Audio keywords generation for sports video analysis. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2008, 4, 11.
- Ellis, D.P.; Zeng, X.; McDermott, J.H. Classifying soundtracks with audio texture features. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 5880–5883.
- Burgoon, J.K.; Mayew, W.J.; Giboney, J.S.; Elkins, A.; Moffitt, K.; Spitzley, L.; Byrd, M.; Dorn, B.; Williams, J. Applying Linguistic and Vocalic Analysis to Company Conference Calls to Detect Fraud-Related Statements. In Report of the HICSS-47 Rapid Screening Technologies, Deception Detection and Credibility; HICSS: Kauai, HI, USA, 2014; pp. 36–49.
- Subramaniam, L.V.; Faruquie, T.A.; Ikbal, S.; Godbole, S.; Mohania, M.K. Business intelligence from voice of customer. In Proceedings of the IEEE International Conference on Data Engineering, Shanghai, China, 29 March–2 April 2009; pp. 1391–1402.
- Strobl, C.; Boulesteix, A.L.; Zeileis, A.; Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinform. 2007, 8, 25.
- Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B Methodol. 1974, 36, 111–133.
- Sen, P.K.; Singer, J.M. Large Sample Methods in Statistics (1994): An Introduction with Applications; CRC Press: Boca Raton, FL, USA, 2017.
Feature | Description | Type |
---|---|---|
order id | id of the order | integer |
ordered on | date when the order was placed | date |
is fraud | the target variable | boolean |
is recurrent | customer has ordered before | boolean |
gender | gender of the customer | char |
email domain | e-mail domain that customer registered | char |
state | state of the shipping address | char |
city | city of the shipping address | char |
neighborhood | neighborhood of the shipping address | char |
category | product category | char |
installments | number of installments of the order | integer |
price | price of the order | real |
Feature | Description | Type |
---|---|---|
call id | id of the call | integer |
called on | date of the call | date |
call duration second | duration of the call in seconds | real |
is fraud | the target variable | boolean |
order id | id of the order | integer |
audio id | id of the audio file related to the call | integer |
index acoustic complexity | acoustic complexity index | real |
index acoustic diversity right | acoustic diversity index—right channel | real |
index acoustic r.ar | acoustic richness index—right channel | real |
index acoustic r.ht | acoustic richness index (entropy)—right channel | real |
index acoustic l.ht | acoustic richness index (entropy)—left channel | real |
index acoustic r.m | acoustic richness index (amplitude)—right channel | real |
index acoustic l.m | acoustic richness index (amplitude)—left channel | real |
index temporal entropy | temporal entropy index | real |
index acoustic entropy | acoustic entropy index | real |
Index | Affix | Summary | Reference |
---|---|---|---|
Acoustic Complexity Index | ACI | Average absolute amplitude difference between adjacent cells of the STDFT matrix in each frequency bin | Pieretti et al. [8] |
Acoustic diversity index | ADI | Shannon entropy on the spectral content | Villanueva-Rivera et al. [9] |
Acoustic richness index | AR | Ranks of the indices M and Ht obtained for a set of n files | Depraetere et al. [10] |
Temporal entropy | HT | Shannon evenness of the amplitude envelope | Sueur et al. [11] |
Amplitude index | M | Amplitude index that computes the median of the amplitude envelope | Depraetere et al. [10] |
Acoustic Entropy | H | Product of the temporal and spectral Shannon evenness (time and frequency domains) | Sueur et al. [11] |
Acoustic evenness index | G | Variant of the ADI that uses the Gini coefficient as the measure of distribution inequality | Villanueva-Rivera et al. [9] |
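To make the summaries in the table above concrete, the sketch below computes three of the indices in plain Python. It is a simplified illustration under stated assumptions, not the implementation used in the study: the amplitude envelope is approximated by the absolute value of the samples (a Hilbert envelope is another common choice), the ACI helper expects an already computed spectrogram matrix with one row per frequency bin and sums the per-bin ratios following the commonly cited definition, and all function names are hypothetical.

```python
import math
import statistics


def shannon_evenness(weights):
    """Shannon entropy of non-negative weights, scaled to [0, 1] by log2(n)."""
    n = len(weights)
    total = sum(weights)
    if n < 2 or total <= 0:
        return 0.0
    # Zero weights contribute nothing (0 * log 0 is taken as 0).
    entropy = -sum((w / total) * math.log2(w / total) for w in weights if w > 0)
    return entropy / math.log2(n)


def amplitude_envelope(samples):
    """Assumption: absolute-value envelope instead of a Hilbert envelope."""
    return [abs(s) for s in samples]


def temporal_entropy(samples):
    """Ht: Shannon evenness of the amplitude envelope."""
    return shannon_evenness(amplitude_envelope(samples))


def amplitude_index(samples):
    """M: median of the amplitude envelope (no bit-depth scaling applied)."""
    return statistics.median(amplitude_envelope(samples))


def acoustic_complexity(spectrogram):
    """ACI over a precomputed spectrogram (rows = frequency bins, columns = time).

    For each bin, the absolute differences between adjacent time cells are
    summed and divided by the bin's total intensity; bins are then added up.
    """
    aci = 0.0
    for row in spectrogram:
        total = sum(row)
        if total > 0:
            diffs = sum(abs(a - b) for a, b in zip(row, row[1:]))
            aci += diffs / total
    return aci


# Hypothetical inputs, for illustration only.
flat = temporal_entropy([1.0] * 64)           # perfectly even envelope
spike = temporal_entropy([0.0] * 63 + [1.0])  # all energy in one sample
median_amp = amplitude_index([-3, 1, 2])
aci = acoustic_complexity([[1, 1, 1], [1, 2, 1]])
```

A flat envelope gives the maximum evenness and a single spike the minimum, which is the contrast that Ht is meant to capture between steady noise and isolated sound events.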
Metrics | Logistic Regression | Red. Logistic Regression | Random Forest | Red. Random Forest |
---|---|---|---|---|
# Variables | 41 | 37 | 41 | 36 |
MCC | 0.2607 | 0.2531 | 0.4808 | 0.4703 |
AUC | 0.7945 | 0.7978 | 0.8046 | 0.7983 |
Accuracy | 0.7556 | 0.7575 | 0.7579 | 0.7525 |
Sensibility | 0.6800 | 0.6835 | 0.6951 | 0.7123 |
Specificity | 0.8281 | 0.8288 | 0.8200 | 0.7889 |
Precision | 0.8058 | 0.8069 | 0.8042 | 0.7805 |
NVP | 0.7229 | 0.7251 | 0.7277 | 0.7379 |
KS | 0.4882 | 0.4906 | 0.5000 | 0.4858 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Nascimento, D.C.; Barbosa, B.; Perez, A.M.; Caires, D.O.; Hirama, E.; Ramos, P.L.; Louzada, F. Risk Management in E-Commerce—A Fraud Study Case Using Acoustic Analysis through Its Complexity. Entropy 2019, 21, 1087. https://doi.org/10.3390/e21111087