Understanding Online Purchases with Explainable Machine Learning
Abstract
1. Introduction
2. Data
- the total number of entrances from each source/medium pair;
- the total number of pageviews for sessions originating from each source/medium pair;
- the average time on pages for sessions originating from each source/medium pair;
- the total number of distinct source/medium pairs used by the customer.
- the total number of pageviews in the analyzed period;
- the total number of times the customer started a browsing session on the page;
- the average time spent on the page.
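The source/medium aggregates listed above can be sketched as a small aggregation over session records. This is an illustration only: the record layout, the customer identifiers, and the function name `aggregate_by_source` are assumptions and are not taken from the paper's data pipeline. The average time is computed as an unweighted mean of per-session averages, which is also an assumption.

```python
from collections import defaultdict

# Hypothetical session records:
# (customer_id, source_medium, pageviews, avg_time_on_page_seconds).
# Field names and values are illustrative, not the paper's schema.
sessions = [
    ("c1", "google/organic", 4, 30.0),
    ("c1", "google/organic", 6, 50.0),
    ("c1", "newsletter/email", 2, 20.0),
    ("c2", "direct/none", 1, 10.0),
]


def aggregate_by_source(records):
    """Build per-customer features for each source/medium pair."""
    stats = defaultdict(
        lambda: defaultdict(lambda: {"entrances": 0, "pageviews": 0, "time_sum": 0.0})
    )
    for customer, source, pageviews, avg_time in records:
        cell = stats[customer][source]
        cell["entrances"] += 1  # each session counts as one entrance
        cell["pageviews"] += pageviews
        cell["time_sum"] += avg_time

    features = {}
    for customer, by_source in stats.items():
        features[customer] = {
            # number of distinct source/medium pairs used by the customer
            "distinct_sources": len(by_source),
            "per_source": {
                source: {
                    "entrances": c["entrances"],
                    "pageviews": c["pageviews"],
                    # unweighted mean of the per-session averages (assumption)
                    "avg_time": c["time_sum"] / c["entrances"],
                }
                for source, c in by_source.items()
            },
        }
    return features


features = aggregate_by_source(sessions)
```

The same pattern would apply to the page-level counts (pageviews, session starts, and average time per page) by grouping on a page identifier instead of the source/medium pair.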
3. Selection of Customer Features
4. Models
4.1. Logistic Regression
4.2. Extreme Gradient Boosting Machine
IF Avgpageviews_day > 15 AND Distinct_days > 5 THEN customer converts.
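As a minimal sketch, the rule above can be written as a predicate over two customer features. The function name and the example customers are hypothetical; only the thresholds (15 and 5) and the feature names come from the rule itself.

```python
def rule_predicts_conversion(avg_pageviews_day: float, distinct_days: int) -> bool:
    """Apply the single IF-THEN rule quoted above to one customer."""
    return avg_pageviews_day > 15 and distinct_days > 5


# Hypothetical customers used only to exercise the rule.
customers = [
    {"Avgpageviews_day": 22.0, "Distinct_days": 7},
    {"Avgpageviews_day": 22.0, "Distinct_days": 3},
    {"Avgpageviews_day": 9.5, "Distinct_days": 8},
]

predictions = [
    rule_predicts_conversion(c["Avgpageviews_day"], c["Distinct_days"])
    for c in customers
]
```

Only the first customer satisfies both conditions, so only that customer is predicted to convert. A tree-based model combines many such rules rather than relying on a single one.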
4.3. Evaluating Model Performance
4.4. Gain and Lift Analysis
5. The Determinants of Conversions
5.1. Permutation Feature Importance
5.2. Shapley Additive Explanations (SHAP)
5.3. Accumulated Local Effect Plots
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Variable Name | Variable Description | IV |
---|---|---|
Avgpageviews_day | Average number of viewed pages per day | 1.56 |
Equip_pageviews_total | Total number of viewed pages related to equipment content | 1.22 |
Pageviews_total | Total number of viewed pages | 0.97 |
Average_time_page_total | Sum of average time spent on page | 0.86 |
Purchase_method_1 | 1 if the user searched for the type 1 purchase method; otherwise, 0 | 0.73 |
Equip_type1_pageviews | Total number of viewed pages related to equipment type 1 | 0.62 |
Distinct_days | Number of distinct days on which the user accessed the website | 0.59 |
Distinct_sources | Number of distinct sources/mediums from which the user accessed the website | 0.54 |
Equip_type1_avgtime_sum | Sum of average time spent on equipment type 1-related pages | 0.41 |
Source1_pageviews | Number of viewed pages through the source/medium of type 1 | 0.41 |
Purchase_method_2 | 1 if the user searched for the type 2 purchase method; otherwise, 0 | 0.35 |
Brand2_pageviews | Number of viewed pages related to brand 2 | 0.30 |
Equip_type2_pageviews | Number of viewed pages related to equipment type 2 | 0.27 |
Brand1_pageviews | Number of viewed pages related to brand 1 | 0.21 |
Equip_pageviews_other | Number of viewed pages not related to the equipment categories | 0.21 |
Equip_type1_entrances | Total number of entrances on equipment type 1-related pages | 0.21 |
Purchase_method_3 | 1 if the user searched for the type 3 purchase method; otherwise, 0 | 0.20 |
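Assuming the IV column denotes the information value commonly used for feature screening, the quantity can be sketched as follows. The binning, the function name, and the handling of empty cells are assumptions made for illustration; the source does not show how the values in the table were computed.

```python
import math


def information_value(bins):
    """Information value of one feature.

    bins: list of (n_events, n_non_events) counts, one tuple per bin.
    IV = sum over bins of (p_event - p_non_event) * ln(p_event / p_non_event).
    """
    total_events = sum(events for events, _ in bins)
    total_non_events = sum(non_events for _, non_events in bins)
    iv = 0.0
    for events, non_events in bins:
        # Skip empty cells to avoid log(0); practical implementations
        # often smooth the counts instead (assumption).
        if events == 0 or non_events == 0:
            continue
        p_event = events / total_events
        p_non_event = non_events / total_non_events
        weight_of_evidence = math.log(p_event / p_non_event)
        iv += (p_event - p_non_event) * weight_of_evidence
    return iv


# Hypothetical two-bin feature: buyers concentrate in the first bin.
iv_separating = information_value([(80, 20), (20, 80)])
# A feature whose bins carry no information about conversion.
iv_uninformative = information_value([(50, 50), (50, 50)])
```

A feature whose bins have identical event and non-event shares yields an IV of zero, and the value grows as the two distributions separate.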
Variable | Coefficient | Std. Error | z-Stat | p-Value |
---|---|---|---|---|
Constant | −4.860 | 0.034 | −143.002 | 0.000 |
Brand2_pageviews | −0.002 | 0.006 | −0.376 | 0.707 |
Purchase_method_2 | 0.395 | 0.037 | 10.685 | 0.000 |
Equip_type1_avgtime_sum | −0.014 | 0.019 | −0.751 | 0.453 |
Source1_pageviews | 0.010 | 0.001 | 11.673 | 0.000 |
Distinct_sources | 0.218 | 0.014 | 15.776 | 0.000 |
Distinct_days | 0.084 | 0.007 | 12.958 | 0.000 |
Equip_type1_pageviews | −0.006 | 0.005 | −1.178 | 0.239 |
Purchase_method_1 | 1.111 | 0.037 | 29.746 | 0.000 |
Average_time_page_total | 0.128 | 0.008 | 16.067 | 0.000 |
Pageviews_total | −0.013 | 0.001 | −11.671 | 0.000 |
Equip_pageviews_total | 0.016 | 0.003 | 4.792 | 0.000 |
Avgpageviews_day | 0.038 | 0.001 | 30.259 | 0.000 |
Purchase_method_3 | 0.534 | 0.055 | 9.727 | 0.000 |
Equip_type1_entrances | −0.040 | 0.012 | −3.225 | 0.001 |
Brand1_pageviews | 0.007 | 0.005 | 1.383 | 0.167 |
Equip_pageviews_other | −0.036 | 0.008 | −4.483 | 0.000 |
Equip_type2_pageviews | 0.073 | 0.011 | 6.946 | 0.000 |
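To read the coefficients above on a more intuitive scale, one option is to convert them to odds ratios and to a baseline probability. The sketch below uses a subset of the tabulated coefficients. The helper names are hypothetical, and treating every omitted feature as zero is a simplifying assumption.

```python
import math

# Subset of the coefficients from the table above.
coefficients = {
    "Constant": -4.860,
    "Purchase_method_1": 1.111,
    "Distinct_sources": 0.218,
    "Avgpageviews_day": 0.038,
}


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the feature."""
    return math.exp(coefficient)


def predicted_probability(feature_values, coefs):
    """Logistic response; features not listed in feature_values count as zero."""
    z = coefs["Constant"] + sum(
        coefs[name] * value for name, value in feature_values.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Probability implied by the intercept alone (all features at zero).
baseline = predicted_probability({}, coefficients)
# Same customer, but having searched for purchase method 1.
with_method_1 = predicted_probability({"Purchase_method_1": 1}, coefficients)
```

Under these assumptions, searching for purchase method 1 multiplies the odds of conversion by roughly three, while the intercept alone implies a conversion probability below one percent.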
Customer Feature | Std. Dev. | Importance |
---|---|---|
Avgpageviews_day | 9.014 | 0.343 |
Purchase_method_1 | 0.288 | 0.320 |
Average_time_page_total | 2.034 | 0.260 |
Pageviews_total | 17.434 | 0.227 |
Distinct_days | 2.411 | 0.202 |
Distinct_sources | 0.918 | 0.200 |
Purchase_method_2 | 0.326 | 0.129 |
Equip_pageviews_total | 7.605 | 0.120 |
Source1_pageviews | 10.102 | 0.103 |
Purchase_method_3 | 0.160 | 0.085 |
Equip_type2_pageviews | 0.880 | 0.065 |
Equip_pageviews_other | 1.233 | 0.045 |
Equip_type1_entrances | 0.874 | 0.035 |
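The Importance column above appears numerically consistent with the absolute logistic regression coefficient multiplied by the feature's standard deviation. This is an observation from the tabulated numbers, not a confirmed description of the authors' procedure. The check below copies five rows from the two tables, and the function name is hypothetical.

```python
# feature: (std_dev, coefficient, reported_importance), copied from the tables above.
rows = {
    "Avgpageviews_day": (9.014, 0.038, 0.343),
    "Purchase_method_1": (0.288, 1.111, 0.320),
    "Average_time_page_total": (2.034, 0.128, 0.260),
    "Pageviews_total": (17.434, -0.013, 0.227),
    "Distinct_days": (2.411, 0.084, 0.202),
}


def standardized_importance(std_dev, coefficient):
    """Absolute effect on the log-odds of a one-standard-deviation change."""
    return abs(coefficient) * std_dev


computed = {
    name: standardized_importance(std_dev, coef)
    for name, (std_dev, coef, _) in rows.items()
}

# Largest disagreement between the recomputed and the reported importance.
max_gap = max(
    abs(computed[name] - reported) for name, (_, _, reported) in rows.items()
)
```

The small remaining gaps are what one would expect from the coefficients being rounded to three decimals in the table.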
Classifier | Precision | Recall | F1-Score | AUC |
---|---|---|---|---|
Logistic Regression | 0.46 | 0.08 | 0.14 | 0.86 |
Gradient boosting machine | 0.28 | 0.43 | 0.34 | 0.88 |
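The F1-score column can be recovered from the precision and recall columns as their harmonic mean. The sketch below recomputes it from the rounded values in the table; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported in the table above.
reported = {
    "Logistic Regression": (0.46, 0.08),
    "Gradient boosting machine": (0.28, 0.43),
}

f1 = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
```

Because the harmonic mean is dominated by the smaller of the two inputs, the low recall of the logistic regression pulls its F1-score far below its precision.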
Decile | # Cases | # Responses | Cumulative Responses | % Events | Gain (%) | Lift |
---|---|---|---|---|---|---|
1 | 3696 | 859 | 859 | 59.16 | 59.16 | 5.92 |
2 | 3696 | 277 | 1136 | 19.08 | 78.24 | 3.91 |
3 | 3695 | 136 | 1272 | 9.37 | 87.61 | 2.92 |
4 | 3696 | 69 | 1341 | 4.75 | 92.36 | 2.31 |
5 | 3695 | 54 | 1395 | 3.72 | 96.08 | 1.92 |
6 | 3695 | 32 | 1427 | 2.2 | 98.28 | 1.64 |
7 | 3697 | 21 | 1448 | 1.45 | 99.73 | 1.42 |
8 | 3233 | 4 | 1452 | 0.28 | 100.01 | 1.25 |
9 | 4089 | 0 | 1452 | 0 | 100.01 | 1.11 |
10 | 3765 | 0 | 1452 | 0 | 100.01 | 1.00 |
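The Gain and Lift columns can be reproduced from the per-decile response counts alone. The sketch below assumes that lift is the cumulative gain divided by the nominal share of customers contacted (10% per decile), which matches the tabulated values even though the decile sizes differ slightly. The function name is hypothetical.

```python
# Responses per decile, copied from the table above.
responses = [859, 277, 136, 69, 54, 32, 21, 4, 0, 0]


def gain_and_lift(responses_per_decile):
    """Return (decile, cumulative responses, gain %, lift) for each decile."""
    total = sum(responses_per_decile)
    n_deciles = len(responses_per_decile)
    cumulative = 0
    rows = []
    for decile, count in enumerate(responses_per_decile, start=1):
        cumulative += count
        gain = 100.0 * cumulative / total
        # Nominal share of the population targeted so far (assumption).
        population_share = 100.0 * decile / n_deciles
        lift = gain / population_share
        rows.append((decile, cumulative, round(gain, 2), round(lift, 2)))
    return rows


table = gain_and_lift(responses)
```

Computed from the counts, the cumulative gain tops out at exactly 100, whereas the table reaches 100.01 because it sums percentages that were already rounded.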
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bastos, J.A.; Bernardes, M.I. Understanding Online Purchases with Explainable Machine Learning. Information 2024, 15, 587. https://doi.org/10.3390/info15100587