Gms-Afkmc2: A New Customer Segmentation Framework Based on the Gaussian Mixture Model and ASSUMPTION-FREE K-MC2
Abstract
1. Introduction
2. Related Work
2.1. RFM Model
2.2. K-Means
3. The Proposed Gms-Afkmc2
3.1. Gaussian Mixture Model Dataset Sampling Algorithm (Gms)
Algorithm 1 Gaussian mixture model dataset sampling (Gms)
Input: X (original dataset), w (weight of each component), k (number of clusters), m (size of the subset)
Output: S (subsample dataset)
1: Fit a Gaussian mixture model to X with component weights w # the fit function is a Python library function belonging to sklearn, a machine-learning library
2: Draw m samples from the fitted model # the sample function is a Python library function belonging to sklearn, a machine-learning library
3: for each sample in the samples do
4:  find the point of X closest to the sample and add it to S
5: return S
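A minimal sketch of the Gms step in Python, assuming scikit-learn's GaussianMixture; the function and variable names (gms_subsample, n_subset, and so on) are placeholders of mine, not the paper's symbols:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gms_subsample(X, n_components, n_subset, random_state=0):
    """Fit a GMM to X, draw synthetic samples from it, then return the
    real points of X closest to those samples as the subsample."""
    gmm = GaussianMixture(n_components=n_components, random_state=random_state)
    gmm.fit(X)                         # train the mixture model on X
    samples, _ = gmm.sample(n_subset)  # draw synthetic points from the model
    # For each synthetic sample, keep the nearest real data point.
    d2 = ((X[None, :, :] - samples[:, None, :]) ** 2).sum(axis=-1)
    return X[np.unique(d2.argmin(axis=1))]
```

Because np.unique drops duplicate nearest neighbors, the returned subset can be slightly smaller than n_subset.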
3.2. Assumption-Free K-MC2 (Afkmc2)
Algorithm 2 Gms-Afkmc2
Input: X (original dataset), w (weight of each component), k (number of clusters), m (size of the subset)
Output: C (the final clusters)
1: S ← Gms(X, w, k, m) # produce a subsample dataset by the function Gms in Algorithm 1
2: Generate the initial centers from S: sample the first center uniformly; for each x in S, compute the proposal distribution; for i = 2, …, k, sample candidate points from the proposal and keep the accepted point of the Markov chain as the i-th center # the AFkmc2 function is a Python library function belonging to kmc2-0.1, a library proposed by Olivier Bachem
3: Form the initial clusters of X from these centers and iterate k-means to obtain the final segmentation C
4: return C
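Since the paper delegates seeding to the kmc2 library, the following is a hedged pure-NumPy sketch of AFK-MC2 seeding as described by Bachem et al. (2016): approximate k-means++ D² sampling with a short Markov chain per center. It is an illustration under my own naming, not the library's implementation.

```python
import numpy as np

def afkmc2_seeding(X, k, chain_length=200, rng=None):
    """Assumption-free k-MC2 seeding: draw chain candidates from a
    precomputed proposal and accept them by the D^2 ratio."""
    rng = np.random.default_rng(rng)
    n = len(X)
    c0 = X[rng.integers(n)]                    # first center, uniform
    d2 = ((X - c0) ** 2).sum(axis=1)
    q = 0.5 * d2 / d2.sum() + 0.5 / n          # assumption-free proposal
    centers = [c0]
    for _ in range(k - 1):
        idx = rng.choice(n, size=chain_length, p=q)
        # Squared distance of each candidate to the current center set.
        dist2 = np.min([((X[idx] - c) ** 2).sum(axis=1) for c in centers],
                       axis=0)
        x, dx, qx = idx[0], dist2[0], q[idx[0]]
        for j in range(1, chain_length):
            # Metropolis-Hastings acceptance ratio for target ∝ D^2.
            ratio = (dist2[j] * qx) / (dx * q[idx[j]] + 1e-12)
            if rng.random() < ratio:
                x, dx, qx = idx[j], dist2[j], q[idx[j]]
        centers.append(X[x])
    return np.array(centers)
```

The resulting centers would then initialize a standard k-means run, e.g. scikit-learn's KMeans(init=centers, n_init=1).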
4. Experiments
4.1. Dataset and Evaluation Metrics
4.2. Experiment Analysis
4.3. Customer Segmentation and Application
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Sampling rate | Iterations | Runtime (s) | SH ↑ | DBI ↓ |
---|---|---|---|---|
1 | 6 | 0.024 | 0.916 | 0.415 |
0.8 | 6 | 0.024 | 0.916 | 0.415 |
0.6 | 8 | 0.032 | 0.916 | 0.415 |
0.5 | 8 | 0.032 | 0.916 | 0.415 |
0.4 | 9 | 0.036 | 0.913 | 0.420 |
0.2 | 11 | 0.044 | 0.910 | 0.423 |
0.1 | 12 | 0.048 | 0.910 | 0.423 |
0.05 | 12 | 0.048 | 0.910 | 0.423 |
Sampling rate | Iterations | Runtime (s) | SH ↑ | DBI ↓ |
---|---|---|---|---|
1 | 9 | 0.027 | 0.355 | 1.065 |
0.8 | 10 | 0.030 | 0.355 | 1.065 |
0.6 | 12 | 0.036 | 0.355 | 1.065 |
0.5 | 13 | 0.039 | 0.355 | 1.065 |
0.4 | 13 | 0.039 | 0.354 | 1.066 |
0.2 | 13 | 0.039 | 0.352 | 1.067 |
0.1 | 15 | 0.045 | 0.352 | 1.067 |
0.05 | 15 | 0.045 | 0.350 | 1.068 |
Method | SSE ↓ | SH ↑ | DBI ↓ | Runtime (s) |
---|---|---|---|---|
Gms-Afkmc2 | 28.19 | 0.92 | 0.42 | 0.018 |
K-means++ | 29.57 | 0.90 | 0.44 | 0.008 |
K-means | 31.25 | 0.89 | 0.52 | 0.051 |
Method | SSE ↓ | SH ↑ | DBI ↓ | Runtime (s) |
---|---|---|---|---|
Gms-Afkmc2 | 241.423 | 0.355 | 1.065 | 0.026 |
K-means++ | 245.745 | 0.352 | 1.067 | 0.013 |
K-means | 248.131 | 0.347 | 1.071 | 0.075 |
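The SH and DBI columns above follow the standard silhouette and Davies-Bouldin definitions. As a sketch of how such scores can be computed (with scikit-learn, on synthetic data rather than the paper's datasets), with SSE available as KMeans's inertia_:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Two well-separated synthetic blobs stand in for a customer dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
sse = km.inertia_                          # sum of squared errors (lower is better)
sh = silhouette_score(X, km.labels_)       # SH, higher is better (↑)
dbi = davies_bouldin_score(X, km.labels_)  # DBI, lower is better (↓)
```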
Cluster | Rate | Recency ↓ | Frequency ↑ | Monetary ↑
---|---|---|---|---
Cluster 1 | 13.46% | 597.76 | 30.24 | 657.41 |
Cluster 2 | 25.43% | 348.1 | 56.62 | 1099.67 |
Cluster 3 | 60.85% | 54.09 | 169.44 | 3451.62 |
Cluster 4 | 0.26% | 3.86 | 4629.43 | 204,593.38 |
Cluster | Rate | Recency ↓ | Frequency ↑ | Monetary ↑
---|---|---|---|---
Cluster 1 | 13.46% | 593 | 18 | 284.85 |
Cluster 2 | 25.43% | 375 | 31.5 | 515.82 |
Cluster 3 | 60.85% | 36 | 87 | 1428 |
Cluster 4 | 0.26% | 2.5 | 3844 | 143,587.09 |
Cluster | Rate | Recency ↓ | Frequency ↑ | Monetary ↑
---|---|---|---|---
Cluster 1 | 28.99% | 101.69 | 93.57 | 202,034.79 |
Cluster 2 | 34.20% | 314.14 | 76 | 169,933.59 |
Cluster 3 | 28.11% | 210.99 | 73.24 | 156,509.03 |
Cluster 4 | 8.7% | 203.36 | 334.14 | 756,760.6 |
Cluster | Rate | Recency ↓ | Frequency ↑ | Monetary ↑
---|---|---|---|---
Cluster 1 | 28.99% | 104 | 56 | 142,673.87 |
Cluster 2 | 34.20% | 312 | 43.5 | 111,913.43 |
Cluster 3 | 28.11% | 213 | 48 | 122,743.21 |
Cluster 4 | 8.7% | 203 | 278 | 747,008.31 |
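The Recency/Frequency/Monetary values behind the cluster tables are per-customer aggregates of a transaction log. A hypothetical sketch with pandas; the column names (customer, date, amount) are assumptions of mine, not the paper's dataset schema:

```python
import pandas as pd

# Toy transaction log: one row per purchase.
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b", "c"],
    "date": pd.to_datetime(
        ["2011-01-05", "2011-11-20", "2011-03-01", "2011-06-15",
         "2011-12-01", "2011-02-10"]),
    "amount": [20.0, 35.0, 10.0, 15.0, 40.0, 5.0],
})

# Reference point: the day after the last recorded transaction.
now = tx["date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer").agg(
    recency=("date", lambda d: (now - d.max()).days),  # days since last buy
    frequency=("date", "count"),                       # number of purchases
    monetary=("amount", "sum"),                        # total spend
)
```

Clustering is then run on these three columns (typically after scaling), and each cluster is summarized by its mean or median R, F, and M as in the tables above.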
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xiao, L.; Zhang, J. Gms-Afkmc2: A New Customer Segmentation Framework Based on the Gaussian Mixture Model and ASSUMPTION-FREE K-MC2. Electronics 2024, 13, 3523. https://doi.org/10.3390/electronics13173523