An Intelligent Hybrid Scheme for Customer Churn Prediction Integrating Clustering and Classification Algorithms
Abstract
1. Introduction
- Clustering: the proposed model applies k-means, k-medoids, and random clustering in the clustering stage, which identifies the top-performing clustering technique. The k-medoids technique performs better than the other two and overcomes several drawbacks of the k-means and random clustering algorithms.
- Classification: the proposed system evaluates different classifiers, both as single and as hybrid models, on two datasets. The classification stage selects the most appropriate individual and hybrid classification models.
- Ensemble-based churn prediction: in the churn prediction stage, the best hybrid models combining clustering and classification are used with ensemble classifiers to select the top customer churn prediction (CCP) ensemble approach.
2. Literature Review
3. Materials and Methods
3.1. Dataset Collection
3.2. Pre-Processing of the Dataset
3.3. Clustering Algorithms
3.3.1. K-Means Clustering
3.3.2. K-Medoids Clustering
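The introduction credits k-medoids with overcoming drawbacks of k-means. One standard drawback is sensitivity to outliers: a k-means center is a mean, while a k-medoids center must be an actual data point that minimizes the total distance to its cluster. The plain-Python sketch below illustrates this under stated assumptions; it is not the authors' implementation. It assumes Euclidean distance, uses the simple alternating (Voronoi-iteration) variant of k-medoids rather than the full PAM swap search, and seeds centers with farthest-first selection only so that the example is deterministic. The names `centroid`, `medoid`, `assign`, `init_farthest_first`, and `fit` are hypothetical.

```python
from math import dist


def centroid(cluster):
    """k-means center: the coordinate-wise mean (usually not a data point)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def medoid(cluster):
    """k-medoids center: the member with the smallest total distance to all members."""
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]


def init_farthest_first(points, k):
    """Assumed deterministic seeding: start at points[0], then add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def fit(points, k, update, max_iter=100):
    """Alternate assignment and center updates until the centers stop changing.

    update=centroid gives Lloyd's k-means; update=medoid gives alternating k-medoids.
    """
    centers = init_farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        clusters = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
        # Keep the previous center if a cluster ends up empty.
        new_centers = [update(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# One group with an outlier: the mean is dragged toward it, the medoid is not.
group = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (100.0, 0.0)]
print(centroid(group))  # (25.75, 0.0)
print(medoid(group))    # (1.0, 0.0)

# Two well-separated groups clustered with k = 2, the cluster count used in the paper.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = fit(points, k=2, update=medoid)
print(centers, labels)  # [(0, 0), (10, 10)] [0, 0, 0, 1, 1, 1]
```

Because a medoid is always a member of its cluster, a single extreme customer record cannot pull the center into an empty region of feature space, which is the robustness property the comparison above relies on.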
3.3.3. Random Clustering
3.4. Classification Algorithms
3.4.1. K-Nearest Neighbor
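As a minimal sketch of the k-nearest-neighbor rule, the function below labels a query by majority vote among its `k` closest training rows. Euclidean distance, `k=3`, the toy data, and the name `knn_predict` are illustrative assumptions, not the paper's settings.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per customer, label 1 = churner, 0 = non-churner.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (2, 2)))  # 0
print(knn_predict(train_X, train_y, (7, 8)))  # 1
```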
3.4.2. Decision Tree
3.4.3. Gradient Boosted Tree
3.4.4. Random Forest
3.4.5. Deep Learning
3.4.6. Naive Bayes
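Naive Bayes applies Bayes' rule under the assumption that features are conditionally independent given the class. For numeric features, a Gaussian likelihood per feature is a common choice; the sketch below assumes that variant (the paper does not state which one it uses) and adds a small variance floor, `var_smoothing`, to avoid division by zero. Function names and data are hypothetical.

```python
from math import log, pi


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Per class: log prior, plus mean and variance of every feature (features assumed independent)."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == c]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small variance floor (an assumption) keeps constant features from dividing by zero.
        variances = [sum((v - m) ** 2 for v in col) / n + var_smoothing
                     for col, m in zip(columns, means)]
        model[c] = (log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest log prior + sum of per-feature Gaussian log likelihoods."""
    def log_posterior(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            -0.5 * log(2 * pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = [0, 0, 0, 1, 1, 1]  # 1 = churn (hypothetical toy data)
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (1.1, 2.1)))  # 0
print(predict_gaussian_nb(model, (5.1, 5.9)))  # 1
```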
3.5. Ensemble Classifiers
3.5.1. Voting
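Voting combines several classifiers' outputs directly, without a trained combiner: hard voting takes the majority label, and soft voting averages predicted probabilities. A minimal sketch, assuming binary labels with `1` = churn and a 0.5 threshold for soft voting; the function names are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """predictions[m][i] is model m's label for sample i; return the majority label per sample."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def soft_vote(probabilities, threshold=0.5):
    """probabilities[m][i] is model m's churn probability for sample i; average, then threshold (assumed 0.5)."""
    return [int(sum(column) / len(column) >= threshold) for column in zip(*probabilities)]


# Three hypothetical models labeling four customers.
print(hard_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]))  # [1, 0, 1, 0]
# Three hypothetical models scoring three customers.
print(soft_vote([[0.9, 0.4, 0.2], [0.6, 0.3, 0.1], [0.3, 0.2, 0.9]]))  # [1, 0, 0]
```

With an odd number of voters and binary labels, hard voting never ties.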
3.5.2. Bagging
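Bagging trains each base model on a bootstrap sample (n draws with replacement from the n training rows) and combines the models by majority vote. The plain-Python sketch below uses a one-split decision stump as the base learner, an assumption made for brevity; the paper applies bagging on top of its hybrid model instead. `fit_stump`, `bagging_fit`, and `bagging_predict` are hypothetical names.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Decision stump: the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({x[f] for x in X}):
            left = [t for x, t in zip(X, y) if x[f] <= threshold]
            right = [t for x, t in zip(X, y) if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(t != left_label for t in left) + sum(t != right_label for t in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label


def bagging_fit(X, y, n_estimators=11, seed=0):
    """Fit one stump per bootstrap sample (n row indices drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the bootstrap models by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


X = [(1,), (2,), (3,), (4,), (6,), (7,), (8,), (9,)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical toy data, 1 = churn
models = bagging_fit(X, y)
print(bagging_predict(models, (1,)), bagging_predict(models, (9,)))
```

An odd `n_estimators` avoids vote ties for binary labels; the seeded `random.Random` makes the bootstrap draws reproducible.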
3.5.3. Stacking
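Stacking feeds the predictions of several base models into a meta-learner. To avoid training the meta-learner on predictions the base models made for their own training rows, the meta-features are usually out-of-fold predictions from k-fold cross-validation. The sketch below assumes two simple base learners (nearest centroid and 1-nearest neighbor), interleaved folds, and a lookup-table meta-learner that returns the majority label for each pattern of base predictions; these are illustrative stand-ins, not the base models or meta-learner used in the paper, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_nearest_centroid(X, y):
    """Hypothetical base learner 1: predict the class whose mean vector is closest."""
    centroids = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: squared_distance(x, centroids[c]))


def fit_one_nn(X, y):
    """Hypothetical base learner 2: copy the label of the closest training row."""
    return lambda x: y[min(range(len(X)), key=lambda i: squared_distance(x, X[i]))]


def fit_lookup_meta(Z, y):
    """Hypothetical meta-learner: majority label per pattern of base predictions."""
    table = defaultdict(Counter)
    for z, t in zip(Z, y):
        table[tuple(z)][t] += 1
    fallback = Counter(y).most_common(1)[0][0]  # used for unseen prediction patterns

    def predict(z):
        key = tuple(z)
        return table[key].most_common(1)[0][0] if key in table else fallback

    return predict


def stacking_fit(X, y, base_fitters, meta_fitter, n_folds=3):
    """Train the meta-learner on out-of-fold base predictions, then refit the bases on all data."""
    n = len(X)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]  # interleaved folds
    Z = [None] * n
    for held_out in folds:
        train = [i for i in range(n) if i not in held_out]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fitters]
        for i in held_out:
            Z[i] = [m(X[i]) for m in models]  # meta-features for row i
    meta = meta_fitter(Z, y)
    final_models = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([m(x) for m in final_models])


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]  # hypothetical toy data, 1 = churn
model = stacking_fit(X, y, [fit_nearest_centroid, fit_one_nn], fit_lookup_meta)
print(model((0.2, 0.3)), model((5.8, 5.9)))  # 0 1
```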
3.6. Proposed Framework
- First, the clustering algorithms are applied and the top-performing one is selected. Each clustering algorithm is configured with two clusters. In our experiments, k-medoids outperforms k-means and random clustering.
- Next, single classifiers, namely gradient boosted tree (GBT), decision tree (DT), random forest (RF), naive Bayes (NB), and deep learning (DL), are implemented.
- Afterward, hybrid models combining k-medoids with each single classifier are implemented. These hybrid models outperform the standalone clustering and classification algorithms.
- Then, models combining k-medoids clustering with hybrid classifiers are designed, and the results are evaluated in terms of accuracy, recall, precision, and F-measure.
- Finally, ensemble models such as bagging and stacking are combined with the best hybrid model from the previous steps; this combination achieves the best results of all the above experiments.
- The evaluation shows that the proposed combination of clustering and an ensemble model achieves the highest prediction accuracy among the compared methods.
Map Clustering on the Label
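The framework step "Map Clustering on the Label" is not described further in this section. One common reading, assumed here, is that each cluster is assigned the majority class label of its members so that cluster IDs can be scored as churn predictions. The sketch below shows that majority-vote mapping; the function names and toy data are hypothetical.

```python
from collections import Counter


def map_clusters_to_labels(cluster_ids, labels):
    """Assumed mapping: {cluster_id: majority class label among that cluster's members}."""
    votes = {}
    for c, y in zip(cluster_ids, labels):
        votes.setdefault(c, Counter())[y] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}


def predict_from_clusters(cluster_ids, mapping):
    """Turn cluster assignments into class predictions via the mapping."""
    return [mapping[c] for c in cluster_ids]


cluster_ids = [0, 0, 1, 1, 1, 0]  # e.g. output of a 2-cluster k-medoids run
labels = ["churn", "churn", "stay", "stay", "churn", "stay"]
mapping = map_clusters_to_labels(cluster_ids, labels)
predictions = predict_from_clusters(cluster_ids, mapping)
print(mapping)  # {0: 'churn', 1: 'stay'}
print(sum(p == t for p, t in zip(predictions, labels)))  # 4
```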
4. Results
4.1. Evaluation Measures
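The measures used in this evaluation (accuracy, precision, recall, and F-measure) all follow from the confusion-matrix counts. The sketch below assumes churn is the positive class (label `1`), takes the F-measure to be F1 (the harmonic mean of precision and recall), and returns 0.0 when a denominator is zero; that convention and the name `evaluate` are assumptions.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure (assumed F1) with `positive` as the churn class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
print(evaluate(y_true, y_pred))
# TP=2, FP=1, FN=2, TN=3 -> accuracy 0.625, precision 2/3, recall 0.5, F1 4/7
```

On imbalanced data such as the Orange dataset, accuracy alone can look high while recall on the churn class stays low, which is why recall and F-measure are reported alongside it.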
4.2. Performance Analysis Based on Clustering Algorithms
4.3. Performance Analysis Based on Classification Algorithms
4.4. Combining the k-Medoids Clustering Algorithm with Each Single Classifier
4.5. Combining k-Medoids Clustering Algorithm Hybrid Classifiers
4.6. Ensemble Classifiers Combined with k-Medoids and Hybrid Classifiers
5. Performance Comparison with Other Existing Approaches
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mattison, R. Telecom Churn Management: The Golden Opportunity; APDG Pub: Fuquay-Varina, NC, USA, 2001.
- Payne, A.; Frow, P. A strategic framework for customer relationship management. J. Mark. 2005, 69, 167–176.
- Reinartz, W.; Krafft, M.; Hoyer, W.D. The customer relationship management process: Its measurement and impact on performance. J. Mark. Res. 2004, 41, 293–305.
- Neslin, S.A.; Gupta, S.; Kamakura, W.; Lu, J.; Mason, C.H. Defection detection: Measuring and understanding the predictive accuracy of customer churn models. J. Mark. Res. 2006, 43, 204–211.
- Liu, C.J.; Huang, T.S.; Ho, P.T.; Huang, J.C.; Hsieh, C.T. Machine learning-based e-commerce platform repurchase customer prediction model. PLoS ONE 2020, 15, e0243105.
- Gulc, A. Multi-stakeholder perspective of courier service quality in B2C e-commerce. PLoS ONE 2021, 16, e0251728.
- Abbasimehr, H.; Bahrini, A. An analytical framework based on the recency, frequency, and monetary model and time series clustering techniques for dynamic segmentation. Expert Syst. Appl. 2022, 192, 116373.
- Carbo-Valverde, S.; Cuadros-Solas, P.; Rodríguez-Fernández, F. A machine learning approach to the digitalization of bank customers: Evidence from random and causal forests. PLoS ONE 2020, 15, e0240362.
- Zhou, J.; Zhai, L.; Pantelous, A.A. Market segmentation using high-dimensional sparse consumers data. Expert Syst. Appl. 2020, 145, 113136.
- Van den Poel, D.; Lariviere, B. Customer attrition analysis for financial services using proportional hazard models. Eur. J. Oper. Res. 2004, 157, 196–217.
- Reinartz, W.J.; Kumar, V. The impact of customer relationship characteristics on profitable lifetime duration. J. Mark. 2003, 67, 77–99.
- Lin, S.C.; Tung, C.H.; Jan, N.Y.; Chiang, D.A. Evaluating churn model in CRM: A case study in Telecom. J. Converg. Inf. Technol. 2011, 6.
- Hwang, H.; Jung, T.; Suh, E. An LTV model and customer segmentation based on customer value: A case study on the wireless telecommunication industry. Expert Syst. Appl. 2004, 26, 181–188.
- Larivière, B.; Van den Poel, D. Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Syst. Appl. 2005, 29, 472–484.
- Wei, C.P.; Chiu, I.T. Turning telecommunications call details to churn prediction: A data mining approach. Expert Syst. Appl. 2002, 23, 103–112.
- Xia, G.E.; Jin, W.D. Model of customer churn prediction on support vector machine. Syst.-Eng.-Theory Pract. 2008, 28, 71–77.
- Dietterich, T.G. Ensemble methods in machine learning. In Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21–23 June 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15.
- Van den Berg, M.; Slot, R.; van Steenbergen, M.; Faasse, P.; van Vliet, H. How enterprise architecture improves the quality of IT investment decisions. J. Syst. Softw. 2019, 152, 134–150.
- Kornyshova, E.; Barrios, J. Industry 4.0 impact propagation on enterprise architecture models. Procedia Comput. Sci. 2020, 176, 2497–2506.
- Kotusev, S.; Kurnia, S.; Dilnutt, R. The practical roles of enterprise architecture artifacts: A classification and relationship. Inf. Softw. Technol. 2022, 147, 106897.
- Górski, T. Towards Enterprise Architecture for Capital Group in Energy Sector. In Proceedings of the 2018 IEEE 22nd International Conference on Intelligent Engineering Systems (INES), Las Palmas de Gran Canaria, Spain, 21–23 June 2018; pp. 000239–000244.
- Hung, S.Y.; Yen, D.C.; Wang, H.Y. Applying data mining to telecom churn management. Expert Syst. Appl. 2006, 31, 515–524.
- Huang, Y.; Kechadi, T. An effective hybrid learning system for telecommunication churn prediction. Expert Syst. Appl. 2013, 40, 5635–5647.
- Pendharkar, P.C. Genetic algorithm based neural network approaches for predicting churn in cellular wireless network services. Expert Syst. Appl. 2009, 36, 6714–6720.
- Burez, J.; Van den Poel, D. Handling class imbalance in customer churn prediction. Expert Syst. Appl. 2009, 36, 4626–4636.
- Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2011, 42, 463–484.
- Verbeke, W.; Dejaeger, K.; Martens, D.; Hur, J.; Baesens, B. New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. Eur. J. Oper. Res. 2012, 218, 211–229.
- Huang, B.; Buckley, B.; Kechadi, T.M. Multi-objective feature selection by using NSGA-II for customer churn prediction in telecommunications. Expert Syst. Appl. 2010, 37, 3638–3646.
- Kisioglu, P.; Topcu, Y.I. Applying Bayesian Belief Network approach to customer churn analysis: A case study on the telecom industry of Turkey. Expert Syst. Appl. 2011, 38, 7151–7157.
- Xu, H.; Zhang, Z.; Zhang, Y. Churn prediction in telecom using a hybrid two-phase feature selection method. In Proceedings of the 2009 Third International Symposium on Intelligent Information Technology Application, Nanchang, China, 21–22 November 2009; Volume 3, pp. 576–579.
- De Bock, K.W.; Van den Poel, D. An empirical evaluation of rotation-based ensemble classifiers for customer churn prediction. Expert Syst. Appl. 2011, 38, 12293–12301.
- Dalli, A. Impact of Hyperparameters on Deep Learning Model for Customer Churn Prediction in Telecommunication Sector. Math. Probl. Eng. 2022, 2022, 4720539.
- Lalwani, P.; Mishra, M.K.; Chadha, J.S.; Sethi, P. Customer churn prediction system: A machine learning approach. Computing 2022, 104, 271–294.
- Hu, X.; Yang, Y.; Chen, L.; Zhu, S. Research on a customer churn combination prediction model based on decision tree and neural network. In Proceedings of the 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 10–13 April 2020; pp. 129–132.
- Jain, H.; Khunteta, A.; Shrivastav, S.P. Telecom Churn Prediction Using Seven Machine Learning Experiments integrating Features engineering and Normalization. 2021, 1–25. Available online: https://www.researchsquare.com/article/rs-239201/v1 (accessed on 14 July 2022).
- Miller, H.; Clarke, S.; Lane, S.; Lonie, A.; Lazaridis, D.; Petrovski, S.; Jones, O. Predicting customer behaviour: The University of Melbourne’s KDD Cup report. In Proceedings of the KDD-Cup 2009 Competition, PMLR, Paris, France, 28 June–1 July 2009; pp. 45–55.
- Sorokina, D. Application of additive groves ensemble with multiple counts feature evaluation to KDD cup’09 small data set. In Proceedings of the KDD-Cup 2009 Competition, PMLR, Paris, France, 28 June–1 July 2009; pp. 101–109.
- Gajowniczek, K.; Orłowski, A.; Ząbkowski, T. Insolvency modeling with generalized entropy cost function in neural networks. Phys. A Stat. Mech. Its Appl. 2019, 526, 120730.
- Sjarif, N.; Rusydi, M.; Yusof, M.; Hooi, D.; Wong, T.; Yaakob, S.; Ibrahim, R.; Osman, M. A customer Churn prediction using Pearson correlation function and K nearest neighbor algorithm for telecommunication industry. Int. J. Adv. Soft Compu. Appl. 2019, 11, 46–59.
- Salzberg, S.L. C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach. Learn. 1994, 16, 235–240.
- Stearns, B.; Rangel, F.M.; Rangel, F.; de Faria, F.F.; Oliveira, J.; Ramos, A.A.d.S. Scholar Performance Prediction using Boosted Regression Trees Techniques. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 26–28 April 2017.
- Idris, A.; Khan, A. Customer churn prediction for telecommunication: Employing various various features selection techniques and tree based ensemble classifiers. In Proceedings of the 2012 15th International Multitopic Conference (INMIC), Islamabad, Pakistan, 13–15 December 2012; pp. 23–27.
- Yulianti, Y.; Saifudin, A. Sequential feature selection in customer churn prediction based on Naive Bayes. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Ulaanbaatar, Mongolia, 10–13 September 2020; IOP Publishing: Bristol, UK, 2020; Volume 879, p. 012090.
- Gupta, M.K.; Chandra, P. A comprehensive survey of data mining. Int. J. Inf. Technol. 2020, 12, 1243–1257.
- Dudoit, S.; Fridlyand, J. Bagging to improve the accuracy of a clustering procedure. Bioinformatics 2003, 19, 1090–1099.
- Idris, A.; Khan, A. Churn prediction system for telecom using filter–wrapper and ensemble classification. Comput. J. 2017, 60, 410–430.
- Ahmed, A.A.; Maheswari, D. Churn prediction on huge telecom data using hybrid firefly based classification. Egypt. Inform. J. 2017, 18, 215–220.
- Vijaya, J.; Sivasankar, E. An efficient system for customer churn prediction through particle swarm optimization based feature selection model with simulated annealing. Clust. Comput. 2019, 22, 10757–10768.
- Pustokhina, I.V.; Pustokhin, D.A.; Nguyen, P.T.; Elhoseny, M.; Shankar, K. Multi-objective rain optimization algorithm with WELM model for customer churn prediction in telecommunication sector. Complex Intell. Syst. 2021, 1–13.
- Usman, M.; Ahmad, W.; Fong, A. Design and Implementation of a System for Comparative Analysis of Learning Architectures for Churn Prediction. IEEE Commun. Mag. 2021, 59, 86–90.
- Wael Fujo, S.; Subramanian, S.; Ahmad Khder, M. Customer Churn Prediction in Telecommunication Industry Using Deep Learning. Inf. Sci. Lett. 2022, 11, 24.
- Praseeda, C.; Shivakumar, B. Fuzzy particle swarm optimization (FPSO) based feature selection and hybrid kernel distance based possibilistic fuzzy local information C-means (HKD-PFLICM) clustering for churn prediction in telecom industry. SN Appl. Sci. 2021, 3, 1–18.
Attributes | Cell2Cell | Orange |
---|---|---|
Total instances | 40,000 | 50,000 |
Total features | 76 | 260 |
Numerical features | 68 | 190 |
Nominal features | 8 | 70 |
Class distribution | Balanced | Imbalanced |
Missing values | No | Yes |
Ref | Method | Dataset | Accuracy (%) | Recall (%) | F-Measure (%) | Year |
---|---|---|---|---|---|---|
[47] | Hybrid firefly algorithm | Orange | 86.38 | 80 | 85 | 2017 |
[46] | FW-ECP | Orange | 79.4 | 74.1 | 72.7 | 2017 |
[48] | PSO-FSSA | Orange | 94.08 | 84.01 | 80.28 | 2019 |
[46] | FW-ECP | Cell2Cell | 84.9 | 80.2 | 81.02 | 2017 |
[50] | LSTM | Cell2Cell | 72.7 | 78 | 80.65 | 2021 |
[51] | Deep-BP-ANN | Cell2Cell | 92 | 81.74 | 77.47 | 2022 |
[52] | HKD-PFLICM | Cell2Cell | 76.51 | 79 | 78 | 2021 |
Proposed | Stacking-based ensemble model | Orange | 96 | 91.61 | 90.23 | 2022 |
Proposed | Stacking-based ensemble model | Cell2Cell | 93.6 | 85.45 | 83.72 | 2022 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, R.; Ali, S.; Bilal, S.F.; Sakhawat, Z.; Imran, A.; Almuhaimeed, A.; Alzahrani, A.; Sun, G. An Intelligent Hybrid Scheme for Customer Churn Prediction Integrating Clustering and Classification Algorithms. Appl. Sci. 2022, 12, 9355. https://doi.org/10.3390/app12189355