Research and Application of Improved Clustering Algorithm in Retail Customer Classification
Abstract
1. Introduction
2. Literature Review
- Stage 1: Network Initial Design
- Stage 2: Network Execution
- (1) The algorithm requires the user to specify the number of clusters K in advance, and different K values affect both the efficiency and the result of the cluster analysis;
- (2) The selection of the initial center points is closely related to the running efficiency of the algorithm; a poor selection may lead to a large number of iterations or to convergence on a local optimum;
- (3) The algorithm is only good at dealing with spherical (convex) clusters;
- (4) It is sensitive to outliers and abnormal data;
- (5) It is only suitable for data with numerical attributes and handles categorical attributes poorly;
- (6) The search strategy is inefficient, so the overhead of the algorithm is large when dealing with large databases.

This paper analyzes defects (1), (2), and (6) of the typical k-means algorithm, because the clustering result of the traditional k-means algorithm is affected by the selection of the initial cluster centers. The improved algorithm A is built on the traditional k-means algorithm and removes the dependence on a preset value of K: the number of clusters K is generated automatically. At the same time, the algorithm selects the initial center points more strictly, so that the chosen centers are far apart from one another; this prevents several initial centers from being drawn from the same class and overcomes the tendency of the algorithm to fall into a local optimum (a minimal illustrative sketch is given below, after this overview). Algorithm B addresses defect (6): by combining sampling techniques with a hierarchical agglomeration algorithm, the original algorithm is improved, and the new algorithm B is more efficient.

In the application part, clustering technology is used for customer segmentation. A customer value system is established with the analytic hierarchy process, customer value is quantified, and clustering is then applied to divide customers into different categories; this has practical significance for effective customer management. Some customer value evaluation systems already exist, but the measurement models are not mature enough [40]. The first measurement index, the direct profit a customer contributes to the enterprise, is difficult to quantify. This paper therefore uses data mining methods, starting from the actual situation of the enterprise, to establish a customer value evaluation model suitable for enterprise development through a series of operable customer value evaluation indexes. On this basis, customer value can be measured, customers can be segmented, and a decision support system for customer value management can be established.

As for the theoretical research, this paper follows the thread of discovering, raising, analyzing, and solving problems, and adopts a research method that combines empirical analysis with theoretical analysis and organically combines theoretical research with applied research [41]. Drawing on existing improvements to the k-means algorithm, the deficiencies of the algorithm are studied and a new algorithm is obtained. Building on previous research results, the paper follows the basic requirements of clustering and completes the research work step by step, as can be seen in Table 2.
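The paper describes improved algorithm A only at this conceptual level (K generated automatically, initial centers kept far apart). The Python sketch below is not the authors' implementation; it assumes a maximin (farthest-first) seeding rule and a silhouette-based scan over candidate values of K as one plausible way to realize both ideas.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def farthest_first_centers(X, k, seed=0):
    """Pick k initial centers that are mutually far apart (maximin strategy)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]                      # first center: a random point
    for _ in range(k - 1):
        # distance of every point to its nearest already chosen center
        d = np.min(np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=2), axis=1)
        centers.append(X[np.argmax(d)])                      # next center: the farthest point
    return np.asarray(centers)

def cluster_with_auto_k(X, k_range=range(2, 11)):
    """Run k-means for each candidate K and keep the best silhouette score."""
    best = None
    for k in k_range:
        km = KMeans(n_clusters=k, init=farthest_first_centers(X, k),
                    n_init=1, random_state=0).fit(X)
        score = silhouette_score(X, km.labels_)
        if best is None or score > best[0]:
            best = (score, k, km)
    return best  # (silhouette, chosen K, fitted model)

# Illustration on synthetic customer features
X = np.random.default_rng(0).normal(size=(500, 6))
score, k, model = cluster_with_auto_k(X)
```

Seeding with mutually distant points keeps several initial centers from falling inside the same class, which is the behavior the improved algorithm is said to enforce; scanning K with an internal validity index is one common way to avoid asking the user for K.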
3. Materials and Methods
3.1. Aim and Hypotheses
3.2. Variables and Instruments
3.3. Sample
3.4. Statistical Analysis of Data and Procedure
4. Results
5. Conclusions
5.1. Main Work of the Paper
5.2. Further Research Directions
6. Discussions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Gurley, J.W. The one Internet metric that really matters. Fortune 2000, 2, 141–392. [Google Scholar]
- Bickerton, P. 7 technologies that are transforming the hospitality industry. Hosp. Mark. 2015, 1, 14–28. [Google Scholar]
- Pui, L.T.; Chechen, L.; Tzu, H.L. Shopping motivations on Internet: A study based on utilitarian and hedonic value. Technovation 2007, 27, 774–787. [Google Scholar]
- Lee, E.Y.; Soo, B.L.; Yu, J.J. Factors influencing the behavioral intention to use food delivery. Apps. Soc. Behav. Pers. 2017, 45, 1461–1474. [Google Scholar] [CrossRef]
- Armstrong, G.; Kotler, P. Marketing; Prentice-Hall: Englewood Cliffs, NJ, USA, 2000; pp. 1–98. [Google Scholar]
- Ozkara, B.Y.; Ozmen, M.; Kim, J.W. Examining the effect of flow experience on online purchase: A novel approach to the flow theory based on hedonic and utilitarian value. J. Retail. Consum. Serv. 2017, 37, 119–131. [Google Scholar] [CrossRef]
- Park, J.; Ha, S. Co-creation of service recovery: Utilitarian and hedonic value and post-recovery responses. J. Retail. Consum. Serv. 2016, 28, 310–316. [Google Scholar] [CrossRef]
- Anderson, K.C.; Knight, D.K.; Pookulangara, S.; Josiam, B. Influence of hedonic and utilitarian motivations on retailer loyalty and purchase intention: A facebook perspective. J. Retail. Consum. Serv. 2014, 21, 773–779. [Google Scholar] [CrossRef]
- Chiu, C.M.; Wang, E.T.; Fang, Y.H.; Huang, H.Y. Understanding customers’ repeat purchase intentions in B2C E-Commerce: The roles of utilitarian value, hedonic value and perceived risk. Inf. Syst. J. 2014, 24, 85–114. [Google Scholar] [CrossRef]
- Lin, K.Y.; Lu, H.P. Predicting mobile social network acceptance based on mobile value and social influence. Internet Res. 2015, 25, 107–130. [Google Scholar] [CrossRef]
- Huang, J.H.; Yang, Y.C. Gender difference in adolescents’ online shopping motivation. Afr. J. Bus. Manag. 2010, 4, 849–857. [Google Scholar]
- Grandom, E.; Mykytyn, P. Theory-based instrumentation to measure the intention to use electronic commerce in small and medium sized businesses. J. Comput. Inf. Syst. 2004, 44, 44–57. [Google Scholar]
- Chan, T.K.H.; Cheung, C.M.K.; Shi, N.; Lee, M.K.O. Gender differences in satisfaction with Facebook users. Ind. Manag. Data Syst. 2015, 115, 182–206. [Google Scholar] [CrossRef]
- Meyers-Levy, J.; Sternthal, B. Gender differences in the use of message cues and judgments. J. Mark. Res. 1991, 28, 84–96. [Google Scholar] [CrossRef]
- Jöreskog, K.G.; Sörbom, D. LISREL 8: User’s Reference Guide; Scientific Software International: Chicago, IL, USA, 1996; pp. 1–98. [Google Scholar]
- Arnold, M.J.; Reynolds, K.E. Hedonic shopping motivations. J. Retail. 2003, 79, 77–95. [Google Scholar] [CrossRef]
- Schwab, D.P. Research Methods for Organizational Studies; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2005; pp. 1–30. [Google Scholar]
- Byrne, B.M. Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2001; pp. 1–28. [Google Scholar]
- Swaminathan, V.; Lepowska, W.E.; Rao, B.P. Browsers or buyers in cyberspace: An investigation of factors influencing electronic exchange. J. Comput.-Mediat. Commun. 1999, 5, 224–234. [Google Scholar] [CrossRef]
- Stützle, T.; Hoos, H. Improvements on the Ant System: Introducing MAX–MIN Ant System. In Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms, East Lansing, MI, USA, 19–23 July 1997; Springer: Vienna, Austria, 1997; pp. 245–249. [Google Scholar]
- Yang, Z.; Li, H.; Haodong, Z. An improved k-means dynamic clustering algorithm. J. Chongqing Norm. Univ. Nat. Sci. Dep. Acad. Ed. 2016, 33, 97–101. [Google Scholar]
- He, Q.; Wang, Q.; Zhuang, F.; Tan, Q.; Shi, Z. Parallel CLARANS Clustering Based on MapReduce. Energy Procedia 2011, 13, 3269–3279. [Google Scholar]
- Karypis, G.; Han, E.H.; Kumar, V. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. Computer 1999, 32, 68–75. [Google Scholar] [CrossRef] [Green Version]
- Guha, S.; Rastogi, R.; Shim, K. CURE: An efficient clustering algorithm for large databases. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, 2–4 June 1998; ACM Press: New York, NY, USA, 1998; pp. 73–84. [Google Scholar]
- Yang, X. Application of Improved Birch Algorithm in Telecom Customer Segmentation; Hefei University of Technology: Hefei, China, 2015; p. 77. [Google Scholar]
- Zhang, J.; Wang, L.; Yao, Y. A traffic classification algorithm based on options clustering. Zhengzhou Inst. Light Ind. J. Nat. Sci. 2013, 28, 83–86. [Google Scholar]
- Hinneburg, A.; Keim, D. An efficient approach to clustering large multimedia database with noise. In Proceedings of the 4th ACM SIGKDD on Knowledge Discovery and Data Mining, New York, NY, USA, 27–31 August 1998; AAAI Press: New York, NY, USA, 1998; pp. 58–65. [Google Scholar]
- Wang, D.; Wang, I.; Fan, W. An improved CLIQUE high-dimensional subspace clustering algorithm. Semicond. Optoelectron. 2016, 37, 275–278. [Google Scholar]
- Sheikholeslami, G.; Chatterjee, S.; Zhang, A. WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases. In Proceedings of the 24th Conference on VLDB, New York, NY, USA, 24–27 August 1998; pp. 428–439. [Google Scholar]
- Liang, J. Research on Ant Colony Algorithm and Its Application in Clustering; South China University of Technology: Guangzhou, China, 2011. [Google Scholar]
- Tan, S.C.; Ting, K.M.; Teng, S.W. Simplifying and improving clustering ant-based. In Proceedings of the International Conference on Computational Science 2011, Dalian, China, 24–26 June 2011; pp. 46–55. [Google Scholar]
- Haiying, M.; Yu, G. Customer Segmentation Study of College Students Based on the RFM. In Proceedings of the 2010 International Conference on E-Business and E-Government, Guangzhou, China, 7–9 May 2010; pp. 3860–3863. [Google Scholar] [CrossRef]
- Sheshasaayee, A.; Logeshwari, L. An efficiency analysis on the TPA clustering methods for intelligent customer segmentation. In Proceedings of the 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 21–23 February 2017; pp. 784–788. [Google Scholar]
- Srivastava, R. Identification of customer clusters using RFM model: A case of diverse purchaser classification. Int. J. Bus. Anal. Intell. 2016, 4, 45–50. [Google Scholar]
- Kamel, M.H. Topic discovery from text using aggregation of different clustering methods. In Advances in Artificial Intelligence. Canadian AI 2002; Lecture Notes in Computer Science; Cohen, R., Spencer, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2338, pp. 161–175. [Google Scholar]
- Zahrotun, L. Implementation of data mining technique for customer relationship management (CRM) on online shop tokodiapers.com with fuzzy c-means clustering. In Proceedings of the 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 1–2 November 2017; pp. 299–303. [Google Scholar]
- Tong, L.; Wang, Y.; Wen, F.; Li, X. The research of customer loyalty improvement in telecom industry based on NPS data mining. China Commun. 2017, 14, 260–268. [Google Scholar] [CrossRef]
- Shah, S.; Singh, M. Comparison of a Time Efficient Modified K-mean Algorithm with K-Mean and K-Medoid Algorithm. In Proceedings of the 2012 International Conference on Communication Systems and Network Technologies, Rajkot, India, 11–13 May 2012; pp. 435–437. [Google Scholar]
- Liu, C.C.; Chu, S.W.; Chan, Y.K.; Yu, S.S. A Modified K-Means Algorithm—Two-Layer K-Means Algorithm. In Proceedings of the 2014 Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Kitakyushu, Japan, 27–29 August 2014; pp. 447–450. [Google Scholar] [CrossRef]
- Cho, Y.; Moon, S.C. Weighted mining frequent pattern-based customer’s RFM score for personalized u-commerce recommendation system. J. Converg. 2013, 4, 36–40. [Google Scholar]
- Jiang, T.; Tuzhilin, A. Improving personalization solutions through optimal segmentation of customer bases. IEEE Trans. Knowl. Data Eng. 2009, 21, 305–320. [Google Scholar] [CrossRef] [Green Version]
- Lu, H.; Lin, J.; Lu, J.; Zhang, G. A customer churn prediction model in telecom industry using boosting. IEEE Trans. Ind. Inf. 2014, 10, 1659–1665. [Google Scholar] [CrossRef]
- He, X.; Li, C. The research and application of customer segmentation on e-commerce websites. In Proceedings of the 2016 6th International Conference on Digital Home (ICDH), Guangzhou, China, 2–4 December 2016; pp. 203–208. [Google Scholar] [CrossRef]
Factor of Influence | Contents | Literature Review |
---|---|---|
Data acquisition | Bayesian classification is statistical | (Haiying 2010) [32] |
Network classification | The task of data mining | (Sheshasaayee 2017) [33] |
Clustering ways | Apply clustering to customer segmentation | (Srivastava 2016) [34] |
Purchase frequency | Artificial neural network is an analysis | (Zhang 2002) [35] |
Modified k-means | HK clustering network | (Zahrotun 2017) [36] |
K-means algorithm | K-means clustering algorithm is a method based on centroid | (Tong 2017) [37] |
Hierarchical Clustering Methods | K-Means Clustering | Fuzzy C-Means |
---|---|---|
(Cho 2013) [40] | (Jiang 2009) [41] | (Lu 2014) [42] |
Hierarchical clustering classifies samples by progressively reducing the number of classes. The steps are as follows: initially, each sample forms its own class. The distance between every pair of samples is calculated, and the two nearest samples are merged into one class; the distance between the new class and the other classes is then calculated, and the two nearest classes are merged. As long as more than one class remains, these steps are repeated until all samples fall into a single class. Different definitions of the distance between samples, and of the distance between classes, produce different hierarchical clustering methods. | The k-means algorithm adopts an iterative updating method: in each iteration, the K current cluster centers group the surrounding points into K clusters, and the centroid of each cluster (the mean, i.e., the geometric center, of all points in the cluster) is recalculated and used as the reference point for the next iteration. As the iterations proceed, the reference points move closer to the true cluster centroids, the objective function becomes smaller and smaller, and the clustering effect becomes progressively better. | Fuzzy c-means is a clustering method that allows a data point to belong to several clusters. Instead of assigning each point to exactly one cluster, it computes for each data point a degree of membership in every cluster. The advantage of fuzzy c-means over k-means is that, for large data sets with overlapping or similar groups, its results are better than those of the k-means algorithm, in which each data point must belong entirely to a single cluster. |
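Fuzzy c-means is not part of scikit-learn, so the following is a minimal NumPy sketch of the alternating membership/centroid updates the table describes; the fuzzifier m = 2 and the cluster count c are illustrative choices, not values taken from the paper.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: each point receives a membership degree for every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                        # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]       # membership-weighted centroids
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        # standard FCM membership update: u_ij proportional to d_ij ** (-2 / (m - 1))
        U_new = d ** (-2.0 / (m - 1))
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

# Crisp labels, if needed, are the highest-membership clusters: U.argmax(axis=1)
X = np.random.default_rng(1).normal(size=(300, 4))
centers, U = fuzzy_c_means(X, c=3)
```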
Hypothesis Number | Alternative Hypothesis | Previous Research |
---|---|---|
H1 | Association rule analysis | (Haiying 2010) [32] |
H2 | Cluster analysis | (Sheshasaayee 2017) [33] |
H3 | Classification square analysis | (Srivastava 2016) [34] |
H4 | Customer value evaluation method | (Zhang 2002) [35] |
H5 | Customer classification model | (Zahrotun 2017) [36] |
H6 | Implementation of cluster analysis algorithm | (Tong 2017) [37] |
H7 | Model application example analysis | (Shah 2012) [38] |
H | The Hypothesis |
---|---|
Ha1 | The classification of retail customers obtained with the improved clustering algorithm depends significantly on age.
Ha2 | The classification of retail customers obtained with the improved clustering algorithm depends significantly on gender.
Ha3 | The classification of retail customers obtained with the improved clustering algorithm depends significantly on marital status.
Variable | Related Literature | N | Minimum | Maximum | Mean | Std. Deviation
---|---|---|---|---|---|---
Segment | (Haiying 2010) [32] | 178 | 2.00 | 5.00 | 3.9101 | 0.67726
Education | (Sheshasaayee 2017) [33] | 178 | 2.00 | 5.00 | 4.2191 | 0.53786 |
Account age (months) | (Srivastava 2016) [34] | 178 | 1.00 | 5.00 | 2.4326 | 1.27490 |
Occupation | (Zhang 2002) [35] | 178 | 2.00 | 5.00 | 4.4213 | 0.59136 |
Income | (Zahrotun 2017) [36] | 178 | 1.00 | 5.00 | 4.1503 | 0.65167 |
Purchase frequency | (Tong 2017) [37] | 178 | 1.00 | 5.00 | 4.1742 | 0.79405 |
Purchase amount | (Shah 2012) [38] | 178 | 2.00 | 5.00 | 4.6404 | 0.62431 |
Shopping satisfaction | (Liu 2014) [39] | 178 | 3.00 | 5.00 | 4.3090 | 0.61067 |
Indicators | Beta | T | Sig. |
---|---|---|---|
(Constant) | 0.772 * | 2.097 | 0.037 |
Occupation | 0.334 ** | 5.323 | 0.000 |
Income | 0.215 * | 2.522 | 0.013 |
Purchase frequency | 0.032 | 1.071 | 0.286 |
Purchase amount | 0.201 ** | 2.854 | 0.005 |
Shopping satisfaction | −0.070 | −1.060 | 0.291 |
Indicators | Tolerance | VIF |
---|---|---|
Education | 0.676 | 1.480 |
Occupation | 0.581 | 1.721 |
Income | 0.843 | 1.186 |
Purchase frequency | 0.705 | 1.418 |
Purchase amount | 0.656 | 1.523 |
Shopping satisfaction | 0.744 | 1.344 |
Account age (months) | 0.607 | 1.647 |
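The regression coefficients and the tolerance/VIF collinearity diagnostics above are standard outputs of an ordinary least squares analysis. The survey responses are not reproduced here, so the sketch below uses hypothetical stand-in data and column names; it only illustrates how such tables could be produced with statsmodels (tolerance is the reciprocal of VIF).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical stand-in data; the paper's survey responses are not public.
rng = np.random.default_rng(0)
predictors = ["education", "occupation", "income", "purchase_frequency",
              "purchase_amount", "shopping_satisfaction", "account_age"]
df = pd.DataFrame(rng.normal(size=(178, len(predictors))), columns=predictors)
df["customer_value"] = df[predictors] @ rng.normal(size=len(predictors)) + rng.normal(size=178)

X = sm.add_constant(df[predictors])
ols = sm.OLS(df["customer_value"], X).fit()        # coefficients, t values, p values (Sig.)

# Collinearity diagnostics: VIF per predictor (skipping the constant) and tolerance = 1/VIF
vif = [variance_inflation_factor(X.values, i + 1) for i in range(len(predictors))]
diagnostics = pd.DataFrame({"Indicator": predictors, "VIF": vif,
                            "Tolerance": [1.0 / v for v in vif]})
print(ols.summary())
print(diagnostics)
```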
(a) HK Random Seeds (Minimum Variation Solution) | |||||||
---|---|---|---|---|---|---|---|
Segment | Number of Credit Cards Used | Account Age (Months) | Days since First Purchase | Days since Last Purchase | Total Number of Orders | Total Dollars Spent | Segment Size |
One | 1.6 | 39.8 | 797.2 | 319.2 | 4.0 | 425.67 | 603 |
Two | 1.2 | 54.8 | 824.0 | 659.4 | 1.6 | 137.02 | 648 |
Three | 0.9 | 31.7 | 541.0 | 452.5 | 1.4 | 113.53 | 1173 |
Four | 1.5 | 48.0 | 454.4 | 335.2 | 1.6 | 144.16 | 645 |
Five | 1.0 | 33.8 | 858.3 | 795.8 | 1.4 | 139.85 | 1248 |
(b) K-Means Rational Seeds | |||||||
Segment | Number of Credit Cards Used | Account Age (months) | Days since First Purchase | Days since Last Purchase | Total Number of Orders | Total Dollars Spent | Segment Size |
One | 1.1 | 56.8 | 712.1 | 591.9 | 1.5 | 133.00 | 684 |
Two | 1.0 | 35.0 | 480.3 | 417.8 | 1.4 | 115.83 | 1296 |
Three | 0.9 | 34.7 | 839.2 | 794.7 | 1.4 | 138.48 | 1383 |
Four | 2.0 | 38.2 | 787.2 | 333.8 | 2.4 | 207.74 | 605 |
Five | 1.4 | 41.5 | 765.6 | 322.3 | 4.7 | 536.43 | 351 |
(c) Normal Mixtures Rational Seeds | |||||||
Segment | Number of Credit Cards Used | Account Age (months) | Days since First Purchase | Days since Last Purchase | Total Number of Orders | Total Dollars Spent | Segment Size |
One | 1.5 | 40.5 | 754.8 | 273.2 | 4.3 | 429.05 | 404 |
Two | 3.1 | 45.9 | 725.3 | 557.4 | 1.6 | 163.77 | 141 |
Three | 4.0 | 51.5 | 871.0 | 188.5 | 6.0 | 787.52 | 12 |
Four | 1.0 | 38.9 | 689.4 | 577.6 | 1.5 | 136.20 | 3729 |
Five | 1.2 | 41.6 | 809.7 | 483.0 | 4.9 | 1044.47 | 31 |
The rational seeds for K-means and normal mixtures were based on the centroid locations obtained from hierarchical clustering using the average method.

Total within-segment variation for a real-world data set using standardized variables (N = 4317):

Seed Type | HK | K-Means | Normal Mixtures
---|---|---|---
Rational (based on hierarchical clustering using the average method) | 19.29 | 19.66 | 29.21
Random (mean across 250 analyses) | 19.23 | 20.47 | 27.31
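As the note above states, the "rational" seeds are the centroid locations from average-linkage hierarchical clustering. A minimal scikit-learn sketch of that seeding strategy, using synthetic data in place of the real customer file, might look like this:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

def kmeans_with_rational_seeds(X, k):
    """Seed k-means with the centroids of average-linkage hierarchical clusters."""
    Xs = StandardScaler().fit_transform(X)                   # standardized variables
    hier = AgglomerativeClustering(n_clusters=k, linkage="average").fit(Xs)
    seeds = np.vstack([Xs[hier.labels_ == c].mean(axis=0) for c in range(k)])
    km = KMeans(n_clusters=k, init=seeds, n_init=1, random_state=0).fit(Xs)
    return km, km.inertia_        # inertia_ = total within-segment variation

# Synthetic stand-in for the N = 4317 customer records with six behavioral variables
X = np.random.default_rng(2).normal(size=(4317, 6))
km, within_variation = kmeans_with_rational_seeds(X, k=5)
```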
(a) Average HARI by Clustering Algorithm and Data Complexity for N = 72 Artificial Data Sets | ||||||
---|---|---|---|---|---|---|
Algorithm | Cluster Density: Equal | Cluster Density: 0.60 | Outliers: None | Outliers: 0.20 | Variables: 8 + 1 Noisy | Variables: 8 + 2 Noisy
HK | 0.92 | 0.74 | 0.93 | 0.72 | 0.84 | 0.82 |
K-means | 0.80 | 0.87 | 0.92 | 0.75 | 0.82 | 0.85 |
Normal mixtures | 0.66 | 0.73 | 0.92 | 0.48 | 0.68 | 0.72 |
(b) Average Total within-Segment Variation by Clustering Algorithm and Data Complexity for N = 72 Artificial Data Sets | ||||||
Algorithm | Cluster Density: Equal | Cluster Density: 0.60 | Outliers: None | Outliers: 0.20 | Variables: 8 + 1 Noisy | Variables: 8 + 2 Noisy
HK | 21.82 | 24.76 | 20.29 | 26.31 | 21.39 | 25.20 |
K-means | 23.87 | 29.13 | 20.34 | 32.66 | 26.24 | 26.77 |
Normal mixtures | 43.95 | 43.64 | 20.40 | 67.20 | 42.17 | 45.42 |
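Assuming HARI denotes the Hubert–Arabie adjusted Rand index (the usual reading of the abbreviation; it is not defined in this excerpt), recovery accuracy on an artificial data set with known segment labels can be scored as follows; the data-generation settings here are illustrative, not those of the 72 artificial data sets.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Artificial data set with known segment labels, recovered with k-means and
# scored with the adjusted Rand index (1.0 = perfect recovery, ~0.0 = chance level).
X, true_labels = make_blobs(n_samples=1000, centers=5, n_features=8, random_state=0)
pred_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(adjusted_rand_score(true_labels, pred_labels))
```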
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).