Shopping Mall Customers Clustering Based on LOF and K-means++
DOI:
https://doi.org/10.54097/hset.v57i.9894
Keywords:
K-means, LOF, Customer segmentation.
Abstract
Effective customer analysis is vital for business success. AI-based methods can significantly improve its accuracy, particularly for customer grouping and labeling, which lets merchants design personalized marketing strategies. Machine learning models for grouping data, including logistic regression, Support Vector Machine (SVM), decision tree, and random forest, have been developed in theory since the 1950s, but computational limitations long hindered their practical application. Recent advances in computing have made machine learning algorithms more accessible and able to produce high-value results. The K-means clustering algorithm is one such model, and it fits the requirements of customer labeling well. As an unsupervised learning method, K-means partitions customer data into a predetermined number of clusters. In this paper, we apply K-means separately to the data on male and female clients, using K-means++ initialization to place the initial cluster centers as far apart as possible. We also apply the Local Outlier Factor (LOF) algorithm to remove outliers and adjust the dataset accordingly.
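The pipeline named in the abstract can be illustrated with a minimal standard-library sketch. It is not the paper's implementation: the toy points, the feature meaning (income, spending score), the neighbour count `k=3`, the LOF cut-off of 2.0, and the choice to filter outliers before clustering are all assumptions made for illustration. The LOF here also uses exactly `k` neighbours per point and ignores ties at the k-distance, which is a simplification of the original definition.

```python
import math
import random


def lof_scores(points, k):
    """Simplified Local Outlier Factor using exactly k nearest neighbours."""
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        neighbours.append(order[:k])
        k_dist.append(math.dist(points[i], points[order[k - 1]]))

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: draw each new centre with probability proportional
    to its squared distance from the nearest centre chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign(points, centers):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's iterations started from K-means++ seeds."""
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members)
                                         for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Invented toy data: (income, spending score) for one customer subset.
customers = [(15, 39), (16, 41), (17, 40), (18, 38), (16, 37),
             (80, 82), (81, 80), (79, 83), (82, 81), (80, 79),
             (200, 5)]  # the last point is a deliberate outlier

scores = lof_scores(customers, k=3)
kept = [p for p, s in zip(customers, scores) if s < 2.0]  # assumed cut-off

centers, labels = kmeans(kept, k=2, rng=random.Random(0))
print(sorted(centers))
```

To mirror the per-gender analysis described above, the same two steps would be run once on the male subset and once on the female subset, each with its own number of clusters.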
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







