Shopping Mall Customers Clustering Based on LOF and K-means++

Yuhang Deng; Peiwei Wu

doi:10.54097/hset.v57i.9894

Keywords:

K-means, LOF, Customer segmentation.

Abstract

Effective customer analysis is vital for business success, and AI-based analysis can improve its accuracy, particularly in grouping and labeling customers, which lets merchants design personalized marketing strategies. Machine learning models for grouping data, including logistic regression, Support Vector Machine (SVM), decision tree, and random forest, have been developed in theory since the 1950s; however, computational limitations long hindered their practical application. Recent advances in computing have made machine learning algorithms more accessible and capable of producing high-value results. The K-means clustering algorithm is one such model, and it suits the requirements of customer labeling well. As an unsupervised learning model, K-means partitions customer data into a predetermined number of clusters. In this paper, we apply K-means separately to the data on male and female customers, using K-means++ initialization to keep the initial cluster centers as far apart as possible. We also apply the Local Outlier Factor (LOF) algorithm to detect outliers and remove them from the dataset.
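
The pipeline described in the abstract (LOF filtering followed by K-means with K-means++ seeding) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the toy (income, spending score) points, the neighbourhood size k=3, the LOF cut-off of 1.5, and all function names are assumptions made for illustration. The LOF here is a simplified brute-force variant that uses exactly k neighbours and assumes no duplicate points.

```python
import math
import random


def lof_scores(points, k):
    """Simplified Local Outlier Factor per point (brute force, exactly k neighbours)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding the point itself.
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]
    k_dist = [dist[i][nbrs[i][-1]] for i in range(n)]  # distance to the k-th neighbour

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in nbrs[i]]
        return len(reach) / sum(reach)

    density = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(density[j] for j in nbrs[i]) / (k * density[i]) for i in range(n)]


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: draw each new centre with probability proportional to D(x)^2."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points]


def kmeans(points, k, rng, iterations=50):
    """Lloyd's iterations started from K-means++ seeds."""
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(iterations):
        labels = assign(points, centres)
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centre if a cluster ends up empty.
            new_centres.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else centres[c]
            )
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(points, centres)


# Hypothetical toy data: two tight customer groups plus one far-away outlier.
customers = [(15, 80), (16, 82), (17, 79), (18, 81),
             (70, 20), (71, 22), (72, 19), (73, 21),
             (200, 200)]

scores = lof_scores(customers, k=3)
kept = [p for p, s in zip(customers, scores) if s <= 1.5]  # 1.5 is an assumed cut-off
centres, labels = kmeans(kept, k=2, rng=random.Random(0))
print(sorted(centres), labels)
```

In the setting of the paper, the same two steps would be run once on the male subset and once on the female subset; the sketch shows a single run to keep it short.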


References

Tanveer M, Khan N, Ahmad A R. AI Support Marketing: Understanding the Customer Journey towards the Business Development. 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA). IEEE, 2021: 144-150.

Moor J. The Dartmouth College artificial intelligence conference: The next fifty years. AI Magazine, 2006, 27(4): 87-87.

Chao X, Kou G, Li T, et al. Jie Ke versus AlphaGo: A ranking approach using decision making method for large-scale data with incomplete information. European Journal of Operational Research, 2018, 265(1): 239-247.

Liaw A, Wiener M. Classification and regression by randomForest. R News, 2002, 2(3): 18-22.

Hartigan J A, Wong M A. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 1979, 28(1): 100-108.

Breunig M M, Kriegel H P, Ng R T, et al. LOF: Identifying density-based local outliers. ACM SIGMOD International Conference on Management of Data. ACM, 2000.

Arthur D, Vassilvitskii S. K-means++: The advantages of careful seeding. Proc. of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). Society for Industrial and Applied Mathematics, Philadelphia, 2007: 1027-1035.

Yu Q, Chen P, Lin Z, et al. Clustering Analysis for Silent Telecom Customers Based on K-means++. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). IEEE, 2020, 1: 1023-1027.

Jenssen R, Eltoft T. A new information theoretic analysis of sum-of-squared-error kernel clustering. Neurocomputing, 2008, 72(1-3): 23-31.

Rousseeuw P J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 1987, 20: 53-65.
