Research on Customer Churn Detection Based on K-means Clustering and Logistic Regression
DOI:
https://doi.org/10.54097/nj6k9065
Keywords:
Customer churn detection; K-means clustering; Logistic regression; Data preprocessing; Customer Relationship Management
Abstract
In the digital age, customer churn has become a key factor restricting the sustainable development of enterprises. Traditional analytical methods that rely on experience-based judgment and simple statistics lack systematicity and accuracy, making it difficult to adapt to complex market environments and customer behavior patterns. This study aims to construct a high-precision and practical customer churn detection model to overcome the limitations of traditional methods. First, customer data is optimized through preprocessing steps such as data cleaning, standardization, and feature engineering. Then, based on the RFM model, the K-means clustering algorithm is used to accurately segment customers, identifying different characteristic groups such as core customers, maintenance customers, and risk customers. Finally, a churn prediction model is constructed by combining the logistic regression algorithm and validated using a real telecommunications customer dataset. The model performance is evaluated using metrics such as accuracy, recall, and F1 score. Experimental results show that the K-means + logistic regression combined model achieves a prediction accuracy of 87.5%, a recall of 82.3%, and an F1 score of 84.8%, significantly improving performance compared to traditional models such as logistic regression and random forest alone. Furthermore, age and cash deposit balance are key characteristics influencing customer churn. This research provides scientific decision support for enterprise customer relationship management, helps formulate differentiated customer retention strategies, and has significant practical application value.
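The pipeline described above (standardization, K-means segmentation on RFM features, then logistic regression scored by accuracy, recall, and F1) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the data are synthetic, the choice of three clusters and the one-hot encoding of the segment as extra model inputs are assumptions (the abstract does not say how the two models are combined), and all function names are hypothetical. In practice a library such as scikit-learn would normally replace the hand-written K-means and logistic regression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(X, centroids), centroids

def with_segments(X, centroids):
    """Append a one-hot encoding of each row's cluster (assumed combination)."""
    return np.hstack([X, np.eye(len(centroids))[assign(X, centroids)]])

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Logistic regression fitted by batch gradient descent; returns weights."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

def scores(y_true, y_pred):
    """Return (accuracy, recall, F1) for binary labels, 1 = churned."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = np.mean(y_pred == y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, f1

# Synthetic RFM data (an assumption; the study uses a real telecom dataset).
rng = np.random.default_rng(42)
n = 600
raw = np.column_stack([
    rng.uniform(1, 365, n),    # recency: days since last activity
    rng.poisson(5, n),         # frequency: number of transactions
    rng.gamma(2.0, 100.0, n),  # monetary: total spend
])
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
# Synthetic labels: churn is likelier with high recency and low frequency.
churn = (rng.random(n) < sigmoid(1.5 * z[:, 0] - z[:, 1])).astype(int)

# 80/20 split; standardize with statistics from the training part only.
raw_train, raw_test = raw[:480], raw[480:]
y_train, y_test = churn[:480], churn[480:]
mu, sd = raw_train.mean(axis=0), raw_train.std(axis=0)
X_train, X_test = (raw_train - mu) / sd, (raw_test - mu) / sd

# Segment customers on the training data, then reuse the centroids for test rows.
_, centroids = kmeans(X_train, k=3)
w = fit_logistic(with_segments(X_train, centroids), y_train)
accuracy, recall, f1 = scores(y_test, predict(w, with_segments(X_test, centroids)))
print(f"accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the data and labels here are synthetic, the printed scores say nothing about the figures reported in the abstract; the sketch only shows how the stages fit together. Fitting the centroids on the training split alone and assigning test rows to the nearest centroid keeps the evaluation free of information from the held-out customers.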
License
Copyright (c) 2025 Jincheng Wu, Qi Li, Xijin Zhou

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







