An Integrated Learning-Based Prediction Model for Purchasing Propensity of Jingdong Visitors

Authors

  • Yizheng Liu
  • Hengkui Zhang
  • Hangjun Ren

DOI:

https://doi.org/10.54097/hset.v70i.12146

Keywords:

Random Forest, GBDT, XGBoost, LightGBM, CatBoost.

Abstract

Predicting the purchasing behaviour of consumers is a classical binary classification problem in machine learning. It requires building a model on a new dataset obtained by downsampling and selecting parameters for the visual display of the model. In this paper, we first cleaned and pre-processed the data, exploring properties of the original data such as complexity and non-linearity and applying logarithmic transformation and normalisation. We then selected features, built multiple machine learning models, tuned their parameters using cross-validation and grid search, calculated evaluation metrics, and compared the models. Finally, the selected models were integrated using the Stacking method. Probability calibration was then performed, and the models were interpreted using SHAP values and PDP plots. Rich visualisation was used to present the data and the models throughout data processing and modelling.
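The workflow outlined in the abstract (downsampling, logarithmic transformation, normalisation, grid search with cross-validation, Stacking, and probability calibration) can be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the data are synthetic, the helper `downsample_majority` is hypothetical, the parameter grid is arbitrary, and scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier` stand in for the full set of models named in the keywords (XGBoost, LightGBM and CatBoost are omitted to keep the example dependency-free). SHAP and PDP interpretation are not covered.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler


def downsample_majority(X, y, seed=0):
    """Hypothetical helper: randomly keep equally many rows of each class."""
    rng = np.random.default_rng(seed)
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    n = min(len(idx_pos), len(idx_neg))
    keep = np.concatenate([rng.choice(idx_pos, n, replace=False),
                           rng.choice(idx_neg, n, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


# Synthetic, imbalanced stand-in for the visitor data (roughly 10% buyers).
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X = X - X.min(axis=0)  # shift features to be non-negative so log1p is defined

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
X_bal, y_bal = downsample_majority(X_train, y_train)

# Grid search with 3-fold cross-validation for one base learner
# (the grid values are arbitrary assumptions).
gbdt_search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="roc_auc", cv=3)
gbdt_search.fit(X_bal, y_bal)

# Stacking: base learners feed a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gbdt", GradientBoostingClassifier(random_state=0,
                                            **gbdt_search.best_params_)),
    ],
    final_estimator=LogisticRegression(max_iter=1000), cv=3)

# Log transform and min-max normalisation, then the stacked model,
# wrapped in sigmoid (Platt) probability calibration.
model = make_pipeline(FunctionTransformer(np.log1p), MinMaxScaler(), stack)
calibrated = CalibratedClassifierCV(model, method="sigmoid", cv=3)
calibrated.fit(X_bal, y_bal)

proba = calibrated.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC AUC: {auc:.3f}")
```

Note that probabilities calibrated on downsampled data reflect the balanced class ratio rather than the original one, so in practice calibration might instead be fitted on a held-out sample with the original class distribution.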

Published

15-11-2023

How to Cite

Liu, Y., Zhang, H., & Ren, H. (2023). An Integrated Learning-Based Prediction Model for Purchasing Propensity of Jingdong Visitors. Highlights in Science, Engineering and Technology, 70, 60-66. https://doi.org/10.54097/hset.v70i.12146