Enhancing Credit Card Fraud Detection on Imbalanced Datasets

Authors

  • Chao Liu

DOI:

https://doi.org/10.54097/hbem.v21i.14759

Keywords:

Imbalanced Data; Credit Card Fraud; Tree-Based Method; XGBoost; Undersampling.

Abstract

Credit cards are becoming increasingly popular, especially with the prevalence of electronic payments, which are available through bank apps or payment processors such as PayPal and Alipay. People enjoy the convenience of cashless payment both online and offline. Alongside this efficiency and convenience, however, credit card fraud has emerged, causing substantial economic losses for cardholders and considerable trouble for banks. The primary goal of this work is to identify fraudulent transactions in an imbalanced dataset. The dataset comprises credit card transactions made in Europe over just two days in 2013. The study compares three settings: the original data, the original data with stratified K-fold cross-validation, and the undersampled data. Undersampling reduces accuracy by a small amount but greatly improves the detection of fraudulent transactions. The study also compares several models: Logistic Regression and a set of tree-based methods. It analyzes their confusion matrices, ROC curves, and Precision-Recall curves. On the undersampled dataset, XGBoost achieves a recall, precision, F1 score, and accuracy of 93.2%, 97%, 95%, and 95% respectively, and an AUC of 99% for both the ROC curve and the Precision-Recall curve, so the study concludes that XGBoost is the best performer. With effective algorithms, information leakage and financial losses can be better avoided in real life.
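The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the author's code: the toy labels, the function names `undersample` and `classification_metrics`, and the 5-in-1000 fraud ratio are assumptions chosen for illustration, and no model is trained. The sketch shows random undersampling of the majority class and why accuracy alone is misleading on imbalanced data.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper for illustration; the paper does not specify
    its undersampling procedure at this level of detail.
    """
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    if len(fraud) <= len(legit):
        minority, majority = fraud, legit
    else:
        minority, majority = legit, fraud
    # Keep every minority row plus an equally sized random majority subset.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1, and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy data (assumed): 1000 transactions, 5 of them fraudulent.
labels = [1] * 5 + [0] * 995
rows = [[float(i)] for i in range(len(labels))]

# After undersampling, both classes are the same size.
bal_rows, bal_labels = undersample(rows, labels)

# A "classifier" that always predicts legitimate looks accurate on the
# original data, yet it detects no fraud at all (recall of zero).
always_legit = [0] * len(labels)
baseline = classification_metrics(labels, always_legit)
```

On the toy data, the always-legitimate baseline is highly accurate while catching no fraud, which is why the study reports recall, precision, and F1 alongside accuracy. In practice, libraries such as imbalanced-learn and scikit-learn provide tested implementations of undersampling and stratified splitting.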

Published

12-12-2023

How to Cite

Liu, C. (2023). Enhancing Credit Card Fraud Detection on Imbalanced Datasets. Highlights in Business, Economics and Management, 21, 765-773. https://doi.org/10.54097/hbem.v21i.14759