Machine Learning Methods for Credit Card Fraud Detection

Authors

  • Yihong He

DOI:

https://doi.org/10.54097/hset.v23i.3204

Keywords:

Machine learning, K-nearest neighbor, random forest, support vector machine.

Abstract

Machine learning is an innovative and efficient tool for preventing credit card fraud. However, given the variety of available machine learning models, determining which model is best suited to predicting fraudulent transactions is a difficult question. This research adopts a comprehensive evaluation method to compare the performance of different machine learning models. More precisely, it uses the Area Under the ROC Curve (AUC) metric to evaluate and compare four machine learning models on the same transaction dataset: K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine. A dataset containing over one million credit card transactions is processed and divided into training data and testing data. After preprocessing, the same training data are used to fit all four models, each of which is then evaluated on the same testing data. After a series of hyperparameter tuning steps, the AUC score of each model is obtained and compared. The comparison indicates that Random Forest makes the most accurate and consistent predictions of fraudulent transactions on this dataset, and it can therefore be recommended as the primary machine learning algorithm for preventing credit card fraud.
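The abstract ranks the four models by their AUC scores. As a hedged illustration of what that metric measures, the sketch below computes AUC from scratch as the normalized Mann-Whitney U statistic: the probability that a randomly chosen fraudulent transaction (label 1) receives a higher score than a randomly chosen legitimate one (label 0), with ties counted as one half. This is not the author's code; the function names `roc_auc` and `compare_models` and the toy scores are hypothetical, and in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC (normalized Mann-Whitney U). Labels must be 0/1."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Sort indices by score and assign 1-based ranks; tied scores share
    # the average of the ranks they occupy.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos*(n_pos+1)/2 counts how many
    # (positive, negative) pairs are ordered correctly (ties count 1/2).
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def compare_models(
    labels: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every model on the same test labels; best AUC first."""
    results = [(name, roc_auc(labels, s)) for name, s in model_scores.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy scores for illustration only (not results from the paper).
    test_labels = [0, 0, 0, 1, 0, 1]
    toy_scores = {
        "model_a": [0.1, 0.3, 0.2, 0.9, 0.4, 0.7],
        "model_b": [0.2, 0.6, 0.1, 0.5, 0.3, 0.8],
    }
    for name, auc in compare_models(test_labels, toy_scores):
        print(f"{name}: AUC = {auc:.3f}")
```

The rank-based form runs in O(n log n) time, which matters on a dataset of over one million transactions, where comparing every fraudulent/legitimate pair directly would be far slower. Because every model is scored on the same test labels, the resulting AUC values are directly comparable, which is the property the study relies on.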

Published

03-12-2022

How to Cite

He, Y. (2022). Machine Learning Methods for Credit Card Fraud Detection. Highlights in Science, Engineering and Technology, 23, 106-110. https://doi.org/10.54097/hset.v23i.3204