Machine Learning Methods for Credit Card Fraud Detection
DOI:
https://doi.org/10.54097/hset.v23i.3204
Keywords:
Machine learning, K-nearest neighbor, random forest, support vector machine.
Abstract
Machine learning is an innovative and efficient tool for preventing credit card fraud. However, given the variety of machine learning models, it is difficult to determine which model is most suitable for predicting fraudulent transactions. In this research, a comprehensive evaluation method is adopted to compare the performance of different machine learning models. More precisely, this research uses the Area Under the ROC Curve (AUC) metric to evaluate and compare four machine learning models on the same transaction dataset: K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine. A dataset containing over one million credit card transactions is preprocessed and divided into training data and testing data. The four models are then fitted to the same training data and tested against the same testing data. After hyperparameter tuning, the AUC score of each model is obtained and compared. The comparison indicates that Random Forest makes the most accurate and consistent predictions of fraudulent transactions in this dataset, and it can therefore be recommended as the primary machine learning algorithm for preventing fraudulent credit card transactions.
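To make the comparison criterion concrete, the following minimal sketch computes AUC as the fraction of (fraudulent, legitimate) transaction pairs in which the fraudulent transaction receives the higher score, with ties counted as one half. It is an illustration only, not the implementation used in this research: the function name `auc_score`, the model names `model_a` and `model_b`, and all labels and scores are made-up assumptions.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 for fraud, 0 for legitimate.
    scores: model outputs, where a higher value means more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Count (fraud, legitimate) pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy test labels and hypothetical model scores (not data from the paper).
y_test = [0, 0, 0, 0, 1, 0, 1, 0]
model_scores = {
    "model_a": [0.1, 0.2, 0.1, 0.3, 0.9, 0.2, 0.8, 0.4],
    "model_b": [0.1, 0.6, 0.1, 0.3, 0.5, 0.2, 0.8, 0.7],
}

# Score every model against the same test labels, then pick the highest AUC.
auc_by_model = {name: auc_score(y_test, s) for name, s in model_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
print(auc_by_model, best_model)
```

Because AUC depends only on how the scores rank the two classes, it remains informative when fraudulent transactions are a small minority of the data. The pairwise loop is quadratic in the number of transactions, so a dataset of the size described here would call for a sorting-based or library implementation instead.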
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.