SVM Model Against Telecom Card Fraud Using GA Optimised Ten-Fold Cross-Testing
DOI: https://doi.org/10.54097/hset.v70i.12169
Keywords: Machine learning, correlation analysis, data normalisation, genetic algorithms.
Abstract
The growing number of payment methods also makes it easier for criminals to steal personal information, take over financial payment accounts and steal money. With trillions of bank card transactions occurring every day, Credit Card Fraud Detection (CCFD) is a serious challenge, so this paper predicts whether or not fraud occurs using six types of machine learning models. For Problem 1, the mean, maximum, minimum, median, variance, standard deviation and quartiles are first calculated for each indicator. Data cleaning is then carried out, and the data set is found to contain no missing values or outliers. During data preprocessing, min-max normalisation and z-score standardisation are applied to the data. Correlation analysis follows: based on the characteristics of the indicators themselves, the first four indicators are classified as negative indicators and the last three as positive indicators, and Pearson correlation coefficients are calculated on the data produced by each of the two preprocessing methods. The coefficient of variation method is then used to calculate the weights of the seven indicators that influence whether fraud occurs. Finally, a BP neural network model, a decision tree model, a random forest classification model, an ELM model, an SVM model and a logistic regression model are established. For Problem 2, four of the models constructed in Problem 1 are solved. For the BP neural network model, the data set is divided into a training set and a testing set in the ratio 6:4, and the sigmoid function is used as the activation function; an output greater than 0.5 is recorded as 1 (fraudulent behaviour) and an output less than 0.5 as 0 (non-fraudulent behaviour). Adjusting the learning rate and the number of iterations reduces the average mean squared error reached by gradient descent.
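The preprocessing steps of Problem 1 can be sketched as follows. This is a minimal NumPy illustration, not the authors' code. The function names and the synthetic seven-indicator data are hypothetical. Reversing negative indicators as (max - x) / (max - min) during min-max normalisation is an assumed convention. The coefficient of variation weights are computed on the min-max normalised data because z-scored columns have a mean of zero.

```python
import numpy as np

def describe(X):
    """Per-indicator summary statistics (columns = indicators)."""
    q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
    return {
        "mean": X.mean(axis=0), "max": X.max(axis=0), "min": X.min(axis=0),
        "median": median, "var": X.var(axis=0, ddof=1),
        "std": X.std(axis=0, ddof=1), "q1": q1, "q3": q3,
    }

def min_max(X, negative=()):
    """Scale each column to [0, 1]. Columns in `negative` are reversed,
    (max - x) / (max - min): an assumed convention for negative indicators."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
    Z = (X - lo) / span
    neg = list(negative)
    Z[:, neg] = (hi[neg] - X[:, neg]) / span[neg]
    return Z

def z_score(X):
    """Standardise each column to mean 0 and sample standard deviation 1."""
    std = X.std(axis=0, ddof=1)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)

def pearson_with_label(X, y):
    """Pearson r between each indicator column and the 0/1 fraud label."""
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

def cv_weights(Z):
    """Coefficient of variation method: CV_j = std_j / mean_j, w_j = CV_j / sum(CV).
    Requires columns with positive means, e.g. min-max normalised data."""
    cv = Z.std(axis=0, ddof=1) / Z.mean(axis=0)
    return cv / cv.sum()

# Synthetic stand-in for the seven indicators and the fraud label (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7)) * [1, 2, 3, 1, 2, 3, 1] + 10
y = (X[:, 4] + rng.normal(size=500) > 10).astype(int)

stats = describe(X)
Z = min_max(X, negative=[0, 1, 2, 3])  # first four indicators treated as negative
S = z_score(X)
r_minmax = pearson_with_label(Z, y)
r_z = pearson_with_label(S, y)
w = cv_weights(Z)
print(np.round(w, 3))
```

Pearson's r is unchanged in magnitude by any affine rescaling. The two preprocessed versions therefore give the same absolute correlations with the label. Only the reversed negative indicators change sign.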
To solve the SVM model, the data set is divided into ten groups using an improved ten-fold cross-test in which one group serves as the training set and the other nine as the validation set. This identifies the model with the highest accuracy and its corresponding training data, and a genetic algorithm is then used on this basis to optimise the kernel parameters of the SVM model. To solve the decision tree model, the data set is split into a training set and a prediction set in the ratio 7:3, and the number of leaf nodes is optimised. The random forest classification model is likewise solved with a 7:3 split into training and prediction sets; among settings with similar accuracy, the random forest classifier with fewer decision trees is chosen.
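The SVM procedure can be sketched with scikit-learn as follows, assuming an RBF kernel whose tuned parameters are C and gamma. The inverted fold scheme follows the description: one fold trains and the other nine validate. Everything else is a hypothetical choice rather than the paper's configuration. This includes the helper names `best_single_fold` and `ga_tune_svm`, the search bounds, tournament selection, arithmetic crossover, Gaussian mutation and the synthetic data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def best_single_fold(X, y, n_splits=10, seed=0):
    """Inverted ten-fold scheme: each fold in turn trains an RBF SVM (default
    parameters) and the other nine folds validate it. Returns the
    (train_idx, val_idx) pair whose model scores highest."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    best_acc, best_pair = -1.0, None
    for rest_idx, fold_idx in skf.split(X, y):
        model = SVC(kernel="rbf").fit(X[fold_idx], y[fold_idx])
        acc = model.score(X[rest_idx], y[rest_idx])
        if acc > best_acc:
            best_acc, best_pair = acc, (fold_idx, rest_idx)
    return best_pair

def ga_tune_svm(X, y, train_idx, val_idx, pop_size=12, generations=10,
                bounds=((-2.0, 3.0), (-4.0, 1.0)), seed=0):
    """Real-coded GA over genes (log10 C, log10 gamma).
    Fitness = accuracy on the validation indices."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])

    def fitness(genes):
        C, gamma = 10.0 ** genes
        model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X[train_idx], y[train_idx])
        return model.score(X[val_idx], y[val_idx])

    def tournament(pop, scores):
        i, j = rng.integers(len(pop), size=2)  # size-2 tournament
        return pop[i] if scores[i] >= scores[j] else pop[j]

    pop = rng.uniform(lo, hi, size=(pop_size, 2))
    scores = np.array([fitness(g) for g in pop])
    for _ in range(generations):
        children = [pop[np.argmax(scores)].copy()]  # elitism: carry the best over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            alpha = rng.uniform(size=2)
            child = alpha * p1 + (1 - alpha) * p2             # arithmetic crossover
            child = child + rng.normal(0.0, 0.1 * (hi - lo))  # Gaussian mutation
            children.append(np.clip(child, lo, hi))           # keep genes inside the bounds
        pop = np.array(children)
        scores = np.array([fitness(g) for g in pop])
    best = int(np.argmax(scores))
    C, gamma = 10.0 ** pop[best]
    return float(C), float(gamma), float(scores[best])

# Hypothetical data: seven indicators, two overlapping classes.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 7)), rng.normal(1.5, 1.0, (100, 7))])
y = np.array([0] * 100 + [1] * 100)

train_idx, val_idx = best_single_fold(X, y)
C, gamma, acc = ga_tune_svm(X, y, train_idx, val_idx)
print(f"C={C:.3g}, gamma={gamma:.3g}, validation accuracy={acc:.3f}")
```

Searching over log10(C) and log10(gamma) rather than the raw values lets the GA explore several orders of magnitude evenly. Elitism keeps the best validation accuracy found so far from decreasing between generations.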
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.