Spam Classification Based on Machine Learning Algorithm

Authors

  • Yichun Huang

DOI:

https://doi.org/10.54097/hset.v34i.5371

Keywords:

Machine Learning, Convolutional Neural Network, Recurrent Neural Network, Long Short-Term Memory, Naïve Bayes.

Abstract

As technology continues to advance, email is used in almost every field. However, the dramatic increase in the number of spam emails has created a growing need for accurate and powerful spam classifiers. Unfortunately, spam in its many forms keeps changing as the Internet evolves, which makes fighting it an enormous challenge. With the emergence of deep learning, deep-learning-based algorithms have been widely applied to many real-world tasks. In this paper, we implement spam filtering with CNN, RNN, LSTM and naïve Bayes models and compare them in terms of accuracy and their respective strengths and weaknesses. We test all the models on a public dataset and evaluate them with four metrics: accuracy, precision, recall and F1-score. We conclude that the naïve Bayes model achieves the best performance among these methods and can handle the threat of spam efficiently.
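The abstract does not say how the naïve Bayes classifier or the four metrics are computed. The following is a minimal Python sketch, not the author's implementation, assuming a multinomial naïve Bayes over bag-of-words counts with add-one (Laplace) smoothing, and treating "spam" as the positive class for precision, recall and F1-score. The tokenizer, the names `MultinomialNaiveBayes` and `evaluate`, and the four-message training set are hypothetical and do not come from the paper or its dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the paper's preprocessing is not described.
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over word counts with add-alpha smoothing (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha == 1.0 gives Laplace (add-one) smoothing

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        self.log_prior, self.total_words = {}, {}
        for c in self.classes:
            self.vocab |= set(self.word_counts[c])
            self.log_prior[c] = math.log(labels.count(c) / len(labels))
            self.total_words[c] = sum(self.word_counts[c].values())
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Sum log-probabilities instead of multiplying probabilities.
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[c][tok]
                score += math.log((count + self.alpha) / (self.total_words[c] + self.alpha * v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def evaluate(y_true, y_pred, positive="spam"):
    # Accuracy, precision, recall and F1-score with "spam" as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data for illustration only; not the paper's dataset.
train_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "meeting moved to monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNaiveBayes().fit(train_texts, train_labels)

test_texts = ["free prize waiting", "report for monday meeting"]
test_labels = ["spam", "ham"]
preds = model.predict(test_texts)
print(preds)
print(evaluate(test_labels, preds))
```

Working in log space avoids numerical underflow when many per-word probabilities are multiplied, and smoothing keeps a word seen in only one class from driving the other class's probability to zero.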

Published

28-02-2023

How to Cite

Huang, Y. (2023). Spam Classification Based on Machine Learning Algorithm. Highlights in Science, Engineering and Technology, 34, 32-38. https://doi.org/10.54097/hset.v34i.5371