A New Ensemble Model Based on Machine Learning Algorithms for the Spam-Filtering

Authors

  • Zixuan Lin

DOI:

https://doi.org/10.54097/hset.v57i.9896

Keywords:

Machine Learning, Scikit-Learn, Spam-filter.

Abstract

Because the volume of email is growing rapidly, spam-filtering technology needs to improve. This study used Python and the Scikit-Learn library to process data obtained from a website. Five models (Logistic Regression, Support Vector Machine, Naïve Bayes, Random Forest, and Decision Tree) were trained on the data, and evaluation measures were calculated on the remaining test data. From these results, possible logical formulas combining the five models were inferred and their evaluation measures were calculated. The evaluation measures of the single models and of the logical formulas were then compared to find a logical formula that performs better than a single model. The experimental results show that the models differ in how effectively they filter spam and that a logical formula improves some of the evaluation scores, which suggests that a logical formula combining multiple models can improve spam filtering.
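As a rough illustration of the workflow the abstract describes, the following Python sketch trains the five named Scikit-Learn models on bag-of-words features and then combines their predictions with a logical formula. It is only a sketch under stated assumptions: the toy corpus, the 70/30 split, the hyperparameters, and in particular the formula (NB AND LR) OR (SVM AND RF) are hypothetical placeholders, since the abstract does not state the dataset details or the formula the study arrived at.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus standing in for the real data (1 = spam, 0 = ham).
spam = [
    "win a free prize now", "free cash claim your reward", "urgent winner call now",
    "cheap loans apply today free", "claim free tickets now", "you won cash prize",
    "free entry win money", "exclusive offer claim prize", "call now to win cash",
    "free reward urgent claim",
]
ham = [
    "are we meeting for lunch today", "please send me the report", "see you at home tonight",
    "can you call me later", "the meeting is moved to monday", "thanks for your help yesterday",
    "i will be late for dinner", "let us review the draft tomorrow", "happy birthday to you",
    "did you finish the homework",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

# Hold out part of the data for testing (the split ratio is an assumption).
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

# Bag-of-words features: fit the vocabulary on the training texts only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

models = {
    "lr": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="linear"),
    "nb": MultinomialNB(),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "dt": DecisionTreeClassifier(random_state=0),
}


def evaluate(y_true, y_pred):
    """Return the four evaluation measures for binary spam labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


# Train each single model and score it on the held-out test data.
preds, results = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds[name] = model.predict(X_test)  # integer array of 0/1 labels
    results[name] = evaluate(y_test, preds[name])

# Hypothetical logical formula (not the one from the paper):
# flag a message as spam when (NB AND LR) OR (SVM AND RF) agree.
combined = (preds["nb"] & preds["lr"]) | (preds["svm"] & preds["rf"])
results["formula"] = evaluate(y_test, combined)

for name, measures in results.items():
    print(name, measures)
```

With a real dataset, comparing the "formula" row against the single-model rows shows which measures, if any, a given formula improves. An AND between two models tends to favour precision, while an OR tends to favour recall, so the choice of formula is a trade-off rather than a guaranteed gain.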

Published

11-07-2023

How to Cite

Lin, Z. (2023). A New Ensemble Model Based on Machine Learning Algorithms for the Spam-Filtering. Highlights in Science, Engineering and Technology, 57, 52-56. https://doi.org/10.54097/hset.v57i.9896