Intrusion Detection System with Supervised Learning Models
DOI: https://doi.org/10.54097/hset.v23i.3215
Keywords: Intrusion Detection Systems (IDSs); network anomaly detection; machine learning algorithms; accuracy score; confusion matrix.
Abstract
Intrusion Detection Systems (IDSs) analyze network activity and detect abnormal behavior, identifying potential attacks by learning from past ones. This paper applies four supervised machine learning methods (logistic regression, decision tree, support vector machine, and random forest) to detect such attacks. The dataset comes from KDDCUP’99, a publicly available dataset for network-based anomaly detection systems. Selected features and the outcome are first extracted from the dataset. Values of nominal features are converted into dummy variables, and outcome values are relabeled as either normal or attack. The four models are then trained and tested to obtain accuracy scores. The logistic regression model has the highest accuracy score, 0.9415; the decision tree, support vector machine, and random forest score 0.9317, 0.9374, and 0.9202, respectively, so all four models exceed 0.90. The models prove effective at identifying network anomalies in the provided data.
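The workflow described in the abstract (dummy-encode nominal features, relabel the outcome as normal or attack, train four classifiers, report accuracy and a confusion matrix) might be sketched as follows. This is an illustrative sketch only, not the authors' code: the data is a small synthetic stand-in rather than KDDCUP’99, and the column names, attack labels, 70/30 split, feature scaling, and hyperparameters are all assumptions, so the scores it produces say nothing about the paper's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_toy_connections(n=400, seed=0):
    """Synthetic stand-in for connection records (assumption: NOT KDDCUP'99 data)."""
    rng = np.random.default_rng(seed)
    outcome = rng.choice(["normal", "neptune", "smurf"], size=n, p=[0.5, 0.3, 0.2])
    is_attack = outcome != "normal"
    protocol = np.where(
        is_attack,
        rng.choice(["tcp", "icmp"], size=n),
        rng.choice(["tcp", "udp"], size=n),
    )
    return pd.DataFrame({
        "duration": rng.exponential(2.0, size=n) + 3.0 * is_attack,
        "src_bytes": rng.exponential(200.0, size=n) + 500.0 * is_attack,
        "protocol_type": protocol,  # nominal feature
        "outcome": outcome,         # multi-class label before relabeling
    })


def prepare(df):
    """Convert nominal features to dummy variables and binarize the outcome."""
    X = pd.get_dummies(df.drop(columns="outcome"), columns=["protocol_type"], dtype=float)
    y = np.where(df["outcome"] == "normal", "normal", "attack")
    return X, y


def evaluate(X, y, seed=0):
    """Train the four model families and return (accuracy, confusion matrix) per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    models = {
        # Scaling is an assumption; it helps logistic regression and SVM converge.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "support_vector_machine": make_pipeline(StandardScaler(), SVC()),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        acc = accuracy_score(y_test, pred)
        # Rows are true labels, columns are predictions, in the order given by `labels`.
        cm = confusion_matrix(y_test, pred, labels=["normal", "attack"])
        results[name] = (acc, cm)
    return results


if __name__ == "__main__":
    X, y = prepare(make_toy_connections())
    for name, (acc, cm) in evaluate(X, y).items():
        print(f"{name}: accuracy={acc:.4f}")
        print(cm)
```

Accuracy alone can hide an imbalance between missed attacks and false alarms, which is why the sketch also returns the confusion matrix for each model.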
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.