Identifying Relevant Features of CSE-CIC-IDS2018 Dataset for the Development of an Intrusion Detection System

Jul-21-2023–arXiv.org Artificial Intelligence

Intrusion detection systems (IDSs) are essential elements of IT systems. Their key component is a classification module that continuously evaluates some features of the network traffic and identifies possible threats. Its efficiency is greatly affected by the right selection of the features to be monitored. Therefore, the identification of a minimal set of features that are necessary to safely distinguish malicious traffic from benign traffic is indispensable in the course of the development of an IDS. This paper presents the preprocessing and feature selection workflow as well as its results in the case of the CSE-CIC-IDS2018 on AWS dataset, focusing on five attack types. To identify the relevant features, six feature selection methods were applied, and the final ranking of the features was elaborated based on their average score. Next, several subsets of the features were formed based on different ranking threshold values, and each subset was tried with five classification algorithms to determine the optimal feature set for each attack type. During the evaluation, four widely used metrics were taken into consideration.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Artificial Intelligence

Jul-21-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.04)
  - Texas > Travis County
    - Austin (0.04)
- Europe > Hungary
  - Bács-Kiskun County > Kecskemét (0.04)
- Asia > Japan
  - Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:
- Research Report (1.00)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Communications > Networks (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Uncertainty
      - Bayesian Inference (0.93)
      - Fuzzy Logic (0.67)
    - Machine Learning
      - Statistical Learning (1.00)
      - Performance Analysis > Accuracy (1.00)
      - Neural Networks (0.93)
      - Learning Graphical Models > Directed Networks
        Bayesian Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found