Stream clustering guided supervised learning for classifying NIDS alerts

Risto Vaarandi; Alejandro Guerra-Manzanares

doi:10.1016/j.future.2024.01.032

Research output: Journal Publication › Article › peer-review

8 Citations (Scopus)

Abstract

A Network Intrusion Detection System (NIDS) is a network monitoring technology for identifying cyber attacks, botnet command and control traffic, and other unwanted network activity. Unfortunately, organizational NIDS solutions often generate tens or hundreds of thousands of alerts per day, a significant share of which are of low importance or are false positives. As a result, high-priority alerts become hard to spot, which overloads security analysts and complicates their work. This paper addresses the problem by introducing a machine learning framework that classifies NIDS alerts with the help of stream clustering and supervised learning. We propose a stream-clustering-guided method for creating labeled NIDS alert data sets. The small data sets created with this method can be used to train high-performance supervised NIDS alert classifiers, which significantly reduces the human labeling effort and eases the application of supervised machine learning to NIDS alert classification. The proposed framework was evaluated on NIDS alerts collected over 2 months from the network of a large academic organization. The experimental results indicate that combining stream clustering and supervised learning into a NIDS alert classification framework significantly decreases the number of false positives and thus reduces the workload of human security analysts. The framework also has low CPU time and memory consumption and can therefore run on commodity hardware. In conclusion, the proposed framework provides a cost-effective means of integrating machine learning into Security Operations Centers (SOCs), enabling the identification of critical NIDS alerts with high-performance classifiers and thereby helping SOC personnel automate alert handling tasks.
To address the lack of public data sets in the problem domain and foster further research, we publicly share the large labeled NIDS alert data set used in our experimental setup.

Original language	English
Pages (from-to)	231-244
Number of pages	14
Journal	Future Generation Computer Systems
Volume	155
DOIs	https://doi.org/10.1016/j.future.2024.01.032
Publication status	Published - Jun 2024
Externally published	Yes

Keywords

Data labeling
Data set generation
High-priority NIDS alert
IDS
Intrusion detection
Network Intrusion Detection System
Network security
NIDS
NIDS alert
NIDS alert classification
Security Operations Center
Small training data set
SOC
Stream clustering
Supervised learning
Workload reduction

ASJC Scopus subject areas

Software
Hardware and Architecture
Computer Networks and Communications

Cite this

@article{3473c19837754d61a0fc72ddff36f74a,

title = "Stream clustering guided supervised learning for classifying NIDS alerts",

abstract = "A Network Intrusion Detection System (NIDS) is a network monitoring technology for identifying cyber attacks, botnet command and control traffic, and other unwanted network activity. Unfortunately, organizational NIDS solutions can often generate tens or hundreds of thousands of alerts on a daily basis, with a significant part of them having low importance or being false positives. Therefore, high priority alerts become hard to spot, which overloads security analysts and complicates their work. The current paper addresses this problem and introduces a machine learning framework for classifying NIDS alerts with the help of stream clustering and supervised learning. We propose a stream-clustering-guided method for creating labeled NIDS alert data sets. The small data sets created using this method can be used for training high-performance supervised NIDS alert classifiers. This significantly reduces the human labeling effort and eases the application of supervised machine learning for NIDS alert classification. The proposed machine learning framework was evaluated on NIDS alerts collected over 2 months from the network of a large academic organization. The experimental results indicate that combining stream clustering and supervised learning into a NIDS alert classification framework significantly decreases the number of false positives, and thus reduces the workload of human security analysts. The framework also features low CPU time and memory consumption and can thus be run on commodity hardware. In conclusion, the proposed framework provides a cost-effective means of integrating machine learning into Security Operation Centers (SOCs). This enables the identification of critical NIDS alerts using high-performance classifiers, thereby assisting in the automation of alert handling tasks for SOC personnel. 
To address the lack of public data sets in the problem domain and foster further research, we publicly share the large labeled NIDS alert data set used in our experimental setup.",

keywords = "Data labeling, Data set generation, High-priority NIDS alert, IDS, Intrusion detection, Network Intrusion Detection System, Network security, NIDS, NIDS alert, NIDS alert classification, Security Operations Center, Small training data set, SOC, Stream clustering, Supervised learning, Workload reduction",

author = "Risto Vaarandi and Alejandro Guerra-Manzanares",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2024",

month = jun,

doi = "10.1016/j.future.2024.01.032",

language = "English",

volume = "155",

pages = "231--244",

journal = "Future Generation Computer Systems",

issn = "0167-739X",

publisher = "Elsevier B.V.",

}

TY  - JOUR

T1  - Stream clustering guided supervised learning for classifying NIDS alerts

AU  - Vaarandi, Risto

AU  - Guerra-Manzanares, Alejandro

PY  - 2024/6

Y1  - 2024/6

N2  - A Network Intrusion Detection System (NIDS) is a network monitoring technology for identifying cyber attacks, botnet command and control traffic, and other unwanted network activity. Unfortunately, organizational NIDS solutions can often generate tens or hundreds of thousands of alerts on a daily basis, with a significant part of them having low importance or being false positives. Therefore, high priority alerts become hard to spot, which overloads security analysts and complicates their work. The current paper addresses this problem and introduces a machine learning framework for classifying NIDS alerts with the help of stream clustering and supervised learning. We propose a stream-clustering-guided method for creating labeled NIDS alert data sets. The small data sets created using this method can be used for training high-performance supervised NIDS alert classifiers. This significantly reduces the human labeling effort and eases the application of supervised machine learning for NIDS alert classification. The proposed machine learning framework was evaluated on NIDS alerts collected over 2 months from the network of a large academic organization. The experimental results indicate that combining stream clustering and supervised learning into a NIDS alert classification framework significantly decreases the number of false positives, and thus reduces the workload of human security analysts. The framework also features low CPU time and memory consumption and can thus be run on commodity hardware. In conclusion, the proposed framework provides a cost-effective means of integrating machine learning into Security Operation Centers (SOCs). This enables the identification of critical NIDS alerts using high-performance classifiers, thereby assisting in the automation of alert handling tasks for SOC personnel. 
To address the lack of public data sets in the problem domain and foster further research, we publicly share the large labeled NIDS alert data set used in our experimental setup.

AB  - A Network Intrusion Detection System (NIDS) is a network monitoring technology for identifying cyber attacks, botnet command and control traffic, and other unwanted network activity. Unfortunately, organizational NIDS solutions can often generate tens or hundreds of thousands of alerts on a daily basis, with a significant part of them having low importance or being false positives. Therefore, high priority alerts become hard to spot, which overloads security analysts and complicates their work. The current paper addresses this problem and introduces a machine learning framework for classifying NIDS alerts with the help of stream clustering and supervised learning. We propose a stream-clustering-guided method for creating labeled NIDS alert data sets. The small data sets created using this method can be used for training high-performance supervised NIDS alert classifiers. This significantly reduces the human labeling effort and eases the application of supervised machine learning for NIDS alert classification. The proposed machine learning framework was evaluated on NIDS alerts collected over 2 months from the network of a large academic organization. The experimental results indicate that combining stream clustering and supervised learning into a NIDS alert classification framework significantly decreases the number of false positives, and thus reduces the workload of human security analysts. The framework also features low CPU time and memory consumption and can thus be run on commodity hardware. In conclusion, the proposed framework provides a cost-effective means of integrating machine learning into Security Operation Centers (SOCs). This enables the identification of critical NIDS alerts using high-performance classifiers, thereby assisting in the automation of alert handling tasks for SOC personnel. 
To address the lack of public data sets in the problem domain and foster further research, we publicly share the large labeled NIDS alert data set used in our experimental setup.

KW  - Data labeling

KW  - Data set generation

KW  - High-priority NIDS alert

KW  - IDS

KW  - Intrusion detection

KW  - Network Intrusion Detection System

KW  - Network security

KW  - NIDS

KW  - NIDS alert

KW  - NIDS alert classification

KW  - Security Operations Center

KW  - Small training data set

KW  - SOC

KW  - Stream clustering

KW  - Supervised learning

KW  - Workload reduction

UR  - http://www.scopus.com/inward/record.url?scp=85185711530&partnerID=8YFLogxK

U2  - 10.1016/j.future.2024.01.032

DO  - 10.1016/j.future.2024.01.032

M3  - Article

AN  - SCOPUS:85185711530

SN  - 0167-739X

VL  - 155

SP  - 231

EP  - 244

JO  - Future Generation Computer Systems

JF  - Future Generation Computer Systems

ER  -
