Abstract
Active learning can greatly benefit environments where unlabeled data is plentiful but labeling it is costly. Security operations centers (SOCs) fit this profile: they can use the human expertise already available to improve machine learning-based detection models through active learning. In the context of SOC operations and IoT botnet detection, our study provides a thorough benchmark of different active learning approaches within the pool-based sampling framework. The selection of the optimal query instance is evaluated using three query strategies: uncertainty sampling, ranked batch-mode sampling, and query by committee. Our results show that active learning can help generate better detection models with every query strategy tested in our benchmarking setup. Leveraging human–machine interaction can produce high-performance IoT botnet detection models using significantly less data than the passive approaches traditionally used to build machine learning-based detection systems. Additionally, the impact of mislabeled data on the active learning implementation is explored.
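
To make the pool-based setting concrete, the following is a minimal sketch of one of the query strategies named above, uncertainty sampling in its least-confidence form. It is not the study's implementation: the synthetic two-feature data, the nearest-centroid classifier, every function name, and the `oracle` callback standing in for the SOC analyst are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def make_pool(n, rng):
    """Synthetic 2-feature samples (illustrative only): 1 = botnet-like, 0 = benign."""
    pool = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        pool.append(((rng.gauss(center, 1.5), rng.gauss(center, 1.5)), label))
    return pool


def fit_centroids(labeled):
    """Hypothetical stand-in model: the mean feature vector of each class."""
    sums = {}
    for (x, y), label in labeled:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def predict_proba(centroids, point):
    """Class probabilities from a softmax over negative centroid distances."""
    scores = {c: math.exp(-math.dist(point, mu)) for c, mu in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def least_confident(centroids, unlabeled):
    """Uncertainty sampling: index of the sample whose top class probability is lowest."""
    return min(
        range(len(unlabeled)),
        key=lambda i: max(predict_proba(centroids, unlabeled[i][0]).values()),
    )


def accuracy(centroids, test):
    hits = 0
    for point, label in test:
        proba = predict_proba(centroids, point)
        hits += max(proba, key=proba.get) == label
    return hits / len(test)


def active_learning(pool, test, n_queries, oracle):
    """Pool-based loop: fit, score, query the most uncertain sample, ask the oracle."""
    # Assumption: seed the labeled set with the first pool sample of each class.
    labeled, unlabeled, seen = [], [], set()
    for item in pool:
        if item[1] not in seen:
            seen.add(item[1])
            labeled.append(item)
        else:
            unlabeled.append(item)
    history = []
    for _ in range(n_queries):
        model = fit_centroids(labeled)
        history.append(accuracy(model, test))
        point, true_label = unlabeled.pop(least_confident(model, unlabeled))
        # The oracle plays the analyst; it may return a wrong label.
        labeled.append((point, oracle(point, true_label)))
    history.append(accuracy(fit_centroids(labeled), test))
    return labeled, history


rng = random.Random(0)
pool, test = make_pool(200, rng), make_pool(200, rng)
labeled, history = active_learning(
    pool, test, n_queries=20, oracle=lambda point, label: label
)
print(len(labeled), history[0], history[-1])
```

Passing an `oracle` that occasionally flips the label is one simple way to probe the effect of mislabeled data in a toy setting like this one; how the study itself models labeling errors is not stated in the abstract.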
Original language | English |
---|---|
Pages (from-to) | 40-53 |
Number of pages | 14 |
Journal | Future Generation Computer Systems |
Volume | 141 |
Publication status | Published - Apr 2023 |
Externally published | Yes |
Keywords
- Active learning
- Botnet detection
- Internet of things
- Intrusion detection
- IoT
- IoT botnet
- Machine learning
- Query learning
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Networks and Communications