Abstract
Redundant and irrelevant features in data have long been a problem in network traffic classification. These features not only slow down classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a mutual information based algorithm that analytically selects the optimal features for classification. This mutual information based feature selection algorithm can handle both linearly and nonlinearly dependent data features. Its effectiveness is evaluated in the context of network intrusion detection. An Intrusion Detection System (IDS), named Least Square Support Vector Machine based IDS (LSSVM-IDS), is built using the features selected by our proposed feature selection algorithm. The performance of LSSVM-IDS is evaluated using three intrusion detection evaluation datasets, namely KDD Cup 99, NSL-KDD and Kyoto 2006+. The evaluation results show that our feature selection algorithm supplies more critical features, enabling LSSVM-IDS to achieve better accuracy and lower computational cost than state-of-the-art methods.
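As a rough illustration of the idea behind mutual information based feature selection, the sketch below estimates the mutual information between each discrete feature and the class label, then ranks features by that score. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names `mutual_information` and `rank_features`, the toy data, and the assumption that features are already discretised are all hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Return feature names sorted by decreasing mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed for illustration): "flag" determines the label, "noise" is independent of it.
labels = [0, 0, 1, 1]
columns = {
    "noise": [0, 1, 0, 1],
    "flag": [0, 0, 1, 1],
}
ranking = rank_features(columns, labels)
```

Here `flag` shares one full bit of information with the label while `noise` shares none, so `flag` is ranked first. A relevance-only ranking like this ignores redundancy between features, which a practical selection method would also need to account for.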
Original language | English |
---|---|
Article number | 7387736 |
Pages (from-to) | 2986-2998 |
Number of pages | 13 |
Journal | IEEE Transactions on Computers |
Volume | 65 |
Issue number | 10 |
Publication status | Published - 1 Oct 2016 |
Externally published | Yes |
Keywords
- Intrusion detection
- feature selection
- least square support vector machine
- linear correlation coefficient
- mutual information
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics