Experts still needed: boosting long-term Android malware detection with active learning

Research output: Journal Publication, Article, peer-review

Abstract

The continuous evolution of cyber threats poses a critical challenge for malware detection systems, so detection solutions operating in real-world settings must keep their malware knowledge bases up to date. Machine learning-based solutions are not exempt from this requirement: handling concept drift is the primary building block for sustaining high detection performance over the long term. However, maintaining non-stationary malware detection models is highly demanding because labeling new samples is expensive. This study applies several active learning-based approaches to maintain a non-stationary Android malware detection model over a 7-year time frame. It then analyzes how the model's detection performance over time is affected by the choice of feature space, by different data balancing techniques, and by the timestamping methods used to place instances along the historical timeline. Detection accuracy and labeling costs are compared with various baselines. Additionally, the study investigates the resilience of such models against noisy labels, a common problem in production environments caused by unintentional expert errors and by adversarial attacks. This research fills a significant gap in the literature by conducting a comprehensive analysis of active learning approaches to address concept drift in non-stationary settings established for mobile malware detection.
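The abstract does not say which query strategies were evaluated. As an illustration only, the following Python sketch shows one common active learning setup for a drifting detector: the model is tested on each new time period, then a simulated expert labels only the `budget` samples the model is least certain about (uncertainty sampling), and the model is retrained. An optional `noise_rate` flips some expert labels to mimic the noisy labeling the study examines. The logistic-regression model, the two toy features, the synthetic drift, and all function and parameter names (`train_logreg`, `uncertainty_sample`, `active_learning_over_time`, `budget`, `noise_rate`) are hypothetical stand-ins, not the authors' implementation or feature space.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sample format: (feature vector, label) with 1 = malware, 0 = benign.
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Malware probability; the bias term is stored in w[-1]."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logreg(data: Sequence[Sample], epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Fit a tiny logistic-regression detector with full-batch gradient descent."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in data:
            err = predict_proba(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


def uncertainty_sample(w: Sequence[float], pool: Sequence[Sample], budget: int) -> List[int]:
    """Indices of the `budget` samples whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(predict_proba(w, pool[i][0]) - 0.5))
    return ranked[:budget]


def active_learning_over_time(initial: Sequence[Sample], periods: Sequence[Sequence[Sample]],
                              budget: int, noise_rate: float = 0.0, seed: int = 0):
    """Test-then-train loop over time-ordered periods.

    Returns (accuracy on each period before updating, total labels requested)."""
    rng = random.Random(seed)
    train = list(initial)
    w = train_logreg(train)
    accuracies: List[float] = []
    labels_used = 0
    for batch in periods:
        # 1) Evaluate the current model on the new period before it sees any of its labels.
        correct = sum((predict_proba(w, x) >= 0.5) == (y == 1) for x, y in batch)
        accuracies.append(correct / len(batch))
        # 2) Ask the simulated expert to label only the most uncertain samples.
        for i in uncertainty_sample(w, batch, budget):
            x, y = batch[i]
            if rng.random() < noise_rate:
                y = 1 - y  # simulated labeling error (noisy expert)
            train.append((x, y))
            labels_used += 1
        # 3) Retrain on everything labeled so far.
        w = train_logreg(train)
    return accuracies, labels_used


if __name__ == "__main__":
    rng = random.Random(42)

    def toy_period(month: int, n: int = 50) -> List[Sample]:
        # Hypothetical drift: the malware cluster slides along feature 0 each month.
        out = []
        for _ in range(n):
            y = int(rng.random() < 0.5)
            cx, cy = (2.0 - 0.3 * month, 2.0) if y else (-2.0, -2.0)
            out.append(([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)], y))
        return out

    accs, cost = active_learning_over_time(
        toy_period(0), [toy_period(m) for m in range(1, 13)], budget=5)
    print([round(a, 2) for a in accs], "labels queried:", cost)
```

Comparing the number of labels queried with the total number of samples across all periods gives the labeling saving relative to labeling everything, which is the kind of accuracy-versus-cost trade-off the abstract describes measuring against baselines. Setting `noise_rate` above zero gives a rough way to probe sensitivity to mislabeled samples in this toy setting.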

Original language: English
Pages (from-to): 901-918
Number of pages: 18
Journal: Journal of Computer Virology and Hacking Techniques
Volume: 20
Issue number: 4
Publication status: Published (Nov 2024)
Externally published: Yes

Keywords

  • Active learning
  • Android
  • Concept drift
  • Data labeling
  • Machine learning
  • Malware detection
  • Mobile malware

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Software
  • Hardware and Architecture
  • Computational Theory and Mathematics
