Experts still needed: boosting long-term android malware detection with active learning

Alejandro Guerra-Manzanares; Hayretdin Bahsi

doi:10.1007/s11416-024-00536-y


Research output: Journal Publication › Article › peer-review

3 Citations (Scopus)

Abstract

The continuous evolution of cyber threats poses a critical challenge to malware detection systems, so operational detection solutions in real-world settings must keep their malware knowledge bases up to date. Machine learning-based solutions are not exempt from this requirement, as handling concept drift is the primary building block for maintaining high detection performance in the long term. However, maintaining non-stationary malware detection models is highly demanding due to the high cost of labeling. This study applies several active learning-based approaches to maintain a non-stationary model for Android malware detection over a 7-year time frame, and conducts a comprehensive analysis of how feature space selection, data balancing techniques, and the timestamping methods used to place instances along the historical timeline affect the model's detection performance over time. The detection accuracy and labeling costs are compared with various baselines. Additionally, the study investigates the resilience of such models against noisy labeling, a common problem in production environments caused by unintentional expert errors and adversarial attacks. This research fills a significant gap in the literature by conducting a comprehensive analysis of active learning approaches to address concept drift in non-stationary mobile malware detection settings.
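The abstract does not spell out the query strategies the study compares, so the following is only a generic illustration of the idea, not the authors' method. It is a minimal sketch, assuming pool-based uncertainty sampling with a fixed per-period labeling budget: the current model scores the unlabeled apps of a time period, and only the `budget` apps whose predicted malware probability is closest to 0.5 are sent to an expert. A second helper mimics the noisy-labeling scenario by flipping a fixed fraction of labels. The names `uncertainty`, `select_for_labeling` and `flip_labels`, the 0.5-centred uncertainty measure and the uniform label-flip noise model are all hypothetical choices made for this sketch.

```python
import random
from typing import List, Sequence


def uncertainty(p: float) -> float:
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_for_labeling(probs: Sequence[float], budget: int) -> List[int]:
    """Hypothetical helper: indices of the `budget` most uncertain samples.

    probs[i] is the current model's predicted probability that unlabeled app i
    is malware. Ties are broken by the lower index, so the result is deterministic.
    """
    if budget <= 0:
        return []
    ranked = sorted(range(len(probs)), key=lambda i: (-uncertainty(probs[i]), i))
    return sorted(ranked[:budget])


def flip_labels(labels: Sequence[int], rate: float, seed: int = 0) -> List[int]:
    """Hypothetical noise model: flip a fixed fraction of binary labels
    (0 = benign, 1 = malware) to mimic unintentional expert mistakes."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(rate * len(labels))
    flipped = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flipped else y for i, y in enumerate(labels)]


if __name__ == "__main__":
    # One hypothetical time period: model scores for five unlabeled apps.
    scores = [0.02, 0.48, 0.91, 0.55, 0.30]
    print(select_for_labeling(scores, budget=2))  # [1, 3]
    # Simulate an analyst who mislabels 20% of the queried apps.
    print(flip_labels([0, 1] * 5, rate=0.2, seed=42))
```

In a full maintenance loop, the selected apps would be labeled (optionally passed through a noise model such as `flip_labels`), appended to the training set, and the model retrained before scoring the next period; the classifier and retraining step are deliberately left out so the selection logic stays self-contained.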

Original language	English
Pages (from-to)	901-918
Number of pages	18
Journal	Journal of Computer Virology and Hacking Techniques
Volume	20
Issue number	4
DOIs	https://doi.org/10.1007/s11416-024-00536-y
Publication status	Published - Nov 2024
Externally published	Yes

Keywords

Active learning
Android
Concept drift
Data labeling
Machine learning
Malware detection
Mobile malware

ASJC Scopus subject areas

Computer Science (miscellaneous)
Software
Hardware and Architecture
Computational Theory and Mathematics


Cite this

@article{b88081009f7e4746973c601f33096447,

title = "Experts still needed: boosting long-term android malware detection with active learning",

abstract = "The continuous evolution of cyber threats imposes a critical challenge to malware detection systems, so operational detection solutions in real-world settings must keep up-to-date malware knowledge databases. Machine learning-based solutions are not exempt from this requirement as handling concept drift constitutes the primary building block for keeping high detection performance in the long term. However, maintaining non-stationary malware detection models is highly demanding due to the high cost of labeling. This study applies several active learning-based approaches for maintaining a non-stationary model for Android malware detection in a 7-year-long time frame and conducts a comprehensive analysis to understand the impact of feature space selection, different data balancing techniques, and timestamping methods, utilized for locating the instances along the historical timeline, on the model{\textquoteright}s detection performance over time. The detection accuracy and labeling costs are compared with various baselines. Additionally, the study investigates the resilience of such models against noisy labeling, a common problem in production environments due to unintentional expert errors and adversarial attacks. This research fills a significant gap in the literature by conducting a comprehensive analysis of active learning approaches to address concept drift in non-stationary settings established for mobile malware detection.",

keywords = "Active learning, Android, Concept drift, Data labeling, Machine learning, Malware detection, Mobile malware",

author = "Alejandro Guerra-Manzanares and Hayretdin Bahsi",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

month = nov,

doi = "10.1007/s11416-024-00536-y",

language = "English",

volume = "20",

pages = "901--918",

journal = "Journal of Computer Virology and Hacking Techniques",

issn = "2263-8733",

publisher = "Springer Science + Business Media",

number = "4",

}

TY  - JOUR
T1  - Experts still needed: boosting long-term android malware detection with active learning
AU  - Guerra-Manzanares, Alejandro
AU  - Bahsi, Hayretdin
N1  - Publisher Copyright: © The Author(s) 2024.
PY  - 2024/11
Y1  - 2024/11
AB  - The continuous evolution of cyber threats imposes a critical challenge to malware detection systems, so operational detection solutions in real-world settings must keep up-to-date malware knowledge databases. Machine learning-based solutions are not exempt from this requirement as handling concept drift constitutes the primary building block for keeping high detection performance in the long term. However, maintaining non-stationary malware detection models is highly demanding due to the high cost of labeling. This study applies several active learning-based approaches for maintaining a non-stationary model for Android malware detection in a 7-year-long time frame and conducts a comprehensive analysis to understand the impact of feature space selection, different data balancing techniques, and timestamping methods, utilized for locating the instances along the historical timeline, on the model’s detection performance over time. The detection accuracy and labeling costs are compared with various baselines. Additionally, the study investigates the resilience of such models against noisy labeling, a common problem in production environments due to unintentional expert errors and adversarial attacks. This research fills a significant gap in the literature by conducting a comprehensive analysis of active learning approaches to address concept drift in non-stationary settings established for mobile malware detection.
KW  - Active learning
KW  - Android
KW  - Concept drift
KW  - Data labeling
KW  - Machine learning
KW  - Malware detection
KW  - Mobile malware
UR  - http://www.scopus.com/inward/record.url?scp=85205572445&partnerID=8YFLogxK
DO  - 10.1007/s11416-024-00536-y
M3  - Article
AN  - SCOPUS:85205572445
SN  - 2263-8733
VL  - 20
SP  - 901
EP  - 918
JO  - Journal of Computer Virology and Hacking Techniques
JF  - Journal of Computer Virology and Hacking Techniques
IS  - 4
ER  - 
