Concept drift and cross-device behavior: Challenges and implications for effective android malware detection

Alejandro Guerra-Manzanares; Marcin Luckner; Hayretdin Bahsi

doi:10.1016/j.cose.2022.102757

Research output: Journal Publication › Article › peer-review

21 Citations (Scopus)

Abstract

The large body of Android malware research has demonstrated that machine learning methods can provide high performance for detecting Android malware. However, the vast majority of studies underestimate the evolving nature of the threat landscape, which requires the creation of a model life-cycle to ensure effective continuous detection in real-world settings over time. In this study, we modeled the concept drift issue of Android malware detection, encompassing the years between 2011 and 2018, using dynamic feature sets (i.e., system calls) derived from Android apps. The relevant studies in the literature have not focused on the timestamp selection approach and its critical impact on effective drift modeling. We evaluated and compared distinct timestamp alternatives. Our experimental results show that a widely used timestamp in the literature yields poor results over time and that enhanced concept drift handling is achieved when an app-internal timestamp is used. Additionally, this study sheds light on the usage of distinct data sources and their impact on concept drift modeling. We identified that dynamic features obtained for individual apps from different data sources (i.e., emulator and real device) show significant differences that can distort the modeling results. Therefore, the data sources should be considered and their fusion preferably avoided while creating the training and testing data sets. Our analysis is supported using a global interpretation method to comprehend and characterize the evolution of Android apps throughout the years from a data source-related perspective.
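
The two recommendations in the abstract (order samples by a chosen timestamp, and avoid fusing emulator and real-device data) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Sample` record, its field names, and the `temporal_splits` helper are hypothetical, and the year attached to each sample is assumed to come from whichever timestamp is being evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one app execution trace."""
    app_id: str
    source: str  # assumed values: "emulator" or "real_device"
    year: int    # year derived from the timestamp under evaluation
    label: int   # 1 = malware, 0 = benign


def temporal_splits(samples, source, first_test_year):
    """Yield (test_year, train, test) using one data source only.

    Training samples are strictly older than the test year, so each
    evaluation step only sees data that would have existed at that time.
    """
    # Keep a single data source so emulator and real-device traces are not mixed.
    same_source = [s for s in samples if s.source == source]
    test_years = sorted({s.year for s in same_source if s.year >= first_test_year})
    for test_year in test_years:
        train = [s for s in same_source if s.year < test_year]
        test = [s for s in same_source if s.year == test_year]
        if train and test:  # skip years with nothing to train or test on
            yield test_year, train, test
```

Calling `temporal_splits(samples, "emulator", 2012)` would walk forward year by year over emulator traces only. Repeating the run with a different timestamp field, or with `"real_device"`, gives comparable series without combining the two sources.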

Original language	English
Article number	102757
Journal	Computers & Security
Volume	120
DOIs	https://doi.org/10.1016/j.cose.2022.102757
Publication status	Published - Sept 2022
Externally published	Yes

Keywords

Android
Android emulator
Concept drift
Malware detection
Mobile security
Real device
Smartphone

ASJC Scopus subject areas

General Computer Science
Law

Cite this

@article{736d6498221f49ed9229a93bc9990f13,

title = "Concept drift and cross-device behavior: Challenges and implications for effective android malware detection",

abstract = "The large body of Android malware research has demonstrated that machine learning methods can provide high performance for detecting Android malware. However, the vast majority of studies underestimate the evolving nature of the threat landscape, which requires the creation of a model life-cycle to ensure effective continuous detection in real-world settings over time. In this study, we modeled the concept drift issue of Android malware detection, encompassing the years between 2011 and 2018, using dynamic feature sets (i.e., system calls) derived from Android apps. The relevant studies in the literature have not focused on the timestamp selection approach and its critical impact on effective drift modeling. We evaluated and compared distinct timestamp alternatives. Our experimental results show that a widely used timestamp in the literature yields poor results over time and that enhanced concept drift handling is achieved when an app internal timestamp was used. Additionally, this study sheds light on the usage of distinct data sources and their impact on concept drift modeling. We identified that dynamic features obtained for individual apps from different data sources (i.e., emulator and real device) show significant differences that can distort the modeling results. Therefore, the data sources should be considered and their fusion preferably avoided while creating the training and testing data sets. Our analysis is supported using a global interpretation method to comprehend and characterize the evolution of Android apps throughout the years from a data source-related perspective.",

keywords = "Android, Android emulator, Concept drift, Malware detection, Mobile security, Real device, Smartphone",

author = "Alejandro Guerra-Manzanares and Marcin Luckner and Hayretdin Bahsi",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd",

year = "2022",

month = sep,

doi = "10.1016/j.cose.2022.102757",

language = "English",

volume = "120",

journal = "Computers \& Security",

issn = "0167-4048",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Concept drift and cross-device behavior

T2 - Challenges and implications for effective android malware detection

AU - Guerra-Manzanares, Alejandro

AU - Luckner, Marcin

AU - Bahsi, Hayretdin

PY - 2022/9

Y1 - 2022/9

AB - The large body of Android malware research has demonstrated that machine learning methods can provide high performance for detecting Android malware. However, the vast majority of studies underestimate the evolving nature of the threat landscape, which requires the creation of a model life-cycle to ensure effective continuous detection in real-world settings over time. In this study, we modeled the concept drift issue of Android malware detection, encompassing the years between 2011 and 2018, using dynamic feature sets (i.e., system calls) derived from Android apps. The relevant studies in the literature have not focused on the timestamp selection approach and its critical impact on effective drift modeling. We evaluated and compared distinct timestamp alternatives. Our experimental results show that a widely used timestamp in the literature yields poor results over time and that enhanced concept drift handling is achieved when an app internal timestamp was used. Additionally, this study sheds light on the usage of distinct data sources and their impact on concept drift modeling. We identified that dynamic features obtained for individual apps from different data sources (i.e., emulator and real device) show significant differences that can distort the modeling results. Therefore, the data sources should be considered and their fusion preferably avoided while creating the training and testing data sets. Our analysis is supported using a global interpretation method to comprehend and characterize the evolution of Android apps throughout the years from a data source-related perspective.

KW - Android

KW - Android emulator

KW - Concept drift

KW - Malware detection

KW - Mobile security

KW - Real device

KW - Smartphone

UR - http://www.scopus.com/inward/record.url?scp=85132230850&partnerID=8YFLogxK

U2 - 10.1016/j.cose.2022.102757

DO - 10.1016/j.cose.2022.102757

M3 - Article

AN - SCOPUS:85132230850

SN - 0167-4048

VL - 120

JO - Computers & Security

JF - Computers & Security

M1 - 102757

ER -
