KronoDroid: Time-based hybrid-featured dataset for effective android malware detection and characterization

Alejandro Guerra-Manzanares, Hayretdin Bahsi, Sven Nõmm

Research output: Journal Publication, Article, peer-review

54 Citations (Scopus)

Abstract

Available data sets have neglected the evolution of Android malware and thus provide only a static snapshot of a non-stationary phenomenon. Android malware research has not given the time variable the attention it deserves, overlooking its degrading effect on the performance of machine learning-based classifiers (i.e., concept drift). The sources of dynamic data (i.e., real devices and emulators) and their particularities have also been overlooked. These are critical factors to take into account when aiming to build more effective, robust, and long-lasting Android malware detection systems. In this research, different sources of benign and malware data are merged to generate a data set that encompasses a larger time frame, and 489 static and dynamic features are collected. The particularities of the source of the dynamic features (i.e., system calls) are addressed by using both an emulator and a real device, thus generating two equally featured sub-datasets. The main outcome of this research is a novel, labeled, hybrid-featured Android dataset that provides a timestamp for each data sample, covers all years of Android history from 2008 to 2020, and accounts for the distinct dynamic data sources. The emulator data set is composed of 28,745 malicious apps from 209 malware families and 35,246 benign samples. The real device data set contains 41,382 malware samples, belonging to 240 malware families, and 36,755 benign apps. Made publicly available as KronoDroid in a structured format, it is the largest hybrid-featured Android dataset and the only one that provides timestamped data, considers the particularities of dynamic sources, and includes samples for over 209 Android malware families.
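The role of per-sample timestamps in studying concept drift can be illustrated with a chronological train/test split. The following is a minimal sketch and not the authors' method or the KronoDroid file format: the function name `temporal_split`, the field names `timestamp` and `label`, and the toy records are all assumptions made for illustration.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Split samples chronologically around a cutoff date.

    Samples dated strictly before the cutoff go to the training set and
    the rest go to the test set, so the test set never contains data
    older than anything the classifier was trained on.
    """
    train = [s for s in samples if s["timestamp"] < cutoff]
    test = [s for s in samples if s["timestamp"] >= cutoff]
    return train, test


# Toy records: field names and values are hypothetical, not taken from KronoDroid.
samples = [
    {"timestamp": date(2010, 5, 1), "label": "benign"},
    {"timestamp": date(2013, 8, 15), "label": "malware"},
    {"timestamp": date(2017, 2, 3), "label": "malware"},
    {"timestamp": date(2019, 11, 20), "label": "benign"},
]

train, test = temporal_split(samples, cutoff=date(2015, 1, 1))
print(len(train), len(test))
```

Evaluating a classifier on such a split, rather than on a random shuffle, is one common way to expose performance degradation over time, because a random shuffle mixes older and newer samples across both sets.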

Original language: English
Article number: 102399
Journal: Computers & Security
Volume: 110
Publication status: Published - Nov 2021
Externally published: Yes

Keywords

  • Android malware
  • Dataset
  • Malware analysis
  • Malware detection
  • Mobile malware

ASJC Scopus subject areas

  • General Computer Science
  • Law
