Abstract
Extensive research on Android malware detection has demonstrated that machine learning can achieve high detection performance for mobile malware. However, the learning models have usually been evaluated on data sets spanning short time frames, which raises doubts about their feasibility in operational settings that must cope with an ever-evolving malware threat landscape. Although a limited number of studies have developed concept-drift-resilient models for handling data drift, none has considered the impact of different timestamps on the detection solutions. Timestamps are critical for locating data samples within the historical timeline. Different timestamping approaches may place the same samples at different points in that timeline, which, in turn, can significantly affect the performance of the model and, consequently, the system's ability to adapt to concept drift. In this study, we conducted a comprehensive benchmark comparing the detection performance of six distinct timestamping approaches for static and dynamic feature sets. Our experiments demonstrate that timestamp selection is an important decision with a significant impact on concept drift modeling and the long-term performance of the model, regardless of the feature type used for model construction.
Original language | English |
---|---|
Article number | 102835 |
Journal | Computers & Security |
Volume | 122 |
Publication status | Published - Nov 2022 |
Externally published | Yes |
Keywords
- Android malware
- Concept drift
- Data drift
- Machine learning
- Malware detection
- Malware evolution
- Timestamp
ASJC Scopus subject areas
- General Computer Science
- Law