TY - JOUR
T1 - Cross-device behavioral consistency
T2 - Benchmarking and implications for effective android malware detection
AU - Guerra-Manzanares, Alejandro
AU - Valbe, Martin
PY - 2022/9/15
Y1 - 2022/9/15
N2 - Most proposed solutions that use dynamic features for Android malware detection collect data and test their systems on a single, specific data collection device, either a real device or an emulator. The results obtained on these specific devices are then generalized to any Android platform. This broad generalization rests on the assumption that apps behave consistently across devices. This study extensively benchmarks this assumption for system calls by executing Android malware and benign samples under the same conditions on 9 different collection devices, including real and virtual devices. The results indicate significant differences between real devices and emulators in system call usage and, consequently, in the behavioral profiles collected by running the same set of applications on different devices. Furthermore, the impact of these differences on machine learning-based malware detection models is evaluated. In this regard, the model's detection performance degrades significantly when the training and testing sets contain data collected on different devices. Therefore, the empirical findings do not support the assumption that Android apps behave consistently across devices when system calls are used as descriptive features.
AB - Most proposed solutions that use dynamic features for Android malware detection collect data and test their systems on a single, specific data collection device, either a real device or an emulator. The results obtained on these specific devices are then generalized to any Android platform. This broad generalization rests on the assumption that apps behave consistently across devices. This study extensively benchmarks this assumption for system calls by executing Android malware and benign samples under the same conditions on 9 different collection devices, including real and virtual devices. The results indicate significant differences between real devices and emulators in system call usage and, consequently, in the behavioral profiles collected by running the same set of applications on different devices. Furthermore, the impact of these differences on machine learning-based malware detection models is evaluated. In this regard, the model's detection performance degrades significantly when the training and testing sets contain data collected on different devices. Therefore, the empirical findings do not support the assumption that Android apps behave consistently across devices when system calls are used as descriptive features.
KW - Android emulator
KW - Android malware
KW - Benchmark
KW - Malware behavior
KW - Malware detection
KW - Real device
KW - System calls
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=pure_ris_china&SrcAuth=WosAPI&KeyUT=WOS:001221469200013&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1016/j.mlwa.2022.100357
DO - 10.1016/j.mlwa.2022.100357
M3 - Article
VL - 9
JO - Machine Learning with Applications
JF - Machine Learning with Applications
M1 - 100357
ER -