TY - GEN
T1 - Time-frame Analysis of System Calls Behavior in Machine Learning-Based Mobile Malware Detection
AU - Guerra-Manzanares, Alejandro
AU - Nõmm, Sven
AU - Bahsi, Hayretdin
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - Dynamic features are frequently used in machine learning-based approaches to detect malicious applications on Android devices. These features are constructed by collecting the system calls observed during a certain period of time. Despite the popularity of this approach, very little attention has been paid to the length of the collection time-frame and its impact on the detection performance of the induced learning models, which constitutes the scope of this research. Such analysis helps to understand the trade-off between accuracy and performance in the data collection efforts that take place at the various stages of the machine learning workflow. Our time-frame analysis also addresses different data collection environments (emulator and real device) and the variations in detection capability when detecting recent or older malware. System calls of 330 benign and malicious applications, collected in different time periods, are monitored and logged for each minute-long interval for a total of fifteen minutes. First, the distribution of the system calls is analysed. Next, the discriminatory power of each system call is evaluated cumulatively for each minute-long interval, using Fisher's score to assess the discriminatory power of each feature. The results reveal that the system calls observed during the first minute possess the highest discriminatory power, whereas the discriminatory power of the system calls observed over longer time-frames is lower. Finally, this finding is tested by training and evaluating traditional machine learning classifiers.
KW - dynamic behavior
KW - machine learning
KW - malware detection
KW - mobile malware
KW - system calls
KW - time analysis
UR - http://www.scopus.com/inward/record.url?scp=85076376420&partnerID=8YFLogxK
U2 - 10.1109/CSET.2019.8904908
DO - 10.1109/CSET.2019.8904908
M3 - Conference contribution
AN - SCOPUS:85076376420
T3 - 2019 International Conference on Cyber Security for Emerging Technologies, CSET 2019
BT - 2019 International Conference on Cyber Security for Emerging Technologies, CSET 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Conference on Cyber Security for Emerging Technologies, CSET 2019
Y2 - 27 October 2019 through 29 October 2019
ER -