TY - GEN
T1 - In-depth feature selection and ranking for automated detection of mobile malware
AU - Guerra-Manzanares, Alejandro
AU - Nõmm, Sven
AU - Bahsi, Hayretdin
N1 - Publisher Copyright: © 2019 by SCITEPRESS - Science and Technology Publications, Lda.
PY - 2019
Y1 - 2019
AB - New malware detection techniques are needed due to the increasing threat posed by mobile malware. Machine learning techniques have provided promising results in this problem domain. However, feature selection, which is an essential instrument to overcome the curse of dimensionality, produce more interpretable results, and optimize the use of computational resources, requires more attention in order to induce better learning models for mobile malware detection. In this paper, to find the minimum feature set that provides high accuracy and to analyze the discriminatory power of different features, we applied feature selection and ranking methods to datasets characterized by system calls and permissions. These features were extracted from benign applications and from malware samples belonging to two different time frames (2010-2012 and 2017-2018). We demonstrated that small selected feature sets, in both feature categories, are able to provide high accuracy. However, we identified a decline in the discriminatory power of the selected features in both categories when the dataset is built from recent malware samples instead of old ones, indicating concept drift. Although we plan to model the concept drift in future studies, the feature selection results presented in this study give valuable insight into how the best discriminating features changed as mobile malware evolved over time.
KW - Feature Selection
KW - Machine Learning
KW - Mobile Malware
UR - http://www.scopus.com/inward/record.url?scp=85064668886&partnerID=8YFLogxK
U2 - 10.5220/0007349602740283
DO - 10.5220/0007349602740283
M3 - Conference contribution
AN - SCOPUS:85064668886
T3 - ICISSP 2019 - Proceedings of the 5th International Conference on Information Systems Security and Privacy
SP - 274
EP - 283
BT - ICISSP 2019 - Proceedings of the 5th International Conference on Information Systems Security and Privacy
A2 - Mori, Paolo
A2 - Furnell, Steven
A2 - Camp, Olivier
PB - SciTePress
T2 - 5th International Conference on Information Systems Security and Privacy, ICISSP 2019
Y2 - 23 February 2019 through 25 February 2019
ER -