Hybrid semantics-based vulnerability detection incorporating a Temporal Convolutional Network and Self-attention Mechanism

Jinfu Chen; Weijia Wang; Bo Liu; Saihua Cai; Dave Towey; Shengran Wang

doi:10.1016/j.infsof.2024.107453

School of Computer Science

Research output: Journal Publication › Article › peer-review

6 Citations (Scopus)

Abstract

Context: Desirable characteristics in vulnerability-detection (VD) systems (VDSs) include both good detection capability (high accuracy, low false positive rate, low false negative rate, etc.) and low time overheads. The widely used VDSs based on models such as Recurrent Neural Networks (RNNs) have some problems, such as low time efficiency, failing to learn the vulnerability features better, and insufficient amounts of vulnerability features. Therefore, it is very important to construct an automatic detection model with high detection accuracy.

Objective: This paper reports on training based on the source code to analyze and learn from the code's patterns and structures by deep-learning techniques to generate an efficient VD model that does not require manual feature design.

Method: We propose a software VD model based on multi-feature fusion and deep neural networks called AIdetectorX-SP. It first uses a Temporal Convolutional Network (TCN) and adds a Self-attention Mechanism (SaM) to the TCN to build a model for extracting vulnerability logic features, then transforms the source code into an image input to a Convolutional Neural Network (CNN) to extract structural and semantic information. Finally, we use feature-fusion technology to design and implement an improved deep-learning-based VDS, called AIdetectorX Sequence with Picturization (AIdetectorX-SP).

Results: We report on experiments conducted using publicly-available and widely-used datasets to evaluate the effectiveness of AIdetectorX-SP, with results indicating that AIdetectorX-SP is an effective VDS; that the combination of TCN and SaM can effectively extract vulnerability logic features; and that the pictorial code can extract code structure features, which can further improve the VD capability.

Conclusion: In this paper, we propose a novel detection model for software vulnerability based on TCNs, SaM, and software picturization. The proposed model solves some shortcomings and limitations of existing VDSs, and obtains a high software-VD accuracy with a high degree of stability.
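
The building blocks named in the Method part of the abstract can be illustrated with a small sketch. This is not the authors' AIdetectorX-SP implementation: the abstract gives no layer sizes, kernel weights, image encoding, or fusion rule, so every function name, the byte-grid encoding, and fusion by concatenation are assumptions made purely for illustration. The sketch shows a causal dilated convolution (the core operation of a TCN), an unparameterized dot-product self-attention over the resulting sequence, a simple way to turn source text into an image-like grid, and concatenation as one possible feature fusion. The CNN that would consume the grid is omitted.

```python
import math


def causal_dilated_conv(seq, kernel, dilation):
    """1-D causal convolution: out[t] uses only seq[t], seq[t-d], seq[t-2d], ..."""
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start act as zero padding
                acc += weight * seq[idx]
        out.append(acc)
    return out


def self_attention(seq):
    """Dot-product self-attention over scalar features.

    Queries, keys, and values are the inputs themselves (no learned
    projections), which is a simplification for illustration.
    """
    out = []
    for query in seq:
        scores = [query * key for key in seq]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, seq)) / total)
    return out


def picturize(source, width):
    """Assumed encoding: UTF-8 bytes laid out row by row, zero-padded."""
    data = list(source.encode("utf-8"))
    data += [0] * ((-len(data)) % width)
    return [data[i:i + width] for i in range(0, len(data), width)]


def fuse(sequence_features, image_features):
    """Assumed fusion rule: plain concatenation of the two feature vectors."""
    return list(sequence_features) + list(image_features)


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]  # hypothetical token embeddings (scalars)
    logic = self_attention(causal_dilated_conv(tokens, [1.0, 1.0], dilation=2))
    grid = picturize("int x;", width=4)
    flat = [value / 255 for row in grid for value in row]
    print(fuse(logic, flat))
```

In a real model each stage would carry learned parameters and operate on vectors rather than scalars; the sketch only fixes the data flow from token sequence and code image to a single fused feature vector.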

Original language	English
Article number	107453
Journal	Information and Software Technology
Volume	171
DOIs	https://doi.org/10.1016/j.infsof.2024.107453
Publication status	Published - Jul 2024

Keywords

Deep learning
Feature fusion
Self-attention Mechanism
Software vulnerability detection
Source-code picturization
Temporal Convolutional Network

ASJC Scopus subject areas

Software
Information Systems
Computer Science Applications

Cite this

@article{dd06f1e2edea4504bb86ccefb4e9dde1,

title = "Hybrid semantics-based vulnerability detection incorporating a Temporal Convolutional Network and Self-attention Mechanism",

abstract = "Context: Desirable characteristics in vulnerability-detection (VD) systems (VDSs) include both good detection capability (high accuracy, low false positive rate, low false negative rate, etc.) and low time overheads. The widely used VDSs based on models such as Recurrent Neural Networks (RNNs) have some problems, such as low time efficiency, failing to learn the vulnerability features better, and insufficient amounts of vulnerability features. Therefore, it is very important to construct an automatic detection model with high detection accuracy. Objective: This paper reports on training based on the source code to analyze and learn from the code's patterns and structures by deep-learning techniques to generate an efficient VD model that does not require manual feature design. Method: We propose a software VD model based on multi-feature fusion and deep neural networks called AIdetectorX-SP. It first uses a Temporal Convolutional Network (TCN) and adds a Self-attention Mechanism (SaM) to the TCN to build a model for extracting vulnerability logic features, then transforms the source code into an image input to a Convolutional Neural Network (CNN) to extract structural and semantic information. Finally, we use feature-fusion technology to design and implement an improved deep-learning-based VDS, called AIdetectorX Sequence with Picturization (AIdetectorX-SP). Results: We report on experiments conducted using publicly-available and widely-used datasets to evaluate the effectiveness of AIdetectorX-SP, with results indicating that AIdetectorX-SP is an effective VDS; that the combination of TCN and SaM can effectively extract vulnerability logic features; and that the pictorial code can extract code structure features, which can further improve the VD capability. Conclusion: In this paper, we propose a novel detection model for software vulnerability based on TCNs, SaM, and software picturization. 
The proposed model solves some shortcomings and limitations of existing VDSs, and obtains a high software-VD accuracy with a high degree of stability.",

keywords = "Deep learning, Feature fusion, Self-attention Mechanism, Software vulnerability detection, Source-code picturization, Temporal Convolutional Network",

author = "Jinfu Chen and Weijia Wang and Bo Liu and Saihua Cai and Dave Towey and Shengran Wang",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2024",

month = jul,

doi = "10.1016/j.infsof.2024.107453",

language = "English",

volume = "171",

journal = "Information and Software Technology",

issn = "0950-5849",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Hybrid semantics-based vulnerability detection incorporating a Temporal Convolutional Network and Self-attention Mechanism

AU - Chen, Jinfu

AU - Wang, Weijia

AU - Liu, Bo

AU - Cai, Saihua

AU - Towey, Dave

AU - Wang, Shengran

PY - 2024/7

Y1 - 2024/7

AB - Context: Desirable characteristics in vulnerability-detection (VD) systems (VDSs) include both good detection capability (high accuracy, low false positive rate, low false negative rate, etc.) and low time overheads. The widely used VDSs based on models such as Recurrent Neural Networks (RNNs) have some problems, such as low time efficiency, failing to learn the vulnerability features better, and insufficient amounts of vulnerability features. Therefore, it is very important to construct an automatic detection model with high detection accuracy. Objective: This paper reports on training based on the source code to analyze and learn from the code's patterns and structures by deep-learning techniques to generate an efficient VD model that does not require manual feature design. Method: We propose a software VD model based on multi-feature fusion and deep neural networks called AIdetectorX-SP. It first uses a Temporal Convolutional Network (TCN) and adds a Self-attention Mechanism (SaM) to the TCN to build a model for extracting vulnerability logic features, then transforms the source code into an image input to a Convolutional Neural Network (CNN) to extract structural and semantic information. Finally, we use feature-fusion technology to design and implement an improved deep-learning-based VDS, called AIdetectorX Sequence with Picturization (AIdetectorX-SP). Results: We report on experiments conducted using publicly-available and widely-used datasets to evaluate the effectiveness of AIdetectorX-SP, with results indicating that AIdetectorX-SP is an effective VDS; that the combination of TCN and SaM can effectively extract vulnerability logic features; and that the pictorial code can extract code structure features, which can further improve the VD capability. Conclusion: In this paper, we propose a novel detection model for software vulnerability based on TCNs, SaM, and software picturization. 
The proposed model solves some shortcomings and limitations of existing VDSs, and obtains a high software-VD accuracy with a high degree of stability.

KW - Deep learning

KW - Feature fusion

KW - Self-attention Mechanism

KW - Software vulnerability detection

KW - Source-code picturization

KW - Temporal Convolutional Network

UR - http://www.scopus.com/inward/record.url?scp=85189757612&partnerID=8YFLogxK

U2 - 10.1016/j.infsof.2024.107453

DO - 10.1016/j.infsof.2024.107453

M3 - Article

AN - SCOPUS:85189757612

SN - 0950-5849

VL - 171

JO - Information and Software Technology

JF - Information and Software Technology

M1 - 107453

ER -
