Multi-frame feature-fusion-based model for violence detection

Mujtaba Asad, Jie Yang, Jiang He, Pourya Shamsolmoali, Xiangjian He

Research output: Journal Publication › Article › peer-review

36 Citations (Scopus)


Human behavior detection is essential for public safety and monitoring. However, human-operated surveillance systems require continuous attention and observation, which is difficult to sustain. Detecting violent human behavior with autonomous surveillance systems is therefore of critical importance for uninterrupted video surveillance. In this paper, we propose a novel method to detect fights or violent actions by learning both spatial and temporal features from equally spaced sequential frames of a video. Multi-level features for two sequential frames, extracted from the convolutional neural network’s top and bottom layers, are combined using the proposed feature fusion method to account for motion information. We also propose a wide-dense residual block to learn these combined spatial features from the two input frames. The learned features are then concatenated and fed to long short-term memory units to capture temporal dependencies. The feature fusion method and the additional wide-dense residual blocks enable the network to learn combined features from the input frames effectively and yield higher accuracy. Experimental results on four publicly available datasets (HockeyFight, Movies, ViolentFlow and BEHAVE) show the superior performance of the proposed model in comparison with state-of-the-art methods.
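The abstract says the model takes equally spaced sequential frames and fuses features from two sequential frames at a time. The following is a minimal sketch of that input-selection step only, not the authors' implementation. The function names `equally_spaced_indices` and `sequential_pairs`, the floor-based spacing, and the choice to overlap consecutive pairs are all assumptions made for illustration; the abstract does not state the sampling stride or how pairs are formed.

```python
from typing import List, Tuple


def equally_spaced_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over [0, num_frames - 1].

    Hypothetical helper: integer (floor) spacing is an assumption, not
    something the abstract specifies.
    """
    if num_samples < 2 or num_frames < num_samples:
        raise ValueError("need 2 <= num_samples <= num_frames")
    last = num_frames - 1
    # Integer arithmetic keeps the first index at 0 and the last at num_frames - 1.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


def sequential_pairs(indices: List[int]) -> List[Tuple[int, int]]:
    """Pair each sampled frame with the next sampled frame.

    Overlapping pairs (f0, f1), (f1, f2), ... are an assumption; each pair
    stands for the two frames whose features would be fused.
    """
    return list(zip(indices, indices[1:]))


if __name__ == "__main__":
    idx = equally_spaced_indices(num_frames=9, num_samples=5)
    print(idx)
    print(sequential_pairs(idx))
```

In a full pipeline, each index pair would select two frames for the CNN and fusion stage, and the resulting per-pair features would form the sequence passed to the long short-term memory units.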

Original language: English
Pages (from-to): 1415-1431
Number of pages: 17
Journal: Visual Computer
Issue number: 6
Publication status: Published - Jun 2021
Externally published: Yes


Keywords

  • Autonomous Video Surveillance
  • Feature fusion
  • Spatio-temporal features
  • Violence detection

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design

