Feature fusion based deep spatiotemporal model for violence detection in videos

Mujtaba Asad, Zuopeng Yang, Zubair Khan, Jie Yang, Xiangjian He

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)


Detecting violent behavior in surveillance videos is essential for public monitoring and security. However, manual detection requires constant human observation and attention, which is difficult to sustain. Autonomous detection of violent activities is therefore essential for continuous, uninterrupted video surveillance. This paper proposes a novel method for detecting violent activities in videos using fused spatial feature maps, based on Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) units. Spatial features are extracted by a CNN, and a multi-level spatial feature fusion method combines the spatial feature maps from two equally spaced sequential input video frames to incorporate motion characteristics. Additional residual layer blocks further learn these fused spatial features to increase the classification accuracy of the network. The combined spatial features of the input frames are then fed to LSTM units to learn global temporal information. The output of the network classifies the content of the input video frames as violent or non-violent. Experimental results on three standard benchmark datasets (Hockey Fight, Crowd Violence, and BEHAVE) show that the proposed algorithm is better able to recognize violent actions in different scenarios and improves on the performance of state-of-the-art methods.
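The frame-pairing and fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fusion operator or how pairs are chosen, so the sliding-window pairing, the concatenation in `fuse_pair`, and the toy `extract` callback (standing in for the CNN) are all assumptions made for illustration. The residual blocks and LSTM stages are not shown.

```python
from typing import Callable, List, Sequence, Tuple

Feature = List[float]


def equally_spaced_pairs(num_frames: int, gap: int) -> List[Tuple[int, int]]:
    """Return index pairs (t, t + gap) over a clip of num_frames frames.

    Assumption: pairs slide one frame at a time; the abstract only says the
    two frames are "equally spaced".
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    return [(t, t + gap) for t in range(0, num_frames - gap)]


def fuse_pair(feat_a: Feature, feat_b: Feature) -> Feature:
    """Fuse two per-frame feature vectors by concatenation.

    Concatenation is an assumed stand-in for the paper's multi-level
    spatial feature fusion, which the abstract does not detail.
    """
    return list(feat_a) + list(feat_b)


def fused_sequence(
    frames: Sequence[Sequence[float]],
    extract: Callable[[Sequence[float]], Feature],
    gap: int,
) -> List[Feature]:
    """Extract features per frame, then fuse each equally spaced pair.

    The resulting list is the time-ordered sequence that a temporal model
    (LSTM units in the paper) would consume.
    """
    feats = [extract(frame) for frame in frames]
    return [
        fuse_pair(feats[i], feats[j])
        for i, j in equally_spaced_pairs(len(frames), gap)
    ]


def mean_intensity(frame: Sequence[float]) -> Feature:
    """Hypothetical one-dimensional 'feature extractor' used only for the demo."""
    return [sum(frame) / len(frame)]


if __name__ == "__main__":
    clip = [[0.0, 2.0], [4.0, 6.0], [8.0, 10.0]]  # three toy "frames"
    print(fused_sequence(clip, mean_intensity, gap=1))
```

In a real pipeline the `extract` callback would return CNN feature maps rather than a single mean value, and the fused sequence would pass through the residual blocks before reaching the LSTM.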

Original language: English
Title of host publication: Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
Editors: Tom Gedeon, Kok Wai Wong, Minho Lee
Number of pages: 13
ISBN (Print): 9783030367077
Publication status: Published - 2019
Externally published: Yes
Event: 26th International Conference on Neural Information Processing, ICONIP 2019 - Sydney, Australia
Duration: 12 Dec 2019 to 15 Dec 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11953 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 26th International Conference on Neural Information Processing, ICONIP 2019


Keywords

  • Autonomous video surveillance
  • CNN
  • LSTM
  • Spatiotemporal features
  • Violence detection

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

