Multi-cue based 3D residual network for action recognition

Ming Zong, Ruili Wang, Zhe Chen, Maoli Wang, Xun Wang, Johan Potgieter

Research output: Journal Publication › Article › peer-review

10 Citations (Scopus)

Abstract

Convolutional neural networks (CNNs) are a natural structure for video modelling and have been successfully applied to action recognition. Existing 3D CNN-based action recognition methods mainly perform 3D convolutions on individual cues (e.g. appearance and motion cues) and rely on the design of subsequent networks to fuse these cues. In this paper, we propose a novel multi-cue 3D convolutional neural network (M3D), which directly integrates three individual cues: an appearance cue, a direct motion cue, and a salient motion cue. Unlike existing methods, the proposed M3D model performs 3D convolutions on multiple cues jointly rather than on a single cue. By integrating the three cues as a whole, the model can obtain more discriminative and robust features than previous methods. Further, we propose a novel residual multi-cue 3D convolution model (R-M3D) that improves representation ability in order to obtain representative video features. Experimental results verify the effectiveness of the proposed M3D model, and the proposed R-M3D model (pre-trained on the Kinetics dataset) achieves competitive performance compared with state-of-the-art models on the UCF101 and HMDB51 datasets.
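The core idea in the abstract, a 3D convolution that spans several cues at once and is wrapped in a residual connection, can be illustrated with a minimal NumPy sketch. This is not the authors' M3D or R-M3D implementation. The three cue volumes are random stand-ins (the abstract does not say how the cues are computed), stacking the cues along a channel axis is an assumption, and the names `conv3d_same` and `residual_block` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def conv3d_same(x, w):
    """Naive 3D cross-correlation with zero 'same' padding.

    x: (C_in, T, H, W) input volume.
    w: (C_out, C_in, kt, kh, kw) kernels with odd spatial/temporal sizes.
    Every output channel sums over ALL input channels, so each cue contributes.
    """
    c_out, c_in, kt, kh, kw = w.shape
    assert x.shape[0] == c_in
    _, T, H, W = x.shape
    # Zero-pad time, height and width so the output keeps the input size.
    xp = np.pad(x, ((0, 0), (kt // 2,) * 2, (kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros((c_out, T, H, W))
    for dt in range(kt):
        for dh in range(kh):
            for dw in range(kw):
                patch = xp[:, dt:dt + T, dh:dh + H, dw:dw + W]  # (C_in, T, H, W)
                # Contract over the input-channel (cue) axis for this kernel offset.
                out += np.einsum("oc,cthw->othw", w[:, :, dt, dh, dw], patch)
    return out

def residual_block(x, w):
    """y = ReLU(x + conv3d(x)); needs C_out == C_in so the shapes match."""
    return np.maximum(x + conv3d_same(x, w), 0.0)

rng = np.random.default_rng(0)
T, H, W = 8, 16, 16
# Placeholder cue volumes (assumption): real cues would come from video frames.
appearance = rng.random((T, H, W))
direct_motion = rng.random((T, H, W))
salient_motion = rng.random((T, H, W))

# Assumption: the three cues are stacked as channels of one input volume.
cues = np.stack([appearance, direct_motion, salient_motion])  # (3, T, H, W)
w = rng.normal(scale=0.1, size=(3, 3, 3, 3, 3))               # 3x3x3 kernels
features = residual_block(cues, w)
print(features.shape)  # (3, 8, 16, 16)
```

Because each kernel covers all three channels, every output value mixes information from all cues in a single convolution, rather than convolving each cue separately and fusing the results in a later network. The skip connection leaves the input unchanged when the convolution contributes nothing.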

Original language: English
Pages (from-to): 5167-5181
Number of pages: 15
Journal: Neural Computing and Applications
Volume: 33
Issue number: 10
Publication status: Published - May 2021
Externally published: Yes

Keywords

  • 3D convolution
  • Action recognition
  • Multi-cue
  • Residual
  • Salient motion cue

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
