Motion saliency based multi-stream multiplier ResNets for action recognition

Ming Zong; Ruili Wang; Xiubo Chen; Zhe Chen; Yuanhao Gong

doi:10.1016/j.imavis.2021.104108

Research output: Journal Publication › Article › peer-review

63 Citations (Scopus)

Abstract

In this paper, we propose Motion Saliency based multi-stream Multiplier ResNets (MSM-ResNets) for action recognition. The proposed MSM-ResNets model consists of three interactive streams: the appearance stream, the motion stream and the motion saliency stream. Similar to conventional two-stream CNN models, the appearance stream and motion stream capture appearance information and motion information, respectively, while the motion saliency stream captures salient motion information. In particular, to effectively utilize the spatiotemporal interactive information between streams, the proposed MSM-ResNets model establishes interactive connections between the streams instead of fusing the three streams at the final output layer. Two kinds of multiplicative connections are injected: the first runs from the motion stream to the appearance stream, and the second runs from the motion saliency stream to the motion stream. Experimental results verify the effectiveness of the proposed MSM-ResNets on two standard action recognition datasets: UCF101 and HMDB51.
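
The two multiplicative connections described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say where in the residual unit the product is applied, so the code assumes the gating stream multiplies the input of the residual branch element-wise while the identity shortcut is left untouched. The names `residual_unit`, `toy_branch` and `msm_step` are hypothetical, and feature maps are reduced to flat lists of floats.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def multiply(a: Vector, b: Vector) -> Vector:
    """Element-wise product of two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def toy_branch(v: Vector) -> Vector:
    """Hypothetical stand-in for the learned layers of a residual branch."""
    return [max(0.0, 0.5 * vi) for vi in v]


def residual_unit(
    x: Vector,
    branch: Callable[[Vector], Vector],
    gate: Optional[Vector] = None,
) -> Vector:
    """Return x + branch(x * gate); without a gate this is a plain residual unit.

    Assumption: the gate scales only the branch input, not the shortcut.
    """
    branch_input = x if gate is None else multiply(x, gate)
    return [xi + ri for xi, ri in zip(x, branch(branch_input))]


def msm_step(
    appearance: Vector,
    motion: Vector,
    saliency: Vector,
    branch: Callable[[Vector], Vector] = toy_branch,
) -> Tuple[Vector, Vector, Vector]:
    """One illustrative layer update for the three streams.

    The motion saliency stream gates the motion stream, and the motion
    stream gates the appearance stream, mirroring the two connection
    directions named in the abstract.
    """
    new_saliency = residual_unit(saliency, branch)
    new_motion = residual_unit(motion, branch, gate=saliency)
    new_appearance = residual_unit(appearance, branch, gate=motion)
    return new_appearance, new_motion, new_saliency
```

Calling `msm_step([1.0, 2.0], [2.0, 0.0], [1.0, 4.0])` shows the effect of the gating under these assumptions: wherever the motion features are zero, the appearance stream's residual branch contributes nothing and only the shortcut value passes through.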

Original language	English
Article number	104108
Journal	Image and Vision Computing
Volume	107
DOIs	https://doi.org/10.1016/j.imavis.2021.104108
Publication status	Published - Mar 2021
Externally published	Yes

Keywords

Action recognition
Motion saliency
Multiplicative connections
Spatiotemporal interactive information

ASJC Scopus subject areas

Signal Processing
Computer Vision and Pattern Recognition

Access to Document

10.1016/j.imavis.2021.104108

Cite this

@article{720f7d267fe44a48aa26d4304c5c623f,
title = "Motion saliency based multi-stream multiplier ResNets for action recognition",
abstract = "In this paper, we propose a Motion Saliency based multi-stream Multiplier ResNets (MSM-ResNets) for action recognition. The proposed MSM-ResNets model consists of three interactive streams: the appearance stream, motion stream and motion saliency stream. Similar to conventional two-stream CNNs models, the appearance stream and motion stream are responsible for capturing the appearance information and motion information, respectively, while the motion saliency stream is responsible for capturing the salient motion information. In particular, to effectively utilize the spatiotemporal interactive information between different streams, the proposed MSM-ResNets model establishes interactive connections between different streams instead of fusing three streams at the final output layer. Two kinds of different multiplicative connections are injected, the first one is to inject multiplicative connections from the motion stream to the appearance stream, while the second one is to inject multiplicative connections from the motion saliency stream to the motion stream. Experimental results verify the effectiveness of the proposed MSM-ResNets on two standard action recognition datasets: UCF101 and HMDB51.",
keywords = "Action recognition, Motion saliency, Multiplicative connections, Spatiotemporal interactive information",
author = "Ming Zong and Ruili Wang and Xiubo Chen and Zhe Chen and Yuanhao Gong",
note = "Publisher Copyright: {\textcopyright} 2021 Elsevier B.V.",
year = "2021",
month = mar,
doi = "10.1016/j.imavis.2021.104108",
language = "English",
volume = "107",
journal = "Image and Vision Computing",
issn = "0262-8856",
publisher = "Elsevier Ltd.",
}

TY  - JOUR
T1  - Motion saliency based multi-stream multiplier ResNets for action recognition
AU  - Zong, Ming
AU  - Wang, Ruili
AU  - Chen, Xiubo
AU  - Chen, Zhe
AU  - Gong, Yuanhao
PY  - 2021/3
Y1  - 2021/3
N2  - In this paper, we propose a Motion Saliency based multi-stream Multiplier ResNets (MSM-ResNets) for action recognition. The proposed MSM-ResNets model consists of three interactive streams: the appearance stream, motion stream and motion saliency stream. Similar to conventional two-stream CNNs models, the appearance stream and motion stream are responsible for capturing the appearance information and motion information, respectively, while the motion saliency stream is responsible for capturing the salient motion information. In particular, to effectively utilize the spatiotemporal interactive information between different streams, the proposed MSM-ResNets model establishes interactive connections between different streams instead of fusing three streams at the final output layer. Two kinds of different multiplicative connections are injected, the first one is to inject multiplicative connections from the motion stream to the appearance stream, while the second one is to inject multiplicative connections from the motion saliency stream to the motion stream. Experimental results verify the effectiveness of the proposed MSM-ResNets on two standard action recognition datasets: UCF101 and HMDB51.
AB  - In this paper, we propose a Motion Saliency based multi-stream Multiplier ResNets (MSM-ResNets) for action recognition. The proposed MSM-ResNets model consists of three interactive streams: the appearance stream, motion stream and motion saliency stream. Similar to conventional two-stream CNNs models, the appearance stream and motion stream are responsible for capturing the appearance information and motion information, respectively, while the motion saliency stream is responsible for capturing the salient motion information. In particular, to effectively utilize the spatiotemporal interactive information between different streams, the proposed MSM-ResNets model establishes interactive connections between different streams instead of fusing three streams at the final output layer. Two kinds of different multiplicative connections are injected, the first one is to inject multiplicative connections from the motion stream to the appearance stream, while the second one is to inject multiplicative connections from the motion saliency stream to the motion stream. Experimental results verify the effectiveness of the proposed MSM-ResNets on two standard action recognition datasets: UCF101 and HMDB51.
KW  - Action recognition
KW  - Motion saliency
KW  - Multiplicative connections
KW  - Spatiotemporal interactive information
UR  - http://www.scopus.com/inward/record.url?scp=85099878452&partnerID=8YFLogxK
U2  - 10.1016/j.imavis.2021.104108
DO  - 10.1016/j.imavis.2021.104108
M3  - Article
AN  - SCOPUS:85099878452
SN  - 0262-8856
VL  - 107
JO  - Image and Vision Computing
JF  - Image and Vision Computing
M1  - 104108
ER  -
