TY - GEN
T1 - Spatiotemporal Saliency Based Multi-stream Networks for Action Recognition
AU - Liu, Zhenbing
AU - Li, Zeya
AU - Zong, Ming
AU - Ji, Wanting
AU - Wang, Ruili
AU - Tian, Yan
N1 - Publisher Copyright:
© 2020, Springer Nature Singapore Pte Ltd.
PY - 2020
Y1 - 2020
N2 - Human action recognition is a challenging research topic since videos often contain cluttered backgrounds, which impair the performance of human action recognition. In this paper, we propose a novel spatiotemporal saliency based multi-stream ResNet for human action recognition, which combines three different streams: a spatial stream with RGB frames as input, a temporal stream with optical flow frames as input, and a spatiotemporal saliency stream with spatiotemporal saliency maps as input. The spatiotemporal saliency stream captures spatiotemporal object foreground information from spatiotemporal saliency maps, which are generated by a geodesic distance based video segmentation method. Such an architecture can reduce background interference in videos and provide spatiotemporal object foreground information for human action recognition. Experimental results on the UCF101 and HMDB51 datasets demonstrate that the complementary spatiotemporal information can further improve the performance of action recognition, and that our proposed method achieves competitive performance compared with state-of-the-art methods.
AB - Human action recognition is a challenging research topic since videos often contain cluttered backgrounds, which impair the performance of human action recognition. In this paper, we propose a novel spatiotemporal saliency based multi-stream ResNet for human action recognition, which combines three different streams: a spatial stream with RGB frames as input, a temporal stream with optical flow frames as input, and a spatiotemporal saliency stream with spatiotemporal saliency maps as input. The spatiotemporal saliency stream captures spatiotemporal object foreground information from spatiotemporal saliency maps, which are generated by a geodesic distance based video segmentation method. Such an architecture can reduce background interference in videos and provide spatiotemporal object foreground information for human action recognition. Experimental results on the UCF101 and HMDB51 datasets demonstrate that the complementary spatiotemporal information can further improve the performance of action recognition, and that our proposed method achieves competitive performance compared with state-of-the-art methods.
KW - Action recognition
KW - ResNet
KW - Spatiotemporal saliency map image
UR - http://www.scopus.com/inward/record.url?scp=85082986414&partnerID=8YFLogxK
U2 - 10.1007/978-981-15-3651-9_8
DO - 10.1007/978-981-15-3651-9_8
M3 - Conference contribution
AN - SCOPUS:85082986414
SN - 9789811536502
T3 - Communications in Computer and Information Science
SP - 74
EP - 84
BT - Pattern Recognition - ACPR 2019 Workshops, Proceedings
A2 - Cree, Michael
A2 - Huang, Fay
A2 - Yuan, Junsong
A2 - Yan, Wei Qi
PB - Springer
T2 - 5th Asian Conference on Pattern Recognition, ACPR 2019
Y2 - 26 November 2019 through 29 November 2019
ER -