TY - GEN
T1 - Violence detection based on spatio-temporal feature and fisher vector
AU - Cai, Huangkai
AU - Jiang, He
AU - Huang, Xiaolin
AU - Yang, Jie
AU - He, Xiangjian
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
AB - A novel framework based on local spatio-temporal features and a Bag-of-Words (BoW) model is proposed for violence detection. The framework uses Dense Trajectories (DT) and the MPEG flow video descriptor (MF) as feature descriptors and employs the Fisher Vector (FV) for feature coding. DT and MF are descriptive and robust because each combines several feature descriptors, which capture trajectory shape, appearance, motion and motion boundaries, respectively. FV transforms low-level features into high-level features. It preserves much information because it not only assigns descriptors to codebook entries but also uses first- and second-order statistics to represent videos. Several techniques, namely PCA, K-means++ and the choice of codebook size, are used to improve the final video classification performance. The proposed method is analysed with respect to accuracy, speed and application scenarios. Experimental results show that the proposed approach outperforms state-of-the-art approaches for violence detection in both crowd and non-crowd scenes.
KW - Dense Trajectories
KW - Fisher Vector
KW - Linear support vector machine
KW - MPEG flow video descriptor
KW - Violence detection
UR - http://www.scopus.com/inward/record.url?scp=85057121608&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-03398-9_16
DO - 10.1007/978-3-030-03398-9_16
M3 - Conference contribution
AN - SCOPUS:85057121608
SN - 9783030033972
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 180
EP - 190
BT - Pattern Recognition and Computer Vision - First Chinese Conference, PRCV 2018, Proceedings
A2 - Lai, Jian-Huang
A2 - Zha, Hongbin
A2 - Zhou, Jie
A2 - Liu, Cheng-Lin
A2 - Tan, Tieniu
A2 - Zheng, Nanning
A2 - Chen, Xilin
PB - Springer Verlag
T2 - 1st Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2018
Y2 - 23 November 2018 through 26 November 2018
ER -