Learning deep semantic attributes for user video summarization

Ke Sun; Jiasong Zhu; Zhuo Lei; Xianxu Hou; Qian Zhang; Jiang Duan; Guoping Qiu

doi:10.1109/ICME.2017.8019411

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract

This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. First, we use a natural language processing tool to discover a set of keywords from image and text corpora, and these keywords form the semantic attributes of visual content. Second, we train a deep convolutional neural network to extract visual features and predict the semantic attributes of video segments, which enables us to represent video content with visual and semantic features simultaneously. Third, we construct a temporally constrained video segment affinity matrix and use a partial near-duplicate image discovery technique to cluster visually and semantically consistent video frames together. These frame clusters can then be condensed to form an informative and compact summary of the video. Experimental results show that the semantic attributes are effective in assisting the visual features for video summarization, and that the new technique achieves state-of-the-art performance.
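The third step is to build a temporally constrained affinity matrix over segments that carry both visual and semantic features, cluster it, and condense each cluster. The minimal sketch below illustrates that step under stated assumptions. It uses cosine similarity, a hypothetical weight `alpha` that mixes visual and semantic similarity, and a hypothetical hard temporal window `window`. These are illustrative choices, not the paper's definitions. The clustering also swaps in a different technique: thresholded connected components replace the partial near-duplicate discovery / bundling center clustering used in the paper. Each cluster is then condensed to a single medoid segment.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def affinity_matrix(visual, semantic, alpha=0.5, window=2) -> List[List[float]]:
    """Temporally constrained affinity between video segments (illustrative).

    visual[i]   -- visual feature vector of segment i (e.g. CNN activations)
    semantic[i] -- predicted semantic-attribute scores of segment i
    alpha       -- hypothetical weight of visual vs. semantic similarity
    window      -- hypothetical constraint: segments more than `window`
                   positions apart get zero affinity
    """
    n = len(visual)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= window:
                A[i][j] = (alpha * cosine(visual[i], visual[j])
                           + (1 - alpha) * cosine(semantic[i], semantic[j]))
    return A


def cluster_segments(A, threshold=0.8) -> List[List[int]]:
    """Stand-in clustering: connected components of the graph whose edges are
    affinities >= threshold (NOT the paper's bundling center clustering)."""
    n = len(A)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Order clusters by their first segment so the summary stays chronological.
    return sorted(groups.values(), key=lambda g: g[0])


def summarize(A, clusters) -> List[int]:
    """Condense each cluster to one representative (medoid) segment: the member
    with the largest summed affinity to the cluster's members."""
    return [max(m, key=lambda i: sum(A[i][j] for j in m)) for m in clusters]


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    semantic = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    A = affinity_matrix(visual, semantic, alpha=0.5, window=2)
    clusters = cluster_segments(A, threshold=0.8)
    print(clusters)  # → [[0, 1], [2, 3]]
    print(summarize(A, clusters))
```

In this toy input, the first two segments share a semantic attribute and have similar visual features, so they fall into one cluster. The last two form a second cluster. Segments 0 and 3 lie outside the temporal window, so their affinity is forced to zero regardless of content.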

Original language	English
Title of host publication	2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Publisher	IEEE Computer Society
Pages	643-648
Number of pages	6
ISBN (Electronic)	9781509060672
DOIs	https://doi.org/10.1109/ICME.2017.8019411
Publication status	Published - 28 Aug 2017
Event	2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration	10 Jul 2017 → 14 Jul 2017

Publication series

Name	Proceedings - IEEE International Conference on Multimedia and Expo
Volume	0
ISSN (Print)	1945-7871
ISSN (Electronic)	1945-788X

Conference

Conference	2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Country/Territory	Hong Kong
City	Hong Kong
Period	10/07/17 → 14/07/17

Keywords

Bundling Center Clustering
Deep Convolution Neural Network
Semantic Attribute
Video Summarization

ASJC Scopus subject areas

Computer Networks and Communications
Computer Science Applications

Cite this

Sun, K., Zhu, J., Lei, Z., Hou, X., Zhang, Q., Duan, J., & Qiu, G. (2017). Learning deep semantic attributes for user video summarization. In 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 (pp. 643-648). Article 8019411 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 0). IEEE Computer Society. https://doi.org/10.1109/ICME.2017.8019411

@inproceedings{06dd0c6eedc54de2845cbe98afc7d9ef,

title = "Learning deep semantic attributes for user video summarization",

abstract = "This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. Firstly, we use a natural language processing tool to discover a set of keywords from an image and text corpora to form the semantic attributes of visual contents. Secondly, we train a deep convolution neural network to extract visual features as well as predict the semantic attributes of video segments which enables us to represent video contents with visual and semantic features simultaneously. Thirdly, we construct a temporally constrained video segment affinity matrix and use a partially near duplicate image discovery technique to cluster visually and semantically consistent video frames together. These frame clusters can then be condensed to form an informative and compact summary of the video. We will present experimental results to show the effectiveness of the semantic attributes in assisting the visual features in video summarization and our new technique achieves state-of-the-art performance.",

keywords = "Bundling Center Clustering, Deep Convolution Neural Network, Semantic Attribute, Video Summarization",

author = "Ke Sun and Jiasong Zhu and Zhuo Lei and Xianxu Hou and Qian Zhang and Jiang Duan and Guoping Qiu",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 ; Conference date: 10-07-2017 Through 14-07-2017",

year = "2017",

month = aug,

day = "28",

doi = "10.1109/ICME.2017.8019411",

language = "English",

series = "Proceedings - IEEE International Conference on Multimedia and Expo",

publisher = "IEEE Computer Society",

pages = "643--648",

booktitle = "2017 IEEE International Conference on Multimedia and Expo, ICME 2017",

address = "United States",

}

Sun, K, Zhu, J, Lei, Z, Hou, X, Zhang, Q, Duan, J & Qiu, G 2017, Learning deep semantic attributes for user video summarization. in 2017 IEEE International Conference on Multimedia and Expo, ICME 2017., 8019411, Proceedings - IEEE International Conference on Multimedia and Expo, vol. 0, IEEE Computer Society, pp. 643-648, 2017 IEEE International Conference on Multimedia and Expo, ICME 2017, Hong Kong, Hong Kong, 10/07/17. https://doi.org/10.1109/ICME.2017.8019411

Learning deep semantic attributes for user video summarization. / Sun, Ke; Zhu, Jiasong; Lei, Zhuo et al.
2017 IEEE International Conference on Multimedia and Expo, ICME 2017. IEEE Computer Society, 2017. p. 643-648 8019411 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 0).
TY - GEN

T1 - Learning deep semantic attributes for user video summarization

AU - Sun, Ke

AU - Zhu, Jiasong

AU - Lei, Zhuo

AU - Hou, Xianxu

AU - Zhang, Qian

AU - Duan, Jiang

AU - Qiu, Guoping

PY - 2017/8/28

Y1 - 2017/8/28

N2 - This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. Firstly, we use a natural language processing tool to discover a set of keywords from an image and text corpora to form the semantic attributes of visual contents. Secondly, we train a deep convolution neural network to extract visual features as well as predict the semantic attributes of video segments which enables us to represent video contents with visual and semantic features simultaneously. Thirdly, we construct a temporally constrained video segment affinity matrix and use a partially near duplicate image discovery technique to cluster visually and semantically consistent video frames together. These frame clusters can then be condensed to form an informative and compact summary of the video. We will present experimental results to show the effectiveness of the semantic attributes in assisting the visual features in video summarization and our new technique achieves state-of-the-art performance.

AB - This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. Firstly, we use a natural language processing tool to discover a set of keywords from an image and text corpora to form the semantic attributes of visual contents. Secondly, we train a deep convolution neural network to extract visual features as well as predict the semantic attributes of video segments which enables us to represent video contents with visual and semantic features simultaneously. Thirdly, we construct a temporally constrained video segment affinity matrix and use a partially near duplicate image discovery technique to cluster visually and semantically consistent video frames together. These frame clusters can then be condensed to form an informative and compact summary of the video. We will present experimental results to show the effectiveness of the semantic attributes in assisting the visual features in video summarization and our new technique achieves state-of-the-art performance.

KW - Bundling Center Clustering

KW - Deep Convolution Neural Network

KW - Semantic Attribute

KW - Video Summarization

UR - http://www.scopus.com/inward/record.url?scp=85030251231&partnerID=8YFLogxK

U2 - 10.1109/ICME.2017.8019411

DO - 10.1109/ICME.2017.8019411

M3 - Conference contribution

AN - SCOPUS:85030251231

T3 - Proceedings - IEEE International Conference on Multimedia and Expo

SP - 643

EP - 648

BT - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017

PB - IEEE Computer Society

T2 - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017

Y2 - 10 July 2017 through 14 July 2017

ER -
