Learning deep semantic attributes for user video summarization

Ke Sun, Jiasong Zhu, Zhuo Lei, Xianxu Hou, Qian Zhang, Jiang Duan, Guoping Qiu

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)


This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. Firstly, we use a natural language processing tool to discover a set of keywords from image and text corpora to form the semantic attributes of visual contents. Secondly, we train a deep convolutional neural network to extract visual features and to predict the semantic attributes of video segments, which enables us to represent video contents with visual and semantic features simultaneously. Thirdly, we construct a temporally constrained video segment affinity matrix and use a partial near-duplicate image discovery technique to cluster visually and semantically consistent video frames together. These frame clusters can then be condensed to form an informative and compact summary of the video. We present experimental results showing the effectiveness of the semantic attributes in assisting the visual features in video summarization, and showing that our new technique achieves state-of-the-art performance.
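The abstract does not give the formula for the temporally constrained affinity matrix, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes cosine similarity, a hypothetical blending weight `alpha` between visual and semantic similarity, and a hypothetical `window` parameter that forbids links between segments far apart in time. All function and parameter names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def temporal_affinity(visual, semantic, window=2, alpha=0.5):
    """Build a symmetric affinity matrix over temporally ordered video segments.

    visual[i], semantic[i]: feature vectors of segment i.
    window: assumed maximum temporal distance (in segments) that may be linked.
    alpha: assumed weight of visual similarity; (1 - alpha) goes to semantic.
    """
    n = len(visual)
    if len(semantic) != n:
        raise ValueError("visual and semantic must describe the same segments")
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Temporal constraint: only segments within `window` steps are compared;
        # all other entries stay at 0.0.
        for j in range(i, min(n, i + window + 1)):
            score = (alpha * cosine(visual[i], visual[j])
                     + (1.0 - alpha) * cosine(semantic[i], semantic[j]))
            affinity[i][j] = affinity[j][i] = score
    return affinity


# Toy input: segments 0 and 3 have identical features but are far apart in time.
visual = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
semantic = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
A = temporal_affinity(visual, semantic, window=1)
```

With `window=1`, segments 0 and 3 receive zero affinity despite matching features, so a clustering step run on `A` would tend to group only temporally adjacent, similar segments. Any clustering algorithm that accepts a precomputed affinity matrix could consume `A`; the paper's own clustering procedure is not reproduced here.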

Original language: English
Title of host publication: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Publisher: IEEE Computer Society
Number of pages: 6
ISBN (Electronic): 9781509060672
Publication status: Published - 28 Aug 2017
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 10 Jul 2017 to 14 Jul 2017

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X


Conference: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Country/Territory: Hong Kong
City: Hong Kong


Keywords

  • Bundling Center Clustering
  • Deep Convolution Neural Network
  • Semantic Attribute
  • Video Summarization

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

