FrameRank: A text processing approach to video summarization

Zhuo Lei; Chao Zhang; Qian Zhang; Guoping Qiu

doi:10.1109/ICME.2019.00071

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

11 Citations (Scopus)

Abstract

Video summarization has been extensively studied over the past decades. However, user-generated video summarization is much less explored because there is a lack of large-scale video datasets in which human-generated video summaries are unambiguously defined and annotated. To this end, we propose a user-generated video summarization dataset, UGSum52, that consists of 52 videos (207 minutes). Because user-generated video summarization is subjective, we manually annotate 25 summaries for each video, for a total of 1300 summaries. To the best of our knowledge, it is currently the largest dataset for user-generated video summarization. Based on this dataset, we present FrameRank, an unsupervised video summarization method that employs a frame-to-frame affinity graph to identify coherent and informative frames to summarize a video. We use a Kullback-Leibler (KL) divergence-based graph to rank temporal segments according to the amount of semantic information contained in their frames. We demonstrate the effectiveness of our method by applying it to three datasets (SumMe, TVSum and UGSum52) and show that it achieves state-of-the-art results.
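The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of ranking frames on a KL-divergence-based affinity graph, in the spirit of TextRank-style text summarizers. It is not the authors' algorithm. Everything specific here is an assumption: that each frame is represented by a probability distribution (for example, a normalized feature histogram), that affinity is `exp(-symmetric KL)`, that ranking uses a damped PageRank-style power iteration, and the function names `symmetric_kl` and `rank_frames` are hypothetical.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-12):
    """Symmetrised KL divergence between two discrete distributions."""
    # Clip to avoid log(0), then renormalise so both still sum to 1.
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def rank_frames(dists, damping=0.85, iters=100, tol=1e-9):
    """Score frames by a PageRank-style walk on a KL-based affinity graph.

    dists: sequence of per-frame probability distributions (assumed input).
    Returns an array of scores that sums to 1; higher means more central.
    """
    n = len(dists)
    affinity = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: similar frames (low divergence) get high affinity.
            a = np.exp(-symmetric_kl(dists[i], dists[j]))
            affinity[i, j] = affinity[j, i] = a

    # Column-normalise so each column is a probability distribution.
    col_sums = affinity.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    transition = affinity / col_sums

    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        updated = (1.0 - damping) / n + damping * (transition @ scores)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores
```

Under these assumptions, a temporal segment could then be scored by, for instance, averaging the scores of its frames, and the top-scoring segments kept as the summary. How the paper actually aggregates frame scores into segment rankings is not stated in the abstract.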

Original language	English
Title of host publication	Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019
Publisher	IEEE Computer Society
Pages	368-373
Number of pages	6
ISBN (Electronic)	9781538695524
DOIs	https://doi.org/10.1109/ICME.2019.00071
Publication status	Published - Jul 2019
Event	2019 IEEE International Conference on Multimedia and Expo, ICME 2019 - Shanghai, China Duration: 8 Jul 2019 → 12 Jul 2019

Publication series

Name	Proceedings - IEEE International Conference on Multimedia and Expo
Volume	2019-July
ISSN (Print)	1945-7871
ISSN (Electronic)	1945-788X

Conference

Conference	2019 IEEE International Conference on Multimedia and Expo, ICME 2019
Country/Territory	China
City	Shanghai
Period	8/07/19 → 12/07/19

Keywords

FrameRank
Graph
KL divergence
Unsupervised learning
Video summarization

ASJC Scopus subject areas

Computer Networks and Communications
Computer Science Applications

Access to Document

10.1109/ICME.2019.00071

Cite this

Lei, Z., Zhang, C., Zhang, Q., & Qiu, G. (2019). FrameRank: A text processing approach to video summarization. In Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019 (pp. 368-373). Article 8785002 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2019-July). IEEE Computer Society. https://doi.org/10.1109/ICME.2019.00071

@inproceedings{e1bb054081a949709bcc5da9259d66d1,

title = "FrameRank: A text processing approach to video summarization",

abstract = "Video summarization has been extensively studied in the past decades. However, user-generated video summarization is much less explored since there lack large-scale video datasets within which human-generated video summaries are unambiguously defined and annotated. Toward this end, we propose a user-generated video summarization dataset - UGSum52 - that consists of 52 videos (207 minutes). In constructing the dataset, because of the subjectivity of user-generated video summarization, we manually annotate 25 summaries for each video, which are in total 1300 summaries. To the best of our knowledge, it is currently the largest dataset for user-generated video summarization. Based on this dataset, we present FrameRank, an unsupervised video summarization method that employs a frame-to-frame level affinity graph to identify coherent and informative frames to summarize a video. We use the Kullback-Leibler(KL)-divergence-based graph to rank temporal segments according to the amount of semantic information contained in their frames. We illustrate the effectiveness of our method by applying it to three datasets SumMe, TVSum and UGSum52 and show it achieves state-of-the-art results.",

keywords = "FrameRank, Graph, KL divergence, Unsupervised learning, Video summarization",

author = "Zhuo Lei and Chao Zhang and Qian Zhang and Guoping Qiu",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 IEEE International Conference on Multimedia and Expo, ICME 2019 ; Conference date: 08-07-2019 Through 12-07-2019",

year = "2019",

month = jul,

doi = "10.1109/ICME.2019.00071",

language = "English",

series = "Proceedings - IEEE International Conference on Multimedia and Expo",

publisher = "IEEE Computer Society",

pages = "368--373",

booktitle = "Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019",

address = "United States",

}

Lei, Z, Zhang, C, Zhang, Q & Qiu, G 2019, FrameRank: A text processing approach to video summarization. in Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019., 8785002, Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2019-July, IEEE Computer Society, pp. 368-373, 2019 IEEE International Conference on Multimedia and Expo, ICME 2019, Shanghai, China, 8/07/19. https://doi.org/10.1109/ICME.2019.00071

FrameRank: A text processing approach to video summarization. / Lei, Zhuo; Zhang, Chao; Zhang, Qian et al.
Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019. IEEE Computer Society, 2019. p. 368-373 8785002 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2019-July).

TY - GEN

T1 - FrameRank

T2 - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019

AU - Lei, Zhuo

AU - Zhang, Chao

AU - Zhang, Qian

AU - Qiu, Guoping

PY - 2019/7

Y1 - 2019/7

AB - Video summarization has been extensively studied in the past decades. However, user-generated video summarization is much less explored since there lack large-scale video datasets within which human-generated video summaries are unambiguously defined and annotated. Toward this end, we propose a user-generated video summarization dataset - UGSum52 - that consists of 52 videos (207 minutes). In constructing the dataset, because of the subjectivity of user-generated video summarization, we manually annotate 25 summaries for each video, which are in total 1300 summaries. To the best of our knowledge, it is currently the largest dataset for user-generated video summarization. Based on this dataset, we present FrameRank, an unsupervised video summarization method that employs a frame-to-frame level affinity graph to identify coherent and informative frames to summarize a video. We use the Kullback-Leibler(KL)-divergence-based graph to rank temporal segments according to the amount of semantic information contained in their frames. We illustrate the effectiveness of our method by applying it to three datasets SumMe, TVSum and UGSum52 and show it achieves state-of-the-art results.

KW - FrameRank

KW - Graph

KW - KL divergence

KW - Unsupervised learning

KW - Video summarization

UR - http://www.scopus.com/inward/record.url?scp=85070947733&partnerID=8YFLogxK

U2 - 10.1109/ICME.2019.00071

DO - 10.1109/ICME.2019.00071

M3 - Conference contribution

AN - SCOPUS:85070947733

T3 - Proceedings - IEEE International Conference on Multimedia and Expo

SP - 368

EP - 373

BT - Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019

PB - IEEE Computer Society

Y2 - 8 July 2019 through 12 July 2019

ER -
