A three-level framework for affective content analysis and its case studies

Min Xu; Jinqiao Wang; Xiangjian He; Jesse S. Jin; Suhuai Luo; Hanqing Lu

doi:10.1007/s11042-012-1046-8

Research output: Journal Publication › Article › peer-review

25 Citations (Scopus)

Abstract

Emotional factors directly reflect audiences' attention, evaluation and memory. Recently, video affective content analysis has attracted increasing research effort. Most existing methods map low-level affective features directly to emotions by applying machine learning. Compared with the human perception process, however, there is a gap between low-level features and high-level human perception of emotion. To bridge this gap, we propose a three-level affective content analysis framework that introduces a mid-level representation to indicate dialog, audio emotional events (e.g., horror sounds and laughter) and textual concepts (e.g., informative keywords). The mid-level representation is obtained by machine learning on low-level features and is used to infer high-level affective content. We further apply the proposed framework to a number of case studies. Audio emotional events, dialog and subtitles are studied to assist affective content detection in different video domains/genres. Multiple modalities are considered for affective analysis, since each modality has its own merit in evoking emotions. Experimental results show that the proposed framework is effective and efficient for affective content analysis. Audio emotional events, dialog and subtitles are promising mid-level representations.

Original language	English
Pages (from-to)	757-779
Number of pages	23
Journal	Multimedia Tools and Applications
Volume	70
Issue number	2
DOIs	https://doi.org/10.1007/s11042-012-1046-8
Publication status	Published - May 2014
Externally published	Yes

Keywords

Affective content analysis
Mid-level representation
Multiple modality

ASJC Scopus subject areas

Software
Media Technology
Hardware and Architecture
Computer Networks and Communications

Access to Document

10.1007/s11042-012-1046-8

Cite this

@article{caee6f8873a34301aae0544040b51a01,

title = "A three-level framework for affective content analysis and its case studies",

abstract = "Emotional factors directly reflect audiences' attention, evaluation and memory. Recently, video affective content analysis has attracted increasing research effort. Most existing methods map low-level affective features directly to emotions by applying machine learning. Compared with the human perception process, however, there is a gap between low-level features and high-level human perception of emotion. To bridge this gap, we propose a three-level affective content analysis framework that introduces a mid-level representation to indicate dialog, audio emotional events (e.g., horror sounds and laughter) and textual concepts (e.g., informative keywords). The mid-level representation is obtained by machine learning on low-level features and is used to infer high-level affective content. We further apply the proposed framework to a number of case studies. Audio emotional events, dialog and subtitles are studied to assist affective content detection in different video domains/genres. Multiple modalities are considered for affective analysis, since each modality has its own merit in evoking emotions. Experimental results show that the proposed framework is effective and efficient for affective content analysis. Audio emotional events, dialog and subtitles are promising mid-level representations.",

keywords = "Affective content analysis, Mid-level representation, Multiple modality",

author = "Min Xu and Jinqiao Wang and Xiangjian He and Jin, {Jesse S.} and Suhuai Luo and Hanqing Lu",

note = "Funding Information: Acknowledgements This research was supported by National Natural Science Foundation of China No. 61003161, No. 60905008 and UTS ECR Grant.",

year = "2014",

month = may,

doi = "10.1007/s11042-012-1046-8",

language = "English",

volume = "70",

pages = "757--779",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

publisher = "Springer",

number = "2",

}

TY - JOUR

T1 - A three-level framework for affective content analysis and its case studies

AU - Xu, Min

AU - Wang, Jinqiao

AU - He, Xiangjian

AU - Jin, Jesse S.

AU - Luo, Suhuai

AU - Lu, Hanqing

N1 - Funding Information: Acknowledgements This research was supported by National Natural Science Foundation of China No. 61003161, No. 60905008 and UTS ECR Grant.

PY - 2014/5

Y1 - 2014/5

AB - Emotional factors directly reflect audiences' attention, evaluation and memory. Recently, video affective content analysis has attracted increasing research effort. Most existing methods map low-level affective features directly to emotions by applying machine learning. Compared with the human perception process, however, there is a gap between low-level features and high-level human perception of emotion. To bridge this gap, we propose a three-level affective content analysis framework that introduces a mid-level representation to indicate dialog, audio emotional events (e.g., horror sounds and laughter) and textual concepts (e.g., informative keywords). The mid-level representation is obtained by machine learning on low-level features and is used to infer high-level affective content. We further apply the proposed framework to a number of case studies. Audio emotional events, dialog and subtitles are studied to assist affective content detection in different video domains/genres. Multiple modalities are considered for affective analysis, since each modality has its own merit in evoking emotions. Experimental results show that the proposed framework is effective and efficient for affective content analysis. Audio emotional events, dialog and subtitles are promising mid-level representations.

KW - Affective content analysis

KW - Mid-level representation

KW - Multiple modality

UR - http://www.scopus.com/inward/record.url?scp=84901983723&partnerID=8YFLogxK

U2 - 10.1007/s11042-012-1046-8

DO - 10.1007/s11042-012-1046-8

M3 - Article

AN - SCOPUS:84901983723

SN - 1380-7501

VL - 70

SP - 757

EP - 779

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 2

ER -
