A three-level framework for affective content analysis and its case studies

Min Xu, Jinqiao Wang, Xiangjian He, Jesse S. Jin, Suhuai Luo, Hanqing Lu

Research output: Journal Publication › Article › peer-review

25 Citations (Scopus)


Emotional factors directly reflect audiences' attention, evaluation and memory. Recently, video affective content analysis has attracted increasing research effort. Most existing methods map low-level affective features directly to emotions by applying machine learning. Compared with the human perception process, however, there is a gap between low-level features and high-level human perception of emotion. To bridge this gap, we propose a three-level affective content analysis framework that introduces a mid-level representation to indicate dialog, audio emotional events (e.g., horror sounds and laughter) and textual concepts (e.g., informative keywords). The mid-level representation is obtained by machine learning on low-level features and is used to infer high-level affective content. We further apply the proposed framework in a number of case studies. Audio emotional events, dialog and subtitles are studied to assist affective content detection in different video domains/genres. Multiple modalities are considered for affective analysis, since each modality has its own merits for evoking emotions. Experimental results show that the proposed framework is effective and efficient for affective content analysis, and that audio emotional events, dialog and subtitles are promising mid-level representations.
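The three-level pipeline the abstract describes (low-level features → learned mid-level representation → high-level affective inference) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the feature names, thresholds and decision rules are all hypothetical assumptions standing in for the trained classifiers the framework would use.

```python
# Hypothetical sketch of a three-level affective content analysis pipeline.
# All names, thresholds and rules below are illustrative assumptions; the
# paper learns mid-level detectors with machine learning rather than rules.

def extract_low_level_features(clip):
    """Level 1: low-level features (in practice: audio energy, pitch,
    motion, color statistics, subtitle text features, etc.)."""
    return {
        "audio_energy": clip["audio_energy"],
        "speech_ratio": clip["speech_ratio"],
    }

def detect_mid_level(features):
    """Level 2: mid-level representation, e.g. audio emotional events
    (horror sounds, laughter) and dialog presence."""
    events = []
    if features["audio_energy"] > 0.8:   # assumed threshold
        events.append("horror_sound")
    if features["speech_ratio"] > 0.5:   # assumed threshold
        events.append("dialog")
    return events

def infer_affect(mid_level_events):
    """Level 3: infer high-level affective content from mid-level cues."""
    if "horror_sound" in mid_level_events:
        return "fear"
    if "dialog" in mid_level_events:
        return "neutral"
    return "unknown"

if __name__ == "__main__":
    clip = {"audio_energy": 0.9, "speech_ratio": 0.3}
    label = infer_affect(detect_mid_level(extract_low_level_features(clip)))
    print(label)  # fear
```

The point of the intermediate layer is that each stage maps to a semantically meaningful representation, so the final emotion inference operates on concepts (events, dialog) rather than raw signal statistics.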

Original language: English
Pages (from-to): 757-779
Number of pages: 23
Journal: Multimedia Tools and Applications
Issue number: 2
Publication status: Published - May 2014
Externally published: Yes


Keywords

  • Affective content analysis
  • Mid-level representation
  • Multiple modality

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications
