CLIP-Guided Bidirectional Prompt and Semantic Supervision for Dynamic Facial Expression Recognition

Junliang Zhang, Xu Liu, Yu Liang, Xiaole Xian, Weicheng Xie, Linlin Shen, Siyang Song

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Existing work on dynamic facial expression recognition (DFER) provides insufficient semantic supervision, so videos with similar facial changes but different expressions are easily confused. Because textual information can supply such supervision, the contrastive language-image pretraining (CLIP) model offers a new direction for DFER. However, CLIP is pre-trained on image-text pairs and has difficulty capturing temporal features in the video domain. We therefore propose a novel visual language model that captures and aggregates the dynamic features of expressions under semantic supervision via an Inter-Frame Interaction Transformer (Inter-FIT) and Multi-Scale Temporal Aggregation (MSTA). Furthermore, although prompt learning is often used with CLIP to enhance semantic supervision, previous studies have focused only on textual prompts, ignoring the role of visual prompts in relating the textual and visual modalities. We therefore design a Bidirectional Enhanced Prompt (BiEhPro) that facilitates learning the relationship between textual and visual cues to enhance semantic supervision. Extensive experiments and ablation studies on three benchmark datasets, i.e., DFEW, FERV39K, and MAFW, validate the effectiveness of our modules and algorithm. Code is publicly available at https://github.com/JunLiangZ/CLIP-Guided-DFER.

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Joint Conference on Biometrics, IJCB 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350364132
Publication status: Published - 2024
Externally published: Yes
Event: 18th IEEE International Joint Conference on Biometrics, IJCB 2024 - Buffalo, United States
Duration: 15 Sept 2024 to 18 Sept 2024

Publication series

Name: Proceedings - 2024 IEEE International Joint Conference on Biometrics, IJCB 2024

Conference

Conference: 18th IEEE International Joint Conference on Biometrics, IJCB 2024
Country/Territory: United States
City: Buffalo
Period: 15/09/24 to 18/09/24

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Biomedical Engineering
  • Instrumentation
