TY - GEN
T1 - Online Emotion-Driven Generation of Multiple Appropriate Facial Reactions
AU - Huang, Jiajian
AU - Song, Siyang
AU - Kong, Xiangyu
AU - Xie, Weicheng
AU - Shen, Linlin
AU - Yu, Zitong
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - The multiple appropriate facial reaction online generation task aims to generate real-time, appropriate, and diverse facial reactions for virtual listeners in response to audio-visual behaviors expressed by a human speaker. While recent approaches have focused on improving reaction diversity and coarse synchronicity, they often fail to capture emotionally coherent responses that align with both the emotion type and intensity level of the speaker. In this work, we propose an emotion-driven framework that treats the speaker’s emotional state as the core driving force behind listener behavior. Our framework integrates a pre-trained audio emotion encoder (PAEE) and a pre-trained visual emotion encoder (PVEE) to extract fine-grained emotional representations from speech and facial expressions. We further design a lightweight, online-capable Motion Representation Module (MRM), optimized for real-time generation, that captures emotional intensity through facial motion amplitude and variation, enabling our system to dynamically modulate the strength of listener reactions with low latency. In addition, an Unpredictable Motion Generator (UMG) introduces minor stochastic perturbations, making the generated reactions more lifelike and individualized. Extensive experiments demonstrate that our method achieves significant improvements in reaction appropriateness and diversity while maintaining real-time performance. The code is available at this link.
KW - Emotion-driven modeling
KW - Online facial reaction generation
UR - https://www.scopus.com/pages/publications/105031554475
U2 - 10.1007/978-981-95-6123-0_18
DO - 10.1007/978-981-95-6123-0_18
M3 - Conference contribution
AN - SCOPUS:105031554475
SN - 9789819561223
T3 - Lecture Notes in Computer Science
SP - 183
EP - 194
BT - Biometric Recognition - 19th Chinese Conference, CCBR 2025, Proceedings
A2 - Jia, Wei
A2 - Leng, Lu
A2 - Min, Weidong
A2 - Chu, Jun
A2 - Gui, Jie
A2 - Shu, Xiangbo
A2 - Ben, Xianye
A2 - Sun, Zhenan
A2 - Fang, Yuming
PB - Springer Science and Business Media Deutschland GmbH
T2 - 19th Chinese Conference on Biometric Recognition, CCBR 2025
Y2 - 21 November 2025 through 23 November 2025
ER -