Abstract
Virtual streamers have been increasingly adopted in entertainment live streaming, yet the effectiveness of their social interactions remains largely unexplored. Because avatars abstract away the face and allow only limited facial expression, virtual streamers rely primarily on vocal tone and textual content to convey emotion. Drawing on the Elaboration Likelihood Model and Cognitive Tuning Theory, this study examines how streamers' cross-modal emotional misalignment between voice and text influences viewer engagement through streamer-viewer emotional synchrony. Using moment-to-moment data and machine learning–based emotion recognition, we find that greater cross-modal emotional misalignment increases viewer engagement by heightening viewers' emotional responses to vocal cues. In addition, the positivity of the streamer's vocal tone strengthens the effect of cross-modal emotional misalignment on vocal–emotional synchrony. Finally, we reveal dual effects of cross-modal emotional misalignment on viewer consumption: while it increases short-term spending on paid comments and virtual gifting, it reduces long-term commitment in the form of premium subscriptions. Our study contributes to research on live streaming and emotional interaction, and offers practical implications for designing emotionally intelligent virtual streamers.
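The abstract does not spell out how misalignment and synchrony are operationalized. As a minimal sketch, assuming each modality yields a moment-to-moment valence score in [-1, 1] from separate audio and text emotion-recognition models, cross-modal misalignment can be read as the per-moment gap between the two series, and streamer-viewer emotional synchrony as the correlation of their valence trajectories. The function names and numbers below are illustrative, not the paper's measures.

```python
import numpy as np

def emotion_misalignment(voice_valence, text_valence):
    """Per-moment cross-modal misalignment: absolute gap between the
    streamer's voice and text valence scores (each in [-1, 1])."""
    return np.abs(np.asarray(voice_valence, float) - np.asarray(text_valence, float))

def emotional_synchrony(streamer_valence, viewer_valence):
    """Streamer-viewer emotional synchrony: Pearson correlation of the
    two moment-to-moment valence series."""
    s = np.asarray(streamer_valence, float)
    v = np.asarray(viewer_valence, float)
    return np.corrcoef(s, v)[0, 1]

# Hypothetical five-moment valence series
voice = [0.8, 0.6, 0.7, -0.2, 0.5]    # streamer's vocal tone
text = [0.1, 0.2, -0.3, -0.1, 0.0]    # streamer's textual content
viewer = [0.5, 0.4, 0.6, -0.1, 0.3]   # aggregated viewer-comment valence

print(emotion_misalignment(voice, text))    # [0.7 0.4 1.0 0.1 0.5]
print(emotional_synchrony(voice, viewer))   # vocal-emotional synchrony
```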
| Original language | English |
|---|---|
| Article number | 104222 |
| Journal | Information & Management |
| DOIs | |
| Publication status | Published Online - 27 Jul 2025 |
Keywords
- Multimodal emotions
- Emotional alignment
- Streamer-viewer emotional synchrony
- Elaboration Likelihood Model
- Cognitive Tuning Theory
- Virtual streamer
- Live streaming