DCEPNet: Dual-Channel Emotional Perception Network for Speech Emotion Recognition

Fei Xiang, Hongbo Liu, Ruili Wang, Junjie Hou, Xingang Wang

Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

Abstract

Speech emotion recognition is widely used in applications such as call centres and mental health monitoring. However, it remains challenging because of the diversity of speech features and the complexity of emotion, with inadequate feature extraction a particular problem. To enhance the ability to capture emotional features, a dual-channel emotional perception network (DCEPNet) is proposed: (i) for the first channel, multi-branch time-domain perception (MBT) is proposed to capture key emotional segments in the speech signal; (ii) for the second channel, a multi-window transformer (MWFormer) is proposed to address the insufficient extraction of multi-granularity emotional information. Experimental results demonstrate that the proposed model outperforms state-of-the-art models on the CASIA Chinese dataset.

Original language: English
Title of host publication: Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400712739
Publication status: Published - 28 Dec 2024
Externally published: Yes
Event: 6th ACM International Conference on Multimedia in Asia, MMAsia 2024 - Auckland, New Zealand
Duration: 3 Dec 2024 to 6 Dec 2024

Publication series

Name: Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024

Conference

Conference: 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
Country/Territory: New Zealand
City: Auckland
Period: 3/12/24 to 6/12/24

Keywords

  • Dual-channel emotional perception network
  • Multi-branch time-domain perception
  • Multi-window transformer
  • Speech emotion recognition

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
