Abstract
Speech emotion recognition is widely used in applications such as call centres and mental health monitoring. However, it still faces major challenges due to the diversity of speech features and the complexity of emotion, in particular inadequate feature extraction. To improve the capture of emotional features, a dual-channel emotional perception network (DCEPNet) is proposed: (i) In the first channel, a multi-branch time-domain perception (MBT) module captures key emotional segments in the speech signal. (ii) In the second channel, a multi-window transformer (MWFormer) addresses the insufficient extraction of multi-granularity emotional information. Experimental results demonstrate that the proposed model outperforms state-of-the-art models on the CASIA Chinese dataset.
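The abstract does not say how MWFormer forms its windows, so the following is only a generic illustration of what "multi-granularity" can mean for a sequence, not the authors' method. It frames the same toy signal at several window lengths, so that short windows expose fine detail and long windows expose coarser context. The function names, the window sizes (4, 8, 16) and the 50% overlap are all assumptions chosen for the example.

```python
def frame_signal(samples, window, hop):
    """Split a 1-D sequence into overlapping frames of length `window`."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    last_start = len(samples) - window
    return [samples[i:i + window] for i in range(0, last_start + 1, hop)]


def multi_window_views(samples, windows=(4, 8, 16)):
    """Frame the same sequence at several window sizes.

    The sizes and the 50% overlap are illustrative assumptions,
    not values taken from the paper.
    """
    return {w: frame_signal(samples, w, max(1, w // 2)) for w in windows}


# Toy "signal": 32 consecutive integers standing in for feature frames.
signal = list(range(32))
views = multi_window_views(signal)

for w, frames in views.items():
    print(f"window={w:2d}: {len(frames)} frames")
```

In a model along these lines, each view would typically be processed by its own attention branch and the results fused; how DCEPNet actually does this is described in the paper itself, not here.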
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024 |
| Publisher | Association for Computing Machinery, Inc |
| ISBN (Electronic) | 9798400712739 |
| Publication status | Published - 28 Dec 2024 |
| Externally published | Yes |
| Event | 6th ACM International Conference on Multimedia in Asia, MMAsia 2024 - Auckland, New Zealand; Duration: 3 Dec 2024 → 6 Dec 2024 |
Publication series
| Name | Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024 |
|---|---|
Conference
| Conference | 6th ACM International Conference on Multimedia in Asia, MMAsia 2024 |
|---|---|
| Country/Territory | New Zealand |
| City | Auckland |
| Period | 3/12/24 → 6/12/24 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Free Keywords
- Dual-channel emotional perception network
- Multi-branch time-domain perception
- Multi-window transformer
- Speech emotion recognition
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design
- Human-Computer Interaction
Fingerprint
Dive into the research topics of 'DCEPNet: Dual-Channel Emotional Perception Network for Speech Emotion Recognition'. Together they form a unique fingerprint.