TY - GEN
T1 - DCEPNet
T2 - 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
AU - Xiang, Fei
AU - Liu, Hongbo
AU - Wang, Ruili
AU - Hou, Junjie
AU - Wang, Xingang
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/12/28
Y1 - 2024/12/28
N2 - Speech emotion recognition is widely used in applications such as call centres and mental health monitoring. However, it still faces great challenges due to the diversity of speech features and the complexity of emotion, in particular inadequate feature extraction. To enhance the ability to capture emotional features, a dual-channel emotional perception network (DCEPNet) is proposed: (i) in the first channel, multi-branch time-domain perception (MBT) is used to capture key emotional segments in the speech signal; (ii) in the second channel, a multi-window transformer (MWFormer) is used to address the insufficient extraction of multi-granularity emotional information. Experimental results demonstrate that the proposed model outperforms state-of-the-art models on the CASIA Chinese dataset.
AB - Speech emotion recognition is widely used in applications such as call centres and mental health monitoring. However, it still faces great challenges due to the diversity of speech features and the complexity of emotion, in particular inadequate feature extraction. To enhance the ability to capture emotional features, a dual-channel emotional perception network (DCEPNet) is proposed: (i) in the first channel, multi-branch time-domain perception (MBT) is used to capture key emotional segments in the speech signal; (ii) in the second channel, a multi-window transformer (MWFormer) is used to address the insufficient extraction of multi-granularity emotional information. Experimental results demonstrate that the proposed model outperforms state-of-the-art models on the CASIA Chinese dataset.
KW - Dual-channel emotional perception network
KW - Multi-branch time-domain perception
KW - Multi-window transformer
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85216201111&partnerID=8YFLogxK
U2 - 10.1145/3696409.3700257
DO - 10.1145/3696409.3700257
M3 - Conference contribution
AN - SCOPUS:85216201111
T3 - Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
BT - Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
PB - Association for Computing Machinery, Inc
Y2 - 3 December 2024 through 6 December 2024
ER -