Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

Wangkai Jin; Junyu Liu; Meili Feng; Jianfeng Ren

doi:10.1109/SEAI55746.2022.9832286

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

The challenges of polyphonic sound event detection (PSED) stem from the need to detect multiple overlapping events in a time series. Recent efforts apply Deep Neural Networks (DNNs) to Time-Frequency Representations (TFRs) of audio clips to mitigate such issues. However, existing solutions often rely on a single type of TFR, which under-utilizes the input features. To this end, we propose a novel PSED framework that incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that TFRs of different types or at different scales reveal acoustic patterns in a complementary manner, so that overlapping events can be best extracted by combining different TFRs. Moreover, our framework applies a novel approach to adaptively fuse different models and TFRs symbiotically. Hence, the overall performance can be significantly improved. We quantitatively examine the benefits of our framework using Capsule Neural Networks, a state-of-the-art approach for PSED. The experimental results show that our method achieves a 7% reduction in error rate compared with state-of-the-art solutions on the TUT-SED 2016 dataset.
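
To make the multi-scale part of the abstract concrete, the following is a minimal sketch that computes log-magnitude spectrograms of one clip at several window lengths. It is not the authors' pipeline: the abstract does not state which TFR types, scales, or fusion method the paper uses, so the STFT, the window lengths (256, 512, 1024), the half-window hop, the synthetic clip, and the function names `log_spectrogram` and `multi_scale_tfrs` are all assumptions chosen for illustration.

```python
import numpy as np

def log_spectrogram(signal, win_length, hop_length):
    """Log-magnitude STFT with a Hann window (one 'scale' of TFR)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length : i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0); shape is (n_frames, win_length // 2 + 1).
    return np.log(magnitude + 1e-8)

def multi_scale_tfrs(signal, win_lengths=(256, 512, 1024)):
    """One spectrogram per window length; hop = half the window (assumed)."""
    return {w: log_spectrogram(signal, w, w // 2) for w in win_lengths}

# Synthetic one-second clip with two overlapping tones (illustrative only).
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
tfrs = multi_scale_tfrs(clip)
for win_length, tfr in tfrs.items():
    print(win_length, tfr.shape)
```

Short windows give finer time resolution and long windows give finer frequency resolution, which is one plausible reading of how representations at different scales could complement each other. A different TFR type (for example a mel-scaled spectrogram) would add a further set of inputs under the same idea.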

Original language	English
Title of host publication	2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	146-150
Number of pages	5
ISBN (Electronic)	9781665482233
DOIs	https://doi.org/10.1109/SEAI55746.2022.9832286
Publication status	Published - 2022
Event	2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022 - Xiamen, China Duration: 10 Jun 2022 → 12 Jun 2022

Publication series

Name	2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022

Conference

Conference	2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022
Country/Territory	China
City	Xiamen
Period	10/06/22 → 12/06/22

Keywords

capsule neural network
polyphonic sound event detection
time-frequency representation

ASJC Scopus subject areas

Artificial Intelligence
Computer Science Applications
Computer Vision and Pattern Recognition
Software
Control and Optimization

Access to Document

10.1109/SEAI55746.2022.9832286

Cite this

Jin, W., Liu, J., Feng, M., & Ren, J. (2022). Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation. In 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022 (pp. 146-150). (2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SEAI55746.2022.9832286

Jin, Wangkai ; Liu, Junyu ; Feng, Meili et al. / Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation. 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 146-150 (2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022).

@inproceedings{c4cba781e06b4f4cbb2478ddcac6e3e2,

title = "Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation",

abstract = "The challenges of polyphonic sound event detection (PSED) stem from the detection of multiple overlapping events in a time series. Recent efforts exploit Deep Neural Networks (DNNs) on Time-Frequency Representations (TFRs) of audio clips as model inputs to mitigate such issues. However, existing solutions often rely on a single type of TFR, which causes under-utilization of input features. To this end, we propose a novel PSED framework, which incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that: TFRs, which are of different types or in different scales, can reveal acoustics patterns in a complementary manner, so that the overlapped events can be best extracted by combining different TFRs. Moreover, our framework design applies a novel approach, to adaptively fuse different models and TFRs symbiotically. Hence, the overall performance can be significantly improved. We quantitatively examine the benefits of our framework by using Capsule Neural Networks, a state-of-the-art approach for PSED. The experimental results show that our method achieves a 7% reduction in error rate compared with the state-of-the-art solutions on the TUT-SED 2016 dataset.",

keywords = "capsule neural network, polyphonic sound event detection, time-frequency representation",

author = "Wangkai Jin and Junyu Liu and Meili Feng and Jianfeng Ren",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022 ; Conference date: 10-06-2022 Through 12-06-2022",

year = "2022",

doi = "10.1109/SEAI55746.2022.9832286",

language = "English",

series = "2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "146--150",

booktitle = "2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022",

address = "United States",

}

Jin, W, Liu, J, Feng, M & Ren, J 2022, Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation. in 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022. 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022, Institute of Electrical and Electronics Engineers Inc., pp. 146-150, 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022, Xiamen, China, 10/06/22. https://doi.org/10.1109/SEAI55746.2022.9832286

Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation. / Jin, Wangkai; Liu, Junyu; Feng, Meili et al.
2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022. Institute of Electrical and Electronics Engineers Inc., 2022. p. 146-150 (2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022).

TY - GEN

T1 - Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

AU - Jin, Wangkai

AU - Liu, Junyu

AU - Feng, Meili

AU - Ren, Jianfeng

PY - 2022

Y1 - 2022

N2 - The challenges of polyphonic sound event detection (PSED) stem from the detection of multiple overlapping events in a time series. Recent efforts exploit Deep Neural Networks (DNNs) on Time-Frequency Representations (TFRs) of audio clips as model inputs to mitigate such issues. However, existing solutions often rely on a single type of TFR, which causes under-utilization of input features. To this end, we propose a novel PSED framework, which incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that: TFRs, which are of different types or in different scales, can reveal acoustics patterns in a complementary manner, so that the overlapped events can be best extracted by combining different TFRs. Moreover, our framework design applies a novel approach, to adaptively fuse different models and TFRs symbiotically. Hence, the overall performance can be significantly improved. We quantitatively examine the benefits of our framework by using Capsule Neural Networks, a state-of-the-art approach for PSED. The experimental results show that our method achieves a 7% reduction in error rate compared with the state-of-the-art solutions on the TUT-SED 2016 dataset.

AB - The challenges of polyphonic sound event detection (PSED) stem from the detection of multiple overlapping events in a time series. Recent efforts exploit Deep Neural Networks (DNNs) on Time-Frequency Representations (TFRs) of audio clips as model inputs to mitigate such issues. However, existing solutions often rely on a single type of TFR, which causes under-utilization of input features. To this end, we propose a novel PSED framework, which incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that: TFRs, which are of different types or in different scales, can reveal acoustics patterns in a complementary manner, so that the overlapped events can be best extracted by combining different TFRs. Moreover, our framework design applies a novel approach, to adaptively fuse different models and TFRs symbiotically. Hence, the overall performance can be significantly improved. We quantitatively examine the benefits of our framework by using Capsule Neural Networks, a state-of-the-art approach for PSED. The experimental results show that our method achieves a 7% reduction in error rate compared with the state-of-the-art solutions on the TUT-SED 2016 dataset.

KW - capsule neural network

KW - polyphonic sound event detection

KW - time-frequency representation

UR - http://www.scopus.com/inward/record.url?scp=85136329114&partnerID=8YFLogxK

U2 - 10.1109/SEAI55746.2022.9832286

DO - 10.1109/SEAI55746.2022.9832286

M3 - Conference contribution

AN - SCOPUS:85136329114

T3 - 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022

SP - 146

EP - 150

BT - 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022

Y2 - 10 June 2022 through 12 June 2022

ER -

Jin W, Liu J, Feng M, Ren J. Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation. In 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 146-150. (2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022). doi: 10.1109/SEAI55746.2022.9832286
