Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

Wangkai Jin, Junyu Liu, Meili Feng, Jianfeng Ren

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

The challenge of polyphonic sound event detection (PSED) lies in detecting multiple overlapping events in a time series. Recent efforts apply Deep Neural Networks (DNNs) to Time-Frequency Representations (TFRs) of audio clips to address this challenge. However, existing solutions often rely on a single type of TFR, which under-utilizes the input features. To this end, we propose a novel PSED framework that incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that TFRs of different types or at different scales reveal acoustic patterns in a complementary manner, so overlapping events can be extracted more effectively by combining different TFRs. Moreover, our framework adaptively fuses different models and TFRs, which significantly improves overall performance. We quantitatively examine the benefits of our framework using Capsule Neural Networks, a state-of-the-art approach for PSED. The experimental results show that our method achieves a 7% reduction in error rate compared with state-of-the-art solutions on the TUT-SED 2016 dataset.
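The abstract does not say which error-rate variant is reported. As a rough illustration only, the sketch below assumes the segment-based error rate commonly used for sound event detection benchmarks, ER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions per segment and N is the number of active reference events. The function name `segment_error_rate` and the toy labels are hypothetical and are not taken from the paper.

```python
def segment_error_rate(reference, predicted):
    """Segment-based error rate (S + D + I) / N.

    reference, predicted: lists of sets, one set of active event
    labels per time segment. This is an assumed metric definition,
    not necessarily the exact one used by the authors.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    subs = dels = ins = total = 0
    for ref, pred in zip(reference, predicted):
        false_neg = len(ref - pred)   # reference events that were missed
        false_pos = len(pred - ref)   # predicted events not in the reference
        # A miss paired with a false alarm in the same segment is a substitution.
        s = min(false_neg, false_pos)
        subs += s
        dels += false_neg - s         # remaining misses are deletions
        ins += false_pos - s          # remaining false alarms are insertions
        total += len(ref)             # N: active reference events

    if total == 0:
        raise ValueError("reference contains no active events")
    return (subs + dels + ins) / total


# Toy example with overlapping (polyphonic) events in the first segment.
reference = [{"speech", "car"}, {"speech"}, {"car"}, set()]
predicted = [{"speech"}, {"speech", "dog"}, {"dog"}, set()]
print(segment_error_rate(reference, predicted))  # 0.75
```

In the toy example, the first segment contributes one deletion, the second one insertion and the third one substitution, giving three errors over four reference events.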

Original language: English
Title of host publication: 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 146-150
Number of pages: 5
ISBN (Electronic): 9781665482233
Publication status: Published - 2022
Event: 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022 - Xiamen, China
Duration: 10 Jun 2022 – 12 Jun 2022

Publication series

Name: 2022 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022

Conference

Conference: 2nd IEEE International Conference on Software Engineering and Artificial Intelligence, SEAI 2022
Country/Territory: China
City: Xiamen
Period: 10/06/22 – 12/06/22

Keywords

  • capsule neural network
  • polyphonic sound event detection
  • time-frequency representation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Software
  • Control and Optimization
