Learnt dictionary based active learning method for environmental sound event tagging

Xiao Qin, Wanting Ji, Ruili Wang, Chang An Yuan

Research output: Journal Publication › Article › peer-review

3 Citations (Scopus)

Abstract

Sound event tagging is the process of attaching text or labels to sound segments based on their salient features and/or annotations. In practice, manual annotation is expensive, so tagged sound segments are scarce, while untagged sound segments can be obtained easily and cheaply. Semi-automatic tagging, which assigns labels to a large number of untagged sound segments from a small number of manually annotated ones, is therefore very important. Active learning is an effective technique for this problem: selected sound segments are tagged manually, and the remaining segments are tagged automatically. In this paper, a learnt dictionary based active learning method is proposed for environmental sound event tagging, which can significantly reduce the annotation cost of semi-automatic tagging. The proposed method is built on a learnt dictionary because dictionary learning is better suited to sound feature extraction. Tagging accuracy and annotation cost are used to measure the performance of the proposed method. Experimental results demonstrate that the proposed method achieves higher tagging accuracy at a much lower annotation cost than other existing methods.
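The split between manually and automatically tagged segments can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify: it assumes each segment is already represented by a feature vector (standing in for the sparse codes that a learnt dictionary would produce; that step is omitted here), clusters the vectors with a simple k-medoids loop, asks an annotator to tag only the medoids, and propagates each tag to the rest of its cluster. All function names, the `oracle` callback, and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        # A medoid always stays in its own cluster, so no cluster is empty.
        nearest = i if i in clusters else min(
            medoids, key=lambda m: sq_dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, n_iter=20):
    """Simple k-medoids; assumes k <= number of distinct points."""
    # Farthest-point initialisation keeps the sketch deterministic.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(points)),
            key=lambda i: min(sq_dist(points[i], points[m]) for m in medoids)))
    for _ in range(n_iter):
        clusters = assign(points, medoids)
        # New medoid = member with the smallest total distance to its cluster.
        new_medoids = [
            min(members,
                key=lambda c: sum(sq_dist(points[c], points[j]) for j in members))
            for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(points, medoids)


def tag_with_medoid_queries(points, k, oracle):
    """Query the annotator once per medoid and propagate the tag to its cluster."""
    clusters = k_medoids(points, k)
    labels = [None] * len(points)
    for medoid, members in clusters.items():
        tag = oracle(medoid)          # the only manual annotation for this cluster
        for i in members:
            labels[i] = tag           # automatic tagging of the remaining segments
    return labels, len(clusters)


# Toy data: two well-separated groups of hypothetical segment features.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
true_tags = ["bird"] * 4 + ["traffic"] * 4
labels, n_queries = tag_with_medoid_queries(points, 2, lambda i: true_tags[i])
```

In this toy case two manual queries are enough to tag all eight segments; on real recordings the accuracy of the propagated tags depends on how well the clusters align with the sound classes.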

Original language: English
Pages (from-to): 29493-29508
Number of pages: 16
Journal: Multimedia Tools and Applications
Volume: 78
Issue number: 20
Publication status: Published - 1 Oct 2019
Externally published: Yes

Keywords

  • Active learning
  • Dictionary learning
  • Internet of things
  • k-medoids clustering
  • Sound event tagging
  • Sparse coding

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications
