Adversarial latent representation learning for speech enhancement

Yuanhang Qiu, Ruili Wang

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

This paper proposes a novel adversarial latent representation learning (ALRL) method for speech enhancement. Building on adversarial feature learning, ALRL employs an extra encoder that learns an inverse mapping from the generated data distribution to the latent space. The encoder forms an inner connection with the generator and provides relevant latent information for adversarial feature modelling. A new loss function is proposed to train this encoder mapping jointly. In addition, multi-head self-attention is applied to the encoder to capture long-range dependencies and learn more effective adversarial representations. Experimental results demonstrate that ALRL outperforms current GAN-based speech enhancement methods.
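
The abstract describes an encoder that inverts the generator (mapping enhanced speech features back to the latent space) and applies multi-head self-attention over time frames. As a rough, non-authoritative sketch of what such a component could look like, the PyTorch fragment below defines an attention-based encoder and a simple latent-matching loss; the class name, dimensions, pooling choice, and mean-squared loss are assumptions for illustration and are not taken from the paper.

# Minimal illustrative sketch, not the authors' code: an encoder that maps
# enhanced speech features back to a latent vector using multi-head
# self-attention, plus a simple latent-matching loss. All dimensions and the
# mean-squared loss are assumptions.
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    def __init__(self, feat_dim=257, latent_dim=128, num_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, latent_dim)
        # batch_first requires PyTorch >= 1.9
        self.attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.out = nn.Linear(latent_dim, latent_dim)

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        h = self.proj(x)                  # frame-wise projection
        h, _ = self.attn(h, h, h)         # long-range dependencies across frames
        h = h.mean(dim=1)                 # average-pool over time
        return self.out(h)                # recovered latent code

def latent_matching_loss(recovered_z, z):
    # Push the recovered code E(G(z, noisy)) towards the generator's input code z.
    return torch.mean((recovered_z - z) ** 2)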

Original language: English
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Pages: 2662-2666
Number of pages: 5
ISBN (Print): 9781713820697
DOIs
Publication status: Published - 2020
Externally published: Yes
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 - 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 - 29/10/20

Keywords

  • Adversarial feature learning
  • Latent space
  • Speech enhancement

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
