GAN-in-GAN for Monaural Speech Enhancement

Yicun Duan, Jianfeng Ren, Heng Yu, Xudong Jiang

Research output: Journal Publication › Article › peer-review

4 Citations (Scopus)

Abstract

Some generative adversarial networks (GANs) have been developed to remove background noise in real-world audio recordings. MetricGAN and its variants focus on generating a clean spectrogram from a noisy one, but the final audio quality cannot be guaranteed. SEGAN and its variants directly generate enhanced audio from noisy audio, but their over-long input representations make them less effective at identifying and removing audio noise. In this letter, a novel GAN-in-GAN framework is proposed, where the inner GAN conducts spectrogram-to-spectrogram recovery under the supervision of metric discriminators to effectively remove the audio noise, and the outer GAN conducts audio-to-audio recovery under the supervision of multi-resolution discriminators to optimize the final audio quality. To tackle the challenge of simultaneously utilizing multiple adversarial losses when training the proposed GAN-in-GAN, a novel gradient balancing scheme is proposed to facilitate coherent training. The proposed method is compared with state-of-the-art methods on the VoiceBank+DEMAND dataset for audio denoising, and it outperforms all the compared methods.
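The abstract mentions balancing multiple adversarial losses so that no single discriminator dominates joint training. The paper's exact scheme is not specified here; the following is a minimal hypothetical sketch of one common form of gradient balancing, in which each loss's gradient with respect to the shared generator parameters is rescaled to a common L2 norm before the gradients are summed. The function name `balance_gradients` and the toy gradients are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def balance_gradients(grads, eps=1e-12):
    """Rescale each per-loss gradient to the mean L2 norm, then sum.

    Hypothetical illustration of gradient balancing: when several
    adversarial losses (e.g. from metric discriminators and
    multi-resolution discriminators) update shared parameters, one
    loss can dominate; equalizing gradient magnitudes keeps the
    combined update coherent.
    """
    norms = [np.linalg.norm(g) for g in grads]
    target = float(np.mean(norms))  # common target magnitude
    balanced = [g * (target / (n + eps)) for g, n in zip(grads, norms)]
    return np.sum(balanced, axis=0)

# Toy example: two loss gradients of very different scales.
g_inner = np.array([10.0, 0.0])  # e.g. inner-GAN (metric) loss gradient
g_outer = np.array([0.0, 0.1])   # e.g. outer-GAN (waveform) loss gradient
update = balance_gradients([g_inner, g_outer])
# Both contributions now carry equal weight in the combined update.
```

In this sketch the dominant gradient is scaled down and the weak one scaled up, so each discriminator's feedback contributes equally to the generator update, which is one plausible way to realize the "coherent training" the abstract describes.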

Original language: English
Pages (from-to): 853-857
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 30
Publication status: Published - 2023

Keywords

  • GAN-in-GAN
  • Generative adversarial network
  • gradient balancing
  • speech enhancement

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics
