Abstract
Several generative adversarial networks (GANs) have been developed to remove background noise from real-world audio recordings. MetricGAN and its variants focus on generating a clean spectrogram from a noisy one, but the quality of the final audio cannot be guaranteed. SEGAN and its variants directly generate enhanced audio from noisy audio, but their overly long input representations make them less effective at identifying and removing audio noise. In this letter, a novel GAN-in-GAN framework is proposed, in which the inner GAN performs spectrogram-to-spectrogram recovery under the supervision of metric discriminators to effectively remove the audio noise, and the outer GAN performs audio-to-audio recovery under the supervision of multi-resolution discriminators to optimize the final audio quality. To tackle the challenge of training the proposed GAN-in-GAN with multiple adversarial losses simultaneously, a novel gradient balancing scheme is proposed to facilitate coherent training. The proposed method is compared with state-of-the-art methods on the VoiceBank+DEMAND dataset for audio denoising and outperforms all compared methods.
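The abstract does not give the details of the gradient balancing scheme; the snippet below is only a minimal sketch of one common way to balance multiple adversarial losses, namely rescaling each loss so that its gradient norm on the shared generator parameters is comparable. It assumes PyTorch, and all names (`gradient_balance`, `metric_disc_loss`, `multires_disc_loss`, `generator`) are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of gradient-balanced training with two adversarial losses
# (an assumed equalization-by-gradient-norm scheme, not the paper's exact method).
import torch

def gradient_balance(losses, shared_params, eps=1e-8):
    """Return per-loss weights that equalize gradient norms on shared parameters."""
    norms = []
    for loss in losses:
        grads = torch.autograd.grad(loss, shared_params,
                                    retain_graph=True, allow_unused=True)
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads if g is not None))
        norms.append(norm)
    ref = torch.stack(norms).mean()            # reference gradient scale
    return [(ref / (n + eps)).detach() for n in norms]

# Illustrative use inside one generator update step:
# adv_inner = metric_disc_loss(inner_gan_output)    # spectrogram-level adversarial loss
# adv_outer = multires_disc_loss(outer_gan_output)  # waveform-level adversarial loss
# w = gradient_balance([adv_inner, adv_outer], list(generator.parameters()))
# total = w[0] * adv_inner + w[1] * adv_outer
# total.backward()
```

Detaching the weights treats them as constants in the backward pass, so neither loss's scale is itself optimized; only the relative gradient contributions are kept comparable.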
| Original language | English |
| --- | --- |
| Pages (from-to) | 853-857 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 30 |
| DOIs | |
| Publication status | Published - 2023 |
Keywords
- GAN-in-GAN
- Generative adversarial network
- gradient balancing
- speech enhancement
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering
- Applied Mathematics