PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition

Chengxi Lei, Satwinder Singh, Feng Hou, Xiaoyun Jia, Ruili Wang

Research output: Chapter in Book/Conference proceeding; Conference contribution; peer-review

Abstract

Most current speech data augmentation methods operate on either the raw waveform or the amplitude spectrum of speech. In this paper, we propose a novel speech data augmentation method called PhasePerturbation that operates dynamically on the phase spectrum of speech. Instead of statically rotating the phase by a constant angle, PhasePerturbation applies three dynamic phase spectrum operations, namely a randomization operation, a frequency masking operation, and a temporal masking operation, to enhance the diversity of speech data. We conduct experiments on wav2vec2.0 pre-trained ASR models by fine-tuning them on the TIMIT corpus augmented with PhasePerturbation. The experimental results demonstrate a 10.9% relative reduction in word error rate (WER) compared with the baseline model fine-tuned without any augmentation. Furthermore, the proposed method achieves additional WER improvements (12.9% and 15.9%) when combined with Vocal Tract Length Perturbation (VTLP) and SpecAug, both of which are amplitude spectrum-based augmentation methods. These results highlight the capability of PhasePerturbation to improve current amplitude spectrum-based augmentation methods.
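The abstract names the three phase operations but does not specify how they are computed, so the following is only a minimal sketch of what such operations might look like, not the authors' implementation. It assumes a complex STFT matrix as input, leaves the amplitude untouched, and uses hypothetical choices throughout: the function name `perturb_phase`, the parameters `max_shift`, `freq_width`, and `time_width`, uniform noise for the randomization step, and zeroing the phase inside masked regions.

```python
import numpy as np


def perturb_phase(spec, rng, max_shift=np.pi / 8, freq_width=8, time_width=10):
    """Perturb only the phase of a complex spectrogram.

    spec: complex array of shape (freq_bins, frames), e.g. an STFT.
    All parameter names and defaults are illustrative assumptions.
    """
    magnitude = np.abs(spec)
    phase = np.angle(spec)
    n_freq, n_frames = phase.shape

    # 1) Randomization (assumed form): add a small random offset,
    #    drawn independently for every time-frequency bin.
    phase = phase + rng.uniform(-max_shift, max_shift, size=phase.shape)

    # 2) Frequency masking (assumed form): set the phase to zero in a
    #    randomly placed band of consecutive frequency bins.
    f_len = int(rng.integers(0, min(freq_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f_len + 1))
    phase[f0:f0 + f_len, :] = 0.0

    # 3) Temporal masking (assumed form): set the phase to zero in a
    #    randomly placed run of consecutive frames.
    t_len = int(rng.integers(0, min(time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t_len + 1))
    phase[:, t0:t0 + t_len] = 0.0

    # Recombine the unchanged amplitude with the perturbed phase.
    return magnitude * np.exp(1j * phase)


# Usage on a synthetic spectrogram (random data, for illustration only).
data_rng = np.random.default_rng(0)
spec = data_rng.standard_normal((64, 50)) + 1j * data_rng.standard_normal((64, 50))
augmented = perturb_phase(spec, np.random.default_rng(1))
```

Because only the phase is modified, a sketch like this could in principle be chained with an amplitude-based augmentation, which is the kind of complementarity the abstract reports for VTLP and SpecAug.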

Original language: English
Title of host publication: Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 Workshops
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400703263
Publication status: Published - 6 Dec 2023
Externally published: Yes
Event: 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 Workshops - Hybrid, Tainan, Taiwan, Province of China
Duration: 6 Dec 2023 to 8 Dec 2023

Publication series

Name: Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 Workshops

Conference

Conference: 5th ACM International Conference on Multimedia in Asia, MMAsia 2023 Workshops
Country/Territory: Taiwan, Province of China
City: Hybrid, Tainan
Period: 6/12/23 to 8/12/23

Keywords

  • data augmentation
  • phase spectrum augmentation
  • speech recognition

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
