Abstract
Recent speech data augmentation approaches use static augmentation operations or policies with consistent magnitude scaling. However, little work has been done to explore the influence of dynamic magnitudes in augmentation policies. In this paper, we propose a novel speech data augmentation approach, CyclicAugment, which generates more diversified augmentation policies by dynamically configuring the magnitude of augmentation policies with a cosine annealing scheduler. We also propose additional augmentation operations to enlarge the diversity of augmentation policies. Motivated by learning rate warm restarts and cyclical learning rates, we hypothesize that dynamically configured magnitudes for augmentation policies can also help escape local optima more efficiently than static augmentation policies with consistent magnitude scaling. Experimental results demonstrate that our approach is effective at escaping local optima. It achieves a 12%-35% relative improvement in word error rate (WER) over SpecAugment and RandAugment on the LibriSpeech 960h dataset, and achieves a state-of-the-art phoneme error rate (PER) of 7.1% on the TIMIT 5h dataset.
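To make the scheduling idea concrete, below is a minimal Python sketch of a cosine-annealed augmentation magnitude with warm restarts, in the spirit of the approach described in the abstract. The function name `cyclic_magnitude` and the parameters `cycle_len`, `m_min`, and `m_max` are illustrative assumptions, not taken from the paper; the resulting magnitude would scale the strength of sampled augmentation operations (e.g., masking widths in a SpecAugment-style policy).

```python
import math

def cyclic_magnitude(step: int, cycle_len: int,
                     m_min: float = 0.0, m_max: float = 1.0) -> float:
    """Hypothetical sketch: augmentation magnitude at a training step.

    The magnitude decays from m_max to m_min over each cycle of
    cycle_len steps via cosine annealing, then jumps back to m_max
    (a warm restart), analogous to SGDR learning-rate schedules.
    The paper's exact schedule may differ.
    """
    t = step % cycle_len  # position within the current cycle
    cos_term = math.cos(math.pi * t / cycle_len)
    return m_min + 0.5 * (m_max - m_min) * (1.0 + cos_term)


if __name__ == "__main__":
    # Magnitude restarts at full strength every 1000 steps.
    for step in (0, 250, 500, 750, 999, 1000):
        print(step, round(cyclic_magnitude(step, cycle_len=1000), 3))
```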
Original language | English |
---|---|
Pages (from-to) | 3859-3863 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2022-September |
DOIs | |
Publication status | Published - 2022 |
Externally published | Yes |
Event | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of. Duration: 18 Sept 2022 → 22 Sept 2022 |
Keywords
- cosine annealing scheduler
- data augmentation
- random augmentation
- speech recognition
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation