Abstract
Recent speech data augmentation approaches use static augmentation operations or policies with consistent magnitude scaling. However, little work has been done to explore the influence of dynamic magnitudes in augmentation policies. In this paper, we propose a novel speech data augmentation approach, CyclicAugment, which generates more diversified augmentation policies by dynamically configuring the magnitude of augmentation policies with a cosine annealing scheduler. We also propose additional augmentation operations to enlarge the diversity of augmentation policies. Motivated by learning rate warm restarts and cyclical learning rates, we hypothesize that dynamically configured magnitudes for augmentation policies can also help escape local optima more efficiently than static augmentation policies with consistent magnitude scaling. Experimental results demonstrate that our approach is effective at escaping local optima. It achieves a 12%-35% relative improvement in word error rate (WER) over SpecAugment and RandAugment on the LibriSpeech 960h dataset, and achieves a state-of-the-art phoneme error rate (PER) of 7.1% on the TIMIT 5h dataset.
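To make the scheduling idea concrete, below is a minimal Python sketch of a cosine-annealed augmentation magnitude with warm restarts, in the spirit of the approach described in the abstract. The function name `cyclic_magnitude` and the parameters `cycle_len`, `m_min`, and `m_max` are illustrative assumptions, not taken from the paper; the resulting magnitude would scale the strength of sampled augmentation operations (e.g., masking widths in a SpecAugment-style policy).

```python
import math

def cyclic_magnitude(step: int, cycle_len: int,
                     m_min: float = 0.0, m_max: float = 1.0) -> float:
    """Hypothetical sketch: augmentation magnitude at a training step.

    The magnitude decays from m_max to m_min over each cycle of
    cycle_len steps via cosine annealing, then jumps back to m_max
    (a warm restart), analogous to SGDR learning-rate schedules.
    The paper's exact schedule may differ.
    """
    t = step % cycle_len  # position within the current cycle
    cos_term = math.cos(math.pi * t / cycle_len)
    return m_min + 0.5 * (m_max - m_min) * (1.0 + cos_term)


if __name__ == "__main__":
    # Magnitude restarts at full strength every 1000 steps.
    for step in (0, 250, 500, 750, 999, 1000):
        print(step, round(cyclic_magnitude(step, cycle_len=1000), 3))
```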
Original language | English |
---|---|
Pages (from-to) | 3859-3863 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2022-September |
DOIs | |
Publication status | Published - 2022 |
Externally published | Yes |
Event | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of. Duration: 18 Sept 2022 → 22 Sept 2022 |
Keywords
- cosine annealing scheduler
- data augmentation
- random augmentation
- speech recognition
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation