Mix-fine-tune: An Alternate Fine-tuning Strategy for Domain Adaptation and Generalization of Low-resource ASR

Chengxi Lei, Satwinder Singh, Feng Hou, Ruili Wang

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Self-supervised Learning (SSL) using extensive unlabeled speech data has significantly improved the performance of ASR models on datasets like LibriSpeech. However, few studies have addressed the issue of domain mismatch between the data used to pre-train and fine-tune ASR models. Moreover, the Empirical Risk Minimization (ERM) principle, commonly used to train deep learning models, often causes the trained models to exhibit undesirable behaviors such as memorizing training data and being sensitive to adversarial examples. Thus, in this paper, we propose an alternate fine-tuning strategy, called Mix-fine-tune, to address domain mismatch in ASR systems and the limitations of the ERM training principle. Mix-fine-tune uses a data-driven weighted sum of two speech sequences as input, and the corresponding text sequences are used to calculate a weighted audio-text alignment Connectionist Temporal Classification (CTC) loss for fine-tuning a pre-trained model. Additionally, Mix-fine-tune incorporates the masked Contrastive Predictive Coding (CPC) loss, previously used exclusively for pre-training, into the fine-tuning process. Our novel strategy alternates between minimizing the CTC loss and the CPC loss to address the domain mismatch between pre-training and fine-tuning. We validate our method by fine-tuning different sizes of the Wav2Vec model using the public Air Traffic Control (ATC) corpus. The experiments show that Mix-fine-tune efficiently adapts models pre-trained on general speech corpora like LibriSpeech to a specific domain (e.g., the air traffic control domain) through fine-tuning.
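The abstract describes a mixup-style construction: two speech sequences are combined by a weighted sum, and the CTC loss is computed against both transcripts and combined with the same weight. The sketch below illustrates that idea only; the function names (`mix_inputs`, `weighted_ctc_loss`), the Beta-distributed mixing weight, and the stand-in loss are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mix_inputs(x1, x2, lam):
    """Weighted sum of two same-length waveforms, with weight lam in [0, 1]."""
    return lam * x1 + (1.0 - lam) * x2

def weighted_ctc_loss(ctc_loss_fn, x_mix, y1, y2, lam):
    """Weighted audio-text alignment loss: the mixed input is scored against
    both transcripts, and the two losses are combined by the mixing weight."""
    return lam * ctc_loss_fn(x_mix, y1) + (1.0 - lam) * ctc_loss_fn(x_mix, y2)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(16000)  # 1 s of audio at 16 kHz (illustrative)
x2 = rng.standard_normal(16000)
lam = rng.beta(0.5, 0.5)         # assumed Beta prior on the mixing weight
x_mix = mix_inputs(x1, x2, lam)

# Stand-in loss for illustration only; a real system would apply a CTC loss
# to the model's per-frame log-probabilities (e.g. torch.nn.CTCLoss).
dummy_loss = lambda x, y: float(np.mean(x ** 2)) + len(y)
loss = weighted_ctc_loss(dummy_loss, x_mix, "hello", "world", lam)
```

In an actual fine-tuning loop, minimizing this weighted CTC objective would alternate with minimizing the masked CPC objective, per the strategy the abstract outlines.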

Original language: English
Title of host publication: Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400712739
Publication status: Published - 28 Dec 2024
Externally published: Yes
Event: 6th ACM International Conference on Multimedia in Asia, MMAsia 2024 - Auckland, New Zealand
Duration: 3 Dec 2024 – 6 Dec 2024

Publication series

Name: Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024

Conference

Conference: 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
Country/Territory: New Zealand
City: Auckland
Period: 3/12/24 – 6/12/24

Keywords

  • domain adaptation
  • fine-tuning
  • self-supervised learning
  • speech recognition

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
