TY - GEN
T1 - A Novel Two-step Fine-tuning Framework for Transfer Learning in Low-Resource Neural Machine Translation
AU - Gao, Yuan
AU - Hou, Feng
AU - Wang, Ruili
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Existing transfer learning methods for neural machine translation typically use a well-trained translation model (i.e., a parent model) of a high-resource language pair to directly initialize a translation model (i.e., a child model) of a low-resource language pair; the child model is then fine-tuned on the corresponding low-resource dataset. In this paper, we propose a novel two-step fine-tuning (TSFT) framework for transfer learning in low-resource neural machine translation. In the first step, we adjust the parameters of the parent model to fit the child source language using the child source data. In the second step, we transfer the adjusted parameters to the child model and fine-tune it with a proposed distillation loss for efficient optimization. Our experimental results on five low-resource translation tasks demonstrate that our framework yields significant improvements over various strong transfer learning baselines. Further analysis confirms the effectiveness of the different components of our framework.
UR - http://www.scopus.com/inward/record.url?scp=85197902914&partnerID=8YFLogxK
DO - 10.18653/v1/2024.findings-naacl.203
M3 - Conference contribution
AN - SCOPUS:85197902914
T3 - Findings of the Association for Computational Linguistics: NAACL 2024
SP - 3214
EP - 3224
BT - Findings of the Association for Computational Linguistics: NAACL 2024
A2 - Duh, Kevin
A2 - Gomez, Helena
A2 - Bethard, Steven
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the Association for Computational Linguistics: NAACL 2024
Y2 - 16 June 2024 through 21 June 2024
ER -
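
Note: the abstract above outlines a two-step fine-tuning (TSFT) procedure. The following is a minimal PyTorch-style sketch of that procedure as the abstract describes it, not the authors' implementation. The model interface (monolingual_loss, logits), the step-1 self-supervised objective, and the distillation formulation (temperature-scaled KL plus cross-entropy, with weights alpha and T) are illustrative assumptions; the record itself specifies only the two steps and the use of a distillation loss.

    # Hypothetical sketch of TSFT under the assumptions stated above.
    import copy
    import torch
    import torch.nn.functional as F

    def step1_adapt_parent(parent, child_src_batches, optimizer):
        """Step 1: adjust the parent model's parameters to fit the child
        source language, using only child-side source data. The abstract
        does not name the objective; a self-supervised monolingual loss
        exposed by the model is assumed here."""
        parent.train()
        for src in child_src_batches:
            optimizer.zero_grad()
            loss = parent.monolingual_loss(src)   # assumed model method
            loss.backward()
            optimizer.step()
        return parent

    def step2_finetune_child(parent, child_batches, optimizer,
                             alpha=0.5, T=2.0):
        """Step 2: transfer the adjusted parameters to the child model and
        fine-tune with a translation loss plus a distillation term against
        the frozen adapted parent. alpha and T are illustrative values."""
        teacher = copy.deepcopy(parent).eval()    # frozen adapted parent
        child = parent                            # transferred parameters
        child.train()
        for src, tgt in child_batches:
            optimizer.zero_grad()
            student_logits = child.logits(src, tgt)        # assumed API:
            with torch.no_grad():                          # (B, L, V) scores
                teacher_logits = teacher.logits(src, tgt)
            # Standard translation loss on the child data.
            ce = F.cross_entropy(student_logits.flatten(0, 1), tgt.flatten())
            # Temperature-scaled distillation term (one common formulation).
            kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                          F.softmax(teacher_logits / T, dim=-1),
                          reduction="batchmean") * T * T
            loss = (1 - alpha) * ce + alpha * kd
            loss.backward()
            optimizer.step()
        return child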