JHU IWSLT 2024 Dialectal and Low-resource System Description

Nathaniel Romney Robinson; Kaiser Sun; Cihan Xiao; Niyati Bafna; Weiting Tan; Haoran Xu; Henry Li Xinyuan; Ankur Kejriwal; Sanjeev Khudanpur; Kenton Murray; Paul McNamee

doi:10.18653/v1/2024.iwslt-1.19

Abstract

Johns Hopkins University (JHU) submitted systems for all eight language pairs in the 2024 Low-Resource Language Track. The main effort of this work revolves around fine-tuning large, publicly available models in three proposed systems: i) end-to-end speech translation (ST) fine-tuning of SeamlessM4T v2; ii) ST fine-tuning of Whisper; iii) a cascaded system involving automatic speech recognition with fine-tuned Whisper and machine translation with NLLB. On top of the systems above, we conduct a comparative analysis of different training paradigms, such as intra-distillation for NLLB as well as joint training and curriculum learning for SeamlessM4T v2. Our results show that the best-performing approach differs by language pair, but that i) fine-tuned SeamlessM4T v2 tends to perform best for source languages on which it was pre-trained, ii) multi-task training helps Whisper fine-tuning, iii) cascaded systems with Whisper and NLLB tend to outperform Whisper alone, and iv) intra-distillation helps NLLB fine-tuning.

Anthology ID: 2024.iwslt-1.19
Volume: Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
Month: August
Year: 2024
Address: Bangkok, Thailand (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue: IWSLT
Publisher: Association for Computational Linguistics
Pages: 140–153
URL: https://aclanthology.org/2024.iwslt-1.19
DOI: 10.18653/v1/2024.iwslt-1.19
Cite (ACL): Nathaniel Romney Robinson, Kaiser Sun, Cihan Xiao, Niyati Bafna, Weiting Tan, Haoran Xu, Henry Li Xinyuan, Ankur Kejriwal, Sanjeev Khudanpur, Kenton Murray, and Paul McNamee. 2024. JHU IWSLT 2024 Dialectal and Low-resource System Description. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), pages 140–153, Bangkok, Thailand (in-person and online). Association for Computational Linguistics.
Cite (Informal): JHU IWSLT 2024 Dialectal and Low-resource System Description (Romney Robinson et al., IWSLT 2024)
PDF: https://aclanthology.org/2024.iwslt-1.19.pdf
