Jonah Dauvet


2025

Automatic speech translation (AST) promotes seamless communication among speakers of different languages. While current state-of-the-art models excel with high-resource languages, their performance on low-resource languages (LRLs) is not well-established. We investigate this by evaluating state-of-the-art models on 10 LRLs with varying data amounts (10-30+ hours). Through six finetuning strategies and experimenting with three main AST paradigms, we observe that: (1) The latest Large Language Models (LLMs) may struggle with LRLs. (2) Comprehensive experiments suggest that for LRLs, more AST finetuning data is not always beneficial. (3) Our 2-Stage with ASR corrector finetuning recipe can substantially improve AST performance on LRLs, achieving up to a 5.8x BLEU score boost on translating related languages to English, while on par with the best monolingual finetuning in BLEU score when translating the target language to English. (4) We share effective engineering practices, including how to effectively adapt AST models to unseen languages.