Recent Highlights in Multilingual and Multimodal Speech Translation

Danni Liu, Jan Niehues


Abstract
Speech translation (ST) has witnessed significant progress, driven by advances in modeling techniques and the growing availability of training data. In this paper, we highlight recent advances in two ongoing research directions in ST: scaling models to 1) many translation directions (multilingual ST) and 2) output modalities beyond text (multimodal ST). We structure this review around the sequential stages of a model's development lifecycle: determining training resources, selecting the model architecture, designing training procedures, choosing evaluation metrics, and addressing deployment considerations. For each stage we highlight recent developments, with a particular focus on model architectures (dedicated speech translation models and LLM-based general-purpose models) and training procedures (task-specific vs. task-invariant approaches). Based on the reviewed advancements, we identify and discuss ongoing challenges in the field of speech translation.
Anthology ID: 2024.iwslt-1.29
Volume: Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
Month: August
Year: 2024
Address: Bangkok, Thailand (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue: IWSLT
Publisher: Association for Computational Linguistics
Pages: 235–253
URL: https://aclanthology.org/2024.iwslt-1.29
DOI: 10.18653/v1/2024.iwslt-1.29
Cite (ACL): Danni Liu and Jan Niehues. 2024. Recent Highlights in Multilingual and Multimodal Speech Translation. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), pages 235–253, Bangkok, Thailand (in-person and online). Association for Computational Linguistics.
Cite (Informal): Recent Highlights in Multilingual and Multimodal Speech Translation (Liu & Niehues, IWSLT 2024)
PDF: https://aclanthology.org/2024.iwslt-1.29.pdf