KIT’s Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization

Zhaolin Li, Yining Liu, Danni Liu, Tuan Nam Nguyen, Enes Yavuz Ugan, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues

doi:10.18653/v1/2025.iwslt-1.20

Abstract

This paper presents KIT’s submissions to the IWSLT 2025 low-resource track. We develop both cascaded systems, consisting of Automatic Speech Recognition (ASR) and Machine Translation (MT) models, and end-to-end (E2E) Speech Translation (ST) systems for three language pairs: Bemba, North Levantine Arabic, and Tunisian Arabic into English. Building upon pre-trained models, we fine-tune our systems with different strategies to utilize resources efficiently. This study further explores system enhancement with synthetic data and model regularization. Specifically, we investigate MT-augmented ST by generating translations from ASR data using MT models. For North Levantine, which lacks parallel ST training data, a system trained solely on synthetic data slightly surpasses the cascaded system trained on real data. We also explore augmentation using text-to-speech models by generating synthetic speech from MT data, demonstrating the benefits of synthetic data in improving both ASR and ST performance for Bemba. Additionally, we apply intra-distillation to enhance model performance. Our experiments show that this approach consistently improves results across ASR, MT, and ST tasks, as well as across different pre-trained models. Finally, we apply Minimum Bayes Risk decoding to combine the cascaded and end-to-end systems, achieving an improvement of approximately 1.5 BLEU points.
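The final step in the abstract combines cascaded and end-to-end outputs with Minimum Bayes Risk (MBR) decoding. The Python below is a minimal sketch of one common form of MBR system combination, under stated assumptions. It pools candidate translations from both systems, treats every other candidate as a uniformly weighted pseudo-reference, and keeps the candidate with the highest average utility. This page does not specify the paper's utility metric, candidate-list sizes, or weighting. The simplified chrF-style utility and the function names `chrf_like` and `mbr_select` are hypothetical stand-ins, not the authors' implementation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of a string, ignoring spaces (as chrF does)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf_like(hyp: str, ref: str, max_n: int = 4, beta: float = 2.0) -> float:
    """Hypothetical utility: a simplified chrF (averaged char n-gram P/R, F-beta)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        if h:
            precisions.append(overlap / sum(h.values()))
        if r:
            recalls.append(overlap / sum(r.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    rec = sum(recalls) / len(recalls)
    if p + rec == 0:
        return 0.0
    return (1 + beta**2) * p * rec / (beta**2 * p + rec)


def mbr_select(candidates: list[str], utility=chrf_like) -> str:
    """Pick the candidate with the highest mean utility against all other
    candidates (assumption: uniform weights, no model probabilities)."""
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = candidates[0], float("-inf")
    for i, hyp in enumerate(candidates):
        others = [ref for j, ref in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in others) / len(others)
        if score > best_score:
            best, best_score = hyp, score
    return best


# Usage: pool n-best outputs from a cascaded and an end-to-end system.
cascaded = ["the children are going to school", "children go to the school"]
e2e = ["the children are going to school today", "a child walks to school"]
print(mbr_select(cascaded + e2e))
```

Pooling both systems' hypotheses lets each system act as evidence for the other. A candidate wins when it agrees with many outputs from either system. Weighting pseudo-references by model scores or using a learned metric are common variants.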

Anthology ID:: 2025.iwslt-1.20
Volume:: Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Month:: July
Year:: 2025
Address:: Vienna, Austria (in-person and online)
Editors:: Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
Venues:: IWSLT | WS
Publisher:: Association for Computational Linguistics
Pages:: 212–221
URL:: https://aclanthology.org/2025.iwslt-1.20/
DOI:: 10.18653/v1/2025.iwslt-1.20
Cite (ACL):: Zhaolin Li, Yining Liu, Danni Liu, Tuan Nam Nguyen, Enes Yavuz Ugan, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, and Jan Niehues. 2025. KIT’s Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 212–221, Vienna, Austria (in-person and online). Association for Computational Linguistics.
Cite (Informal):: KIT’s Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization (Li et al., IWSLT 2025)
PDF:: https://aclanthology.org/2025.iwslt-1.20.pdf