HITSZ’s End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track

Xuchen Wei; Yangxin Wu; Yaoyin Zhang; Henglyu Liu; Kehai Chen (陈科海); Xuefeng Bai (白雪峰); Min Zhang

doi:10.18653/v1/2025.iwslt-1.43

Abstract

This paper presents HITSZ’s submission to the IWSLT 2025 Indic track, focusing on speech-to-text translation (ST) for English-to-Indic and Indic-to-English language pairs. To enhance translation quality in this low-resource scenario, we propose an end-to-end system integrating the pre-trained Whisper automatic speech recognition (ASR) model with Krutrim, an Indic-specialized large language model (LLM). Experimental results demonstrate that our end-to-end system achieved average BLEU scores of 28.88 for English-to-Indic directions and 27.86 for Indic-to-English directions. Furthermore, we investigated the Chain-of-Thought (CoT) method. While this method showed potential for significant translation quality improvements on successfully parsed outputs (e.g., a 13.84 BLEU increase for Tamil-to-English), we observed challenges in ensuring the model consistently adheres to the required CoT output format.

Anthology ID: 2025.iwslt-1.43
Volume: Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Month: July
Year: 2025
Address: Vienna, Austria (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
Venues: IWSLT | WS
Publisher: Association for Computational Linguistics
Pages: 405–411
URL: https://aclanthology.org/2025.iwslt-1.43/
DOI: 10.18653/v1/2025.iwslt-1.43
Cite (ACL): Xuchen Wei, Yangxin Wu, Yaoyin Zhang, Henglyu Liu, Kehai Chen, Xuefeng Bai, and Min Zhang. 2025. HITSZ’s End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 405–411, Vienna, Austria (in-person and online). Association for Computational Linguistics.
Cite (Informal): HITSZ’s End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track (Wei et al., IWSLT 2025)
PDF: https://aclanthology.org/2025.iwslt-1.43.pdf
