Federico Costa

2026

BSC’s Submission to the Instruction Following Track of IWSLT 2026
Oriol Pareras | Joan Llado | Pol Buitrago | Marc Casals-Salvador | Federico Costa | Cristina España-Bonet
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)

We present the Barcelona Supercomputing Center (BSC) submission to the Instruction Following (IF) track of IWSLT 2026, which evaluates unified spoken language systems that solve multiple tasks through natural language instructions. Our system is an end-to-end (E2E) architecture that combines a speech encoder with a translation-oriented Large Language Model. The model is trained on speech and text data covering automatic speech recognition, translation, question answering, and instruction following. We investigate a Chain-of-Thought (CoT) generation strategy that explicitly decomposes each task by producing an intermediate transcription before the final output; this enables effective reuse of text-only supervision and improves robustness across tasks. To further support generalization, we design diverse prompt formulations and align text-only and speech inputs under a shared inference pattern. Results on the IWSLT 2025 evaluation data show that our approach is competitive across tasks and reaches state-of-the-art performance on some of them.
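The CoT strategy described above can be illustrated with a minimal data-formatting sketch. It is not the authors' implementation: the `<transcript>` markers, the prompt templates, and the names `Example`, `build_cot_sample` and `split_cot` are all hypothetical. It also assumes that text-only samples keep the transcription step (copying the input text) so that speech and text inputs share one target pattern.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical instruction templates; the real prompt wording is not given here.
# Several paraphrases per task stand in for the "diverse prompt formulations".
PROMPTS = {
    "asr": ["Transcribe the input.", "Write down exactly what is said."],
    "st": ["Translate the input into {lang}.", "Give the {lang} translation of the input."],
    "qa": ["Answer this question about the input: {question}"],
}

TAG_OPEN, TAG_CLOSE = "<transcript>", "</transcript>"  # hypothetical CoT markers


@dataclass
class Example:
    task: str                         # "asr", "st" or "qa"
    transcript: str                   # source-language text (spoken or written)
    output: str                       # final task output
    lang: str = ""                    # target language for translation
    question: str = ""                # question for QA
    audio_path: Optional[str] = None  # None marks a text-only example


def build_cot_sample(ex: Example, rng: random.Random) -> dict:
    """Build one training sample whose target is: transcript first, then the answer."""
    instruction = rng.choice(PROMPTS[ex.task]).format(lang=ex.lang, question=ex.question)
    speech = ex.audio_path is not None
    # Speech samples give the model audio; text-only samples give it the text directly.
    model_input = {"audio": ex.audio_path if speech else None,
                   "text": None if speech else ex.transcript}
    # Step 1 (assumed shared by both modalities): the intermediate transcription.
    target = f"{TAG_OPEN}{ex.transcript}{TAG_CLOSE}"
    if ex.task != "asr":
        # Step 2: the final output, conditioned on the transcript above.
        target += "\n" + ex.output
    return {"input": model_input, "instruction": instruction, "target": target}


def split_cot(generation: str) -> tuple[str, str]:
    """Split a generation into (intermediate transcript, final output)."""
    if generation.startswith(TAG_OPEN) and TAG_CLOSE in generation:
        transcript, _, rest = generation[len(TAG_OPEN):].partition(TAG_CLOSE)
        answer = rest.strip()
        # ASR has no second step, so the transcript is the final output.
        return transcript, answer or transcript
    # No CoT markup: treat the whole generation as the final output.
    return "", generation.strip()


rng = random.Random(0)
speech_ex = Example("st", "hola mundo", "hello world", lang="English", audio_path="clip.wav")
text_ex = Example("st", "hola mundo", "hello world", lang="English")
# Both modalities share the same target, so text-only data supervises the same pattern.
print(build_cot_sample(speech_ex, rng)["target"] == build_cot_sample(text_ex, rng)["target"])  # prints True
```

Under these assumptions, the second segment of the target has the same form whatever the input modality. Text-only translation or QA pairs can therefore supervise the final-output step that speech inputs reach after transcribing.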
