Chad Howe

2026

Team QUESPA System Submission for the IWSLT 2026 Dialectal and Low-resource Speech Translation Task
John E. Ortega | Rodolfo Joel Zevallos | Fabrício Carraro | Stephanny Gabriela Sánchez Bautista | Chad Howe
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)

This paper describes the QUESPA team’s speech translation (ST) submissions for the Quechua to Spanish (QUE-SPA) track of the IWSLT 2026 Evaluation Campaign on dialectal and low-resource speech translation. The campaign supports a single submission category: unconstrained. This is our fourth consecutive participation in the IWSLT shared task, and our systems build on our prior submissions with substantial improvements. Our 2026 submission comprises three systems, all unconstrained. The best-performing system (contrastive 2) extends our strongest model from the previous year. It leverages a high-performing pre-trained language model (PLM) for end-to-end speech translation without cascading, augmented with additional Quechua-Collao text that is now available on the IWSLT GitHub. Fine-tuning Microsoft’s SpeechT5 model in an ST setting, combined with targeted data augmentation, yields a BLEU score of 27.2 on the official evaluation set. For the first time, we also evaluate prompt-based machine translation with Gemini, DeepSeek, GPT-5, Claude, and Qwen. In addition, we introduce SIDON, an audio enhancement framework designed to improve input audio quality. This paper provides a comparative analysis of our current submission and our three previous IWSLT submissions, examining in detail how synthetic data, unconstrained external resources, and audio enhancement techniques affect fine-tuning performance. Our results highlight the complementary roles of PLM-based ST, LLM prompting, and ASR enhancement in advancing low-resource speech translation.
