IIIT-BGP IWSLT 2026 Systems for Low-resource ST

Kaustuk Pratap Singh, Dipanshu ., Vedant Singh, Kumar Rishu


Abstract
We present low-resource Bhojpuri-Hindi speech translation systems for the IWSLT 2026 shared task, covering both end-to-end and cascaded settings. Our end-to-end model connects a Bhojpuri-finetuned Wav2Vec2 encoder to a pretrained NLLB-200 decoder via a lightweight interconnection adapter that combines learnable layer aggregation, CNN-based temporal compression, and Transformer refinement, with optional LoRA-based decoder adaptation. For our cascaded system, we finetune Whisper for Bhojpuri ASR and NLLB-200 for Hindi MT, and further apply QE Fusion with COMET-Kiwi to improve translation selection from beam candidates.
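As a rough illustration of the first two adapter stages named in the abstract, the following minimal sketch computes a softmax-weighted sum over encoder layers and then shortens the frame sequence. It is not the authors' implementation. The function names `aggregate_layers` and `compress_time` are hypothetical, the layer logits are fixed numbers rather than trained parameters, and mean pooling is swapped in for the CNN-based temporal compression. The Transformer refinement and LoRA adaptation are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_outputs, layer_logits):
    """Softmax-weighted sum over encoder layers (hypothetical helper).

    layer_outputs: list of L layers, each a list of T frames,
                   each frame a list of D floats.
    layer_logits:  L scalars; learnable in a real model, fixed here.
    Returns a list of T frames, each a list of D floats.
    """
    weights = softmax(layer_logits)
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    return [
        [
            sum(w * layer[t][d] for w, layer in zip(weights, layer_outputs))
            for d in range(dim)
        ]
        for t in range(num_frames)
    ]


def compress_time(frames, stride=2):
    """Mean-pool non-overlapping windows of `stride` frames.

    Assumption: this stands in for a strided convolution, which would
    use learned kernels instead of a plain average.
    """
    dim = len(frames[0])
    compressed = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        compressed.append(
            [sum(f[d] for f in window) / len(window) for d in range(dim)]
        )
    return compressed


# Toy input: 2 layers, 4 frames, 2 features per frame.
layers = [
    [[1.0, 0.0] for _ in range(4)],
    [[3.0, 2.0] for _ in range(4)],
]
aggregated = aggregate_layers(layers, layer_logits=[0.0, 0.0])
shortened = compress_time(aggregated, stride=2)
print(len(aggregated), len(shortened))
```

With equal logits, each layer receives weight 0.5, so every aggregated frame is the average of the two layers, and a stride of 2 halves the number of frames passed on to the decoder.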
Anthology ID:
2026.iwslt-1.31
Volume:
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month:
July
Year:
2026
Address:
San Diego, USA (in-person and online)
Editors:
Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues:
IWSLT | WS
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
272–283
URL:
https://aclanthology.org/2026.iwslt-1.31/
Cite (ACL):
Kaustuk Pratap Singh, Dipanshu ., Vedant Singh, and Kumar Rishu. 2026. IIIT-BGP IWSLT 2026 Systems for Low-resource ST. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 272–283, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal):
IIIT-BGP IWSLT 2026 Systems for Low-resource ST (Singh et al., IWSLT 2026)
PDF:
https://aclanthology.org/2026.iwslt-1.31.pdf