Kaustuk Pratap Singh
2026
IIIT-BGP IWSLT 2026 Systems for Low-resource ST
Kaustuk Pratap Singh | Dipanshu . | Vedant Singh | Kumar Rishu
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
We present low-resource Bhojpuri-Hindi speech translation systems for the IWSLT 2026 shared task, covering both end-to-end and cascaded settings. Our end-to-end model connects a Bhojpuri-finetuned Wav2Vec2 encoder to a pretrained NLLB-200 decoder via a lightweight interconnection adapter that combines learnable layer aggregation, CNN-based temporal compression, and Transformer refinement, with optional LoRA-based decoder adaptation. For our cascaded system, we finetune Whisper for Bhojpuri ASR and NLLB-200 for Hindi MT, and further apply QE Fusion with COMET-Kiwi to improve translation selection from beam candidates.
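As a rough illustration of the first two adapter stages named above, the sketch below shows one way learnable layer aggregation and temporal compression could work. It is not the authors' implementation: the function names `aggregate_layers` and `compress_time` are hypothetical, the layer weights are assumed to be a softmax over one learnable scalar per encoder layer, and plain mean pooling stands in for the CNN-based compression. The Transformer refinement and LoRA stages are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_outputs, layer_logits):
    """Combine encoder layers with softmax-normalized weights.

    layer_outputs: list of L layers, each a list of T frames,
                   each frame a list of D floats.
    layer_logits:  list of L scalars (assumed learnable in a real model).
    Returns a single sequence of T frames with D features each.
    """
    weights = softmax(layer_logits)
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    return [
        [
            sum(w * layer[t][d] for w, layer in zip(weights, layer_outputs))
            for d in range(dim)
        ]
        for t in range(num_frames)
    ]


def compress_time(frames, stride):
    """Shorten the sequence by mean-pooling non-overlapping windows.

    Assumption: this replaces the strided CNN with simple averaging,
    purely to show the length reduction from T to ceil(T / stride).
    """
    compressed = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        dim = len(window[0])
        compressed.append(
            [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        )
    return compressed
```

With equal logits every layer contributes equally, and a stride of 2 halves the number of frames passed on to the decoder side.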