Davit Karamyan
2026
NeMo@IWSLT 2026: Cascaded System for Simultaneous Speech Translation
Lilit Grigoryan | Vladimir Bataev | Andrei Andrusenko | Oleksii Hrinchuk | Davit Karamyan | Enas Albasiri | Vitaly Lavrukhin | Nikolay Karpov | Boris Ginsburg
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
This paper describes the NVIDIA NeMo team’s submission to the IWSLT 2026 Simultaneous Speech Translation (SimulST) tracks. We use a cascaded architecture combining a dual-mode Unified ASR Transducer model with a multilingual Large Language Model (LLM). The ASR model is trained to deliver stable transcriptions across a wide range of latencies, providing a reliable foundation for high-quality LLM translation. Our submission participates in the English–German, English–Italian, and English–Chinese tasks, in both standard and contextualized settings, as well as the Czech–English standard track, covering both low- and high-latency scenarios. We further analyze how ASR and LLM design choices affect the system’s overall latency and translation quality.