A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

Aziz Sharipov Ortega, Dominik Macháček


Abstract
We adapt Canary, a direct speech translation model, to simultaneous translation using the AlignAtt simultaneous policy. We work in the NeMo toolkit with the recent state-of-the-art foundation model Canary-1B-v2, which has only one billion parameters and is therefore suitable for small pocket devices. This is the CUNI submission to the IWSLT 2026 Simultaneous Speech Translation shared task on Czech to English and on English to German and Italian.
Anthology ID:
2026.iwslt-1.22
Volume:
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month:
July
Year:
2026
Address:
San Diego, USA (in-person and online)
Editors:
Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues:
IWSLT | WS
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
197–203
URL:
https://aclanthology.org/2026.iwslt-1.22/
Cite (ACL):
Aziz Sharipov Ortega and Dominik Macháček. 2026. A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 197–203, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal):
A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026 (Sharipov Ortega & Macháček, IWSLT 2026)
PDF:
https://aclanthology.org/2026.iwslt-1.22.pdf