Pinch-AST: Robust Cascaded Speech Translation System for the IWSLT 2026 Simultaneous Speech Translation Task

Carlos Bentes, Christian Safka


Abstract
We describe Pinch-AST, our submission to the IWSLT 2026 Simultaneous Speech-to-Text Translation shared task, covering all four official directions (En → De, En → It, En → Zh, Cs → En) under both low- and high-latency regimes. Pinch-AST is a cascaded system pairing off-the-shelf speech models with a translation backbone adapted per language pair via LoRA on ASR-noise-augmented parallel data. The streaming policy is a character-level longest-common-prefix re-translation strategy, and the full pipeline runs on a single H100 80 GB GPU within the real-time budget. Evaluated on the IWSLT 2026 development set, Pinch-AST achieves competitive quality–latency trade-offs across all four language pairs in both latency regimes.
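The abstract names the streaming policy but gives no implementation details. As a rough illustration only, one common form of character-level longest-common-prefix re-translation re-translates the growing source prefix at every step and commits only the characters on which two consecutive hypotheses agree. The sketch below assumes that form; the names `common_prefix_len` and `LcpCommitter` are hypothetical and are not taken from the paper, and the actual Pinch-AST policy may differ (for example in how many hypotheses it compares).

```python
def common_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters shared by a and b."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


class LcpCommitter:
    """Hypothetical helper: commits text that two consecutive hypotheses share."""

    def __init__(self) -> None:
        self.committed = ""  # text already emitted; never retracted
        self.previous = ""   # hypothesis from the previous re-translation step

    def update(self, hypothesis: str) -> str:
        """Take the newest full re-translation and return newly committed text."""
        stable = hypothesis[:common_prefix_len(self.previous, hypothesis)]
        self.previous = hypothesis
        # Only extend the output; ignore a stable prefix that would
        # contradict or shrink what has already been emitted.
        if len(stable) > len(self.committed) and stable.startswith(self.committed):
            new_text = stable[len(self.committed):]
            self.committed = stable
            return new_text
        return ""


if __name__ == "__main__":
    committer = LcpCommitter()
    for hyp in ["Guten", "Guten Morgen", "Guten Tag allerseits", "Guten Tag zusammen"]:
        print(repr(committer.update(hyp)), "| committed:", repr(committer.committed))
```

Under this assumed policy, emitted text is never revised: a later hypothesis that disagrees with the committed prefix simply produces no new output at that step. Working at the character level rather than the token level lets the same logic apply to unsegmented targets such as Chinese.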
Anthology ID:
2026.iwslt-1.30
Volume:
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month:
July
Year:
2026
Address:
San Diego, USA (in-person and online)
Editors:
Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues:
IWSLT | WS
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
268–271
URL:
https://aclanthology.org/2026.iwslt-1.30/
Cite (ACL):
Carlos Bentes and Christian Safka. 2026. Pinch-AST: Robust Cascaded Speech Translation System for the IWSLT 2026 Simultaneous Speech Translation Task. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 268–271, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal):
Pinch-AST: Robust Cascaded Speech Translation System for the IWSLT 2026 Simultaneous Speech Translation Task (Bentes & Safka, IWSLT 2026)
PDF:
https://aclanthology.org/2026.iwslt-1.30.pdf