FBK’s Long-form SpeechLLMs for IWSLT 2026 Instruction Following

Zhihang Xie, Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli


Abstract
This paper describes our submission to the IWSLT 2026 Instruction Following shared task. SpeechLLM systems are developed for both short-form and long-form speech instruction following under constrained settings. For the short track, strong performance is achieved on MCIF, with a SIFS score of 2.0708. For the long track, three speech segmentation strategies are investigated, and the HIFS score is introduced to account for unstable long-form generation. Experimental results show that fixed 30-second segmentation provides the most robust long-form performance, achieving the highest HIFS score of 2.0663. Further analysis shows that hallucination mainly manifests as repetitive insertions, substantially affecting ASR and SSUM, while short-form capabilities are largely retained after long-form extension.
Anthology ID:
2026.iwslt-1.29
Volume:
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month:
July
Year:
2026
Address:
San Diego, USA (in-person and online)
Editors:
Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues:
IWSLT | WS
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
255–267
URL:
https://aclanthology.org/2026.iwslt-1.29/
Cite (ACL):
Zhihang Xie, Marco Gaido, Sara Papi, Matteo Negri, and Luisa Bentivogli. 2026. FBK’s Long-form SpeechLLMs for IWSLT 2026 Instruction Following. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 255–267, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal):
FBK’s Long-form SpeechLLMs for IWSLT 2026 Instruction Following (Xie et al., IWSLT 2026)
PDF:
https://aclanthology.org/2026.iwslt-1.29.pdf