Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23

Ioannis Tsiamas, Gerard I. Gállego, Jose Fonollosa, Marta R. Costa-jussá


Abstract
This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We incorporate a Siamese pretraining step of the speech and text encoders with CTC and Optimal Transport to adapt the speech representations to the space of the text model, thus maximizing transfer learning from MT. After this pretraining, we fine-tune our system end-to-end on ST with Cross Entropy and Knowledge Distillation. Apart from the available ST corpora, we create synthetic data with SegAugment to better adapt our models to the custom segmentations of the IWSLT test sets. Our best single model obtains 31.2 BLEU points on MuST-C tst-COMMON, 29.8 points on IWSLT.tst2020 and 33.4 points on the newly released IWSLT.ACLdev2023.
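The Optimal Transport step in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes uniform weights over positions, a squared Euclidean cost, and entropy-regularized transport solved with Sinkhorn iterations. The function name `sinkhorn_ot_loss` and the parameters `eps` and `n_iters` are hypothetical, chosen only for illustration. The sketch shows how a speech sequence and a text sequence of different lengths can be compared without a hard alignment.

```python
import numpy as np


def sinkhorn_ot_loss(speech, text, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two sequences of vectors.

    speech: array of shape (n, d), e.g. speech encoder outputs (assumed).
    text:   array of shape (m, d), e.g. text encoder outputs (assumed).
    Returns the transport cost and the (n, m) transport plan.
    """
    n, m = speech.shape[0], text.shape[0]

    # Pairwise squared Euclidean cost between every speech and text position.
    cost = ((speech[:, None, :] - text[None, :, :]) ** 2).sum(axis=-1)

    # Rescale the cost to [0, 1] so that exp(-cost / eps) does not underflow.
    max_cost = cost.max()
    scale = max_cost if max_cost > 0 else 1.0
    kernel = np.exp(-(cost / scale) / eps)

    # Uniform mass on every position of both sequences (an assumption).
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)

    # Sinkhorn iterations: alternately rescale rows and columns of the kernel.
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)

    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech_repr = rng.normal(size=(6, 4))  # 6 speech frames, dimension 4
    text_repr = rng.normal(size=(4, 4))    # 4 text tokens, dimension 4
    loss, plan = sinkhorn_ot_loss(speech_repr, text_repr)
    print("OT loss:", loss)
    print("plan shape:", plan.shape)
```

In a training setup, a differentiable version of this quantity (for example in PyTorch) would be minimized so that the speech representations move toward the text representations; the NumPy version here only computes the value.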
Anthology ID:
2023.iwslt-1.38
Volume:
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
397–410
URL:
https://aclanthology.org/2023.iwslt-1.38
DOI:
10.18653/v1/2023.iwslt-1.38
Cite (ACL):
Ioannis Tsiamas, Gerard I. Gállego, Jose Fonollosa, and Marta R. Costa-jussá. 2023. Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 397–410, Toronto, Canada (in-person and online). Association for Computational Linguistics.
Cite (Informal):
Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23 (Tsiamas et al., IWSLT 2023)
PDF:
https://aclanthology.org/2023.iwslt-1.38.pdf