The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task

Ziqiang Zhang, Junyi Ao


Abstract
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task, which translates from English audio to German, Chinese, and Japanese. The YiTrans system is built on large-scale pre-trained encoder-decoder models. More specifically, we first design a multi-stage pre-training strategy to build a multi-modality model with a large amount of labeled and unlabeled data. We then fine-tune the corresponding components of the model for the downstream speech translation tasks. Moreover, we make various efforts to improve performance, including data filtering, data augmentation, speech segmentation, and model ensembling. Experimental results show that our YiTrans system obtains a significant improvement over the strong baseline on three translation directions, and it achieves a +5.2 BLEU improvement over last year’s best end-to-end system on tst2021 English-German.
Anthology ID:
2022.iwslt-1.11
Volume:
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
Month:
May
Year:
2022
Address:
Dublin, Ireland (in-person and online)
Venues:
ACL | IWSLT
Publisher:
Association for Computational Linguistics
Pages:
158–168
URL:
https://aclanthology.org/2022.iwslt-1.11
DOI:
10.18653/v1/2022.iwslt-1.11
Cite (ACL):
Ziqiang Zhang and Junyi Ao. 2022. The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task. In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), pages 158–168, Dublin, Ireland (in-person and online). Association for Computational Linguistics.
Cite (Informal):
The YiTrans Speech Translation System for IWSLT 2022 Offline Shared Task (Zhang & Ao, IWSLT 2022)
PDF:
https://aclanthology.org/2022.iwslt-1.11.pdf
Data:
LibriSpeech, MuST-C, OpenSubtitles, VoxPopuli