The BIGAI Offline Speech Translation Systems for IWSLT 2023 Evaluation

Zhihang Xie


Abstract
This paper describes BIGAI’s submission to the IWSLT 2023 Offline Speech Translation task on three language tracks, from English to Chinese, German and Japanese. The end-to-end systems are built upon a Wav2Vec2 model for speech recognition and mBART50 models for machine translation. An adapter module bridges the speech module and the translation module, and a CTC loss between the speech features and the source token sequence is incorporated during training. Experiments show that the systems generate reasonable translations for all three target languages. The proposed models achieve BLEU scores of 22.3 for en→de, 10.7 for en→ja and 33.0 for en→zh on the tst2023 TED datasets. However, performance drops by a significant margin in more complex scenarios such as presentations and interviews.
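The abstract mentions a CTC loss between speech features and the source token sequence but does not give its formulation. As an illustration only, the following minimal sketch computes the standard CTC negative log-likelihood with the forward recursion in plain Python. The function name `ctc_neg_log_likelihood`, the blank index 0, and the toy inputs are assumptions made for this example; they are not taken from the paper, and the actual systems may compute the loss differently.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Return -log p(target | frames) under the standard CTC model.

    log_probs: list of T frames, each a list of log-probabilities over
               the vocabulary (hypothetical per-frame speech-encoder outputs).
    target:    list of token ids without blanks (the source token sequence).
    blank:     index of the CTC blank symbol (assumed to be 0 here).
    """
    # Extended label sequence with blanks interleaved: [b, y1, b, y2, ..., b]
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    num_states = len(ext)

    # alpha[s] = log-probability of all partial alignments ending in state s
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]              # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            # Skip over a blank only between two different non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = _logsumexp(candidates) + frame[ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last label or the trailing blank
    final = alpha[-2:] if num_states > 1 else alpha[-1:]
    return -_logsumexp(final)


# Toy check: 2 frames, vocabulary {blank, "a"}, uniform frame posteriors.
uniform = [[math.log(0.5), math.log(0.5)]] * 2
loss = ctc_neg_log_likelihood(uniform, target=[1])
print(round(loss, 4))  # 0.2877, i.e. -log(3/4): three of four paths collapse to "a"
```

In the toy check, the alignments "a a", "a blank" and "blank a" all collapse to the single token "a", so the target has probability 3/4. In an actual training setup such a term would typically be added, with some weight, to the translation loss; the weighting used by the submitted systems is not stated in the abstract.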
Anthology ID:
2023.iwslt-1.7
Volume:
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
123–129
URL:
https://aclanthology.org/2023.iwslt-1.7
DOI:
10.18653/v1/2023.iwslt-1.7
Cite (ACL):
Zhihang Xie. 2023. The BIGAI Offline Speech Translation Systems for IWSLT 2023 Evaluation. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 123–129, Toronto, Canada (in-person and online). Association for Computational Linguistics.
Cite (Informal):
The BIGAI Offline Speech Translation Systems for IWSLT 2023 Evaluation (Xie, IWSLT 2023)
PDF:
https://aclanthology.org/2023.iwslt-1.7.pdf