BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

Liyan Kang, Luyang Huang, Ningxin Peng, Peihao Zhu, Zewei Sun, Shanbo Cheng, Mingxuan Wang, Degen Huang, Jinsong Su


Abstract
We present a large-scale video subtitle translation dataset, *BigVideo*, to facilitate the study of multimodal machine translation. Compared with the widely used *How2* and *VaTeX* datasets, *BigVideo* is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of videos. We also introduce two deliberately designed test sets to verify the necessity of visual information: *Ambiguous*, which contains ambiguous words, and *Unambiguous*, in which the text context is self-contained for translation. To better model the common semantics shared across texts and videos, we introduce a contrastive learning method in the cross-modal encoder. Extensive experiments on *BigVideo* show that: (a) visual information consistently improves the NMT model in terms of BLEU, BLEURT and COMET on both the *Ambiguous* and *Unambiguous* test sets; (b) visual information helps disambiguation compared to the strong text baseline, as measured by terminology-targeted scores and human evaluation.
Anthology ID:
2023.findings-acl.535
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8456–8473
URL:
https://aclanthology.org/2023.findings-acl.535
DOI:
10.18653/v1/2023.findings-acl.535
Cite (ACL):
Liyan Kang, Luyang Huang, Ningxin Peng, Peihao Zhu, Zewei Sun, Shanbo Cheng, Mingxuan Wang, Degen Huang, and Jinsong Su. 2023. BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8456–8473, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation (Kang et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.535.pdf