Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation

Yaoming Zhu, Zewei Sun, Shanbo Cheng, Luyang Huang, Liwei Wu, Mingxuan Wang


Abstract
Multimodal machine translation (MMT) aims to improve translation quality by incorporating information from other modalities, such as vision. Previous MMT systems focus on better access to and use of visual information and tend to validate their methods on image-related datasets. However, these studies face two challenges. First, they can only use a limited amount of data composed of bilingual texts and images (referred to as “triple data”), which is scarce. Second, current MMT benchmarks are restricted and do not correspond to realistic scenarios. This paper therefore establishes new methods and a new dataset for MMT. We propose a novel framework for MMT that addresses these challenges by utilizing large-scale non-triple data, such as monolingual image-text and parallel text-only data. Additionally, we construct a new e-commerce multimodal translation dataset, named EMMT, whose test set is specifically designed to include ambiguous words that require visual context for accurate translation. Experiments show that our method is well-suited to real-world scenarios and can significantly improve translation performance with more non-triple data. In addition, our model rivals or surpasses various SOTA models on conventional multimodal translation benchmarks.
Anthology ID:
2023.findings-acl.168
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2679–2697
URL:
https://aclanthology.org/2023.findings-acl.168
DOI:
10.18653/v1/2023.findings-acl.168
Cite (ACL):
Yaoming Zhu, Zewei Sun, Shanbo Cheng, Luyang Huang, Liwei Wu, and Mingxuan Wang. 2023. Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2679–2697, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation (Zhu et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.168.pdf
Video:
https://aclanthology.org/2023.findings-acl.168.mp4