Multimodal Neural Machine Translation with Search Engine Based Image Retrieval

ZhenHao Tang, XiaoBing Zhang, Zi Long, XiangHua Fu


Abstract
Recently, a number of works have shown that the performance of neural machine translation (NMT) can be improved to a certain extent by using visual information. However, most of these conclusions are drawn from the analysis of experimental results based on a limited set of bilingual sentence-image pairs, such as Multi30K. In these kinds of datasets, the content of each bilingual parallel sentence pair must be well represented by a manually annotated image, which differs from the actual translation situation. We propose an open-vocabulary image retrieval method that collects descriptive images for a bilingual parallel corpus using an image search engine, and a text-aware attentive visual encoder that filters out incorrectly collected noisy images. Experimental results on Multi30K and two other translation datasets show that our proposed method achieves significant improvements over strong baselines.
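The abstract does not spell out the architecture of the text-aware attentive visual encoder; the snippet below is only a minimal, hypothetical sketch of the general idea it describes, in which a pooled source-sentence representation serves as the attention query over features of the search-engine-retrieved images, so that images poorly matching the text receive small weights. All class, dimension, and variable names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextAwareVisualEncoder(nn.Module):
    """Sketch: attend over retrieved image features with the source text as
    the query, so noisy (text-irrelevant) images are down-weighted."""

    def __init__(self, text_dim: int, img_dim: int, hidden_dim: int):
        super().__init__()
        self.query_proj = nn.Linear(text_dim, hidden_dim)
        self.key_proj = nn.Linear(img_dim, hidden_dim)
        self.value_proj = nn.Linear(img_dim, hidden_dim)

    def forward(self, text_repr: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # text_repr: (batch, text_dim)      pooled source-sentence representation
        # img_feats: (batch, n_imgs, img_dim) features of the retrieved images
        q = self.query_proj(text_repr).unsqueeze(1)            # (batch, 1, hidden)
        k = self.key_proj(img_feats)                           # (batch, n, hidden)
        v = self.value_proj(img_feats)                         # (batch, n, hidden)
        scores = torch.matmul(q, k.transpose(1, 2)) / k.size(-1) ** 0.5
        weights = F.softmax(scores, dim=-1)                    # small weights for noisy images
        visual_context = torch.matmul(weights, v).squeeze(1)   # (batch, hidden)
        return visual_context
```

Under this sketch, the resulting visual context vector would be fused with the NMT encoder or decoder states; the exact fusion strategy is described in the paper itself rather than here.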
Anthology ID:
2022.wat-1.11
Volume:
Proceedings of the 9th Workshop on Asian Translation
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
WAT
Publisher:
International Conference on Computational Linguistics
Pages:
89–98
URL:
https://aclanthology.org/2022.wat-1.11
Cite (ACL):
ZhenHao Tang, XiaoBing Zhang, Zi Long, and XiangHua Fu. 2022. Multimodal Neural Machine Translation with Search Engine Based Image Retrieval. In Proceedings of the 9th Workshop on Asian Translation, pages 89–98, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
Cite (Informal):
Multimodal Neural Machine Translation with Search Engine Based Image Retrieval (Tang et al., WAT 2022)
PDF:
https://aclanthology.org/2022.wat-1.11.pdf
Data
Multi30K