Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions

Yuting Zhao, Mamoru Komachi, Tomoyuki Kajiwara, Chenhui Chu


Abstract
Existing studies on multimodal neural machine translation (MNMT) have mainly focused on the effect of combining visual and textual modalities to improve translations. However, it has been suggested that the visual modality is only marginally beneficial. Conventional visual attention mechanisms select visual features from equally-sized grids generated by convolutional neural networks (CNNs), and may have had only modest effects on aligning visual concepts with the textual objects they correspond to, because grid visual features do not capture semantic information. In contrast, we propose applying semantic image regions to MNMT by integrating visual and textual features using two individual attention mechanisms (double attention). We conducted experiments on the Multi30k dataset and achieved improvements of 0.5 and 0.9 BLEU points on the English-German and English-French translation tasks, respectively, compared with MNMT using grid visual features. We also demonstrated concrete improvements in translation performance that result from semantic image regions.
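As a rough illustration of the double-attention idea described in the abstract, the sketch below lets one decoder state attend separately over source-word states and over image-region features, then joins the two context vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the dot-product scoring, the fusion by concatenation, and all names (`attend`, `double_attention`, `decoder_state`, `source_states`, `region_features`) are hypothetical choices made for illustration, and the toy vectors are invented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted-sum context vector and the attention weights.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return context, weights


def double_attention(decoder_state, source_states, region_features):
    """Run two independent attentions and concatenate their contexts.

    Concatenation is an assumption; the abstract only says the two
    modalities are integrated through two individual attention mechanisms.
    """
    text_ctx, text_w = attend(decoder_state, source_states)
    img_ctx, img_w = attend(decoder_state, region_features)
    return text_ctx + img_ctx, text_w, img_w


# Toy inputs: three source-word states and two semantic image regions.
decoder_state = [1.0, 0.0]
source_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
region_features = [[2.0, 0.0], [0.0, 2.0]]

fused, text_w, img_w = double_attention(
    decoder_state, source_states, region_features
)
```

Here `fused` holds the textual context followed by the visual context, and `text_w` and `img_w` are separate distributions, so each modality is normalized on its own rather than competing within a single softmax.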
Anthology ID:
2020.eamt-1.12
Volume:
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Month:
November
Year:
2020
Address:
Lisboa, Portugal
Editors:
André Martins, Helena Moniz, Sara Fumega, Bruno Martins, Fernando Batista, Luisa Coheur, Carla Parra, Isabel Trancoso, Marco Turchi, Arianna Bisazza, Joss Moorkens, Ana Guerberof, Mary Nurminen, Lena Marg, Mikel L. Forcada
Venue:
EAMT
Publisher:
European Association for Machine Translation
Pages:
105–114
URL:
https://aclanthology.org/2020.eamt-1.12
Cite (ACL):
Yuting Zhao, Mamoru Komachi, Tomoyuki Kajiwara, and Chenhui Chu. 2020. Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 105–114, Lisboa, Portugal. European Association for Machine Translation.
Cite (Informal):
Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions (Zhao et al., EAMT 2020)
PDF:
https://aclanthology.org/2020.eamt-1.12.pdf
Data
Visual Genome