Adding multimodal capabilities to a text-only translation model

Vipin Vijayan, Braeden Bowen, Scott Grigsby, Timothy Anderson, Jeremy Gwinnup


Abstract
While most current work in multimodal machine translation (MMT) uses the Multi30k dataset for training and evaluation, we find that the resulting models overfit to Multi30k to an extreme degree. Consequently, these models perform poorly when evaluated on typical text-only test sets such as the newstest datasets. To perform well on both Multi30k and typical text-only datasets, we use a performant text-only machine translation (MT) model as the starting point of our MMT model. We add vision-text adapter layers connected via gating mechanisms to the MT model, and incrementally transform the MT model into an MMT model by (1) pre-training with vision-based masking of the source text and (2) fine-tuning on Multi30k. With this approach, we achieve state-of-the-art performance on the Multi30k 2016 en-de test set (46.5 BLEU4, 0.61 CoMMuTE) while retaining the performance of the original text-only MT model on the newstest datasets.
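The abstract describes gated vision-text adapter layers but the page carries no code. As a minimal sketch of what such a layer might look like, the following assumes a tanh-gated cross-attention adapter added around a text-only MT encoder layer; the class name, parameter names, and the zero-initialized gate are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn

    class GatedVisionTextAdapter(nn.Module):
        """Hypothetical vision-text adapter with a gating mechanism.

        Inserted alongside a layer of a pretrained text-only MT encoder;
        names and gate initialization are assumptions for illustration.
        """

        def __init__(self, d_model: int, n_heads: int = 8):
            super().__init__()
            self.cross_attn = nn.MultiheadAttention(
                d_model, n_heads, batch_first=True
            )
            self.norm = nn.LayerNorm(d_model)
            # Gate starts at zero: before training, the adapter is a
            # no-op and the model behaves exactly like the text-only
            # MT model it was initialized from.
            self.gate = nn.Parameter(torch.zeros(1))

        def forward(self, text_states, vision_feats):
            # text_states:  (batch, src_len, d_model) from the MT encoder
            # vision_feats: (batch, n_patches, d_model) from a vision encoder
            attended, _ = self.cross_attn(
                query=self.norm(text_states),
                key=vision_feats,
                value=vision_feats,
            )
            # tanh-gated residual: the learned gate controls how much
            # visual information flows into the translation model.
            return text_states + torch.tanh(self.gate) * attended

Zero-initializing the gate would let the untrained adapter leave the text-only model's behavior unchanged, which is consistent with the paper's stated goal of retaining the original MT model's performance on text-only test sets while incrementally adding multimodal capability.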
Anthology ID:
2024.amta-research.3
Volume:
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Month:
September
Year:
2024
Address:
Chicago, USA
Editors:
Rebecca Knowles, Akiko Eriguchi, Shivali Goel
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
14–28
URL:
https://aclanthology.org/2024.amta-research.3
Cite (ACL):
Vipin Vijayan, Braeden Bowen, Scott Grigsby, Timothy Anderson, and Jeremy Gwinnup. 2024. Adding multimodal capabilities to a text-only translation model. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 14–28, Chicago, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Adding multimodal capabilities to a text-only translation model (Vijayan et al., AMTA 2024)
PDF:
https://aclanthology.org/2024.amta-research.3.pdf