Yifei Xin


2024

pdf bib
Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup
Xuxin Cheng | Ziyu Yao | Yifei Xin | Hao An | Hongxiang Li | Yaowei Li | Yuexian Zou
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal machine translation (MMT) aims to improve the performance of machine translation with the help of visual information, which has received widespread attention recently. It has been verified that visual information brings greater performance gains when the textual information is limited. However, most previous works ignore to take advantage of the complete textual inputs and the limited textual inputs at the same time, which limits the overall performance. To solve this issue, we propose a mixup method termed Soul-Mix to enhance MMT by using visual information more effectively. We mix the predicted translations of complete textual input and the limited textual inputs. Experimental results on the Multi30K dataset of three translation directions show that our Soul-Mix significantly outperforms existing approaches and achieves new state-of-the-art performance with fewer parameters than some previous models. Besides, the strength of Soul-Mix is more obvious on more challenging MSCOCO dataset which includes more out-of-domain instances with lots of ambiguous verbs.