Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup

Xuxin Cheng, Ziyu Yao, Yifei Xin, Hao An, Hongxiang Li, Yaowei Li, Yuexian Zou


Abstract
Multimodal machine translation (MMT), which has recently received widespread attention, aims to improve machine translation performance with the help of visual information. It has been verified that visual information brings greater performance gains when the textual information is limited. However, most previous work fails to exploit the complete textual inputs and the limited textual inputs at the same time, which limits overall performance. To address this issue, we propose a mixup method termed Soul-Mix that enhances MMT by using visual information more effectively. We mix the predicted translations of the complete textual input and the limited textual inputs. Experimental results on the Multi30K dataset across three translation directions show that Soul-Mix significantly outperforms existing approaches and achieves new state-of-the-art performance with fewer parameters than some previous models. Moreover, the advantage of Soul-Mix is more pronounced on the more challenging MSCOCO dataset, which includes more out-of-domain instances with many ambiguous verbs.
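The abstract says that predictions for the complete and the limited textual inputs are mixed, but it does not give the mixing rule. As a rough illustration only, the sketch below assumes a standard mixup-style convex combination of two next-token probability distributions, with the coefficient drawn from a Beta distribution. The function names `mix_predictions` and `sample_lambda`, the parameter `alpha`, and the toy distributions are hypothetical and are not taken from the paper, which may mix at a different point in the model.

```python
import random


def mix_predictions(p_full, p_limited, lam):
    """Return lam * p_full + (1 - lam) * p_limited, element by element.

    p_full:    distribution predicted from the complete text (assumed input)
    p_limited: distribution predicted from the limited text (assumed input)
    lam:       mixing coefficient in [0, 1]
    """
    if len(p_full) != len(p_limited):
        raise ValueError("both distributions must cover the same vocabulary")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_full, p_limited)]


def sample_lambda(alpha=1.0, rng=random):
    """Draw a coefficient from Beta(alpha, alpha), as in standard mixup.

    Using a Beta prior here is an assumption, not a detail from the abstract.
    """
    return rng.betavariate(alpha, alpha)


# Toy three-word vocabulary; the numbers are made up for illustration.
p_full = [0.7, 0.2, 0.1]
p_limited = [0.4, 0.4, 0.2]
mixed = mix_predictions(p_full, p_limited, lam=0.5)
```

Because the result is a convex combination of two probability distributions, it is itself a valid distribution: its entries stay non-negative and sum to one for any coefficient in [0, 1].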
Anthology ID:
2024.acl-long.608
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
11283–11294
URL:
https://aclanthology.org/2024.acl-long.608/
DOI:
10.18653/v1/2024.acl-long.608
Cite (ACL):
Xuxin Cheng, Ziyu Yao, Yifei Xin, Hao An, Hongxiang Li, Yaowei Li, and Yuexian Zou. 2024. Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11283–11294, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup (Cheng et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.608.pdf