Jianquan Liu


2024

Visual Pivoting Unsupervised Multimodal Machine Translation in Low-Resource Distant Language Pairs
Turghun Tayir | Lin Li | Xiaohui Tao | Mieradilijiang Maimaiti | Ming Li | Jianquan Liu
Findings of the Association for Computational Linguistics: EMNLP 2024

Unsupervised multimodal machine translation (UMMT) aims to leverage visual information as a pivot between two languages to achieve better performance on low-resource language pairs. However, a key challenge remains: how to handle alignment between distant language pairs (DLPs) in UMMT. To this end, this paper proposes a visual pivoting UMMT method for DLPs. Specifically, we first construct a dataset containing two DLPs, English-Uyghur and Chinese-Uyghur. We then apply the visual pivoting method to both pre-training and fine-tuning, and we observe that the images fed to the encoder and decoder of UMMT have noticeable effects on DLPs. Finally, we introduce informative multi-granularity image features to facilitate further alignment of the latent space between the two languages. Experimental results show that the proposed method significantly outperforms several baselines on both DLPs and close language pairs (CLPs).