An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance

Simran Khanuja, Sathyanarayanan Ramamoorthy, Yueqi Song, Graham Neubig


Abstract
Given the rise of multimedia content, human translators increasingly focus on culturally adapting not only words but also other modalities such as images to convey the same meaning. While several applications stand to benefit from this, machine translation systems remain confined to dealing with language in speech and text. In this work, we introduce a new task of translating images to make them culturally relevant. First, we build three pipelines comprising state-of-the-art generative models to do the task. Next, we build a two-part evaluation dataset – (i) concept: comprising 600 images that are cross-culturally coherent, focusing on a single concept per image; and (ii) application: comprising 100 images curated from real-world applications. We conduct a multi-faceted human evaluation of translated images to assess for cultural relevance and meaning preservation. We find that as of today, image-editing models fail at this task, but can be improved by leveraging LLMs and retrievers in the loop. Best pipelines can only translate 5% of images for some countries in the easier concept dataset and no translation is successful for some countries in the application dataset, highlighting the challenging nature of the task. Our project webpage is here: https://machine-transcreation.github.io/image-transcreation and our code, data and model outputs can be found here: https://github.com/simran-khanuja/image-transcreation.
Anthology ID:
2024.emnlp-main.573
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
10258–10279
Language:
URL:
https://aclanthology.org/2024.emnlp-main.573/
DOI:
10.18653/v1/2024.emnlp-main.573
Bibkey:
Cite (ACL):
Simran Khanuja, Sathyanarayanan Ramamoorthy, Yueqi Song, and Graham Neubig. 2024. An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10258–10279, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance (Khanuja et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.573.pdf