Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese

Xixian Liao; Carlos Escolano; Audrey Mash; Francesca De Luca Fornaciari; Javier García Gilabert; Miguel Claramunt Argote; Ella Bohman; Maite Melero

Abstract

High-quality machine translation requires datasets that not only ensure linguistic accuracy but also capture regional and cultural nuances. While many existing benchmarks, such as FLORES-200, rely on English as a pivot language, this approach can overlook the specificity of direct language pairs, particularly for underrepresented combinations like Catalan-Chinese. In this study, we demonstrate that even a relatively small dataset of approximately 1,000 sentences can significantly improve MT localization. To this end, we introduce a dataset specifically designed to enhance Catalan-to-Chinese translation by prioritizing regionally and culturally specific topics. Unlike pivot-based datasets, our data source ensures a more faithful representation of Catalan linguistic and cultural elements, leading to more accurate translations of local terms and expressions. Using this dataset, we show better performance than with the English-pivot FLORES-200 dev set and achieve competitive results on the FLORES-200 devtest set when evaluated with neural metrics. We release this dataset as both a human-preference resource and a benchmark for Catalan-Chinese translation. Additionally, we include Spanish translations for each sentence, facilitating extensions to Spanish-Chinese translation tasks.

Anthology ID: 2025.mtsummit-1.12
Volume: Proceedings of Machine Translation Summit XX: Volume 1
Month: June
Year: 2025
Address: Geneva, Switzerland
Editors: Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, Sara Szoc
Venue: MTSummit
Publisher: European Association for Machine Translation
Pages: 150–161
URL: https://aclanthology.org/2025.mtsummit-1.12/
Cite (ACL): Xixian Liao, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Javier García Gilabert, Miguel Claramunt Argote, Ella Bohman, and Maite Melero. 2025. Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese. In Proceedings of Machine Translation Summit XX: Volume 1, pages 150–161, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal): Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese (Liao et al., MTSummit 2025)
PDF: https://aclanthology.org/2025.mtsummit-1.12.pdf
