Improving LLM Abilities in Idiomatic Translation

Sundesh Donthi, Maximilian Spencer, Om B. Patel, Joon Young Doh, Eid Rodan, Kevin Zhu, Sean O’Brien


Abstract
Translating idiomatic expressions remains a challenge for large language models (LLMs), which often produce literal, semantically incorrect translations; "break a leg", for instance, is rendered as a nonsensical phrase in the target language. While external resources such as IdiomKB can supply the figurative meaning and thus yield semantically accurate translations, this approach does not preserve the cultural and stylistic nuances that make idioms distinctive. Our study focuses on idiomatic translation across multiple languages, including Chinese (ZH), Urdu (UR), and Hindi (HI). We propose two methods for improving idiomatic translation fidelity: a Semantic Idiom Alignment (SIA) approach that uses pre-trained sentence embeddings to identify target-language idioms, and a Language-Model-based Idiom Alignment (LIA) approach that prompts an LLM to suggest appropriate idiom counterparts. Human evaluations across multiple language pairs show that SIA better preserves idiomatic style. To support this work, we introduce idiom datasets for two low-resource languages, Urdu and Hindi. Our results indicate that aligning idioms at the semantic level can improve cross-lingual style preservation and cultural authenticity.
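The abstract describes SIA only at a high level. As a rough illustration, the sketch below shows what an embedding-based idiom lookup of this kind might look like, assuming the sentence-transformers library and a multilingual checkpoint; the model choice, the toy Hindi idiom inventory, and the align_idiom helper are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a Semantic Idiom Alignment (SIA)-style lookup.
# Assumptions (not from the paper): the sentence-transformers library, the
# paraphrase-multilingual-MiniLM-L12-v2 checkpoint, the toy Hindi idiom
# inventory, and the align_idiom helper are all illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Toy target-language inventory: Hindi idioms paired with plain-English glosses.
target_idioms = {
    "कमर कस लेना": "to prepare oneself thoroughly for a difficult task",
    "अंधे की लाठी": "the sole support of a helpless person",
    "टूट पड़ना": "to attack or set upon something with great energy",
}

def align_idiom(source_gloss: str) -> str:
    """Return the target-language idiom whose gloss is semantically closest
    to the figurative meaning of the source idiom."""
    glosses = list(target_idioms.values())
    gloss_embs = model.encode(glosses, convert_to_tensor=True)
    query_emb = model.encode(source_gloss, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, gloss_embs)[0]  # cosine similarity to each gloss
    best = int(scores.argmax())                      # index of the closest gloss
    return list(target_idioms.keys())[best]

# Figurative meaning of the English idiom "roll up one's sleeves":
print(align_idiom("to get ready to start a demanding piece of work"))
# Expected to pick the idiom glossed as preparing for a difficult task.
```

At translation time, the retrieved target-language idiom could then be offered to the LLM alongside the figurative gloss, rather than the gloss alone, which is in line with the abstract's goal of preserving idiomatic style rather than only semantic accuracy.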
Anthology ID: 2025.loreslm-1.13
Volume: Proceedings of the First Workshop on Language Models for Low-Resource Languages
Month: January
Year: 2025
Address: Abu Dhabi, United Arab Emirates
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venues: LoResLM | WS
Publisher: Association for Computational Linguistics
Pages: 175–181
URL: https://aclanthology.org/2025.loreslm-1.13/
Cite (ACL): Sundesh Donthi, Maximilian Spencer, Om B. Patel, Joon Young Doh, Eid Rodan, Kevin Zhu, and Sean O’Brien. 2025. Improving LLM Abilities in Idiomatic Translation. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 175–181, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Improving LLM Abilities in Idiomatic Translation (Donthi et al., LoResLM 2025)
PDF: https://aclanthology.org/2025.loreslm-1.13.pdf