Sundesh Donthi
2025
Improving LLM Abilities in Idiomatic Translation
Sundesh Donthi | Maximilian Spencer | Om B. Patel | Joon Young Doh | Eid Rodan | Kevin Zhu | Sean O’Brien
Proceedings of the First Workshop on Language Models for Low-Resource Languages
Translating idiomatic expressions remains a challenge for large language models (LLMs), as they often produce literal, semantically incorrect translations—for instance, directly converting “break a leg” into a nonsensical phrase in the target language. While external resources like IdiomKB can supply the figurative meaning and thus yield semantically accurate translations, this approach does not preserve the cultural and stylistic nuances that make idioms so distinctive. Our study focuses on idiomatic translation across multiple languages, including Chinese (ZH), Urdu (UR), and Hindi (HI). We propose two methods for improving idiomatic translation fidelity: a Semantic Idiom Alignment (SIA) approach that uses pre-trained sentence embeddings to identify target-language idioms, and a Language-Model-based Idiom Alignment (LIA) approach that prompts an LLM to suggest appropriate idiom counterparts. Human evaluations across multiple language pairs show that SIA better preserves idiomatic style. To support this work, we introduce idiom datasets in low-resource languages (Urdu and Hindi). Our results indicate that aligning idioms at the semantic level can improve cross-lingual style preservation and cultural authenticity.
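The core of the SIA approach described above—matching a source idiom's figurative meaning against target-language idiom candidates by embedding similarity—can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: a crude bag-of-words vectorizer stands in for the pre-trained sentence encoder, and the idiom names and glosses are hypothetical example data.

```python
# Toy sketch of Semantic Idiom Alignment (SIA): choose the target-language
# idiom whose glossed meaning is closest (by cosine similarity) to the
# source idiom's figurative meaning. A real system would use pre-trained
# sentence embeddings; this bag-of-words stand-in keeps the example
# self-contained and runnable.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding' (stand-in for a sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def align_idiom(source_meaning: str, candidates: dict[str, str]) -> str:
    """Return the candidate idiom whose glossed meaning best matches
    the source idiom's figurative meaning."""
    src = embed(source_meaning)
    return max(candidates, key=lambda idiom: cosine(src, embed(candidates[idiom])))

# Hypothetical data: the figurative meaning of "break a leg", and two
# candidate target-language idioms paired with English glosses.
meaning = "wish someone good luck before a performance"
candidates = {
    "idiom_A": "wish someone good luck success",
    "idiom_B": "suffer a painful injury",
}
print(align_idiom(meaning, candidates))  # → idiom_A
```

Because the match is made in meaning space rather than word-for-word, the selected target idiom keeps its figurative register, which is the style-preservation effect the human evaluations measure.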