The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure

Niyati Bafna; Tianjian Li; Kenton Murray; David R. Mortensen; David Yarowsky; Hale Sirin; Daniel Khashabi

The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure

Niyati Bafna, Tianjian Li, Kenton Murray, David R. Mortensen, David Yarowsky, Hale Sirin, Daniel Khashabi

Abstract

Multilingual generation with large language models (LLMs) is often of poor quality for mid- to low-resource languages, but the causes for this are not well-understood. We first demonstrate the existence of an implicit task-solving→translation pipeline for generation, whereby the model first solves the required task in a largely target-language-agnostic manner, and subsequently translates answer concepts into the intended target language. We hypothesize that the failure of the translation stage, despite task-solving success, is an important culprit for the observed low quality of final outputs, and formalize this as the translation barrier hypothesis. We quantify the extent to which either stage in the pipeline is responsible for final failure for a word translation task across 108 language pairs, and find that the translation barrier explains a dominant portion of error for a majority of language pairs, and is especially severe for low-resource target languages. Our results highlight an important bottleneck for end-to-end multilingual generation, relevant for future work seeking to improve multilinguality in LLMs.

Anthology ID:: 2025.ijcnlp-long.83
Volume:: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:: December
Year:: 2025
Address:: Mumbai, India
Editors:: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venues:: IJCNLP | AACL
SIG:
Publisher:: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Note:
Pages:: 1541–1568
Language:
URL:: https://aclanthology.org/2025.ijcnlp-long.83/
DOI:
Bibkey:
Cite (ACL):: Niyati Bafna, Tianjian Li, Kenton Murray, David R. Mortensen, David Yarowsky, Hale Sirin, and Daniel Khashabi. 2025. The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 1541–1568, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):: The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure (Bafna et al., IJCNLP-AACL 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.ijcnlp-long.83.pdf

PDF Cite Search Fix data