When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy

Jirui Qi, Shan Chen, Zidi Xiong, Raquel Fernández, Danielle Bitterman, Arianna Bisazza


Abstract
Recent Large Reasoning Models (LRMs) with thinking traces have shown strong performance on English reasoning tasks. However, the extent to which LRMs can think in other languages is less studied. This capability is as important as answer accuracy for real-world applications, since users may find the thinking trace useful for oversight only if it is expressed in their own language. In this work, we comprehensively evaluate two leading families of LRMs on our XReasoning benchmark. Surprisingly, even the most advanced models often revert to English or produce fragmented reasoning in other languages, revealing a substantial gap in their ability to think in non-English languages. Prompting models to reason in the user's language via prompt hacking enhances readability and oversight, which could gain user trust, but reduces answer accuracy, exposing an important trade-off. We further demonstrate that targeted post-training, even with just 100 instances, can mitigate this language mismatch, although accuracy is still degraded. Our results reveal the limited multilingual reasoning capabilities of current LRMs and suggest directions for future research. All code and datasets are released at https://github.com/Betswish/mCoT-XReasoning.
Anthology ID: 2025.findings-emnlp.1103
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20279–20296
URL: https://aclanthology.org/2025.findings-emnlp.1103/
Cite (ACL): Jirui Qi, Shan Chen, Zidi Xiong, Raquel Fernández, Danielle Bitterman, and Arianna Bisazza. 2025. When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20279–20296, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): When Models Reason in Your Language: Controlling Thinking Language Comes at the Cost of Accuracy (Qi et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1103.pdf
Checklist: 2025.findings-emnlp.1103.checklist.pdf