Query-Following vs Context-Anchoring: How LLMs Handle Cross-Turn Language Switching

Kyuhee Kim, Chengheng Li Chen, Anna Sotnikova


Abstract
When multilingual users switch languages mid-conversation, how should LLMs respond? We extend MultiChallenge to evaluate cross-turn language switching, translating 182 multi-turn conversations into German, Chinese, Spanish, and Arabic. Across five frontier models, we observe asymmetric behavior: switching into a foreign language (EN→X) yields high query-language fidelity (89–99%), but switching back to English (X→EN) reveals divergent policies. GPT-5 follows the query language (>95%), while Claude Opus 4.5 and Command R+ maintain the established conversation language (<8%). Task accuracy remains stable across conditions, regardless of which language the model selects. A simple explicit system prompt shows limited effectiveness in modifying these defaults.
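The abstract's headline numbers rest on a query-language fidelity metric: the share of assistant turns answered in the same language as the immediately preceding user query. A minimal sketch of that computation is below; the function name, the `(query_lang, response_lang)` pair representation, and the example conversation are illustrative assumptions, not taken from the paper's released code.

```python
# Hypothetical sketch of a query-language fidelity metric: the fraction
# of assistant turns whose response language matches the language of the
# immediately preceding user query. Names and data layout are
# illustrative, not the paper's actual implementation.

def query_language_fidelity(turns):
    """turns: list of (query_lang, response_lang) ISO 639-1 code pairs."""
    if not turns:
        return 0.0
    matches = sum(1 for query_lang, response_lang in turns
                  if query_lang == response_lang)
    return matches / len(turns)

# Example: a conversation that switches EN -> DE, then back to EN,
# while the model stays anchored to German on the final turn.
conversation = [("en", "en"), ("de", "de"), ("en", "de")]
print(round(query_language_fidelity(conversation), 2))  # -> 0.67
```

Under this framing, a query-following model scores near 1.0 on X→EN switch turns, while a context-anchoring model scores near 0.0 on those same turns, which is the asymmetry the abstract reports.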
Anthology ID: 2026.mme-main.13
Volume: Proceedings of the First Workshop on Multilingual Multicultural Evaluation
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Pinzhen Chen, Vilém Zouhar, Hanxu Hu, Simran Khanuja, Wenhao Zhu, Barry Haddow, Alexandra Birch, Alham Fikri Aji, Rico Sennrich, Sara Hooker
Venues: MME | WS
Publisher: Association for Computational Linguistics
Pages: 196–203
URL: https://aclanthology.org/2026.mme-main.13/
Cite (ACL): Kyuhee Kim, Chengheng Li Chen, and Anna Sotnikova. 2026. Query-Following vs Context-Anchoring: How LLMs Handle Cross-Turn Language Switching. In Proceedings of the First Workshop on Multilingual Multicultural Evaluation, pages 196–203, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Query-Following vs Context-Anchoring: How LLMs Handle Cross-Turn Language Switching (Kim et al., MME 2026)
PDF: https://aclanthology.org/2026.mme-main.13.pdf