Chengheng Li Chen
2026
Query-Following vs Context-Anchoring: How LLMs Handle Cross-Turn Language Switching
Kyuhee Kim | Chengheng Li Chen | Anna Sotnikova
Proceedings of the First Workshop on Multilingual Multicultural Evaluation
When multilingual users switch languages mid-conversation, how should LLMs respond? We extend MultiChallenge to evaluate cross-turn language switching, translating 182 multi-turn conversations into German, Chinese, Spanish, and Arabic. Across five frontier models, we observe asymmetric behavior: switching into a foreign language (EN→X) yields high query-language fidelity (89–99%), but switching back to English (X→EN) reveals divergent policies. GPT-5 follows the query language (>95%), while Claude Opus 4.5 and Command R+ maintain the established conversation language (<8%). Task accuracy remains stable across conditions, regardless of which language the model responds in. A simple explicit system prompt is only partially effective at overriding these defaults.