Chengheng Li Chen

2026

Query-Following vs Context-Anchoring: How LLMs Handle Cross-Turn Language Switching
Kyuhee Kim | Chengheng Li Chen | Anna Sotnikova
Proceedings of the First Workshop on Multilingual Multicultural Evaluation

When multilingual users switch languages mid-conversation, how should LLMs respond? We extend MultiChallenge to evaluate cross-turn language switching, translating 182 multi-turn conversations into German, Chinese, Spanish, and Arabic. Across five frontier models, we observe asymmetric behavior: switching into a foreign language (EN→X) yields high query-language fidelity (89–99%), but switching back to English (X→EN) reveals divergent policies. GPT-5 follows the query language (>95% fidelity), while Claude Opus 4.5 and Command R+ maintain the established conversation language (query-language fidelity <8%). Task accuracy remains stable across conditions, regardless of which language the model responds in. A simple explicit system prompt has only a limited effect on these defaults.

