German-English Code-Switching in Large Language Models

Firat Cem Aksüt, Stefan Hillmann, Pia Knoeferle, Sebastian Möller


Abstract
Code-Switching (CS) is common in multilingual communication, yet it is unclear how well current Large Language Models (LLMs) reproduce naturally occurring switching patterns. This paper studies German–English CS ("Denglisch") generated by GPT-4o and LLaMA-3.3, using Reddit data from the Denglisch Corpus as a reference. Model outputs are compared to authentic posts using established CS metrics (M-Index, I-Index, CESAR), an analysis of Shared Lexical Items (SLIs) as switch triggers, and a human evaluation of perceived naturalness and fluency. Both models approximate global CS characteristics but differ from the authentic data in the diversity and complexity of their switching. LLaMA-3.3 more closely matches corpus-level metrics, whereas GPT-4o produces more conservative switching that is rated as significantly more natural and fluent. In addition, GPT-4o reproduces SLI-triggered switching patterns similar to those found in authentic data, while this effect is weaker for LLaMA-3.3.
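As an illustration of the corpus-level metrics the abstract mentions, the M-Index (Barnett et al., 2000) and I-Index can be computed from a sequence of per-token language tags. The sketch below is not taken from the paper; it is a minimal implementation of the standard definitions, assuming tokens are already language-tagged (e.g. `"de"`/`"en"`).

```python
from collections import Counter


def m_index(tags: list[str]) -> float:
    """Multilingual Index: 0 for a monolingual text, 1 for a perfectly
    even mix of the languages present."""
    n = len(tags)
    counts = Counter(tags)
    k = len(counts)  # number of distinct languages
    if n == 0 or k < 2:
        return 0.0
    # Sum of squared language proportions (Herfindahl-style concentration).
    s = sum((c / n) ** 2 for c in counts.values())
    return (1 - s) / ((k - 1) * s)


def i_index(tags: list[str]) -> float:
    """Integration Index: fraction of adjacent token pairs where the
    language switches."""
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    return switches / (len(tags) - 1)
```

For example, the alternating sequence `["de", "en", "de", "en"]` yields an M-Index of 1.0 (even mix) and an I-Index of 1.0 (every boundary is a switch), while `["de", "de", "en", "en"]` keeps the M-Index at 1.0 but drops the I-Index to 1/3, showing why both metrics are needed.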
Anthology ID:
2026.vardial-1.7
Volume:
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
87–100
URL:
https://aclanthology.org/2026.vardial-1.7/
Cite (ACL):
Firat Cem Aksüt, Stefan Hillmann, Pia Knoeferle, and Sebastian Möller. 2026. German-English Code-Switching in Large Language Models. In Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 87–100, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
German-English Code-Switching in Large Language Models (Aksüt et al., VarDial 2026)
PDF:
https://aclanthology.org/2026.vardial-1.7.pdf