Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models

Garry Kuwanto; Chaitanya Agarwal; Genta Indra Winata; Derry Tanti Wijaya

Abstract

Code-switching is a common practice for millions of multilingual speakers but remains challenging for Large Language Models (LLMs). This paper investigates LLM capabilities in generating code-switched text, conducting extensive experiments across five diverse language pairs: English paired with Hindi, Tamil, Malayalam, and Indonesian, as well as Indonesian-Javanese. Our analysis, grounded in comprehensive human evaluations by native speakers, uncovers a directional asymmetry: LLMs consistently produce higher-quality (more accurate and fluent) code-switched text when prompted with a lower-resource language (e.g., Hindi, Tamil, Javanese) as the source, compared to when a higher-resource language (English, Indonesian) serves as the source. This asymmetry mirrors sociolinguistic patterns, particularly the Matrix Language Frame model, suggesting LLMs implicitly learn common code-switching structures from their training data where regional languages often form the grammatical base. Furthermore, we find that explicit linguistic guidance, applied through Equivalence Constraint Theory (ECT) to identify switching points, benefits generation quality primarily in the less common, higher-resource-source direction where LLMs intrinsically struggle. These findings highlight a crucial interplay between the implicit linguistic knowledge captured by LLMs and the targeted utility of explicit linguistic constraints. We also introduce CSPref, a pairwise preference dataset derived from our human evaluations, to facilitate future research in code-switching generation and evaluation.

Anthology ID: 2026.cdl-1.1
Volume: Proceedings of the 1st Workshop on Computational Developmental Linguistics (CDL)
Month: July
Year: 2026
Address: Grand Hyatt Manchester San Diego, 1 Market Pl, San Diego, CA 92101
Editors: Martin Ziqiao Ma, Emmy Liu, Jing Liu, Tyler A. Chang, Abdellah Fourtassi, Alex Warstadt, Michael Hahn, Weiwei Sun, Freda Shi
Venues: CDL | WS
Publisher: Association for Computational Linguistics
Pages: 1–14
URL: https://aclanthology.org/2026.cdl-1.1/
Cite (ACL): Garry Kuwanto, Chaitanya Agarwal, Genta Indra Winata, and Derry Tanti Wijaya. 2026. Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models. In Proceedings of the 1st Workshop on Computational Developmental Linguistics (CDL), pages 1–14, Grand Hyatt Manchester San Diego, 1 Market Pl, San Diego, CA 92101. Association for Computational Linguistics.
Cite (Informal): Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models (Kuwanto et al., CDL 2026)
PDF: https://aclanthology.org/2026.cdl-1.1.pdf
