An improved Code-Switching Detection System for some Indic Languages

Karan Bhanushali, Fritz Hohl


Abstract
Code-switching is a common feature of multilingual communication, and reliably identifying where the language switches is essential for downstream tasks such as generating code-switched machine translations. This paper introduces CSDI, a Code-Switching Detection (CSD) system for Indic text that jointly learns CSD, Named Entity Recognition, and Part-of-Speech tagging through a shared encoder. Through multitask learning, CSDI captures linguistic cues that signal switching boundaries and achieves a new state-of-the-art macro-F1 score with near-zero ΔCMI across six Indic languages. The model also demonstrates strong cross-lingual transfer, drawing on high-resource languages to improve performance on low-resource ones. Despite challenges such as intra-word code-mixing and limited token-level context, CSDI establishes a new baseline for scalable, low-resource NLP research in code-mixed environments.
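The abstract reports results as macro-F1 and ΔCMI without defining them. The sketch below shows how such scores could be computed from token-level language tags. It assumes the commonly used utterance-level Code-Mixing Index (CMI) and treats ΔCMI as the absolute gap between the CMI of gold and predicted tags. That ΔCMI definition is an assumption and is not taken from the paper. The `univ` tag for language-independent tokens (named entities, punctuation) and all function names are also hypothetical.

```python
from collections import Counter

# Hypothetical tag for language-independent tokens; CSDI's actual tag set may differ.
UNIVERSAL = "univ"


def cmi(tags):
    """Utterance-level Code-Mixing Index: 100 * (1 - max_i(w_i) / (n - u))."""
    n = len(tags)
    u = sum(1 for t in tags if t == UNIVERSAL)
    if n == u:  # no language-specific tokens, so no mixing
        return 0.0
    counts = Counter(t for t in tags if t != UNIVERSAL)
    return 100.0 * (1 - max(counts.values()) / (n - u))


def delta_cmi(gold, pred):
    """Assumed definition: absolute CMI difference between gold and predicted tags."""
    return abs(cmi(gold) - cmi(pred))


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy Hindi-English utterance: the prediction misses the switch to English.
gold = ["hi", "hi", "en", "univ"]
pred = ["hi", "hi", "hi", "univ"]
print(round(delta_cmi(gold, pred), 2))  # 33.33
print(round(macro_f1(gold, pred), 2))   # 0.6
```

In this example a single missed switch leaves macro-F1 at 0.6 while ΔCMI jumps from 0 to about 33. This illustrates why a near-zero ΔCMI is reported alongside macro-F1: it checks that the predicted amount of mixing matches the gold amount.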
Anthology ID:
2026.mme-main.3
Volume:
Proceedings of the First Workshop on Multilingual Multicultural Evaluation
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Pinzhen Chen, Vilém Zouhar, Hanxu Hu, Simran Khanuja, Wenhao Zhu, Barry Haddow, Alexandra Birch, Alham Fikri Aji, Rico Sennrich, Sara Hooker
Venues:
MME | WS
Publisher:
Association for Computational Linguistics
Pages:
35–48
URL:
https://aclanthology.org/2026.mme-main.3/
Cite (ACL):
Karan Bhanushali and Fritz Hohl. 2026. An improved Code-Switching Detection System for some Indic Languages. In Proceedings of the First Workshop on Multilingual Multicultural Evaluation, pages 35–48, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
An improved Code-Switching Detection System for some Indic Languages (Bhanushali & Hohl, MME 2026)
PDF:
https://aclanthology.org/2026.mme-main.3.pdf