Karan Bhanushali
2026
An improved Code-Switching Detection System for some Indic Languages
Karan Bhanushali | Fritz Hohl
Proceedings of the First Workshop on Multilingual Multicultural Evaluation
Code-switching is a common feature of multilingual communication, and reliably identifying where the language switches is essential for downstream tasks such as generating code-switched machine translations. This paper introduces CSDI, a Code-Switching Detection (CSD) system for Indic text, which jointly learns CSD, Named Entity Recognition, and Part-of-Speech tagging through a shared encoder. Leveraging multitask learning, CSDI captures linguistic cues that signal switching boundaries and achieves a new state-of-the-art macro-F1 score with near-zero 𝛥CMI across six Indic languages. The model also demonstrates strong cross-lingual transfer, effectively leveraging high-resource languages to improve low-resource performance. Despite challenges such as intra-word code-mixing and limited token-level context, CSDI establishes a new baseline for scalable, low-resource NLP research in code-mixed environments.
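For readers unfamiliar with the 𝛥CMI figure mentioned in the abstract, the following is a minimal sketch of how such a quantity might be computed. It assumes that CMI refers to the commonly used Code-Mixing Index, 100 × (1 − max_i w_i / (n − u)), where w_i is the number of tokens in language i, n is the total token count, and u is the number of language-independent tokens. It further assumes that 𝛥CMI is the difference between the CMI of the predicted tags and the CMI of the gold tags. The abstract confirms neither assumption, and the function name `cmi`, the `"other"` tag, and the example tag sequences are hypothetical.

```python
from collections import Counter


def cmi(tags, neutral=frozenset({"other"})):
    """Code-Mixing Index of one utterance, given per-token language tags.

    Assumed formula (not taken from the paper): 100 * (1 - max_i w_i / (n - u)).
    Tags listed in `neutral` count as language-independent tokens (u).
    """
    lang_counts = Counter(t for t in tags if t not in neutral)
    lang_tokens = sum(lang_counts.values())  # this is n - u
    if lang_tokens == 0:
        # No language-specific tokens at all: treat as no mixing.
        return 0.0
    return 100.0 * (1.0 - max(lang_counts.values()) / lang_tokens)


# Hypothetical gold and predicted tag sequences for a six-token utterance.
gold = ["hi", "hi", "en", "en", "other", "hi"]
pred = ["hi", "hi", "en", "hi", "other", "hi"]

# Assumed definition: predicted CMI minus gold CMI.
delta_cmi = cmi(pred) - cmi(gold)
print(f"gold CMI={cmi(gold):.1f}, pred CMI={cmi(pred):.1f}, delta={delta_cmi:.1f}")
```

Under these assumptions, a near-zero value would mean that the predicted tags reproduce the amount of mixing found in the gold annotation, even where individual tokens are mislabeled. This is why such a figure would complement macro-F1 rather than replace it.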