Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text

Vivek Srivastava, Mayank Singh


Abstract
Code-mixing is a frequent communication style among multilingual speakers, who mix words and phrases from two different languages in the same utterance of text or speech. Identifying and filtering code-mixed text is challenging because it co-exists with monolingual and noisy text. Over the years, several code-mixing metrics have been used extensively to identify code-mixed text and validate its quality. This paper demonstrates several inherent limitations of these metrics, with examples drawn from existing datasets that are widely used across experiments.
Anthology ID:
2021.calcs-1.2
Volume:
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month:
June
Year:
2021
Address:
Online
Venues:
CALCS | NAACL
Publisher:
Association for Computational Linguistics
Pages:
6–14
URL:
https://aclanthology.org/2021.calcs-1.2
DOI:
10.18653/v1/2021.calcs-1.2
PDF:
https://aclanthology.org/2021.calcs-1.2.pdf