Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text

Vivek Srivastava, Mayank Singh


Abstract
Code-mixing is a frequent communication style among multilingual speakers, who mix words and phrases from two different languages in the same utterance of text or speech. Identifying and filtering code-mixed text is challenging because it co-exists with monolingual and noisy text. Over the years, several code-mixing metrics have been used extensively to identify code-mixed text and validate its quality. This paper demonstrates several inherent limitations of these metrics, with examples from existing datasets that are widely used across various experiments.
Anthology ID:
2021.calcs-1.2
Volume:
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month:
June
Year:
2021
Address:
Online
Editors:
Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
Venue:
CALCS
Publisher:
Association for Computational Linguistics
Pages:
6–14
URL:
https://aclanthology.org/2021.calcs-1.2
DOI:
10.18653/v1/2021.calcs-1.2
Cite (ACL):
Vivek Srivastava and Mayank Singh. 2021. Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 6–14, Online. Association for Computational Linguistics.
Cite (Informal):
Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text (Srivastava & Singh, CALCS 2021)
PDF:
https://aclanthology.org/2021.calcs-1.2.pdf