M. Shahiki Tash


2022

Word Level Language Identification in Code-mixed Kannada-English Texts using traditional machine learning algorithms
M. Shahiki Tash | Z. Ahani | A.l. Tonja | M. Gemeda | N. Hussain | O. Kolesnikova
Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts

This paper describes our system for the CoLI-Kanglish 2022 shared task on word-level language identification in code-mixed Kannada-English texts. The goal of the task is to identify the language of each word in the CoLI-Kanglish 2022 dataset. The dataset is divided into six categories: Kannada, English, Mixed-Language, Location, Name, and Others. The code-mixed data was compiled by the CoLI-Kanglish 2022 organizers from posts on social media. We use two classification techniques, KNN and SVM, achieving an F1-score of 0.58 and placing third out of nine competitors.
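As a rough illustration of word-level classification with KNN, the sketch below labels a word by majority vote among its nearest training words. It is not the authors' system: the abstract does not state which features were used, so the character n-gram features, the cosine similarity, the helper names (`char_ngrams`, `cosine`, `knn_predict`), and the tiny `TRAIN` list are all assumptions made for illustration, and the example words are not taken from the CoLI-Kanglish dataset.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n_min=1, n_max=3):
    """Count character n-grams of a lowercased word wrapped in boundary markers."""
    padded = f"<{word.lower()}>"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(word, train, k=3):
    """Return the majority label among the k training words most similar to `word`."""
    query = char_ngrams(word)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, char_ngrams(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy examples for illustration only (not from the shared-task data).
TRAIN = [
    ("the", "English"), ("this", "English"), ("that", "English"),
    ("movie", "English"), ("song", "English"),
    ("tumba", "Kannada"), ("chennagide", "Kannada"), ("illa", "Kannada"),
    ("guru", "Kannada"), ("maadi", "Kannada"),
]

if __name__ == "__main__":
    for token in ["the", "illa", "songs"]:
        print(token, knn_predict(token, TRAIN, k=3))
```

A real system would train on the labeled shared-task data and cover all six categories; an SVM could be trained on the same kind of n-gram vectors in place of the nearest-neighbor vote.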