A.L. Tonja
2022
Word Level Language Identification in Code-mixed Kannada-English Texts using traditional machine learning algorithms
M. Shahiki Tash | Z. Ahani | A.L. Tonja | M. Gemeda | N. Hussain | O. Kolesnikova
Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts
This paper describes our system for the CoLI-Kanglish 2022 shared task on word-level language identification in Kannada-English texts. The goal of the task is to identify the language of each word in the CoLI-Kanglish 2022 dataset, where every word belongs to one of six categories: Kannada, English, Mixed-Language, Location, Name, and Others. The organizers compiled this code-mixed dataset from social media posts. We use two classification techniques, K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), achieving an F1-score of 0.58 and placing third out of nine competitors.
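The abstract names KNN and SVM but does not say which features feed them. As a rough illustration of the KNN half only, the sketch below assumes each word is represented as a bag of character n-grams and labels a new word by a similarity-weighted vote among its k most similar training words under cosine similarity. The feature choice, the names `char_ngrams` and `KNNWordLangID`, and the toy words are all hypothetical. They are not taken from the paper or the CoLI-Kanglish 2022 data, and only two of the six categories are used to keep the sketch short.

```python
from collections import Counter
import math

# Hypothetical toy data: a few romanized Kannada ("kn") and English ("en")
# words invented for illustration, NOT drawn from the CoLI-Kanglish 2022 dataset.
TRAIN_WORDS = ["naanu", "hogtini", "maadi", "illa", "chennagide", "beku",
               "the", "movie", "very", "good", "watch", "really"]
TRAIN_LABELS = ["kn"] * 6 + ["en"] * 6


def char_ngrams(word, n_min=1, n_max=3):
    """Count character n-grams of the lowercased word, padded with '<' and '>'
    so prefixes and suffixes get their own features (an assumed feature set)."""
    padded = f"<{word.lower()}>"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class KNNWordLangID:
    """Hypothetical word-level KNN tagger: each word is labeled independently."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, words, labels):
        # KNN has no training step beyond storing the feature vectors.
        self.examples = [(char_ngrams(w), y) for w, y in zip(words, labels)]
        return self

    def predict_one(self, word):
        query = char_ngrams(word)
        # Rank stored examples by similarity to the query, most similar first.
        ranked = sorted(((cosine(query, feats), label)
                         for feats, label in self.examples), reverse=True)
        # Similarity-weighted vote among the k nearest neighbours.
        votes = Counter()
        for sim, label in ranked[:self.k]:
            votes[label] += sim
        return votes.most_common(1)[0][0]

    def predict(self, words):
        return [self.predict_one(w) for w in words]


clf = KNNWordLangID(k=1).fit(TRAIN_WORDS, TRAIN_LABELS)
print(clf.predict(["maadtini", "movies"]))  # ['kn', 'en']
```

With `k=1`, the unseen word "maadtini" is closest to "maadi" and "movies" is closest to "movie", so they receive the labels of those neighbours. An SVM variant would keep the same n-gram features but replace the voting step with a trained linear classifier; a library implementation such as scikit-learn's `LinearSVC` would be the natural choice rather than hand-rolling one.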