Event Annotation and Detection in Kannada-English Code-Mixed Social Media Data

Sumukh S, Abhinav Appidi, Manish Shrivastava


Abstract
Code-mixing (CM) is a frequently observed phenomenon on social media platforms in multilingual societies such as India. While the increase in code-mixed content on these platforms provides a good amount of data for studying various aspects of code-mixing, the lack of automated text analysis tools makes such studies difficult. To address this gap, tools such as language identifiers, Part-of-Speech (POS) taggers and Named Entity Recognition (NER) systems have been developed for analysing code-mixed data. Another such tool is Event Detection, an important information retrieval task that can be used to identify critical facts occurring in the vast streams of unstructured text data available. While event detection from text is a hard problem on its own, the informal nature of social media data makes it harder, and code-mixed (Kannada-English) data complicates it further through word-level mixing, lack of structure and incomplete information. In this work, we address this problem: we propose guidelines for the annotation of events in Kannada-English CM data and provide baselines for event detection on this data, built with careful feature selection.
Anthology ID:
2023.ranlp-1.108
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1007–1014
URL:
https://aclanthology.org/2023.ranlp-1.108
Cite (ACL):
Sumukh S, Abhinav Appidi, and Manish Shrivastava. 2023. Event Annotation and Detection in Kannada-English Code-Mixed Social Media Data. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 1007–1014, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Event Annotation and Detection in Kannada-English Code-Mixed Social Media Data (S et al., RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.108.pdf