Creation of Corpus and Analysis in Code-Mixed Kannada-English Social Media Data for POS Tagging

Abhinav Reddy Appidi, Vamshi Krishna Srirangam, Darsi Suhas, Manish Shrivastava


Abstract
Part-of-Speech (POS) tagging is an essential task for many Natural Language Processing (NLP) applications and a key phase of text analysis for understanding the semantics and context of language. A significant amount of work has been done on POS tagging for resource-rich languages. POS tags are useful for higher-level tasks such as building parse trees, which can be used for Named Entity Recognition, Coreference Resolution, Sentiment Analysis, and Question Answering. There has been work on code-mixed social media corpora, but not on POS tagging of Kannada-English code-mixed data. Here, we present a Kannada-English code-mixed social media corpus annotated with corresponding POS tags. We also experimented with CRF, Bi-LSTM, and Bi-LSTM-CRF classification models on our corpus.
Anthology ID:
2020.icon-main.13
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2020
Address:
Indian Institute of Technology Patna, Patna, India
Editors:
Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
101–107
URL:
https://aclanthology.org/2020.icon-main.13
Cite (ACL):
Abhinav Reddy Appidi, Vamshi Krishna Srirangam, Darsi Suhas, and Manish Shrivastava. 2020. Creation of Corpus and Analysis in Code-Mixed Kannada-English Social Media Data for POS Tagging. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 101–107, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
Creation of Corpus and Analysis in Code-Mixed Kannada-English Social Media Data for POS Tagging (Appidi et al., ICON 2020)
PDF:
https://aclanthology.org/2020.icon-main.13.pdf