Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data

Devanshu Jain, Maria Kustikova, Mayank Darbari, Rishabh Gupta, Stephen Mayhew


Abstract
In this work, we address the problem of Named Entity Recognition (NER) in code-switched tweets as part of the Workshop on Computational Approaches to Linguistic Code-switching (CALCS) at ACL’18. Code-switching is the phenomenon where a speaker switches between two languages, or variants of the same language, within or across utterances; these are known as intra-sentential and inter-sentential code-switching, respectively. Processing such data with state-of-the-art methods is challenging, since such technology is generally geared towards monolingual text. In this paper, we explored ways to use language identification and translation to recognize named entities in such data; however, simple features (without multilingual features) combined with a Conditional Random Field (CRF) classifier achieved the best results. Our experiments were aimed mainly at the English-Spanish (ENG-SPA) dataset, but we also submitted a language-independent version of our system to the Arabic-Egyptian (MSA-EGY) dataset and achieved good results.
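The abstract does not list the features themselves, so the following is only a minimal sketch of what "simple", language-independent token features for a CRF tagger might look like. Every feature name, the `token_features` and `sentence_features` helpers, and the `<BOS>`/`<EOS>` padding symbols are assumptions made for illustration, not the authors' actual feature set.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical language-independent features for the token at index i."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),                       # normalized surface form
        "prefix3": tok[:3],                         # character prefix
        "suffix3": tok[-3:],                        # character suffix
        "is_title": tok.istitle(),                  # capitalization shape
        "is_upper": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "is_hashtag": tok.startswith("#"),          # Twitter-specific cues
        "is_mention": tok.startswith("@"),
    }
    # Neighboring words, with padding symbols at the sentence boundaries.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dictionary per token, the usual input shape for a CRF toolkit."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    tweet = ["Vamos", "a", "Miami", "with", "@maria", "#party"]
    for tok, feats in zip(tweet, sentence_features(tweet)):
        print(tok, feats)
```

None of these features depends on knowing which language a token belongs to, which is the sense in which such a feature set could be applied unchanged to both language pairs.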
Anthology ID:
W18-3213
Volume:
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
103–109
URL:
https://aclanthology.org/W18-3213
DOI:
10.18653/v1/W18-3213
Cite (ACL):
Devanshu Jain, Maria Kustikova, Mayank Darbari, Rishabh Gupta, and Stephen Mayhew. 2018. Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching, pages 103–109, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data (Jain et al., ACL 2018)
PDF:
https://aclanthology.org/W18-3213.pdf