Projecting named entity recognizers without annotated or parallel corpora

Jue Hou, Maximilian Koppatz, José María Hoya Quecedo, Roman Yangarber


Abstract
Named entity recognition (NER) is a well-researched NLP task that typically requires large annotated corpora to train usable models. This is a problem for languages that lack such corpora, such as Finnish. We propose an approach for creating a named entity recognizer without any annotated or parallel documents, by leveraging the strong NER models that exist for English. We automatically gather a large amount of chronologically matched data in the two languages, then project named entity annotations from the English documents onto the Finnish ones, resolving the matches with limited linguistic rules. We use this “artificially” annotated data to train a BiLSTM-CRF model. Our results show that this method can produce annotated instances with high precision, and the resulting model achieves state-of-the-art performance.
Anthology ID:
W19-6124
Volume:
Proceedings of the 22nd Nordic Conference on Computational Linguistics
Month:
September–October
Year:
2019
Address:
Turku, Finland
Editors:
Mareike Hartmann, Barbara Plank
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press
Pages:
232–241
URL:
https://aclanthology.org/W19-6124/
Cite (ACL):
Jue Hou, Maximilian Koppatz, José María Hoya Quecedo, and Roman Yangarber. 2019. Projecting named entity recognizers without annotated or parallel corpora. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 232–241, Turku, Finland. Linköping University Electronic Press.
Cite (Informal):
Projecting named entity recognizers without annotated or parallel corpora (Hou et al., NoDaLiDa 2019)
PDF:
https://aclanthology.org/W19-6124.pdf