Entity Projection via Machine Translation for Cross-Lingual NER

Alankar Jain, Bhargavi Paranjape, Zachary C. Lipton


Abstract
Although over 100 languages are supported by strong off-the-shelf machine translation systems, only a subset of them possess large annotated corpora for named entity recognition. Motivated by this fact, we leverage machine translation to improve annotation-projection approaches to cross-lingual named entity recognition. We propose a system that improves over prior entity-projection methods by: (a) leveraging machine translation systems twice: first for translating sentences and subsequently for translating entities; (b) matching entities based on orthographic and phonetic similarity; and (c) identifying matches based on distributional statistics derived from the dataset. Our approach improves upon current state-of-the-art methods for cross-lingual named entity recognition on 5 diverse languages by an average of 4.1 points. Further, our method achieves state-of-the-art F_1 scores for Armenian, outperforming even a monolingual model trained on Armenian source data.
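As a rough illustration of step (b), matching a machine-translated entity against the translated sentence by orthographic similarity, the sketch below scores every short token span with a character-level ratio. It is not the authors' implementation: the function name `best_span`, the parameters `max_len` and `threshold`, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example, and phonetic similarity is not modelled at all.

```python
from difflib import SequenceMatcher


def best_span(translated_entity, target_tokens, max_len=4, threshold=0.6):
    """Return (start, end, score) for the most similar token span, or None.

    Hypothetical helper: `max_len` caps the span length in tokens and
    `threshold` is an assumed minimum similarity; neither value is from the paper.
    """
    best = None
    query = translated_entity.lower()
    for start in range(len(target_tokens)):
        last = min(start + max_len, len(target_tokens))
        for end in range(start + 1, last + 1):
            candidate = " ".join(target_tokens[start:end]).lower()
            # Character-level similarity in [0, 1]; a stand-in for the
            # orthographic matching described in the abstract.
            score = SequenceMatcher(None, query, candidate).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (start, end, score)
    return best


tokens = ["El", "presidente", "visitó", "Nueva", "York", "ayer"]
match = best_span("Nueva York", tokens)
```

Here `match` holds the half-open token range of the projected entity, so the source-side label could then be copied onto `tokens[match[0]:match[1]]`. A span is returned only when its score reaches the threshold, so entities with no plausible counterpart in the sentence are left unprojected.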
Anthology ID:
D19-1100
Original:
D19-1100v1
Version 2:
D19-1100v2
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1083–1092
URL:
https://aclanthology.org/D19-1100
DOI:
10.18653/v1/D19-1100
Cite (ACL):
Alankar Jain, Bhargavi Paranjape, and Zachary C. Lipton. 2019. Entity Projection via Machine Translation for Cross-Lingual NER. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1083–1092, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Entity Projection via Machine Translation for Cross-Lingual NER (Jain et al., EMNLP-IJCNLP 2019)
PDF:
https://aclanthology.org/D19-1100.pdf
Code:
alankarj/cross_lingual_ner
Data:
CoNLL 2002 | CoNLL 2003 | Polyglot-NER