Entity Projection via Machine Translation for Cross-Lingual NER

Alankar Jain; Bhargavi Paranjape; Zachary C. Lipton

doi:10.18653/v1/D19-1100

Abstract

Although over 100 languages are supported by strong off-the-shelf machine translation systems, only a subset of them possess large annotated corpora for named entity recognition. Motivated by this fact, we leverage machine translation to improve annotation-projection approaches to cross-lingual named entity recognition. We propose a system that improves over prior entity-projection methods by: (a) leveraging machine translation systems twice: first for translating sentences and subsequently for translating entities; (b) matching entities based on orthographic and phonetic similarity; and (c) identifying matches based on distributional statistics derived from the dataset. Our approach improves upon current state-of-the-art methods for cross-lingual named entity recognition on 5 diverse languages by an average of 4.1 points. Further, our method achieves state-of-the-art F1 scores for Armenian, outperforming even a monolingual model trained on Armenian source data.
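
As a rough illustration of steps (a) and (b) in the abstract, the sketch below translates each annotated entity string and then looks for the token span in the other sentence that is most similar to that translation at the character level. It is a minimal sketch under stated assumptions, not the authors' implementation: `TOY_MT` is a hypothetical lookup table standing in for a real machine translation system, the translation direction (English entities projected onto a German sentence) is chosen only for the example, `find_span` and `project_labels` are illustrative names, and the 0.75 threshold is arbitrary. Only orthographic similarity is shown; phonetic matching and the distributional statistics of step (c) are not modelled.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_span(entity: str, tokens: List[str],
              threshold: float = 0.75) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token span most similar to `entity`,
    or None if no candidate span reaches `threshold`."""
    n = len(entity.split())
    best_span, best_score = None, 0.0
    # Allow the matched span to be one token shorter or longer than the entity.
    for length in range(max(1, n - 1), n + 2):
        for start in range(len(tokens) - length + 1):
            candidate = " ".join(tokens[start:start + length])
            score = similarity(entity, candidate)
            if score > best_score:
                best_span, best_score = (start, start + length), score
    return best_span if best_score >= threshold else None


def project_labels(entities: List[Tuple[str, str]], tokens: List[str],
                   translate_entity: Callable[[str], str],
                   threshold: float = 0.75) -> List[str]:
    """Translate each (text, label) entity, match it in `tokens`,
    and emit BIO tags for the matched spans."""
    tags = ["O"] * len(tokens)
    for text, label in entities:
        span = find_span(translate_entity(text), tokens, threshold)
        if span is None:
            continue  # no sufficiently similar span: leave tokens untagged
        start, end = span
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical stand-in for an MT system (assumption, not a real API).
TOY_MT = {"United Nations": "Vereinte Nationen", "Geneva": "Genf"}

target = "Die Vereinten Nationen tagten in Genf .".split()
tags = project_labels([("United Nations", "ORG"), ("Geneva", "LOC")],
                      target, TOY_MT.__getitem__)
print(list(zip(target, tags)))
```

In this toy case the entity translation "Vereinte Nationen" differs from the inflected form "Vereinten Nationen" in the sentence, which is the kind of surface mismatch that approximate rather than exact string matching is meant to tolerate.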

Anthology ID: D19-1100
Original: D19-1100v1
Version 2: D19-1100v2
Volume: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues: EMNLP | IJCNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1083–1092
URL: https://aclanthology.org/D19-1100/
DOI: 10.18653/v1/D19-1100
Cite (ACL): Alankar Jain, Bhargavi Paranjape, and Zachary C. Lipton. 2019. Entity Projection via Machine Translation for Cross-Lingual NER. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1083–1092, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Entity Projection via Machine Translation for Cross-Lingual NER (Jain et al., EMNLP-IJCNLP 2019)
PDF: https://aclanthology.org/D19-1100.pdf
