Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition

Jongin Kim, Nayoung Choi, Seunghyun Lim, Jungwhan Kim, Soojin Chung, Hyunsoo Woo, Min Song, Jinho D. Choi


Abstract
This paper presents an English-Korean parallel dataset of 381K news articles, of which 1,400, comprising 10K sentences, are manually labeled for crosslingual named entity recognition (NER). The annotation guidelines for the two languages are developed in parallel and yield inter-annotator agreement scores of 91% and 88% for English and Korean, respectively, indicating high-quality annotation in our dataset. Three types of crosslingual learning approaches (direct model transfer, embedding projection, and annotation projection) are used to develop zero-shot Korean NER models. Our best model gives an F1-score of 51%, which is very encouraging considering the extremely distinct natures of these two languages. This is pioneering work that explores zero-shot crosslingual learning between English and Korean and provides rich parallel annotation for a core NLP task, named entity recognition.
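As an illustration of the annotation projection idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes BIO-style labels on the English side and a word alignment given as (English index, Korean index) pairs; the function name `project_annotations`, the label names `PER` and `LOC`, and the toy sentence pair are all hypothetical. It also makes one simplification: adjacent Korean tokens that receive the same entity type are merged into a single entity.

```python
from typing import List, Optional, Tuple


def project_annotations(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO labels from source tokens onto aligned target tokens.

    Hypothetical helper for illustration only: `alignment` holds
    (source_index, target_index) pairs from some word aligner.
    """
    tgt_types: List[Optional[str]] = [None] * tgt_len

    # Visit pairs in target order; the first entity label that reaches
    # a target token wins, later ones are ignored.
    for src_idx, tgt_idx in sorted(alignment, key=lambda p: (p[1], p[0])):
        label = src_labels[src_idx]
        if label == "O" or tgt_types[tgt_idx] is not None:
            continue
        tgt_types[tgt_idx] = label.split("-", 1)[1]  # keep the type only

    # Rebuild BIO tags on the target side (simplification: adjacent
    # target tokens of the same type are treated as one entity).
    tgt_labels: List[str] = []
    prev: Optional[str] = None
    for ent_type in tgt_types:
        if ent_type is None:
            tgt_labels.append("O")
        elif ent_type == prev:
            tgt_labels.append("I-" + ent_type)
        else:
            tgt_labels.append("B-" + ent_type)
        prev = ent_type
    return tgt_labels


# Toy example (made up, not taken from the dataset).
english = ["Mina", "Kim", "visited", "New", "York"]
english_labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
korean = ["김미나는", "뉴욕을", "방문했다"]
alignment = [(0, 0), (1, 0), (2, 2), (3, 1), (4, 1)]

print(project_annotations(english_labels, alignment, len(korean)))
```

In this toy case the two English name tokens collapse onto one Korean token, which is one reason the sketch rebuilds the B/I prefixes on the Korean side instead of copying them.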
Anthology ID:
2021.mrl-1.19
Volume:
Proceedings of the 1st Workshop on Multilingual Representation Learning
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Duygu Ataman, Alexandra Birch, Alexis Conneau, Orhan Firat, Sebastian Ruder, Gozde Gul Sahin
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
224–237
URL:
https://aclanthology.org/2021.mrl-1.19
DOI:
10.18653/v1/2021.mrl-1.19
Cite (ACL):
Jongin Kim, Nayoung Choi, Seunghyun Lim, Jungwhan Kim, Soojin Chung, Hyunsoo Woo, Min Song, and Jinho D. Choi. 2021. Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition. In Proceedings of the 1st Workshop on Multilingual Representation Learning, pages 224–237, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition (Kim et al., MRL 2021)
PDF:
https://aclanthology.org/2021.mrl-1.19.pdf
Code:
emorynlp/mrl-2021
Data:
OntoNotes 5.0