CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition

Tingting Ma, Qianhui Wu, Huiqiang Jiang, Börje Karlsson, Tiejun Zhao, Chin-Yew Lin


Abstract
Cross-lingual named entity recognition (NER) aims to train an NER system that generalizes well to a target language by leveraging labeled data in a given source language. Previous work alleviates the data scarcity problem by translating source-language labeled data or performing knowledge distillation on target-language unlabeled data. However, these methods may suffer from label noise due to the automatic labeling process. In this paper, we propose CoLaDa, a Collaborative Label Denoising Framework, to address this problem. Specifically, we first explore a model-collaboration-based denoising scheme that enables models trained on different data sources to collaboratively denoise pseudo labels used by each other. We then present an instance-collaboration-based strategy that considers the label consistency of each token’s neighborhood in the representation space for denoising. Experiments on different benchmark datasets show that the proposed CoLaDa achieves superior results compared to previous methods, especially when generalizing to distant languages.
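The instance-collaboration idea in the abstract, checking whether a token's pseudo label agrees with the labels of its neighbors in representation space, can be illustrated with a small sketch. This is not the authors' implementation: the function name `neighborhood_consistency`, the use of cosine similarity, the value of `k`, and the toy embeddings and labels are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighborhood_consistency(embeddings, labels, k=2):
    """For each token, return the fraction of its k nearest neighbors
    (by cosine similarity) that carry the same pseudo label.

    Hypothetical helper: a low score suggests the pseudo label may be noisy.
    """
    scores = []
    for i, (vec, label) in enumerate(zip(embeddings, labels)):
        # Similarity to every other token, most similar first.
        sims = [(cosine(vec, other), j) for j, other in enumerate(embeddings) if j != i]
        sims.sort(key=lambda s: s[0], reverse=True)
        neighbors = [j for _, j in sims[:k]]
        agree = sum(1 for j in neighbors if labels[j] == label)
        scores.append(agree / len(neighbors) if neighbors else 0.0)
    return scores


# Toy data (assumed): two clusters of 2-d token representations.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
labels = ["PER", "PER", "O", "O", "O", "O"]

scores = neighborhood_consistency(embeddings, labels, k=2)
print(scores)  # [0.5, 0.5, 0.0, 1.0, 1.0, 1.0]
```

In this toy setup the third token sits among "PER" neighbors but is labeled "O", so its consistency score is 0.0 and it would be a candidate for down-weighting or relabeling. How such scores are actually used in CoLaDa is described in the paper itself.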
Anthology ID:
2023.acl-long.330
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5995–6009
URL:
https://aclanthology.org/2023.acl-long.330
DOI:
10.18653/v1/2023.acl-long.330
Cite (ACL):
Tingting Ma, Qianhui Wu, Huiqiang Jiang, Börje Karlsson, Tiejun Zhao, and Chin-Yew Lin. 2023. CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5995–6009, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition (Ma et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.330.pdf
Video:
https://aclanthology.org/2023.acl-long.330.mp4