Learning to Denoise Distantly-Labeled Data for Entity Typing

Yasumasa Onoe, Greg Durrett


Abstract
Distantly-labeled data can be used to scale up training of statistical models, but it is typically noisy and that noise can vary with the distant labeling technique. In this work, we propose a two-stage procedure for handling this type of data: denoise it with a learned model, then train our final model on clean and denoised distant data with standard supervised training. Our denoising approach consists of two parts. First, a filtering function discards examples from the distantly labeled data that are wholly unusable. Second, a relabeling function repairs noisy labels for the retained examples. Each of these components is a model trained on synthetically-noised examples generated from a small manually-labeled set. We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). Our baseline model is an extension of their model with pre-trained ELMo representations, which already achieves state-of-the-art performance. Adding distant data that has been denoised with our learned models gives further performance gains over this base model, outperforming models trained on raw distant data or heuristically-denoised distant data.
Anthology ID:
N19-1250
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2407–2417
Language:
URL:
https://aclanthology.org/N19-1250
DOI:
10.18653/v1/N19-1250
Bibkey:
Cite (ACL):
Yasumasa Onoe and Greg Durrett. 2019. Learning to Denoise Distantly-Labeled Data for Entity Typing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2407–2417, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Learning to Denoise Distantly-Labeled Data for Entity Typing (Onoe & Durrett, NAACL 2019)
Copy Citation:
PDF:
https://aclanthology.org/N19-1250.pdf
Code
 yasumasaonoe/DenoiseET
Data
OntoNotes 5.0Open Entity