Learning from Noisy Labels for Entity-Centric Information Extraction

Wenxuan Zhou, Muhao Chen


Abstract
Recent information extraction approaches have relied on training deep neural models. However, such models can easily overfit noisy labels and suffer from performance degradation. While it is very costly to filter noisy labels in large learning resources, recent studies show that such labels take more training steps to be memorized and are more frequently forgotten than clean labels, and are therefore identifiable during training. Motivated by these properties, we propose a simple co-regularization framework for entity-centric information extraction, which consists of several neural models with identical structures but different parameter initializations. These models are jointly optimized with the task-specific losses and are regularized by an agreement loss to generate similar predictions, which prevents overfitting on noisy labels. Extensive experiments on two widely used but noisy benchmarks for information extraction, TACRED and CoNLL03, demonstrate the effectiveness of our framework. We release our code to the community for future research.
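As a rough illustration of the co-regularization idea in the abstract, the sketch below combines a task loss averaged over several models with an agreement term that pulls their predictions together. The abstract does not give the exact form of the agreement loss; this sketch assumes a KL divergence between the averaged prediction and each model's prediction, and the function names and the weight `alpha` are hypothetical, not taken from the released code.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Task-specific loss for a single example (assumed: standard cross-entropy).
    return -math.log(probs[label])

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q); eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def co_regularized_loss(logits_per_model, label, alpha=1.0):
    """Joint loss for one example, given one logit vector per model.

    Assumption: the agreement term is the mean KL divergence from the
    averaged prediction to each model's prediction. `alpha` is a
    hypothetical weight on that term.
    """
    probs = [softmax(logits) for logits in logits_per_model]
    n_models = len(probs)
    # Average task loss across the models.
    task_loss = sum(cross_entropy(p, label) for p in probs) / n_models
    # Averaged prediction that every model is pulled toward.
    mean_probs = [sum(col) / n_models for col in zip(*probs)]
    agreement = sum(kl_divergence(mean_probs, p) for p in probs) / n_models
    return task_loss + alpha * agreement, task_loss, agreement

# Two models that disagree on an example labeled as class 0.
total, task, agreement = co_regularized_loss(
    [[2.0, 0.0, -1.0], [-1.0, 0.0, 2.0]], label=0, alpha=0.5
)
print(f"task={task:.4f} agreement={agreement:.4f} total={total:.4f}")
```

When the models disagree on an example, the agreement term is positive and pushes them toward a shared prediction; when they already agree it vanishes, leaving only the task loss.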
Anthology ID:
2021.emnlp-main.437
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5381–5392
URL:
https://aclanthology.org/2021.emnlp-main.437
DOI:
10.18653/v1/2021.emnlp-main.437
Cite (ACL):
Wenxuan Zhou and Muhao Chen. 2021. Learning from Noisy Labels for Entity-Centric Information Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5381–5392, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Learning from Noisy Labels for Entity-Centric Information Extraction (Zhou & Chen, EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.437.pdf
Video:
https://aclanthology.org/2021.emnlp-main.437.mp4
Code:
wzhouad/NLL-IE
Data:
CoNLL++, CoNLL-2003, TACRED