Revisiting DocRED - Addressing the False Negative Problem in Relation Extraction

Qingyu Tan; Lu Xu; Lidong Bing; Hwee Tou Ng; Sharifah Mahani Aljunied

doi:10.18653/v1/2022.emnlp-main.580

Abstract

The DocRED dataset is one of the most popular and widely used benchmarks for document-level relation extraction (RE). It adopts a recommend-revise annotation scheme so as to have a large-scale annotated dataset. However, we find that the annotation of DocRED is incomplete, i.e., false negative samples are prevalent. We analyze the causes and effects of the overwhelming false negative problem in the DocRED dataset. To address the shortcoming, we re-annotate 4,053 documents in the DocRED dataset by adding the missed relation triples back to the original DocRED. We name our revised DocRED dataset Re-DocRED. We conduct extensive experiments with state-of-the-art neural models on both datasets, and the experimental results show that the models trained and evaluated on our Re-DocRED achieve performance improvements of around 13 F1 points. Moreover, we conduct a comprehensive analysis to identify the potential areas for further improvement.

Anthology ID: 2022.emnlp-main.580
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 8472–8487
URL: https://aclanthology.org/2022.emnlp-main.580/
DOI: 10.18653/v1/2022.emnlp-main.580
Cite (ACL): Qingyu Tan, Lu Xu, Lidong Bing, Hwee Tou Ng, and Sharifah Mahani Aljunied. 2022. Revisiting DocRED - Addressing the False Negative Problem in Relation Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8472–8487, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Revisiting DocRED - Addressing the False Negative Problem in Relation Extraction (Tan et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.580.pdf
Video: https://aclanthology.org/2022.emnlp-main.580.mp4
