DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction

MeiHan Tong, Bin Xu, Shuai Wang, Meihuan Han, Yixin Cao, Jiangqi Zhu, Siyu Chen, Lei Hou, Juanzi Li


Abstract
Event extraction aims to identify an event and then extract the arguments participating in the event. Despite the great success in sentence-level event extraction, events are more naturally presented in the form of documents, with event arguments scattered in multiple sentences. However, a major barrier to promote document-level event extraction has been the lack of large-scale and practical training and evaluation datasets. In this paper, we present DocEE, a new document-level event extraction dataset including 27,000+ events, 180,000+ arguments. We highlight three features: large-scale manual annotations, fine-grained argument types and application-oriented settings. Experiments show that there is still a big gap between state-of-the-art models and human beings (41% Vs 85% in F1 score), indicating that DocEE is an open issue. DocEE is now available at https://github.com/tongmeihan1995/DocEE.git.
Anthology ID:
2022.naacl-main.291
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3970–3982
Language:
URL:
https://aclanthology.org/2022.naacl-main.291
DOI:
10.18653/v1/2022.naacl-main.291
Bibkey:
Cite (ACL):
MeiHan Tong, Bin Xu, Shuai Wang, Meihuan Han, Yixin Cao, Jiangqi Zhu, Siyu Chen, Lei Hou, and Juanzi Li. 2022. DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3970–3982, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction (Tong et al., NAACL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.naacl-main.291.pdf
Video:
 https://aclanthology.org/2022.naacl-main.291.mp4
Code
 tongmeihan1995/docee
Data
WikiEvents