The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation

Hao Peng, Xiaozhi Wang, Feng Yao, Kaisheng Zeng, Lei Hou, Juanzi Li, Zhiyuan Liu, Weixing Shen


Abstract
Event extraction (EE) is a crucial task that aims to extract events from text and comprises two subtasks: event detection (ED) and event argument extraction (EAE). In this paper, we examine the reliability of EE evaluations and identify three major pitfalls: (1) Data preprocessing discrepancies make evaluation results on the same dataset not directly comparable, yet preprocessing details are not widely noted or specified in papers. (2) Output space discrepancies between model paradigms leave EE models of different paradigms without grounds for comparison and also lead to unclear mappings between predictions and annotations. (3) The absence of pipeline evaluation in many EAE-only works makes them hard to compare directly with EE works and may not reflect model performance in real-world pipeline scenarios well. We demonstrate the significant influence of these pitfalls through comprehensive meta-analyses of recent papers and empirical experiments. To avoid these pitfalls, we suggest a series of remedies, including specifying data preprocessing, standardizing outputs, and providing pipeline evaluation results. To help implement these remedies, we develop a consistent evaluation framework, OmniEvent, which can be obtained from https://github.com/THU-KEG/OmniEvent.
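The third pitfall can be illustrated with a minimal sketch that scores the same argument predictions once with gold triggers and once in a pipeline where arguments are kept only for detected events. This is not the OmniEvent API: the record format, the `micro_prf` helper, and the toy annotations below are all assumptions invented for illustration.

```python
from typing import Set, Tuple

# Hypothetical record format (an assumption, not from the paper):
# (sentence_id, start_token, end_token, label)
Record = Tuple[str, int, int, str]


def micro_prf(gold: Set[Record], pred: Set[Record]) -> Tuple[float, float, float]:
    """Exact-match micro precision, recall, and F1 over sets of records."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy gold annotations (invented for illustration only).
gold_args: Set[Record] = {
    ("s1", 0, 1, "Attack:Attacker"),
    ("s1", 5, 7, "Attack:Target"),
    ("s2", 2, 3, "Transport:Artifact"),
}

# A hypothetical ED model that finds the Attack event but misses Transport.
pred_triggers: Set[Record] = {("s1", 3, 4, "Attack")}

# A hypothetical EAE model that is perfect when handed gold triggers.
pred_args_gold_triggers: Set[Record] = set(gold_args)

# Pipeline setting: keep only arguments whose event was actually detected.
detected_events = {(sent, label) for sent, _, _, label in pred_triggers}
pred_args_pipeline: Set[Record] = {
    arg
    for arg in pred_args_gold_triggers
    if (arg[0], arg[3].split(":")[0]) in detected_events
}

f1_gold_triggers = micro_prf(gold_args, pred_args_gold_triggers)[2]
f1_pipeline = micro_prf(gold_args, pred_args_pipeline)[2]
print(f"EAE F1 with gold triggers: {f1_gold_triggers:.2f}")
print(f"EAE F1 in pipeline:        {f1_pipeline:.2f}")
```

In this toy setup the gold-trigger score hides the missed event entirely, while the pipeline score drops because the arguments of the undetected event count as missed, which is why the two numbers are not directly comparable.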
Anthology ID:
2023.findings-acl.586
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9206–9227
URL:
https://aclanthology.org/2023.findings-acl.586
DOI:
10.18653/v1/2023.findings-acl.586
Cite (ACL):
Hao Peng, Xiaozhi Wang, Feng Yao, Kaisheng Zeng, Lei Hou, Juanzi Li, Zhiyuan Liu, and Weixing Shen. 2023. The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9206–9227, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation (Peng et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.586.pdf
Video:
https://aclanthology.org/2023.findings-acl.586.mp4