Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with Decontextualization

Jin Zhao; Jingxuan Tu; Bingyang Ye; Xinrui Hu; Nianwen Xue; James Pustejovsky

doi:10.18653/v1/2025.naacl-long.178

Abstract

Cross-Document Event Coreference (CDEC) annotation is challenging and difficult to scale, so existing datasets are small and lack diversity. We introduce a new approach that uses large language models (LLMs) to decontextualize event mentions, simplifying the document-level annotation task to sentence pairs with enriched context. This enables the creation of Richer EventCorefBank (RECB), a denser and more expressive dataset that can be annotated more quickly. Decontextualization has been shown to improve annotation speed without compromising quality and to enhance model performance. Our baseline experiment indicates that systems trained on RECB achieve comparable results on the EventCorefBank (ECB+) test set, showing the high quality of our dataset and its generalizability to other CDEC datasets. In addition, our evaluation shows that strong baseline models still struggle with RECB compared to other CDEC datasets, suggesting that the richness and diversity of RECB present significant challenges to current CDEC systems.

Anthology ID: 2025.naacl-long.178
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 3499–3513
URL: https://aclanthology.org/2025.naacl-long.178/
DOI: 10.18653/v1/2025.naacl-long.178
Cite (ACL): Jin Zhao, Jingxuan Tu, Bingyang Ye, Xinrui Hu, Nianwen Xue, and James Pustejovsky. 2025. Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with Decontextualization. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3499–3513, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with Decontextualization (Zhao et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.178.pdf
