Realistic Evaluation Principles for Cross-document Coreference Resolution

Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan


Abstract
We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results. We propose addressing this issue via two evaluation methodology principles. First, as in other tasks, models should be evaluated on predicted mentions rather than on gold mentions. Doing this raises a subtle issue regarding singleton coreference clusters, which we address by decoupling the evaluation of mention detection from that of coreference linking. Second, we argue that models should not exploit the synthetic topic structure of the standard ECB+ dataset, forcing models to confront the lexical ambiguity challenge, as intended by the dataset creators. We demonstrate empirically the drastic impact of our more realistic evaluation principles on a competitive model, yielding a score that is 33 F1 points lower than under prior lenient evaluation practices.
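The first principle can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: that "decoupling" means scoring mention detection on all mentions while scoring coreference linking only on clusters with more than one mention, and that a simple pairwise link F1 can stand in for whichever coreference metric is actually used. The function names and the (document id, start, end) mention format are hypothetical and are not taken from the ariecattan/coref code.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

# Assumed mention format: (document id, start token, end token).
Mention = Tuple[str, int, int]
Cluster = Set[Mention]


def _f1(true_pos: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def mention_detection_f1(gold: Iterable[Cluster], pred: Iterable[Cluster]) -> float:
    """Score mention spans only, ignoring how they are clustered."""
    gold_mentions = set().union(*gold)
    pred_mentions = set().union(*pred)
    return _f1(len(gold_mentions & pred_mentions), len(pred_mentions), len(gold_mentions))


def drop_singletons(clusters: Iterable[Cluster]) -> List[Cluster]:
    """Keep only clusters that contain at least one coreference link."""
    return [set(c) for c in clusters if len(c) > 1]


def links(clusters: Iterable[Cluster]) -> Set[Tuple[Mention, Mention]]:
    """All unordered mention pairs that share a cluster."""
    return {pair for c in clusters for pair in combinations(sorted(c), 2)}


def link_f1(gold: Iterable[Cluster], pred: Iterable[Cluster]) -> float:
    """Pairwise link F1 on non-singleton clusters (a stand-in metric, by assumption)."""
    gold_links = links(drop_singletons(gold))
    pred_links = links(drop_singletons(pred))
    return _f1(len(gold_links & pred_links), len(pred_links), len(gold_links))


# Toy example: the prediction contains one spurious singleton mention.
gold = [{("d1", 0, 1), ("d2", 3, 4)}, {("d1", 5, 6)}]
pred = [{("d1", 0, 1), ("d2", 3, 4)}, {("d1", 5, 6)}, {("d2", 7, 8)}]

print("mention detection F1:", mention_detection_f1(gold, pred))
print("coreference link F1:", link_f1(gold, pred))
```

In this toy case the spurious singleton lowers the mention detection score but leaves the linking score untouched, which is the kind of separation the abstract describes.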
Anthology ID:
2021.starsem-1.13
Volume:
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics
Month:
August
Year:
2021
Address:
Online
Editors:
Lun-Wei Ku, Vivi Nastase, Ivan Vulić
Venue:
*SEM
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
143–151
URL:
https://aclanthology.org/2021.starsem-1.13
DOI:
10.18653/v1/2021.starsem-1.13
Cite (ACL):
Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, and Ido Dagan. 2021. Realistic Evaluation Principles for Cross-document Coreference Resolution. In Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, pages 143–151, Online. Association for Computational Linguistics.
Cite (Informal):
Realistic Evaluation Principles for Cross-document Coreference Resolution (Cattan et al., *SEM 2021)
PDF:
https://aclanthology.org/2021.starsem-1.13.pdf
Code:
ariecattan/coref
Data:
ECB+