Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective

Ian Porada, Alexandra Olteanu, Kaheer Suleman, Adam Trischler, Jackie Cheung


Abstract
It is increasingly common to evaluate the same coreference resolution (CR) model on multiple datasets. Do these multi-dataset evaluations allow us to draw meaningful conclusions about model generalization? Or, do they rather reflect the idiosyncrasies of a particular experimental setup (e.g., the specific datasets used)? To study this, we view evaluation through the lens of measurement modeling, a framework commonly used in the social sciences for analyzing the validity of measurements. By taking this perspective, we show how multi-dataset evaluations risk conflating different factors concerning what, precisely, is being measured. This in turn makes it difficult to draw more generalizable conclusions from these evaluations. For instance, we show that across seven datasets, measurements intended to reflect CR model generalization are often correlated with differences in both how coreference is defined and how it is operationalized; this limits our ability to draw conclusions regarding the ability of CR models to generalize across any singular dimension. We believe the measurement modeling framework provides the needed vocabulary for discussing challenges surrounding what is actually being measured by CR evaluations.
Anthology ID:
2024.findings-acl.909
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15380–15395
URL:
https://aclanthology.org/2024.findings-acl.909
Cite (ACL):
Ian Porada, Alexandra Olteanu, Kaheer Suleman, Adam Trischler, and Jackie Cheung. 2024. Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective. In Findings of the Association for Computational Linguistics ACL 2024, pages 15380–15395, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Challenges to Evaluating the Generalization of Coreference Resolution Models: A Measurement Modeling Perspective (Porada et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.909.pdf