Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference

William Held, Dan Iter, Dan Jurafsky


Abstract
Performing event and entity coreference resolution across documents vastly increases the number of candidate mentions, making it intractable to do the full n2 pairwise comparisons. Existing approaches simplify by considering coreference only within document clusters, but this fails to handle inter-cluster coreference, common in many applications. As a result cross-document coreference algorithms are rarely applied to downstream tasks. We draw on an insight from discourse coherence theory: potential coreferences are constrained by the reader’s discourse focus. We model the entities/events in a reader’s focus as a neighborhood within a learned latent embedding space which minimizes the distance between mentions and the centroids of their gold coreference clusters. We then use these neighborhoods to sample only hard negatives to train a fine-grained classifier on mention pairs and their local discourse features. Our approach achieves state-of-the-art results for both events and entities on the ECB+, Gun Violence, Football Coreference, and Cross-Domain Cross-Document Coreference corpora. Furthermore, training on multiple corpora improves average performance across all datasets by 17.2 F1 points, leading to a robust coreference resolution model that is now feasible to apply to downstream tasks.
Anthology ID:
2021.emnlp-main.106
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1406–1417
Language:
URL:
https://aclanthology.org/2021.emnlp-main.106
DOI:
Bibkey:
Copy Citation:
PDF:
https://aclanthology.org/2021.emnlp-main.106.pdf
Code
 helw150/event_entity_coref_ecb_plus
Data
ECB+