Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment

Sourav Dutta; Gerhard Weikum

doi:10.1162/tacl_a_00119

Abstract

Identifying and linking named entities across information sources is the basis of knowledge acquisition and at the heart of Web search, recommendations, and analytics. An important problem in this context is cross-document co-reference resolution (CCR): computing equivalence classes of textual mentions denoting the same entity, within and across documents. Prior methods employ ranking, clustering, or probabilistic graphical models using syntactic features and distant features from knowledge bases. However, these methods exhibit limitations regarding run-time and robustness. This paper presents the CROCS framework for unsupervised CCR, improving the state of the art in two ways. First, we extend the way knowledge bases are harnessed, by constructing a notion of semantic summaries for intra-document co-reference chains using co-occurring entity mentions belonging to different chains. Second, we reduce the computational cost by a new algorithm that embeds sample-based bisection, using spectral clustering or graph partitioning, in a hierarchical clustering process. This allows scaling up CCR to large corpora. Experiments with three datasets show significant gains in output quality, compared to the best prior methods, and the run-time efficiency of CROCS.
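The second contribution, sample-based bisection embedded in a hierarchical clustering process, can be illustrated with a small sketch. It is not the CROCS implementation. It assumes each co-reference chain is summarized as a set of keywords, uses Jaccard similarity, and replaces spectral clustering or graph partitioning with a much simpler stand-in: the two least similar sampled chains act as seeds, and every chain joins the closer seed. The function names, the threshold, and the sample size are all hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sample_bisect(ids, items, sample_size, rng):
    """Split ids in two, using only a small random sample to pick the seeds.

    Stand-in for spectral bisection (an assumption of this sketch): the two
    least similar sampled items become seeds, and every item in ids is
    assigned to the seed it resembles more.
    """
    sample = rng.sample(ids, min(sample_size, len(ids)))
    pairs = [(a, b) for k, a in enumerate(sample) for b in sample[k + 1:]]
    s0, s1 = min(pairs, key=lambda p: jaccard(items[p[0]], items[p[1]]))
    left, right = [], []
    for i in ids:
        closer_to_s0 = jaccard(items[i], items[s0]) >= jaccard(items[i], items[s1])
        (left if closer_to_s0 else right).append(i)
    return left, right, jaccard(items[s0], items[s1])


def hierarchical_bisection(ids, items, threshold=0.3, sample_size=4, rng=None):
    """Recursively bisect until the sampled seeds are similar enough.

    threshold and sample_size are illustrative values, not from the paper.
    """
    rng = rng or random.Random(0)
    if len(ids) < 2:
        return [ids]
    left, right, seed_sim = sample_bisect(ids, items, sample_size, rng)
    # Stop splitting when even the least similar sampled pair looks alike.
    if seed_sim >= threshold or not left or not right:
        return [ids]
    return (hierarchical_bisection(left, items, threshold, sample_size, rng)
            + hierarchical_bisection(right, items, threshold, sample_size, rng))


if __name__ == "__main__":
    # Toy keyword summaries for five mention chains (made-up data).
    summaries = [
        {"obama", "president", "white", "house"},
        {"obama", "president", "washington"},
        {"president", "white", "house", "obama"},
        {"jordan", "basketball", "bulls"},
        {"jordan", "bulls", "chicago", "basketball"},
    ]
    clusters = hierarchical_bisection(list(range(len(summaries))), summaries)
    print(sorted(sorted(c) for c in clusters))
```

Because each split only compares pairs within the sample and then makes one pass over the remaining chains, the cost per level stays well below an all-pairs comparison, which is the intuition behind the scalability claim in the abstract.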

Anthology ID: Q15-1002
Volume: Transactions of the Association for Computational Linguistics, Volume 3
Year: 2015
Address: Cambridge, MA
Editors: Michael Collins, Lillian Lee
Venue: TACL
Publisher: MIT Press
Pages: 15–28
URL: https://aclanthology.org/Q15-1002/
DOI: 10.1162/tacl_a_00119
Cite (ACL): Sourav Dutta and Gerhard Weikum. 2015. Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment. Transactions of the Association for Computational Linguistics, 3:15–28.
Cite (Informal): Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment (Dutta & Weikum, TACL 2015)
PDF: https://aclanthology.org/Q15-1002.pdf
Data: New York Times Annotated Corpus, YAGO
