TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Megan Leszczynski; Daniel Fu; Mayee Chen; Christopher Ré

doi:10.18653/v1/2022.findings-acl.169

Abstract

Entity retrieval—retrieving information about entity mentions in a query—is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there are several challenges: (1) existing type-based retrieval methods require mention boundaries as input, but open-domain tasks run on unstructured text, (2) type-based methods should not compromise overall performance, and (3) type-based methods should be robust to noisy and missing types. In this work, we introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval for open-domain tasks. TABi leverages a type-enforced contrastive loss to encourage entities and queries of similar types to be close in the embedding space. TABi improves retrieval of rare entities on the Ambiguous Entity Retrieval (AmbER) sets, while maintaining strong overall retrieval performance on open-domain tasks in the KILT benchmark compared to state-of-the-art retrievers. TABi is also robust to incomplete type systems, improving rare entity retrieval over baselines with only 5% type coverage of the training dataset. We make our code publicly available.
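
To make the idea of a type-based contrastive objective more concrete, the following is a minimal sketch, assuming a supervised-contrastive-style formulation in which embeddings that share a type label are treated as positives for each other. The abstract does not give the exact form of TABi's type-enforced loss, so the function name `type_contrastive_loss`, the `temperature` parameter, the flat list of type labels, and the rule of skipping anchors without a same-type partner are all illustrative assumptions, not the authors' implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def type_contrastive_loss(embeddings, type_labels, temperature=0.1):
    """Hypothetical type-based contrastive loss (an assumption, not the paper's exact loss).

    For each anchor embedding, the other embeddings with the same type label
    are positives and all remaining embeddings are negatives. The loss is the
    mean negative log-softmax score of the positives, averaged over anchors.
    """
    embs = [normalize(e) for e in embeddings]
    n = len(embs)
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if type_labels[j] == type_labels[i]]
        if not positives:
            # Assumption: anchors with no same-type partner (for example,
            # examples with missing types) contribute nothing to the loss.
            continue
        scores = {j: dot(embs[i], embs[j]) / temperature for j in others}
        log_denominator = math.log(sum(math.exp(s) for s in scores.values()))
        anchor_loss = -sum(scores[p] - log_denominator for p in positives) / len(positives)
        total += anchor_loss
        counted += 1
    return total / counted if counted else 0.0


# Toy data: two embeddings near the x-axis and two near the y-axis.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = type_contrastive_loss(toy_embeddings, ["person", "person", "place", "place"])
misaligned_loss = type_contrastive_loss(toy_embeddings, ["person", "place", "person", "place"])
```

In this toy setup, the loss is lower when the type labels agree with the geometric clusters than when they cut across them, which is the behavior a type-based objective is meant to reward. In an actual bi-encoder, a term like this would presumably be combined with a standard query-entity retrieval loss; how TABi combines them is not stated in the abstract.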

Anthology ID: 2022.findings-acl.169
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2147–2166
URL: https://aclanthology.org/2022.findings-acl.169/
DOI: 10.18653/v1/2022.findings-acl.169
Cite (ACL): Megan Leszczynski, Daniel Fu, Mayee Chen, and Christopher Ré. 2022. TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2147–2166, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval (Leszczynski et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-acl.169.pdf
Video: https://aclanthology.org/2022.findings-acl.169.mp4
