@inproceedings{mtumbuka-schockaert-2024-encore,
title = "{E}n{C}ore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains",
author = "Mtumbuka, Frank and
Schockaert, Steven",
editor = "Graham, Yvette and
Purver, Matthew",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = mar,
year = "2024",
address = "St. Julian{'}s, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.eacl-long.106",
pages = "1768--1781",
abstract = "Entity typing is the task of assigning semantic types to the entities that are mentioned in a text. In the case of fine-grained entity typing (FET), a large set of candidate type labels is considered. Since obtaining sufficient amounts of manual annotations is then prohibitively expensive, FET models are typically trained using distant supervision. In this paper, we propose to improve on this process by pre-training an entity encoder such that embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. The main problem with this strategy, which helps to explain why it has not previously been considered, is that predicted coreference links are often too noisy. We show that this problem can be addressed by using a simple trick: we only consider coreference links that are predicted by two different off-the-shelf systems. With this prudent use of coreference links, our pre-training strategy allows us to improve the state-of-the-art in benchmarks on fine-grained entity typing, as well as traditional entity extraction.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="mtumbuka-schockaert-2024-encore">
<titleInfo>
<title>EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains</title>
</titleInfo>
<name type="personal">
<namePart type="given">Frank</namePart>
<namePart type="family">Mtumbuka</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Steven</namePart>
<namePart type="family">Schockaert</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-03</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yvette</namePart>
<namePart type="family">Graham</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Matthew</namePart>
<namePart type="family">Purver</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">St. Julian’s, Malta</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Entity typing is the task of assigning semantic types to the entities that are mentioned in a text. In the case of fine-grained entity typing (FET), a large set of candidate type labels is considered. Since obtaining sufficient amounts of manual annotations is then prohibitively expensive, FET models are typically trained using distant supervision. In this paper, we propose to improve on this process by pre-training an entity encoder such that embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. The main problem with this strategy, which helps to explain why it has not previously been considered, is that predicted coreference links are often too noisy. We show that this problem can be addressed by using a simple trick: we only consider coreference links that are predicted by two different off-the-shelf systems. With this prudent use of coreference links, our pre-training strategy allows us to improve the state-of-the-art in benchmarks on fine-grained entity typing, as well as traditional entity extraction.</abstract>
<identifier type="citekey">mtumbuka-schockaert-2024-encore</identifier>
<location>
<url>https://aclanthology.org/2024.eacl-long.106</url>
</location>
<part>
<date>2024-03</date>
<extent unit="page">
<start>1768</start>
<end>1781</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains
%A Mtumbuka, Frank
%A Schockaert, Steven
%Y Graham, Yvette
%Y Purver, Matthew
%S Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2024
%8 March
%I Association for Computational Linguistics
%C St. Julian’s, Malta
%F mtumbuka-schockaert-2024-encore
%X Entity typing is the task of assigning semantic types to the entities that are mentioned in a text. In the case of fine-grained entity typing (FET), a large set of candidate type labels is considered. Since obtaining sufficient amounts of manual annotations is then prohibitively expensive, FET models are typically trained using distant supervision. In this paper, we propose to improve on this process by pre-training an entity encoder such that embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. The main problem with this strategy, which helps to explain why it has not previously been considered, is that predicted coreference links are often too noisy. We show that this problem can be addressed by using a simple trick: we only consider coreference links that are predicted by two different off-the-shelf systems. With this prudent use of coreference links, our pre-training strategy allows us to improve the state-of-the-art in benchmarks on fine-grained entity typing, as well as traditional entity extraction.
%U https://aclanthology.org/2024.eacl-long.106
%P 1768-1781
Markdown (Informal)
[EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains](https://aclanthology.org/2024.eacl-long.106) (Mtumbuka & Schockaert, EACL 2024)
ACL
Frank Mtumbuka and Steven Schockaert. 2024. [EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains](https://aclanthology.org/2024.eacl-long.106). In *Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)*, pages 1768–1781, St. Julian's, Malta. Association for Computational Linguistics.
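
The abstract above describes two ideas: keeping only coreference links that two different off-the-shelf systems agree on, and using the surviving mention pairs as positives when contrastively pre-training an entity encoder. The sketch below is a minimal, hedged illustration of those two steps, not the authors' released implementation; the function names, the toy embeddings, and the InfoNCE-style loss are assumptions made for illustration only.

```python
# Minimal, illustrative sketch (not the authors' released code) of the idea in the
# abstract: keep only coreference links that two different systems agree on, then use
# the surviving mention pairs as positives for contrastive pre-training of an entity
# encoder. All names below are hypothetical placeholders.
import torch
import torch.nn.functional as F


def agreed_links(links_a, links_b):
    """Keep only mention pairs predicted as coreferent by both systems."""
    def norm(links):
        return {tuple(sorted(pair)) for pair in links}
    return norm(links_a) & norm(links_b)


def coref_contrastive_loss(mention_embeddings, positive_pairs, temperature=0.05):
    """InfoNCE-style loss: coreferring mentions should be more similar than others."""
    z = F.normalize(mention_embeddings, dim=-1)        # (num_mentions, dim)
    sim = z @ z.t() / temperature                      # pairwise cosine similarities
    mask = torch.eye(z.size(0), dtype=torch.bool)      # a mention is not its own positive
    sim = sim.masked_fill(mask, float("-inf"))
    losses = []
    for i, j in positive_pairs:                        # treat each agreed link symmetrically
        losses.append(F.cross_entropy(sim[i].unsqueeze(0), torch.tensor([j])))
        losses.append(F.cross_entropy(sim[j].unsqueeze(0), torch.tensor([i])))
    return torch.stack(losses).mean()


# Toy usage: 4 mention embeddings; the two systems agree only on the link (0, 1).
emb = torch.randn(4, 16, requires_grad=True)
pairs = agreed_links({(0, 1), (2, 3)}, {(1, 0)})
loss = coref_contrastive_loss(emb, pairs)
loss.backward()
```

In the paper's actual setting the mention embeddings would come from a transformer-based entity encoder and the links from two off-the-shelf coreference resolvers run over a large corpus; this toy example only isolates the agreement filter and a contrastive objective of the kind the abstract sketches.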