Jointly Embedding Entities and Text with Distant Supervision

Denis Newman-Griffis, Albert M Lai, Eric Fosler-Lussier


Abstract
Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use.
Anthology ID:
W18-3026
Volume:
Proceedings of the Third Workshop on Representation Learning for NLP
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Isabelle Augenstein, Kris Cao, He He, Felix Hill, Spandana Gella, Jamie Kiros, Hongyuan Mei, Dipendra Misra
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
195–206
URL:
https://aclanthology.org/W18-3026
DOI:
10.18653/v1/W18-3026
Cite (ACL):
Denis Newman-Griffis, Albert M Lai, and Eric Fosler-Lussier. 2018. Jointly Embedding Entities and Text with Distant Supervision. In Proceedings of the Third Workshop on Representation Learning for NLP, pages 195–206, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Jointly Embedding Entities and Text with Distant Supervision (Newman-Griffis et al., RepL4NLP 2018)
PDF:
https://aclanthology.org/W18-3026.pdf
Code
OSU-slatelab/JET
Data
WikiSRS