Representation Learning of Entities and Documents from Knowledge Base Descriptions

Ikuya Yamada, Hiroyuki Shindo, Yoshiyasu Takefuji


Abstract
In this paper, we describe TextEnt, a neural network model that learns distributed representations of entities and documents directly from a knowledge base (KB). Given a document in a KB consisting of words and entity annotations, we train our model to predict the entity that the document describes and map the document and its target entity close to each other in a continuous vector space. Our model is trained using a large number of documents extracted from Wikipedia. The performance of the proposed model is evaluated using two tasks, namely fine-grained entity typing and multiclass text classification. The results demonstrate that our model achieves state-of-the-art performance on both tasks. The code and the trained representations are made available online for further academic research.
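The training objective outlined in the abstract (predict the entity a document describes, so that the document and its target entity end up close in vector space) can be illustrated with a small sketch. This is not the TextEnt architecture; the paper specifies the actual model. The sketch assumes a document vector that is a plain average of word vectors, dot-product scoring with a softmax over candidate entities, and a single gradient step on the entity vectors only. All names (mean_vector, entity_probabilities, sgd_step) and all toy values are hypothetical, and entity annotations in the document are ignored for brevity.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def entity_probabilities(doc_vec, entity_vecs):
    """Softmax over dot-product scores between a document and each entity."""
    scores = {name: dot(doc_vec, vec) for name, vec in entity_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def sgd_step(doc_vec, entity_vecs, target, lr=0.1):
    """Take one cross-entropy gradient step on the entity vectors.

    Returns the loss measured before the update. Assumption: only entity
    vectors are updated here; a real model would also update word vectors.
    """
    probs = entity_probabilities(doc_vec, entity_vecs)
    for name, vec in entity_vecs.items():
        # Gradient of the loss w.r.t. an entity vector is (p - y) * doc_vec.
        grad_scale = probs[name] - (1.0 if name == target else 0.0)
        entity_vecs[name] = [e - lr * grad_scale * d for e, d in zip(vec, doc_vec)]
    return -math.log(probs[target])

# Toy data: every value below is made up for illustration.
word_vecs = {"guitar": [0.9, 0.1], "band": [0.8, 0.3], "river": [0.1, 0.9]}
entity_vecs = {"Rock_music": [0.2, 0.1], "Amazon_River": [0.1, 0.2]}

doc_vec = mean_vector([word_vecs["guitar"], word_vecs["band"]])
loss_before = sgd_step(doc_vec, entity_vecs, target="Rock_music")
loss_after = -math.log(entity_probabilities(doc_vec, entity_vecs)["Rock_music"])
```

After the step, the target entity's vector has moved toward the document vector and the other candidate has moved away, so the loss on this toy document decreases. Repeating the step over many KB documents is the general idea behind learning both kinds of representation in one space.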
Anthology ID:
C18-1016
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
190–201
URL:
https://aclanthology.org/C18-1016
Cite (ACL):
Ikuya Yamada, Hiroyuki Shindo, and Yoshiyasu Takefuji. 2018. Representation Learning of Entities and Documents from Knowledge Base Descriptions. In Proceedings of the 27th International Conference on Computational Linguistics, pages 190–201, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Representation Learning of Entities and Documents from Knowledge Base Descriptions (Yamada et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1016.pdf
Code:
wikipedia2vec/wikipedia2vec
Data:
Figment