Improved Word Representation Learning with Sememes

Yilin Niu, Ruobing Xie, Zhiyuan Liu, Maosong Sun


Abstract
Sememes are minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. Since sememes are not explicit for each word, people manually annotate word sememes and form linguistic common-sense knowledge bases. In this paper, we show that word sememe information can improve word representation learning (WRL), which maps words into a low-dimensional semantic space and serves as a fundamental step for many NLP tasks. The key idea is to utilize word sememes to accurately capture the exact meaning of a word within a specific context. More specifically, we follow the framework of Skip-gram and present three sememe-encoded models to learn representations of sememes, senses and words, where we apply an attention scheme to detect word senses in various contexts. We conduct experiments on two tasks including word similarity and word analogy, and our models significantly outperform baselines. The results indicate that WRL can benefit from sememes via the attention scheme, and also confirm that our models are capable of correctly modeling sememe information.
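The sketch below illustrates the core idea described in the abstract: a sense embedding is built from sememe embeddings, and an attention score over a word's senses is computed from the context to softly disambiguate the word before it enters a Skip-gram-style objective. It is a minimal, hypothetical re-implementation in NumPy, not the authors' released code (their release is linked under "Code" below); the toy vocabulary, sememe inventory, and function names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy inventory: each word has one or more senses, and each sense is annotated
# with a set of sememes. These entries are illustrative; the paper's annotations
# come from the HowNet knowledge base.
sememe_vocab = ["fruit", "computer", "institution", "buy"]
word_senses = {
    "apple": [["fruit"], ["computer", "institution"]],
    "buy":   [["buy"]],
}

sememe_emb = {s: rng.normal(scale=0.1, size=dim) for s in sememe_vocab}
word_emb   = {w: rng.normal(scale=0.1, size=dim) for w in word_senses}

def sense_embeddings(word):
    """Represent each sense of a word as the average of its sememe embeddings."""
    return [np.mean([sememe_emb[s] for s in sememes], axis=0)
            for sememes in word_senses[word]]

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attentive_word_embedding(word, context_words):
    """Soft word-sense disambiguation: attend over the word's senses using the
    averaged context embedding, then mix the sense embeddings accordingly."""
    context = np.mean([word_emb[c] for c in context_words], axis=0)
    senses = sense_embeddings(word)
    scores = np.array([context @ s for s in senses])
    attention = softmax(scores)
    mixed = sum(a * s for a, s in zip(attention, senses))
    return mixed, attention

# The context-dependent embedding would then replace the plain word vector
# in the Skip-gram prediction step during training.
emb, att = attentive_word_embedding("apple", ["buy"])
print("attention over senses of 'apple':", att.round(3))
```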
Anthology ID:
P17-1187
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2049–2058
URL:
https://aclanthology.org/P17-1187
DOI:
10.18653/v1/P17-1187
Cite (ACL):
Yilin Niu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Improved Word Representation Learning with Sememes. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2049–2058, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Improved Word Representation Learning with Sememes (Niu et al., ACL 2017)
PDF:
https://aclanthology.org/P17-1187.pdf
Software:
P17-1187.Software.zip
Dataset:
P17-1187.Datasets.zip
Code:
thunlp/SE-WRL