Representation Learning for Unseen Words by Bridging Subwords to Semantic Networks

Yeachan Kim, Kang-Min Kim, SangKeun Lee


Abstract
Pre-trained word embeddings are widely used in various fields. However, their coverage is limited to the words that appear in the corpora on which the embeddings are trained. Words outside this vocabulary are therefore ignored in downstream tasks, which can limit the performance of neural models. In this paper, we propose a simple yet effective method for representing out-of-vocabulary (OOV) words. Unlike prior work that relies solely on subword information or external knowledge, our method exploits both sources of information. To this end, we propose two stages of representation learning. In the first stage, we learn subword embeddings from the pre-trained word embeddings through an additive composition function over subwords. In the second stage, we map the learned subwords into semantic networks (e.g., WordNet) and re-train the subword embeddings using lexical entries in semantic lexicons, which can include newly observed subwords. This two-stage learning greatly broadens word coverage. The experimental results clearly show that our method provides consistent performance improvements over strong baselines that use subwords or lexical resources separately.
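To make the two stages concrete, below is a minimal NumPy sketch of the first stage: character n-gram embeddings are fitted so that their additive composition approximates each pre-trained word vector. Every name here (char_ngrams, train_subword_embeddings, the n-gram range, the learning schedule) is an illustrative assumption, not the authors' implementation.

import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    # Character n-grams with boundary markers, fastText-style.
    w = "<" + word + ">"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def train_subword_embeddings(pretrained, dim, epochs=5, lr=0.01):
    # Stage 1 (sketch): fit subword vectors so that their sum
    # approximates each pre-trained word vector (additive composition).
    rng = np.random.default_rng(0)
    vocab = {g for w in pretrained for g in char_ngrams(w)}
    sub = {g: rng.uniform(-0.5, 0.5, dim) / dim for g in vocab}
    for _ in range(epochs):
        for word, target in pretrained.items():
            grams = char_ngrams(word)
            pred = np.sum([sub[g] for g in grams], axis=0)
            # Gradient of ||pred - target||^2 is shared by all subwords of the word.
            grad = 2.0 * (pred - target)
            for g in grams:
                sub[g] -= lr * grad
    return sub

def embed_oov(word, sub, dim):
    # Compose a vector for an unseen word from whatever subwords we know.
    vecs = [sub[g] for g in char_ngrams(word) if g in sub]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

A second, rougher sketch illustrates the stage-2 idea, assuming NLTK's WordNet is available (nltk.download("wordnet")): synonym pairs act as lexical entries whose lemmas may contribute subwords never seen in the embedding corpus, and the composed vectors of synonyms are pulled together. The paper's actual mapping and retraining objective may differ from this synonym-attraction loss.

from nltk.corpus import wordnet as wn

def retrain_with_wordnet(sub, dim, epochs=1, lr=0.01):
    # Stage 2 (sketch): collect synonym pairs from WordNet synsets.
    pairs = []
    for syn in wn.all_synsets():
        lemmas = [l.lower() for l in syn.lemma_names() if "_" not in l]
        pairs += [(a, b) for a in lemmas for b in lemmas if a != b]
    # Register subwords of newly observed lemmas before re-training.
    for a, b in pairs:
        for g in char_ngrams(a) + char_ngrams(b):
            sub.setdefault(g, np.zeros(dim))
    for _ in range(epochs):
        for a, b in pairs:
            va, vb = embed_oov(a, sub, dim), embed_oov(b, sub, dim)
            grad = 2.0 * (va - vb)  # pull composed synonym vectors together
            grams = char_ngrams(a)
            for g in grams:
                sub[g] -= lr * grad / len(grams)
    return sub

Since pairs contains both (a, b) and (b, a), both sides of each synonym pair are updated over an epoch; only the subwords of the first word move per step, mirroring the mean composition used in embed_oov.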
Anthology ID:
2020.lrec-1.587
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4774–4780
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.587
Cite (ACL):
Yeachan Kim, Kang-Min Kim, and SangKeun Lee. 2020. Representation Learning for Unseen Words by Bridging Subwords to Semantic Networks. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4774–4780, Marseille, France. European Language Resources Association.
Cite (Informal):
Representation Learning for Unseen Words by Bridging Subwords to Semantic Networks (Kim et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.587.pdf