Injecting Word Embeddings with Another Language’s Resource : An Application of Bilingual Embeddings

Prakhar Pandey, Vikram Pudi, Manish Shrivastava


Abstract
Word embeddings learned from a text corpus can be improved by injecting knowledge from external resources, while at the same time specializing them for similarity or relatedness. These knowledge resources (such as WordNet and the Paraphrase Database) may not exist for all languages. In this work we introduce a method to inject the word embeddings of one language with the knowledge resource of another language by leveraging bilingual embeddings. First, we improve the word embeddings of German, Italian, French and Spanish using English resources and evaluate them on a variety of word similarity tasks. Then we demonstrate the utility of our method by creating improved embeddings for Urdu and Telugu using Hindi WordNet, beating the previously established baseline for Urdu.
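The abstract does not say which injection technique is used, so the following is only a minimal sketch under stated assumptions. It assumes (1) the source-language lexicon (word to related words, as in WordNet) is carried over to the target language by nearest-neighbour lookup in a shared bilingual embedding space, and (2) the transferred lexicon is injected with retrofitting (Faruqui et al., 2015). The function names `transfer_lexicon` and `retrofit` and the toy 2-d vectors are hypothetical illustrations, not the authors' code or data.

```python
import math

# Hypothetical sketch: NOT the authors' implementation. Assumes a shared
# bilingual space (source and target vectors are directly comparable) and
# uses retrofitting (Faruqui et al., 2015) as the injection step.

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest(vec, vocab_vecs):
    """Return the word in vocab_vecs whose vector is most cosine-similar to vec."""
    return max(vocab_vecs, key=lambda w: cosine(vec, vocab_vecs[w]))

def transfer_lexicon(src_lexicon, src_vecs, tgt_vecs):
    """Map a source lexicon (word -> related words) onto the target vocabulary
    by translating every word to its nearest target neighbour in the shared space."""
    tgt_lexicon = {}
    for word, related in src_lexicon.items():
        if word not in src_vecs:
            continue
        t_word = nearest(src_vecs[word], tgt_vecs)
        for rel in related:
            if rel not in src_vecs:
                continue
            t_rel = nearest(src_vecs[rel], tgt_vecs)
            if t_rel != t_word:
                # Store the relation in both directions (symmetric lexicon).
                tgt_lexicon.setdefault(t_word, set()).add(t_rel)
                tgt_lexicon.setdefault(t_rel, set()).add(t_word)
    return tgt_lexicon

def retrofit(vecs, lexicon, iterations=10, alpha=1.0):
    """Retrofitting: move each vector towards its lexicon neighbours while
    keeping it close to its original value. Input vectors are not modified."""
    new = {w: list(v) for w, v in vecs.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # equal weight for each neighbour
            dim = len(vecs[word])
            total = [alpha * vecs[word][k] for k in range(dim)]
            for n in nbrs:
                for k in range(dim):
                    total[k] += beta * new[n][k]
            denom = alpha + beta * len(nbrs)
            new[word] = [x / denom for x in total]
    return new

# Toy example with hypothetical vectors: English "car"/"automobile" are related
# in an English resource; the relation is transferred to German and injected.
src_vecs = {"car": [1.0, 0.0], "automobile": [0.6, 0.8]}
tgt_vecs = {"Auto": [1.0, 0.0], "Wagen": [0.5, 0.85], "Hund": [-1.0, 0.0]}
src_lexicon = {"car": {"automobile"}}

tgt_lexicon = transfer_lexicon(src_lexicon, src_vecs, tgt_vecs)
retrofitted = retrofit(tgt_vecs, tgt_lexicon)
```

In this toy run the English relation car/automobile becomes the German relation Auto/Wagen, and after retrofitting those two vectors are more similar than before, while "Hund", which has no lexicon entry, is left unchanged. A real setup would use full bilingual embeddings and a complete resource such as WordNet or Hindi WordNet.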
Anthology ID: I17-2020
Volume: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month: November
Year: 2017
Address: Taipei, Taiwan
Editors: Greg Kondrak, Taro Watanabe
Venue: IJCNLP
Publisher: Asian Federation of Natural Language Processing
Pages: 116–121
URL: https://aclanthology.org/I17-2020
Cite (ACL): Prakhar Pandey, Vikram Pudi, and Manish Shrivastava. 2017. Injecting Word Embeddings with Another Language’s Resource : An Application of Bilingual Embeddings. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 116–121, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal): Injecting Word Embeddings with Another Language’s Resource : An Application of Bilingual Embeddings (Pandey et al., IJCNLP 2017)
PDF: https://aclanthology.org/I17-2020.pdf