Prakhar Pandey
2017
Injecting Word Embeddings with Another Language’s Resource : An Application of Bilingual Embeddings
Prakhar Pandey | Vikram Pudi | Manish Shrivastava
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Word embeddings learned from a text corpus can be improved by injecting knowledge from external resources, while also specializing them for similarity or relatedness. These knowledge resources (such as WordNet and the Paraphrase Database) may not exist for all languages. In this work, we introduce a method to inject the word embeddings of one language with a knowledge resource from another language by leveraging bilingual embeddings. First, we improve word embeddings of German, Italian, French and Spanish using English resources and test them on a variety of word similarity tasks. Then we demonstrate the utility of our method by creating improved embeddings for Urdu and Telugu using Hindi WordNet, beating the previously established baseline for Urdu.
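The abstract does not spell out the injection procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a retrofitting-style averaging update, toy 2-dimensional vectors that already live in a shared bilingual space, and a hypothetical helper `borrow_edges` that links each target-language word to its nearest English word and that word's lexicon neighbours. All names, vectors, and the `en:`/`de:` prefixes are made up for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def borrow_edges(target_vecs, english_vecs, english_lexicon):
    """Hypothetical helper: link each target word to its nearest English
    word in the shared space, plus that word's lexicon neighbours."""
    edges = {}
    for t_word, t_vec in target_vecs.items():
        nearest = max(english_vecs, key=lambda e: cosine(t_vec, english_vecs[e]))
        edges[t_word] = [nearest] + list(english_lexicon.get(nearest, []))
    return edges

def retrofit(vectors, edges, alpha=1.0, beta=1.0, iterations=10):
    """Retrofitting-style update (an assumption, not the paper's procedure):
    pull each word toward its neighbours while staying near its original vector."""
    original = {w: v.astype(float).copy() for w, v in vectors.items()}
    updated = {w: v.copy() for w, v in original.items()}
    for _ in range(iterations):
        for word, neighbours in edges.items():
            nbrs = [n for n in neighbours if n in updated]
            if word not in updated or not nbrs:
                continue
            total = alpha * original[word] + beta * sum(updated[n] for n in nbrs)
            updated[word] = total / (alpha + beta * len(nbrs))
    return updated

# Toy shared bilingual space (made-up vectors).
english = {
    "en:car": np.array([1.0, 0.0]),
    "en:automobile": np.array([0.8, 0.6]),
    "en:banana": np.array([0.0, 1.0]),
}
german = {"de:auto": np.array([0.9, 0.1])}

# Toy English resource: synonym links only exist on the English side.
lexicon = {"en:car": ["en:automobile"], "en:automobile": ["en:car"]}

shared = {**english, **german}
edges = {**lexicon, **borrow_edges(german, english, lexicon)}
improved = retrofit(shared, edges)

before = cosine(german["de:auto"], english["en:automobile"])
after = cosine(improved["de:auto"], improved["en:automobile"])
```

In this toy setup the German word has no resource of its own, yet it moves closer to the English synonym cluster because the English links are reachable through the shared space. Words without any edges, such as `en:banana`, are left untouched.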