A Distribution-based Model to Learn Bilingual Word Embeddings

Hailong Cao, Tiejun Zhao, Shu Zhang, Yao Meng


Abstract
We introduce a distribution-based model for learning bilingual word embeddings from monolingual data. It is simple and effective, and requires no parallel data or seed lexicon. We take advantage of the fact that word embeddings are usually dense, real-valued, low-dimensional vectors, so their distribution can be estimated accurately. We propose a novel cross-lingual learning objective that directly matches the distribution of word embeddings in one language with that in the other. During joint learning, we dynamically estimate the distributions of word embeddings in the two languages and minimize the dissimilarity between them with the standard back-propagation algorithm. The learned bilingual word embeddings group each word together with its translations in the shared vector space. We demonstrate the utility of the learned embeddings on the task of finding word-to-word translations from monolingual corpora. Our model achieves encouraging performance both on related languages and on substantially different ones.
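To illustrate the general idea of distribution matching, the sketch below (not the authors' exact objective; the moment-matching loss, toy data, and hyperparameters are all assumptions for illustration) learns a linear map W so that the first two moments (mean and covariance) of the mapped source embeddings match those of the target embeddings, using gradient descent. A finite-difference gradient stands in for back-propagation at this toy scale.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Toy monolingual "embeddings" (no parallel data): the target language's
# embedding distribution is a hidden linear transform of the source's.
X = rng.normal(size=(500, d))                 # source-language embeddings
A = np.diag([1.5, 0.5, 2.0, 1.0])             # hidden linear relation
Y = rng.normal(size=(500, d)) @ A             # target-language embeddings

def moment_loss(W):
    """Dissimilarity between the distributions of XW and Y,
    measured on their first two moments (mean and covariance)."""
    Z = X @ W
    mean_term = np.sum((Z.mean(axis=0) - Y.mean(axis=0)) ** 2)
    cov_term = np.sum((np.cov(Z.T) - np.cov(Y.T)) ** 2)
    return mean_term + cov_term

def numeric_grad(f, W, eps=1e-5):
    # Central finite differences; a stand-in for back-propagation here.
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            g[i, j] = (f(W + E) - f(W - E)) / (2 * eps)
    return g

W = np.eye(d)
start = moment_loss(W)
for _ in range(500):
    W -= 0.02 * numeric_grad(moment_loss, W)
print(f"loss before: {start:.3f}, after: {moment_loss(W):.3f}")
```

After training, the mapped source embeddings' mean and covariance closely match the target's, even though no word pairs were ever aligned; the paper's objective matches richer distributional statistics than these two moments.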
Anthology ID:
C16-1171
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
1818–1827
URL:
https://aclanthology.org/C16-1171
Cite (ACL):
Hailong Cao, Tiejun Zhao, Shu Zhang, and Yao Meng. 2016. A Distribution-based Model to Learn Bilingual Word Embeddings. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1818–1827, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
A Distribution-based Model to Learn Bilingual Word Embeddings (Cao et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1171.pdf