Learning Bilingual Word Embeddings Using Lexical Definitions

Weijia Shi, Muhao Chen, Yingtao Tian, Kai-Wei Chang


Abstract
Bilingual word embeddings, which represent lexicons of different languages in a shared embedding space, are essential for supporting semantic and knowledge transfer in a variety of cross-lingual NLP tasks. Existing approaches to training bilingual word embeddings require either large collections of pre-defined seed lexicons that are expensive to obtain, or parallel sentences that provide only coarse and noisy alignment. In contrast, we propose BiLex, which leverages publicly available lexical definitions for bilingual word embedding learning. Without the need for pre-defined seed lexicons, BiLex comprises a novel word pairing strategy to automatically identify and propagate precise fine-grained word alignments from lexical definitions. We evaluate BiLex on word-level and sentence-level translation tasks, which seek to find the cross-lingual counterparts of words and sentences, respectively. BiLex significantly outperforms previous embedding methods on both tasks.
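As a rough illustration of the word-level translation task described in the abstract, the sketch below retrieves the cross-lingual counterpart of a word as its nearest neighbor in a shared embedding space. It is not the BiLex training procedure: the function names (`cosine`, `translate`), the use of cosine similarity as the retrieval criterion, and the three-dimensional toy vectors are all assumptions made for this example and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def translate(word, source_vectors, target_vectors):
    """Return the target-language word whose vector is most similar to `word`.

    Assumes both dictionaries hold vectors from the same shared space.
    """
    query = source_vectors[word]
    return max(target_vectors, key=lambda t: cosine(query, target_vectors[t]))

# Hypothetical toy vectors, invented for illustration (not from the paper).
english = {"dog": [0.9, 0.1, 0.0], "house": [0.1, 0.9, 0.2]}
french = {
    "chien": [0.8, 0.2, 0.1],
    "maison": [0.2, 0.8, 0.3],
    "pomme": [0.0, 0.1, 0.9],
}

print(translate("dog", english, french))
print(translate("house", english, french))
```

With real bilingual embeddings, the same lookup would run over the full target vocabulary, and a sentence-level variant could compare sentence vectors built from these word vectors; how those are composed is left open here.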
Anthology ID:
W19-4316
Volume:
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, Marek Rei
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
142–147
URL:
https://aclanthology.org/W19-4316
DOI:
10.18653/v1/W19-4316
Cite (ACL):
Weijia Shi, Muhao Chen, Yingtao Tian, and Kai-Wei Chang. 2019. Learning Bilingual Word Embeddings Using Lexical Definitions. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 142–147, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Learning Bilingual Word Embeddings Using Lexical Definitions (Shi et al., RepL4NLP 2019)
PDF:
https://aclanthology.org/W19-4316.pdf