Learning Multilingual Word Embeddings Using Image-Text Data

Karan Singhal, Karthik Raman, Balder ten Cate


Abstract
There has been significant interest recently in learning multilingual word embeddings – in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representation of an image and that of its accompanying text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.
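The core idea in the abstract can be sketched in a few lines: represent a caption as a bag of words (the mean of its word vectors) and train the word vectors so that the caption embedding is pulled toward the embedding of the paired image. Captions in different languages paired with the same image are thereby pulled toward the same point, aligning the languages. The vocabulary, dimensions, and squared-distance objective below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

EMBED_DIM = 4
rng = np.random.default_rng(0)

# Toy shared embedding table for words from all languages (hypothetical).
vocab = {"cat": 0, "chat": 1, "dog": 2}
word_vecs = rng.normal(size=(len(vocab), EMBED_DIM))

def embed_caption(tokens):
    """Bag-of-words caption embedding: mean of the word vectors."""
    return word_vecs[[vocab[t] for t in tokens]].mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A fixed "image embedding" standing in for an image-model output.
image_vec = rng.normal(size=EMBED_DIM)

def train_step(tokens, image_vec, lr=0.1):
    """One gradient step on ||mean(word_vecs) - image_vec||^2."""
    idxs = [vocab[t] for t in tokens]
    text_vec = word_vecs[idxs].mean(axis=0)
    grad = 2.0 * (text_vec - image_vec) / len(idxs)
    for i in idxs:
        word_vecs[i] -= lr * grad

for _ in range(50):
    train_step(["cat"], image_vec)   # English caption of the image
    train_step(["chat"], image_vec)  # French caption of the same image

# "cat" and "chat" are now close to each other via the shared image,
# while the untrained "dog" remains wherever it was initialized.
sim_cross = cosine(embed_caption(["cat"]), embed_caption(["chat"]))
```

In practice the paper trains on weakly-supervised image-text pairs rather than toy vectors, but this captures why no crosslingual labels are needed: the image acts as the pivot between languages.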
Anthology ID:
W19-1807
Volume:
Proceedings of the Second Workshop on Shortcomings in Vision and Language
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
68–77
URL:
https://aclanthology.org/W19-1807
DOI:
10.18653/v1/W19-1807
Cite (ACL):
Karan Singhal, Karthik Raman, and Balder ten Cate. 2019. Learning Multilingual Word Embeddings Using Image-Text Data. In Proceedings of the Second Workshop on Shortcomings in Vision and Language, pages 68–77, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Learning Multilingual Word Embeddings Using Image-Text Data (Singhal et al., 2019)
PDF:
https://aclanthology.org/W19-1807.pdf