%0 Conference Proceedings
%T A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
%A Ailem, Melissa
%A Zhang, Bowen
%A Bellet, Aurelien
%A Denis, Pascal
%A Sha, Fei
%Y Riloff, Ellen
%Y Chiang, David
%Y Hockenmaier, Julia
%Y Tsujii, Jun’ichi
%S Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
%D 2018
%8 oct nov
%I Association for Computational Linguistics
%C Brussels, Belgium
%F ailem-etal-2018-probabilistic
%X Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g. features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple together a skip-gram model for co-occurrence in linguistic data and a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains equally competitive or stronger results when compared to other state-of-the-art multimodal models.
%R 10.18653/v1/D18-1177
%U https://aclanthology.org/D18-1177
%U https://doi.org/10.18653/v1/D18-1177
%P 1478-1487