Deep Generative Model for Joint Alignment and Word Representation

Miguel Rios, Wilker Aziz, Khalil Sima’an


Abstract
This work exploits translation data as a source of semantically relevant learning signal for models of word representation. In particular, we exploit equivalence through translation as a form of distributional context and jointly learn how to embed and align with a deep generative model. Our EmbedAlign model embeds words in their complete observed context and learns by marginalisation of latent lexical alignments. In addition, it embeds words as posterior probability densities, rather than point estimates, which allows us to compare words in context using a measure of overlap between distributions (e.g. KL divergence). We investigate our model’s performance on a range of lexical semantics tasks, achieving competitive results on several standard benchmarks including natural language inference, paraphrasing, and text similarity.
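To illustrate how two words embedded as densities might be compared with KL divergence, here is a minimal sketch. It assumes each word's posterior is a diagonal Gaussian given by a mean vector and a standard-deviation vector; the abstract only says "posterior probability densities", so the Gaussian family, the function name `kl_diag_gaussians`, and the toy numbers are all assumptions made for illustration, not the authors' implementation.

```python
import math


def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for two diagonal Gaussians (hypothetical helper).

    Each argument is a list of floats of equal length: per-dimension
    means (mu_*) and standard deviations (sigma_*). The diagonal
    Gaussian form is an assumption, not taken from the abstract.
    """
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        # Closed-form KL between two univariate Gaussians, summed over dimensions.
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


# Toy 2-dimensional "word densities" (made-up values).
word_a = ([0.0, 1.0], [1.0, 0.5])
word_b = ([0.5, 1.0], [1.0, 1.0])

kl_ab = kl_diag_gaussians(*word_a, *word_b)
kl_ba = kl_diag_gaussians(*word_b, *word_a)
print(kl_ab, kl_ba)
```

KL divergence is asymmetric, so `kl_ab` and `kl_ba` generally differ; which direction (or which symmetrised variant) suits a given task is a design choice this sketch leaves open.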
Anthology ID:
N18-1092
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1011–1023
URL:
https://aclanthology.org/N18-1092
DOI:
10.18653/v1/N18-1092
Cite (ACL):
Miguel Rios, Wilker Aziz, and Khalil Sima’an. 2018. Deep Generative Model for Joint Alignment and Word Representation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1011–1023, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Deep Generative Model for Joint Alignment and Word Representation (Rios et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1092.pdf
Video:
https://aclanthology.org/N18-1092.mp4
Code:
uva-slpl/embedalign