A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations

Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel


Abstract
We propose a generative model for a sentence that uses two latent variables, with one intended to represent the syntax of the sentence and the other to represent its semantics. We show we can achieve better disentanglement between semantic and syntactic representations by training with multiple losses, including losses that exploit aligned paraphrastic sentences and word-order information. We evaluate our models on standard semantic similarity tasks and novel syntactic similarity tasks. Empirically, we find that the model with the best performing syntactic and semantic representations also gives rise to the most disentangled representations.
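The multi-task idea in the abstract can be illustrated with a toy sketch. The code below is not the paper's actual architecture (which is a generative model with inference networks); it is a minimal stand-in using hypothetical linear encoders, showing how two of the described losses could operate: a paraphrase loss pulling the semantic vectors of aligned paraphrases together, and a word-order loss pushing the syntactic vector to distinguish a sentence from a word-shuffled variant. All names (`w_sem`, `w_syn`, `encode`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_sem, w_syn):
    # Project a sentence embedding into separate "semantic" and "syntactic"
    # vectors (hypothetical linear encoders standing in for the paper's
    # inference networks).
    return w_sem @ x, w_syn @ x

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 when vectors point the same way, up to 2.
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy inputs: a sentence embedding, a near-duplicate "paraphrase",
# and a word-shuffled version (same values, different order).
d, k = 8, 4
w_sem = rng.normal(size=(k, d))
w_syn = rng.normal(size=(k, d))
sent = rng.normal(size=d)
para = sent + 0.1 * rng.normal(size=d)   # paraphrase: similar meaning
shuffled = rng.permutation(sent)         # same "words", different order

sem_s, syn_s = encode(sent, w_sem, w_syn)
sem_p, _ = encode(para, w_sem, w_syn)
_, syn_sh = encode(shuffled, w_sem, w_syn)

# Paraphrase loss: semantic vectors of aligned paraphrases should be close.
para_loss = cosine_distance(sem_s, sem_p)

# Word-order loss (sketch): reward the syntactic vector for encoding the
# original and the shuffled sentence differently.
order_loss = -cosine_distance(syn_s, syn_sh)

total_loss = para_loss + order_loss
```

In the actual paper these terms are combined with a reconstruction objective and optimized jointly; the sketch only shows which representation each auxiliary loss constrains.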
Anthology ID:
N19-1254
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2453–2464
URL:
https://aclanthology.org/N19-1254
DOI:
10.18653/v1/N19-1254
Cite (ACL):
Mingda Chen, Qingming Tang, Sam Wiseman, and Kevin Gimpel. 2019. A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2453–2464, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations (Chen et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-1254.pdf
Code
mingdachen/disentangle-semantics-syntax
Data
Penn Treebank