Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders

Giangiacomo Mercatali, André Freitas


Abstract
Learning disentangled representations is a major step toward interpretable NLP systems, as it allows latent linguistic features to be controlled. Most approaches to disentanglement rely on continuous variables, for both images and text. We argue that, although suitable for image datasets, continuous variables may not be ideal for modeling features of textual data, because most generative factors in text are discrete. We propose a Variational Autoencoder-based method that models language features as discrete variables and encourages independence between variables in order to learn disentangled representations. The proposed model outperforms continuous and discrete baselines on several qualitative and quantitative disentanglement benchmarks, as well as on a downstream text style transfer application.
Anthology ID:
2021.findings-emnlp.301
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3547–3556
URL:
https://aclanthology.org/2021.findings-emnlp.301
DOI:
10.18653/v1/2021.findings-emnlp.301
Cite (ACL):
Giangiacomo Mercatali and André Freitas. 2021. Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3547–3556, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders (Mercatali & Freitas, Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.301.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.301.mp4