Learning Sparse Sentence Encoding without Supervision: An Exploration of Sparsity in Variational Autoencoders

Victor Prokhorov, Yingzhen Li, Ehsan Shareghi, Nigel Collier


Abstract
It has long been known that sparsity is an effective inductive bias for learning efficient representations of data in fixed-dimensional vectors, and it has been explored in many areas of representation learning. Of particular interest to this work is sparsity within the VAE framework, which has been studied extensively in the image domain but has received little attention in NLP. Moreover, NLP lags behind in learning sparse representations of larger units of text, e.g., sentences. We address these shortcomings with VAEs that induce sparse latent representations of sentences. First, we measure how well unsupervised state-of-the-art (SOTA) and other strong VAE-based sparsification baselines perform on text, and propose a hierarchical sparse VAE model to address the stability issues of the SOTA approach. We then examine the implications of sparsity for text classification across three datasets, and highlight a link between the downstream performance of sparse latent representations and their ability to encode task-related information.
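The abstract refers to VAEs that induce sparse latent representations of sentences. As a rough illustration of the general idea only (not the paper's HSVAE model), the sketch below trains a small sentence VAE in PyTorch with a simple L1 sparsity penalty on the posterior means; the architecture, dimensions, and penalty weight are assumptions made for illustration.

# Minimal, illustrative sketch: a sentence VAE whose latent code is encouraged to be
# sparse via an L1 term on the posterior means. This is NOT the paper's HSVAE;
# all hyperparameters and the penalty form are assumptions for illustration.
import torch
import torch.nn as nn

class SparseSentenceVAE(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256, lat_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, lat_dim)
        self.to_logvar = nn.Linear(hid_dim, lat_dim)
        self.init_h = nn.Linear(lat_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):
        # Encode: last GRU state parameterises a diagonal Gaussian posterior.
        x = self.emb(tokens)
        _, h = self.encoder(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterisation trick.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Decode conditioned on z (teacher forcing on the same tokens).
        dec_h = torch.tanh(self.init_h(z)).unsqueeze(0)
        dec_out, _ = self.decoder(x, dec_h)
        return self.out(dec_out), mu, logvar

def loss_fn(logits, tokens, mu, logvar, l1_weight=0.1):
    # Token-level reconstruction loss.
    rec = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens.reshape(-1))
    # KL divergence to a standard Gaussian prior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Illustrative sparsity penalty: L1 on posterior means pushes coordinates to zero.
    sparsity = mu.abs().mean()
    return rec + kl + l1_weight * sparsity

# Usage with a toy batch of 4 "sentences" of length 12.
model = SparseSentenceVAE()
tokens = torch.randint(0, 10000, (4, 12))
logits, mu, logvar = model(tokens)
loss = loss_fn(logits, tokens, mu, logvar)
loss.backward()

The actual model proposed in the paper (HSVAE) is available in the linked repository, VictorProkhorov/HSVAE.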
Anthology ID:
2021.repl4nlp-1.5
Volume:
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Month:
August
Year:
2021
Address:
Online
Editors:
Anna Rogers, Iacer Calixto, Ivan Vulić, Naomi Saphra, Nora Kassner, Oana-Maria Camburu, Trapit Bansal, Vered Shwartz
Venue:
RepL4NLP
Publisher:
Association for Computational Linguistics
Pages:
34–46
URL:
https://aclanthology.org/2021.repl4nlp-1.5
DOI:
10.18653/v1/2021.repl4nlp-1.5
Cite (ACL):
Victor Prokhorov, Yingzhen Li, Ehsan Shareghi, and Nigel Collier. 2021. Learning Sparse Sentence Encoding without Supervision: An Exploration of Sparsity in Variational Autoencoders. In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), pages 34–46, Online. Association for Computational Linguistics.
Cite (Informal):
Learning Sparse Sentence Encoding without Supervision: An Exploration of Sparsity in Variational Autoencoders (Prokhorov et al., RepL4NLP 2021)
PDF:
https://aclanthology.org/2021.repl4nlp-1.5.pdf
Code:
VictorProkhorov/HSVAE