LangVAE and LangSpace: Building and Probing for Language Model VAEs

Danilo S. Carvalho; Yingji Zhang; Harriet Unsworth; André Freitas

doi:10.18653/v1/2025.emnlp-demos.57



Abstract

We present LangVAE, a novel framework for the modular construction of variational autoencoders (VAEs) on top of pre-trained large language models (LLMs). Such language model VAEs can encode the knowledge of their pre-trained components into more compact and semantically disentangled representations. The representations obtained in this way can be analysed with LangVAE's companion framework, LangSpace, which implements a collection of probing methods such as vector traversal and interpolation, disentanglement measures, and cluster visualisations. LangVAE and LangSpace offer a flexible, efficient and scalable way of building and analysing textual representations, with simple integration for models available on the HuggingFace Hub. Additionally, we conducted a set of experiments with different encoder and decoder combinations, as well as annotated inputs, revealing a wide range of interactions across architectural families and sizes with respect to generalisation and disentanglement. Our findings demonstrate a promising framework for systematising the experimentation with, and understanding of, textual representations.
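To make two of the probing methods named above concrete, here is a minimal sketch of latent interpolation and single-dimension traversal. It assumes latent codes are plain lists of floats; the function names `interpolate` and `traverse` are hypothetical and are not the LangSpace API. In a language model VAE, each vector produced here would then be passed to the decoder to inspect how the generated text changes along the path.

```python
from typing import List

Vector = List[float]


def interpolate(z_a: Vector, z_b: Vector, steps: int) -> List[Vector]:
    """Linearly interpolate between two latent codes, including both endpoints."""
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimensionality")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t goes from 0.0 (z_a) to 1.0 (z_b)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


def traverse(z: Vector, dim: int, values: List[float]) -> List[Vector]:
    """Vary one latent dimension over the given values, holding the others fixed."""
    if not 0 <= dim < len(z):
        raise IndexError("dim is out of range for this latent code")
    out = []
    for v in values:
        z_new = list(z)  # copy so the input code is not mutated
        z_new[dim] = v
        out.append(z_new)
    return out


if __name__ == "__main__":
    print(interpolate([0.0, 0.0], [1.0, 2.0], steps=3))
    print(traverse([0.1, 0.2, 0.3], dim=1, values=[-1.0, 0.0, 1.0]))
```

Interpolation probes whether the latent space is smooth between two encoded sentences, while traversal isolates the effect of a single dimension, which is the intuition behind using it to inspect disentanglement.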

Anthology ID: 2025.emnlp-demos.57
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: November
Year: 2025
Address: Suzhou, China
Editors: Ivan Habernal, Peter Schulam, Jörg Tiedemann
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 749–759
URL: https://aclanthology.org/2025.emnlp-demos.57/
DOI: 10.18653/v1/2025.emnlp-demos.57
Cite (ACL): Danilo Carvalho, Yingji Zhang, Harriet Unsworth, and Andre Freitas. 2025. LangVAE and LangSpace: Building and Probing for Language Model VAEs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 749–759, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): LangVAE and LangSpace: Building and Probing for Language Model VAEs (Carvalho et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-demos.57.pdf
