@inproceedings{zemlyanskiy-etal-2024-memory,
    title = "{MEMORY}-{VQ}: Compression for Tractable {I}nternet-Scale Memory",
    author = "Zemlyanskiy, Yury and
      de Jong, Michiel and
      Vilnis, Luke and
      Ontanon, Santiago and
      Cohen, William and
      Sanghai, Sumit and
      Ainslie, Joshua",
    editor = "Duh, Kevin and
      Gomez, Helena and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-short.64",
    doi = "10.18653/v1/2024.naacl-short.64",
    pages = "737--744",
    abstract = "Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN (de Jong et al., 2023a) pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="zemlyanskiy-etal-2024-memory">
    <titleInfo>
      <title>MEMORY-VQ: Compression for Tractable Internet-Scale Memory</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Yury</namePart>
      <namePart type="family">Zemlyanskiy</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Michiel</namePart>
      <namePart type="family">de Jong</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Luke</namePart>
      <namePart type="family">Vilnis</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Santiago</namePart>
      <namePart type="family">Ontanon</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">William</namePart>
      <namePart type="family">Cohen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Sumit</namePart>
      <namePart type="family">Sanghai</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Joshua</namePart>
      <namePart type="family">Ainslie</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2024-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Kevin</namePart>
        <namePart type="family">Duh</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Helena</namePart>
        <namePart type="family">Gomez</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Steven</namePart>
        <namePart type="family">Bethard</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Mexico City, Mexico</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN (de Jong et al., 2023a) pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.</abstract>
    <identifier type="citekey">zemlyanskiy-etal-2024-memory</identifier>
    <identifier type="doi">10.18653/v1/2024.naacl-short.64</identifier>
    <location>
      <url>https://aclanthology.org/2024.naacl-short.64</url>
    </location>
    <part>
      <date>2024-06</date>
      <extent unit="page">
        <start>737</start>
        <end>744</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T MEMORY-VQ: Compression for Tractable Internet-Scale Memory
%A Zemlyanskiy, Yury
%A de Jong, Michiel
%A Vilnis, Luke
%A Ontanon, Santiago
%A Cohen, William
%A Sanghai, Sumit
%A Ainslie, Joshua
%Y Duh, Kevin
%Y Gomez, Helena
%Y Bethard, Steven
%S Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
%D 2024
%8 June
%I Association for Computational Linguistics
%C Mexico City, Mexico
%F zemlyanskiy-etal-2024-memory
%X Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN (de Jong et al., 2023a) pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
%R 10.18653/v1/2024.naacl-short.64
%U https://aclanthology.org/2024.naacl-short.64
%U https://doi.org/10.18653/v1/2024.naacl-short.64
%P 737-744
Markdown (Informal)
[MEMORY-VQ: Compression for Tractable Internet-Scale Memory](https://aclanthology.org/2024.naacl-short.64) (Zemlyanskiy et al., NAACL 2024)
ACL
Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontanon, William Cohen, Sumit Sanghai, and Joshua Ainslie. 2024. MEMORY-VQ: Compression for Tractable Internet-Scale Memory. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 737–744, Mexico City, Mexico. Association for Computational Linguistics.
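
The abstract above describes compressing pre-computed memory token representations with VQ-VAE codebook lookups so that only small integer codes, rather than full float vectors, have to be stored. The sketch below illustrates that general idea in NumPy; the vector width, the split into sub-vector groups, the codebook size, the random (rather than learned) codebooks, and the plain nearest-neighbour assignment are all illustrative assumptions, not details of the authors' LUMEN-VQ implementation.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; not the paper's actual configuration.
d_model = 1024            # width of each pre-computed token representation (assumed)
n_groups = 16             # each vector is split into 16 sub-vectors (assumed)
d_sub = d_model // n_groups
n_codes = 256             # 256 entries per codebook -> one uint8 code per sub-vector

# A trained VQ-VAE would learn these codebooks; random stand-ins are used here.
codebooks = rng.normal(size=(n_groups, n_codes, d_sub)).astype(np.float32)

def compress(memory):
    """Map (num_tokens, d_model) float vectors to (num_tokens, n_groups) uint8 codes."""
    tokens = memory.reshape(-1, n_groups, d_sub)
    # Squared-L2 distance to every codebook entry, per group, then pick the nearest.
    dists = ((tokens[:, :, None, :] - codebooks[None]) ** 2).sum(-1)
    return dists.argmin(-1).astype(np.uint8)

def decompress(codes):
    """Rebuild approximate token representations from the stored codes."""
    group_idx = np.arange(n_groups)[None, :]      # broadcasts over tokens
    rebuilt = codebooks[group_idx, codes]         # (num_tokens, n_groups, d_sub)
    return rebuilt.reshape(-1, d_model)

memory = rng.normal(size=(8, d_model)).astype(np.float32)   # 8 pre-computed token vectors
codes = compress(memory)
approx = decompress(codes)
# Storage per token drops from d_model float32 values to n_groups single-byte codes.
print(memory.nbytes, "bytes ->", codes.nbytes, "bytes")
print("reconstruction MSE:", float(np.mean((memory - approx) ** 2)))

In a memory-augmented setup of the kind the abstract describes, only the integer codes and the shared codebooks would be written to disk for the retrieval corpus, and approximate token representations would be reconstructed on the fly at inference time.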