Dodo: Dynamic Contextual Compression for Decoder-only LMs

Guanghui Qin, Corby Rosset, Ethan Chau, Nikhil Rao, Benjamin Van Durme


Abstract
Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of typical time and space. Moreover, off-the-shelf models such as LLaMA can be adapted to Dodo by efficient parameter tuning methods such as LoRA. In use, Dodo can act as either an autoregressive LM or a context compressor for downstream tasks. We demonstrate through experiments in language modeling, question answering, and summarization that Dodo retains capabilities in these tasks, while drastically reducing the overhead during decoding. For example, in the autoencoding task, Dodo shrinks context at a 20x compression ratio with a BLEU score of 98% for reconstruction, achieving nearly lossless encoding.
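The efficiency claim in the abstract can be illustrated with a rough count of attention work during decoding. The sketch below is only back-of-envelope arithmetic, not the authors' implementation: the function names, the 4096-token context, and the 128 generated tokens are hypothetical, and it assumes that a compression ratio r leaves about n / r hidden states per layer for later tokens to attend to. How Dodo chooses which states to keep is not modeled.

```python
import math


def compressed_length(n_tokens: int, ratio: float) -> int:
    """Hidden states kept per layer at a given compression ratio (illustrative assumption)."""
    if n_tokens < 0 or ratio < 1:
        raise ValueError("n_tokens must be >= 0 and ratio must be >= 1")
    return math.ceil(n_tokens / ratio)


def decode_attention_pairs(n_context: int, n_new: int, ratio: float = 1.0) -> int:
    """Count query-key pairs scored per layer while decoding n_new tokens."""
    kept = compressed_length(n_context, ratio)
    # The i-th generated token (0-indexed) attends to the kept context states
    # plus the i + 1 generated tokens so far, itself included.
    return sum(kept + i + 1 for i in range(n_new))


# Hypothetical sizes chosen only for illustration.
full = decode_attention_pairs(4096, 128, ratio=1)
small = decode_attention_pairs(4096, 128, ratio=20)
print(compressed_length(4096, 20), full, small, round(full / small, 1))
```

Under these assumptions the saving in scored pairs is smaller than the nominal 20x, because the generated tokens themselves are not compressed in this count; the cached context states per layer shrink by roughly the full ratio.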
Anthology ID:
2024.acl-long.536
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
9961–9975
URL:
https://aclanthology.org/2024.acl-long.536
Cite (ACL):
Guanghui Qin, Corby Rosset, Ethan Chau, Nikhil Rao, and Benjamin Van Durme. 2024. Dodo: Dynamic Contextual Compression for Decoder-only LMs. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9961–9975, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Dodo: Dynamic Contextual Compression for Decoder-only LMs (Qin et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.536.pdf