Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage

Yichen Jiang, Marco Vecchio, Mohit Bansal, Anders Johannsen


Abstract
Long prompts present a significant challenge for practical LLM-based systems that need to operate with low latency and limited resources. We investigate prompt compression for zero-shot dialogue systems that learn to use unseen APIs directly in-context from their documentation, which may take up hundreds of prompt tokens per API. We start from a recently introduced approach (Mu et al., 2023) that learns to compress the prompt into a few “gist token” activations during finetuning. However, this simple idea is ineffective in compressing API documentation, resulting in low accuracy compared to the baseline using an uncompressed prompt. In this work, we introduce two major improvements. First, we specialize gist tokens for different hierarchies within an API: we use one Gist_arg token for compressing an argument and one Gist_value token for compressing an acceptable value of a categorical argument. We then dynamically reveal Gist_value tokens only when they are needed. Second, we add a reconstruction loss to predict the API documentation from the gist tokens. On multiple API-calling tasks, our proposed system keeps the simplicity, efficiency, and large compression factor (20x on SGD) of the gist token approach while achieving significantly better accuracy.
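The hierarchical layout and dynamic reveal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Argument` class, the slot-string format, and the function names are hypothetical, and the strings only stand in for the positions whose activations a finetuned model would actually keep. The sketch shows one argument-level gist slot per argument that is always visible, with value-level slots exposed only for the categorical argument currently being filled.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    """Hypothetical description of one API argument."""
    name: str
    description: str
    # Acceptable values; an empty list means the argument is not categorical.
    values: list = field(default_factory=list)


def arg_slots(args):
    """One argument-level gist slot per argument (always visible)."""
    return [f"<gist_arg:{a.name}>" for a in args]


def value_slots(arg):
    """One value-level gist slot per acceptable value of a categorical argument."""
    return [f"<gist_value:{arg.name}={v}>" for v in arg.values]


def visible_slots(args, active=None):
    """Slots visible at a decoding step.

    Value-level slots are revealed only for the argument named by `active`,
    i.e. the one whose value is about to be generated (an assumption about
    when values are "needed").
    """
    slots = arg_slots(args)
    for a in args:
        if a.name == active:
            slots += value_slots(a)
    return slots


api = [
    Argument("city", "City where the restaurant is located"),
    Argument("cuisine", "Type of food served", ["italian", "thai"]),
]

print(visible_slots(api))
print(visible_slots(api, active="cuisine"))
```

In this toy layout the number of visible slots grows with the number of arguments rather than with the length of the documentation, which is the intuition behind revealing value-level tokens only on demand.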
Anthology ID:
2024.findings-eacl.143
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2162–2174
URL:
https://aclanthology.org/2024.findings-eacl.143
Cite (ACL):
Yichen Jiang, Marco Vecchio, Mohit Bansal, and Anders Johannsen. 2024. Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage. In Findings of the Association for Computational Linguistics: EACL 2024, pages 2162–2174, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage (Jiang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-eacl.143.pdf