A Simple yet Efficient Prompt Compression Method for Text Classification Data Annotation Using LLM

Yiran Xie, Debin Xiao, Ping Wang, Shuming Liu


Abstract
Effectively balancing accuracy and cost is a critical challenge when using large language models (LLMs) for corpus annotation. This paper introduces a novel compression method based on keyword extraction (PCKE) that effectively reduces the number of prompt tokens in text classification annotation tasks, with minimal to no loss in accuracy. Our approach begins with an LLM that generates both category labels and relevant keywords from a small unannotated dataset. These outputs are used to train a BERT-based multi-task model capable of simultaneous classification and keyword extraction. For larger unannotated corpora, this model extracts keywords, which are then used in place of the full texts for LLM annotation. The significant reduction in prompt tokens results in substantial cost savings, while the use of a few well-chosen keywords ensures that classification accuracy is maintained. Extensive experiments validate that our method not only achieves a superior compression rate but also maintains high accuracy, outperforming existing general-purpose compression techniques. Our approach offers a practical and cost-efficient solution for large-scale text classification annotation with LLMs, particularly in industrial settings.
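The pipeline described in the abstract can be sketched in a few lines. This is a toy illustration, not the authors' implementation: `extract_keywords` stands in for the trained BERT-based multi-task extractor, and the prompt-building step shows how keywords replace the full text before the (hypothetical) LLM annotation call.

```python
def extract_keywords(text, top_k=3):
    # Toy stand-in for the trained keyword extractor: pick the longest
    # distinct words as "keywords". The paper uses a BERT multi-task
    # model trained on LLM-generated labels and keywords instead.
    words = sorted(set(text.lower().split()), key=len, reverse=True)
    return words[:top_k]

def build_prompt(keywords, labels):
    # The compressed prompt sent to the LLM contains only the extracted
    # keywords, not the full document text.
    return (f"Classify a document with keywords {', '.join(keywords)} "
            f"into one of: {', '.join(labels)}.")

labels = ["sports", "finance"]
doc = ("The quarterback threw a touchdown pass in the final seconds "
       "of the championship game")

keywords = extract_keywords(doc)
prompt = build_prompt(keywords, labels)
print(keywords)
print(prompt)
```

For long documents, the keyword-only prompt is far shorter than the original text, which is the source of the token (and cost) savings the paper reports.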
Anthology ID:
2025.coling-industry.44
Volume:
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
511–521
URL:
https://aclanthology.org/2025.coling-industry.44/
Cite (ACL):
Yiran Xie, Debin Xiao, Ping Wang, and Shuming Liu. 2025. A Simple yet Efficient Prompt Compression Method for Text Classification Data Annotation Using LLM. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 511–521, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
A Simple yet Efficient Prompt Compression Method for Text Classification Data Annotation Using LLM (Xie et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-industry.44.pdf