Just Read the Codebook! Make Use of Quality Codebooks in Zero-Shot Classification of Multilabel Frame Datasets

Mattes Ruckdeschel


Abstract
The recent development of Large Language Models lowered the barrier to entry for using Natural Language Processing methods for various tasks in the related scientific field of Computational Social Science and has led to more scrutiny of their performance on complex datasets. While in many cases the costly fine-tuning of smaller Language Models outperforms LLMs, zero and few-shot approaches on consumer hardware have the potential to deepen interdisciplinary research efforts, whilst opening up NLP research to complex, niche datasets that are hard to classify. The great effort that is coding datasets comes with the benefit of concise instructions for how to code the data at hand. We investigate, whether highly specific, instructive codebooks created by social scientists in order to code text with a multitude of complex labels can improve zero-shot performance on (quantized) LLMs. Our findings show, that using the latest LLMs, zero-shot performance can improve by providing a codebook on two complex datasets with a total of four different topics and can outperform few-shot In-Context-Learning setups. The approach is equally or more token-efficient, and requires less hands-on engineering, making it particularly compelling for practical research.
Anthology ID:
2025.coling-main.422
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
6317–6337
Language:
URL:
https://aclanthology.org/2025.coling-main.422/
DOI:
Bibkey:
Cite (ACL):
Mattes Ruckdeschel. 2025. Just Read the Codebook! Make Use of Quality Codebooks in Zero-Shot Classification of Multilabel Frame Datasets. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6317–6337, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Just Read the Codebook! Make Use of Quality Codebooks in Zero-Shot Classification of Multilabel Frame Datasets (Ruckdeschel, COLING 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.coling-main.422.pdf