Automating Annotation Guideline Improvements using LLMs: A Case Study

Adrien Bibal, Nathaniel Gerlek, Goran Muric, Elizabeth Boschee, Steven C. Fincke, Mike Ross, Steven N. Minton

Abstract
Annotating texts can be a tedious task, especially when the texts are noisy. At the root of the issue, annotation guidelines are often not optimized well enough for annotators to perform the required annotation task. In difficult cases, complex workflows are designed to arrive at the best possible guidelines. However, crowdsourced workers are commonly recruited to carry out these workflows, and their slow speed and high cost limit the number of iterations and, therefore, the quality of the resulting guidelines. In this paper, our case study on the entity recognition problem suggests that LLMs can help produce high-quality guidelines (raising inter-annotator agreement from 0.593 to 0.84 when improving WNUT-17's guidelines), while being faster and cheaper than crowdsourced workers.
Anthology ID:
2025.comedi-1.13
Volume:
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Michael Roth, Dominik Schlechtweg
Venues:
CoMeDi | WS
Publisher:
International Committee on Computational Linguistics
Pages:
129–144
URL:
https://aclanthology.org/2025.comedi-1.13/
Cite (ACL):
Adrien Bibal, Nathaniel Gerlek, Goran Muric, Elizabeth Boschee, Steven C. Fincke, Mike Ross, and Steven N. Minton. 2025. Automating Annotation Guideline Improvements using LLMs: A Case Study. In Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation, pages 129–144, Abu Dhabi, UAE. International Committee on Computational Linguistics.
Cite (Informal):
Automating Annotation Guideline Improvements using LLMs: A Case Study (Bibal et al., CoMeDi 2025)
PDF:
https://aclanthology.org/2025.comedi-1.13.pdf