LLMs as Span Annotators: A Comparative Study of LLMs and Humans

Zdeněk Kasner; Vilém Zouhar; Patrícia Schmidtová; Ivan Kartáč; Kristýna Onderková; Ondřej Plátek; Dimitra Gkatzia; Saad Mahamood; Ondřej Dušek; Simone Balloccu

Abstract

Span annotation, the task of marking specific text features at the span level, can be used to evaluate texts where single-score metrics fail to provide actionable feedback. Until recently, span annotation was done by human annotators or fine-tuned models. In this paper, we study whether large language models (LLMs) can serve as an alternative to human annotators. We compare LLMs with skilled human annotators on three span annotation tasks: evaluating data-to-text generation, identifying translation errors, and detecting propaganda techniques. We show that, overall, LLMs have only moderate inter-annotator agreement (IAA) with human annotators. However, we demonstrate that LLMs make errors at a rate similar to that of skilled crowdworkers. LLMs also produce annotations at a fraction of the cost per output annotation. We release the dataset of over 40k model and human span annotations for further research.

Anthology ID: 2026.mme-main.1
Volume: Proceedings of the First Workshop on Multilingual Multicultural Evaluation
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Pinzhen Chen, Vilém Zouhar, Hanxu Hu, Simran Khanuja, Wenhao Zhu, Barry Haddow, Alexandra Birch, Alham Fikri Aji, Rico Sennrich, Sara Hooker
Venues: MME | WS
Publisher: Association for Computational Linguistics
Pages: 1–22
URL: https://aclanthology.org/2026.mme-main.1/
Cite (ACL): Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová, Ivan Kartáč, Kristýna Onderková, Ondrej Platek, Dimitra Gkatzia, Saad Mahamood, Ondrej Dusek, and Simone Balloccu. 2026. LLMs as Span Annotators: A Comparative Study of LLMs and Humans. In Proceedings of the First Workshop on Multilingual Multicultural Evaluation, pages 1–22, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): LLMs as Span Annotators: A Comparative Study of LLMs and Humans (Kasner et al., MME 2026)
PDF: https://aclanthology.org/2026.mme-main.1.pdf
