@inproceedings{dsouza-kovatchev-2025-sources,
title = "Sources of Disagreement in Data for {LLM} Instruction Tuning",
author = "Dsouza, Russel and
Kovatchev, Venelin",
editor = "Roth, Michael and
Schlechtweg, Dominik",
booktitle = "Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "International Committee on Computational Linguistics",
url = "https://aclanthology.org/2025.comedi-1.3/",
pages = "20--32",
abstract = "In this paper we study the patterns of label disagreement in data used for instruction tuning Large Language models (LLMs). Specifically, we focus on data used for Reinforcement Learning from Human Feedback (RLHF). Our objective is to determine what is the primary source of disagreement: the individual data points, the choice of annotators, or the task formulation. We annotate the same dataset multiple times under different conditions and compare the overall agreement and the patterns of disagreement. For task formulation, we compare {\textquotedblleft}single{\textquotedblright} format where annotators rate LLM responses individually with {\textquotedblleft}preference{\textquotedblright} format where annotators select one of two possible responses. For annotators, we compare data from human labelers with automatic data labeling using LLMs. Our results indicate that: (1) there are very few {\textquotedblleft}universally ambiguous{\textquotedblright} instances. The label disagreement depends largely on the task formulation and the choice of annotators; (2) the overall agreement remains consistent across experiments. We find no evidence that {\textquotedblleft}preference{\textquotedblright} data is of higher quality than {\textquotedblleft}single{\textquotedblright} data; and (3) the change of task formulation and annotators impacts the resulting instance-level labels. The labels obtained in different experiments are correlated, but not identical."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="dsouza-kovatchev-2025-sources">
    <titleInfo>
      <title>Sources of Disagreement in Data for LLM Instruction Tuning</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Russel</namePart>
      <namePart type="family">Dsouza</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Venelin</namePart>
      <namePart type="family">Kovatchev</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-01</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Michael</namePart>
        <namePart type="family">Roth</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Dominik</namePart>
        <namePart type="family">Schlechtweg</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>International Committee on Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Abu Dhabi, UAE</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper, we study the patterns of label disagreement in data used for instruction tuning Large Language Models (LLMs). Specifically, we focus on data used for Reinforcement Learning from Human Feedback (RLHF). Our objective is to determine the primary source of disagreement: the individual data points, the choice of annotators, or the task formulation. We annotate the same dataset multiple times under different conditions and compare the overall agreement and the patterns of disagreement. For task formulation, we compare a “single” format, where annotators rate LLM responses individually, with a “preference” format, where annotators select one of two possible responses. For annotators, we compare data from human labelers with automatic data labeling using LLMs. Our results indicate that: (1) there are very few “universally ambiguous” instances. The label disagreement depends largely on the task formulation and the choice of annotators; (2) the overall agreement remains consistent across experiments. We find no evidence that “preference” data is of higher quality than “single” data; and (3) changes in task formulation and annotators impact the resulting instance-level labels. The labels obtained in different experiments are correlated, but not identical.</abstract>
    <identifier type="citekey">dsouza-kovatchev-2025-sources</identifier>
    <location>
      <url>https://aclanthology.org/2025.comedi-1.3/</url>
    </location>
    <part>
      <date>2025-01</date>
      <extent unit="page">
        <start>20</start>
        <end>32</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Sources of Disagreement in Data for LLM Instruction Tuning
%A Dsouza, Russel
%A Kovatchev, Venelin
%Y Roth, Michael
%Y Schlechtweg, Dominik
%S Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation
%D 2025
%8 January
%I International Committee on Computational Linguistics
%C Abu Dhabi, UAE
%F dsouza-kovatchev-2025-sources
%X In this paper, we study the patterns of label disagreement in data used for instruction tuning Large Language Models (LLMs). Specifically, we focus on data used for Reinforcement Learning from Human Feedback (RLHF). Our objective is to determine the primary source of disagreement: the individual data points, the choice of annotators, or the task formulation. We annotate the same dataset multiple times under different conditions and compare the overall agreement and the patterns of disagreement. For task formulation, we compare a “single” format, where annotators rate LLM responses individually, with a “preference” format, where annotators select one of two possible responses. For annotators, we compare data from human labelers with automatic data labeling using LLMs. Our results indicate that: (1) there are very few “universally ambiguous” instances. The label disagreement depends largely on the task formulation and the choice of annotators; (2) the overall agreement remains consistent across experiments. We find no evidence that “preference” data is of higher quality than “single” data; and (3) changes in task formulation and annotators impact the resulting instance-level labels. The labels obtained in different experiments are correlated, but not identical.
%U https://aclanthology.org/2025.comedi-1.3/
%P 20-32
Markdown (Informal)
[Sources of Disagreement in Data for LLM Instruction Tuning](https://aclanthology.org/2025.comedi-1.3/) (Dsouza & Kovatchev, CoMeDi 2025)
ACL
Russel Dsouza and Venelin Kovatchev. 2025. Sources of Disagreement in Data for LLM Instruction Tuning. In Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation, pages 20–32, Abu Dhabi, UAE. International Committee on Computational Linguistics.