Predicting Median, Disagreement and Noise Label in Ordinal Word-in-Context Data

Tejaswi Choppa; Michael Roth; Dominik Schlechtweg

Abstract

The quality of annotated data is crucial for Machine Learning models, particularly in word sense annotation in context (Word-in-Context, WiC). WiC datasets often show significant annotator disagreement, and information is lost when creating gold labels through majority or median aggregation. Recent work has addressed this by incorporating disagreement data through new label aggregation methods. Modeling disagreement is important since real-world scenarios often lack clean data and require predictions on inherently difficult samples. Disagreement prediction can help detect complex cases or reflect inherent data ambiguity. We aim to model different aspects of ordinal Word-in-Context annotations necessary to build a more human-like model: (i) the aggregated label, which has traditionally been the modeling aim, (ii) the disagreement between annotators, and (iii) the aggregated noise label, which annotators can choose in order to exclude data points from annotation. We find that disagreement and noise are impacted by various properties of the data, such as ambiguity, which in turn points to data uncertainty.

Anthology ID: 2025.comedi-1.6
Volume: Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Michael Roth, Dominik Schlechtweg
Venues: CoMeDi | WS
Publisher: International Committee on Computational Linguistics
Pages: 65–77
URL: https://aclanthology.org/2025.comedi-1.6/
Cite (ACL): Tejaswi Choppa, Michael Roth, and Dominik Schlechtweg. 2025. Predicting Median, Disagreement and Noise Label in Ordinal Word-in-Context Data. In Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation, pages 65–77, Abu Dhabi, UAE. International Committee on Computational Linguistics.
Cite (Informal): Predicting Median, Disagreement and Noise Label in Ordinal Word-in-Context Data (Choppa et al., CoMeDi 2025)
PDF: https://aclanthology.org/2025.comedi-1.6.pdf
