Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks

Negar Mokhberian, Myrl Marmarelis, Frederic Hopp, Valerio Basile, Fred Morstatter, Kristina Lerman


Abstract
Supervised classification heavily depends on datasets annotated by humans. However, in subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters. Annotations have commonly been aggregated by employing methods like majority voting to determine a single ground truth label. In subjective tasks, aggregating labels will result in biased labeling and, consequently, biased models that can overlook minority opinions. Previous studies have shed light on the pitfalls of label aggregation and have introduced a handful of practical approaches to tackle this issue. Recently proposed multi-annotator models, which predict labels individually per annotator, are vulnerable to under-determination for annotators with few samples. This problem is exacerbated in crowdsourced datasets. In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks. Our approach involves learning representations of annotators, allowing for exploration of annotation behaviors. We show the improvement of our method on metrics that assess the performance on capturing individual annotators’ perspectives. Additionally, we demonstrate fairness metrics to evaluate our model’s equability of performance for marginalized annotators compared to others.
Anthology ID:
2024.naacl-long.407
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
7330–7342
Language:
URL:
https://aclanthology.org/2024.naacl-long.407
DOI:
Bibkey:
Cite (ACL):
Negar Mokhberian, Myrl Marmarelis, Frederic Hopp, Valerio Basile, Fred Morstatter, and Kristina Lerman. 2024. Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7330–7342, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks (Mokhberian et al., NAACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.naacl-long.407.pdf
Copyright:
 2024.naacl-long.407.copyright.pdf