Using LLMs and Preference Optimization for Agreement-Aware HateWiC Classification

Sebastian Loftus; Adrian Mülthaler; Sanne Hoeken; Sina Zarrieß; Özge Alacam

Abstract

Annotator disagreement poses a significant challenge in subjective tasks like hate speech detection. In this paper, we introduce a novel variant of the HateWiC task that explicitly models annotator agreement by estimating the proportion of annotators who classify the meaning of a term as hateful. To tackle this challenge, we explore the use of Llama 3 models fine-tuned through Direct Preference Optimization (DPO). Our experiments show that while LLMs perform well for majority-based hate classification, they struggle with the more complex agreement-aware task. DPO fine-tuning offers improvements, particularly when applied to instruction-tuned models. Yet our results emphasize the need for improved modeling of subjectivity in hate classification, and this study can serve as a foundation for future advancements.
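As a rough illustration (not the authors' code), the two quantities named in the abstract can be sketched as follows: the agreement-aware target, i.e. the fraction of annotators who label a term's meaning as hateful, and the standard DPO loss for one preference pair given per-sequence log-probabilities under the trained policy and a frozen reference model. The function names, the tie-breaking rule in `majority_label`, and the default `beta=0.1` are assumptions chosen for illustration; the paper's exact label scheme and hyperparameters may differ.

```python
import math
from typing import Sequence


def agreement_proportion(labels: Sequence[int]) -> float:
    """Fraction of annotators labelling the term's meaning as hateful (1) rather than not (0)."""
    if not labels:
        raise ValueError("need at least one annotator label")
    return sum(labels) / len(labels)


def majority_label(labels: Sequence[int]) -> int:
    """Majority-based binary label. Ties count as not hateful here (an assumption)."""
    return int(agreement_proportion(labels) > 0.5)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, not taken from the paper
) -> float:
    """Standard DPO loss -log sigmoid(beta * margin) for a single preference pair."""
    # Log-ratio of policy vs. reference for the preferred and dispreferred answers.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(z) = log(1 + exp(-z)), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

For example, labels `[1, 1, 0, 1]` give an agreement proportion of 0.75 and a majority label of 1, while a policy identical to the reference yields a DPO loss of log 2. The loss falls below log 2 as the policy shifts probability toward the preferred answer relative to the reference.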

Anthology ID:: 2025.woah-1.47
Volume:: Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month:: August
Year:: 2025
Address:: Vienna, Austria
Editors:: Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues:: WOAH | WS
Publisher:: Association for Computational Linguistics
Pages:: 538–547
URL:: https://aclanthology.org/2025.woah-1.47/
Cite (ACL):: Sebastian Loftus, Adrian Mülthaler, Sanne Hoeken, Sina Zarrieß, and Ozge Alacam. 2025. Using LLMs and Preference Optimization for Agreement-Aware HateWiC Classification. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 538–547, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Using LLMs and Preference Optimization for Agreement-Aware HateWiC Classification (Loftus et al., WOAH 2025)
PDF:: https://aclanthology.org/2025.woah-1.47.pdf