Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language

Xinmeng Hou


Abstract
This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language, particularly for casual and non-mainstream language uses. We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and large language model (LLM) annotations than the original datasets, which were annotated under descriptive instructions. Our experiments show that LLMs can serve as effective alternatives when professional annotators are unavailable. Moreover, smaller models fine-tuned on multi-source LLM-annotated data outperform models trained on larger, single-source human-annotated datasets. These findings highlight the value of structured guidelines in reducing subjective variability, maintaining performance with limited data, and embracing language diversity. Content Warning: This article analyzes offensive language solely for academic purposes. Discretion is advised.
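As a minimal illustration of the kind of human-LLM agreement measurement the abstract refers to (not code from the paper), the sketch below computes Cohen's kappa over two hypothetical label sequences; the example labels and the scikit-learn dependency are assumptions made purely for illustration.

from sklearn.metrics import cohen_kappa_score

# Hypothetical binary toxicity labels for the same ten posts (1 = offensive, 0 = not).
human_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
llm_labels   = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Cohen's kappa corrects raw percent agreement for agreement expected by chance.
kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"Human-LLM agreement (Cohen's kappa): {kappa:.2f}")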
Anthology ID: 2024.nlp4dh-1.36
Volume: Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month: November
Year: 2024
Address: Miami, USA
Editors: Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue: NLP4DH
Publisher: Association for Computational Linguistics
Pages: 362–376
URL: https://aclanthology.org/2024.nlp4dh-1.36
Cite (ACL): Xinmeng Hou. 2024. Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 362–376, Miami, USA. Association for Computational Linguistics.
Cite (Informal): Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language (Hou, NLP4DH 2024)
PDF: https://aclanthology.org/2024.nlp4dh-1.36.pdf