Learning Subjective Label Distributions via Sociocultural Descriptors

Mohammed Fayiz Parappan; Ricardo Henao

doi:10.18653/v1/2025.emnlp-main.1026

Abstract

Subjectivity in NLP tasks, e.g., toxicity classification, has emerged as a critical challenge precipitated by the increased deployment of NLP systems in content-sensitive domains. Conventional approaches aggregate annotator judgements (labels), ignoring minority perspectives and overlooking the influence of the sociocultural context behind such annotations. We propose a framework where subjectivity in binary labels is modeled as an empirical distribution accounting for the variation in annotators through human values extracted from sociocultural descriptors using a language model. The framework also allows for downstream tasks such as population and sociocultural group-level majority label prediction. Experiments on three toxicity datasets covering human-chatbot conversations and social media posts annotated with diverse annotator pools demonstrate that our approach yields well-calibrated toxicity distribution predictions across binary toxicity labels, which are further used for majority label prediction across cultural subgroups, improving over existing methods.
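The abstract frames each item's label as an empirical distribution over annotators rather than a single aggregated label, with majority labels derivable at the population and sociocultural-group level. The sketch below illustrates only that target quantity, not the authors' model (which predicts it from human values extracted from sociocultural descriptors with a language model). The record layout, the group names, the functions `toxicity_distribution` and `majority_label`, and the choice to report an exact 50/50 tie as `None` are all hypothetical assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# Hypothetical record layout: (item_id, annotator_group, label),
# where label is 1 for "toxic" and 0 for "not toxic".
Annotation = Tuple[str, str, int]


def toxicity_distribution(
    annotations: List[Annotation],
    key: Callable[[Annotation], Hashable],
) -> Dict[Hashable, float]:
    """Empirical P(toxic) for every key, i.e. the share of annotations labeled 1."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [toxic, total]
    for ann in annotations:
        k = key(ann)
        counts[k][0] += ann[2]
        counts[k][1] += 1
    return {k: toxic / total for k, (toxic, total) in counts.items()}


def majority_label(p_toxic: float) -> Optional[int]:
    """Majority label implied by a toxicity probability (assumption: None marks an exact tie)."""
    if p_toxic > 0.5:
        return 1
    if p_toxic < 0.5:
        return 0
    return None


# Toy example: two annotators from group_a and three from group_b rate one post.
annotations: List[Annotation] = [
    ("post1", "group_a", 1),
    ("post1", "group_a", 1),
    ("post1", "group_b", 0),
    ("post1", "group_b", 0),
    ("post1", "group_b", 0),
]

# Population level: pool every annotator for an item.
population = toxicity_distribution(annotations, key=lambda a: a[0])
# Group level: keep each (item, group) pair separate.
by_group = toxicity_distribution(annotations, key=lambda a: (a[0], a[1]))

print(population)  # {'post1': 0.4}
print(by_group)  # {('post1', 'group_a'): 1.0, ('post1', 'group_b'): 0.0}
print(majority_label(population["post1"]))  # 0
print(majority_label(by_group[("post1", "group_a")]))  # 1
```

In this toy case the population-level majority label (0) hides the fact that every group_a annotator labeled the post toxic; keeping the distribution, and the group-level majority derived from it, preserves that disagreement.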

Anthology ID:: 2025.emnlp-main.1026
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 20322–20338
URL:: https://aclanthology.org/2025.emnlp-main.1026/
DOI:: 10.18653/v1/2025.emnlp-main.1026
Cite (ACL):: Mohammed Fayiz Parappan and Ricardo Henao. 2025. Learning Subjective Label Distributions via Sociocultural Descriptors. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20322–20338, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Learning Subjective Label Distributions via Sociocultural Descriptors (Parappan & Henao, EMNLP 2025)
PDF:: https://aclanthology.org/2025.emnlp-main.1026.pdf
Checklist:: 2025.emnlp-main.1026.checklist.pdf