Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Takuma Udagawa, Yang Zhao, Hiroshi Kanayama, Bishwaranjan Bhattacharjee


Abstract
Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data, which mainly comprise web-crawled text, contain undesirable social biases that can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effectiveness of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.
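
The abstract's two-stage pipeline can be sketched with off-the-shelf components. The following minimal Python illustration uses Hugging Face transformers: zero-shot classification stands in for protected attribute detection (the paper builds a dedicated detector), and a regard classifier in the sense of Sheng et al. (2019) scores polarity toward the text mentioning each attribute. The model identifiers, attribute list, and analyze() helper are illustrative assumptions, not the authors' released artifacts.

# Hypothetical sketch of the pipeline described in the abstract:
# (1) detect protected attributes in a sentence, (2) classify regard toward them.
# Model names and the attribute set below are assumptions for illustration.
from transformers import pipeline

# Stage 1: protected attribute detection, approximated here with zero-shot
# classification over a few demographic axes (the paper trains its own detector).
attribute_detector = pipeline("zero-shot-classification",
                              model="facebook/bart-large-mnli")
PROTECTED_ATTRIBUTES = ["gender", "race", "religion", "age", "disability"]

# Stage 2: regard classification; "sasha/regardv3" is the checkpoint used by
# the Hugging Face `evaluate` regard measurement (labels include
# "positive", "negative", "neutral", "other").
regard_classifier = pipeline("text-classification", model="sasha/regardv3")

def analyze(sentence: str, threshold: float = 0.5):
    """Return the protected attributes detected in a sentence and its regard."""
    detected = attribute_detector(sentence, PROTECTED_ATTRIBUTES, multi_label=True)
    attributes = [label for label, score
                  in zip(detected["labels"], detected["scores"])
                  if score >= threshold]
    regard = regard_classifier(sentence)[0]  # e.g. {'label': 'positive', 'score': ...}
    return {"attributes": attributes, "regard": regard}

print(analyze("The immigrant workers were praised for their dedication."))

At corpus scale, one plausible use of such a pipeline, in the spirit of the abstract, is to aggregate regard scores per attribute and flag documents expressing disproportionately negative regard toward a demographic as candidates for filtering or reweighting during pretraining-data curation.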
Anthology ID:
2025.findings-emnlp.2
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rosé, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16–25
URL:
https://aclanthology.org/2025.findings-emnlp.2/
Cite (ACL):
Takuma Udagawa, Yang Zhao, Hiroshi Kanayama, and Bishwaranjan Bhattacharjee. 2025. Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 16–25, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification (Udagawa et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.2.pdf
Checklist:
2025.findings-emnlp.2.checklist.pdf