Bias and Reliability in AI Safety Assessment: Multi-Facet Rasch Analysis of Human Moderators

Chunling Niu, Kelly Bradley, Biao Ma, Brian Waltman, Loren Cossette, Rui Jin


Abstract
Applying Multi-Facet Rasch Modeling to 36,400 safety ratings of AI-generated conversations, we reveal significant racial disparities in detection rates (Asian 39.1% vs. White 28.7%) and content-specific bias patterns. Simulations show that diverse teams of 8–10 members achieve 70%+ reliability, versus 62% for smaller, homogeneous teams, providing evidence-based guidelines for AI-generated content moderation.
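
For context, the measurement model behind this kind of analysis can be sketched as follows. This is the standard many-facet Rasch model in its common rating-scale form; the specific facets named here (conversation, rating criterion, moderator) are illustrative assumptions, not details taken from the paper itself:

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where B_n is the location (unsafety) of conversation n, D_i is the difficulty of rating criterion i, C_j is the severity of moderator j, and F_k is the threshold between adjacent rating categories k-1 and k. In this framing, rater bias surfaces as systematic variation in the C_j severity estimates across moderator groups.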
Anthology ID: 2025.aimecon-main.42
Volume: Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Month: October
Year: 2025
Address: Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors: Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue: AIME-Con
Publisher: National Council on Measurement in Education (NCME)
Pages: 393–397
URL: https://aclanthology.org/2025.aimecon-main.42/
Cite (ACL):
Chunling Niu, Kelly Bradley, Biao Ma, Brian Waltman, Loren Cossette, and Rui Jin. 2025. Bias and Reliability in AI Safety Assessment: Multi-Facet Rasch Analysis of Human Moderators. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 393–397, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal):
Bias and Reliability in AI Safety Assessment: Multi-Facet Rasch Analysis of Human Moderators (Niu et al., AIME-Con 2025)
PDF: https://aclanthology.org/2025.aimecon-main.42.pdf