Kelly Bradley
2025
Bias and Reliability in AI Safety Assessment: Multi-Facet Rasch Analysis of Human Moderators
Chunling Niu | Kelly Bradley | Biao Ma | Brian Waltman | Loren Cossette | Rui Jin
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Applying Multi-Facet Rasch Modeling to 36,400 safety ratings of AI-generated conversations, we find significant racial disparities in detection rates (Asian 39.1%, White 28.7%) and content-specific bias patterns. Simulations show that diverse teams of 8-10 members achieve reliability of 70% or higher, versus 62% for smaller homogeneous teams, yielding evidence-based guidelines for moderating AI-generated content.