Alina Landowska


2025

Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding
Maciej Skorski | Alina Landowska
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)

How do Large Language Models understand moral dimensions compared to humans? This first comprehensive large-scale Bayesian evaluation of leading language models provides the answer. In contrast to prior approaches based on deterministic ground truth (obtained via majority or inclusion consensus), we obtain the labels by modelling annotators’ disagreement to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluated Claude Sonnet 4, DeepSeek-V3, and Llama 4 Maverick across 250K+ annotations from nearly 700 annotators on 100K+ texts spanning social networks, news, and discussion forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models generally rank among the top 25% of annotators in terms of balanced accuracy, substantially better than the average human. Importantly, we find that AI models produce far fewer false negatives than humans, highlighting their more sensitive moral detection capabilities.
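
The abstract does not include implementation details, but the core idea of replacing deterministic majority-vote labels with a posterior over labels can be illustrated with a minimal sketch. The following is an assumption-laden toy example (binary per-item annotations, a simple Beta-Binomial model, hypothetical names and priors), not the authors' GPU-optimized framework:

import numpy as np
from scipy import stats

def soft_labels(votes, prior_a=1.0, prior_b=1.0):
    """Beta-Binomial posterior over each item's positive-label probability.

    votes: list of per-item binary annotation arrays (one 0/1 per annotator).
    Returns the posterior mean (a soft label) and posterior std (a proxy for
    aleatoric uncertainty) instead of a hard majority-vote label.
    """
    means, stds = [], []
    for v in votes:
        v = np.asarray(v)
        a = prior_a + v.sum()            # positive votes update alpha
        b = prior_b + len(v) - v.sum()   # negative votes update beta
        means.append(a / (a + b))
        stds.append(stats.beta(a, b).std())
    return np.array(means), np.array(stds)

# Example: three items with decreasing annotator agreement on the label.
items = [[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
mean, std = soft_labels(items)
print(mean)  # soft labels, roughly [0.67, 0.50, 0.17]
print(std)   # posterior is widest where annotators disagree most

In this sketch, the second item (a 2-2 split) yields the widest posterior; that spread is the aleatoric uncertainty the abstract refers to, while the paper's full model additionally captures epistemic uncertainty arising from domain sensitivity.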