Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding

Maciej Skorski, Alina Landowska


Abstract
How do Large Language Models understand moral dimensions compared to humans? This first comprehensive large-scale Bayesian evaluation of leading language models provides the answer. In contrast to prior approaches based on deterministic ground truth (obtained via majority or inclusion consensus), we obtain labels by modelling annotators' disagreement to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluated Claude Sonnet 4, DeepSeek-V3, and Llama 4 Maverick on 250K+ annotations from nearly 700 annotators across 100K+ texts spanning social networks, news, and discussion forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that the AI models generally rank among the top 25% of annotators in balanced accuracy, substantially better than the average human. Importantly, we find that the AI models produce far fewer false negatives than humans, highlighting their more sensitive moral detection capabilities.
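The paper's own implementation is not reproduced on this page. As a rough illustration of the annotator-disagreement modelling the abstract describes, the sketch below fits a binary Dawid-Skene-style model by EM (a point-estimate simplification; the paper uses a fully Bayesian, GPU-optimized framework) and then scores one annotator by balanced accuracy against the inferred soft labels. The function names, the missing-vote encoding, and the toy data are assumptions for illustration, not the authors' code.

```python
import numpy as np

def dawid_skene_em(votes, n_iter=50, eps=1e-9):
    """EM for a binary Dawid-Skene annotator model (illustrative sketch).

    votes: (n_items, n_annotators) int array; 1/0 = vote, -1 = missing.
    Returns (p, sens, spec): posterior P(label=1) per item and each
    annotator's estimated sensitivity and specificity.
    """
    mask = votes >= 0
    # Initialise item posteriors with the observed positive-vote fraction.
    p = np.where(mask, votes, 0).sum(1) / np.maximum(mask.sum(1), 1)
    for _ in range(n_iter):
        # M-step: annotator sensitivity/specificity, weighted by posteriors.
        pos, neg = p[:, None] * mask, (1 - p)[:, None] * mask
        sens = (pos * (votes == 1)).sum(0) / np.maximum(pos.sum(0), eps)
        spec = (neg * (votes == 0)).sum(0) / np.maximum(neg.sum(0), eps)
        prior = p.mean()
        # E-step: per-item log-likelihoods of the latent label given all votes.
        log1 = np.log(prior + eps) + (mask * ((votes == 1) * np.log(sens + eps)
                + (votes == 0) * np.log(1 - sens + eps))).sum(1)
        log0 = np.log(1 - prior + eps) + (mask * ((votes == 0) * np.log(spec + eps)
                + (votes == 1) * np.log(1 - spec + eps))).sum(1)
        p = 1.0 / (1.0 + np.exp(log0 - log1))
    return p, sens, spec

def balanced_accuracy(pred, p_true):
    """Balanced accuracy of binary predictions against soft labels:
    the mean of (soft) sensitivity and specificity."""
    pred = np.asarray(pred, dtype=float)
    sens = (pred * p_true).sum() / p_true.sum()
    spec = ((1 - pred) * (1 - p_true)).sum() / (1 - p_true).sum()
    return 0.5 * (sens + spec)

# Toy usage: 5 annotators on 200 items; annotator 4 plays "the model".
rng = np.random.default_rng(0)
truth = rng.random(200) < 0.3
quality = np.array([0.9, 0.8, 0.7, 0.65, 0.95])  # per-annotator accuracy
votes = np.where(rng.random((200, 5)) < quality,
                 truth[:, None], ~truth[:, None]).astype(int)
p, sens, spec = dawid_skene_em(votes)
print("model annotator balanced accuracy:", balanced_accuracy(votes[:, 4], p))
```

With soft labels, each annotator (human or LLM) gets a balanced-accuracy score against the posterior rather than a hard majority vote, which is how a model can be ranked among the human annotators as the abstract reports.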
Anthology ID:
2025.uncertainlp-main.3
Volume:
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Venues:
UncertaiNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
17–26
URL:
https://aclanthology.org/2025.uncertainlp-main.3/
Cite (ACL):
Maciej Skorski and Alina Landowska. 2025. Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding. In Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025), pages 17–26, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding (Skorski & Landowska, UncertaiNLP 2025)
PDF:
https://aclanthology.org/2025.uncertainlp-main.3.pdf