The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas

Giovanni Marraffini, Andrés Cotton, Noe Hsueh, Axel Fridman, Juan Wisznia, Luciano Corro


Abstract
The question of how to make decisions that maximise the well-being of all persons is highly relevant to designing language models that are beneficial to humanity and free from harm. We introduce the Greatest Good Benchmark to evaluate the moral judgments of LLMs using utilitarian dilemmas. Our analysis across 15 diverse LLMs reveals consistently encoded moral preferences that diverge from both established moral theories and the moral standards of the lay population. Most LLMs show a marked preference for impartial beneficence and a rejection of instrumental harm. These findings showcase the ‘artificial moral compass’ of LLMs, offering insights into their moral alignment.
Anthology ID:
2024.emnlp-main.1224
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
21950–21959
URL:
https://aclanthology.org/2024.emnlp-main.1224
Cite (ACL):
Giovanni Marraffini, Andrés Cotton, Noe Hsueh, Axel Fridman, Juan Wisznia, and Luciano Corro. 2024. The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21950–21959, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas (Marraffini et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1224.pdf
Data:
 2024.emnlp-main.1224.data.zip