Annotation alignment: Comparing LLM and human annotations of conversational safety

Rajiv Movva, Pang Wei Koh, Emma Pierson


Abstract
Do LLMs align with human perceptions of safety? We study this question via *annotation alignment*, the extent to which LLMs and humans agree when annotating the safety of user-chatbot conversations. We leverage the recent DICES dataset (Aroyo et al. 2023), in which 350 conversations are each rated for safety by 112 annotators spanning 10 race-gender groups. GPT-4 achieves a Pearson correlation of r=0.59 with the average annotator rating, higher than the median annotator’s correlation with the average (r=0.51). We show that larger datasets are needed to resolve whether GPT-4 exhibits disparities in how well it correlates with different demographic groups. Also, there is substantial idiosyncratic variation in correlation within groups, suggesting that race & gender do not fully capture differences in alignment. Finally, we find that GPT-4 cannot predict when one demographic group finds a conversation more unsafe than another.
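The correlation comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy ratings, and the choice to keep each annotator inside the average they are compared against are all assumptions made for illustration, and the abstract does not say how the paper handles that last point.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def annotation_alignment(model_ratings, human_ratings):
    """Return (model r, median annotator r), both measured against the average rating.

    human_ratings[i][j] is annotator i's safety rating for conversation j.
    Assumption: each annotator is included in the average they are compared to.
    """
    n_conversations = len(model_ratings)
    average = [mean(row[j] for row in human_ratings) for j in range(n_conversations)]
    model_r = pearson(model_ratings, average)
    annotator_rs = [pearson(row, average) for row in human_ratings]
    return model_r, median(annotator_rs)


# Hypothetical toy data: 3 annotators, 4 conversations (1 = unsafe, 0 = safe).
toy_humans = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
toy_model = [0.9, 0.2, 0.6, 0.1]  # hypothetical model-assigned unsafe scores

model_r, median_r = annotation_alignment(toy_model, toy_humans)
print(f"model r = {model_r:.2f}, median annotator r = {median_r:.2f}")
```

In this framing, the abstract's headline result corresponds to `model_r` (0.59 for GPT-4) exceeding `median_r` (0.51) on the DICES ratings. An annotator who gives every conversation the same rating has no defined correlation and would need to be handled separately.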
Anthology ID:
2024.emnlp-main.511
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9048–9062
URL:
https://aclanthology.org/2024.emnlp-main.511/
DOI:
10.18653/v1/2024.emnlp-main.511
Cite (ACL):
Rajiv Movva, Pang Wei Koh, and Emma Pierson. 2024. Annotation alignment: Comparing LLM and human annotations of conversational safety. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9048–9062, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Annotation alignment: Comparing LLM and human annotations of conversational safety (Movva et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.511.pdf
Data:
2024.emnlp-main.511.data.zip