SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems

Emily Dinan, Gavin Abercrombie, A. Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser


Abstract
The social impact of natural language processing and its applications has received increasing attention. In this position paper, we focus on the problem of safety for end-to-end conversational AI. We survey the problem landscape therein, introducing a taxonomy of three observed phenomena: the Instigator, Yea-Sayer, and Impostor effects. We then empirically assess the extent to which current tools can measure these effects and current systems display them. We release these tools as part of a “first aid kit” (SafetyKit) to quickly assess apparent safety concerns. Our results show that, while current tools are able to provide an estimate of the relative safety of systems in various settings, they still have several shortcomings. We suggest several future directions and discuss ethical considerations.
Anthology ID:
2022.acl-long.284
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4113–4133
URL:
https://aclanthology.org/2022.acl-long.284
DOI:
10.18653/v1/2022.acl-long.284
Cite (ACL):
Emily Dinan, Gavin Abercrombie, A. Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, and Verena Rieser. 2022. SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4113–4133, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems (Dinan et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.284.pdf
Video:
https://aclanthology.org/2022.acl-long.284.mp4
Data:
Blended Skill Talk, HONEST