Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Agostina Calabrese, Leonardo Neves, Neil Shah, Maarten Bos, Björn Ross, Mirella Lapata, Francesco Barbieri


Abstract
Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could support them in making faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators’ decision-making time by 7.4%.
Anthology ID:
2024.acl-short.38
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
398–408
URL:
https://aclanthology.org/2024.acl-short.38
Cite (ACL):
Agostina Calabrese, Leonardo Neves, Neil Shah, Maarten Bos, Björn Ross, Mirella Lapata, and Francesco Barbieri. 2024. Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 398–408, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster (Calabrese et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-short.38.pdf