Annotating Targets of Toxic Language at the Span Level

Baran Barbarestani, Isa Maks, Piek Vossen


Abstract
In this paper, we discuss an interpretable framework to integrate toxic language annotations. Most data sets address only one aspect of the complex relationship in toxic communication and are inconsistent with each other. Enriching annotations with more details and information is, however, of great importance for developing high-performing, comprehensive, and explainable language models. To combat toxic language, such systems should recognize and interpret both expressions that are toxic and expressions that refer to specific targets. We therefore created a crowd-annotation task to mark the spans of words that refer to target communities, as an extension of the HateXplain data set. We present a quantitative and qualitative analysis of the annotations. We also fine-tuned RoBERTa-base on our data and experimented with different data thresholds to measure their effect on classification. The F1-score of our best model on the test set is 79%. The annotations are freely available and can be combined with the existing HateXplain annotations to build richer and more complete models.
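The span annotations described above mark which words in a post refer to a target community, which lends itself to token-classification fine-tuning (e.g. of RoBERTa-base, as in the paper). A minimal sketch of the preprocessing step, assuming annotations arrive as word-index spans (the exact release format may differ, and this is not the authors' code):

```python
# Illustrative sketch: convert word-index target spans into BIO labels,
# the usual input format for token-classification fine-tuning.
# The (start, end) word-index representation is an assumption about the data.

def spans_to_bio(tokens, target_spans):
    """Map word-index spans marking target communities to BIO labels.

    tokens       -- list of word strings for one post
    target_spans -- list of (start, end) word-index pairs, end exclusive
    """
    labels = ["O"] * len(tokens)
    for start, end in target_spans:
        labels[start] = "B-TARGET"          # first word of the target mention
        for i in range(start + 1, end):
            labels[i] = "I-TARGET"          # remaining words of the mention
    return labels

# Example: the span (4, 6) marks a two-word target mention.
tokens = ["they", "always", "blame", "the", "immigrant", "workers", "here"]
print(spans_to_bio(tokens, [(4, 6)]))
# → ['O', 'O', 'O', 'O', 'B-TARGET', 'I-TARGET', 'O']
```

These word-level labels would still need to be aligned to subword tokens before fine-tuning, since RoBERTa's tokenizer splits words into multiple pieces.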
Anthology ID: 2022.trac-1.6
Volume: Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: TRAC
Publisher: Association for Computational Linguistics
Pages: 43–51
URL: https://aclanthology.org/2022.trac-1.6
Cite (ACL): Baran Barbarestani, Isa Maks, and Piek Vossen. 2022. Annotating Targets of Toxic Language at the Span Level. In Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022), pages 43–51, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal): Annotating Targets of Toxic Language at the Span Level (Barbarestani et al., TRAC 2022)
PDF: https://aclanthology.org/2022.trac-1.6.pdf
Code: cltl/target-spans-detection
Data: Hate Speech, HateXplain