Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language

Michael Wiegand, Jana Kampfmeier, Elisabeth Eder, Josef Ruppenhofer


Abstract
We address the task of identifying euphemistic abuse (e.g., “You inspire me to fall asleep”), which paraphrases simple explicitly abusive utterances (e.g., “You are boring”). For this task, we introduce a novel dataset that has been created via crowdsourcing. Special attention has been paid to the generation of appropriate negative (non-abusive) data. We report on classification experiments showing that classifiers trained on previous datasets are less capable of detecting such abuse. The best automatic results are obtained by a classifier that augments training data from our new dataset with automatically generated GPT-3 completions. We also present a classifier that combines a few manually extracted features that exemplify the major linguistic phenomena constituting euphemistic abuse.
Anthology ID:
2023.emnlp-main.1012
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
16280–16297
URL:
https://aclanthology.org/2023.emnlp-main.1012
DOI:
10.18653/v1/2023.emnlp-main.1012
Cite (ACL):
Michael Wiegand, Jana Kampfmeier, Elisabeth Eder, and Josef Ruppenhofer. 2023. Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16280–16297, Singapore. Association for Computational Linguistics.
Cite (Informal):
Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language (Wiegand et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.1012.pdf
Video:
https://aclanthology.org/2023.emnlp-main.1012.mp4