The Linguistic Ideologies of Deep Abusive Language Classification

Michael Castelle


Abstract
This paper brings together theories from sociolinguistics and linguistic anthropology to critically evaluate the so-called “language ideologies” (the set of beliefs and ways of speaking about language) in the practices of abusive language classification in modern machine learning-based NLP. This critique is developed at both a conceptual and an empirical level: we review approaches to abusive language from different fields, and use two neural network methods to analyze three datasets developed for abusive language classification tasks (drawn from Wikipedia, Facebook, and StackOverflow). By evaluating and comparing these results, we argue for the importance of incorporating theories of pragmatics and metapragmatics into both the design of classification tasks and ML architectures themselves.
Anthology ID: W18-5120
Original: W18-5120v1
Version 2: W18-5120v2
Volume: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Darja Fišer, Ruihong Huang, Vinodkumar Prabhakaran, Rob Voigt, Zeerak Waseem, Jacqueline Wernimont
Venue: ALW
Publisher: Association for Computational Linguistics
Pages: 160–170
URL: https://aclanthology.org/W18-5120
DOI: 10.18653/v1/W18-5120
Cite (ACL): Michael Castelle. 2018. The Linguistic Ideologies of Deep Abusive Language Classification. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 160–170, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): The Linguistic Ideologies of Deep Abusive Language Classification (Castelle, ALW 2018)
PDF: https://aclanthology.org/W18-5120.pdf