A Lightweight Yet Robust Approach to Textual Anomaly Detection

Leslie Barrett, Robert Kingan, Alexandra Ortan, Madhavan Seshadri


Abstract
Highly imbalanced textual datasets continue to pose a challenge for supervised learning models. However, treating such imbalanced text data as an anomaly detection (AD) problem has advantages for certain tasks, such as detecting hate speech or inappropriate and/or offensive language in large social media feeds. There, the unwanted content tends to be both rare and thematically non-uniform, so it fits the definition of an anomaly better than that of a class. Several recent approaches to textual AD use transformer models, achieving good results but with trade-offs in pre-training and in flexibility with respect to new domains. In this paper, we compare two linear models within the NMF family, which also has a recent history in textual AD. We introduce a new approach based on an alternative regularization of the NMF objective. Our results surpass those of other linear AD models and are on par with those of deep models, and performance remains comparable even at very small outlier concentrations.
Anthology ID:
2022.trac-1.8
Volume:
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Ritesh Kumar, Atul Kr. Ojha, Marcos Zampieri, Shervin Malmasi, Daniel Kadar
Venue:
TRAC
Publisher:
Association for Computational Linguistics
Pages:
62–67
URL:
https://aclanthology.org/2022.trac-1.8
Cite (ACL):
Leslie Barrett, Robert Kingan, Alexandra Ortan, and Madhavan Seshadri. 2022. A Lightweight Yet Robust Approach to Textual Anomaly Detection. In Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022), pages 62–67, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
A Lightweight Yet Robust Approach to Textual Anomaly Detection (Barrett et al., TRAC 2022)
PDF:
https://aclanthology.org/2022.trac-1.8.pdf
Data:
Hate Speech, Reuters-21578