AD-NLP: A Benchmark for Anomaly Detection in Natural Language Processing

Matei Bejan, Andrei Manolache, Marius Popescu


Abstract
Deep learning models have reignited interest in Anomaly Detection research in recent years. Methods for Anomaly Detection in text have shown strong empirical results on ad-hoc anomaly setups that are usually constructed by downsampling some classes of a labeled dataset. This practice can lead to reproducibility issues and to models that are biased toward detecting particular anomalies while failing to recognize them in more sophisticated scenarios. In the present work, we provide a unified benchmark for detecting various types of anomalies, focusing on problems that can be naturally formulated as Anomaly Detection in text, ranging from syntax to stylistics. In this way, we hope to facilitate research in Text Anomaly Detection. We also evaluate and analyze two strong shallow baselines, as well as two of the current state-of-the-art neural approaches, providing insights into what the neural models learn when performing the anomaly detection task. We provide code for downloading and preprocessing the dataset, as well as for evaluation, at https://github.com/mateibejan1/ad-nlp/.
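For readers unfamiliar with the ad-hoc setups criticized in the abstract, the Python sketch below illustrates how such a setup is typically constructed: one class of a labeled corpus is kept as "normal" data, while the remaining classes are downsampled and relabeled as anomalies. This is a generic, hypothetical illustration (the function name make_anomaly_split, the 5% anomaly ratio, and the toy documents are assumptions of ours), not code from the AD-NLP benchmark itself.

import random

def make_anomaly_split(texts, labels, normal_class, anomaly_ratio=0.05, seed=0):
    """Return (texts, binary_labels) where 0 = normal and 1 = anomaly."""
    rng = random.Random(seed)
    normal = [t for t, y in zip(texts, labels) if y == normal_class]
    outliers = [t for t, y in zip(texts, labels) if y != normal_class]
    # Downsample the outlier pool so anomalies make up roughly anomaly_ratio of the split.
    n_anomalies = max(1, int(len(normal) * anomaly_ratio / (1.0 - anomaly_ratio)))
    anomalies = rng.sample(outliers, min(n_anomalies, len(outliers)))
    data = [(t, 0) for t in normal] + [(t, 1) for t in anomalies]
    rng.shuffle(data)
    xs, ys = zip(*data)
    return list(xs), list(ys)

# Toy usage with placeholder documents:
texts = ["sports doc"] * 50 + ["politics doc"] * 50 + ["tech doc"] * 50
labels = ["sports"] * 50 + ["politics"] * 50 + ["tech"] * 50
x, y = make_anomaly_split(texts, labels, normal_class="sports")
print(sum(y), "anomalies out of", len(y), "documents")

Because each paper builds such a split differently (different normal class, different downsampling ratio, different random seed), results are hard to reproduce and compare, which motivates the unified benchmark proposed in the paper.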
Anthology ID:
2023.emnlp-main.664
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10766–10778
URL:
https://aclanthology.org/2023.emnlp-main.664
DOI:
10.18653/v1/2023.emnlp-main.664
Cite (ACL):
Matei Bejan, Andrei Manolache, and Marius Popescu. 2023. AD-NLP: A Benchmark for Anomaly Detection in Natural Language Processing. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10766–10778, Singapore. Association for Computational Linguistics.
Cite (Informal):
AD-NLP: A Benchmark for Anomaly Detection in Natural Language Processing (Bejan et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.664.pdf
Video:
https://aclanthology.org/2023.emnlp-main.664.mp4