DATE: Detecting Anomalies in Text via Self-Supervision of Transformers

Andrei Manolache, Florin Brad, Elena Burceanu


Abstract
Deep learning models have seen widespread use for Anomaly Detection (AD) in recent years because they outperform traditional methods. Recent deep methods for detecting anomalies in images learn better features of normality in an end-to-end self-supervised setting. These methods train a model to discriminate between different transformations applied to visual data and then use the model's output to compute an anomaly score. We apply this approach to AD in text by introducing a novel pretext task on text sequences. We train our DATE model end-to-end, enforcing two independent and complementary self-supervision signals, one at the token level and one at the sequence level. Under this new task formulation, we show strong quantitative and qualitative results on the 20Newsgroups and AG News datasets. In the semi-supervised setting, we outperform state-of-the-art results by +13.5% and +6.9% AUROC, respectively. In the unsupervised configuration, DATE surpasses all other methods even when 10% of its training data is contaminated with outliers (compared with 0% for the others).
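The abstract reports results in AUROC over anomaly scores derived from a self-supervised model's output. The following is a minimal sketch of that evaluation idea only, not the authors' implementation: `anomaly_score` is a hypothetical scoring rule that assumes a discriminator emits, for each token, a probability that the token is original (unaltered), and `auroc` is the standard rank-based definition. All names and the toy numbers are illustrative assumptions.

```python
def anomaly_score(token_original_probs):
    """Hypothetical score: 1 minus the mean probability that tokens look original.

    Assumes a discriminator (not shown) produced one probability per token.
    Higher values mean the sequence looks more anomalous.
    """
    if not token_original_probs:
        raise ValueError("need at least one token probability")
    return 1.0 - sum(token_original_probs) / len(token_original_probs)


def auroc(scores, labels):
    """Probability that a random anomaly (label 1) scores above a random
    normal sample (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage with made-up discriminator outputs (illustrative only).
toy_probs = [
    [0.95, 0.90, 0.97],  # normal document
    [0.88, 0.92, 0.91],  # normal document
    [0.40, 0.60, 0.50],  # anomalous document
    [0.35, 0.45, 0.40],  # anomalous document
]
toy_labels = [0, 0, 1, 1]
toy_scores = [anomaly_score(p) for p in toy_probs]
print(auroc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of samples, which keeps the definition easy to read; a sort-based implementation would be preferable for large evaluation sets.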
Anthology ID:
2021.naacl-main.25
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
267–277
URL:
https://aclanthology.org/2021.naacl-main.25
DOI:
10.18653/v1/2021.naacl-main.25
Cite (ACL):
Andrei Manolache, Florin Brad, and Elena Burceanu. 2021. DATE: Detecting Anomalies in Text via Self-Supervision of Transformers. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 267–277, Online. Association for Computational Linguistics.
Cite (Informal):
DATE: Detecting Anomalies in Text via Self-Supervision of Transformers (Manolache et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.25.pdf
Video:
https://aclanthology.org/2021.naacl-main.25.mp4
Code:
bit-ml/date
Data:
AG News