Enhancing Two Steps Textual Anomaly Detection through Anisotropy Mitigation

Pierre Fihey; Matthieu Labeau; Pavlo Mozharovskyi

Abstract

Anomaly detection aims at distinguishing between in-distribution samples, which belong to the same distribution as the training set, and out-of-distribution samples, which lie outside of it. In textual anomaly detection, recent approaches routinely apply anomaly detection algorithms directly to embeddings extracted from pre-trained embedding models (two-stage approaches). However, the geometric properties of pre-trained embeddings can hinder the effectiveness of detection algorithms, which often rely on distance-based measures. In this work, we first highlight the relevance of similarity-trained models for textual anomaly detection. Beyond being trained to capture semantic similarities, these models also exhibit geometric properties that appear better suited to detection algorithms. We further demonstrate that, besides model choice, a simple post-processing step can significantly improve anomaly detection by adapting embeddings to the assumptions made by classical detection algorithms. The bulk of our experiments is done on a reformulation of the classification tasks from the MTEB benchmark into anomaly detection tasks.
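The abstract describes a two-stage pipeline (embed, optionally post-process, then score with a classical detector), but it does not say which post-processing step the paper uses. The sketch below is therefore an illustration under stated assumptions, not the authors' method. It assumes a simple anisotropy-mitigation step: center each dimension on the in-distribution mean and rescale it to unit variance (diagonal whitening). It then applies a k-nearest-neighbour distance score as the classical detector. The function names `fit_diagonal_whitening`, `transform`, and `knn_scores` are hypothetical, and the toy two-dimensional vectors stand in for real sentence embeddings.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_diagonal_whitening(
    train: Sequence[Vector], eps: float = 1e-8
) -> Tuple[Vector, Vector]:
    """Estimate per-dimension mean and standard deviation on in-distribution embeddings.

    Hypothetical post-processing step (assumption): diagonal whitening, not
    necessarily the step used in the paper.
    """
    n, dim = len(train), len(train[0])
    mean = [sum(v[i] for v in train) / n for i in range(dim)]
    # eps guards against division by zero on constant dimensions
    std = [
        math.sqrt(sum((v[i] - mean[i]) ** 2 for v in train) / n) + eps
        for i in range(dim)
    ]
    return mean, std


def transform(vectors: Sequence[Vector], mean: Vector, std: Vector) -> List[Vector]:
    """Center each embedding on the training mean and rescale every dimension to unit variance."""
    return [[(x - m) / s for x, m, s in zip(v, mean, std)] for v in vectors]


def knn_scores(train: Sequence[Vector], test: Sequence[Vector], k: int = 1) -> List[float]:
    """Anomaly score: Euclidean distance to the k-th nearest training embedding (higher = more anomalous)."""
    scores = []
    for q in test:
        dists = sorted(math.dist(q, t) for t in train)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    # Toy "embeddings": every vector shares a large common offset in dimension 0,
    # a crude stand-in for the anisotropy of pre-trained embeddings.
    train = [[10.0, 0.1], [10.2, -0.1], [9.8, 0.0], [10.0, 0.05]]
    test = [[10.0, 0.02], [10.0, 1.0]]  # second point is off-distribution in dim 1
    mean, std = fit_diagonal_whitening(train)
    scores = knn_scores(transform(train, mean, std), transform(test, mean, std), k=1)
    print(scores)  # the second test point gets the larger score
```

The normalization statistics are fitted on in-distribution data only, so test points never influence the transform. A full PCA whitening would also decorrelate dimensions; the diagonal version is kept here only for brevity.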

Anthology ID: 2026.acl-long.1312
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 28442–28464
URL: https://aclanthology.org/2026.acl-long.1312/
Cite (ACL): Pierre Fihey, Matthieu Labeau, and Pavlo Mozharovskyi. 2026. Enhancing Two Steps Textual Anomaly Detection through Anisotropy Mitigation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28442–28464, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Enhancing Two Steps Textual Anomaly Detection through Anisotropy Mitigation (Fihey et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1312.pdf
Checklist: 2026.acl-long.1312.checklist.pdf
