Improving Unsupervised Out-of-domain detection through Pseudo Labeling and Learning

Byounghan Lee, Jaesik Kim, Junekyu Park, Kyung-Ah Sohn


Abstract
Unsupervised out-of-domain (OOD) detection is the task of discriminating whether given samples come from the in-domain distribution or not, without the categorical labels of in-domain instances. Unlike in the supervised setting, there are no labels for training a classifier, so previous works on unsupervised OOD detection adopted the one-class classification (OCC) approach, assuming that the training samples come from a single domain. However, in-domain instances in many real-world applications can have a heterogeneous distribution (i.e., span multiple domains or multiple classes). In this case, OCC methods have difficulty properly reflecting the categorical information of the domain. To tackle this issue, we propose a two-stage framework that leverages the latent categorical information to improve representation learning for textual OOD detection. In the first stage, we train a transformer-based sentence encoder for pseudo labeling with a contrastive loss and a cluster loss. The second stage is pseudo-label learning, in which the model is re-trained with the pseudo-labels obtained in the first stage. Empirical results on three datasets show that our two-stage framework significantly outperforms baseline models in more challenging scenarios.
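The abstract only outlines the pipeline, so the following Python snippet is a minimal illustrative sketch rather than the authors' implementation: stage 1 is approximated by clustering frozen pre-trained sentence embeddings with K-means (the paper instead fine-tunes the encoder with contrastive and cluster losses), stage 2 by fitting a lightweight classifier on the resulting pseudo-labels, and OOD scoring by the classifier's maximum softmax confidence. The model name, cluster count, toy sentences, and scoring rule are all assumptions for illustration.

# Minimal sketch of the two-stage pseudo-labeling pipeline (assumed details,
# not the authors' exact method).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def embed(texts, model_name="bert-base-uncased", batch_size=32):
    """Mean-pooled sentence embeddings from a frozen pre-trained transformer."""
    tok = AutoTokenizer.from_pretrained(model_name)
    enc = AutoModel.from_pretrained(model_name).eval()
    embs = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tok(texts[i:i + batch_size], padding=True,
                        truncation=True, return_tensors="pt")
            out = enc(**batch).last_hidden_state          # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1)  # (B, T, 1)
            pooled = (out * mask).sum(1) / mask.sum(1)    # mean over real tokens
            embs.append(pooled.cpu().numpy())
    return np.vstack(embs)

# ---- Stage 1: pseudo labeling --------------------------------------------
# Cluster in-domain training embeddings; cluster indices serve as
# pseudo-labels for the latent categories (encoder fine-tuning omitted here).
train_texts = ["book a flight to Boston", "play some jazz", "set an alarm"]
train_emb = embed(train_texts)
n_pseudo_classes = 2  # assumed; chosen or validated in practice
pseudo_labels = KMeans(n_clusters=n_pseudo_classes, n_init=10,
                       random_state=0).fit_predict(train_emb)

# ---- Stage 2: pseudo-label learning --------------------------------------
# Re-train on the pseudo-labels (here: a simple classifier on the embeddings).
clf = LogisticRegression(max_iter=1000).fit(train_emb, pseudo_labels)

# ---- OOD scoring (assumed rule) -------------------------------------------
# Low maximum softmax confidence suggests the sample is out-of-domain.
test_emb = embed(["what is the meaning of life"])
ood_score = 1.0 - clf.predict_proba(test_emb).max(axis=1)
print(ood_score)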
Anthology ID:
2023.findings-eacl.76
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1031–1041
URL:
https://aclanthology.org/2023.findings-eacl.76
DOI:
10.18653/v1/2023.findings-eacl.76
Cite (ACL):
Byounghan Lee, Jaesik Kim, Junekyu Park, and Kyung-Ah Sohn. 2023. Improving Unsupervised Out-of-domain detection through Pseudo Labeling and Learning. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1031–1041, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Improving Unsupervised Out-of-domain detection through Pseudo Labeling and Learning (Lee et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-eacl.76.pdf
Video:
https://aclanthology.org/2023.findings-eacl.76.mp4