CED: Comparing Embedding Differences for Detecting Out-of-Distribution and Hallucinated Text

Hakyung Lee, Keon-Hee Park, Hoyoon Byun, Jeyoon Yeom, Jihee Kim, Gyeong-Moon Park, Kyungwoo Song


Abstract
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety and robustness of models deployed in real-world scenarios. While most studies on OOD detection focus on fine-tuned models trained on in-distribution (ID) data, detecting OOD samples with pre-trained models is also important because of computational limitations and the widespread use of open-source pre-trained models. However, in the same-domain shift setting, the OOD detection performance of pre-trained models is insufficient: both ID and OOD samples originate from the same domain, so their embeddings overlap heavily. To address this issue, we introduce CED, a training-free OOD detection technique designed to enhance the distinction between ID and OOD datasets. We show theoretically that auxiliary and oracle samples satisfying certain conditions improve this distinction. Motivated by this analysis, CED uses such specially designed auxiliary and oracle samples to sharpen the separation. As a result, CED significantly improves the ability of pre-trained models to distinguish between ID and OOD samples in text classification and hallucination detection tasks. Furthermore, we verify that CED is a plug-and-play method compatible with various backbone networks, such as RoBERTa, Llama, and OpenAI Embedding.
Anthology ID:
2024.findings-emnlp.874
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14866–14882
URL:
https://aclanthology.org/2024.findings-emnlp.874
DOI:
10.18653/v1/2024.findings-emnlp.874
Cite (ACL):
Hakyung Lee, Keon-Hee Park, Hoyoon Byun, Jeyoon Yeom, Jihee Kim, Gyeong-Moon Park, and Kyungwoo Song. 2024. CED: Comparing Embedding Differences for Detecting Out-of-Distribution and Hallucinated Text. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 14866–14882, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
CED: Comparing Embedding Differences for Detecting Out-of-Distribution and Hallucinated Text (Lee et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.874.pdf