Mokhtar Billami
2024
IRIT-Berger-Levrault at SemEval-2024: How Sensitive Sentence Embeddings are to Hallucinations?
Nihed Bendahman, Karen Pinel-Sauvagnat, Gilles Hubert, Mokhtar Billami
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This article presents our participation in Task 6 of SemEval-2024, named SHROOM (a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes), which aims at detecting hallucinations. We propose two types of approaches for the task: the first relies on sentence embeddings and the cosine similarity metric, and the second uses large language models (LLMs). We found that LLMs do not improve on the performance achieved by the embedding models. The latter outperform the baseline provided by the organizers, and our best system achieves 78% accuracy.
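The embedding-based approach can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' system. The sketch assumes that each generated output and its reference text have already been encoded into vectors by some sentence embedding model, which is not specified here. The threshold of 0.5, the label strings, and the helper names `label_hallucination` and `accuracy` are hypothetical choices made for the example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # degenerate (all-zero) embedding: treat as dissimilar
    return dot / (norm_u * norm_v)


def label_hallucination(hyp_emb: Sequence[float],
                        ref_emb: Sequence[float],
                        threshold: float = 0.5) -> str:
    # Hypothetical decision rule: an output whose embedding is not similar
    # enough to the reference embedding is labelled a hallucination.
    sim = cosine_similarity(hyp_emb, ref_emb)
    return "Not Hallucination" if sim >= threshold else "Hallucination"


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    ref = [1.0, 0.0, 0.0]                            # toy reference embedding
    print(label_hallucination([0.9, 0.1, 0.0], ref))  # close to ref
    print(label_hallucination([0.0, 1.0, 0.0], ref))  # orthogonal to ref
```

In a real setting, the threshold would need to be tuned on labelled data, for instance by choosing the value that maximises `accuracy` on a validation split. The toy vectors above only show the mechanics of the comparison.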