BibTeX
@inproceedings{abdi-etal-2025-hallurag,
title = "{H}allu{RAG}-{RUG} at {S}em{E}val-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs",
author = "Abdi, Silvana and
Hassani, Mahrokh and
Kinds, Rosalien and
Strijbis, Timo and
Terpstra, Roman",
editor = "Rosenthal, Sara and
Ros{\'a}, Aiala and
Ghosh, Debanjan and
Zampieri, Marcos",
booktitle = "Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.semeval-1.116/",
pages = "846--851",
ISBN = "979-8-89176-273-2",
abstract = "Large Language Models (LLMs) suffer from a critical limitation: hallucinations, which refer to models generating fluent but factually incorrect text. This paper presents our approach to hallucination detection in English model outputs as part of the SemEval-2025 Task 3 (Mu-SHROOM). Our method, HalluRAG-RUG, integrates Retrieval-Augmented Generation (RAG) using Llama-3 and prediction models using token probabilities and semantic similarity. We retrieved relevant factual information using a named entity recognition (NER)-based Wikipedia search and applied abstractive summarization to refine the knowledge base. The hallucination detection pipeline then used this retrieved knowledge to identify inconsistent spans in model-generated text. This result was combined with the results of two systems which identified hallucinations based on token probabilities and low-similarity sentences. Our system placed 33rd out of 41, performing slightly below the `mark all' baseline but surpassing the `mark none' and `neural' baselines with an IoU of 0.3093 and a correlation of 0.0833."
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="abdi-etal-2025-hallurag">
<titleInfo>
<title>HalluRAG-RUG at SemEval-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs</title>
</titleInfo>
<name type="personal">
<namePart type="given">Silvana</namePart>
<namePart type="family">Abdi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mahrokh</namePart>
<namePart type="family">Hassani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rosalien</namePart>
<namePart type="family">Kinds</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Timo</namePart>
<namePart type="family">Strijbis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Roman</namePart>
<namePart type="family">Terpstra</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sara</namePart>
<namePart type="family">Rosenthal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aiala</namePart>
<namePart type="family">Rosá</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Debanjan</namePart>
<namePart type="family">Ghosh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marcos</namePart>
<namePart type="family">Zampieri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-273-2</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) suffer from a critical limitation: hallucinations, which refer to models generating fluent but factually incorrect text. This paper presents our approach to hallucination detection in English model outputs as part of the SemEval-2025 Task 3 (Mu-SHROOM). Our method, HalluRAG-RUG, integrates Retrieval-Augmented Generation (RAG) using Llama-3 and prediction models using token probabilities and semantic similarity. We retrieved relevant factual information using a named entity recognition (NER)-based Wikipedia search and applied abstractive summarization to refine the knowledge base. The hallucination detection pipeline then used this retrieved knowledge to identify inconsistent spans in model-generated text. This result was combined with the results of two systems which identified hallucinations based on token probabilities and low-similarity sentences. Our system placed 33rd out of 41, performing slightly below the ‘mark all’ baseline but surpassing the ‘mark none’ and ‘neural’ baselines with an IoU of 0.3093 and a correlation of 0.0833.</abstract>
<identifier type="citekey">abdi-etal-2025-hallurag</identifier>
<location>
<url>https://aclanthology.org/2025.semeval-1.116/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>846</start>
<end>851</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T HalluRAG-RUG at SemEval-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs
%A Abdi, Silvana
%A Hassani, Mahrokh
%A Kinds, Rosalien
%A Strijbis, Timo
%A Terpstra, Roman
%Y Rosenthal, Sara
%Y Rosá, Aiala
%Y Ghosh, Debanjan
%Y Zampieri, Marcos
%S Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-273-2
%F abdi-etal-2025-hallurag
%X Large Language Models (LLMs) suffer from a critical limitation: hallucinations, which refer to models generating fluent but factually incorrect text. This paper presents our approach to hallucination detection in English model outputs as part of the SemEval-2025 Task 3 (Mu-SHROOM). Our method, HalluRAG-RUG, integrates Retrieval-Augmented Generation (RAG) using Llama-3 and prediction models using token probabilities and semantic similarity. We retrieved relevant factual information using a named entity recognition (NER)-based Wikipedia search and applied abstractive summarization to refine the knowledge base. The hallucination detection pipeline then used this retrieved knowledge to identify inconsistent spans in model-generated text. This result was combined with the results of two systems which identified hallucinations based on token probabilities and low-similarity sentences. Our system placed 33rd out of 41, performing slightly below the ‘mark all’ baseline but surpassing the ‘mark none’ and ‘neural’ baselines with an IoU of 0.3093 and a correlation of 0.0833.
%U https://aclanthology.org/2025.semeval-1.116/
%P 846-851
Markdown (Informal)
[HalluRAG-RUG at SemEval-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs](https://aclanthology.org/2025.semeval-1.116/) (Abdi et al., SemEval 2025)
ACL
Silvana Abdi, Mahrokh Hassani, Rosalien Kinds, Timo Strijbis, and Roman Terpstra. 2025. [HalluRAG-RUG at SemEval-2025 Task 3: Using Retrieval-Augmented Generation for Hallucination Detection in Model Outputs](https://aclanthology.org/2025.semeval-1.116/). In *Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)*, pages 846–851, Vienna, Austria. Association for Computational Linguistics.