UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection

Michelle Wastl; Jannis Vamvas; Rico Sennrich

Abstract

This paper presents our system developed for SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. The objective of this task is to identify spans of hallucinated text in the output of large language models across 14 high- and low-resource languages. To address this challenge, we propose two consistency-based approaches: (a) token-level consistency with a superior LLM and (b) token-level self-consistency with the underlying model of the sequence that is to be evaluated. Our results show that the approaches are effective compared to simple mark-all baselines, competitive with other submissions to the shared task, and, for some languages, competitive with GPT4o-mini prompt-based approaches.
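As a rough illustration of what a token-level consistency check can look like, the sketch below scores each token of an evaluated output by the share of alternative responses that also contain it, then groups low-support tokens into spans. This is not the authors' implementation: the whitespace tokenization, the set-membership support score, the 0.5 threshold, and the function names `token_support` and `hallucinated_spans` are all assumptions made for this example. In practice the alternative responses would be sampled either from a stronger LLM (approach a) or from the same model that produced the output (approach b).

```python
def token_support(tokens, samples):
    """Fraction of alternative responses that contain each evaluated token.

    Assumption: support is plain set membership over pre-tokenized samples.
    """
    sample_sets = [set(sample) for sample in samples]
    n = len(sample_sets)
    if n == 0:
        # No alternatives to compare against: no token has any support.
        return [0.0 for _ in tokens]
    return [sum(tok in s for s in sample_sets) / n for tok in tokens]


def hallucinated_spans(tokens, samples, threshold=0.5):
    """Return (start, end) token-index spans, end exclusive, whose support
    falls below the (hypothetical) threshold."""
    support = token_support(tokens, samples)
    spans = []
    start = None
    for i, score in enumerate(support):
        if score < threshold:
            if start is None:
                start = i  # open a new low-support span
        elif start is not None:
            spans.append((start, i))  # close the span at the first supported token
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


# Toy example with whitespace tokenization (an assumption for illustration).
output = "the capital of france is lyon".split()
alternatives = [
    "the capital of france is paris".split(),
    "paris is the capital of france".split(),
    "the capital is paris".split(),
]
print(hallucinated_spans(output, alternatives))
```

In this toy case only the final token lacks support among the alternatives, so it is the only one flagged. A real system would need to map token indices back to character offsets and handle subword tokenization, neither of which is shown here.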

Anthology ID: 2025.semeval-1.38
Volume: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 257–270
URL: https://aclanthology.org/2025.semeval-1.38/
Cite (ACL): Michelle Wastl, Jannis Vamvas, and Rico Sennrich. 2025. UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 257–270, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection (Wastl et al., SemEval 2025)
PDF: https://aclanthology.org/2025.semeval-1.38.pdf
