Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?

Evangelia Gogoulou; Shorouq Zahra; Liane Guillou; Luise Dürlich; Joakim Nivre

Abstract

A frequently observed problem with LLMs is their tendency to generate output that is nonsensical, illogical, or factually incorrect, often referred to broadly as “hallucination”. Building on the recently proposed HalluciGen task for hallucination detection and generation, we evaluate a suite of open-access LLMs on their ability to detect intrinsic hallucinations in two conditional generation tasks: translation and paraphrasing. We study how model performance varies across tasks and languages and we investigate the impact of model size, instruction-tuning, and prompt choice. We find that performance varies across models but is consistent across prompts. Finally, we find that NLI models perform comparably well, suggesting that LLM-based detectors are not the only viable option for this specific task.

Anthology ID: 2025.gem-1.13
Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month: July
Year: 2025
Address: Vienna, Austria and virtual meeting
Editors: Ofir Arviv, Miruna Clinciu, Kaustubh Dhole, Rotem Dror, Sebastian Gehrmann, Eliya Habba, Itay Itzhak, Simon Mille, Yotam Perlitz, Enrico Santus, João Sedoc, Michal Shmueli Scheuer, Gabriel Stanovsky, Oyvind Tafjord
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 161–177
URL: https://aclanthology.org/2025.gem-1.13/
Cite (ACL): Evangelia Gogoulou, Shorouq Zahra, Liane Guillou, Luise Dürlich, and Joakim Nivre. 2025. Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation? In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 161–177, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal): Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation? (Gogoulou et al., GEM 2025)
PDF: https://aclanthology.org/2025.gem-1.13.pdf
