BlueToad at SemEval-2025 Task 3: Using Question-Answering-Based Language Models to Extract Hallucinations from Machine-Generated Text

Michiel Pronk; Ekaterina Kamyshanova; Thijmen Adam; Maxim Van Der Maesen De Sombreff

Abstract

Hallucination in machine-generated text poses significant risks in domains such as finance, medicine, and engineering. Task 3 of SemEval-2025, Mu-SHROOM, challenges participants to detect hallucinated spans in such text. Our approach uses pre-trained language models and fine-tuning strategies to improve hallucination span detection, focusing on the English track. First, we used GPT-4o mini to generate synthetic data by labeling unlabeled data. Then, we employed encoder-only pre-trained language models with a question-answering (QA) architecture for hallucination span detection, ultimately choosing XLM-RoBERTa for fine-tuning on multilingual data. This was our best-performing model, ranking 18th and 22nd on the English track with an intersection-over-union score of 0.469 and a correlation score of 0.441, respectively. It achieved promising results across multiple languages, surpassing baseline methods in 11 out of 13 languages, with Hindi obtaining the highest scores: 0.645 intersection-over-union and 0.684 correlation. Our findings highlight the potential of a QA approach and of synthetic and multilingual data for hallucination span detection.
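The intersection-over-union score mentioned in the abstract can be illustrated with a minimal sketch. It assumes that spans are given as character offsets with an exclusive end, and that two empty span sets count as a perfect match; both conventions, along with the function names `spans_to_chars` and `span_iou`, are assumptions made for illustration and are not taken from the paper or the official task scorer.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def spans_to_chars(spans: Iterable[Span]) -> Set[int]:
    """Expand a collection of spans into the set of character indices they cover."""
    chars: Set[int] = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def span_iou(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Character-level intersection-over-union between predicted and gold spans."""
    pred_chars = spans_to_chars(predicted)
    gold_chars = spans_to_chars(gold)
    if not pred_chars and not gold_chars:
        # Assumption: nothing predicted and nothing annotated is a perfect match.
        return 1.0
    return len(pred_chars & gold_chars) / len(pred_chars | gold_chars)
```

For example, a predicted span `(0, 4)` and a gold span `(2, 6)` share two characters out of six covered in total, so the score under these assumptions is one third.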

Anthology ID: 2025.semeval-1.95
Volume: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 684–694
URL: https://aclanthology.org/2025.semeval-1.95/
Cite (ACL): Michiel Pronk, Ekaterina Kamyshanova, Thijmen Adam, and Maxim Van Der Maesen De Sombreff. 2025. BlueToad at SemEval-2025 Task 3: Using Question-Answering-Based Language Models to Extract Hallucinations from Machine-Generated Text. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 684–694, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): BlueToad at SemEval-2025 Task 3: Using Question-Answering-Based Language Models to Extract Hallucinations from Machine-Generated Text (Pronk et al., SemEval 2025)
PDF: https://aclanthology.org/2025.semeval-1.95.pdf
