DeepPavlov at SemEval-2024 Task 6: Detection of Hallucinations and Overgeneration Mistakes with an Ensemble of Transformer-based Models

Ivan Maksimov, Vasily Konovalov, Andrei Glinskii


Abstract
Large language models (LLMs) tend to produce mistaken assertions, known as hallucinations, which can be problematic. These hallucinations are potentially harmful because sporadic factual inaccuracies in generated text can be concealed by the overall coherence of the content, making them very difficult for users to identify. The goal of the SHROOM shared task is to detect grammatically sound outputs that contain incorrect or unsupported semantic information. Although many hallucination detectors for AI-generated content already exist, we found that pretrained Natural Language Inference (NLI) models still succeed at detecting hallucinations. Moreover, an ensemble of these models outperforms more complicated models.
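The abstract does not say how the NLI models are combined, so the following is only a minimal sketch of one common ensembling choice: averaging each model's entailment probability and thresholding the result. The function names, the averaging rule, and the 0.5 threshold are all assumptions made for illustration, not the authors' method. The inputs are assumed to be precomputed probabilities that the source text entails the generated output, one per NLI model.

```python
from statistics import mean
from typing import Sequence


def ensemble_hallucination_score(entailment_probs: Sequence[float]) -> float:
    """Return 1 minus the mean entailment probability across NLI models.

    Hypothetical helper: each value is one model's probability that the
    source entails the generated output. Averaging is an assumed rule.
    """
    if not entailment_probs:
        raise ValueError("need at least one model score")
    return 1.0 - mean(entailment_probs)


def is_hallucination(entailment_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the output when the ensemble score exceeds an assumed threshold."""
    return ensemble_hallucination_score(entailment_probs) > threshold


# Illustrative values only, not results from the paper.
low_entailment = [0.1, 0.2, 0.3]     # models mostly see no entailment
high_entailment = [0.9, 0.8, 0.95]   # models mostly see entailment
flag_low = is_hallucination(low_entailment)
flag_high = is_hallucination(high_entailment)
```

In practice the threshold would be tuned on a validation set, and a weighted average or a learned combiner could replace the plain mean.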
Anthology ID:
2024.semeval-1.42
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
274–278
URL:
https://aclanthology.org/2024.semeval-1.42
Cite (ACL):
Ivan Maksimov, Vasily Konovalov, and Andrei Glinskii. 2024. DeepPavlov at SemEval-2024 Task 6: Detection of Hallucinations and Overgeneration Mistakes with an Ensemble of Transformer-based Models. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 274–278, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
DeepPavlov at SemEval-2024 Task 6: Detection of Hallucinations and Overgeneration Mistakes with an Ensemble of Transformer-based Models (Maksimov et al., SemEval 2024)
PDF:
https://aclanthology.org/2024.semeval-1.42.pdf
Supplementary material:
 2024.semeval-1.42.SupplementaryMaterial.zip
Supplementary material:
 2024.semeval-1.42.SupplementaryMaterial.txt