HAT: Hallucination Annotation for Translation

Rajen Chatterjee; Xintong Li; Paisarn Charoenpornsawat; Allen Lee

Abstract

Hallucinations in machine translation (MT)—outputs that may be fluent yet unfaithful to the source content—remain a critical obstacle. They hinder the reliable deployment of MT systems in real-world applications. Despite growing attention to this phenomenon, progress has been constrained by the lack of large-scale, high-quality benchmarks dedicated to hallucination detection. We introduce HAT (Hallucination Annotation for Translation), a novel dataset designed to advance research on this problem. HAT comprises 350,959 span-level annotated samples across 38 language pairs, with approximately 8,000–10,000 samples per pair partitioned into training, development, and test sets. Annotations were produced by professional translators under rigorous quality control protocols to ensure reliability. We provide a detailed analysis of hallucination distributions and establish benchmark performance using a diverse set of baselines, including automatic MT evaluation metrics as well as large language models. By providing the first large-scale, systematically annotated resource for hallucination detection in MT, HAT enables the development of more faithful translation models and lays the groundwork for future research on building trustworthy machine translation systems.
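The abstract does not describe HAT's release format or its official scoring protocol. To illustrate what a span-level hallucination annotation can look like in practice, the sketch below makes two hypothetical assumptions. First, each sample stores half-open character offsets into the translation. Second, a detector is scored with character-level F1, which is one common choice for span-level evaluation. `AnnotatedSample`, `span_chars`, and `char_f1` are illustrative names and are not part of any HAT release.

```python
from dataclasses import dataclass, field


# Hypothetical record layout for one span-level annotated sample.
# The field names and the character-offset convention are assumptions,
# not HAT's actual data format.
@dataclass
class AnnotatedSample:
    source: str
    translation: str
    # Half-open [start, end) character offsets into `translation`
    # marking spans an annotator judged to be hallucinated.
    spans: list[tuple[int, int]] = field(default_factory=list)


def span_chars(spans: list[tuple[int, int]]) -> set[int]:
    """Expand half-open character spans into the set of covered offsets."""
    covered: set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def char_f1(gold: list[tuple[int, int]], pred: list[tuple[int, int]]) -> float:
    """Character-level F1 between gold and predicted hallucination spans.

    Returns 1.0 when both are empty (nothing to find, nothing flagged).
    """
    g, p = span_chars(gold), span_chars(pred)
    if not g and not p:
        return 1.0
    overlap = len(g & p)
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


# Toy example: "in Paris" has no counterpart in the German source.
sample = AnnotatedSample(
    source="Der Vertrag wurde gestern unterzeichnet.",
    translation="The contract was signed yesterday in Paris.",
    spans=[(34, 42)],  # translation[34:42] == "in Paris"
)

# A detector that flags only "Paris" gets full precision but partial recall.
print(round(char_f1(sample.spans, [(37, 42)]), 3))  # prints 0.769
```

Character-level scoring gives partial credit when a predicted span only partly overlaps the gold span. A stricter exact-span-match metric would score the toy prediction above as a miss.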

Anthology ID: 2026.acl-long.721
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 15865–15888
URL: https://aclanthology.org/2026.acl-long.721/
Cite (ACL): Rajen Chatterjee, Xintong Li, Paisarn Charoenpornsawat, and Allen Lee. 2026. HAT: Hallucination Annotation for Translation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15865–15888, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): HAT: Hallucination Annotation for Translation (Chatterjee et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.721.pdf
Checklist: 2026.acl-long.721.checklist.pdf