HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims

Michiel Van Der Meer; Pavel Korshunov; Sébastien Marcel; Lonneke van der Plas

doi:10.18653/v1/2025.acl-long.1510


Abstract

Misinformation can be countered with fact-checking, but the process is costly and slow. Identifying checkworthy claims is the first step, where automation can help scale fact-checkers’ efforts. However, detection methods struggle with content that is (1) multimodal, (2) from diverse domains, and (3) synthetic. We introduce HintsOfTruth, a public dataset for multimodal checkworthiness detection with 27K real-world and synthetic image/claim pairs. The mix of real and synthetic data makes this dataset unique and ideal for benchmarking detection methods. We compare fine-tuned and prompted Large Language Models (LLMs). We find that well-configured lightweight text-based encoders perform comparably to multimodal models, but the former focus only on identifying non-claim-like content. Multimodal LLMs can be more accurate but come at a significant computational cost, making them impractical for large-scale applications. When faced with synthetic data, multimodal models perform more robustly.

Anthology ID: 2025.acl-long.1510
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 31274–31291
URL: https://aclanthology.org/2025.acl-long.1510/
DOI: 10.18653/v1/2025.acl-long.1510
Cite (ACL): Michiel Van Der Meer, Pavel Korshunov, Sébastien Marcel, and Lonneke Van Der Plas. 2025. HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31274–31291, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims (Van Der Meer et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1510.pdf
