2025
VeritasQA: A Truthfulness Benchmark Aimed at Multilingual Transferability
Javier Aula-Blasco | Júlia Falcão | Susana Sotelo | Silvia Paniagua | Aitor Gonzalez-Agirre | Marta Villegas
Proceedings of the 31st International Conference on Computational Linguistics
As Large Language Models (LLMs) become available in a wider range of domains and applications, evaluating the truthfulness of multilingual LLMs is an issue of increasing relevance. TruthfulQA (Lin et al., 2022) is one of the few benchmarks designed to evaluate whether models imitate widespread falsehoods. However, it is strongly English-centric and is starting to become outdated. We present VeritasQA, a context- and time-independent truthfulness benchmark built with multilingual transferability in mind, available in Spanish, Catalan, Galician and English. VeritasQA comprises a set of 353 questions and answers inspired by common misconceptions and falsehoods that are not tied to any particular country or recent event. We release VeritasQA under an open license and present the evaluation results of 15 models of various architectures and sizes.
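As an illustration of how a TruthfulQA-style benchmark of this kind is typically scored, the sketch below implements an MC1-style metric: a question counts as answered truthfully if the model's single highest-scoring candidate answer is a correct one. This is a minimal sketch of the general paradigm, not VeritasQA's official evaluation pipeline; the data layout and function names are illustrative assumptions.

```python
# Hedged sketch of MC1-style truthfulness scoring, as popularized by
# TruthfulQA's multiple-choice track. The dict layout and the function
# name are illustrative; VeritasQA's official harness may differ.

def mc1_score(items):
    """items: list of dicts, one per question, holding the model's
    log-probabilities for 'correct' and 'incorrect' candidate answers.
    Returns the fraction of questions where the top-scoring candidate
    is a correct answer."""
    hits = 0
    for item in items:
        best_correct = max(item["correct"])
        best_incorrect = (
            max(item["incorrect"]) if item["incorrect"] else float("-inf")
        )
        if best_correct > best_incorrect:
            hits += 1
    return hits / len(items)

# Toy example with made-up log-probabilities for two questions:
# in the first, a correct answer scores highest; in the second,
# a widespread falsehood outranks the truth.
example = [
    {"correct": [-1.2, -3.0], "incorrect": [-2.5, -4.1]},
    {"correct": [-5.0], "incorrect": [-0.8, -1.9]},
]
print(mc1_score(example))  # → 0.5
```

Comparing only the single best candidate (MC1) is the strictest variant; TruthfulQA also reports softer aggregates (e.g. normalized probability mass on correct answers), which a benchmark of 353 items like VeritasQA could report in the same way.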