Lucas Holanda


2026

We present Causal_QA.PT, a human–LLM co-curated benchmark for causal question answering in Portuguese, addressing the lack of high-quality evaluation resources for causal reasoning in non-English languages. The dataset is developed through a hybrid human–LLM process with targeted generation, validation, and evaluation procedures, and is organized according to the PEARL causal typology. Using this resource, we evaluate the ability of Large Language Models (LLMs) to answer causal questions in Portuguese and examine the role of explicitly providing causal class information in prompt design. Our findings show that current LLMs can produce high-quality causal responses in Portuguese, with GPT-5 Mini in particular demonstrating strong performance in judgment-based evaluation. Explicit causal class information yields benefits that depend on the model and question type, particularly for interventional and counterfactual questions. Finally, we observe that human reference answers are not always superior to model-generated ones, underscoring the importance of careful benchmark curation and robust evaluation for underrepresented languages.