@inproceedings{contalbo-etal-2025-gri,
title = "{GRI}-{QA}: a Comprehensive Benchmark for Table Question Answering over Environmental Data",
author = "Contalbo, Michele Luca and
Pederzoli, Sara and
Buono, Francesco Del and
Venturelli, Valeria and
Guerra, Francesco and
Paganelli, Matteo",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.814/",
doi = "10.18653/v1/2025.findings-acl.814",
pages = "15764--15779",
ISBN = "979-8-89176-256-5",
abstract = "Assessing corporate environmental sustainability with Table Question Answering systems is challenging due to complex tables, specialized terminology, and the variety of questions they must handle. In this paper, we introduce GRI-QA, a test benchmark designed to evaluate Table QA approaches in the environmental domain. Using GRI standards, we extract and annotate tables from non-financial corporate reports, generating question-answer pairs through a hybrid LLM-human approach. The benchmark includes eight datasets, categorized by the types of operations required, including operations on multiple tables from multiple documents. Our evaluation reveals a significant gap between human and model performance, particularly in multi-step reasoning, highlighting the relevance of the benchmark and the need for further research in domain-specific Table QA. Code and benchmark datasets are available at https://github.com/softlab-unimore/gri{\_}qa."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="contalbo-etal-2025-gri">
<titleInfo>
<title>GRI-QA: a Comprehensive Benchmark for Table Question Answering over Environmental Data</title>
</titleInfo>
<name type="personal">
<namePart type="given">Michele</namePart>
<namePart type="given">Luca</namePart>
<namePart type="family">Contalbo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sara</namePart>
<namePart type="family">Pederzoli</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Francesco</namePart>
<namePart type="given">Del</namePart>
<namePart type="family">Buono</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Venturelli</namePart>
<namePart type="family">Valeria</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Francesco</namePart>
<namePart type="family">Guerra</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Matteo</namePart>
<namePart type="family">Paganelli</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>Assessing corporate environmental sustainability with Table Question Answering systems is challenging due to complex tables, specialized terminology, and the variety of questions they must handle. In this paper, we introduce GRI-QA, a test benchmark designed to evaluate Table QA approaches in the environmental domain. Using GRI standards, we extract and annotate tables from non-financial corporate reports, generating question-answer pairs through a hybrid LLM-human approach. The benchmark includes eight datasets, categorized by the types of operations required, including operations on multiple tables from multiple documents. Our evaluation reveals a significant gap between human and model performance, particularly in multi-step reasoning, highlighting the relevance of the benchmark and the need for further research in domain-specific Table QA. Code and benchmark datasets are available at https://github.com/softlab-unimore/gri_qa.</abstract>
<identifier type="citekey">contalbo-etal-2025-gri</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.814</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.814/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>15764</start>
<end>15779</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T GRI-QA: a Comprehensive Benchmark for Table Question Answering over Environmental Data
%A Contalbo, Michele Luca
%A Pederzoli, Sara
%A Buono, Francesco Del
%A Venturelli, Valeria
%A Guerra, Francesco
%A Paganelli, Matteo
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F contalbo-etal-2025-gri
%X Assessing corporate environmental sustainability with Table Question Answering systems is challenging due to complex tables, specialized terminology, and the variety of questions they must handle. In this paper, we introduce GRI-QA, a test benchmark designed to evaluate Table QA approaches in the environmental domain. Using GRI standards, we extract and annotate tables from non-financial corporate reports, generating question-answer pairs through a hybrid LLM-human approach. The benchmark includes eight datasets, categorized by the types of operations required, including operations on multiple tables from multiple documents. Our evaluation reveals a significant gap between human and model performance, particularly in multi-step reasoning, highlighting the relevance of the benchmark and the need for further research in domain-specific Table QA. Code and benchmark datasets are available at https://github.com/softlab-unimore/gri_qa.
%R 10.18653/v1/2025.findings-acl.814
%U https://aclanthology.org/2025.findings-acl.814/
%U https://doi.org/10.18653/v1/2025.findings-acl.814
%P 15764-15779
Markdown (Informal)
[GRI-QA: a Comprehensive Benchmark for Table Question Answering over Environmental Data](https://aclanthology.org/2025.findings-acl.814/) (Contalbo et al., Findings 2025)
ACL
Michele Luca Contalbo, Sara Pederzoli, Francesco Del Buono, Valeria Venturelli, Francesco Guerra, and Matteo Paganelli. 2025. GRI-QA: a Comprehensive Benchmark for Table Question Answering over Environmental Data. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15764–15779, Vienna, Austria. Association for Computational Linguistics.