T2-RAGBench: Text-and-Table Aware Retrieval-Augmented Generation

Jan Strich; Enes Kutay Isgorur; Maximilian Trescher; Chris Biemann; Martin Semmann

Abstract

Since many real-world documents combine textual and tabular data, robust Retrieval-Augmented Generation (RAG) systems are essential for effectively accessing and analyzing such content to support complex reasoning tasks. Therefore, this paper introduces T²-RAGBench, a benchmark comprising 23,088 question-context-answer triples, designed to evaluate RAG methods on real-world text-and-table data. Unlike typical QA datasets that operate under Oracle Context settings, T²-RAGBench challenges models to first retrieve the correct context before conducting numerical reasoning. Existing text-and-table QA datasets typically contain context-dependent questions, which may have multiple correct answers depending on the provided context. To address this, we transform SOTA datasets into a context-independent format; expert validation confirms that 91.3% of the resulting questions are context-independent, enabling reliable RAG evaluation. Our comprehensive evaluation identifies Hybrid BM25, a technique that combines dense and sparse vectors, as the most effective approach for text-and-table data. However, results demonstrate that T²-RAGBench remains challenging even for SOTA LLMs and RAG methods. Further ablation studies examine the impact of embedding models and corpus size on retrieval performance. T²-RAGBench provides a realistic and rigorous benchmark for existing RAG methods on text-and-table data. Code and dataset are available online: https://github.com/uhh-hcds/g4kmu-paper
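The abstract names Hybrid BM25, which combines sparse (BM25) and dense vector scores, as the strongest retriever on text-and-table data. The following is a minimal Python sketch of one common way to implement such hybrid scoring; it is not the authors' implementation. The min-max normalization, the weighted-sum fusion with alpha = 0.5, the BM25 parameters k1 = 1.5 and b = 0.75, the function names, and the toy three-dimensional "embeddings" standing in for a real embedding model are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of the query against each tokenized document."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Lucene-style IDF: the +1 inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def min_max(xs):
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return document indices sorted by alpha * dense + (1 - alpha) * sparse."""
    sparse = min_max(bm25_scores(query_tokens, docs_tokens))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Toy corpus: hypothetical text passages and a flattened table row.
docs = [
    "revenue increased in 2019 compared to 2018".split(),
    "table net income 2019 2018 total assets".split(),
    "the company operates in three segments".split(),
]
# Hypothetical embeddings; a real system would use an embedding model.
doc_vecs = [[0.9, 0.1, 0.0], [0.7, 0.6, 0.1], [0.0, 0.2, 0.9]]
query = "net income 2019".split()
query_vec = [0.6, 0.7, 0.1]

ranking = hybrid_rank(query, query_vec, docs, doc_vecs)
print(ranking)
```

Here the flattened table row (index 1) ranks first because it wins on both the lexical match and the dense similarity. Normalizing before fusing matters because raw BM25 scores are unbounded while cosine similarities lie in [-1, 1]; reciprocal rank fusion is a common alternative that avoids score scaling altogether.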

Anthology ID: 2026.eacl-long.8
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 165–191
URL: https://aclanthology.org/2026.eacl-long.8/
Cite (ACL): Jan Strich, Enes Kutay Isgorur, Maximilian Trescher, Chris Biemann, and Martin Semmann. 2026. T2-RAGBench: Text-and-Table Aware Retrieval-Augmented Generation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 165–191, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): T2-RAGBench: Text-and-Table Aware Retrieval-Augmented Generation (Strich et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-long.8.pdf
Checklist: 2026.eacl-long.8.checklist.pdf
