TabaQA at SemEval-2025 Task 8: Column Augmented Generation for Question Answering over Tabular Data

Ekaterina Antropova; Egor Kratkov; Roman Derunets; Margarita Trofimova; Ivan Bondarenko; Alexander Panchenko; Vasily Konovalov; Maksim Savkin

Abstract

The DataBench shared task at SemEval-2025 addresses question answering (QA) over tabular data. Because table structures vary widely, there are different approaches to retrieving the answer. Although Retrieval-Augmented Generation (RAG) is a viable solution, extracting relevant information from tables remains challenging. In addition, a table can be prohibitively large for direct integration into the context of a large language model (LLM). In this paper, we address QA over tabular data in three steps: first, we identify the relevant columns that might contain the answer; next, the LLM generates an answer given the context of those columns; finally, the LLM refines its answer. This approach secured us 7th place in the DataBench lite category.
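
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the prompt wording, the `max_rows` truncation, and the `toy_llm` stand-in are all assumptions made for illustration, and a real system would call an actual model in place of the stub.

```python
from typing import Callable, Dict, List

Table = Dict[str, List[object]]  # column name -> column values
LLM = Callable[[str], str]       # hypothetical interface: prompt in, text out


def select_columns(question: str, table: Table, llm: LLM) -> List[str]:
    """Step 1: ask the model which columns might contain the answer."""
    prompt = (
        "Columns: " + ", ".join(table) + "\n"
        "Question: " + question + "\n"
        "List the relevant columns, comma-separated."
    )
    picked = [name.strip() for name in llm(prompt).split(",")]
    # Keep only names that exist in the table; fall back to all columns.
    valid = [name for name in picked if name in table]
    return valid or list(table)


def generate_answer(question: str, table: Table, columns: List[str],
                    llm: LLM, max_rows: int = 20) -> str:
    """Step 2: answer using only the selected columns as context.

    max_rows is an assumed cap that keeps the prompt small for large tables.
    """
    context = "\n".join(f"{c}: {table[c][:max_rows]}" for c in columns)
    return llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:").strip()


def refine_answer(question: str, draft: str, llm: LLM) -> str:
    """Step 3: ask the model to check and correct its draft answer."""
    prompt = (
        f"Question: {question}\nDraft answer: {draft}\n"
        "Return a corrected final answer."
    )
    return llm(prompt).strip()


def answer_question(question: str, table: Table, llm: LLM) -> str:
    """Run column selection, answer generation, and refinement in order."""
    columns = select_columns(question, table, llm)
    draft = generate_answer(question, table, columns, llm)
    return refine_answer(question, draft, llm)


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Columns:"):
        return "age, not_a_column"  # one valid name, one hallucinated name
    if prompt.startswith("Context:"):
        return " 41 "
    return "41"


table = {"name": ["Ann", "Bob"], "age": [41, 29], "city": ["Oslo", "Rome"]}
final = answer_question("What is the maximum age?", table, toy_llm)
```

Filtering the model's reply against the real column names guards against hallucinated columns, and passing only the selected columns keeps the prompt within the context limit that the abstract identifies as a problem for large tables.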

Anthology ID: 2025.semeval-1.126
Volume: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 937–952
URL: https://aclanthology.org/2025.semeval-1.126/
Cite (ACL): Ekaterina Antropova, Egor Kratkov, Roman Derunets, Margarita Trofimova, Ivan Bondarenko, Alexander Panchenko, Vasily Konovalov, and Maksim Savkin. 2025. TabaQA at SemEval-2025 Task 8: Column Augmented Generation for Question Answering over Tabular Data. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 937–952, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): TabaQA at SemEval-2025 Task 8: Column Augmented Generation for Question Answering over Tabular Data (Antropova et al., SemEval 2025)
PDF: https://aclanthology.org/2025.semeval-1.126.pdf
