Efficient Table Retrieval and Understanding with Multimodal Large Language Models

Zhuoyan Xu; Haoyang Fang; Boran Han; Bonan Min; Bernie Wang; Cuixiong Hu; Shuai Zhang

Abstract

Tabular data is frequently captured in image form across a wide range of real-world scenarios such as financial reports, handwritten records, and document scans. These visual representations pose unique challenges for machine understanding, as they combine both structural and visual complexities. While recent advances in Multimodal Large Language Models (MLLMs) show promising results in table understanding, they typically assume the relevant table is readily available. However, a more practical scenario involves identifying and reasoning over relevant tables from large-scale collections to answer user queries. To address this gap, we propose a framework that enables MLLMs to answer queries over large collections of table images. Our approach first retrieves candidate tables using jointly trained visual-text foundation models, then leverages MLLMs to perform fine-grained reranking of these candidates, and finally employs MLLMs to reason over the selected tables for answer generation. Through extensive experiments on a newly constructed dataset comprising 88,161 training and 9,819 testing samples across 8 benchmarks with 48,504 unique tables, we demonstrate that our framework significantly outperforms existing methods by 7.0% in retrieval recall and 6.1% in answer accuracy, offering a practical solution for real-world table understanding tasks.
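
The three stages named in the abstract (retrieve candidates, rerank them, reason over the selection) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the bag-of-words `embed` stand-in for a visual-text encoder, the text strings standing in for table images, and the `score_fn` and `answer_fn` callables standing in for MLLM calls are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for a visual-text encoder: bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, tables: Dict[str, str], k: int) -> List[str]:
    """Stage 1: coarse retrieval of the top-k table ids by embedding similarity."""
    q = embed(query)
    ranked = sorted(tables, key=lambda tid: cosine(q, embed(tables[tid])), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float], n: int) -> List[str]:
    """Stage 2: fine-grained reranking; score_fn is a placeholder for an MLLM scorer."""
    return sorted(candidates, key=lambda tid: score_fn(query, tid), reverse=True)[:n]


def answer_query(query: str, tables: Dict[str, str],
                 score_fn: Callable[[str, str], float],
                 answer_fn: Callable[[str, List[str]], object],
                 k: int = 3, n: int = 1) -> object:
    """Stage 3: pass the selected tables to answer_fn (a placeholder for MLLM reasoning)."""
    candidates = retrieve(query, tables, k)
    selected = rerank(query, candidates, score_fn, n)
    return answer_fn(query, selected)


if __name__ == "__main__":
    # Hypothetical toy collection: short text descriptions stand in for table images.
    tables = {
        "t1": "revenue 2023 quarterly report sales",
        "t2": "patient blood pressure records",
        "t3": "revenue forecast 2024",
    }
    query = "quarterly revenue 2023"

    # Placeholder scorer: number of query tokens that appear in the table text.
    def overlap(q: str, tid: str) -> float:
        return float(len(set(q.lower().split()) & set(tables[tid].split())))

    # Placeholder "answer": simply report which tables were selected.
    print(answer_query(query, tables, overlap, lambda q, sel: sel, k=2, n=1))
```

The split into a cheap first stage and a costlier second stage reflects the design the abstract describes: the coarse retriever narrows a large collection to a few candidates, so the more expensive model only has to score that short list.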

Anthology ID: 2026.findings-eacl.226
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4327–4340
URL: https://aclanthology.org/2026.findings-eacl.226/
Cite (ACL): Zhuoyan Xu, Haoyang Fang, Boran Han, Bonan Min, Bernie Wang, Cuixiong Hu, and Shuai Zhang. 2026. Efficient Table Retrieval and Understanding with Multimodal Large Language Models. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4327–4340, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Efficient Table Retrieval and Understanding with Multimodal Large Language Models (Xu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.226.pdf
Checklist: 2026.findings-eacl.226.checklist.pdf
