Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning

Xinyu Zhang; Aibo Song; Jingyi Qiu; Jiahui Jin; Tianbo Zhang; Xiaolin Fang

doi:10.18653/v1/2025.acl-long.1298

Abstract

Relation Extraction (RE) is a key task in table understanding, aiming to extract semantic relations between columns. However, in practical scenarios such as webpage screenshots and scanned documents, high-quality textual formats (e.g., Markdown) are hard to obtain for complex tables with hierarchical headers, while table images are more accessible and intuitive. Moreover, existing works overlook the real-world need to mine relations among multiple columns rather than only the semantic relation between two specific columns. In this work, we explore using Multimodal Large Language Models (MLLMs) to address RE in tables with complex structures. We extend the concept of RE to include calculational relations, enabling multi-task learning of both semantic and calculational RE for mutual reinforcement. Specifically, we reconstruct table images into a graph structure based on neighboring nodes to extract graph-level visual features. This feature enhancement alleviates the insensitivity of MLLMs to positional information within table images. We then propose a Chain-of-Thought distillation framework with a self-correction mechanism to enhance MLLMs’ reasoning capabilities without increasing parameter scale. Our method significantly outperforms most baselines across a wide range of datasets. Additionally, we release a benchmark dataset for calculational RE in complex tables.

Anthology ID: 2025.acl-long.1298
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 26770–26781
URL: https://aclanthology.org/2025.acl-long.1298/
DOI: 10.18653/v1/2025.acl-long.1298
Cite (ACL): Xinyu Zhang, Aibo Song, Jingyi Qiu, Jiahui Jin, Tianbo Zhang, and Xiaolin Fang. 2025. Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 26770–26781, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning (Zhang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1298.pdf