TableVLM: Multi-modal Pre-training for Table Structure Recognition

Leiyuan Chen, Chengsong Huang, Xiaoqing Zheng, Jinshu Lin, Xuanjing Huang


Abstract
Tables are widely used in research and business; they are suitable for human consumption but not easily machine-processable, particularly when tables appear in images. One of the main challenges in extracting data from images of tables is accurately recognizing table structures, especially for complex tables with cells spanning multiple rows and columns. In this study, we propose a novel multi-modal pre-training model for table structure recognition, named TableVLM. With a two-stream multi-modal transformer-based encoder-decoder architecture, TableVLM learns to capture rich table-structure-related features through multiple carefully designed unsupervised objectives inspired by the notion of masked visual-language modeling. To pre-train this model, we also created a dataset, called ComplexTable, which consists of 1,000K samples and will be released publicly. Experimental results show that the model built on pre-trained TableVLM improves performance by up to 1.97% in tree-editing-distance score on ComplexTable.
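As a rough illustration of the evaluation metric mentioned in the abstract, the sketch below computes a simplified, structure-only tree-edit-distance score between a predicted and a reference table tree. It assumes the third-party zss package (a Zhang-Shasha tree edit distance implementation), and the helper names are illustrative rather than taken from the paper; unlike the full tree-editing-distance score used for table structure evaluation, it ignores cell content and span attributes.

# A minimal, structure-only sketch of a tree-edit-distance-based table score.
# Assumes the third-party `zss` package (Zhang-Shasha tree edit distance);
# the helper names below are illustrative, not from the paper's code.
from zss import Node, simple_distance


def rows_to_tree(rows):
    # Build a zss tree: a <table> node with one <tr> child per row and one
    # child per cell tag ('td' or 'th') in that row.
    table = Node("table")
    for row in rows:
        tr = Node("tr")
        for cell_tag in row:
            tr.addkid(Node(cell_tag))
        table.addkid(tr)
    return table


def tree_size(node):
    # Number of nodes in the tree, used to normalize the edit distance.
    return 1 + sum(tree_size(child) for child in node.children)


def structure_score(pred_rows, gold_rows):
    # 1 - normalized tree edit distance; 1.0 means identical structure.
    pred, gold = rows_to_tree(pred_rows), rows_to_tree(gold_rows)
    distance = simple_distance(pred, gold)
    return 1.0 - distance / max(tree_size(pred), tree_size(gold))


# Example: a predicted 2x2 grid vs. a reference whose second row has one
# merged cell; the score drops below 1.0 because one <td> node differs.
print(structure_score([["td", "td"], ["td", "td"]],
                      [["td", "td"], ["td"]]))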
Anthology ID: 2023.acl-long.137
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2437–2449
URL: https://aclanthology.org/2023.acl-long.137
DOI: 10.18653/v1/2023.acl-long.137
Cite (ACL): Leiyuan Chen, Chengsong Huang, Xiaoqing Zheng, Jinshu Lin, and Xuanjing Huang. 2023. TableVLM: Multi-modal Pre-training for Table Structure Recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2437–2449, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): TableVLM: Multi-modal Pre-training for Table Structure Recognition (Chen et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.137.pdf
Video: https://aclanthology.org/2023.acl-long.137.mp4