Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs

Mingyang Zhou; Yi Fung; Long Chen; Christopher Thomas; Heng Ji; Shih-Fu Chang

doi:10.18653/v1/2023.findings-acl.85

Abstract

Building cross-modal intelligence that can understand charts and communicate the salient information hidden behind them is an appealing challenge in the vision and language (V+L) community. The ability to recover the underlying table data of a chart figure is key to automatic chart understanding. We introduce ChartT5, a V+L model that learns to interpret table information from chart images via cross-modal pre-training on plot table pairs. Specifically, we propose two novel pre-training objectives, Masked Header Prediction (MHP) and Masked Value Prediction (MVP), to equip the model with different skills for interpreting table information. We conduct extensive experiments on chart question answering and chart summarization to verify the effectiveness of the proposed pre-training strategies. In particular, on the ChartQA benchmark, ChartT5 outperforms the state-of-the-art non-pretraining methods by over 8%.
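To make the two objectives concrete, the following is a minimal sketch of how header cells or value cells of a table might be masked before a model is asked to predict them. It is not the authors' implementation: the toy table, the function names `mask_cells` and `flatten`, the treatment of the first row as the only header row, the masking rate, and the T5-style `<extra_id_N>` sentinel format are all assumptions made for illustration, and the chart image input is left out entirely.

```python
import random

# Hypothetical toy table: row 0 holds the headers, later rows hold the values.
TABLE = [
    ["Year", "Sales"],
    ["2020", "14.2"],
    ["2021", "17.8"],
]


def mask_cells(table, target, rate, rng):
    """Replace a fraction of header or value cells with sentinel tokens.

    target: "header" masks cells in row 0 (MHP-like);
            "value" masks cells in the remaining rows (MVP-like).
    Returns (masked_table, labels), where labels maps each sentinel
    to the original cell text the model would have to predict.
    """
    masked = [row[:] for row in table]  # copy so the input stays untouched
    labels = {}
    for r, row in enumerate(table):
        is_header_row = r == 0  # assumption: only the first row is a header
        if (target == "header") != is_header_row:
            continue
        for c, cell in enumerate(row):
            if rng.random() < rate:
                sentinel = f"<extra_id_{len(labels)}>"  # assumed sentinel format
                masked[r][c] = sentinel
                labels[sentinel] = cell
    return masked, labels


def flatten(table):
    """Linearize the table into one string, rows separated by ' | '."""
    return " | ".join(" , ".join(row) for row in table)


if __name__ == "__main__":
    rng = random.Random(0)
    headers_masked, header_labels = mask_cells(TABLE, "header", 1.0, rng)
    print(flatten(headers_masked))
    print(header_labels)
```

With a masking rate of 1.0, every header cell is replaced, so the flattened input becomes `<extra_id_0> , <extra_id_1> | 2020 , 14.2 | 2021 , 17.8` and the labels recover `Year` and `Sales`. Passing `"value"` instead masks only the numeric cells and leaves the headers visible.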

Anthology ID: 2023.findings-acl.85
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1314–1326
URL: https://aclanthology.org/2023.findings-acl.85
DOI: 10.18653/v1/2023.findings-acl.85
Cite (ACL): Mingyang Zhou, Yi Fung, Long Chen, Christopher Thomas, Heng Ji, and Shih-Fu Chang. 2023. Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1314–1326, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs (Zhou et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.85.pdf
