ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning

Ahmed Masry; Mehrad Shahmohammadi; Md. Rizwan Parvez; Enamul Hoque; Shafiq Joty

doi:10.18653/v1/2024.findings-acl.619

Abstract

Charts provide visual representations of data and are widely used for analyzing information, addressing queries, and conveying insights to others. Various chart-related downstream tasks have emerged recently, such as question-answering and summarization. A common strategy to solve these tasks is to fine-tune various models originally trained on vision-language tasks. However, such task-specific models are not capable of solving a wide range of chart-related tasks, constraining their real-world applicability. To overcome these challenges, we introduce ChartInstruct: a novel chart-specific vision-language instruction-following dataset comprising 191K instructions generated with 71K charts. We then present two distinct systems for instruction tuning on such datasets: (1) an end-to-end model that connects a vision encoder for chart understanding with an LLM; and (2) a pipeline model that employs a two-step approach to extract chart data tables and input them into the LLM. In experiments on four downstream tasks, we first show the effectiveness of our model, achieving a new set of state-of-the-art results. Further evaluation shows that our instruction-tuning approach supports a wide array of real-world chart comprehension and reasoning scenarios, thereby expanding the scope and applicability of our models to new kinds of tasks.
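
The two-step pipeline mentioned in the abstract (extract the chart's data table, then pass it to the LLM) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `flatten_table`, `build_prompt`, and `run_pipeline`, the prompt wording, and the row-per-line table format are all assumptions made for illustration, and the table extractor and the LLM are supplied by the caller as plain functions.

```python
from typing import Callable, List

# Hypothetical table representation: rows of cells, first row is the header.
Table = List[List[str]]


def flatten_table(table: Table) -> str:
    """Linearize a table as one ' | '-separated line per row (assumed format)."""
    return "\n".join(" | ".join(cell.strip() for cell in row) for row in table)


def build_prompt(table: Table, instruction: str) -> str:
    """Combine the linearized table and the instruction into one prompt string."""
    return (
        "Chart data table:\n"
        f"{flatten_table(table)}\n\n"
        f"Instruction: {instruction}\n"
        "Answer:"
    )


def run_pipeline(
    chart_image: bytes,
    instruction: str,
    extract_table: Callable[[bytes], Table],
    generate: Callable[[str], str],
) -> str:
    """Step 1: extract the data table from the chart image.
    Step 2: prompt the language model with the table and the instruction."""
    table = extract_table(chart_image)
    return generate(build_prompt(table, instruction))
```

Keeping the extractor and the generator as injected callables means either stage can be swapped or stubbed out for testing without changing the pipeline itself.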

Anthology ID: 2024.findings-acl.619
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10387–10409
URL: https://aclanthology.org/2024.findings-acl.619/
DOI: 10.18653/v1/2024.findings-acl.619
Cite (ACL): Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, and Shafiq Joty. 2024. ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10387–10409, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning (Masry et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.619.pdf
