ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Ahmed Masry; Megh Thakkar; Aayush Bajaj; Aaryaman Kartha; Enamul Hoque; Shafiq Joty

Abstract

Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been growing interest in developing pre-trained foundation models as well as general-purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer from two crucial drawbacks that affect the performance of chart representation models: they are trained on data generated from the charts' underlying data tables, ignoring the visual trends and patterns in chart images, and they use weakly aligned vision-language backbone models for domain-specific training, limiting their generalizability when encountering charts in the wild. We address these drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across five benchmarks spanning chart summarization, question answering, and fact-checking, and our elaborate qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries than its contemporaries. We release the code, model checkpoints, dataset, and demos at https://github.com/vis-nlp/ChartGemma.

Anthology ID: 2025.coling-industry.54
Volume: Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 625–643
URL: https://aclanthology.org/2025.coling-industry.54/
Cite (ACL): Ahmed Masry, Megh Thakkar, Aayush Bajaj, Aaryaman Kartha, Enamul Hoque, and Shafiq Joty. 2025. ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 625–643, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild (Masry et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-industry.54.pdf
