Yichen Xu
2024
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging
Liang Zhang | Anwen Hu | Haiyang Xu | Ming Yan | Yichen Xu | Qin Jin | Ji Zhang | Fei Huang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in chart understanding. However, the sheer size of these models limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through Program-of-Thoughts (PoT) learning, which trains the model to generate Python programs for numerical calculations, and (2) it shortens lengthy vision feature sequences through Vision Token Merging, which gradually merges the most similar vision tokens. Extensive experiments demonstrate that our 3B TinyChart achieves state-of-the-art (SOTA) performance on various chart understanding benchmarks, including ChartQA, Chart-to-Text, Chart-to-Table, OpenCQA, and ChartX. It outperforms several chart-understanding MLLMs with up to 13B parameters, and the closed-source MLLM GPT-4V on ChartQA, with higher throughput during inference due to its smaller model scale and more efficient vision encoding.
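The idea of gradually merging the most similar vision tokens can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `merge_most_similar`, the greedy pairwise search, and the size-weighted averaging are all assumptions chosen for readability, and a practical system would likely use a faster matching scheme inside the vision encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_most_similar(tokens, num_merges):
    """Greedily merge the most similar pair of tokens, num_merges times.

    Hypothetical helper for illustration only. Each merge replaces two
    tokens with their size-weighted average, so the sequence shrinks by
    one token per step.
    """
    tokens = [list(t) for t in tokens]
    sizes = [1] * len(tokens)  # number of original tokens behind each entry
    for _ in range(num_merges):
        if len(tokens) < 2:
            break
        # Exhaustive search for the most similar pair (quadratic cost).
        best = None
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                score = cosine(tokens[i], tokens[j])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        total = sizes[i] + sizes[j]
        tokens[i] = [
            (sizes[i] * x + sizes[j] * y) / total
            for x, y in zip(tokens[i], tokens[j])
        ]
        sizes[i] = total
        del tokens[j]
        del sizes[j]
    return tokens


# Four toy 2-dimensional "vision tokens"; two merges leave two tokens.
toy_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
merged = merge_most_similar(toy_tokens, num_merges=2)
print(len(merged))
```

In this toy run, the three tokens pointing roughly along the first axis collapse into one, while the dissimilar token is left untouched. That is the intended effect: redundant regions of a chart image are compressed and distinctive ones are kept.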
Co-authors
- Liang Zhang 1
- Anwen Hu 1
- Haiyang Xu 1
- Ming Yan 1
- Qin Jin 1