ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Xuanle Zhao; Xianzhen Luo; Qi Shi; Chi Chen; Shuo Wang; Zhiyuan Liu; Maosong Sun

doi:10.18653/v1/2025.acl-long.363

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding tasks. However, interpreting charts with textual descriptions often leads to information loss, as text fails to fully capture the dense information embedded in charts. In contrast, parsing charts into code provides lossless representations that can effectively contain all critical details. Although existing open-source MLLMs have achieved success in chart understanding tasks, they still face two major challenges when applied to chart-to-code tasks: (1) low executability and poor restoration of chart details in the generated code, and (2) a lack of large-scale and diverse training data. To address these challenges, we propose ChartCoder, the first dedicated chart-to-code MLLM, which leverages Code LLMs as the language backbone to enhance the executability of the generated code. Furthermore, we introduce Chart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation, and propose the Snippet-of-Thought (SoT) method, which transforms direct chart-to-code generation data into step-by-step generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks, achieving superior chart restoration and code executability. Our code is available at https://github.com/thunlp/ChartCoder.

Anthology ID: 2025.acl-long.363
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 7333–7348
URL: https://aclanthology.org/2025.acl-long.363/
DOI: 10.18653/v1/2025.acl-long.363
Cite (ACL): Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Zhiyuan Liu, and Maosong Sun. 2025. ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7333–7348, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation (Zhao et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.363.pdf
