VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee


Abstract
In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or unique way to represent visual content, especially for designers and artists who depict the world using geometry primitives such as polygons. Vector graphics (VG), on the other hand, offer a textual representation of visual content, which can be more concise and powerful for content like cartoons, sketches and scientific figures. Recent studies have shown promising results on processing vector graphics with capable Large Language Models (LLMs). However, such works focus solely on qualitative results, understanding, or a specific type of vector graphics. We propose VGBench, a comprehensive benchmark for LLMs on handling vector graphics through diverse aspects, including (a) both visual understanding and generation, (b) evaluation of various vector graphics formats, (c) diverse question types, (d) wide range of prompting techniques, (e) under multiple LLMs and (f) comparison with VLMs on rasterized representations. Evaluating on our collected 4279 understanding and 5845 generation samples, we find that LLMs show strong capability on both aspects while exhibiting less desirable performance on low-level formats (SVG). Both data and evaluation pipeline will be open-sourced.
Anthology ID:
2024.emnlp-main.213
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3647–3659
Language:
URL:
https://aclanthology.org/2024.emnlp-main.213
DOI:
10.18653/v1/2024.emnlp-main.213
Bibkey:
Cite (ACL):
Bocheng Zou, Mu Cai, Jianrui Zhang, and Yong Jae Lee. 2024. VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3647–3659, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation (Zou et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.213.pdf
Software:
 2024.emnlp-main.213.software.zip
Data:
 2024.emnlp-main.213.data.zip