MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma
Abstract
Programming often involves converting detailed and complex specifications into code, a process during which developers typically use visual aids to convey concepts more effectively. While recent Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, little work has investigated whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multi-modal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges harvested from 10 code competition websites, posing significant challenges due to the extreme demand for reasoning abilities. Our experiment results show that current state-of-the-art models struggle to solve these problems. The results highlight the lack of powerful vision-code models, and we hope MMCode can serve as an inspiration for future work in this domain. The data and code are publicly available.
Anthology ID: 2024.findings-emnlp.42
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 736–783
URL: https://aclanthology.org/2024.findings-emnlp.42
Cite (ACL): Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, and Jing Ma. 2024. MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 736–783, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems (Li et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.42.pdf
Software: 2024.findings-emnlp.42.software.zip
Data: 2024.findings-emnlp.42.data.zip