Advancement in Graph Understanding: A Multimodal Benchmark and Fine-Tuning of Vision-Language Models

Qihang Ai, Jiafan Li, Jincheng Dai, Jianwu Zhou, Lemao Liu, Haiyun Jiang, Shuming Shi


Abstract
Graph data organizes complex relationships and interactions between objects, facilitating advanced analysis and decision-making across different fields. In this paper, we propose a new paradigm for interactive and instructional graph data understanding and reasoning. Instead of adopting complex graph neural models or heuristic graph-to-text instruction design, we leverage Vision-Language Models (VLMs) to encode graph images with varying structures across different domains. This paper first evaluates the capabilities of public VLMs in graph learning from multiple aspects. Then it introduces a novel instruction-following dataset for multimodal graph understanding and reasoning in English and Chinese. In addition, by fine-tuning MiniGPT-4 and LLaVA on our dataset, we achieved an accuracy increase of 5%-15% over baseline models, with the best-performing model attaining scores comparable to Gemini in GPT-assisted evaluation. This research not only showcases the potential of integrating VLMs with graph data but also opens new avenues for advancements in graph data understanding.
Anthology ID:
2024.acl-long.404
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7485–7501
URL:
https://aclanthology.org/2024.acl-long.404
DOI:
10.18653/v1/2024.acl-long.404
Cite (ACL):
Qihang Ai, Jiafan Li, Jincheng Dai, Jianwu Zhou, Lemao Liu, Haiyun Jiang, and Shuming Shi. 2024. Advancement in Graph Understanding: A Multimodal Benchmark and Fine-Tuning of Vision-Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7485–7501, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Advancement in Graph Understanding: A Multimodal Benchmark and Fine-Tuning of Vision-Language Models (Ai et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.404.pdf