@inproceedings{yuan-etal-2025-gracore,
    title = "{G}ra{C}o{R}e: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models",
    author = "Yuan, Zike and
      Liu, Ming and
      Wang, Hui and
      Qin, Bing",
    editor = "Rambow, Owen and
      Wanner, Leo and
      Apidianaki, Marianna and
      Al-Khalifa, Hend and
      Eugenio, Barbara Di and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.531/",
    pages = "7925--7948",
    abstract = "Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs' graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graph and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluate four closed-source and eight open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that OpenAI o1 model has amazing comprehension and reasoning capabilities, semantic enrichment enhances reasoning performance, node ordering impacts task success, and the ability to process longer texts does not necessarily improve graph comprehension or reasoning."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="yuan-etal-2025-gracore">
    <titleInfo>
      <title>GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Zike</namePart>
      <namePart type="family">Yuan</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Ming</namePart>
      <namePart type="family">Liu</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Hui</namePart>
      <namePart type="family">Wang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Bing</namePart>
      <namePart type="family">Qin</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-01</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 31st International Conference on Computational Linguistics</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Owen</namePart>
        <namePart type="family">Rambow</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Leo</namePart>
        <namePart type="family">Wanner</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Marianna</namePart>
        <namePart type="family">Apidianaki</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Hend</namePart>
        <namePart type="family">Al-Khalifa</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Barbara</namePart>
        <namePart type="given">Di</namePart>
        <namePart type="family">Eugenio</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Steven</namePart>
        <namePart type="family">Schockaert</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Abu Dhabi, UAE</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs’ graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graph and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluate four closed-source and eight open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that OpenAI o1 model has amazing comprehension and reasoning capabilities, semantic enrichment enhances reasoning performance, node ordering impacts task success, and the ability to process longer texts does not necessarily improve graph comprehension or reasoning.</abstract>
    <identifier type="citekey">yuan-etal-2025-gracore</identifier>
    <location>
      <url>https://aclanthology.org/2025.coling-main.531/</url>
    </location>
    <part>
      <date>2025-01</date>
      <extent unit="page">
        <start>7925</start>
        <end>7948</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models
%A Yuan, Zike
%A Liu, Ming
%A Wang, Hui
%A Qin, Bing
%Y Rambow, Owen
%Y Wanner, Leo
%Y Apidianaki, Marianna
%Y Al-Khalifa, Hend
%Y Eugenio, Barbara Di
%Y Schockaert, Steven
%S Proceedings of the 31st International Conference on Computational Linguistics
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, UAE
%F yuan-etal-2025-gracore
%X Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs’ graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graph and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluate four closed-source and eight open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that OpenAI o1 model has amazing comprehension and reasoning capabilities, semantic enrichment enhances reasoning performance, node ordering impacts task success, and the ability to process longer texts does not necessarily improve graph comprehension or reasoning.
%U https://aclanthology.org/2025.coling-main.531/
%P 7925-7948
Markdown (Informal)
[GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models](https://aclanthology.org/2025.coling-main.531/) (Yuan et al., COLING 2025)
ACL
Zike Yuan, Ming Liu, Hui Wang, and Bing Qin. 2025. [GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models](https://aclanthology.org/2025.coling-main.531/). In *Proceedings of the 31st International Conference on Computational Linguistics*, pages 7925–7948, Abu Dhabi, UAE. Association for Computational Linguistics.