GQLBench: A Large-Scale Cross-Domain, Cross-Dialect Benchmark for NL2GQL

Yanning Su; Yuhang Zhou (周宇航); Yang Fang; Sen Liu; Guangnan Ye (叶广楠); Hongfeng Chai (柴洪峰)

Abstract

Despite growing interest in NL2GQL, benchmarking progress has been constrained by the lack of resources that are simultaneously large-scale, cross-domain, and cross-dialect. To address this gap, we present **GQLBench**, a new benchmark built through an automated and scalable framework that integrates NL2SQL-to-NL2GQL conversion with graph-native data generation. GQLBench supports execution-based evaluation on both Cypher and ISO-GQL, covering hundreds of graph databases and over 20k natural language questions for each dialect. By combining converted data from mature NL2SQL resources with synthetic graph-specific queries, it captures both schema diversity from real-world relational sources and graph-native reasoning challenges, including long paths and cycles. Beyond overall performance comparison, GQLBench also enables fine-grained evaluation across dialects, graph patterns, and query complexity. Experiments on advanced LLMs show that even strong proprietary models struggle on GQLBench, with gemini-3-flash achieving only 35.40% average execution accuracy across the two dialects. Our data and code are available at https://github.com/qxssadf/GQLBench.
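The abstract reports results as execution accuracy averaged across the two dialects. The following is a minimal sketch of how such a metric is commonly computed in NL2SQL-style benchmarks. It assumes three things that the abstract does not state: a prediction counts as correct when its query returns the same multiset of rows as the gold query, row order is ignored, and the cross-dialect figure is an unweighted mean of the Cypher and ISO-GQL accuracies. GQLBench's actual matching rules, such as its handling of nodes, paths, or column order, may differ. All function names here are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence

# Hypothetical row type: one result record returned by a query.
Row = tuple[Hashable, ...]


def results_match(pred: Sequence[Row], gold: Sequence[Row]) -> bool:
    """Order-insensitive comparison: treat each result set as a multiset of rows."""
    return Counter(pred) == Counter(gold)


def execution_accuracy(pairs: Sequence[tuple[Sequence[Row], Sequence[Row]]]) -> float:
    """Fraction of (predicted, gold) result pairs that match; 0.0 for no questions."""
    if not pairs:
        return 0.0
    correct = sum(results_match(p, g) for p, g in pairs)
    return correct / len(pairs)


def average_over_dialects(per_dialect: dict[str, float]) -> float:
    """Assumed unweighted mean of per-dialect execution accuracies."""
    return sum(per_dialect.values()) / len(per_dialect)


# Toy usage with made-up results (not GQLBench data).
cypher_pairs = [
    ([("Alice",), ("Bob",)], [("Bob",), ("Alice",)]),  # same rows, different order: match
    ([("Carol",)], [("Dave",)]),                       # wrong answer: no match
]
gql_pairs = [
    ([(3,)], [(3,)]),                                  # match
]
acc = {
    "cypher": execution_accuracy(cypher_pairs),
    "iso_gql": execution_accuracy(gql_pairs),
}
print(average_over_dialects(acc))  # 0.75
```

Comparing results as multisets rather than lists avoids penalizing a query that returns correct rows in a different order when the question imposes no ordering. A question that does ask for an ordering would require an order-sensitive check instead.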

Anthology ID: 2026.acl-long.1476
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 31989–32014
URL: https://aclanthology.org/2026.acl-long.1476/
Cite (ACL): Yanning Su, Yuhang Zhou, Yang Fang, Sen Liu, Guangnan Ye, and Hongfeng Chai. 2026. GQLBench: A Large-Scale Cross-Domain, Cross-Dialect Benchmark for NL2GQL. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31989–32014, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): GQLBench: A Large-Scale Cross-Domain, Cross-Dialect Benchmark for NL2GQL (Su et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1476.pdf
Checklist: 2026.acl-long.1476.checklist.pdf