InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents

Zhenghao Zhu; Yuanfeng Song; Xing Chen; Chengzhong Liu; Cui Yakun; Caleb Chen Cao; Sirui Han; Yike Guo

Abstract

Data analysis has become an indispensable part of scientific research. Uncovering the latent knowledge and insights hidden within massive datasets requires deep exploratory analysis. With the advent of large language models (LLMs) and multi-agent systems, a growing number of researchers are applying these technologies to insight discovery. However, few benchmarks exist for evaluating insight discovery capabilities. Even InsightBench, one of the most comprehensive existing frameworks, suffers from critical flaws: format inconsistencies, poorly conceived objectives, and redundant insights. These issues can significantly degrade both data quality and the reliability of agent evaluation. To address them, we thoroughly investigate the shortcomings of InsightBench and propose essential criteria for a high-quality insight benchmark. Guided by these criteria, we develop a data-curation pipeline to construct a new dataset named InsightEval. We further introduce a novel metric to measure the exploratory performance of agents. Through extensive experiments on InsightEval, we highlight prevailing challenges in automated insight discovery and present key findings to guide future research in this promising direction.

Anthology ID: 2026.findings-acl.1729
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 34632–34656
URL: https://aclanthology.org/2026.findings-acl.1729/
Cite (ACL): Zhenghao Zhu, Yuanfeng Song, Xing Chen, Chengzhong Liu, Cui Yakun, Caleb Chen Cao, Sirui Han, and Yike Guo. 2026. InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34632–34656, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents (Zhu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1729.pdf
Checklist: 2026.findings-acl.1729.checklist.pdf
