An LLM-Based Approach for Insight Generation in Data Analysis

Alberto Sánchez Pérez; Alaa Boukhary; Paolo Papotti; Luis Castejón Lozano; Adam Elwood

doi:10.18653/v1/2025.naacl-long.24

Abstract

Generating insightful and actionable information from databases is critical in data analysis. This paper introduces a novel approach using Large Language Models (LLMs) to automatically generate textual insights. Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables. Our framework includes a Hypothesis Generator to formulate domain-relevant questions, a Query Agent to answer such questions by generating SQL queries against a database, and a Summarization module to verbalize the insights. The insights are evaluated for both correctness and subjective insightfulness using a hybrid model of human judgment and automated metrics. Experimental results on public and enterprise databases demonstrate that our approach generates insights rated as more insightful than those of other approaches while maintaining correctness.
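The three-stage flow named in the abstract (Hypothesis Generator, Query Agent, Summarization) can be illustrated with a minimal sketch. It is not the authors' implementation: the function `generate_insights`, the callables `hypothesize`, `to_sql`, and `summarize`, the `orders` table, and the use of SQLite are all assumptions made for illustration, and the LLM calls are replaced by fixed stubs so the example runs offline.

```python
import sqlite3
from typing import Callable, List, Sequence


def generate_insights(
    conn: sqlite3.Connection,
    hypothesize: Callable[[str], List[str]],          # hypothetical Hypothesis Generator
    to_sql: Callable[[str, str], str],                # hypothetical Query Agent
    summarize: Callable[[str, Sequence[tuple]], str], # hypothetical Summarization module
) -> List[str]:
    # Collect the CREATE TABLE statements so each stage can see the schema.
    schema = "\n".join(
        row[0]
        for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
        if row[0]
    )
    insights = []
    for question in hypothesize(schema):
        query = to_sql(question, schema)
        try:
            rows = conn.execute(query).fetchall()
        except sqlite3.Error:
            continue  # skip questions whose generated SQL does not run
        insights.append(summarize(question, rows))
    return insights


def demo() -> List[str]:
    # Toy single-table database; the paper targets multi-table databases.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (region TEXT, amount REAL);
        INSERT INTO orders VALUES ('north', 10.0), ('north', 30.0), ('south', 5.0);
        """
    )
    # Fixed stubs standing in for LLM calls (assumptions, not real model output).
    hypothesize = lambda schema: ["Which region has the highest total order amount?"]
    to_sql = lambda question, schema: (
        "SELECT region, SUM(amount) FROM orders "
        "GROUP BY region ORDER BY 2 DESC LIMIT 1"
    )
    summarize = lambda question, rows: (
        f"{rows[0][0]} leads with a total of {rows[0][1]:.1f}."
    )
    return generate_insights(conn, hypothesize, to_sql, summarize)


if __name__ == "__main__":
    print(demo())
```

Passing the three stages as callables keeps the sketch independent of any particular LLM client; in practice each stub would be replaced by a prompted model call.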

Anthology ID: 2025.naacl-long.24
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 562–582
URL: https://aclanthology.org/2025.naacl-long.24/
DOI: 10.18653/v1/2025.naacl-long.24
Cite (ACL): Alberto Sánchez Pérez, Alaa Boukhary, Paolo Papotti, Luis Castejón Lozano, and Adam Elwood. 2025. An LLM-Based Approach for Insight Generation in Data Analysis. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 562–582, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): An LLM-Based Approach for Insight Generation in Data Analysis (Pérez et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.24.pdf
