On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

Linyong Nan; Ellen Zhang; Weijin Zou; Yilun Zhao; Wenfei Zhou; Arman Cohan

doi:10.18653/v1/2024.findings-naacl.284

Abstract

This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models (LLMs) interact with a SQL interpreter. The task requires LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason over the acquired context, and to synthesize it into a comprehensive analytical narrative. Our findings highlight that this task poses great challenges even for the state-of-the-art GPT-4 model. We propose and evaluate two interaction strategies, and provide a fine-grained analysis of the individual stages within the interaction. A key discovery is the identification of two primary bottlenecks hindering effective interaction: the capacity for planning and the ability to generate multiple SQL queries. To address the challenge of accurately assessing answer quality, we introduce a multi-agent evaluation framework that simulates the academic peer-review process, enhancing the precision and reliability of our evaluations. This framework allows for a more nuanced understanding of the strengths and limitations of current LLMs in complex retrieval and reasoning tasks.
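The interaction the abstract describes (issue several SQL queries, gather the results, then write an answer from them) can be illustrated with a minimal sketch. This is not the paper's implementation or either of its two interaction strategies, which the abstract does not specify. The function names `run_queries`, `scripted_policy`, and `build_demo_db`, the `sales` table, and the round limit are all hypothetical, and a scripted stub stands in for the LLM.

```python
import sqlite3
from typing import Callable, List, Tuple

# One observation pairs the SQL text with the rows (or error) it produced.
Observation = Tuple[str, List[tuple]]
Policy = Callable[[str, List[Observation]], List[str]]


def run_queries(conn: sqlite3.Connection, propose_queries: Policy,
                question: str, max_rounds: int = 3) -> List[Observation]:
    """Alternate between proposing SQL and executing it, collecting evidence."""
    evidence: List[Observation] = []
    for _ in range(max_rounds):
        queries = propose_queries(question, evidence)
        if not queries:  # the policy signals it has enough data
            break
        for sql in queries:
            try:
                rows = conn.execute(sql).fetchall()
            except sqlite3.Error as exc:
                # Feed the error back so a real agent could revise its query.
                rows = [("ERROR", str(exc))]
            evidence.append((sql, rows))
    return evidence


def scripted_policy(question: str, evidence: List[Observation]) -> List[str]:
    """Hypothetical stand-in for an LLM: replays a fixed two-step plan."""
    plan = [
        ["SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"],
        ["SELECT COUNT(*) FROM sales WHERE amount > 100"],
    ]
    step = len(evidence)  # each step of this plan issues exactly one query
    return plan[step] if step < len(plan) else []


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database (assumed schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 120.0), ("east", 80.0), ("west", 150.0)],
    )
    return conn
```

In a real agent, `scripted_policy` would be replaced by a model call that sees the question and the evidence so far, and a final step would turn the collected evidence into the long-form narrative answer.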

Anthology ID: 2024.findings-naacl.284
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4556–4579
URL: https://aclanthology.org/2024.findings-naacl.284
DOI: 10.18653/v1/2024.findings-naacl.284
Cite (ACL): Linyong Nan, Ellen Zhang, Weijin Zou, Yilun Zhao, Wenfei Zhou, and Arman Cohan. 2024. On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4556–4579, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering (Nan et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-naacl.284.pdf
