SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models

Shicheng Liu, Jialiang Xu, Wesley Tjangnaka, Sina Semnani, Chen Yu, Monica Lam


Abstract
While most conversational agents are grounded on either free-text or structured knowledge, many knowledge corpora consist of hybrid sources. This paper presents the first conversational agent that supports the full generality of hybrid data access for large knowledge corpora, through a language we developed called SUQL (Structured and Unstructured Query Language). Specifically, SUQL extends SQL with free-text primitives (SUMMARY and ANSWER), so information retrieval can be composed with structured data accesses arbitrarily in a formal, succinct, precise, and interpretable notation. With SUQL, we propose the first semantic parser, an LLM with in-context learning, that can handle hybrid data sources. Our in-context learning-based approach, when applied to the HybridQA dataset, comes within 8.9% Exact Match and 7.1% F1 of the SOTA, which was trained on 62K data samples. More significantly, unlike previous approaches, our technique is applicable to large databases and free-text corpora. We introduce a dataset consisting of crowdsourced questions and conversations on Yelp, a large, real restaurant knowledge base with structured and unstructured data. We show that our few-shot conversational agent based on SUQL finds an entity satisfying all user requirements 90.3% of the time, compared to 63.4% for a baseline based on linearization.
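To make the idea of composing free-text primitives with structured filters concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not the authors' implementation: the table, its rows, the question wording, and the helper functions toy_answer and toy_summary are all hypothetical, and the helpers use simple keyword matching and first-sentence extraction in place of the LLM-backed ANSWER and SUMMARY primitives that the abstract describes.

```python
import re
import sqlite3

# Hypothetical stand-ins for SUQL's free-text primitives. In the paper these
# are backed by an LLM; here they are toy heuristics so the sketch runs offline.
STOPWORDS = {"is", "it", "this", "the", "a"}

def toy_answer(text, question):
    """Return 'Yes' if every content word of the question occurs in the text."""
    words = [w for w in re.findall(r"[a-z]+", question.lower()) if w not in STOPWORDS]
    return "Yes" if all(w in text.lower() for w in words) else "No"

def toy_summary(text):
    """Return the first sentence of the text as a crude summary."""
    return text.split(".")[0].strip() + "."

conn = sqlite3.connect(":memory:")
# Register the helpers as SQL functions so they can appear inside a query.
conn.create_function("answer", 2, toy_answer)
conn.create_function("summary", 1, toy_summary)

# Illustrative schema: structured columns plus one free-text column.
conn.execute("CREATE TABLE restaurants (name TEXT, cuisine TEXT, rating REAL, reviews TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, ?)",
    [
        ("Luigi's", "italian", 4.5, "Great family friendly spot. Pasta is fresh."),
        ("Noir", "italian", 4.7, "Quiet romantic bar. Adults only."),
        ("Taco Hut", "mexican", 4.2, "Very family friendly. Fast service."),
        ("Mama Mia", "italian", 3.1, "Family friendly but slow. Bland sauce."),
    ],
)

# One query mixes structured predicates (cuisine, rating) with a free-text
# predicate (answer) and a free-text projection (summary).
query = """
    SELECT name, summary(reviews)
    FROM restaurants
    WHERE cuisine = 'italian'
      AND rating >= 4.0
      AND answer(reviews, 'Is it family friendly?') = 'Yes'
"""
results = conn.execute(query).fetchall()
print(results)
```

With this toy data, only the Italian restaurant that is rated at least 4.0 and whose review text matches the question passes all three conditions. The point of the sketch is the shape of the query: free-text calls sit alongside ordinary SQL predicates in a single statement, which is the kind of composition the abstract attributes to SUQL.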
Anthology ID:
2024.findings-naacl.283
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4535–4555
URL:
https://aclanthology.org/2024.findings-naacl.283
Cite (ACL):
Shicheng Liu, Jialiang Xu, Wesley Tjangnaka, Sina Semnani, Chen Yu, and Monica Lam. 2024. SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4535–4555, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models (Liu et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.283.pdf
Copyright:
 2024.findings-naacl.283.copyright.pdf