Ali Al Lawati


2025

Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text
Ali Al Lawati | Jason Lucas | Prasenjit Mitra
Proceedings of the 31st International Conference on Computational Linguistics

Large Language Models (LLMs) have demonstrated remarkable performance in various NLP tasks, including semantic parsing, which translates natural language into formal code representations. However, the reverse operation, translating code into natural language, termed semantic captioning, has received less attention. This task is increasingly important as LLMs are integrated into platforms for code generation, security analysis, and educational purposes. In this paper, we focus on the captioning of SQL queries (SQL2Text) to address the critical need for understanding and explaining SQL queries in an era where LLM-generated code poses potential security risks. We repurpose semantic parsing datasets for semantic captioning, specifically SQL2Text. To overcome the limited robustness of Text2SQL datasets for the reverse task, we introduce an iterative in-context learning (ICL) prompt that leverages GPT-4o to generate multiple additional utterances. We conduct experiments across multiple ICL methods, emphasizing smaller, more computationally efficient LLMs. Our findings demonstrate that leveraging the inherent graph properties of SQL for few-shot ICL sample selection significantly outperforms random selection (by up to 39% in BLEU score) and yields better results than alternative selection approaches. The dataset and code are publicly accessible.
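
The abstract describes graph-aware few-shot exemplar selection for SQL2Text. As a rough illustration only, not the authors' implementation, the sketch below selects few-shot exemplars for an SQL2Text prompt by comparing simplified edge-set representations of SQL queries. The `sql_to_edges` tokenizer, the Jaccard similarity, and the example pool are all assumptions standing in for the paper's actual graph construction and selection method.

```python
import re

# SQL clause keywords used to anchor the naive "graph" edges below.
CLAUSE_KEYWORDS = {"SELECT", "FROM", "WHERE", "JOIN", "GROUP", "ORDER", "HAVING", "BY"}

def sql_to_edges(query: str) -> set[tuple[str, str]]:
    """Very naive stand-in for a real SQL parse graph: (clause, token) edges."""
    edges, clause = set(), None
    for tok in re.findall(r"[A-Za-z_][A-Za-z0-9_.]*|\*", query):
        if tok.upper() in CLAUSE_KEYWORDS:
            clause = tok.upper()
        elif clause is not None:
            edges.add((clause, tok.lower()))
    return edges

def graph_similarity(a: set, b: set) -> float:
    """Jaccard overlap of edge sets -- a cheap proxy for graph similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def select_exemplars(target_sql: str, pool: list[dict], k: int = 3) -> list[dict]:
    """Pick the k pool items whose naive SQL graphs overlap most with the target query."""
    target_edges = sql_to_edges(target_sql)
    scored = sorted(
        pool,
        key=lambda ex: graph_similarity(target_edges, sql_to_edges(ex["sql"])),
        reverse=True,
    )
    return scored[:k]

def build_prompt(target_sql: str, exemplars: list[dict]) -> str:
    """Assemble a few-shot SQL2Text prompt from the selected exemplars."""
    shots = "\n\n".join(
        f"SQL: {ex['sql']}\nDescription: {ex['text']}" for ex in exemplars
    )
    return f"{shots}\n\nSQL: {target_sql}\nDescription:"

if __name__ == "__main__":
    # Hypothetical exemplar pool; a real pipeline would draw from the Text2SQL training set.
    pool = [
        {"sql": "SELECT name FROM students WHERE age > 20",
         "text": "List the names of students older than 20."},
        {"sql": "SELECT COUNT(*) FROM orders",
         "text": "Count the total number of orders."},
        {"sql": "SELECT name FROM teachers WHERE salary > 50000",
         "text": "List teachers earning over 50,000."},
    ]
    query = "SELECT name FROM employees WHERE salary > 30000"
    print(build_prompt(query, select_exemplars(query, pool, k=2)))
```

A faithful implementation would parse each query into its actual graph structure and use a proper graph-similarity measure rather than the keyword-based proxy shown here; this sketch only conveys the overall shape of similarity-driven exemplar selection for few-shot ICL.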