Daiane Ucceli Kreitlow


2026

Text-to-SQL systems translate natural-language questions into Structured Query Language (SQL) queries, enabling database access without SQL expertise. In real-world scenarios, these systems often must handle multiple databases with heterogeneous schemas, which makes Schema Linking a crucial preliminary step for identifying the relevant databases, tables, and columns. This study investigates Schema Linking for questions written in Brazilian Portuguese and compares two schema representation strategies: natural-language descriptions generated by Large Language Models (LLMs) and representations based on Data Definition Language (DDL) and Data Manipulation Language (DML) commands. Experiments on a Brazilian Portuguese version of the Spider dataset, with over 200 databases, evaluated several LLMs and embedding models. Results measured with Hit@k show that natural-language descriptions consistently outperform DDL/DML-based representations, demonstrating the effectiveness of LLM-generated schema descriptions for Schema Linking.
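The abstract does not define Hit@k. The sketch below assumes the common retrieval reading: the fraction of questions whose correct (gold) schema element appears among the top k ranked candidates. The function name `hit_at_k`, the database names, and the rankings are hypothetical and serve only as illustration; they are not taken from the study's data or code.

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Return the fraction of questions whose gold item is in the top-k candidates.

    ranked[i] is the candidate list for question i, ordered best first.
    gold[i] is the correct item for question i.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not gold:
        return 0.0
    # A question counts as a hit when its gold item is within the first k candidates.
    hits = sum(1 for candidates, target in zip(ranked, gold) if target in candidates[:k])
    return hits / len(gold)


# Hypothetical rankings for three questions (illustrative names, not real results).
rankings = [
    ["db_music", "db_orchestra", "db_radio"],   # gold at rank 1
    ["db_pets", "db_kennels", "db_students"],   # gold at rank 2
    ["db_flights", "db_airlines", "db_trips"],  # gold not retrieved
]
gold = ["db_music", "db_kennels", "db_cars"]

for k in (1, 2, 3):
    print(f"Hit@{k} = {hit_at_k(rankings, gold, k):.2f}")
```

In this toy case Hit@1 is 1/3 and Hit@3 is 2/3, because the second question's gold database is found only at rank 2 and the third is never retrieved. The same function applies to tables or columns if the candidate lists hold those identifiers instead.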