BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to Turkish

Burak Aktaş; Mehmet Can Baytekin; Süha Kağan Köse; Ömer İlbilgi; Elif Özge Yılmaz; Cagri Toraman; Bilge Kaan Görür

Abstract

Text-to-SQL systems have achieved strong performance on English benchmarks, yet their behavior in morphologically rich, low-resource languages remains largely unexplored. We introduce BIRDTurk, the first Turkish adaptation of the BIRD benchmark, constructed through a controlled translation pipeline that adapts schema identifiers to Turkish while strictly preserving the logical structure and execution semantics of SQL queries and databases. Translation quality is validated on a sample size determined by the Central Limit Theorem to ensure 95% confidence, achieving 98.15% accuracy on human-evaluated samples. Using BIRDTurk, we evaluate inference-based prompting, agentic multi-stage reasoning, and supervised fine-tuning. Our results reveal that Turkish introduces consistent performance degradation, driven by both structural linguistic divergence and underrepresentation in LLM pretraining, while agentic reasoning demonstrates stronger cross-lingual robustness. Supervised fine-tuning remains challenging for standard multilingual baselines but scales effectively with modern instruction-tuned models. BIRDTurk provides a controlled testbed for cross-lingual Text-to-SQL evaluation under realistic database conditions. We release the training and development splits to support future research.

Anthology ID: 2026.sigturk-1.13
Volume: Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Kemal Oflazer, Abdullatif Köksal, Onur Varol
Venues: SIGTURK | WS
Publisher: Association for Computational Linguistics
Pages: 155–171
URL: https://aclanthology.org/2026.sigturk-1.13/
Cite (ACL): Burak Aktaş, Mehmet Can Baytekin, Süha Kağan Köse, Ömer İlbilgi, Elif Özge Yılmaz, Cagri Toraman, and Bilge Kaan Görür. 2026. BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to Turkish. In Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026), pages 155–171, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to Turkish (Aktaş et al., SIGTURK 2026)
PDF: https://aclanthology.org/2026.sigturk-1.13.pdf