OmniOData: Unleashing Small Language Models for OData Query Generation with Synthetic Data and Reinforcement Learning

Tao Bai; Zhaochen Li; Hongxin Shao; Daniel Dahlmeier

Abstract

Despite the success of Large Language Models (LLMs) in structured query generation, OData, a critical RESTful protocol for enterprise APIs, remains under-researched due to a lack of high-fidelity, execution-validated datasets. To bridge this gap, we introduce OmniOData, a framework that generates SynOData, the first large-scale OData corpus featuring execution-grounded queries and reasoning traces. Using this corpus, we develop OmniOData-R1 (1.5B–3B parameters), a family of models that match or surpass frontier proprietary systems, such as GPT-4o and Gemini 3, on realistic industrial benchmarks. Our results demonstrate that the synergy of execution-verified synthetic data and Reinforcement Learning (RL) effectively unlocks the latent reasoning of Small Language Models (SLMs), providing a high-performance, low-latency solution for specialized enterprise query generation. The code and data will be released under an open-source license.

Anthology ID: 2026.acl-industry.119
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Yunyao Li, Georg Rehm, Mei Tu
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1738–1754
URL: https://aclanthology.org/2026.acl-industry.119/
Cite (ACL): Tao Bai, Zhaochen Li, Hongxin Shao, and Daniel Dahlmeier. 2026. OmniOData: Unleashing Small Language Models for OData Query Generation with Synthetic Data and Reinforcement Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1738–1754, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): OmniOData: Unleashing Small Language Models for OData Query Generation with Synthetic Data and Reinforcement Learning (Bai et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-industry.119.pdf
