@inproceedings{daiber-etal-2025-dispatchqa,
title = "{D}ispatch{QA}: A Benchmark for Small Function Calling Language Models in {E}-Commerce Applications",
author = "Daiber, Joachim and
Maricato, Victor and
Sinha, Ayan and
Rabinovich, Andrew",
editor = "Potdar, Saloni and
Rojas-Barahona, Lina and
Montella, Sebastien",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
month = nov,
year = "2025",
address = "Suzhou (China)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-industry.154/",
pages = "2221--2233",
ISBN = "979-8-89176-333-3",
abstract = "We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open{-}ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs' impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data and find that fine-tuned SLMs produce search quality comparable or better than large language models such as GPT-4o while achieving up to 3{\texttimes} faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource{-}efficient query understanding."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="daiber-etal-2025-dispatchqa">
<titleInfo>
<title>DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications</title>
</titleInfo>
<name type="personal">
<namePart type="given">Joachim</namePart>
<namePart type="family">Daiber</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Victor</namePart>
<namePart type="family">Maricato</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ayan</namePart>
<namePart type="family">Sinha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andrew</namePart>
<namePart type="family">Rabinovich</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track</title>
</titleInfo>
<name type="personal">
<namePart type="given">Saloni</namePart>
<namePart type="family">Potdar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lina</namePart>
<namePart type="family">Rojas-Barahona</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sebastien</namePart>
<namePart type="family">Montella</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou (China)</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-333-3</identifier>
</relatedItem>
<abstract>We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open-ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs’ impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data and find that fine-tuned SLMs produce search quality comparable or better than large language models such as GPT-4o while achieving up to 3× faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource-efficient query understanding.</abstract>
<identifier type="citekey">daiber-etal-2025-dispatchqa</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-industry.154/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>2221</start>
<end>2233</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications
%A Daiber, Joachim
%A Maricato, Victor
%A Sinha, Ayan
%A Rabinovich, Andrew
%Y Potdar, Saloni
%Y Rojas-Barahona, Lina
%Y Montella, Sebastien
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou (China)
%@ 979-8-89176-333-3
%F daiber-etal-2025-dispatchqa
%X We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open-ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs’ impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data and find that fine-tuned SLMs produce search quality comparable or better than large language models such as GPT-4o while achieving up to 3× faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource-efficient query understanding.
%U https://aclanthology.org/2025.emnlp-industry.154/
%P 2221-2233
Markdown (Informal)
[DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications](https://aclanthology.org/2025.emnlp-industry.154/) (Daiber et al., EMNLP 2025)
ACL