@inproceedings{zhao-zhu-2025-skyllm,
title = "{S}ky{LLM}: Cross-{LLM}-{API}s Federation for Cost-effective Query Processing",
author = "Zhao, Heng and
Zhu, Yifei",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1073/",
doi = "10.18653/v1/2025.findings-acl.1073",
pages = "20864--20873",
ISBN = "979-8-89176-256-5",
abstract = "Large language models (LLMs) have demonstrated exceptional capabilities across a wide range of tasks, from text generation to complex problem-solving. LLM APIs provide easy access to these models by streamlining deployment and usage. Combining LLMs with complementary strengths has been shown to yield substantial performance gains over a monolithic LLM. However, invoking a fixed set of LLM APIs for each query incurs higher API costs and increased inference latency. To address these limitations, we propose SkyLLM, a system composed of a set of estimators and an API selector, which federates multiple LLM APIs and dynamically assigns a non-empty subset of these APIs to each query prior to inference under cost and latency budgets. The selected subset consists of either a single LLM or multiple LLMs. A single LLM efficiently handles simple queries at low cost, whereas multiple LLMs are employed for more complex queries to overcome performance limitations. We evaluate SkyLLM against individual LLMs and representative ensemble LLM methods from the literature. SkyLLM achieves the highest accuracy under a high budget. It can also be cost-effective, matching the most accurate individual LLM while cutting costs by 67.8{\%}."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="zhao-zhu-2025-skyllm">
<titleInfo>
<title>SkyLLM: Cross-LLM-APIs Federation for Cost-effective Query Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Heng</namePart>
<namePart type="family">Zhao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yifei</namePart>
<namePart type="family">Zhu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>Large language models (LLMs) have demonstrated exceptional capabilities across a wide range of tasks, from text generation to complex problem-solving. LLM APIs provide easy access to these models by streamlining deployment and usage. Combining LLMs with complementary strengths has been shown to yield substantial performance gains over a monolithic LLM. However, invoking a fixed set of LLM APIs for each query incurs higher API costs and increased inference latency. To address these limitations, we propose SkyLLM, a system composed of a set of estimators and an API selector, which federates multiple LLM APIs and dynamically assigns a non-empty subset of these APIs to each query prior to inference under cost and latency budgets. The selected subset consists of either a single LLM or multiple LLMs. A single LLM efficiently handles simple queries at low cost, whereas multiple LLMs are employed for more complex queries to overcome performance limitations. We evaluate SkyLLM against individual LLMs and representative ensemble LLM methods from the literature. SkyLLM achieves the highest accuracy under a high budget. It can also be cost-effective, matching the most accurate individual LLM while cutting costs by 67.8%.</abstract>
<identifier type="citekey">zhao-zhu-2025-skyllm</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.1073</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.1073/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>20864</start>
<end>20873</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T SkyLLM: Cross-LLM-APIs Federation for Cost-effective Query Processing
%A Zhao, Heng
%A Zhu, Yifei
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F zhao-zhu-2025-skyllm
%X Large language models (LLMs) have demonstrated exceptional capabilities across a wide range of tasks, from text generation to complex problem-solving. LLM APIs provide easy access to these models by streamlining deployment and usage. Combining LLMs with complementary strengths has been shown to yield substantial performance gains over a monolithic LLM. However, invoking a fixed set of LLM APIs for each query incurs higher API costs and increased inference latency. To address these limitations, we propose SkyLLM, a system composed of a set of estimators and an API selector, which federates multiple LLM APIs and dynamically assigns a non-empty subset of these APIs to each query prior to inference under cost and latency budgets. The selected subset consists of either a single LLM or multiple LLMs. A single LLM efficiently handles simple queries at low cost, whereas multiple LLMs are employed for more complex queries to overcome performance limitations. We evaluate SkyLLM against individual LLMs and representative ensemble LLM methods from the literature. SkyLLM achieves the highest accuracy under a high budget. It can also be cost-effective, matching the most accurate individual LLM while cutting costs by 67.8%.
%R 10.18653/v1/2025.findings-acl.1073
%U https://aclanthology.org/2025.findings-acl.1073/
%U https://doi.org/10.18653/v1/2025.findings-acl.1073
%P 20864-20873
Markdown (Informal)
[SkyLLM: Cross-LLM-APIs Federation for Cost-effective Query Processing](https://aclanthology.org/2025.findings-acl.1073/) (Zhao & Zhu, Findings 2025)
ACL
Heng Zhao and Yifei Zhu. 2025. SkyLLM: Cross-LLM-APIs Federation for Cost-effective Query Processing. In Findings of the Association for Computational Linguistics: ACL 2025, pages 20864–20873, Vienna, Austria. Association for Computational Linguistics.
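
For readers skimming this record, a toy sketch of the selection idea described in the abstract may help: choose a non-empty subset of LLM APIs whose estimated cost and latency fit the given budgets, scoring feasible subsets by predicted accuracy. Everything below (the API names, the per-API estimates, and the combination rule) is hypothetical and purely illustrative; it is not SkyLLM's actual estimators or selector.

```python
# Hypothetical sketch: budget-constrained selection of a non-empty subset of
# LLM APIs, as outlined in the paper's abstract. Names and numbers are made up.
from itertools import chain, combinations
from typing import Dict, Iterable, Tuple

# Illustrative per-API estimates for one query: (accuracy, cost in $, latency in s).
ESTIMATES: Dict[str, Tuple[float, float, float]] = {
    "api_a": (0.78, 0.002, 0.9),
    "api_b": (0.83, 0.010, 1.4),
    "api_c": (0.71, 0.001, 0.6),
}

def non_empty_subsets(apis: Iterable[str]):
    """Yield every non-empty subset of the given APIs."""
    apis = list(apis)
    return chain.from_iterable(combinations(apis, k) for k in range(1, len(apis) + 1))

def select_apis(cost_budget: float, latency_budget: float) -> Tuple[str, ...]:
    """Return the feasible subset with the highest toy ensemble score."""
    best, best_score = (), -1.0
    for subset in non_empty_subsets(ESTIMATES):
        cost = sum(ESTIMATES[a][1] for a in subset)     # costs accumulate
        latency = max(ESTIMATES[a][2] for a in subset)  # calls assumed parallel
        if cost > cost_budget or latency > latency_budget:
            continue
        # Toy score: probability that at least one selected API answers correctly.
        miss = 1.0
        for a in subset:
            miss *= 1.0 - ESTIMATES[a][0]
        score = 1.0 - miss
        if score > best_score:
            best, best_score = subset, score
    return best

if __name__ == "__main__":
    # With a looser budget, multiple APIs are selected; with a tight one, a
    # single cheap API wins -- the trade-off the abstract describes.
    print(select_apis(cost_budget=0.012, latency_budget=1.5))
    print(select_apis(cost_budget=0.002, latency_budget=1.0))
```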