@inproceedings{xu-etal-2025-ecotune,
title = "{E}co{T}une: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference",
author = "Xu, Yuebin and
Chen, Zhiyi and
Wen, Zeyi",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.394/",
pages = "7746--7756",
ISBN = "979-8-89176-332-6",
abstract = "Tuning inference hyperparameters, such as temperature and maximum output tokens, on downstream tasks can enhance inference performance. However, directly applying hyperparameter optimization to these hyperparameters is token-expensive. Multi-fidelity optimization improves HPO efficiency with low-fidelity evaluations, but its static scheduling strategies ignore token consumption, leading to high costs. To address these limitations, we propose a token-efficient multi-fidelity optimization method, which enhances inference performance and minimizes token usage. Our method is empowered by (i) a token-based fidelity definition with explicit token cost modeling on configurations; (ii) a novel Token-Aware Expected Improvement acquisition function that selects configurations based on performance gain per token; and (iii) a dynamic fidelity scheduling mechanism that adapts to real-time budget status. We evaluate our method on LLaMA-2 and LLaMA-3 series across MMLU, Humaneval, MedQA, and OpenBookQA. Our method improves over the HELM leaderboard by 7.1{\%}, 24.3{\%}, 21.9{\%}, and 4.6{\%}, respectively. Compared to existing multi-fidelity HPO baselines, our method reduces token consumption by over 80{\%} while maintaining or surpassing performance, demonstrating the state-of-the-art token efficiency for inference-time optimization."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="xu-etal-2025-ecotune">
<titleInfo>
<title>EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yuebin</namePart>
<namePart type="family">Xu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhiyi</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zeyi</namePart>
<namePart type="family">Wen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Tuning inference hyperparameters, such as temperature and maximum output tokens, on downstream tasks can enhance inference performance. However, directly applying hyperparameter optimization (HPO) to these hyperparameters is token-expensive. Multi-fidelity optimization improves HPO efficiency with low-fidelity evaluations, but its static scheduling strategies ignore token consumption, leading to high costs. To address these limitations, we propose a token-efficient multi-fidelity optimization method, which enhances inference performance and minimizes token usage. Our method is empowered by (i) a token-based fidelity definition with explicit token cost modeling on configurations; (ii) a novel Token-Aware Expected Improvement acquisition function that selects configurations based on performance gain per token; and (iii) a dynamic fidelity scheduling mechanism that adapts to real-time budget status. We evaluate our method on the LLaMA-2 and LLaMA-3 series across MMLU, HumanEval, MedQA, and OpenBookQA. Our method improves over the HELM leaderboard by 7.1%, 24.3%, 21.9%, and 4.6%, respectively. Compared to existing multi-fidelity HPO baselines, our method reduces token consumption by over 80% while maintaining or surpassing performance, demonstrating state-of-the-art token efficiency for inference-time optimization.</abstract>
<identifier type="citekey">xu-etal-2025-ecotune</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.394/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>7746</start>
<end>7756</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference
%A Xu, Yuebin
%A Chen, Zhiyi
%A Wen, Zeyi
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F xu-etal-2025-ecotune
%X Tuning inference hyperparameters, such as temperature and maximum output tokens, on downstream tasks can enhance inference performance. However, directly applying hyperparameter optimization (HPO) to these hyperparameters is token-expensive. Multi-fidelity optimization improves HPO efficiency with low-fidelity evaluations, but its static scheduling strategies ignore token consumption, leading to high costs. To address these limitations, we propose a token-efficient multi-fidelity optimization method, which enhances inference performance and minimizes token usage. Our method is empowered by (i) a token-based fidelity definition with explicit token cost modeling on configurations; (ii) a novel Token-Aware Expected Improvement acquisition function that selects configurations based on performance gain per token; and (iii) a dynamic fidelity scheduling mechanism that adapts to real-time budget status. We evaluate our method on the LLaMA-2 and LLaMA-3 series across MMLU, HumanEval, MedQA, and OpenBookQA. Our method improves over the HELM leaderboard by 7.1%, 24.3%, 21.9%, and 4.6%, respectively. Compared to existing multi-fidelity HPO baselines, our method reduces token consumption by over 80% while maintaining or surpassing performance, demonstrating state-of-the-art token efficiency for inference-time optimization.
%U https://aclanthology.org/2025.emnlp-main.394/
%P 7746-7756
Markdown (Informal)
[EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference](https://aclanthology.org/2025.emnlp-main.394/) (Xu et al., EMNLP 2025)
ACL
Yuebin Xu, Zhiyi Chen, and Zeyi Wen. 2025. [EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference](https://aclanthology.org/2025.emnlp-main.394/). In *Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages 7746–7756, Suzhou, China. Association for Computational Linguistics.
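
The abstract's key mechanism is ranking candidate configurations by expected performance gain per token rather than by raw expected improvement. The sketch below illustrates one plausible reading of that idea; it is inferred from the abstract alone, and the ratio form, the function names (`token_aware_ei`, `estimate_token_cost`), and the prompt-plus-max-output cost model are assumptions, not the paper's actual formulation.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """Standard EI for a Gaussian posterior (mu, sigma) at a candidate
    configuration, relative to the best observed score (maximization)."""
    if sigma <= 0.0:
        return 0.0
    imp = mu - best - xi
    z = imp / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return imp * cdf + sigma * pdf

def estimate_token_cost(config, n_examples):
    """Hypothetical token-cost model: prompt tokens plus the configured
    maximum output tokens, summed over the evaluation subset (the fidelity)."""
    return n_examples * (config["avg_prompt_tokens"] + config["max_output_tokens"])

def token_aware_ei(mu, sigma, best, config, n_examples):
    """Expected improvement per token: one plausible reading of the
    abstract's 'performance gain per token' selection criterion."""
    return expected_improvement(mu, sigma, best) / estimate_token_cost(config, n_examples)

# A cheap configuration can outrank a slightly more promising but far
# more token-hungry one under this criterion.
cheap = {"avg_prompt_tokens": 200, "max_output_tokens": 64}
costly = {"avg_prompt_tokens": 200, "max_output_tokens": 1024}
print(token_aware_ei(0.62, 0.05, best=0.60, config=cheap, n_examples=50))
print(token_aware_ei(0.64, 0.05, best=0.60, config=costly, n_examples=50))
```

Under a criterion like this, the low-`max_output_tokens` configuration wins despite its slightly lower predicted score, which matches the token-efficiency behavior the abstract attributes to the Token-Aware Expected Improvement acquisition function.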