When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models

Shufan Chen; He Zheng; Lei Cui

doi:10.18653/v1/2025.findings-naacl.200

Abstract

Although large language models rely on parametric knowledge to achieve exceptional performance across various question-answering tasks, they still face challenges when addressing knowledge-based long-tail questions. Augmented generation techniques, such as chain-of-thought prompting and retrieval augmentation, can effectively enhance the ability of these models to answer long-tail questions. However, improving accuracy through augmented generation often results in significant latency within question-answering systems. This paper addresses the issue of “when and how to augment the input” by proposing an adaptive question routing framework. This framework employs a query router to select the most appropriate augmentation path at the right time, thereby enhancing both the accuracy and efficiency of question-answering systems. Extensive comparative experiments on benchmarks such as AmbigNQ, HotpotQA, MMLU-STEM, and PopQA demonstrate that our method surpasses existing approaches in both accuracy and efficiency. Furthermore, this paper introduces two metrics for evaluating adaptive question augmentation methods and presents a new benchmark for adaptive question augmentation, aiming to advance the field.

Anthology ID: 2025.findings-naacl.200
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3621–3634
URL: https://aclanthology.org/2025.findings-naacl.200/
DOI: 10.18653/v1/2025.findings-naacl.200
Cite (ACL): Shufan Chen, He Zheng, and Lei Cui. 2025. When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3621–3634, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models (Chen et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.200.pdf
