Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

Junlin Wang; Siddhartha Jain; Dejiao Zhang; Baishakhi Ray; Varun Kumar; Ben Athiwaratkun

doi:10.18653/v1/2024.emnlp-main.1112

Abstract

A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models. However, in this paper, we point out that traditional evaluations which focus solely on performance metrics miss a key factor: the increased effectiveness due to additional compute. By overlooking this aspect, a skewed view of strategy efficiency is often presented. This paper introduces a framework that incorporates the compute budget into the evaluation, providing a more informative comparison that takes into account both performance metrics and computational cost. In this budget-aware perspective, we find that complex reasoning strategies often don’t surpass simpler baselines purely due to algorithmic ingenuity, but rather due to the larger computational resources allocated. When we provide a simple baseline like chain-of-thought self-consistency with comparable compute resources, it frequently outperforms reasoning strategies proposed in the literature. In this scale-aware perspective, we find that unlike self-consistency, certain strategies such as multi-agent debate or Reflexion can become worse if more compute budget is utilized.
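One way to read the budget-matched comparison the abstract describes is sketched below. It is a minimal sketch, not the authors' implementation: it assumes the budget is counted in generated tokens per question, that each chain-of-thought sample comes with its own token cost, and that self-consistency means a majority vote over final answers. The function name `self_consistency_under_budget` and the toy samples are hypothetical and are not results from the paper.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One sampled reasoning chain: (final answer, tokens spent on that sample).
Sample = Tuple[str, int]


def self_consistency_under_budget(
    samples: List[Sample], token_budget: int
) -> Tuple[Optional[str], int]:
    """Majority-vote over the samples that fit within token_budget.

    Samples are consumed in order; the loop stops at the first sample
    whose cost would push the running total over the budget.
    Returns (voted answer, tokens used), or (None, 0) if nothing fits.
    """
    votes: Counter = Counter()
    used = 0
    for answer, cost in samples:
        if used + cost > token_budget:
            break  # the next sample would exceed the budget
        votes[answer] += 1
        used += cost
    if not votes:
        return None, used
    # most_common(1) returns the answer with the highest vote count.
    return votes.most_common(1)[0][0], used


# Hypothetical toy data for illustration only (not from the paper).
toy_samples = [("18", 120), ("17", 150), ("18", 110), ("18", 130), ("17", 140)]

for budget in (150, 400, 700):
    print(budget, self_consistency_under_budget(toy_samples, budget))
```

Under this reading, a competing strategy would be run with the same `token_budget`, so that accuracy is compared at equal cost rather than at whatever cost each method happens to incur.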

Anthology ID: 2024.emnlp-main.1112
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 19916–19939
URL: https://aclanthology.org/2024.emnlp-main.1112/
DOI: 10.18653/v1/2024.emnlp-main.1112
Cite (ACL): Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, and Ben Athiwaratkun. 2024. Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 19916–19939, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies (Wang et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.1112.pdf
Software: 2024.emnlp-main.1112.software.zip
