From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models

Viktor Hangya; Fabian Küch; Darina Gold

doi:10.18653/v1/2025.emnlp-main.1148

Abstract

Iterative evaluation of LLMs during training is essential to ensure expected capability development, but can be time- and compute-intensive. While NLU tasks, where the model selects from fixed answer choices, are cheap to evaluate, essential capabilities like reasoning and code generation rely on the more time-consuming NLG (token-by-token generation) format. In this work, our aim is to decrease the computational burden of NLG benchmarks in order to enable monitoring crucial LLM capabilities during model training. We reformulate generative tasks into computationally cheaper NLU alternatives. We test the performance correlation between the original and reformulated tasks using 8 LMs of various sizes and 4 capabilities: mathematical reasoning, code generation, factual knowledge and reading comprehension. Our results show a strong correlation between task formats, supporting capability assessment via cheaper alternatives and achieving over 35x average reduction in evaluation time. Our project is available at: https://github.com/Fraunhofer-IIS/EvalShortcut
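
The correlation test described in the abstract can be illustrated with a minimal sketch. It assumes one score per model for each task format and compares the two score lists with Pearson and Spearman coefficients. The abstract does not state which correlation measure the authors use, so both are shown as common choices. The function names and the scores in the example are hypothetical placeholders, not values from the paper or its repository.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Invented placeholder accuracies for 8 hypothetical models,
    # NOT results from the paper.
    nlg_scores = [0.12, 0.18, 0.25, 0.31, 0.40, 0.47, 0.55, 0.61]
    nlu_scores = [0.30, 0.33, 0.41, 0.44, 0.52, 0.60, 0.63, 0.70]
    print("Pearson:", pearson(nlg_scores, nlu_scores))
    print("Spearman:", spearman(nlg_scores, nlu_scores))
```

A high coefficient on real per-model scores would indicate that the cheaper format ranks models similarly to the generative one, which is the property the abstract relies on for monitoring during training.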

Anthology ID: 2025.emnlp-main.1148
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 22565–22581
URL: https://aclanthology.org/2025.emnlp-main.1148/
DOI: 10.18653/v1/2025.emnlp-main.1148
Cite (ACL): Viktor Hangya, Fabian Küch, and Darina Gold. 2025. From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22565–22581, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models (Hangya et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1148.pdf
Checklist: 2025.emnlp-main.1148.checklist.pdf