@inproceedings{liu-etal-2025-just,
title = "Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions",
author = "Liu, Emmy and
Bertsch, Amanda and
Sutawika, Lintang and
Tjuatja, Lindia and
Fernandes, Patrick and
Marinov, Lara and
Chen, Michael and
Singhal, Shreya and
Lawrence, Carolin and
Raghunathan, Aditi and
Gashteovski, Kiril and
Neubig, Graham",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.830/",
pages = "16407--16438",
ISBN = "979-8-89176-332-6",
abstract = "Improvements in language model capabilities are often attributed to increasing model size or training data, but in some cases smaller models trained on curated data or with different architectural decisions can outperform larger ones trained on more tokens. What accounts for this? To quantify the impact of these design choices, we meta-analyze 92 open-source pretrained models across a wide array of scales, including state-of-the-art open-weights models as well as less performant models and those with less conventional design decisions. We find that by incorporating features besides model size and number of training tokens, we can achieve a relative 3-28{\%} increase in ability to predict downstream performance compared with using scale alone. Analysis of model design decisions reveal insights into data composition, such as the trade-off between language and code tasks at 15-25{\%} code, as well as the negative impact of web data on truthfulness. Broadly, our framework lays a foundation for more systematic investigation of how model development choices shape final capabilities."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="liu-etal-2025-just">
<titleInfo>
<title>Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions</title>
</titleInfo>
<name type="personal">
<namePart type="given">Emmy</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amanda</namePart>
<namePart type="family">Bertsch</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lintang</namePart>
<namePart type="family">Sutawika</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lindia</namePart>
<namePart type="family">Tjuatja</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Patrick</namePart>
<namePart type="family">Fernandes</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lara</namePart>
<namePart type="family">Marinov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Michael</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shreya</namePart>
<namePart type="family">Singhal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolin</namePart>
<namePart type="family">Lawrence</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aditi</namePart>
<namePart type="family">Raghunathan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kiril</namePart>
<namePart type="family">Gashteovski</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Graham</namePart>
<namePart type="family">Neubig</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Improvements in language model capabilities are often attributed to increasing model size or training data, but in some cases smaller models trained on curated data or with different architectural decisions can outperform larger ones trained on more tokens. What accounts for this? To quantify the impact of these design choices, we meta-analyze 92 open-source pretrained models across a wide array of scales, including state-of-the-art open-weights models as well as less performant models and those with less conventional design decisions. We find that by incorporating features besides model size and number of training tokens, we can achieve a relative 3-28% increase in ability to predict downstream performance compared with using scale alone. Analysis of model design decisions reveal insights into data composition, such as the trade-off between language and code tasks at 15-25% code, as well as the negative impact of web data on truthfulness. Broadly, our framework lays a foundation for more systematic investigation of how model development choices shape final capabilities.</abstract>
<identifier type="citekey">liu-etal-2025-just</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.830/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>16407</start>
<end>16438</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
%A Liu, Emmy
%A Bertsch, Amanda
%A Sutawika, Lintang
%A Tjuatja, Lindia
%A Fernandes, Patrick
%A Marinov, Lara
%A Chen, Michael
%A Singhal, Shreya
%A Lawrence, Carolin
%A Raghunathan, Aditi
%A Gashteovski, Kiril
%A Neubig, Graham
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F liu-etal-2025-just
%X Improvements in language model capabilities are often attributed to increasing model size or training data, but in some cases smaller models trained on curated data or with different architectural decisions can outperform larger ones trained on more tokens. What accounts for this? To quantify the impact of these design choices, we meta-analyze 92 open-source pretrained models across a wide array of scales, including state-of-the-art open-weights models as well as less performant models and those with less conventional design decisions. We find that by incorporating features besides model size and number of training tokens, we can achieve a relative 3-28% increase in ability to predict downstream performance compared with using scale alone. Analysis of model design decisions reveals insights into data composition, such as the trade-off between language and code tasks at 15-25% code, as well as the negative impact of web data on truthfulness. Broadly, our framework lays a foundation for more systematic investigation of how model development choices shape final capabilities.
%U https://aclanthology.org/2025.emnlp-main.830/
%P 16407-16438
Markdown (Informal)
[Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions](https://aclanthology.org/2025.emnlp-main.830/) (Liu et al., EMNLP 2025)
ACL
Emmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, Lara Marinov, Michael Chen, Shreya Singhal, Carolin Lawrence, Aditi Raghunathan, Kiril Gashteovski, and Graham Neubig. 2025. Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 16407–16438, Suzhou, China. Association for Computational Linguistics.