From Parameters to Performance: A Data-Driven Study on LLM Structure and Development

Suqing Wang; Zuchao Li; Shi Luohe; Bo Du; Hai Zhao; Yun Li; Qianren Wang

doi:10.18653/v1/2025.emnlp-main.1325

Abstract

Large language models (LLMs) have achieved remarkable success across various domains, driving significant technological advancements and innovations. Despite the rapid growth in model scale and capability, systematic, data-driven research on how structural configurations affect performance remains scarce. To address this gap, we present a large-scale dataset encompassing diverse open-source LLM structures and their performance across multiple benchmarks. Leveraging this dataset, we conduct a systematic, data mining-driven analysis to validate and quantify the relationship between structural configurations and performance. Our study begins with a review of the historical development of LLMs and an exploration of potential future trends. We then analyze how various structural choices impact performance across benchmarks and further corroborate our findings using mechanistic interpretability techniques. By providing data-driven insights into LLM optimization, our work aims to guide the targeted development and application of future models.

Anthology ID: 2025.emnlp-main.1325
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 26084–26101
URL: https://aclanthology.org/2025.emnlp-main.1325/
DOI: 10.18653/v1/2025.emnlp-main.1325
Cite (ACL): Suqing Wang, Zuchao Li, Shi Luohe, Bo Du, Hai Zhao, Yun Li, and Qianren Wang. 2025. From Parameters to Performance: A Data-Driven Study on LLM Structure and Development. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26084–26101, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): From Parameters to Performance: A Data-Driven Study on LLM Structure and Development (Wang et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1325.pdf
Checklist: 2025.emnlp-main.1325.checklist.pdf
