Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research

Richard Diehl Martinez; David Demitri Africa; Yuval Weiss; Suchir Salhan; Ryan Daniels; Paula Buttery

doi:10.18653/v1/2025.emnlp-demos.22

Abstract

Building language models (LMs), especially small and medium ones, remains more art than science. While large LMs often improve by sheer scale, it is still unclear why many design choices work. For small LMs, this uncertainty is more limiting: tight parameter budgets make each decision critical, yet researchers still lack systematic, scientific ways to test and refine new ideas. We introduce Pico, a lightweight, modular framework that enables systematic, hypothesis-driven research for small and medium-scale language model development. Pico consists of two libraries that together provide a practical sandbox where researchers can make targeted changes to a model’s architecture or training procedures and directly observe their effects on the model’s behavior. To support reproducible experimentation, we also release a suite of baseline models, pico-decoder, trained under standardized conditions and open-sourced for the community. Case studies highlight how Pico can support iterative small LM design and analysis.

Anthology ID: 2025.emnlp-demos.22
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: November
Year: 2025
Address: Suzhou, China
Editors: Ivan Habernal, Peter Schulam, Jörg Tiedemann
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 295–306
URL: https://aclanthology.org/2025.emnlp-demos.22/
DOI: 10.18653/v1/2025.emnlp-demos.22
Cite (ACL): Richard Diehl Martinez, David Demitri Africa, Yuval Weiss, Suchir Salhan, Ryan Daniels, and Paula Buttery. 2025. Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 295–306, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research (Diehl Martinez et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-demos.22.pdf
