pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah Goodman, Christopher Manning, Christopher Potts


Abstract
Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce pyvene, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. pyvene supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters. We show how pyvene provides a unified and extensible framework for performing interventions on neural models and sharing the intervened upon models with others. We illustrate the power of the library via interpretability analyses using causal abstraction and knowledge localization. We publish our library through Python Package Index (PyPI) and provide code, documentation, and tutorials at ‘https://github.com/stanfordnlp/pyvene‘.
Anthology ID:
2024.naacl-demo.16
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kai-Wei Chang, Annie Lee, Nazneen Rajani
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
158–165
Language:
URL:
https://aclanthology.org/2024.naacl-demo.16
DOI:
Bibkey:
Cite (ACL):
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah Goodman, Christopher Manning, and Christopher Potts. 2024. pyvene: A Library for Understanding and Improving PyTorch Models via Interventions. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations), pages 158–165, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions (Wu et al., NAACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.naacl-demo.16.pdf