Arthur Conmy
2024
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Tom Lieberum | Senthooran Rajamanoharan | Arthur Conmy | Lewis Smith | Nicolas Sonnerat | Vikrant Varma | Janos Kramar | Anca Dragan | Rohin Shah | Neel Nanda
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Copy Suppression: Comprehensively Understanding a Motif in Language Model Attention Heads
Callum Stuart McDougall | Arthur Conmy | Cody Rushing | Thomas McGrath | Neel Nanda
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed | Can Rager | Arthur Conmy
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Co-authors
- Neel Nanda (2)
- Tom Lieberum (1)
- Senthooran Rajamanoharan (1)
- Lewis Smith (1)
- Nicolas Sonnerat (1)