Neel Nanda
2024
Language Models Linearly Represent Sentiment
Oskar John Hollinsworth | Curt Tigges | Atticus Geiger | Neel Nanda
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Tom Lieberum | Senthooran Rajamanoharan | Arthur Conmy | Lewis Smith | Nicolas Sonnerat | Vikrant Varma | Janos Kramar | Anca Dragan | Rohin Shah | Neel Nanda
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Copy Suppression: Comprehensively Understanding a Motif in Language Model Attention Heads
Callum Stuart McDougall | Arthur Conmy | Cody Rushing | Thomas McGrath | Neel Nanda
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
Neel Nanda | Andrew Lee | Martin Wattenberg
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Co-authors
- Arthur Conmy 2
- Andrew Lee 1
- Martin Wattenberg 1
- Oskar John Hollinsworth 1
- Curt Tigges 1