An Adversarial Example for Direct Logit Attribution: Memory Management in GELU-4L

Authors: Jett Janiak, Can Rager, James Dao, Yeu-Tong Lau
Published in: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Editors: Yonatan Belinkov, Najoung Kim, Jaap Jumelet, Hosein Mohebbi, Aaron Mueller, Hanjie Chen
Publisher: Association for Computational Linguistics
Location: Miami, Florida, US
Date: 2024-11
Pages: 232-237
Type: conference publication
Citation key: janiak-etal-2024-adversarial
DOI: 10.18653/v1/2024.blackboxnlp-1.15
URL: https://aclanthology.org/2024.blackboxnlp-1.15/