Correcting Language Model Outputs by Editing Salient Layers

Kshitij Mishra, Tamer Soliman, Anil Ramakrishna, Aram Galstyan, Anoop Kumar


Abstract
Large language models can accumulate incorrect or outdated knowledge as the real world evolves. Compared to typical solutions such as retraining or retrieval-augmented generation, model editing offers an effective yet low-cost way to address this issue. However, existing model editing algorithms rely on manual selection of edit layers, which requires prior domain knowledge, or on expensive architecture-specific empirical layer selection methods such as causal tracing. In this work, we propose SaLEM (Salient Layers Editing Model), an efficient solution for data-driven layer selection for the model editing task. Our solution uses layer-wise saliency maps for layer selection and matches the accuracy of prior approaches with only 1/3 of their edits, enabling efficient updates to the parametric knowledge in large language models.
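
The abstract does not spell out how the layer-wise saliency maps are computed, so the following is only a minimal sketch of one common reading of saliency-based layer selection: score each layer by the norm of the loss gradient with respect to its parameters, then pick the top-scoring layers as edit candidates. The toy network, the finite-difference gradient, and the names `layer_saliency` and `select_salient_layers` are all assumptions made for illustration and are not taken from the paper.

```python
import math

def forward(layers, x):
    """Toy network (an assumption): each layer is [weight, bias], applied as tanh(w*h + b)."""
    h = x
    for w, b in layers:
        h = math.tanh(w * h + b)
    return h

def loss(layers, x, target):
    """Squared error between the toy network output and the desired (edited) target."""
    return (forward(layers, x) - target) ** 2

def layer_saliency(layers, x, target, eps=1e-5):
    """L2 norm of the loss gradient per layer, estimated with central differences."""
    scores = []
    for i in range(len(layers)):
        sq = 0.0
        for j in range(len(layers[i])):
            plus = [list(p) for p in layers]
            minus = [list(p) for p in layers]
            plus[i][j] += eps
            minus[i][j] -= eps
            g = (loss(plus, x, target) - loss(minus, x, target)) / (2 * eps)
            sq += g * g
        scores.append(math.sqrt(sq))
    return scores

def select_salient_layers(scores, k):
    """Indices of the k highest-scoring layers, most salient first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical three-layer model and a single "edit" example (input, desired output).
layers = [[0.5, 0.0], [1.2, -0.1], [0.8, 0.3]]
scores = layer_saliency(layers, x=1.0, target=-0.5)
edit_layers = select_salient_layers(scores, k=1)
```

In a real model the gradients would come from automatic differentiation rather than finite differences, and the scoring rule actually used by SaLEM may differ from the gradient norm assumed here.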
Anthology ID:
2024.findings-eacl.86
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1295–1305
URL:
https://aclanthology.org/2024.findings-eacl.86
Cite (ACL):
Kshitij Mishra, Tamer Soliman, Anil Ramakrishna, Aram Galstyan, and Anoop Kumar. 2024. Correcting Language Model Outputs by Editing Salient Layers. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1295–1305, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Correcting Language Model Outputs by Editing Salient Layers (Mishra et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-eacl.86.pdf