Characterizing Mechanisms for Factual Recall in Language Models

Qinan Yu, Jack Merullo, Ellie Pavlick


Abstract
Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., “The capital of Poland is London”) to overwrite what it learned in pretraining (“Warsaw”). On Pythia and GPT2, the training frequency of both the query country (“Poland”) and the in-context city (“London”) strongly affects the models’ likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that promote either the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88% simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components, and provides a proof of concept for how future methods might control model behavior dynamically at runtime.
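The runtime intervention the abstract describes, scaling one attention head's value vectors to steer the model toward the in-context answer, can be sketched with a PyTorch forward hook on GPT-2. This is a minimal illustration, not the authors' released implementation: the layer index, head index, and scale factor below are hypothetical placeholders rather than values reported in the paper.

```python
# Minimal sketch: scale the value vectors of a single GPT-2 attention head at runtime.
# LAYER, HEAD, and SCALE are illustrative placeholders, not values from the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

LAYER, HEAD, SCALE = 9, 8, 5.0  # hypothetical choices

def scale_head_values(module, inputs, output):
    # GPT-2's c_attn projects to concatenated [q, k, v], each of size hidden_size;
    # the value vectors for one head occupy a contiguous head_dim-sized slice of v.
    hidden = model.config.hidden_size
    head_dim = hidden // model.config.num_attention_heads
    start = 2 * hidden + HEAD * head_dim
    output = output.clone()
    output[..., start:start + head_dim] *= SCALE
    return output  # returned tensor replaces the module's original output

# Hook the chosen layer's qkv projection, then run the counterfactual prompt.
handle = model.transformer.h[LAYER].attn.c_attn.register_forward_hook(scale_head_values)
prompt = "The capital of Poland is London. The capital of Poland is"
ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]
print(tokenizer.decode([logits.argmax().item()]))
handle.remove()
```

Because each head's output is linear in its value vectors, multiplying that slice by a scalar directly scales the head's contribution to the residual stream, which is the lever the paper uses to trade off the memorized and in-context answers.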
Anthology ID:
2023.emnlp-main.615
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9924–9959
URL:
https://aclanthology.org/2023.emnlp-main.615
DOI:
10.18653/v1/2023.emnlp-main.615
Cite (ACL):
Qinan Yu, Jack Merullo, and Ellie Pavlick. 2023. Characterizing Mechanisms for Factual Recall in Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9924–9959, Singapore. Association for Computational Linguistics.
Cite (Informal):
Characterizing Mechanisms for Factual Recall in Language Models (Yu et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.615.pdf
Video:
https://aclanthology.org/2023.emnlp-main.615.mp4