Probing Language Models on Their Knowledge Source

Zineddine Tighidet, Jiali Mei, Benjamin Piwowarski, Patrick Gallinari


Abstract
Large Language Models (LLMs) often encounter conflicts between their learned internal knowledge (parametric knowledge, PK) and external knowledge provided at inference time (contextual knowledge, CK). Understanding how LLMs prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between PK and CK in LLMs. Using controlled prompts designed to contradict the model’s PK, we demonstrate that specific model activations are indicative of the knowledge source employed. We evaluate this framework on various LLMs of different sizes and demonstrate that mid-layer activations, particularly those related to relations in the input, are crucial in predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.
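The probing setup described above can be illustrated with a minimal sketch: train a linear classifier on mid-layer activations to predict whether the model answered from PK or CK. The activations, labels, dimensions, and training loop below are synthetic stand-ins for illustration only, not the paper's actual data or method details.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_prompts = 64, 400  # stand-ins for the LLM hidden size / dataset size

# Synthetic mid-layer activations (e.g. at the relation token position)
X = rng.normal(size=(n_prompts, hidden_dim))
y = rng.integers(0, 2, size=n_prompts)  # 0 = answered from PK, 1 = from CK
X[y == 1, 0] += 2.0  # inject a weak separable signal so the probe can learn

# Simple train/test split
split = n_prompts * 3 // 4
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

# Logistic-regression probe trained with plain gradient descent
w, b, lr = np.zeros(hidden_dim), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))  # predicted P(CK)
    w -= lr * (X_tr.T @ (p - y_tr)) / len(y_tr)
    b -= lr * np.mean(p - y_tr)

preds = (X_te @ w + b) > 0
accuracy = np.mean(preds == y_te)
print(f"probe accuracy: {accuracy:.2f}")
```

In the paper's framework, such a probe would be fit per layer; above-chance accuracy at a given layer indicates that its activations carry information about the knowledge source the model will use.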
Anthology ID:
2024.blackboxnlp-1.35
Volume:
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Yonatan Belinkov, Najoung Kim, Jaap Jumelet, Hosein Mohebbi, Aaron Mueller, Hanjie Chen
Venue:
BlackboxNLP
Publisher:
Association for Computational Linguistics
Pages:
604–614
URL:
https://aclanthology.org/2024.blackboxnlp-1.35
Cite (ACL):
Zineddine Tighidet, Jiali Mei, Benjamin Piwowarski, and Patrick Gallinari. 2024. Probing Language Models on Their Knowledge Source. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 604–614, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
Probing Language Models on Their Knowledge Source (Tighidet et al., BlackboxNLP 2024)
PDF:
https://aclanthology.org/2024.blackboxnlp-1.35.pdf