%0 Conference Proceedings
%T How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis
%A Li, Shaobo
%A Li, Xiaoguang
%A Shang, Lifeng
%A Dong, Zhenhua
%A Sun, Chengjie
%A Liu, Bingquan
%A Ji, Zhenzhou
%A Jiang, Xin
%A Liu, Qun
%Y Muresan, Smaranda
%Y Nakov, Preslav
%Y Villavicencio, Aline
%S Findings of the Association for Computational Linguistics: ACL 2022
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F li-etal-2022-pre
%X Recently, there has been a trend to investigate the factual knowledge captured by Pre-trained Language Models (PLMs). Many works have shown PLMs’ ability to fill in missing factual words in cloze-style prompts such as “Dante was born in [MASK].” However, it remains unclear how PLMs generate correct answers: do they rely on effective clues or on shortcut patterns? We address this question with a causal-inspired analysis that quantitatively measures and evaluates the word-level patterns that PLMs depend on to generate the missing words. We examine words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurring. Our analysis shows that (1) PLMs generate the missing factual words by relying more on positionally close and highly co-occurring words than on knowledge-dependent words, and (2) dependence on knowledge-dependent words is more effective than dependence on positionally close and highly co-occurring words. Accordingly, we conclude that PLMs capture factual knowledge ineffectively because they depend on inadequate associations.
%R 10.18653/v1/2022.findings-acl.136
%U https://aclanthology.org/2022.findings-acl.136
%U https://doi.org/10.18653/v1/2022.findings-acl.136
%P 1720-1732