How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis
Shaobo Li | Xiaoguang Li | Lifeng Shang | Zhenhua Dong | Chengjie Sun | Bingquan Liu | Zhenzhou Ji | Xin Jiang | Qun Liu
Findings of the Association for Computational Linguistics: ACL 2022
Recently, there has been a trend to investigate the factual knowledge captured by Pre-trained Language Models (PLMs). Many works show the PLMs’ ability to fill in the missing factual words in cloze-style prompts such as “Dante was born in [MASK].” However, it remains unclear how PLMs generate the correct results: do they rely on effective clues or on shortcut patterns? We try to answer this question with a causal-inspired analysis that quantitatively measures and evaluates the word-level patterns that PLMs depend on to generate the missing words. We examine words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurring. Our analysis shows that (1) PLMs generate the missing factual words by relying more on the positionally close and highly co-occurring words than on the knowledge-dependent words; (2) dependence on the knowledge-dependent words is more effective than dependence on the positionally close and highly co-occurring words. Accordingly, we conclude that PLMs capture factual knowledge ineffectively because they depend on inadequate associations.
Pre-training Language Models with Deterministic Factual Knowledge
Shaobo Li | Xiaoguang Li | Lifeng Shang | Chengjie Sun | Bingquan Liu | Zhenzhou Ji | Xin Jiang | Qun Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Previous works show that Pre-trained Language Models (PLMs) can capture factual knowledge. However, some analyses reveal that PLMs fail to do so robustly, e.g., they are sensitive to changes in the prompts used to extract factual knowledge. To mitigate this issue, we propose to let PLMs learn the deterministic relationship between the remaining context and the masked content. The deterministic relationship ensures that the masked factual content can be deterministically inferred from the existing clues in the context. This provides more stable patterns for PLMs to capture factual knowledge than random masking does. Two pre-training tasks are further introduced to motivate PLMs to rely on the deterministic relationship when filling masks. Specifically, we use an external Knowledge Base (KB) to identify deterministic relationships and continuously pre-train PLMs with the proposed methods. The factual knowledge probing experiments indicate that the continuously pre-trained PLMs capture factual knowledge more robustly. Further experiments on question-answering datasets show that learning a deterministic relationship with the proposed methods can also help other knowledge-intensive tasks.