Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?

Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe


Abstract
Knowledge graphs (KGs) consist of links that describe relationships between entities. Because manually enumerating all relationships between entities is difficult, completing them automatically is essential for KGs. Knowledge Graph Completion (KGC) is the task of inferring unseen relationships between entities in a KG. Traditional embedding-based KGC methods (e.g., RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, and HousE) infer missing links using only the knowledge in the training data. In contrast, recent Pre-trained Language Model (PLM)-based KGC methods also draw on knowledge obtained during pre-training, which means they can estimate missing links between entities by reusing memorized knowledge from pre-training, without performing inference. This is problematic because the goal of building KGC models is to infer unseen links between entities. However, conventional KGC evaluations do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method that achieves high performance in current KGC evaluations may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets tailored to this analysis, and we conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from the textual information of entities and relations.
Anthology ID:
2024.naacl-long.447
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
8091–8106
URL:
https://aclanthology.org/2024.naacl-long.447
DOI:
10.18653/v1/2024.naacl-long.447
Cite (ACL):
Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, and Taro Watanabe. 2024. Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8091–8106, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion? (Sakai et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.447.pdf