CoDEx: A Comprehensive Knowledge Graph Completion Benchmark

Tara Safavi, Danai Koutra


Abstract
We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at https://bit.ly/2EPbrJs.
Anthology ID:
2020.emnlp-main.669
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8328–8350
URL:
https://aclanthology.org/2020.emnlp-main.669
DOI:
10.18653/v1/2020.emnlp-main.669
Cite (ACL):
Tara Safavi and Danai Koutra. 2020. CoDEx: A Comprehensive Knowledge Graph Completion Benchmark. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8328–8350, Online. Association for Computational Linguistics.
Cite (Informal):
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark (Safavi & Koutra, EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.669.pdf
Code
tsafavi/codex
Data
CoDEx Large, CoDEx Medium, CoDEx Small, FB15k-237, NELL-995