Prix-LM: Pretraining for Multilingual Knowledge Base Construction

Wenxuan Zhou, Fangyu Liu, Ivan Vulić, Nigel Collier, Muhao Chen


Abstract
Knowledge bases (KBs) contain plenty of structured world and commonsense knowledge. As such, they often complement distributional text-based information and facilitate various downstream tasks. Since their manual construction is resource- and time-intensive, recent efforts have tried leveraging large pretrained language models (PLMs) to generate additional monolingual knowledge facts for KBs. However, such methods have not been attempted for building and enriching multilingual KBs. Besides wider application, such multilingual KBs can provide richer combined knowledge than monolingual (e.g., English) KBs. Knowledge expressed in different languages may be complementary and unequally distributed: this implies that the knowledge available in high-resource languages can be transferred to low-resource ones. To achieve this, it is crucial to represent multilingual knowledge in a shared/unified space. To this end, we propose a unified representation model, Prix-LM, for multilingual KB construction and completion. We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs, and tune a multilingual language encoder XLM-R via a causal language modeling objective. Prix-LM integrates useful multilingual and KB-based factual knowledge into a single model. Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness, with gains reported over strong task-specialised baselines.
Anthology ID:
2022.acl-long.371
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5412–5424
Language:
URL:
https://aclanthology.org/2022.acl-long.371
DOI:
10.18653/v1/2022.acl-long.371
Bibkey:
Cite (ACL):
Wenxuan Zhou, Fangyu Liu, Ivan Vulić, Nigel Collier, and Muhao Chen. 2022. Prix-LM: Pretraining for Multilingual Knowledge Base Construction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5412–5424, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Prix-LM: Pretraining for Multilingual Knowledge Base Construction (Zhou et al., ACL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.acl-long.371.pdf
Code
 luka-group/prix-lm
Data
DBpediaLAMAXL-BEL