Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples

Linlin Liu, Xin Li, Ruidan He, Lidong Bing, Shafiq Joty, Luo Si


Abstract
Knowledge-enhanced language representation learning has shown promising results across various knowledge-intensive NLP tasks. However, prior methods make limited use of multilingual knowledge graph (KG) data in language model (LM) pretraining: they typically inject knowledge indirectly, relying on extra entity/relation embeddings. In this work, we explore methods to better exploit the multilingual annotations and the language-agnostic property of KG triples, and present novel knowledge-based multilingual language models (KMLMs) trained directly on the knowledge triples. We first generate a large number of multilingual synthetic sentences from Wikidata KG triples. Then, based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks that enable the LMs not only to memorize factual knowledge but also to learn useful logical patterns. Our pretrained KMLMs demonstrate significant performance improvements on a wide range of knowledge-intensive cross-lingual tasks, including named entity recognition (NER), factual knowledge retrieval, relation classification, and a newly designed logical reasoning task.
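
To illustrate the kind of triple-to-sentence conversion the abstract describes, below is a minimal Python sketch. The toy triples, the label tables, and the code-switching step are illustrative assumptions, not the authors' actual Wikidata pipeline.

    # Minimal sketch: rendering KG triples as multilingual "synthetic sentences".
    # Triples, labels, and the code-switching step are hypothetical examples,
    # not the paper's actual generation procedure over Wikidata.

    import random

    # A toy triple (subject, relation, object) with Wikidata-style IDs.
    TRIPLES = [("Q90", "P17", "Q142")]  # (Paris, country, France)

    # Per-language surface labels for entities and relations.
    LABELS = {
        "en": {"Q90": "Paris", "P17": "country", "Q142": "France"},
        "de": {"Q90": "Paris", "P17": "Land", "Q142": "Frankreich"},
    }

    def triple_to_sentence(triple, lang):
        """Render one triple as a flat 'subject relation object' sentence."""
        s, r, o = triple
        labels = LABELS[lang]
        return f"{labels[s]} {labels[r]} {labels[o]} ."

    def code_switched_sentence(triple, langs):
        """Mix labels from several languages in one sentence, exploiting the
        language-agnostic structure of the triple."""
        s, r, o = triple
        return " ".join(LABELS[random.choice(langs)][x] for x in (s, r, o)) + " ."

    if __name__ == "__main__":
        for t in TRIPLES:
            print(triple_to_sentence(t, "en"))           # Paris country France .
            print(triple_to_sentence(t, "de"))           # Paris Land Frankreich .
            print(code_switched_sentence(t, ["en", "de"]))

Because the same triple yields parallel (and mixed-language) sentences across languages, pretraining tasks defined over such data can target factual memorization and cross-lingual pattern learning at once.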
Anthology ID: 2022.emnlp-main.462
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6878–6890
URL: https://aclanthology.org/2022.emnlp-main.462
DOI: 10.18653/v1/2022.emnlp-main.462
Cite (ACL): Linlin Liu, Xin Li, Ruidan He, Lidong Bing, Shafiq Joty, and Luo Si. 2022. Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6878–6890, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples (Liu et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.462.pdf