Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model

Mingqi Li; Fei Ding; Dan Zhang; Long Cheng; Hongxin Hu; Feng Luo

doi:10.18653/v1/2022.emnlp-main.202

Abstract

Pre-trained multilingual language models play an important role in cross-lingual natural language understanding tasks. However, existing methods do not focus on learning the semantic structure of representations, which limits their performance. In this paper, we propose Multi-level Multilingual Knowledge Distillation (MMKD), a novel method for improving multilingual language models. Specifically, we employ a teacher-student framework to adopt the rich semantic representation knowledge in English BERT. We propose token-, word-, sentence-, and structure-level alignment objectives to encourage multiple levels of consistency between source-target pairs and correlation similarity between teacher and student models. We conduct experiments on cross-lingual evaluation benchmarks including XNLI, PAWS-X, and XQuAD. Experimental results show that MMKD outperforms other baseline models of similar size on XNLI and XQuAD and obtains comparable performance on PAWS-X. In particular, MMKD obtains significant performance gains on low-resource languages.
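
As a rough illustration of what sentence-level and structure-level alignment between a teacher and a student could look like, the sketch below computes two toy losses over small embedding lists. It is not the paper's formulation: the abstract does not give the loss definitions, so the function names, the use of mean squared error, the use of pairwise cosine similarity as the "correlation" structure, and the assumption that teacher and student embeddings share one dimension are all hypothetical choices made here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mse(u, v):
    """Mean squared error between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities among the sentence embeddings of a batch."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


def sentence_level_loss(teacher_src, student_src, student_tgt):
    """Hypothetical sentence-level objective.

    Pulls the student's embeddings of the source sentence and of its
    translation toward the teacher's embedding of the source sentence.
    """
    total = 0.0
    for t, s, g in zip(teacher_src, student_src, student_tgt):
        total += mse(t, s) + mse(t, g)
    return total / len(teacher_src)


def structure_level_loss(teacher_src, student_tgt):
    """Hypothetical structure-level objective.

    Matches the student's within-batch similarity structure to the
    teacher's, rather than matching the embeddings themselves.
    """
    t_sim = similarity_matrix(teacher_src)
    s_sim = similarity_matrix(student_tgt)
    n = len(t_sim)
    return sum(
        (t_sim[i][j] - s_sim[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)


# Toy batch: three "sentence embeddings" from the teacher, and a student
# whose embeddings are the teacher's scaled by 2.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[2.0 * x for x in vec] for vec in teacher]

sent_loss = sentence_level_loss(teacher, student, student)
struct_loss = structure_level_loss(teacher, student)
```

In this toy batch the scaled student keeps the teacher's pairwise cosine structure, so the structure-level term is essentially zero while the sentence-level term is not. This shows why the two levels can carry different training signals under the assumptions above.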

Anthology ID: 2022.emnlp-main.202
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 3097–3106
URL: https://aclanthology.org/2022.emnlp-main.202
DOI: 10.18653/v1/2022.emnlp-main.202
Cite (ACL): Mingqi Li, Fei Ding, Dan Zhang, Long Cheng, Hongxin Hu, and Feng Luo. 2022. Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3097–3106, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model (Li et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.202.pdf
