Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification

Siyu Lai; Hui Huang; Dong Jing; Yufeng Chen; Jinan Xu (徐金安); Jian Liu

doi:10.18653/v1/2021.findings-emnlp.55

Abstract

Recent multilingual pre-trained models, such as XLM-RoBERTa (XLM-R), have proven effective on many cross-lingual tasks. However, gaps remain between the contextualized representations of similar words in different languages. To address this problem, we propose a novel framework named Multi-View Mixed Language Training (MVMLT), which leverages code-switched data with multi-view learning to fine-tune XLM-R. MVMLT uses gradient-based saliency to extract the keywords most relevant to the downstream task and dynamically replaces them with the corresponding words in the target language. Furthermore, MVMLT utilizes multi-view learning to encourage contextualized embeddings to align in a more refined language-invariant space. Extensive experiments with four languages show that our model achieves state-of-the-art results on zero-shot cross-lingual sentiment classification and dialogue state tracking tasks.
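
The keyword replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `code_switch_by_saliency`, the `ratio` parameter, and the toy lexicon are all hypothetical, and the saliency scores are passed in as plain numbers rather than computed from model gradients as MVMLT does.

```python
from typing import Dict, List, Sequence


def code_switch_by_saliency(
    tokens: Sequence[str],
    saliency: Sequence[float],
    lexicon: Dict[str, str],
    ratio: float = 0.3,
) -> List[str]:
    """Replace the most salient tokens with their target-language entries.

    Illustrative only: `ratio` and the lexicon lookup are assumptions,
    not details taken from the paper.
    """
    if len(tokens) != len(saliency):
        raise ValueError("tokens and saliency must have the same length")
    # Number of tokens to try to replace (hypothetical hyperparameter).
    k = max(1, round(len(tokens) * ratio)) if tokens else 0
    # Indices sorted by descending saliency; ties keep the original order.
    ranked = sorted(range(len(tokens)), key=lambda i: -saliency[i])
    chosen = set(ranked[:k])
    # Selected tokens missing from the lexicon are left unchanged.
    return [
        lexicon.get(tok.lower(), tok) if i in chosen else tok
        for i, tok in enumerate(tokens)
    ]


tokens = ["the", "movie", "was", "wonderful"]
saliency = [0.05, 0.40, 0.02, 0.90]  # made-up scores for illustration
lexicon = {"movie": "película", "wonderful": "maravillosa"}  # toy entries
switched = code_switch_by_saliency(tokens, saliency, lexicon, ratio=0.5)
```

In this toy run the two highest-scoring tokens are swapped and the function words stay in the source language, which is the kind of code-switched input the abstract describes feeding to XLM-R during fine-tuning.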

Anthology ID: 2021.findings-emnlp.55
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 599–610
URL: https://aclanthology.org/2021.findings-emnlp.55/
DOI: 10.18653/v1/2021.findings-emnlp.55
Cite (ACL): Siyu Lai, Hui Huang, Dong Jing, Yufeng Chen, Jinan Xu, and Jian Liu. 2021. Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 599–610, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification (Lai et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.55.pdf
Video: https://aclanthology.org/2021.findings-emnlp.55.mp4