A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations

Ziyi Yang, Yinfei Yang, Daniel Cer, Eric Darve


Abstract
Language agnosticism and the isolation of semantic and language information are an emerging research direction for multilingual representation models. We explore this problem from the novel angle of geometric algebra and semantic space. A simple but highly effective method, “Language Information Removal (LIR)”, factors out language identity information from semantically related components in multilingual representations pre-trained on multi-monolingual data. LIR is a post-training, model-agnostic method that uses only simple linear operations, e.g. matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of semantic spaces primarily encode language identity information. We first evaluate LIR on a cross-lingual question answer retrieval task (LAReQA), which requires strong alignment of the multilingual embedding space. Experiments show that LIR is highly effective on this task, yielding almost 100% relative improvement in mean average precision (MAP) for weak-alignment models. We then evaluate LIR on the Amazon Reviews and XEVAL datasets and observe that removing language information improves cross-lingual transfer performance.
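The abstract names only the ingredients of LIR (matrix factorization and orthogonal projection of principal components), so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the factorization is an SVD of a matrix of sentence embeddings from one language, and that the top k right singular vectors are the directions to project out. The function name `remove_language_components`, the parameter `k`, the absence of mean-centering, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def remove_language_components(embeddings, language_sample, k=1):
    """Project out the top-k principal directions of `language_sample`.

    embeddings:      (n, dim) array of vectors to clean.
    language_sample: (m, dim) array of embeddings from one language,
                     used to estimate the directions to remove.
    k:               number of directions to remove (assumed hyperparameter).
    """
    # Rows of vt are orthonormal right singular vectors, ordered by
    # decreasing singular value. No mean-centering is applied (assumption).
    _, _, vt = np.linalg.svd(language_sample, full_matrices=False)
    basis = vt[:k]  # shape (k, dim)
    # Orthogonal projection onto the complement of span(basis):
    # e - (e B^T) B for every row e.
    return embeddings - embeddings @ basis.T @ basis


# Synthetic illustration: random "semantic" vectors plus a large shared
# offset along one axis that plays the role of a language signal.
rng = np.random.default_rng(0)
dim = 8
semantic = rng.normal(size=(50, dim))
language_offset = np.zeros(dim)
language_offset[0] = 10.0
lang_a = semantic + language_offset

cleaned = remove_language_components(lang_a, lang_a, k=1)

# The component of the cleaned vectors along the removed direction is
# numerically zero.
_, _, vt_a = np.linalg.svd(lang_a, full_matrices=False)
residual = np.abs(cleaned @ vt_a[:1].T).max()
print(f"largest remaining component along removed direction: {residual:.2e}")
```

Because the operation is a fixed linear map applied after training, a sketch like this can be run on the output of any encoder without retraining it, which is the sense in which the abstract calls the method post-training and model-agnostic.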
Anthology ID:
2021.emnlp-main.470
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5825–5832
URL:
https://aclanthology.org/2021.emnlp-main.470
DOI:
10.18653/v1/2021.emnlp-main.470
Cite (ACL):
Ziyi Yang, Yinfei Yang, Daniel Cer, and Eric Darve. 2021. A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5825–5832, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations (Yang et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.470.pdf
Video:
https://aclanthology.org/2021.emnlp-main.470.mp4
Code:
ziyi-yang/lir
Data:
LAReQA, Wiki-40B