Quantized Wasserstein Procrustes Alignment of Word Embedding Spaces

Prince O Aboagye; Yan Zheng; Michael Yeh; Junpeng Wang; Zhongfang Zhuang; Huiyuan Chen; Liang Wang; Wei Zhang; Jeff Phillips

Abstract

Motivated by the widespread interest in the cross-lingual transfer of NLP models from high-resource to low-resource languages, research on cross-lingual word embeddings (CLWEs) has gained much popularity over the years. Among the most successful and attractive CLWE models are the unsupervised ones. These models pose the alignment task as a Wasserstein-Procrustes problem that aims to jointly estimate a permutation matrix and an orthogonal matrix. Most existing unsupervised CLWE models resort to optimal transport (OT) methods to estimate the permutation matrix. However, linear programming algorithms and approximate OT solvers via Sinkhorn for computing the permutation matrix scale cubically and quadratically, respectively, in the input size. This makes it impractical to compute OT distances exactly for larger sample sizes, resulting in a poor approximation of the permutation matrix and, subsequently, a less robust learned transfer function or mapper. This paper proposes an unsupervised projection-based CLWE model called quantized Wasserstein Procrustes (qWP) that jointly estimates a permutation matrix and an orthogonal matrix. qWP relies on a quantization step to estimate the permutation matrix between two probability distributions or measures. This approach substantially improves the approximation quality of empirical OT solvers at a fixed computational cost. We demonstrate that qWP achieves state-of-the-art results on the bilingual lexicon induction (BLI) task.
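
For readers unfamiliar with the Wasserstein-Procrustes problem named in the abstract, the block below writes out its standard form together with the two alternating subproblems it splits into. The notation is an assumption introduced here for illustration and does not come from the abstract: X and Y are n × d matrices of source and target embeddings, O_d is the set of d × d orthogonal matrices, and P_n is the set of n × n permutation matrices. It is a sketch of the general problem, not of the qWP algorithm itself.

```latex
% Assumed notation: X, Y \in \mathbb{R}^{n \times d} (source and target embeddings),
% \mathcal{O}_d = orthogonal d x d matrices, \mathcal{P}_n = n x n permutation matrices.

% Joint objective: find the mapping Q and the word matching P together.
\min_{Q \in \mathcal{O}_d,\; P \in \mathcal{P}_n} \; \lVert X Q - P Y \rVert_F^2

% With P fixed, this is orthogonal Procrustes, solved in closed form by an SVD.
Q^\star = U V^\top, \qquad \text{where } U \Sigma V^\top \text{ is the SVD of } X^\top P Y

% With Q fixed, this is a linear assignment (optimal transport) problem.
P^\star = \arg\max_{P \in \mathcal{P}_n} \; \langle P,\; X Q Y^\top \rangle_F
```

The second subproblem is the step whose cost the abstract describes as cubic for linear programming solvers and quadratic for Sinkhorn-based solvers, and it is the step that the quantization in qWP targets. How the quantization is carried out is specified in the paper and is not reconstructed here.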

Anthology ID: 2022.amta-research.15
Volume: Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Month: September
Year: 2022
Address: Orlando, USA
Editors: Kevin Duh, Francisco Guzmán
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 200–214
URL: https://aclanthology.org/2022.amta-research.15/
Cite (ACL): Prince O Aboagye, Yan Zheng, Michael Yeh, Junpeng Wang, Zhongfang Zhuang, Huiyuan Chen, Liang Wang, Wei Zhang, and Jeff Phillips. 2022. Quantized Wasserstein Procrustes Alignment of Word Embedding Spaces. In Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 200–214, Orlando, USA. Association for Machine Translation in the Americas.
Cite (Informal): Quantized Wasserstein Procrustes Alignment of Word Embedding Spaces (Aboagye et al., AMTA 2022)
PDF: https://aclanthology.org/2022.amta-research.15.pdf
