DM-BLI: Dynamic Multiple Subspaces Alignment for Unsupervised Bilingual Lexicon Induction

Ling Hu, Yuemei Xu


Abstract
Unsupervised bilingual lexicon induction (BLI) task aims to find word translations between languages and has achieved great success in similar language pairs. However, related works mostly rely on a single linear mapping for language alignment and fail on distant or low-resource language pairs, achieving less than half the performance observed in rich-resource language pairs. In this paper, we introduce DM-BLI, a Dynamic Multiple subspaces alignment framework for unsupervised BLI. DM-BLI improves language alignment by utilizing multiple subspace alignments instead of a single mapping. We begin via unsupervised clustering to discover these subspaces in source embedding space. Then we identify and align corresponding subspaces in the target space using a rough global alignment. DM-BLI further employs intra-cluster and inter-cluster contrastive learning to refine precise alignment for each subspace pair. Experiments conducted on standard BLI datasets for 12 language pairs (6 rich-resource and 6 low-resource) demonstrate substantial gains achieved by our framework. We release our code at https://github.com/huling-2/DM-BLI.git.
Anthology ID:
2024.acl-long.112
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2041–2052
Language:
URL:
https://aclanthology.org/2024.acl-long.112
DOI:
Bibkey:
Cite (ACL):
Ling Hu and Yuemei Xu. 2024. DM-BLI: Dynamic Multiple Subspaces Alignment for Unsupervised Bilingual Lexicon Induction. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2041–2052, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
DM-BLI: Dynamic Multiple Subspaces Alignment for Unsupervised Bilingual Lexicon Induction (Hu & Xu, ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-long.112.pdf