Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction

Mladen Karan, Ivan Vulić, Anna Korhonen, Goran Glavaš


Abstract
Effective projection-based cross-lingual word embedding (CLWE) induction critically relies on the iterative self-learning procedure, which gradually expands the initial small seed dictionary to learn improved cross-lingual mappings. In this work, we present ClassyMap, a classification-based approach to self-learning that yields more robust and more effective induction of projection-based CLWEs. Unlike prior self-learning methods, our approach allows for the integration of diverse features into the iterative process. We show the benefits of ClassyMap for bilingual lexicon induction: we report consistent improvements in a weakly supervised setup (500 seed translation pairs) on a benchmark with 28 language pairs.
Anthology ID:
2020.acl-main.618
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6915–6922
URL:
https://aclanthology.org/2020.acl-main.618
DOI:
10.18653/v1/2020.acl-main.618
Cite (ACL):
Mladen Karan, Ivan Vulić, Anna Korhonen, and Goran Glavaš. 2020. Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6915–6922, Online. Association for Computational Linguistics.
Cite (Informal):
Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction (Karan et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.618.pdf
Video:
http://slideslive.com/38929124