Using Subword-Embeddings for Bilingual Lexicon Induction in Bantu Languages

Adrian Breiding, Alan Akbik


Abstract
Bilingual Lexicon Induction (BLI) is a valuable tool in machine translation and cross-lingual transfer learning, but it remains challenging for agglutinative and low-resource languages. In this work, we investigate the use of weighted subword embeddings in BLI for agglutinative languages. We further evaluate a graph-matching and Procrustes-based BLI approach on two Bantu languages, assessing its effectiveness in a previously underexplored language family. On Swahili, the approach achieves an average P@1 score of 51.84% for a 3,000-word dictionary, which demonstrates that it is effective for Bantu languages. Weighted subword embeddings perform competitively on Swahili and outperform word embeddings in our experiments with Zulu.
Anthology ID:
2026.africanlp-main.29
Volume:
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Everlyn Asiko Chimoto, Constantine Lignos, Shamsuddeen Muhammad, Idris Abdulmumin, Clemencia Siro, David Ifeoluwa Adelani
Venues:
AfricaNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
271–280
URL:
https://aclanthology.org/2026.africanlp-main.29/
Cite (ACL):
Adrian Breiding and Alan Akbik. 2026. Using Subword-Embeddings for Bilingual Lexicon Induction in Bantu Languages. In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), pages 271–280, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Using Subword-Embeddings for Bilingual Lexicon Induction in Bantu Languages (Breiding & Akbik, AfricaNLP 2026)
PDF:
https://aclanthology.org/2026.africanlp-main.29.pdf