Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu

Derwin Ngomane, Rooweither Mabuya, Jade Abbott, Vukosi Marivate


Abstract
In this study, we investigate the effectiveness of cross-lingual word embeddings for zero-shot transfer learning between a high-resource language, English, and a low-resource language, isiZulu. IsiZulu belongs to the South African Nguni language family, which is characterised by complex agglutinating morphology. We use VecMap, an open-source tool, to obtain cross-lingual word embeddings. As an extrinsic evaluation of the embeddings, we train a news classifier on labelled English data and use it to categorise unlabelled isiZulu data via zero-shot transfer learning. Our model achieves a weighted average F1-score of 0.34. Our findings demonstrate that VecMap generates modular word embeddings in the cross-lingual space that have an impact on the downstream classifier used for zero-shot transfer learning.
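The zero-shot setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the 2-d vectors are made-up stand-ins for embeddings already mapped into a shared space (they are not VecMap output), the nearest-centroid classifier is a swapped-in simplification of the news classifier, and the names SHARED, doc_vector, train_centroids, predict and weighted_f1 are hypothetical. The isiZulu words are illustrative only. Gold isiZulu labels appear solely to compute the score.

```python
from collections import Counter, defaultdict

# Hypothetical 2-d vectors standing in for embeddings that were ALREADY mapped
# into one shared space. The values are made up; they are not VecMap output.
SHARED = {
    "en": {"match": (0.9, 0.1), "goal": (0.8, 0.2),
           "vote": (0.1, 0.9), "election": (0.2, 0.8)},
    "zu": {"umdlalo": (0.85, 0.15), "igoli": (0.8, 0.1),
           "ukhetho": (0.15, 0.9), "ivoti": (0.1, 0.8)},
}


def doc_vector(tokens, lang):
    """Average the shared-space vectors of the tokens that have an embedding."""
    vecs = [SHARED[lang][t] for t in tokens if t in SHARED[lang]]
    if not vecs:
        return (0.0, 0.0)  # no known token: fall back to the origin
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def train_centroids(docs, labels, lang):
    """Fit a nearest-centroid classifier: one mean document vector per label."""
    grouped = defaultdict(list)
    for tokens, label in zip(docs, labels):
        grouped[label].append(doc_vector(tokens, lang))
    return {label: tuple(sum(dim) / len(vs) for dim in zip(*vs))
            for label, vs in grouped.items()}


def predict(tokens, lang, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    vec = doc_vector(tokens, lang)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])))


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        total += support * (2 * tp / denom if denom else 0.0)
    return total / len(y_true)


# Train on labelled English only, then classify isiZulu without isiZulu training data.
en_docs = [["match", "goal"], ["vote", "election"]]
en_labels = ["sport", "politics"]
centroids = train_centroids(en_docs, en_labels, "en")

zu_docs = [["umdlalo", "igoli"], ["ukhetho", "ivoti"]]
zu_pred = [predict(doc, "zu", centroids) for doc in zu_docs]
score = weighted_f1(["sport", "politics"], zu_pred)
```

The classifier never sees isiZulu during training; transfer works in this toy case only because both vocabularies are assumed to live in the same vector space, which is the property the cross-lingual mapping is meant to provide.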
Anthology ID:
2023.rail-1.2
Volume:
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Rooweither Mabuya, Don Mthobela, Mmasibidi Setaka, Menno Van Zaanen
Venue:
RAIL
Publisher:
Association for Computational Linguistics
Pages:
11–17
URL:
https://aclanthology.org/2023.rail-1.2
DOI:
10.18653/v1/2023.rail-1.2
Cite (ACL):
Derwin Ngomane, Rooweither Mabuya, Jade Abbott, and Vukosi Marivate. 2023. Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu. In Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023), pages 11–17, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu (Ngomane et al., RAIL 2023)
PDF:
https://aclanthology.org/2023.rail-1.2.pdf
Video:
https://aclanthology.org/2023.rail-1.2.mp4