Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu

Derwin Ngomane; Rooweither Mabuya; Jade Abbott; Vukosi Marivate

doi:10.18653/v1/2023.rail-1.2

Abstract

In this study, we investigate the effectiveness of cross-lingual word embeddings for zero-shot transfer learning between a high-resource language, English, and a low-resource language, isiZulu. IsiZulu is part of the South African Nguni language family, which is characterised by complex agglutinating morphology. We use VecMap, an open-source tool, to obtain cross-lingual word embeddings. As an extrinsic evaluation of the embeddings, we train a news classifier on labelled English data and use it to categorise unlabelled isiZulu data through zero-shot transfer learning. Our model achieves a weighted average F1-score of 0.34. Our findings demonstrate that VecMap generates modular word embeddings in the cross-lingual space that have an impact on the downstream classifier used for zero-shot transfer learning.
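The zero-shot setup described in the abstract can be illustrated with a minimal sketch. It assumes the English and isiZulu embeddings have already been mapped into one shared space; the 3-dimensional vectors below are hypothetical toy values, not VecMap output, and the nearest-centroid classifier is a simple stand-in chosen for brevity, not necessarily the classifier used in the paper. All function names are illustrative.

```python
import numpy as np

# Hypothetical toy vectors standing in for embeddings that already live in
# one shared cross-lingual space (NOT real VecMap output).
EN = {
    "football": np.array([0.9, 0.1, 0.0]),
    "goal": np.array([0.8, 0.2, 0.1]),
    "money": np.array([0.1, 0.9, 0.0]),
    "bank": np.array([0.0, 0.8, 0.2]),
}
ZU = {
    "ibhola": np.array([0.85, 0.15, 0.05]),  # "ball / football"
    "imali": np.array([0.05, 0.9, 0.1]),     # "money"
}


def embed(tokens, table, dim=3):
    """Represent a document as the mean of its known word vectors."""
    vecs = [table[t] for t in tokens if t in table]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def fit_centroids(docs, labels, table):
    """Train on labelled documents: one mean vector per class."""
    centroids = {}
    for label in sorted(set(labels)):
        vecs = [embed(d, table) for d, l in zip(docs, labels) if l == label]
        centroids[label] = np.mean(vecs, axis=0)
    return centroids


def cosine(a, b):
    """Cosine similarity, returning 0.0 for zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict(tokens, table, centroids):
    """Assign the class whose centroid is most similar to the document."""
    v = embed(tokens, table)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))


# Train only on English documents with labels.
train_docs = [["football", "goal"], ["money", "bank"]]
train_labels = ["sport", "business"]
centroids = fit_centroids(train_docs, train_labels, EN)

# Classify isiZulu documents without any isiZulu labels (zero-shot).
pred_sport = predict(["ibhola"], ZU, centroids)
pred_business = predict(["imali"], ZU, centroids)
print(pred_sport, pred_business)
```

The classifier never sees isiZulu labels; it works on isiZulu input only to the extent that translation-equivalent words land close together in the shared space, which is why the quality of the cross-lingual mapping affects the downstream result.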

Anthology ID: 2023.rail-1.2
Volume: Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Rooweither Mabuya, Don Mthobela, Mmasibidi Setaka, Menno Van Zaanen
Venue: RAIL
Publisher: Association for Computational Linguistics
Pages: 11–17
URL: https://aclanthology.org/2023.rail-1.2
DOI: 10.18653/v1/2023.rail-1.2
Cite (ACL): Derwin Ngomane, Rooweither Mabuya, Jade Abbott, and Vukosi Marivate. 2023. Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu. In Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023), pages 11–17, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu (Ngomane et al., RAIL 2023)
PDF: https://aclanthology.org/2023.rail-1.2.pdf
Video: https://aclanthology.org/2023.rail-1.2.mp4
