IsiZulu noun classification based on replicating the ensemble approach for Runyankore

Zola Mahlaza; C. Maria Keet; Imaan Sayed; Alexander Van Der Leek

Abstract

A noun’s class is crucial information in NLP for Niger Congo B (NCB) languages, among others, because it governs agreement across the sentence. Noun classes are poorly documented for most NCB languages, or documented only in a non-reusable format, such as a printed dictionary subject to copyright restrictions. Byamugisha (2022) proposed a promising data-driven approach for Runyankore that combines syntax and semantics. However, the code and data are inaccessible, and it remains to be seen whether the approach is suitable for other NCB languages. We aimed to reproduce Byamugisha’s experiment for isiZulu. We conducted this as two independent experiments, so that we could also subject the results to a meta-analysis. Results showed that the experiment was reproducible only in part, mainly due to imprecision in the original description and the current impossibility of generating the same kind of source data set from an existing grammar. The choices made in attempting to reproduce the pipeline, as well as differences in the choice of training and test data, had a large effect on the eventual accuracy of noun class disambiguation, but could produce accuracies in the same range as for Runyankore: 80-85%.

Anthology ID: 2025.loreslm-1.35
Volume: Proceedings of the First Workshop on Language Models for Low-Resource Languages
Month: January
Year: 2025
Address: Abu Dhabi, United Arab Emirates
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venues: LoResLM | WS
Publisher: Association for Computational Linguistics
Pages: 469–478
URL: https://aclanthology.org/2025.loreslm-1.35/
Cite (ACL): Zola Mahlaza, C. Maria Keet, Imaan Sayed, and Alexander Van Der Leek. 2025. IsiZulu noun classification based on replicating the ensemble approach for Runyankore. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 469–478, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): IsiZulu noun classification based on replicating the ensemble approach for Runyankore (Mahlaza et al., LoResLM 2025)
PDF: https://aclanthology.org/2025.loreslm-1.35.pdf
