Imaan Sayed


2025

IsiZulu noun classification based on replicating the ensemble approach for Runyankore
Zola Mahlaza | C. Maria Keet | Imaan Sayed | Alexander van der Leek
Proceedings of the First Workshop on Language Models for Low-Resource Languages

A noun’s class is a crucial component in NLP, because it governs agreement across the sentence in Niger-Congo B (NCB) languages, among others. The phenomenon is ill-documented in most NCB languages, or documented only in a non-reusable format, such as a printed dictionary subject to copyright restrictions. A promising data-driven approach by Byamugisha (2022) combined syntax and semantics for Runyankore. The code and data are inaccessible, however, and it remains to be seen whether the approach is suitable for other NCB languages. We aimed to reproduce Byamugisha’s experiment, but for isiZulu. We conducted this as two independent experiments, so that we could also subject them to a meta-analysis. The results showed that the experiment was reproducible only in part, mainly due to imprecision in the original description and the current impossibility of generating the same kind of source dataset from an existing grammar. The different choices made in attempting to reproduce the pipeline, as well as differences in the choice of training and test data, had a large effect on the eventual accuracy of noun class disambiguation, but could still produce accuracies in the same range as for Runyankore: 80-85%.

On the Usage of Semantics, Syntax, and Morphology for Noun Classification in IsiZulu
Imaan Sayed | Zola Mahlaza | Alexander van der Leek | Jonathan Mopp | C. Maria Keet
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

There is limited work aimed at solving the core task of noun classification for Nguni languages. The task focuses on identifying the semantic categorisation of each noun and plays a crucial role in the ability to form semantically and morphologically valid sentences. The work by Byamugisha (2022) was the first to tackle the problem for a related, but non-Nguni, language. While there have been efforts to replicate it for a Nguni language, there has been no effort to compare the technique used in the original work against contemporary neural methods or against traditional machine learning classifiers that do not rely on human-guided knowledge to the same extent. We reproduce Byamugisha (2022)’s work with different configurations to account for differences in access to datasets and resources, and compare the approach with a pre-trained transformer-based model and with traditional machine learning models that rely less on human-guided knowledge. The newly created data-driven models outperform the knowledge-infused models, with the best-performing models achieving an F1 score of 0.97.
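
To make the kind of "traditional machine learning" baseline and F1 evaluation mentioned above concrete, the sketch below shows one possible setup: a character n-gram classifier over noun surface forms, scored with macro F1. It is purely illustrative and is not the paper's pipeline, models, or data; the toy word list, class labels, and scikit-learn components are assumptions chosen for demonstration only.

# Minimal illustrative sketch (hypothetical; not the models or data from the
# paper): a character n-gram classifier for isiZulu noun class prediction,
# evaluated with macro F1. A real experiment would use a proper labelled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Tiny toy (noun, noun class) pairs; the class is largely signalled by the prefix.
nouns = ["umuntu", "umfana", "abantu", "abafana", "isikole", "isitsha",
         "izikole", "izitsha", "inja", "indlu", "izinja", "izindlu"]
labels = ["1", "1", "2", "2", "7", "7", "8", "8", "9", "9", "10", "10"]

X_train, X_test, y_train, y_test = train_test_split(
    nouns, labels, test_size=0.25, random_state=0)

# Character n-grams over word boundaries pick up the noun class prefix.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

print("macro F1:", f1_score(y_test, model.predict(X_test), average="macro"))

On a realistic dataset, such a surface-form baseline would then be compared against the knowledge-infused pipeline and a pre-trained transformer-based model, as the paper does.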