Zola Mahlaza
2024
Automatically Generating IsiZulu Words From Indo-Arabic Numerals
Zola Mahlaza | Tadiwa Magwenzi | C. Maria Keet | Langa Khumalo
Proceedings of the 17th International Natural Language Generation Conference
Artificial conversational agents are deployed to assist humans in a variety of tasks. Some of these tasks, such as banking and scheduling appointments, require the capability to communicate numbers that are part of the agents’ internal and abstract representations of meaning. They currently cannot do so for isiZulu because no algorithms exist for the task. Speech and text data are scarce, and the transformation is complex, since it may depend on the type of noun being counted. We solved this by extracting and iteratively refining the rules for speaking and writing numerals as words, and by creating two algorithms that automate the transformation. Evaluation of the algorithms by two isiZulu grammarians showed that six out of seven number categories were 90-100% correct. The same software was used with an additional set of rules to create a large monolingual text corpus of 771 643 sentences, to enable future data-driven approaches.
ReproHum #0866-04: Another Evaluation of Readers’ Reactions to News Headlines
Zola Mahlaza | Toky Hajatiana Raboanary | Kyle Seakgwa | C. Maria Keet
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
The reproduction of Natural Language Processing (NLP) studies is important for establishing their reliability. Nonetheless, many NLP papers have never been reproduced. This paper presents a reproduction of the work of Gabriel et al. (2022) to establish the extent to which their findings can be confirmed. Those findings concern the utility of large language models (T5 and GPT2) for automatically generating writers’ intents from news headlines in order to curb misinformation. Our results show no evidence to support two of their four findings and only partially support the remaining two. Specifically, we confirmed that all the models are judged to be capable of influencing readers’ trust or distrust. However, we found a difference in T5’s capability to reduce trust: its generations are more likely to have a greater influence in reducing trust, whereas Gabriel et al. (2022) found more cases where they had no impact at all. In addition, most of the model generations are considered socially acceptable only if we relax the criterion for a majority to mean more than chance, rather than the apparent > 70% used in the original study. Overall, they found that “machine-generated MRF implications alongside news headlines to readers can increase their trust in real news while decreasing their trust in misinformation”. In contrast, we found that these implications are more likely to decrease trust in both cases than to have no impact at all.
2020
OWLSIZ: An isiZulu CNL for structured knowledge validation
Zola Mahlaza | C. Maria Keet
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)
In iterative knowledge elicitation, engineers are expected to be directly involved in validating the already captured knowledge and in obtaining new knowledge increments, which makes the process time-consuming. Languages such as English have controlled natural languages that can be repurposed to generate natural language questions from an ontology. This allows a domain expert to validate the contents of an ontology independently, without having to understand an ontology authoring language such as OWL. IsiZulu, South Africa’s main L1 language by number of speakers, does not have such a resource. Hence, it is not possible to build a verbaliser to generate such questions. Therefore, we propose an isiZulu controlled natural language, called OWL Simplified isiZulu (OWLSIZ), for producing grammatical and fluent questions from an ontology. Human evaluation of the generated questions showed that participants’ judgements agree that most (83%) of the questions are positive for grammaticality or understandability.