Muhammed Yahia Gaffar Saeed Saeed
2025
Implicit Discourse Relation Classification For Nigerian Pidgin
Muhammed Yahia Gaffar Saeed Saeed | Peter Bourgonje | Vera Demberg
Proceedings of the 31st International Conference on Computational Linguistics
Nigerian Pidgin (NP) is an English-based creole language spoken by nearly 100 million people across Nigeria, and is still low-resource in NLP. In particular, there are currently no discourse parsing tools for NP, although such tools could improve various downstream tasks. Our research focuses on implicit discourse relation classification (IDRC) for NP, a task that, even in English, is not easily solved by prompting LLMs and instead requires supervised training. With this in mind, we have developed a framework for the task, which could also be used by researchers for other English-lexified languages. We systematically compare different approaches to the low-resource IDRC task. In one approach, we apply English IDRC tools directly to the NP text as well as to its English translations (followed by a back-projection of labels). In another approach, we create a synthetic discourse corpus for NP: we automatically translate the English discourse-annotated corpus PDTB to NP, project the PDTB labels, and then train an NP IDR classifier. The latter approach of training a “native” NP classifier outperforms our baseline by 13.27% and 33.98% in F1 score for 4-way and 11-way classification, respectively.
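The synthetic-corpus approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: `RelationInstance`, `build_synthetic_corpus`, `coarse_sense`, and `toy_translate` are hypothetical names, the dot-separated sense string is an assumed encoding of the PDTB sense hierarchy, and the toy translator only stands in for a real English-to-NP machine translation system. The sketch shows the core idea that each argument is translated while the relation label is carried over unchanged, and that a coarse (top-level) label can be derived from a finer-grained one.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class RelationInstance:
    """One implicit discourse relation: two arguments and a sense label."""
    arg1: str
    arg2: str
    sense: str  # assumed encoding, e.g. "Contingency.Cause"


def build_synthetic_corpus(
    english: List[RelationInstance],
    translate: Callable[[str], str],
) -> List[RelationInstance]:
    """Translate both arguments and project the original label unchanged."""
    return [
        replace(inst, arg1=translate(inst.arg1), arg2=translate(inst.arg2))
        for inst in english
    ]


def coarse_sense(sense: str) -> str:
    """Map a fine-grained sense to its top-level class (assumes dot-separated levels)."""
    return sense.split(".")[0]


def toy_translate(text: str) -> str:
    """Hypothetical placeholder for an English-to-NP translation system."""
    return "[NP] " + text


english = [
    RelationInstance("It rained all night.", "The road flooded.", "Contingency.Cause"),
]
synthetic = build_synthetic_corpus(english, toy_translate)
```

The resulting `synthetic` list could then serve as training data for any supervised classifier. The choice of classifier and translation system is left open here because the abstract does not specify them.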