AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus

Soumedhik Bharati; Shibam Mandal; Prithwish Ghosh; Swarup Kr Ghosh; Sayani Mondal

Abstract

Hausa Ajami (Hausa written in Arabic script) remains severely under-resourced for computational morphology. We present AjamiMorph, a zero-annotation framework that discovers morphemes through consensus among three unsupervised methods: Byte Pair Encoding (BPE), transition-based boundary detection using Pointwise Mutual Information (PMI), and Distributional Affix Mining (DAM), a method grounded in computational linguistics. Using a Hausa Ajami Bible corpus of 637,414 tokens, AjamiMorph identifies 1,611 high-confidence morphemes and achieves 99.9% coverage. The inventory exhibits a linguistically realistic distribution (66.0% stems, 22.6% suffixes, 11.4% prefixes) and recovers 77.8% of known Hausa affixes. A permutation test that shuffles method assignments while preserving per-method selection sizes confirms that the observed agreement is above chance; a chi-square test serves as a secondary check. A lightweight 5-gram language model (LM) comparison of characters versus consensus morphemes provides an extrinsic signal. We also report negative results for script-driven Arabic assumptions and for LLM-first annotation. This work provides the first unsupervised morpheme inventory for Hausa Ajami and demonstrates that consensus is a robust strategy for zero-resource morphology.
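The permutation test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy candidate pool, the use of three-way overlap as the agreement statistic, and the number of permutations are all assumptions made for illustration. Each method's selection is replaced by a random draw of the same size from the shared candidate pool, which preserves per-method selection sizes, and the p-value is the share of shuffles whose overlap matches or exceeds the observed one.

```python
import random


def triple_overlap(selections):
    """Number of candidates selected by every method."""
    return len(set.intersection(*selections))


def permutation_p_value(universe, selections, n_perm=2000, seed=0):
    """Estimate how often random selections of the same sizes agree
    at least as much as the observed selections do.

    universe   -- all candidate morphemes (hypothetical shared pool)
    selections -- one set of selected candidates per method
    """
    rng = random.Random(seed)
    pool = sorted(universe)  # fixed order so the seed gives stable draws
    observed = triple_overlap(selections)
    hits = 0
    for _ in range(n_perm):
        # Shuffle method assignments: each method keeps its selection size.
        shuffled = [set(rng.sample(pool, len(s))) for s in selections]
        if triple_overlap(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (assumed, not from the paper): 200 candidates, three methods
# that each select 40 and agree on 20 of them.
universe = {f"m{i}" for i in range(200)}
bpe = {f"m{i}" for i in range(0, 40)}
pmi = {f"m{i}" for i in range(10, 50)}
dam = {f"m{i}" for i in range(20, 60)}

observed, p_value = permutation_p_value(universe, [bpe, pmi, dam])
print(observed, p_value)
```

In this toy setting the expected three-way overlap under random selection is far smaller than the observed overlap, so the estimated p-value is small. A real analysis would substitute the actual candidate pool and the per-method selections.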

Anthology ID: 2026.abjadnlp-1.23
Volume: Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month: March
Year: 2026
Address: Rabat, Morocco
Venues: AbjadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 166–171
URL: https://aclanthology.org/2026.abjadnlp-1.23/
Cite (ACL): Soumedhik Bharati, Shibam Mandal, Prithwish Ghosh, Swarup Kr Ghosh, and Sayani Mondal. 2026. AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 166–171, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus (Bharati et al., AbjadNLP 2026)
PDF: https://aclanthology.org/2026.abjadnlp-1.23.pdf
Optional supplementary material: 2026.abjadnlp-1.23.OptionalSupplementaryMaterial.rar
