AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus

Soumedhik Bharati, Shibam Mandal, Prithwish Ghosh, Swarup Kr Ghosh, Sayani Mondal


Abstract
Hausa Ajami (Hausa written in Arabic script) remains severely under-resourced for computational morphology. We present AjamiMorph, a zero-annotation framework that discovers morphemes through consensus among three unsupervised methods: Byte Pair Encoding (BPE), transition-based boundary detection using Pointwise Mutual Information (PMI), and computational-linguistics-based Distributional Affix Mining (DAM). Using a Hausa Ajami Bible corpus of 637,414 tokens, AjamiMorph identifies 1,611 high-confidence morphemes, achieving 99.9% coverage. The inventory exhibits a linguistically realistic distribution (66.0% stems, 22.6% suffixes, 11.4% prefixes) and recovers 77.8% of known Hausa affixes. A permutation test that shuffles method assignments (preserving per-method selection sizes) confirms that the observed agreement is above chance; a chi-square test serves as a secondary check. A lightweight 5-gram language model (LM) comparison (characters vs. consensus morphemes) provides an extrinsic signal. We also report negative results for script-driven Arabic assumptions and LLM-first annotation. This work provides the first unsupervised morpheme inventory for Hausa Ajami and demonstrates consensus as a robust strategy for zero-resource morphology.
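The permutation test mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the agreement statistic, so this example assumes one: the number of candidates selected by at least two of the three methods. The function names, the toy candidate pool, and the toy selections are all hypothetical; only the idea of re-drawing each method's selection at random while preserving its size comes from the abstract.

```python
import random


def consensus_count(selections, min_votes=2):
    """Count candidates chosen by at least `min_votes` methods (assumed statistic)."""
    votes = {}
    for chosen in selections:
        for item in chosen:
            votes[item] = votes.get(item, 0) + 1
    return sum(1 for v in votes.values() if v >= min_votes)


def permutation_test(candidates, selections, min_votes=2, n_perm=500, seed=0):
    """Compare observed consensus against random selections of the same sizes."""
    rng = random.Random(seed)
    pool = sorted(candidates)
    sizes = [len(s) for s in selections]  # per-method selection sizes are preserved
    observed = consensus_count(selections, min_votes)
    hits = 0
    for _ in range(n_perm):
        # Null model: each method picks a random subset of its original size.
        shuffled = [set(rng.sample(pool, k)) for k in sizes]
        if consensus_count(shuffled, min_votes) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + hits) / (1 + n_perm)
    return observed, p_value


# Toy data (hypothetical): 200 candidate substrings, three heavily overlapping selections.
candidates = {f"m{i}" for i in range(200)}
bpe = {f"m{i}" for i in range(0, 20)}
pmi = {f"m{i}" for i in range(5, 25)}
dam = {f"m{i}" for i in range(10, 30)}

observed, p_value = permutation_test(candidates, [bpe, pmi, dam])
print(observed, p_value)
```

A small p-value here indicates that the overlap among the three selections is larger than what random subsets of the same sizes would typically produce, which is the sense of "above chance" used in the abstract.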
Anthology ID: 2026.abjadnlp-1.23
Volume: Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month: March
Year: 2026
Address: Rabat, Morocco
Venues: AbjadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 166–171
URL: https://aclanthology.org/2026.abjadnlp-1.23/
Cite (ACL): Soumedhik Bharati, Shibam Mandal, Prithwish Ghosh, Swarup Kr Ghosh, and Sayani Mondal. 2026. AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 166–171, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): AjamiMorph: Zero-Annotation Morphological Discovery for Hausa Ajami via Multi-Method Consensus (Bharati et al., AbjadNLP 2026)
PDF: https://aclanthology.org/2026.abjadnlp-1.23.pdf
Optional supplementary material: 2026.abjadnlp-1.23.OptionalSupplementaryMaterial.rar