Ian Kamajaya


2025

pdf bib
IndoMorph: a Morphology Engine for Indonesian
Ian Kamajaya | David Moeljadi
Proceedings of the Second Workshop in South East Asian Language Processing

Indonesian is an agglutinative language and rich in morphology. Although it has more than 250 million speakers, it is a low resource language in NLP field. Many Indonesian NLP resources are scattered, undocumented, and not publicly available. In this paper we address the issue of analyzing morphology as well as generating Indonesian words. We introduce IndoMorph, a morphology analyzer and word generator for Indonesian. In an agglutinative language, morphology deconstruction can be crucial to understand the structure and meaning of words. IndoMorph can be useful for language modeling and testing certain analyses. In addition, it can be employed to make a new Indonesian subword representation resource such as Indonesian morphology dictionary (IMD), used as a language education tool, or embedded in various applications such as text analysis applications. We hope that IndoMorph can be employed not only in the Indonesian NLP research development, but also in the NLP research of any agglutinative languages.