IndoMorph: a Morphology Engine for Indonesian

Ian Kamajaya, David Moeljadi


Abstract
Indonesian is an agglutinative language with rich morphology. Although it has more than 250 million speakers, it remains a low-resource language in the NLP field: many Indonesian NLP resources are scattered, undocumented, and not publicly available. In this paper, we address the problem of analyzing the morphology of Indonesian words as well as generating them. We introduce IndoMorph, a morphology analyzer and word generator for Indonesian. In an agglutinative language, morphological decomposition can be crucial for understanding the structure and meaning of words. IndoMorph can be useful for language modeling and for testing certain analyses. In addition, it can be employed to build new Indonesian subword representation resources such as an Indonesian morphology dictionary (IMD), used as a language education tool, or embedded in various applications such as text analysis tools. We hope that IndoMorph can be employed not only in the development of Indonesian NLP research, but also in NLP research on any agglutinative language.
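To make concrete what analyzing and generating Indonesian words involves, the sketch below implements a toy version of one well-known process, meN- prefixation with nasal assimilation (for example baca -> membaca, tulis -> menulis, sapu -> menyapu), together with an analyzer that inverts it by generate-and-test against a small root lexicon. This is an illustrative assumption, not IndoMorph's actual method or API: the function names (attach_men, attach_di, generate, analyze), the eight-root lexicon, and the restricted affix set (meN-, di-, -kan, -i) are all hypothetical, and irregular cases such as monosyllabic roots (menge-) and loanword consonant clusters are ignored.

```python
from typing import Callable

VOWELS = set("aiueo")


def attach_men(root: str) -> str:
    """Attach the active prefix meN- with nasal assimilation (regular cases only)."""
    c = root[0]
    if c == "p":
        return "mem" + root[1:]   # p is deleted: pukul -> memukul
    if c in "bfv":
        return "mem" + root       # baca -> membaca
    if c == "t":
        return "men" + root[1:]   # t is deleted: tulis -> menulis
    if c in "cdjz":
        return "men" + root       # cari -> mencari
    if c == "s":
        return "meny" + root[1:]  # s is replaced by ny: sapu -> menyapu
    if c == "k":
        return "meng" + root[1:]  # k is deleted: kirim -> mengirim
    if c in VOWELS or c in "gh":
        return "meng" + root      # ambil -> mengambil
    return "me" + root            # l, m, n, r, w, y: masak -> memasak


def attach_di(root: str) -> str:
    """Attach the passive prefix di-, which causes no sound change."""
    return "di" + root


# Hypothetical, deliberately tiny affix inventory and root lexicon.
PREFIXES: dict[str, Callable[[str], str]] = {"meN-": attach_men, "di-": attach_di}
SUFFIXES = ["", "kan", "i"]
LEXICON = ["baca", "tulis", "pukul", "sapu", "kirim", "ambil", "cari", "masak"]


def generate(root: str, prefix: str, suffix: str = "") -> str:
    """Word generation: build a surface form from a root and its affixes."""
    return PREFIXES[prefix](root) + suffix


def analyze(word: str, lexicon: list[str]) -> list[tuple[str, str, str]]:
    """Morphological analysis by generate-and-test: return every
    (prefix, root, suffix) combination whose generated form equals `word`."""
    return [
        (prefix, root, "-" + suffix if suffix else "")
        for root in lexicon
        for prefix in PREFIXES
        for suffix in SUFFIXES
        if generate(root, prefix, suffix) == word
    ]


if __name__ == "__main__":
    for word in ["membaca", "menulis", "menyapu", "mengirimkan", "dicari"]:
        print(word, analyze(word, LEXICON))
```

For example, analyze("memukul", LEXICON) returns [('meN-', 'pukul', '')]: the analyzer recovers the root pukul even though its initial p does not appear on the surface. Generate-and-test keeps analysis and generation consistent by construction, but its cost grows with the size of the lexicon and affix inventory; finite-state transducers are one common alternative for larger systems.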
Anthology ID: 2025.sealp-1.7
Volume: Proceedings of the Second Workshop in South East Asian Language Processing
Month: January
Year: 2025
Address: Online
Editors: Derry Wijaya, Alham Fikri Aji, Clara Vania, Genta Indra Winata, Ayu Purwarianti
Venues: sealp | WS
Publisher: Association for Computational Linguistics
Pages: 72–81
URL: https://aclanthology.org/2025.sealp-1.7/
Cite (ACL): Ian Kamajaya and David Moeljadi. 2025. IndoMorph: a Morphology Engine for Indonesian. In Proceedings of the Second Workshop in South East Asian Language Processing, pages 72–81, Online. Association for Computational Linguistics.
Cite (Informal): IndoMorph: a Morphology Engine for Indonesian (Kamajaya & Moeljadi, sealp 2025)
PDF: https://aclanthology.org/2025.sealp-1.7.pdf