Unsupervised morphological segmentation in a language with reduplication

Simon Todd, Annie Huang, Jeremy Needle, Jennifer Hay, Jeanette King


Abstract
We present an extension of the Morfessor Baseline model of unsupervised morphological segmentation (Creutz and Lagus, 2007) that incorporates abstract templates for reduplication, a typologically common but computationally underaddressed process. Through a detailed investigation that applies the model to Māori, the Indigenous language of Aotearoa New Zealand, we show that incorporating templates improves Morfessor’s ability to identify instances of reduplication, and does so most when there are multiple minimally-overlapping templates. We present an error analysis that reveals important factors to consider when applying the extended model and suggests useful future directions.
Anthology ID:
2022.sigmorphon-1.2
Volume:
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
July
Year:
2022
Address:
Seattle, Washington
Venue:
SIGMORPHON
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
12–22
URL:
https://aclanthology.org/2022.sigmorphon-1.2
DOI:
10.18653/v1/2022.sigmorphon-1.2
Cite (ACL):
Simon Todd, Annie Huang, Jeremy Needle, Jennifer Hay, and Jeanette King. 2022. Unsupervised morphological segmentation in a language with reduplication. In Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 12–22, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
Unsupervised morphological segmentation in a language with reduplication (Todd et al., SIGMORPHON 2022)
PDF:
https://aclanthology.org/2022.sigmorphon-1.2.pdf
Video:
https://aclanthology.org/2022.sigmorphon-1.2.mp4
Code:
sjtodd/morfessored