Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages

Ramy Eskander, Judith Klavans, Smaranda Muresan


Abstract
Polysynthetic languages pose a challenge for morphological analysis due to their root-morpheme complexity and to the word-class “squish”. In addition, many of these languages are low-resource. We propose unsupervised approaches for morphological segmentation of low-resource polysynthetic languages based on Adaptor Grammars (AG) (Eskander et al., 2016). We experiment with four languages from the Uto-Aztecan family. Our AG-based approaches outperform other unsupervised approaches and show promise when compared to supervised methods, outperforming them on two of the four languages.
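The abstract names Adaptor Grammars without showing their mechanics. As a rough illustration only, and not the authors' implementation, the Python sketch below shows the Pitman-Yor caching that adaptor grammars attach to adapted nonterminals, so frequently reused units (here, morphs) become cheaper to generate again. It relies on loudly stated simplifications: a hypothetical grammar in which a word is just a sequence of adapted Morph units, one "table" per morph type, fixed counts instead of MCMC sampling over parse trees, and a made-up base distribution (geometric length, uniform characters). The class name `PitmanYorMorphCache`, its parameters, and the toy English morphs are all hypothetical.

```python
import math
from collections import Counter


class PitmanYorMorphCache:
    """Toy Pitman-Yor adaptor over morph strings (hypothetical, simplified).

    Assumes one table per morph type and fixed counts; a real adaptor
    grammar samples parse trees and table assignments instead.
    """

    def __init__(self, discount=0.5, concentration=1.0,
                 alphabet_size=26, stop_prob=0.3):
        assert 0.0 <= discount < 1.0 and concentration > -discount
        self.a = discount            # Pitman-Yor discount
        self.b = concentration       # Pitman-Yor concentration
        self.alphabet_size = alphabet_size
        self.stop_prob = stop_prob
        self.counts = Counter()      # customers (tokens) per cached morph

    def base_prob(self, morph):
        # Assumed base distribution P0: geometric over length,
        # uniform over characters.
        n = len(morph)
        length_p = (1 - self.stop_prob) ** (n - 1) * self.stop_prob
        return length_p * (1.0 / self.alphabet_size) ** n

    def observe(self, morphs):
        # Add one customer per observed morph token.
        self.counts.update(morphs)

    def prob(self, morph):
        # Predictive probability with t_x = 1 table per seen type:
        # (max(c_x - a, 0) + (b + a*K) * P0(x)) / (N + b)
        n_total = sum(self.counts.values())
        k_tables = len(self.counts)
        reuse = max(self.counts.get(morph, 0) - self.a, 0.0)
        new = (self.b + self.a * k_tables) * self.base_prob(morph)
        return (reuse + new) / (n_total + self.b)

    def segment(self, word):
        # Viterbi over split points: best[i] = best log-prob of word[:i],
        # scoring each morph independently with the fixed cache.
        n = len(word)
        best = [-math.inf] * (n + 1)
        back = [0] * (n + 1)
        best[0] = 0.0
        for end in range(1, n + 1):
            for start in range(end):
                score = best[start] + math.log(self.prob(word[start:end]))
                if score > best[end]:
                    best[end] = score
                    back[end] = start
        # Follow back-pointers to recover the morph sequence.
        morphs, end = [], n
        while end > 0:
            morphs.append(word[back[end]:end])
            end = back[end]
        return list(reversed(morphs))


if __name__ == "__main__":
    cache = PitmanYorMorphCache()
    cache.observe(["un", "do", "re", "do", "ing", "walk", "ing"])
    print(cache.segment("undoing"))  # -> ['un', 'do', 'ing']
    print(cache.segment("walking"))  # -> ['walk', 'ing']
```

Because cached morphs are scored mostly by their counts while unseen strings must pay the small base probability, the Viterbi pass prefers splits into already-cached units. This rich-get-richer bias is the intuition behind adapting the Morph nonterminal; the actual grammars, priors, and inference used in the paper may differ substantially.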
Anthology ID:
W19-4222
Volume:
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Garrett Nicolai, Ryan Cotterell
Venue:
ACL
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
189–195
URL:
https://aclanthology.org/W19-4222/
DOI:
10.18653/v1/W19-4222
Cite (ACL):
Ramy Eskander, Judith Klavans, and Smaranda Muresan. 2019. Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages. In Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 189–195, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages (Eskander et al., ACL 2019)
PDF:
https://aclanthology.org/W19-4222.pdf