Unsupervised Morphology Learning with Statistical Paradigms

Hongzhi Xu, Mitchell Marcus, Charles Yang, Lyle Ungar


Abstract
This paper describes an unsupervised model for morphological segmentation that exploits the notion of paradigms, which are sets of morphological categories (e.g., suffixes) that can be applied to a homogeneous set of words (e.g., nouns or verbs). Our algorithm identifies statistically reliable paradigms from the morphological segmentation result of a probabilistic model, and chooses reliable suffixes from them. The new suffixes can be fed back iteratively to improve the accuracy of the probabilistic model. Finally, the unreliable paradigms are pruned to eliminate spurious morphological relations between words. The paradigm-based algorithm significantly improves segmentation accuracy. Our method achieves state-of-the-art results in experiments on the Morpho-Challenge data for English, Turkish, and Finnish.
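As an illustration of the paradigm notion in the abstract, the following is a minimal sketch, not the authors' implementation. It groups stems by the set of suffixes they take and keeps only paradigms shared by several stems. The function names `build_paradigms` and `reliable_paradigms`, the toy segmentations, and the `min_stems` support threshold are all assumptions made for this example; the threshold is a simple stand-in for the statistical reliability criterion, which the abstract does not specify.

```python
from collections import defaultdict


def build_paradigms(segmentations):
    """Map each suffix set (a paradigm) to the stems that take exactly that set.

    `segmentations` is an iterable of (stem, suffix) pairs, assumed here to
    come from some upstream segmentation model.
    """
    suffixes_by_stem = defaultdict(set)
    for stem, suffix in segmentations:
        suffixes_by_stem[stem].add(suffix)

    stems_by_suffix_set = defaultdict(set)
    for stem, suffixes in suffixes_by_stem.items():
        stems_by_suffix_set[frozenset(suffixes)].add(stem)
    return dict(stems_by_suffix_set)


def reliable_paradigms(paradigms, min_stems=2):
    """Keep paradigms supported by at least `min_stems` stems.

    This support count is a hypothetical reliability test used only for
    illustration.
    """
    return {
        suffixes: stems
        for suffixes, stems in paradigms.items()
        if len(stems) >= min_stems
    }


# Toy input: ("bu", "s") and ("bu", "t") mimic spurious splits of "bus"/"but".
segmentations = [
    ("walk", ""), ("walk", "s"), ("walk", "ed"),
    ("talk", ""), ("talk", "s"), ("talk", "ed"),
    ("bu", "s"), ("bu", "t"),
]

paradigms = build_paradigms(segmentations)
reliable = reliable_paradigms(paradigms, min_stems=2)
```

In this toy input, the paradigm with suffixes "", "s", and "ed" is supported by two stems and survives, while the paradigm built from the spurious stem "bu" is discarded, which loosely mirrors the pruning step described above.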
Anthology ID:
C18-1005
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
44–54
URL:
https://aclanthology.org/C18-1005/
Cite (ACL):
Hongzhi Xu, Mitchell Marcus, Charles Yang, and Lyle Ungar. 2018. Unsupervised Morphology Learning with Statistical Paradigms. In Proceedings of the 27th International Conference on Computational Linguistics, pages 44–54, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Morphology Learning with Statistical Paradigms (Xu et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1005.pdf
Code:
xuhongzhi/ParaMA