Unsupervised Learning of Morphology with Graph Sampling

Maciej Sumalvico


Abstract
We introduce a language-independent, graph-based probabilistic model of morphology that uses transformation rules operating on whole words instead of traditional morphological segmentation. The morphological analysis of a set of words is expressed as a graph with words as vertices and structural relationships between words as edges. We define a probability distribution over such graphs and develop a sampler based on the Metropolis-Hastings algorithm. Sampling is used to determine the strength of morphological relationships between words, to filter out accidental similarities, and to reduce the set of rules needed to explain the data. The model is evaluated on two tasks: finding pairs of morphologically similar words and generating new words. The results are compared with a state-of-the-art segmentation-based approach.
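The abstract describes the model only at a high level, so the following Python sketch illustrates the general idea under heavily simplified, hypothetical assumptions. It is not the author's model. Rules here are suffix replacements (a stand-in for the paper's whole-word transformation rules), each rule has a fixed, user-supplied probability, every word has at most one parent (so the graph is a forest), and the score controlled by `rule_prob` and `log_root` is a toy. The function names `apply_rule`, `candidate_edges`, `log_prob`, `can_add` and `mh_sample` are illustrative only. The sampler proposes toggling one candidate edge chosen uniformly at random. Because this proposal is symmetric, the Metropolis-Hastings acceptance ratio reduces to the ratio of graph probabilities, and proposals that would give a word two parents or create a cycle are rejected. The fraction of post-burn-in samples that contain an edge serves as an estimate of that relationship's strength.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical simplified rule: replace a word-final suffix `old` with `new`.
# Suffix-only rules are an assumption of this sketch, not the paper's rule format.
Rule = Tuple[str, str]
Edge = Tuple[str, str, Rule]           # (source word, derived word, rule)
Parents = Dict[str, Tuple[str, Rule]]  # derived word -> (source word, rule)


def apply_rule(word: str, rule: Rule) -> Optional[str]:
    """Return the word produced by `rule`, or None if the rule does not apply."""
    old, new = rule
    if word.endswith(old) and len(word) > len(old):
        return word[: len(word) - len(old)] + new
    return None


def candidate_edges(words: List[str], rules: List[Rule]) -> List[Edge]:
    """All pairs of vocabulary words linked by some rule."""
    vocab = set(words)
    edges = []
    for w in sorted(vocab):
        for r in rules:
            t = apply_rule(w, r)
            if t is not None and t in vocab and t != w:
                edges.append((w, t, r))
    return edges


def log_prob(parents: Parents, edges: List[Edge], rule_prob: Dict[Rule, float],
             vocab: Set[str], log_root: float) -> float:
    """Toy log-score: each candidate edge is present with its rule's probability
    (0 < p < 1); every word without a parent (a root) adds `log_root`."""
    lp = 0.0
    for src, tgt, r in edges:
        p = rule_prob[r]
        lp += math.log(p) if parents.get(tgt) == (src, r) else math.log1p(-p)
    return lp + log_root * sum(1 for w in vocab if w not in parents)


def can_add(parents: Parents, edge: Edge) -> bool:
    """Allow an edge only if the target has no parent and no cycle arises."""
    src, tgt, _ = edge
    if tgt in parents:
        return False
    node = src
    while True:  # walk up from src; a cycle arises iff tgt is an ancestor of src
        if node == tgt:
            return False
        if node not in parents:
            return True
        node = parents[node][0]


def mh_sample(words, rules, rule_prob, log_root=-5.0,
              n_iter=20000, burn_in=2000, seed=0) -> Dict[Edge, float]:
    """Metropolis-Hastings over forests; returns each edge's sample frequency."""
    rng = random.Random(seed)
    vocab = set(words)
    edges = candidate_edges(words, rules)
    if not edges:
        return {}
    parents: Parents = {}
    current = log_prob(parents, edges, rule_prob, vocab, log_root)
    counts: Dict[Edge, int] = defaultdict(int)
    for it in range(n_iter):
        src, tgt, r = rng.choice(edges)  # symmetric proposal: toggle one edge
        proposal: Optional[Parents] = None
        if parents.get(tgt) == (src, r):
            proposal = dict(parents)
            del proposal[tgt]
        elif can_add(parents, (src, tgt, r)):
            proposal = dict(parents)
            proposal[tgt] = (src, r)
        # Invalid proposals have probability zero, so the chain stays put.
        if proposal is not None:
            new = log_prob(proposal, edges, rule_prob, vocab, log_root)
            if new >= current or rng.random() < math.exp(new - current):
                parents, current = proposal, new
        if it >= burn_in:
            for t, (s, rr) in parents.items():
                counts[(s, t, rr)] += 1
    kept = max(1, n_iter - burn_in)
    return {e: counts[e] / kept for e in edges}


if __name__ == "__main__":
    words = ["walk", "walks", "walked", "talk", "talks", "talked", "sing"]
    rules = [("", "s"), ("", "ed"), ("s", "ed")]
    probs = {("", "s"): 0.5, ("", "ed"): 0.5, ("s", "ed"): 0.1}
    for (src, tgt, rule), freq in mh_sample(words, rules, probs).items():
        print(f"{src} -> {tgt} via {rule}: {freq:.3f}")
```

In this toy vocabulary, "walked" has two candidate parents ("walk" via `("", "ed")` and "walks" via `("s", "ed")`), and since each sampled graph gives it at most one parent, the two edges compete for probability mass. Recomputing `log_prob` from scratch keeps the sketch short; a practical sampler would update the score incrementally, because a toggle changes only the term of one edge and the root status of one word.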
Anthology ID: R17-1093
Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month: September
Year: 2017
Address: Varna, Bulgaria
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 723–732
URL: https://doi.org/10.26615/978-954-452-049-6_093
DOI: 10.26615/978-954-452-049-6_093
Cite (ACL): Maciej Sumalvico. 2017. Unsupervised Learning of Morphology with Graph Sampling. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 723–732, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): Unsupervised Learning of Morphology with Graph Sampling (Sumalvico, RANLP 2017)