Maciej Sumalvico


2018

Corpora of Typical Sentences
Lydia Müller | Uwe Quasthoff | Maciej Sumalvico
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Unsupervised Learning of Morphology with Graph Sampling
Maciej Sumalvico
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We introduce a language-independent, graph-based probabilistic model of morphology, which uses transformation rules operating on whole words instead of the traditional morphological segmentation. The morphological analysis of a set of words is expressed as a graph with words as vertices and structural relationships between words as edges. We define a probability distribution over such graphs and develop a sampler based on the Metropolis-Hastings algorithm. Sampling is used to determine the strength of morphological relationships between words, filter out accidental similarities, and reduce the set of rules needed to explain the data. The model is evaluated on two tasks: finding pairs of morphologically similar words and generating new words. The results are compared to a state-of-the-art segmentation-based approach.