Morphosyntactic Disambiguation in an Endangered Language Setting

Jeff Ens, Mika Hämäläinen, Jack Rueter, Philippe Pasquier


Abstract
Endangered Uralic languages exhibit a wide variety of inflectional forms in their morphology. As a result, many inflected forms are homonymous, which introduces considerable morphological ambiguity into sentences. Previous research has employed constraint grammars (CGs) to address this problem; however, CGs are often unable to fully disambiguate a sentence, and their development is labour-intensive. We present an LSTM-based model for automatically ranking the morphological readings of sentences according to their quality. This ranking can be used either to evaluate existing CG disambiguators or to perform morphological disambiguation of sentences directly. Our approach operates on a morphological abstraction and can be trained with a very small dataset.
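The ranking idea described above can be illustrated with a minimal sketch. It assumes that each token's morphological analyses are already available as tag strings (the tags below are hypothetical). The sketch enumerates every full-sentence reading and ranks the readings best-first. To remain dependency-free, it swaps in an add-one smoothed tag-bigram model as the scorer. This is a deliberately simpler stand-in for the authors' LSTM, not their model, and all function names are illustrative.

```python
import math
from collections import Counter
from itertools import product


def candidate_readings(sentence):
    """Enumerate every full-sentence reading (one analysis chosen per token)."""
    return [list(combo) for combo in product(*sentence)]


def train_bigram_scorer(tagged_sentences, alpha=1.0):
    """Fit an add-alpha smoothed tag-bigram model on disambiguated tag sequences.

    Stand-in for the paper's LSTM: it scores a reading by its log-probability.
    """
    prev_counts = Counter()  # how often each tag occurs as a left context
    bigrams = Counter()      # (previous tag, current tag) counts
    vocab = set()
    for tags in tagged_sentences:
        padded = ["<s>"] + tags + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            prev_counts[prev] += 1
            bigrams[(prev, cur)] += 1
    v = len(vocab)

    def score(tags):
        padded = ["<s>"] + tags + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + alpha) / (prev_counts[p] + alpha * v))
            for p, c in zip(padded, padded[1:])
        )

    return score


def rank_readings(sentence, score):
    """Return all candidate readings sorted from highest to lowest score."""
    return sorted(candidate_readings(sentence), key=score, reverse=True)


# Hypothetical, already disambiguated training data (tag sequences only).
train = [
    ["N+Sg+Nom", "V+Prs+Sg3", "N+Sg+Acc"],
    ["N+Sg+Nom", "V+Prs+Sg3", "N+Pl+Acc"],
    ["N+Pl+Nom", "V+Prs+Pl3", "N+Sg+Acc"],
]
score = train_bigram_scorer(train)

# An ambiguous sentence: the first two tokens each have two homonymous analyses.
ambiguous = [["N+Sg+Nom", "N+Sg+Acc"], ["V+Prs+Sg3", "N+Pl+Nom"], ["N+Sg+Acc"]]
ranked = rank_readings(ambiguous, score)
print(ranked[0])  # → ['N+Sg+Nom', 'V+Prs+Sg3', 'N+Sg+Acc']
```

Because the scorer sees only tag sequences and never word forms, it loosely mirrors the idea of working on a morphological abstraction. In principle, the same `score` function could also be applied to the reading a CG selects and compared against the top-ranked alternative. Exhaustive enumeration grows multiplicatively with the number of analyses per token, so it is only practical for short sentences.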
Anthology ID:
W19-6139
Volume:
Proceedings of the 22nd Nordic Conference on Computational Linguistics
Month:
September–October
Year:
2019
Address:
Turku, Finland
Editors:
Mareike Hartmann, Barbara Plank
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press
Pages:
345–349
URL:
https://aclanthology.org/W19-6139
Cite (ACL):
Jeff Ens, Mika Hämäläinen, Jack Rueter, and Philippe Pasquier. 2019. Morphosyntactic Disambiguation in an Endangered Language Setting. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 345–349, Turku, Finland. Linköping University Electronic Press.
Cite (Informal):
Morphosyntactic Disambiguation in an Endangered Language Setting (Ens et al., NoDaLiDa 2019)
PDF:
https://aclanthology.org/W19-6139.pdf