IGT2P: From Interlinear Glossed Texts to Paradigms

Sarah Moeller; Ling Liu; Changbing Yang; Katharina Von Der Wense; Mans Hulden

doi:10.18653/v1/2020.emnlp-main.424

Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, Mans Hulden

Abstract

An intermediate step in the linguistic analysis of an under-documented language is to find and organize inflected forms that are attested in natural speech. From this data, linguists generate unseen inflected word forms in order to test hypotheses about the language’s inflectional patterns and to complete inflectional paradigm tables. To get the data, linguists spend many hours manually creating interlinear glossed texts (IGTs). We introduce a new task that speeds this process and automatically generates new morphological resources for natural language processing systems: IGT-to-paradigms (IGT2P). IGT2P generates entire morphological paradigms from IGT input. We show that existing morphological reinflection models can solve the task with 21% to 64% accuracy, depending on the language. We further find that (i) having a language expert spend only a few hours cleaning the noisy IGT data improves performance by as much as 21 percentage points, and (ii) POS tags, which are generally considered a necessary part of NLP morphological reinflection input, have no effect on the accuracy of the models considered here.

Anthology ID: 2020.emnlp-main.424
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 5251–5262
URL: https://aclanthology.org/2020.emnlp-main.424
DOI: 10.18653/v1/2020.emnlp-main.424
Cite (ACL): Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, and Mans Hulden. 2020. IGT2P: From Interlinear Glossed Texts to Paradigms. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5251–5262, Online. Association for Computational Linguistics.
Cite (Informal): IGT2P: From Interlinear Glossed Texts to Paradigms (Moeller et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.424.pdf
Video: https://slideslive.com/38939208
