Unsupervised Induction of Ukrainian Morphological Paradigms for the New Lexicon: Extending Coverage for Named Entities and Neologisms using Inflection Tables and Unannotated Corpora

Bogdan Babych


Abstract
The paper presents an unsupervised method for quickly extending a Ukrainian lexicon by generating paradigms and morphological feature structures for new Named Entities and neologisms, which are not covered by existing static morphological resources. This approach addresses a practical problem of modelling paradigms for entities created by the dynamic processes in the lexicon: this problem is especially serious for highly-inflected languages in domains with specialised or quickly changing lexicon. The method uses an unannotated Ukrainian corpus and a small fixed set of inflection tables, which can be found in traditional grammar textbooks. The advantage of the proposed approach is that updating the morphological lexicon does not require training or linguistic annotation, allowing fast knowledge-light extension of an existing static lexicon to improve morphological coverage on a specific corpus. The method is implemented in an open-source package on a GitHub repository. It can be applied to other low-resourced inflectional languages which have internet corpora and linguistic descriptions of their inflection system, following the example of inflection tables for Ukrainian. Evaluation results shows consistent improvements in coverage for Ukrainian corpora of different corpus types.
Anthology ID:
W19-3701
Volume:
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Note:
Pages:
1–11
Language:
URL:
https://aclanthology.org/W19-3701
DOI:
10.18653/v1/W19-3701
Bibkey:
Cite (ACL):
Bogdan Babych. 2019. Unsupervised Induction of Ukrainian Morphological Paradigms for the New Lexicon: Extending Coverage for Named Entities and Neologisms using Inflection Tables and Unannotated Corpora. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 1–11, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Induction of Ukrainian Morphological Paradigms for the New Lexicon: Extending Coverage for Named Entities and Neologisms using Inflection Tables and Unannotated Corpora (Babych, BSNLP 2019)
Copy Citation:
PDF:
https://aclanthology.org/W19-3701.pdf