Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation

Alexey Sorokin


Abstract
We investigate how to improve the quality of low-resource morphological inflection without annotating more data. We examine two methods: language models and data augmentation. We show that a model whose decoder additionally uses the states of a language model improves quality by 1.5% in combination with both baselines. We also demonstrate that data augmentation improves performance by 9% on average when 1000 artificially generated word forms are added to the dataset.
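The abstract does not spell out how the artificial word forms are generated. One common scheme for augmenting morphological inflection data is stem "hallucination": take a real (lemma, form) pair, treat their longest common prefix as the stem, and replace it with random characters so the model sees the same suffix pattern on many pseudo-stems. The sketch below illustrates that general idea only; the function name and the prefix-based stem heuristic are assumptions, not the paper's exact method.

```python
import random
import string

def hallucinate(lemma, form, n=3, alphabet=string.ascii_lowercase):
    """Generate n synthetic (lemma, form) pairs from one real pair by
    swapping the shared stem for a random character string.
    Illustrative sketch only, not the paper's actual algorithm."""
    # Crude stem: longest common prefix of lemma and inflected form.
    stem_len = 0
    while (stem_len < min(len(lemma), len(form))
           and lemma[stem_len] == form[stem_len]):
        stem_len += 1
    stem = lemma[:stem_len]
    lemma_suffix, form_suffix = lemma[stem_len:], form[stem_len:]
    pairs = []
    for _ in range(n):
        # Random pseudo-stem of the same length keeps word shapes plausible.
        fake = "".join(random.choice(alphabet) for _ in range(len(stem)))
        pairs.append((fake + lemma_suffix, fake + form_suffix))
    return pairs

random.seed(0)
# e.g. from ("walk", "walked") produce pairs like ("mynb", "mynbed"),
# all sharing the real suffix alternation but with nonce stems.
synthetic = hallucinate("walk", "walked", n=5)
```

Repeating this over the training set and keeping the morphological tags of the source pair yields extra (lemma, tags, form) triples, which is one way to reach the "1000 artificially generated word forms" regime the abstract reports.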
Anthology ID:
2020.lrec-1.490
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Note:
Pages:
3978–3983
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.490
Cite (ACL):
Alexey Sorokin. 2020. Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3978–3983, Marseille, France. European Language Resources Association.
Cite (Informal):
Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation (Sorokin, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.490.pdf