An Inflectional Database for Gitksan

Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg


Abstract
This paper presents a new inflectional resource for Gitksan, a low-resource Indigenous language of Canada. We use Gitksan data in interlinear glossed text format, drawn from language documentation efforts, to build a database of partial inflection tables. We then enrich this morphological resource by filling in the empty slots of the partial inflection tables using transformer-based neural reinflection models. We extend the training data for these reinflection models using two data augmentation techniques: data hallucination and back-translation. Experimental results demonstrate substantial improvements from data augmentation, with data hallucination delivering particularly large gains. We also release our reinflection models for Gitksan.
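To illustrate the general idea behind data hallucination, here is a minimal sketch. It assumes training data is stored as (lemma, inflected form, tag) triples. It treats the longest substring shared by the lemma and the form as the "stem" and replaces that stem with a random character string of the same length, which produces new synthetic training pairs. The function names, the `min_stem` threshold, and the toy English examples are hypothetical and used only for illustration. None of them come from the paper or its released code, and the authors' actual implementation may differ.

```python
from __future__ import annotations

import random
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest common substring."""
    # autojunk=False keeps every character eligible, giving an exact longest match.
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return m.a, m.b, m.size


def hallucinate(lemma: str, form: str, tag: str, alphabet: list[str],
                rng: random.Random, min_stem: int = 3) -> tuple[str, str, str] | None:
    """Swap the shared 'stem' of lemma and form for a random string of equal length.

    Returns None when the shared substring is too short to plausibly be a stem.
    """
    i, j, n = longest_common_substring(lemma, form)
    if n < min_stem:
        return None
    fake_stem = "".join(rng.choice(alphabet) for _ in range(n))
    new_lemma = lemma[:i] + fake_stem + lemma[i + n:]
    new_form = form[:j] + fake_stem + form[j + n:]
    return new_lemma, new_form, tag


def augment(triples: list[tuple[str, str, str]], n_per_triple: int,
            rng: random.Random) -> list[tuple[str, str, str]]:
    """Create up to n_per_triple hallucinated examples for each training triple."""
    # Draw random characters only from symbols seen in the training data.
    alphabet = sorted({c for lemma, form, _ in triples for c in lemma + form})
    out = []
    for lemma, form, tag in triples:
        for _ in range(n_per_triple):
            h = hallucinate(lemma, form, tag, alphabet, rng)
            if h is not None:
                out.append(h)
    return out


if __name__ == "__main__":
    # Toy (non-Gitksan) triples, for illustration only.
    toy = [("walk", "walked", "V;PST"), ("jump", "jumping", "V;PROG")]
    for example in augment(toy, 2, random.Random(0)):
        print(example)
```

The affixes ("-ed", "-ing") and tags are preserved while the stem is randomized. This pushes a reinflection model to learn the affixation pattern rather than memorizing specific stems, which is the usual motivation for this kind of augmentation in low-resource settings.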
Anthology ID: 2022.lrec-1.710
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 6597–6606
URL: https://aclanthology.org/2022.lrec-1.710
Cite (ACL): Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, and Miikka Silfverberg. 2022. An Inflectional Database for Gitksan. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6597–6606, Marseille, France. European Language Resources Association.
Cite (Informal): An Inflectional Database for Gitksan (Oliver et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.710.pdf
Code: mpsilfve/gitksan-data
Data: Universal Dependencies