Improving Low-resource RRG Parsing with Structured Gloss Embeddings

Roland Eibers, Kilian Evang, Laura Kallmeyer


Abstract
Treebanking for local languages is hampered by the lack of existing parsers to generate pre-annotations. However, it has been shown that reasonably accurate parsers can be bootstrapped from little initial training data by exploiting the interlinear glosses and translations that typically accompany the language documentation data used for such treebanks. In this paper, we improve upon such a bootstrapping model by representing glosses using a combination of morphological feature vectors and pre-trained lemma embeddings. We also contribute a mapping from glosses to Universal Dependencies morphological features.
Anthology ID:
2023.fieldmatters-1.6
Volume:
Proceedings of the Second Workshop on NLP Applications to Field Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Oleg Serikov, Ekaterina Voloshina, Anna Postnikova, Elena Klyachko, Ekaterina Vylomova, Tatiana Shavrina, Eric Le Ferrand, Valentin Malykh, Francis Tyers, Timofey Arkhangelskiy, Vladislav Mikhailov
Venue:
FieldMatters
Publisher:
Association for Computational Linguistics
Pages:
46–51
URL:
https://aclanthology.org/2023.fieldmatters-1.6
DOI:
10.18653/v1/2023.fieldmatters-1.6
Cite (ACL):
Roland Eibers, Kilian Evang, and Laura Kallmeyer. 2023. Improving Low-resource RRG Parsing with Structured Gloss Embeddings. In Proceedings of the Second Workshop on NLP Applications to Field Linguistics, pages 46–51, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Improving Low-resource RRG Parsing with Structured Gloss Embeddings (Eibers et al., FieldMatters 2023)
PDF:
https://aclanthology.org/2023.fieldmatters-1.6.pdf