Glossy Bytes: Neural Glossing using Subword Encoding

Ziggy Cross, Michelle Yun, Ananya Apparaju, Jata MacCabe, Garrett Nicolai, Miikka Silfverberg


Abstract
This paper presents several neural approaches to interlinear glossing, all based on subword modelling, for seven under-resourced languages as part of the 2023 SIGMORPHON shared task on interlinear glossing. We experiment with various augmentation and tokenization strategies in both the open and closed tracks. We find that while byte-level models may perform well when more data is available, character-based approaches remain competitive in lower-resource settings.
Anthology ID:
2023.sigmorphon-1.24
Volume:
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, Çağrı Çöltekin
Venue:
SIGMORPHON
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
222–229
URL:
https://aclanthology.org/2023.sigmorphon-1.24
DOI:
10.18653/v1/2023.sigmorphon-1.24
Cite (ACL):
Ziggy Cross, Michelle Yun, Ananya Apparaju, Jata MacCabe, Garrett Nicolai, and Miikka Silfverberg. 2023. Glossy Bytes: Neural Glossing using Subword Encoding. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 222–229, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Glossy Bytes: Neural Glossing using Subword Encoding (Cross et al., SIGMORPHON 2023)
PDF:
https://aclanthology.org/2023.sigmorphon-1.24.pdf