Cross-Lingual Wolastoqey-English Definition Modelling

Diego Bear, Paul Cook


Abstract
Definition modelling is the task of automatically generating a dictionary-style definition for a target word. In this paper, we consider cross-lingual definition generation. Specifically, we generate English definitions for Wolastoqey (Malecite-Passamaquoddy) words. Wolastoqey is an endangered, low-resource polysynthetic language. We hypothesize that sub-word representations based on byte pair encoding (Sennrich et al., 2016) can be leveraged to represent morphologically complex Wolastoqey words and to compensate for the lack of large corpora available for training. Our experimental results demonstrate that this approach outperforms baseline methods in terms of BLEU score.
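To illustrate what byte pair encoding does, here is a minimal sketch of BPE merge learning in the style of Sennrich et al. (2016). It is not the authors' code, and it is not run on Wolastoqey data. The toy word frequencies are the small English example from Sennrich et al., and the function names (`learn_bpe`, `segment`, and the helpers) are hypothetical. The learner repeatedly merges the most frequent adjacent symbol pair. Those merges are then replayed on an unseen word, so the word is split into sub-words that were seen during training.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace each whole-symbol occurrence of `pair` with its concatenation."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds ensure we only match complete symbols, not substrings of them.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency dict."""
    # Represent each word as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(w) + " </w>": f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


# Toy English vocabulary (hypothetical stand-in for a training corpus).
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_freqs, num_merges=5)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
print(segment("lowest", merges))  # ['low', 'est</w>']
```

"lowest" never occurs in the toy vocabulary, yet it is represented by two sub-words that do occur. This is the property the abstract relies on: a morphologically complex word that is rare or unseen can still be composed from frequent sub-word units. In practice the number of merges is a tunable hyperparameter. This sketch does not show the value used in the paper.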
Anthology ID: 2021.ranlp-1.17
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month: September
Year: 2021
Address: Held Online
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 138–146
URL: https://aclanthology.org/2021.ranlp-1.17
Cite (ACL): Diego Bear and Paul Cook. 2021. Cross-Lingual Wolastoqey-English Definition Modelling. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 138–146, Held Online. INCOMA Ltd.
Cite (Informal): Cross-Lingual Wolastoqey-English Definition Modelling (Bear & Cook, RANLP 2021)
PDF: https://aclanthology.org/2021.ranlp-1.17.pdf