Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation?

Santiago Góngora, Nicolás Giossa, Luis Chiruzzo


Abstract
Machine translation for low-resource languages such as Guarani is challenging because of the lack of data. One way to tackle it is to initialize the model with pretrained word embeddings. In this work we examine whether currently available data is enough to train rich embeddings that enhance MT between Guarani and Spanish, by building a set of word embedding collections and training MT systems with them. We found that the trained vectors are strong enough to slightly improve the performance of some of the translation models and also to speed up training convergence.
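As an illustration of what initializing a model with pretrained word embeddings can look like, the following is a minimal sketch, not the authors' implementation. The function name `build_embedding_matrix`, the toy vocabulary, and the three-dimensional vectors are all hypothetical; the fallback of small random rows for words without a pretrained vector is one common choice, assumed here for illustration.

```python
import numpy as np


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Build a (len(vocab), dim) matrix for an MT model's embedding layer.

    Rows for tokens found in `pretrained` are copied from it; all other
    rows keep a small random initialization (an assumption of this sketch).
    Returns the matrix and the number of tokens covered by `pretrained`.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.normal(0.0, 0.1, size=(len(vocab), dim)).astype(np.float32)
    hits = 0
    for idx, token in enumerate(vocab):
        vec = pretrained.get(token)
        if vec is not None:
            matrix[idx] = np.asarray(vec, dtype=np.float32)
            hits += 1
    return matrix, hits


# Toy data: a tiny Guarani vocabulary and made-up 3-dimensional vectors.
vocab = ["<unk>", "jagua", "mbarakaja", "óga"]
pretrained = {"jagua": [0.1, 0.2, 0.3], "óga": [0.4, 0.5, 0.6]}

matrix, hits = build_embedding_matrix(vocab, pretrained, dim=3)
print(f"{hits}/{len(vocab)} tokens initialized from pretrained vectors")
```

The resulting matrix would then be loaded as the initial weights of the translation model's embedding layer; the coverage count gives a rough sense of how much of the vocabulary benefits from the pretrained vectors.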
Anthology ID:
2022.computel-1.16
Volume:
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venues:
ACL | ComputEL
Publisher:
Association for Computational Linguistics
Pages:
127–132
URL:
https://aclanthology.org/2022.computel-1.16
DOI:
10.18653/v1/2022.computel-1.16
Cite (ACL):
Santiago Góngora, Nicolás Giossa, and Luis Chiruzzo. 2022. Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation? In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 127–132, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Can We Use Word Embeddings for Enhancing Guarani-Spanish Machine Translation? (Góngora et al., ComputEL 2022)
PDF:
https://aclanthology.org/2022.computel-1.16.pdf
Code:
sgongora27/Guarani-embeddings-for-MT