The Helsinki submission to the AmericasNLP shared task

Raúl Vázquez, Yves Scherrer, Sami Virpioja, Jörg Tiedemann


Abstract
The University of Helsinki participated in the AmericasNLP shared task for all ten language pairs. Our multilingual NMT models ranked first on all language pairs in track 1 and on nine out of ten language pairs in track 2. We focused our efforts on three aspects: (1) the collection of additional data from various sources such as Bibles and political constitutions, (2) the cleaning and filtering of training data with the OpusFilter toolkit, and (3) different multilingual training techniques enabled by the latest version of the OpenNMT-py toolkit to make the most efficient use of the scarce data. This paper describes our efforts in detail.
Anthology ID:
2021.americasnlp-1.29
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Venues:
AmericasNLP | NAACL
Publisher:
Association for Computational Linguistics
Pages:
255–264
URL:
https://aclanthology.org/2021.americasnlp-1.29
DOI:
10.18653/v1/2021.americasnlp-1.29
Cite (ACL):
Raúl Vázquez, Yves Scherrer, Sami Virpioja, and Jörg Tiedemann. 2021. The Helsinki submission to the AmericasNLP shared task. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 255–264, Online. Association for Computational Linguistics.
Cite (Informal):
The Helsinki submission to the AmericasNLP shared task (Vázquez et al., AmericasNLP 2021)
PDF:
https://aclanthology.org/2021.americasnlp-1.29.pdf