System Description of the NordicsAlps Submission to the AmericasNLP 2024 Machine Translation Shared Task

Joseph Attieh, Zachary Hopton, Yves Scherrer, Tanja Samardžić

doi:10.18653/v1/2024.americasnlp-1.18

Abstract

This paper describes the submission of the NordicsAlps team to the AmericasNLP 2024 Machine Translation Shared Task 1. We investigate the effect of tokenization on translation quality by exploring two tokenization schemes: byte-level and redundancy-driven tokenization. We submitted three runs per language pair. Our redundancy-driven tokenization approach ranked first among all submissions, achieving the highest chrF2++, chrF, and BLEU scores averaged across all languages. These findings demonstrate the importance of carefully tailoring the tokenization strategy of machine translation systems, particularly in resource-constrained scenarios.
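To show what the byte-level scheme means in practice, here is a minimal sketch of generic byte-level tokenization. It is not the authors' implementation, and the function names `byte_tokenize` and `byte_detokenize` are hypothetical. Text is mapped to its UTF-8 bytes, so the vocabulary is fixed at 256 symbols and nothing is ever out of vocabulary. The trade-off is that any non-ASCII character, such as the diacritics common in Spanish and in many Indigenous-language orthographies, becomes two or more tokens, which lengthens the sequences the model has to process.

```python
def byte_tokenize(text: str) -> list[int]:
    """Map text to UTF-8 byte IDs; every ID is in range 0-255 (hypothetical helper)."""
    return list(text.encode("utf-8"))


def byte_detokenize(ids: list[int]) -> str:
    """Invert byte_tokenize by decoding the byte IDs back to a string."""
    return bytes(ids).decode("utf-8")


if __name__ == "__main__":
    for word in ["mañana", "Samardžić"]:
        ids = byte_tokenize(word)
        # Non-ASCII characters (ñ, ž, ć) each take two bytes in UTF-8,
        # so the byte sequence is longer than the character sequence.
        print(word, len(word), "chars ->", len(ids), "byte tokens:", ids)
        assert byte_detokenize(ids) == word  # lossless round trip
```

For example, "mañana" has 6 characters but 7 byte tokens, because "ñ" is encoded as the two bytes 195 and 177.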

Anthology ID: 2024.americasnlp-1.18
Volume: Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Manuel Mager, Abteen Ebrahimi, Shruti Rijhwani, Arturo Oncevay, Luis Chiruzzo, Robert Pugh, Katharina von der Wense
Venues: AmericasNLP | WS
Publisher: Association for Computational Linguistics
Pages: 150–158
URL: https://aclanthology.org/2024.americasnlp-1.18/
Cite (ACL): Joseph Attieh, Zachary Hopton, Yves Scherrer, and Tanja Samardžić. 2024. System Description of the NordicsAlps Submission to the AmericasNLP 2024 Machine Translation Shared Task. In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024), pages 150–158, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): System Description of the NordicsAlps Submission to the AmericasNLP 2024 Machine Translation Shared Task (Attieh et al., AmericasNLP 2024)
PDF: https://aclanthology.org/2024.americasnlp-1.18.pdf