Italian-Ligurian Machine Translation in Its Cultural Context

Christopher R. Haberland; Jean Maillard; Stefano Lusito

Abstract

Large multilingual machine translation efforts are improving access and performance for under-resourced languages, but they often fail to translate culturally specific and local concepts. Additionally, translation from practically relevant input languages may lag behind translation from languages that are comparatively over-represented in the training data. In this work, we release a new corpus, ZenaMT, containing 7,561 parallel Ligurian-Italian sentences, nearly a fifth of which are also translated into English. This corpus spans five domains: local and international news, Ligurian literature, Genoese Ligurian linguistics concepts, traditional card game rules, and Ligurian geographic expressions. We find that a translation model augmented with ZenaMT improves on a baseline by 20%, and outperforms NLLB-3.3B, a model over 50 times its size, by over 25% (BLEU). Our results demonstrate the utility of creating MT datasets tailored specifically to the cultural context of Ligurian speakers. We freely release ZenaMT and expect to update the corpus periodically to improve MT performance and domain coverage.

Anthology ID:: 2024.sigul-1.21
Volume:: Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Maite Melero, Sakriani Sakti, Claudia Soria
Venues:: SIGUL | WS
Publisher:: ELRA and ICCL
Pages:: 168–176
URL:: https://aclanthology.org/2024.sigul-1.21/
Cite (ACL):: Christopher R. Haberland, Jean Maillard, and Stefano Lusito. 2024. Italian-Ligurian Machine Translation in Its Cultural Context. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 168–176, Torino, Italia. ELRA and ICCL.
Cite (Informal):: Italian-Ligurian Machine Translation in Its Cultural Context (Haberland et al., SIGUL 2024)
PDF:: https://aclanthology.org/2024.sigul-1.21.pdf