Improving NMT from a Low-Resource Source Language: A Use Case from Catalan to Chinese via Spanish

Yongjian Chen, Antonio Toral, Zhijian Li, Mireia Farrús


Abstract
The effectiveness of neural machine translation is markedly constrained in low-resource scenarios, where the scarcity of parallel data hampers the development of robust models. This paper focuses on the scenario where the source language is low-resource and there exists a related high-resource language, for which we introduce a novel approach that combines pivot translation and multilingual training. As a use case we tackle the automatic translation from Catalan to Chinese, using Spanish as an additional language. Our evaluation, conducted on the FLORES-200 benchmark, compares our new approach against a vanilla baseline alongside other models representing various low-resource techniques in the Catalan-to-Chinese context. Experimental results highlight the efficacy of our proposed method, which outperforms existing models, notably demonstrating significant improvements both in translation quality and in lexical diversity.
Anthology ID:
2024.eamt-1.20
Volume:
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Month:
June
Year:
2024
Address:
Sheffield, UK
Editors:
Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation (EAMT)
Pages:
229–245
URL:
https://aclanthology.org/2024.eamt-1.20
Cite (ACL):
Yongjian Chen, Antonio Toral, Zhijian Li, and Mireia Farrús. 2024. Improving NMT from a Low-Resource Source Language: A Use Case from Catalan to Chinese via Spanish. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 229–245, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal):
Improving NMT from a Low-Resource Source Language: A Use Case from Catalan to Chinese via Spanish (Chen et al., EAMT 2024)
PDF:
https://aclanthology.org/2024.eamt-1.20.pdf