Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks

Jaehoon Oh, Jongwoo Ko, Se-Young Yun


Abstract
Translation has played a crucial role in improving performance on multilingual tasks: (1) generating target-language data from source-language data for training and (2) generating source-language data from target-language data for inference. However, prior work has not considered using both translations simultaneously. This paper shows that combining them yields synergistic gains on various multilingual sentence classification tasks. We empirically find that translation artifacts, stylized by the translators, are the main source of this performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, that take translation artifacts into account. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and improves performance. Our code is available at https://github.com/jongwooko/MUSC.
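The abstract names SupCon and MixUp without defining them. As a rough reference point, here is a minimal plain-Python sketch of the generic formulations from their original papers: the supervised contrastive loss (Khosla et al., 2020) and mixup (Zhang et al., 2018). It is not the authors' MUSC implementation. How MUSC pairs, weights, or combines the two objectives is not stated here, and the function names, the temperature of 0.1, and alpha = 0.4 are illustrative assumptions. Because a sentence and its translation share a class label, the generic SupCon loss treats them as positives for each other.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (SupCon compares normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Generic supervised contrastive loss, averaged over anchors.

    For each anchor, every other example with the same label is a positive;
    all other examples in the batch appear in the softmax denominator.
    Hypothetical helper for illustration, not taken from the MUSC code.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every non-anchor example.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Numerically stable log-sum-exp over the denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def mixup(x_a: Sequence[float], y_a: Sequence[float],
          x_b: Sequence[float], y_b: Sequence[float],
          alpha: float = 0.4,
          rng: Optional[random.Random] = None) -> Tuple[Vector, Vector, float]:
    """Generic mixup: convex combination of two inputs and their one-hot labels.

    The mixing weight lam is drawn from Beta(alpha, alpha).
    Hypothetical helper for illustration, not taken from the MUSC code.
    """
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    # Made-up toy embeddings: two examples per class.
    emb = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.1], [-0.8, -0.1]]
    print(supcon_loss(emb, [0, 0, 1, 1]))  # low: same-label pairs are close
    print(supcon_loss(emb, [0, 1, 0, 1]))  # high: "positives" point apart
    print(mixup(emb[0], [1.0, 0.0], emb[2], [0.0, 1.0], rng=random.Random(0)))
```

With correctly labeled, well-separated embeddings the loss is close to zero, while assigning labels so that positives point in opposite directions makes it large. Mixing the one-hot labels with the same weight as the inputs keeps the soft label a valid distribution.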
Anthology ID: 2022.emnlp-main.452
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6747–6754
URL: https://aclanthology.org/2022.emnlp-main.452
DOI: 10.18653/v1/2022.emnlp-main.452
Cite (ACL): Jaehoon Oh, Jongwoo Ko, and Se-Young Yun. 2022. Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6747–6754, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks (Oh et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.452.pdf