Can partial pretranslation improve low-resourced language pairs?

Raoul Blin


Abstract
We study the effects of local, targeted pretranslation of the source corpus on the performance of a Transformer translation model. The pretranslations are performed at the morphological (morpheme translation), lexical (word translation) and morphosyntactic (numeral groups and dates) levels. We focus on small and medium-sized training corpora (50K to 2.5M bisegments) and on a linguistically distant language pair (Japanese and French). We find that this type of pretranslation does not lead to significant improvement. We describe the motivations for the approach and the specific difficulties of Japanese-French translation, and we discuss the possible reasons for the observed underperformance.
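To make the idea of morphosyntactic pretranslation concrete, here is a minimal illustrative sketch, not the paper's actual pipeline: Japanese-format dates in a source segment are rewritten into their French renderings before the segment is passed to the translation model. The function name, regex pattern, and example sentence are assumptions introduced for illustration.

```python
# Hypothetical sketch of morphosyntactic pretranslation: Japanese dates
# in the source text are replaced by their French renderings before the
# corpus is fed to the Transformer. Illustrative only, not the paper's
# actual pipeline.
import re

# French month names indexed by month number (1-12).
FRENCH_MONTHS = [
    "janvier", "février", "mars", "avril", "mai", "juin",
    "juillet", "août", "septembre", "octobre", "novembre", "décembre",
]

# Matches dates written as YYYY年MM月DD日, e.g. 2022年10月17日.
DATE_PATTERN = re.compile(r"(\d{1,4})年(\d{1,2})月(\d{1,2})日")

def pretranslate_dates(ja_text: str) -> str:
    """Replace Japanese-format dates with their French equivalents."""
    def to_french(m: re.Match) -> str:
        year, month, day = int(m.group(1)), int(m.group(2)), int(m.group(3))
        if not 1 <= month <= 12:
            return m.group(0)  # leave malformed dates untouched
        return f"{day} {FRENCH_MONTHS[month - 1]} {year}"
    return DATE_PATTERN.sub(to_french, ja_text)

if __name__ == "__main__":
    # The date becomes French; the rest of the segment stays Japanese.
    print(pretranslate_dates("会議は2022年10月17日に開催された。"))
    # -> 会議は17 octobre 2022に開催された。
```

The same replace-before-translate pattern would extend to numeral groups or lexical substitutions, each with its own matcher and target-language renderer.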
Anthology ID: 2022.wat-1.10
Volume: Proceedings of the 9th Workshop on Asian Translation
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: WAT
Publisher: International Conference on Computational Linguistics
Pages: 82–88
URL: https://aclanthology.org/2022.wat-1.10
Cite (ACL): Raoul Blin. 2022. Can partial pretranslation improve low-resourced language pairs?. In Proceedings of the 9th Workshop on Asian Translation, pages 82–88, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
Cite (Informal): Can partial pretranslation improve low-resourced language pairs? (Blin, WAT 2022)
PDF: https://aclanthology.org/2022.wat-1.10.pdf