An Extensive Exploration of Back-Translation in 60 Languages

Paul McNamee, Kevin Duh


Abstract
Back-translation is a data augmentation technique that has been shown to improve model quality through the creation of synthetic training bitext. Early studies showed the promise of the technique, and follow-on studies have produced additional refinements. We have undertaken a broad investigation using back-translation to train models from 60 languages into English; the majority of these languages are considered moderate- or low-resource. We observed consistent gains and, compared to prior work, conspicuous gains in quite a number of the lower-resourced languages. We analyzed differences in translations between the baseline and back-translation models and observed many indications of improved translation quality. Translation of both rare and common terms improves, and these improvements occur despite the less natural synthetic source-language text used in training.
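For readers unfamiliar with the technique, the sketch below illustrates the back-translation loop described in the abstract. It is a minimal, illustrative outline, not the authors' pipeline; translate_en_to_src is a hypothetical stand-in for any trained English-to-source-language model.

# Minimal back-translation sketch (illustrative; translate_en_to_src is a
# hypothetical wrapper around any trained English-to-source model).

def back_translate(monolingual_english, translate_en_to_src):
    """Turn monolingual English text into synthetic (source, English) bitext."""
    synthetic_bitext = []
    for en_sentence in monolingual_english:
        # The source side is machine-generated (hence "less natural");
        # the English (target) side is genuine human text.
        src_sentence = translate_en_to_src(en_sentence)
        synthetic_bitext.append((src_sentence, en_sentence))
    return synthetic_bitext

def build_training_data(real_bitext, monolingual_english, translate_en_to_src):
    # The source-to-English model is then trained on real plus synthetic pairs.
    return real_bitext + back_translate(monolingual_english, translate_en_to_src)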
Anthology ID: 2023.findings-acl.518
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8166–8183
URL: https://aclanthology.org/2023.findings-acl.518
DOI: 10.18653/v1/2023.findings-acl.518
Cite (ACL): Paul McNamee and Kevin Duh. 2023. An Extensive Exploration of Back-Translation in 60 Languages. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8166–8183, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): An Extensive Exploration of Back-Translation in 60 Languages (McNamee & Duh, Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.518.pdf