Survey of Low-Resource Machine Translation

Barry Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, Alexandra Birch


Abstract
We present a survey covering the state of the art in low-resource machine translation (MT) research. There are currently around 7,000 languages spoken in the world, and almost all language pairs lack significant resources for training machine translation models. There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available. We present a summary of this topical research field and provide a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.
Anthology ID: 2022.cl-3.6
Volume: Computational Linguistics, Volume 48, Issue 3 - September 2022
Month: September
Year: 2022
Address: Cambridge, MA
Venue: CL
Publisher: MIT Press
Pages: 673–732
URL: https://aclanthology.org/2022.cl-3.6
DOI: 10.1162/coli_a_00446
Cite (ACL): Barry Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, and Alexandra Birch. 2022. Survey of Low-Resource Machine Translation. Computational Linguistics, 48(3):673–732.
Cite (Informal): Survey of Low-Resource Machine Translation (Haddow et al., CL 2022)
PDF: https://aclanthology.org/2022.cl-3.6.pdf
Data: CC100, FLoRes, FLoRes-101, Samanantar, Tatoeba