Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

Kevin Duh; Paul McNamee; Matt Post; Brian Thompson

Abstract

Research in machine translation (MT) is developing at a rapid pace. However, most work in the community has focused on languages for which large amounts of digital resources are available. In this study, we benchmark state-of-the-art statistical and neural machine translation systems on two African languages that lack such resources: Somali and Swahili. These languages are of social importance and serve as test beds for developing technologies that perform reasonably well despite the low-resource constraint. Our findings suggest that statistical machine translation (SMT) and neural machine translation (NMT) can perform similarly in low-resource scenarios, but neural systems require more careful tuning to match the performance of statistical ones. We also investigate how to exploit additional data, such as bilingual text harvested from the web or user dictionaries; we find that NMT performance can improve significantly with the use of these additional data. Finally, we survey the landscape of machine translation resources for the languages of Africa and suggest promising directions for future research.

Anthology ID: 2020.lrec-1.325
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 2667–2675
Language: English
URL: https://aclanthology.org/2020.lrec-1.325/
Cite (ACL): Kevin Duh, Paul McNamee, Matt Post, and Brian Thompson. 2020. Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2667–2675, Marseille, France. European Language Resources Association.
Cite (Informal): Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages (Duh et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.325.pdf
