Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French

Cheikh M. Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Sileye Ba


Abstract
In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and on Transformer architectures. We trained our models on the parallel French-Wolof corpus (Nguer et al., 2020) of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, backtranslation and the copied corpus method. We evaluate the models using BLEU score and find that the Transformer outperforms the classic sequence-to-sequence model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on subword-level units. For such models, backtranslation proves slightly beneficial for the Transformer-based models when translating from low-resource Wolof into high-resource French. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied data with backtranslation leads to a slight improvement in translation quality.
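The two augmentation methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `reverse_translate` callable (standing in for a trained target-to-source model) and the placeholder sentences are all assumptions made for illustration. It assumes the usual formulation of each method: the copied corpus method pairs each monolingual target sentence with itself as source, and backtranslation pairs it with a machine-generated source sentence.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def copied_pairs(mono_target: List[str]) -> List[Pair]:
    """Copied corpus: each monolingual target sentence serves as its own source."""
    return [(sentence, sentence) for sentence in mono_target]


def backtranslated_pairs(
    mono_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Backtranslation: a synthetic source is produced by a target-to-source model."""
    return [(reverse_translate(sentence), sentence) for sentence in mono_target]


def build_training_set(
    parallel: List[Pair],
    mono_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Concatenate genuine, copied and backtranslated pairs, in that order."""
    return (
        list(parallel)
        + copied_pairs(mono_target)
        + backtranslated_pairs(mono_target, reverse_translate)
    )


if __name__ == "__main__":
    # Placeholder data; a real setup would load corpora and a trained reverse model.
    parallel = [("source sentence 1", "target sentence 1")]
    mono_target = ["target sentence 2"]
    # Hypothetical stand-in for a target-to-source NMT model.
    fake_reverse_model = lambda s: "synthetic source for: " + s
    for src, tgt in build_training_set(parallel, mono_target, fake_reverse_model):
        print(src, "|||", tgt)
```

In practice the synthetic pairs are often tagged or down-weighted relative to the genuine parallel data; the sketch simply concatenates them.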
Anthology ID:
2022.lrec-1.717
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6654–6661
URL:
https://aclanthology.org/2022.lrec-1.717
Cite (ACL):
Cheikh M. Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, and Sileye Ba. 2022. Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6654–6661, Marseille, France. European Language Resources Association.
Cite (Informal):
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French (Dione et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.717.pdf