Alla Lo
2022
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French
Cheikh M. Bamba Dione | Alla Lo | Elhadji Mamadou Nguer | Sileye Ba
Proceedings of the Thirteenth Language Resources and Evaluation Conference
In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and Transformer architectures. We trained our models on the parallel French-Wolof corpus (Nguer et al., 2020) of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, backtranslation and the copied corpus method. We evaluate the models using BLEU score and find that the Transformer outperforms the classic sequence-to-sequence model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when the models are trained on subword-level units. For such models, backtranslation proves slightly beneficial for the Transformer-based models when translating from low-resource Wolof into high-resource French. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied data with backtranslation leads to a slight improvement in translation quality.
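The two data augmentation methods named in the abstract can be illustrated with a minimal sketch. It assumes that backtranslation means pairing each target-language monolingual sentence with a synthetic source produced by a reverse (target-to-source) model, and that the copied corpus method means pairing each target-language monolingual sentence with an identical copy of itself as the source. The names `augment`, `copied_pairs`, `backtranslated_pairs` and `reverse_translate` are hypothetical and do not come from the paper, and the sketch is not the authors' pipeline.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def copied_pairs(target_mono: List[str]) -> List[Pair]:
    """Assumed copied corpus method: each target sentence is its own source."""
    return [(sentence, sentence) for sentence in target_mono]


def backtranslated_pairs(
    target_mono: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Assumed backtranslation: a synthetic source paired with the real target."""
    return [(reverse_translate(sentence), sentence) for sentence in target_mono]


def augment(
    parallel: List[Pair],
    target_mono: List[str],
    reverse_translate: Optional[Callable[[str], str]] = None,
    use_copied: bool = True,
) -> List[Pair]:
    """Return the parallel data followed by whichever synthetic pairs are enabled."""
    data = list(parallel)  # copy so the caller's list is not modified
    if reverse_translate is not None:
        data.extend(backtranslated_pairs(target_mono, reverse_translate))
    if use_copied:
        data.extend(copied_pairs(target_mono))
    return data
```

In practice `reverse_translate` would wrap a trained target-to-source model. Passing `None` disables backtranslation, and `use_copied=False` disables the copied pairs, so each method can be switched on separately or combined, mirroring the settings the abstract compares.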
2020
SENCORPUS: A French-Wolof Parallel Corpus
Elhadji Mamadou Nguer | Alla Lo | Cheikh M. Bamba Dione | Sileye O. Ba | Moussa Lo
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this paper, we report efforts towards the acquisition and construction of a bilingual parallel corpus between French and Wolof, a Niger-Congo language belonging to the Northern branch of the Atlantic group. The corpus is constructed as part of the SYSNET3LOc project. It currently contains about 70,000 French-Wolof parallel sentences drawn from various sources in different domains. The paper discusses the data collection procedure, conversion, and alignment of the corpus as well as its application as training data for neural machine translation. In fact, using this corpus, we were able to create word embedding models for Wolof with relatively good results. Currently, the corpus is being used to develop a neural machine translation model to translate French sentences into Wolof.