A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.


Introduction
Enormous efforts have been invested in making language and translation models more multilingual while leveraging the maximal amount of data for training, most prominently large crawls of monolingual and parallel data from the web (El-Kishky et al., 2020; Schwenk et al., 2021b,a; Xue et al., 2021b). The resulting models are now capable of translating between hundreds of languages, including language pairs that in isolation do not have large collections of parallel data (Tang et al., 2020; Xue et al., 2021a; Fan et al., 2021b). For example, M2M-100 (Goyal et al., 2021) can translate (with low accuracy) between Hausa and Yorùbá, two of the most widely spoken languages in Nigeria, even though there is barely any parallel data available for training. For languages that are not included in the set of training languages, however, the model has no knowledge of how to generate translations. Does this mean there is no hope for languages that do not have a large presence on the web and are therefore not included in these pre-trained models?
We investigate how large-scale pre-trained models can be leveraged for the translation of unseen low-resource languages and domains. We address this question by studying 16 African languages that are largely underrepresented in NLP research (Joshi et al., 2020) and that further have little to no training data available (§3). These languages provide an ideal testbed for two challenging knowledge transfer tasks: (1) How can pre-trained models create translations for languages unseen at training time? and (2) Since training data may only exist in a single domain (i.e., religious texts), how can a model trained in one domain translate another domain effectively at test time?
These questions are extremely relevant for our chosen languages because they all have millions of native speakers and a massive need for translation technologies. For example, news concerning the African continent is almost exclusively published in English, French, or Arabic, and is thereby inaccessible to those who speak only native African languages. This creates a bottleneck for information transmission, which becomes even more critical in times of crisis (Öktem et al., 2020; Anastasopoulos et al., 2020; Öktem et al., 2021). Furthermore, the task of translating news has historically played a central role in translation research, e.g. in shared tasks since 2008 (Callison-Burch et al., 2008) and as a test for determining human parity (Hassan et al., 2018; Läubli et al., 2018; Toral et al., 2018). To spur the development of dedicated news translation models for Africa, we construct a benchmark for translating between 16 native African languages and English or French (§4).
This allows us to compare three approaches to leveraging large-scale multilingual models for the translation of previously unseen languages: (1) zero-shot transfer, (2) continual pre-training on monolingual data, and (3) multi-domain fine-tuning on parallel data (§5). We find that fine-tuning pre-trained models on a few thousand sentences of high-quality bitext is remarkably effective, and can be further augmented with continual pre-training on African languages and fine-tuning on news-domain data (§6). Our contributions are the following: 1. We create a new African news corpus for machine translation (following the principles of participatory research of ∀ et al. (2020)) covering 16 African languages.
We find that having a targeted collection of translations is surprisingly effective, showcasing the power of local knowledge in so-called "zero-resource" scenarios (Bird, 2020). This paints a promising picture for the development of NLP technology for understudied languages: being able to customize these models for a new language of interest with as little as 2k sentences and a few fine-tuning steps, MT developers and users from any language community are less dependent on the choices and monetary interests of industry powerhouses from the Global North (Paullada, 2020).

Related Work
African MT Datasets. One of the major challenges in developing MT models for African languages is the lack of data. There have been many attempts to automatically crawl and align sentences from the web (Schwenk et al., 2021a,b). Nevertheless, the resulting corpora for many African languages are typically small and of poor quality (Kreutzer et al., 2021). Cleaner parallel sources are mostly religious, such as the Bible, covering over 1600 languages (McCarthy et al., 2020), and JW300 (Agić and Vulić, 2019) from JW.org, with over 343 languages, including over 100 African languages. Apart from training data, evaluation datasets are needed to test the performance of multilingual MT models. The FLORES-101 (Goyal et al., 2021) evaluation set, sourced from Wikipedia and manually translated, covers the largest number of languages, including 20 African languages. Finally, while other evaluation datasets for translating into or from African languages have been developed (Siminyu et al., 2021; Emezue and Dossou, 2020; Azunre et al., 2021b; Nyoni and Bassett, 2021; Gezmu et al., 2021; Ali et al., 2021), only a few African languages have evaluation datasets in the news domain (Adelani et al., 2021a; Mabuya et al., 2021; Ezeani et al., 2020), while our corpus covers 11 African languages (§4).
Low-resource MT. Interest in low-resource MT has been increasing both within the MT research community (Haddow et al., 2021) and in native speaker communities (Azunre et al., 2021a; Mager et al., 2021). Transfer learning from high-resource languages has achieved promising results: transfer from multilingual pre-trained language models (PLMs), like mBART50 (Tang et al., 2020) and MT5 (Xue et al., 2021b), and from large-scale multilingual MT often outperforms bilingual MT (Tran et al., 2021; Yang et al., 2021). For low-resource languages this strategy outperforms baseline (Transformer) models (Birch et al., 2021; Adelani et al., 2021a; Lee et al., 2022). Performance can be further improved by large-scale pre-training (Reid et al., 2021; Emezue and Dossou, 2021).

Focus Languages and Their Data
Focus Languages. We focus on 16 African languages with varying quantities of available data (Joshi et al., 2020), including moderately low-resource languages such as Swahili and Hausa, and very low-resource languages such as Ghomálá' (spoken by an estimated 1.1M people in Cameroon), whose largest available corpus is the Bible. Table 1 provides an overview of the focus languages, including their language families, locations and numbers of speakers, and the source and original language of our corpus. The languages come from four language families: Afro-Asiatic (e.g. Hausa), Nilo-Saharan (e.g. Luo), English Creole (e.g. Nigerian-Pidgin/Naija) and Niger-Congo. Most of the languages (13 out of 16) are from the Niger-Congo family, which is the largest language family in Africa. Six of the languages are predominantly spoken in Francophone countries of Africa, while the remainder are predominantly spoken in Anglophone countries. In contrast to previous work (Gowda et al., 2021), we do not focus exclusively on translation to/from English, since English is not the primary language of Francophone Africa. All languages are spoken by at least one million speakers.
Language Characteristics. All languages are written in Latin script, using letters of the basic Latin alphabet with a few omissions (e.g. "c", "q", "x", "z") and additions (e.g. "ɛ", "ɔ", "ŋ", "ọ"), including digraphs like "gb", "kp", "gh", and sometimes letters of more than two characters. 13 of the languages are tonal, and about nine make use of diacritics. Many African languages are morphologically rich; for example, all Bantu languages are agglutinative. In contrast, Fon, Mossi, and Yorùbá are highly isolating. All languages follow the Subject-Verb-Object sentence structure, like English and French. Table C provides more details.
Existing Parallel Corpora. We curate publicly available parallel data for our focus languages, which consists primarily of text in the religious domain. For most African languages, the largest available parallel corpus is JW300 (Agić and Vulić, 2019), sourced from jw.org, which publishes biblical texts as well as lifestyle and opinion columns. Varying quantities of data are available for 11 of the 16 focus languages: Éwé, Igbo, Swahili, Setswana, Twi, Yorùbá, and isiZulu have over 400K parallel sentences; Hausa and Mossi have slightly more than 200K parallel sentences; Fon and Naija have around 30K sentences. For the remaining five languages, which are not in the JW300 corpus, we make use of the Bible, aligning sentences automatically by verse (around 31k verses in total). Ghomálá' only has the New Testament, with 8k verses. Bambara and Wolof are missing some verses and books, leading to total sizes of 28K and 22K. Table 1 summarizes this information about the religious (REL) corpora.
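The verse-based alignment can be sketched as follows. The verse identifiers and text below are invented for illustration, and the actual corpus construction scripts may differ:

```python
def align_bibles(src_verses, tgt_verses):
    """Pair two Bible translations by shared verse IDs.

    Verses missing on either side (e.g. books absent from the Bambara
    or Wolof translations) are dropped, which is why those corpora end
    up smaller than the full ~31k verses.
    """
    shared = sorted(set(src_verses) & set(tgt_verses))
    return [(src_verses[v], tgt_verses[v]) for v in shared]

# Toy example with made-up verse IDs and truncated text:
french = {"GEN.1.1": "Au commencement ...",
          "JHN.3.16": "Car Dieu a tant aime le monde ..."}
bambara = {"JHN.3.16": "Katugu Ala ye ..."}  # GEN.1.1 missing on purpose
pairs = align_bibles(french, bambara)        # only the shared verse survives
```

Because verse numbering is standardized across translations, this kind of key-based join avoids the need for statistical sentence alignment.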

Data Collection Process
We introduce our newly translated news corpus, MAFAND-MT: the Masakhane Anglo & Franco Africa News Dataset for Machine Translation. Table 1 gives the news source and data splits for 11 African languages, which include six languages (bam, bbj, ewe, fon, mos, wol) spoken predominantly in Francophone Africa and five languages (lug, luo, pcm, tsn, twi) spoken predominantly in Anglophone Africa. The MAFAND-MT corpus was created in three steps:
1. Crawling and preprocessing of news websites from local newspapers publishing in English and French. Raw texts from the web were segmented into sentences. Most languages were crawled from one or two sites, except for Wolof and Fon, which were crawled from four and seven news websites respectively because local French-language newspapers have very few articles. We also ensured that the articles covered a variety of topics, e.g. politics, sports, culture, technology, society, religion, and education. This step was carried out by native speakers of the target language with source-language proficiency.
2. Translation of 5k-8k sentences by professional translators. The translation process took one to four months depending on the availability of the translators.
3. Quality control was provided by native speakers, who discussed and, if possible, fixed problematic translations and ran automatic checks to detect misspellings, duplicated sentences, and alignment problems.
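A minimal sketch of the kind of automatic checks used in step 3. The duplicate and length-ratio heuristics here are illustrative; the actual checks used for MAFAND-MT are not specified in code:

```python
def qc_report(pairs, max_len_ratio=3.0):
    """Flag duplicated source sentences and suspicious length ratios.

    A very large character-length ratio between source and target is a
    cheap signal of a possible misalignment; the threshold here is
    invented for this sketch.
    """
    seen, report = set(), {"duplicates": [], "misaligned": []}
    for i, (src, tgt) in enumerate(pairs):
        if src in seen:
            report["duplicates"].append(i)
        seen.add(src)
        ratio = max(len(src), len(tgt)) / max(1, min(len(src), len(tgt)))
        if ratio > max_len_ratio:
            report["misaligned"].append(i)
    return report
```

Flagged pairs would then be inspected and, where possible, fixed by the native-speaker quality checkers rather than dropped automatically.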
Following the recommendations of ∀ et al. (2020), we designed the process to be participatory: everyone involved in the corpus creation is a native speaker of the respective target language and has societal knowledge about the communities that speak it. This is particularly important for curation and quality control, to ensure that the resulting material is appropriate and relevant for the stakeholders of the final MT models (Kreutzer et al., 2021). Furthermore, everyone received appropriate remuneration. To enable cross-disciplinary knowledge transfer between participants in the individual steps, every language was assigned a coordinator. The coordinator conducted the initial curation in the first step, and communicated with translators and quality checkers throughout the following steps.
Other Available Parallel Corpora. We found five African languages with available parallel texts in the news domain: Hausa, Igbo (Ezeani et al., 2020), Swahili, Yorùbá (Adelani et al., 2021a), and isiZulu (Mabuya et al., 2021). Table 1 provides the news source and the TRAIN, DEV and TEST splits. Appendix B provides details on the pre-processing of the available news corpora.

Monolingual News Corpus
To adapt available multilingual pre-trained models via continued pre-training to African languages, we curated texts from the 17 highest-resourced African languages and three non-native African languages that are widely spoken on the continent (Arabic, English, and French). The selection of African languages is based on their coverage in mC4 (Xue et al., 2021b), AfriBERTa corpora (Ogueji et al., 2021), and other publicly available news websites like VOA and BBC. We limited the size of the corpus extracted from mC4 to the first 30 million sentences (roughly 1GB of data) for Afrikaans, Amharic, Arabic, English, French, and Swahili. In total, we collected about 12.3 GB of data. Appendix C provides more details about the pre-training corpus.

Baseline Models
We experiment with pre-trained multilingual models and our own bilingual MT baselines.

Transfer Learning Across Languages
We describe two methods for adding new languages to existing models: continual pre-training and many-to-many multilingual translation.
Continual Pre-training. The effectiveness of PLMs is limited for extremely low-resource languages because these rarely, if ever, occur in the pre-training corpus (Wang et al., 2020; Liu et al., 2021). As shown in Table 2, even for MT5 and M2M-100, which cover 100 languages, less than half of the African languages under study are included. To adapt the existing PLMs to our focus languages and domains, we apply continual pre-training (Gururangan et al., 2020; Liu et al., 2021) using our collected monolingual corpus. Specifically, before fine-tuning on the parallel MT data, models are pre-trained with their original training objective and vocabulary on the monolingual corpus. Pre-training parameters can be found in the appendix. We refer to the models adapted to African languages as AfriMT5, AfriByT5, and AfriMBART.
Many-to-Many Translation. We fine-tuned M2M-100 for African multilingual translation to create English- and French-centric models.
For the English-centric model, M2M-100 was fine-tuned on the news data for en-{hau, ibo, lug, luo, pcm, swa, tsn, twi, yor, zul}, while the French-centric model was fine-tuned on fr-{bam, bbj, ewe, fon, mos, wol}. Languages not included in the pre-trained M2M-100 model were assigned the language code of a language that is included in M2M-100 but excluded from our study.
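The language-code reassignment can be sketched as below. The surrogate pairings are invented for illustration; the paper does not list which codes were borrowed for which language:

```python
# Codes of focus languages that M2M-100 actually covers.
NATIVE_CODES = {"swa": "sw", "yor": "yo", "zul": "zu", "hau": "ha", "ibo": "ig"}

# Hypothetical surrogate assignments: each unseen language borrows the
# code of a language that is in M2M-100 but not part of this study.
SURROGATE_CODES = {"bam": "kk", "bbj": "cy", "mos": "is"}

def m2m_code(lang):
    """Return the M2M-100 language tag to use for `lang` (ISO 639-3)."""
    if lang in NATIVE_CODES:
        return NATIVE_CODES[lang]
    if lang in SURROGATE_CODES:
        return SURROGATE_CODES[lang]
    raise KeyError(f"no M2M-100 code assigned for {lang!r}")
```

Borrowing an unused tag gives the model a consistent target-language signal during fine-tuning without changing its vocabulary or embedding table.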

Transfer Learning Across Domains
As there is very limited MT data in the news domain, we compare different methods of combining the large data from the religious domain (REL) and the small data from the news domain (NEWS) to fine-tune M2M-100:
1. REL+NEWS: Fine-tuning on the aggregation of REL and NEWS.
2. REL→NEWS: Training on REL, followed by fine-tuning on NEWS.
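Given an abstract fine-tuning step, these schedules (plus the REL+NEWS→NEWS variant evaluated in §6.2) differ only in which data each stage sees. A sketch, where `finetune` stands in for a full training run:

```python
def rel_plus_news(model, finetune, rel, news):
    # 1. REL+NEWS: a single stage on the aggregated corpora.
    return finetune(model, rel + news)

def rel_then_news(model, finetune, rel, news):
    # 2. REL -> NEWS: religious domain first, then adapt to news.
    return finetune(finetune(model, rel), news)

def rel_plus_news_then_news(model, finetune, rel, news):
    # REL+NEWS -> NEWS: aggregate first, then a final news-only stage
    # to sharpen in-domain behaviour.
    return finetune(finetune(model, rel + news), news)
```

The sequential variants trade some religious-domain performance for a model whose final gradient steps are all taken on the target domain.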

Results and Discussion
We successfully adapt several multilingual pre-trained models to previously unseen African languages and quantify the effectiveness of small in-domain translation datasets. We discuss the effects of domain shift and analyze mitigation strategies.

Adaptation to the Focus Languages
We demonstrate that fine-tuning on a few thousand sentences of high-quality bitext is effective for adding new languages to pre-trained models. Continued pre-training to specialize models to African languages further improves performance.
Zero-Shot Translation. Tables 3 and 4 give the results of zero-shot evaluation on NEWS.
We evaluate only M2M-100 in this setting because it is the only model pre-trained on parallel text covering a few of our focus languages. We observe very poor performance (< 5 BLEU) on all languages except zul (> 13 BLEU) and swa (> 20 BLEU) in both translation directions. For swa, performance is likely reasonable because M2M-100 has seen more bitext during pre-training (2.4M sentences in CCAligned (El-Kishky et al., 2020)). The other African languages, except Afrikaans, have less than 600K sentences in CCAligned, and these are also of lower quality (Kreutzer et al., 2021), which affects overall zero-shot performance.
Performance after Fine-tuning. We find impressive performance after fine-tuning the PLMs and M2M-100 on a few thousand sentences (mostly 2K-7K sentences, except for swa with 30K sentences), including for languages not seen during pre-training. For en/fr-xx, MT5 has poor transfer performance, with an average BLEU of 7.2, despite being pre-trained on 101 languages. ByT5 outperforms MT5 by over 3 BLEU on average, even though their performance was reported to be similar in previous work (Xue et al., 2021a). This indicates that ByT5 might be preferable to MT5 when translating low-resource languages. Surprisingly, mBART50, which was pre-trained on only 50 languages (two of them African), outperformed MT5 and ByT5, which are pre-trained on 101 languages. Overall, we found M2M-100 to be the best model, most likely because it was pre-trained on a translation task. In general, BLEU scores are relatively low (< 15 BLEU for 9 out of 16 languages in en/fr-xx and 7 in xx-en/fr) even when fine-tuning M2M-100 on in-domain data, which suggests that developing more effective fine-tuning methods is a promising future direction. The languages with the best quality according to BLEU are pcm, swa and tsn on the target side, and pcm, zul, and swa on the source side. BLEU scores are higher when translating from an African language, which is expected both because of the more frequent exposure to English and French on the target side during pre-training, and because BLEU penalizes morphologically rich languages like bbj, lug, swa, tsn, and zul more heavily. The ChrF metric works better for them: for example, fine-tuning M2M-100 on NEWS and evaluating on zul gives a BLEU of 21.0 in en/fr-xx and 37.8 in xx-en/fr, a large gap between the two directions, whereas with ChrF we find a much smaller gap (51.2 in en/fr-xx and 55.5 in xx-en/fr).

Continual Pre-training.
We observe an improvement in BLEU when we utilize AfriMT5 and AfriByT5 for languages included in our continual pre-training corpus (Appendix C). Other languages also benefit despite not being seen during continual pre-training, possibly due to language similarity. For example, AfriByT5 on fr-bam improved by 1.9 BLEU over ByT5, and AfriMT5 on en-tsn improved by 3.6 BLEU over MT5. On average, AfriMT5 improved over MT5 by 1.3 BLEU in en/fr-xx and 2.4 BLEU in xx-en/fr. The improvement for AfriByT5 was much smaller: 0.6 and 0.9 BLEU in the en/fr-xx and xx-en/fr translation directions. For AfriMBART, we did not see any improvement on average; only hau (1.5 BLEU) and ibo (0.7 BLEU) improved in the en/fr-xx direction. However, in the xx-en/fr direction, fon, tsn, twi, and zul improved by 2.7-6.0 BLEU.
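Returning to the BLEU-vs-ChrF gap noted for morphologically rich languages, a toy character n-gram F-score shows why sub-word overlap is kinder to agglutinative output. This is only in the spirit of ChrF (Popović, 2015); real ChrF uses n-grams up to 6 and different averaging:

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-gram counts, ignoring spaces."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_sketch(hyp, ref, max_n=3, beta=2.0):
    """Toy character n-gram F-beta score (recall-weighted, like ChrF)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue
        overlap = sum((h & r).values())
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return 0.0 if p + r == 0 else (1 + beta**2) * p * r / (beta**2 * p + r)

# An agglutinative-style toy pair: zero exact word-level overlap on the
# first word, but most characters match, so the score stays high.
score = chrf_sketch("ngiyakuthanda kakhulu", "ngiyamthanda kakhulu")
```

A word-level metric like BLEU gives such a pair little or no credit, whereas the character-level score reflects the largely correct morphology.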
Many-to-Many Multilingual MT. Training on the combined news corpora of all English-centric or all French-centric language pairs does not appear to help much: we see slight improvements for most languages only in the xx-en/fr direction.

Adaptation to the News Domain
To improve over the baseline performance on NEWS, we train bilingual Transformer models (as a baseline) and M2M-100 on a combination of REL and NEWS. We chose M2M-100 because it was the best-performing model. Increasing the size of the training data in the target domain is the most helpful strategy (see Figure 2); combining REL+NEWS is not very helpful for xx-en/fr. An alternative approach is REL→NEWS, which allows the model to develop a good understanding of the desired language before adapting to the news domain; it yields an increase of 1.1 BLEU over REL+NEWS in the en/fr-xx direction. However, the best strategy is REL+NEWS→NEWS, especially for xx-en/fr, where it improves over NEWS and REL+NEWS by 2.0 and 1.5 BLEU, respectively.

Analysis of Domain Shift
Is a small in-domain set essential for fine-tuning? If we train models only on previously available religious data, they are not capable of translating news well, due to the strong domain bias. This is illustrated in Figure 1: all models perform much worse on NEWS than on the REL domain. When the quantity of religious training data is small, the loss in translation performance on the news test set is largest, cf. bbj (8k of REL data) with a drop of -95.5% BLEU, bam (-93.5%, 28k) and luo (-93.5%, 31k). This indicates that when the REL training data is sparse, it is insufficient to teach the M2M-100 model the more general understanding required for translating NEWS. When the religious training data is larger, this loss is reduced, cf. zul (667k, -67%), swa (-69.3%, 872k), and tsn (-71%, 870k). While this is the general trend, pcm, whose religious training data is small (23k), has the lowest drop in performance (-59.3%), which may be due to its strong similarity to its source language, English.
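The relative drops quoted above are plain percentage changes between the REL-domain and NEWS-domain scores of the same model; for concreteness (the numbers below are illustrative, not read from Figure 1):

```python
def relative_drop(rel_score, news_score):
    """Percentage change from in-domain (REL) to out-of-domain (NEWS) BLEU."""
    return 100.0 * (news_score - rel_score) / rel_score

# A model scoring 20 BLEU on REL but only 0.9 BLEU on NEWS has dropped
# by 95.5%, comparable in magnitude to the bbj figure quoted above.
drop = relative_drop(20.0, 0.9)
```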
How many sentences in the target domain are required? Figure 2 shows, for three selected language pairs with a large (fr-bam), medium (en-ibo) and relatively small (en-swa) domain gap, how the quality of target-domain translations improves as we increase the size of the target-domain corpus. For all three pairs, fine-tuning M2M-100 or ByT5 on 2.5k sentence pairs of in-domain data (NEWS) is sufficient to outperform the bilingual Transformer baselines that were additionally trained on larger amounts of out-of-domain data (REL). Surprisingly, this procedure works not only for languages included during pre-training (swa), but also for previously unseen languages (ibo, bam). M2M-100 tends to adapt to the new data more quickly than ByT5, but in all cases, models continue to learn with additional in-domain data. This shows how much more effectively a small number of in-domain translations can be used when they serve for fine-tuning multilingual pre-trained models rather than for training bilingual MT models from scratch.
Examples of Domain Bias. To illustrate the challenge of overcoming domain bias, we show examples of translations from bam and lug in Table 7. The M2M-100 model fine-tuned only on REL succeeds in roughly capturing the meaning of the sources, but uses biblical terms, such as "scroll" instead of "novel". Adding our news corpus to the fine-tuning data resolves these issues (e.g. it produces "book").
How general is our news corpus? Table 8 shows the zero-shot evaluation of M2M-100 fine-tuned on our small NEWS corpora on other domains: religious (REL) and Wikipedia (FLORES). We evaluated the Wikipedia domain on the FLORES devtest and the REL domain on either JW300 or the Bible (lug, luo, wol). As a baseline, we evaluated the zero-shot performance of M2M-100 (not fine-tuned) on FLORES using spBLEU (i.e. SentencePiece BLEU (Goyal et al., 2021)). We noticed very poor performance except for Swahili, as discussed in §6.1. After fine-tuning on our new data, transfer improves largely across the board (up to +17 BLEU for en-ibo). The same trend holds for the religious domain. This shows that even though our data comes from the news domain, it helps the model generalize to other domains. Hence, expanding African news corpora and developing better MT models for news pays off even for other domains of interest.

Conclusion
We have created MAFAND-MT, a corpus of 16 African languages for studying translation systems for low-resource languages in the news domain. We investigated how to most effectively adapt large-scale pre-trained models to incorporate new languages and new domains. Our findings suggest that as little as 2k sentences are sufficient for fine-tuning with improved performance, paving the way for others to create new translation systems without relying on large collections of web-sourced text. This has strong implications for languages that are spoken by millions but lack presence on the web.

D Model Hyper-parameters and Reproducibility of Results
For the pre-trained models, we fine-tune using the HuggingFace Transformers toolkit (Wolf et al., 2020) with the default learning rate (5e-5), a batch size of 10, maximum source and target lengths of 200, a beam size of 10, and 3 epochs, except for models trained only on NEWS, for which we use 10 epochs. All experiments were performed on a single GPU (Nvidia V100). When fine-tuning pre-trained models, especially mBART50, which only supports two African languages, the target language specified during decoding must be one that the model has seen during pre-training. We therefore follow past work (Madaan et al., 2020; Cahyawijaya et al., 2021; Lee et al., 2022) in selecting another closely related language that is represented in the pre-trained model. For convenience, we use Swahili (sw) as the target language code when an African language is not represented, since Swahili is represented in all the pre-trained models. The only exception is Nigerian-Pidgin, for which we use French (fr), since Nigerian-Pidgin is closely related to English. When a language is represented in the pre-trained model (e.g. M2M-100 has seen Yorùbá (yo)), we use the correct language code. To train AfriMT5 and AfriByT5, we start from MT5 and ByT5 and pre-train with a learning rate of 1e-4, 10,000 warm-up steps, and a batch size of 2048 for one epoch. For AfriMBART, we pre-train with a learning rate of 5e-5 for 50,000 steps using Fairseq (Ott et al., 2019), without modifying the mBART50 vocabulary. Table 11 lists the names of all the models that are publicly available on the HuggingFace Model Hub. In total, we have 357 models: 22 x 16 bilingual models, two English/French-centric models, and three models adapted to African languages (i.e. AfriMT5, AfriByT5, and AfriMBART). We observe that spBLEU gives higher scores than BLEU, especially in the en/fr-xx direction, which suggests that it may be better for evaluating African languages.
However, further analysis and human evaluation are still needed to show that spBLEU is generally better. In the xx-en/fr direction, there is not much difference between the BLEU and spBLEU scores.
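The fine-tuning setup above corresponds roughly to the following HuggingFace configuration (a sketch: argument names follow the transformers API, `output_dir` is hypothetical, and the values are those quoted in the text):

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mafand-mt-finetuned",   # hypothetical path
    learning_rate=5e-5,                 # HuggingFace default
    per_device_train_batch_size=10,
    num_train_epochs=3,                 # 10 when training on NEWS only
    predict_with_generate=True,
)
# The maximum source/target length of 200 is enforced at tokenization
# time, and the beam size of 10 is passed to generation (num_beams=10).
```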

F Qualitative Analysis
The following examples from the Fon-to-French translations of the test set illustrate the advantages of multilingual modeling and its limitations:
• Bilingual Transformer (REL+NEWS, fon→fr): on ne peut pas avoir une trentaine d'années ni un jeune homme ni un jeune homme d'âge pour un jeune homme qui soit 12 ans (roughly: "one cannot be in one's thirties, nor a young man, nor a young man of age, for a young man who is 12 years old").
The translation from the bilingual Transformer model is very poor and far from the Fon source, highlighting how poorly the model generalizes from only a few thousand training sentences. The M2M-100 model gives a more meaningful and adequate translation. M2M-100 makes a surprising but elegant move, switching se plaignent depuis quelques jours de multiples douleurs ("have for a few days been complaining of multiple pains"; sín azǎn mOkpán ãye O, ye ãò wǔvE sè wE tawun ãò agbaza mE) to ont depuis plusieurs jours souffert d'une maladie grave ("have for several days suffered from a serious illness"). The BLEU score here might be low, but the meaning is preserved and is even more detailed than in the French reference. In fact, in this source context, wǔvE means souffrir, souffrance (to suffer, suffering): the French reference uses se plaignent (complain), which makes less sense than the souffert used in the M2M-100 prediction. M2M-100 also learned the structure of the sentence: có ye ká tuun fí é azOn nE lEE gosin (but they do know the origin of their sufferings) é Oǎ (NOT); this last part is crucial for the meaning of the entire sentence. Given the structural and morphological differences between Fon and French, we expected it to be harder to predict. However, the translation is still structurally flawed, even though any native French speaker would understand the conveyed message quickly and easily: in the M2M-100 translation, the word malgré is in the wrong place, corrupting the syntax and logic of the second clause. A perfect translation (of the intended idea) would be: "Louis Guy Alimanyion et Issa Etchlekoun ont depuis plusieurs jours souffert d'une maladie grave dont ils ne connaissent pas les causes" ("have for several days suffered from a serious illness whose causes they do not know"). In the opposite translation direction, fr→fon, M2M-100 (REL+NEWS→NEWS) still preserves some logical reasoning and predicts the last part correctly: ye ká tuun nǔ è wú wǔvÉ yetOn (they do know why they are suffering) ãèÓǎ (NOT).
However, the model has some limitations: the names that are part of the translation are not spelled correctly, and some expressions are incomplete. For instance, sín azǎn + number means "since X days", but yEywE is not a number and has no meaning in this context.
Using additional data from Premium Times and Global Voices, we improved spBLEU by a large margin, especially in the en-hau direction (4.0 → 13.0) on FLORES and (3.7 → 8.8) in the REL domain (based on JW300), although we experienced a slight drop in the xx-en/fr direction on FLORES.