BERTGen: Multi-task Generation through BERT

We present BERTGen, a novel, generative, decoder-only model which extends BERT by fusing multimodal and multilingual pre-trained models VL-BERT and M-BERT, respectively. BERTGen is auto-regressively trained for language generation tasks, namely image captioning, machine translation and multimodal machine translation, under a multi-task setting. With a comprehensive set of evaluations, we show that BERTGen outperforms many strong baselines across the tasks explored. We also show BERTGen’s ability for zero-shot language generation, where it exhibits competitive performance to supervised counterparts. Finally, we conduct ablation studies which demonstrate that BERTGen substantially benefits from multi-tasking and effectively transfers relevant inductive biases from the pre-trained models.


Introduction
Recent work in unsupervised and self-supervised pre-training has revolutionised the field of natural language understanding (NLU), raising the performance ceiling across multiple tasks (Devlin et al., 2019; Dong et al., 2019). The success of language model pre-training with masked language modelling (MLM), as in BERT (Devlin et al., 2019), further paved the way for more complex approaches that combine language pre-training with images (Tan and Bansal, 2019; Su et al., 2020), video (Sun et al., 2019), and speech (Chuang et al., 2020). Most of these approaches follow a task-specific fine-tuning step after the model is pre-trained.
However, there has been little work on exploiting pre-trained MLMs for natural language generation (NLG) tasks. Previous work argues that the MLM objective is ill-suited for generation tasks such as machine translation (Rothe et al., 2020). Recent work in this direction has predominantly investigated the use of pre-trained models either to initialise Transformer-based encoder-decoder models (Imamura and Sumita, 2019; Clinchant et al., 2019; Yang et al., 2020; Rothe et al., 2020) or to distill knowledge for sequence generation tasks (Chen et al., 2020).
In this work, we present BERTGEN, which extends BERT in a generative setting (§ 2.1). This results in a single generator, without a separation between the encoder and the decoder, that is capable of consuming multiple input modalities and generating in multiple languages. The latter features are achieved by transferring knowledge from state-of-the-art pre-trained models, namely VL-BERT (Su et al., 2020) and multilingual BERT (M-BERT) (Devlin et al., 2019). We train BERTGEN on various tasks, including image captioning, machine translation and multimodal machine translation, using datasets in four different languages (§ 2.2).
Based on a number of experiments, our findings ( § 3) show that BERTGEN (i) is surprisingly versatile as it is capable of describing images and performing translation in unimodal and multimodal settings, across all languages, (ii) generalises well across zero-shot image captioning, multimodal machine translation, and out-of-domain news translation tasks, and finally (iii) is parameter efficient when compared to state-of-the-art models for each of the tasks combined together.

Method
In this section, we describe BERTGEN and the tasks we explore. We then detail the baselines and SoTA systems that we compare against.

Model
This section details the main aspects of BERTGEN that distinguish it from the existing work on vision & language pre-training.

Figure 1: A view of the BERTGEN model during the training of an MMT sample: solid and dashed borders around the images represent full-image features and regional features, respectively. At test time, the most likely token y_3 = argmax P(y_t | x, v, y_<t) is placed back into the sequence and the [MASK] token is shifted right by one.

Initialisation. We take advantage of the previous successes in large-scale pre-training and propose a hybrid initialisation for BERTGEN (Figure 2). This involves starting from the VL-BERT (Su et al., 2020) checkpoint and initialising the word embeddings, the Transformer weights and the MLM head with M-BERT (Devlin et al., 2019). We conjecture that this primes BERTGEN to be aware of the visual modality and of multiple languages, since VL-BERT is pre-trained on English monolingual and image captioning corpora, while M-BERT offers a 119K WordPiece vocabulary trained on the entire Wikipedia in 104 languages.
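To make the initialisation concrete, the following is a minimal sketch of how such a hybrid checkpoint could be assembled, assuming hypothetical file names and PyTorch-style parameter key prefixes; the released VL-BERT and M-BERT checkpoints use their own layouts, so the prefixes below are illustrative only.

```python
import torch

# Hypothetical checkpoint paths; the actual releases use different file names.
vl_bert = torch.load("vl-bert-base.pt", map_location="cpu")
m_bert = torch.load("bert-base-multilingual-cased.pt", map_location="cpu")

# Start from VL-BERT so that its visual feature embedders are kept.
state_dict = dict(vl_bert)

# Overwrite the word embeddings, the Transformer body and the MLM head with
# M-BERT, bringing in the 119K multilingual WordPiece vocabulary.
for name, weight in m_bert.items():
    if name.startswith(("embeddings.word_embeddings", "encoder.layer", "cls.predictions")):
        state_dict[name] = weight

torch.save(state_dict, "bertgen-init.pt")
```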
Input configuration. While BERTGEN is potentially capable of modelling a variety of generative tasks, we focus on three particular tasks, namely machine translation (MT), multimodal MT (MMT) and image captioning (IC). Therefore, depending on the task, the input configuration of the model may change during both training and testing. To clarify further, let us denote a sequence of embeddings representing a source sentence by x^(i), the target sentence by y^(i), and a collection of k regional visual features extracted from the associated image by v^(i). Figure 1 depicts BERTGEN when processing a sample from the MMT task. This task's input configuration is a triplet that involves all three sequences, i.e. {x^(i), y^(i), v^(i)}. Using this notation, the MT and IC tasks' configurations correspond to {x^(i), y^(i)} and {v^(i), y^(i)}, respectively.
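As an illustration, the sketch below assembles the three configurations into the single input stream the model consumes; the helper name, the placement of the special tokens and the exact ordering are our assumptions for exposition, not the released implementation.

```python
def build_example(task, tgt_lang, src_tokens=None, tgt_prefix=None, roi_feats=None):
    """Assemble the single input stream consumed by the model for one task."""
    tokens = [f"[{tgt_lang}]"]                      # target language specifier
    if task in {"mt", "mmt"}:                       # x: source sentence (MT and MMT only)
        tokens += list(src_tokens) + ["[SEP]"]
    tokens += list(tgt_prefix or []) + ["[MASK]"]   # y: generated prefix plus the masked slot
    visual = roi_feats if task in {"mmt", "ic"} else None   # v: RoI features (MMT and IC only)
    return tokens, visual

# MMT uses {x, y, v}, MT uses {x, y}, IC uses {v, y}.
mmt_inputs = build_example("mmt", "DE", ["a", "dog", "runs"], ["ein"], roi_feats="<RoI features>")
mt_inputs = build_example("mt", "FR", ["a", "dog", "runs"], [])
ic_inputs = build_example("ic", "TR", tgt_prefix=[], roi_feats="<RoI features>")
```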
Visual embeddings. We follow VL-BERT and represent images as a collection of k features v^(i) defined over regions of interest (RoI). After pre-extracting the 2048-dimensional RoI features using the bottom-up top-down object detector (Anderson et al., 2018), we keep between 10 and 100 of them (i.e. k ∈ [10, 100]) depending on the confidence score. The final visual embedding for an RoI is obtained by summing its feature vector and its geometric embedding (i.e. the projection of the bounding box coordinates). When encoding the non-visual positions, the same RoI feature vector for the full image is repeated (see Figure 1). We note that we do not fine-tune the object detector during training.
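A rough sketch of this composition follows, under the assumption that both the 2048-dimensional RoI feature and the four bounding box coordinates are linearly projected to the model dimension before being summed; the layer names are ours, not those of the released code.

```python
import torch
import torch.nn as nn

class VisualEmbedding(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=768):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)   # RoI appearance feature
        self.geo_proj = nn.Linear(4, hidden_dim)           # (x1, y1, x2, y2) of the box

    def forward(self, roi_feats, boxes):
        # The final embedding is the sum of the appearance and geometric terms.
        return self.feat_proj(roi_feats) + self.geo_proj(boxes)

emb = VisualEmbedding()
out = emb(torch.randn(10, 2048), torch.rand(10, 4))        # k = 10 regions
```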
Sequence unrolling. An important aspect of BERTGEN is that it does not explicitly distinguish between the encoder and the decoder blocks usually seen in sequence-to-sequence models. This is accomplished by formalising both encoding and generation using the MLM framework. Formally, let us consider the MMT task and define the maximum log-likelihood objective for a given triplet {x^(i), v^(i), y^(i)} where the target y^(i) has n tokens:

log P(y^(i) | x^(i), v^(i)) = Σ_{t=1..n} log P(y_t^(i) | y_{<t}^(i), x^(i), v^(i))    (1)

In a typical sequence-to-sequence model, each log-probability term would be computed by a decoder within the forward pass of the same training example. In contrast, BERTGEN explicitly unrolls the example n times, forming n new training examples. In other words, each conditional term in Equation 1 is observed independently within an epoch of training. Therefore, sequence unrolling has a data augmentation effect, since a training corpus with D examples is approximately augmented by a factor of the average length of the target sequences. Moreover, the unified encoder-decoder formalism halves the number of parameters, making BERTGEN parameter efficient.
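A minimal sketch of the unrolling, assuming word-level tokens and our own helper name; whether a final example teaching the stop token is included is our assumption, consistent with generation ending on [STOP].

```python
def unroll(src_tokens, tgt_tokens):
    """Turn one (source, target) pair into one example per target position."""
    examples = []
    for t in range(len(tgt_tokens)):
        inputs = src_tokens + ["[SEP]"] + tgt_tokens[:t] + ["[MASK]"]
        examples.append((inputs, tgt_tokens[t]))           # predict the token at position t
    # Assumed final example so the model learns when to stop generating.
    examples.append((src_tokens + ["[SEP]"] + tgt_tokens + ["[MASK]"], "[STOP]"))
    return examples

for x, y in unroll(["a", "dog", "runs"], ["ein", "hund", "rennt"]):
    print(x, "->", y)
```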
Figure 3: A look at BERTGEN's self-attention: the connections denote that self-attentive representations are re-computed in every step. The generation ends when [STOP] is predicted. The smileys refer to RoI features.

Self attention. Given that a single Transformer (Vaswani et al., 2017) performs both encoding and decoding, sequence unrolling affects self-attention as well (Figure 3). First, all positions attend to each other for a given unrolled example, i.e. the attention is bi-directional. Second, since each unrolled case is an independent example, the self-attentive representations of earlier positions are naturally re-computed, in contrast to typical Transformer decoders. Finally, because inputs and outputs are represented in a single stream and encoded through shared self-attention, BERTGEN enforces an inductive bias towards a truly multi-modal and multi-lingual representation space.
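The contrast with a standard decoder can be made explicit with attention masks: a causal decoder masks out future positions, whereas each unrolled BERTGEN example attends fully in both directions. A small illustrative sketch (1 means a position may attend):

```python
import torch

def causal_mask(n):
    # Standard Transformer decoder: position t cannot attend to positions > t.
    return torch.tril(torch.ones(n, n))

def bertgen_mask(n):
    # Each unrolled example: every position attends to every other position.
    return torch.ones(n, n)

print(causal_mask(4))
print(bertgen_mask(4))
```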
Target language specifiers. Finally, to select the language during generation, input sequences begin with special target language specifiers (Ha et al., 2016; Johnson et al., 2017) (Figure 1). The specifier is task-agnostic, i.e. the same specifier [DE] is used both when captioning into German and when translating into German.
Training & hyper-parameters. We extend the base configuration of VL-BERT, a Transformer with 12 self-attention layers and 12 heads. The model and feed-forward dimensions are 768 and 3072, respectively. On a single 32GB V100 GPU, one epoch (§ 3) takes approximately two days to complete, as we could only fit one example per task into memory (i.e. a batch size of 13). We use the AdamW optimiser (Loshchilov and Hutter, 2019) with the base learning rate set to 1.3 × 10^-5. The learning rate is warmed up over the first 16K steps and then decays linearly. We set the weight decay to 10^-4. During training, we let the model update the positional embeddings, as BERTGEN needs to learn new positions not covered by VL-BERT pre-training. The final model has ∼89.3M parameters excluding the word embeddings.
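A sketch of this optimisation setup in PyTorch; the total number of training steps is chosen arbitrarily for illustration, since only the warmup length and the linear decay are specified above.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(768, 768)               # stand-in for the full model
optimizer = AdamW(model.parameters(), lr=1.3e-5, weight_decay=1e-4)

warmup_steps, total_steps = 16_000, 1_000_000   # total_steps is an assumption

def lr_lambda(step):
    # Linear warmup for the first 16K steps, then linear decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda)
```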
Decoding. At test time, we incrementally add the most likely prediction (i.e. greedy search) into the previously masked position (Figure 1) and shift the [MASK] token right by one. We chose greedy over beam search because the latter would make decoding much slower, as the self-attentive representations are re-computed at every step. Decoding ends when [STOP] is predicted.
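The decoding loop can be summarised as follows; model() is a placeholder for a forward pass that returns the most likely token for the current [MASK] position, and the maximum length is an arbitrary safeguard.

```python
def greedy_decode(model, prefix_tokens, max_len=50):
    """Greedy, incremental decoding with a sliding [MASK] slot."""
    tokens = list(prefix_tokens) + ["[MASK]"]
    for _ in range(max_len):
        next_token = model(tokens)              # argmax over P(y_t | x, v, y_<t)
        tokens[-1] = next_token                 # fill the masked slot
        if next_token == "[STOP]":
            return tokens[:-1]
        tokens.append("[MASK]")                 # shift the mask right by one
    return tokens
```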

Tasks & Systems
To evaluate BERTGEN's generative abilities, we explore a diverse set of tasks: image captioning, text-only MT and multimodal MT. Table 1 summarises the training statistics for the various datasets we use.

Image Captioning
Image captioning (IC) involves describing images in a specified natural language. We train BERTGEN for English, German and Turkish captioning tasks. Specifically, we use the FLICKR30K dataset (Young et al., 2014), which provides 29K training images, each with five English captions collected through crowd-sourcing. The validation and test sets contain approximately 1K images each. We use the MULTI30K dataset (Elliott et al., 2016), which annotates FLICKR30K images with five German captions. Finally, we use the TASVIRET dataset (Unal et al., 2016), which provides two Turkish captions for each of the 8,092 images in the FLICKR8K dataset (Rashtchian et al., 2010). Since FLICKR8K is a subset of FLICKR30K, we create a new split of TASVIRET to avoid data leakage between the training and test splits. The resulting training, validation and test splits contain 6,914, 543, and 543 images, respectively.
To evaluate BERTGEN's performance on IC, we compare it against previous work with strong performance on COCO (Chen et al., 2015) and FLICKR30K: ADAPTIVE ATTENTION (SENTINEL) (Lu et al., 2017), which uses a sentinel token to distinguish between visual and non-visual representations, and NEURAL BABY TALK (NBT) (Lu et al., 2018), which follows a slot-filling approach through explicit object region information.

Multimodal Machine Translation
Multimodal Machine Translation (MMT) attempts to improve MT quality by incorporating information from modalities other than language (Sulubacak et al., 2020). In our case, we train BERTGEN for EN↔DE and EN↔FR MMT tasks and use the MULTI30K dataset, the main dataset for image-informed translation, which provides caption translations for FLICKR30K images into German and French. To evaluate BERTGEN on MMT tasks, we use the original 2016 test set, which contains 1,000 examples.
For a comprehensive comparison with previous work, we train a SoTA recurrent MMT model (Caglayan et al., 2020) solely on the MULTI30K dataset, which applies a secondary (visual) attention in the decoder over the RoI features, i.e. the same features that are also used by BERTGEN (§ 2.1). There are two GRU (Cho et al., 2014) layers in both the encoder and the decoder, and the embedding and hidden dimensions of the model are set to 200 and 320, respectively. Each model has ∼5.6M parameters excluding the word embeddings.
Besides the constrained state-of-the-art recurrent MMT model described above, we further compare BERTGEN, which is trained on various other MT and IC corpora, to an unconstrained Transformer-based MMT model (Libovický, 2019) trained on ∼9M additional EN→DE sentences in addition to MULTI30K.

Text-only Machine Translation
We incorporate six text-only MT tasks into our training protocol. We use EN↔DE and EN↔FR MT datasets from IWSLT'14 (Cettolo et al., 2012), which consist of TED Talks subtitles and their translations. We use the prepare-iwslt14 recipe from FAIRSEQ (Ott et al., 2019) to prepare the development and test sets. This yields an EN↔DE test set of 6,750 sentences, consisting of dev2010, dev2012.TEDX, tst2010, tst2011 and tst2012. Similarly, the EN↔FR test set consists of dev2010, tst2010, tst2011 and tst2012, which amounts to 4,493 sentences.
For the EN↔TR directions, we use the SETIMES2 (Tiedemann, 2012) news dataset for training. For the development and test sets, we take the official WMT test sets (Bojar et al., 2018): newstest2016 and newstest2017 form the development set (6,007 sentences), and newstest2018 (6,000 sentences) is the test set. Both the IWSLT and SETIMES2 corpora are medium-scale resources often used in the MT research community, and their test sets are much harder than those of the MMT and IC tasks due to a significant domain shift.
Finally, for each translation direction, we train a Transformer NMT model (Vaswani et al., 2017) using the IWSLT-DE-EN recipe of the FAIRSEQ toolkit (Ott et al., 2019). This recipe uses six encoder and six decoder layers, each equipped with 4-head self-attention. The model and feed-forward dimensions are set to 512 and 1024, respectively. Each model has ∼31.5M parameters excluding the word embeddings. Since BERTGEN is a general-purpose multilingual and multimodal generator, we expect it to perform in the same ballpark as these strong NMT baselines, but not necessarily to be SoTA compared to novel and sophisticated NMT models that also make use of much more training data.

Results and Findings
We train BERTGEN on lowercased sentences for 45 epochs, after which the overall performance on the tasks reached a plateau. We define one BERTGEN epoch as a single pass over all of the training data for the MULTI30K EN→DE MMT task and refer to this task as the reference task. We use greedy search for all systems that we trained and merge back the word pieces before evaluation. We compute tokenised BLEU (Papineni et al., 2002), METEOR (Denkowski and Lavie, 2014) and CIDEr using cococaption (https://github.com/tylin/coco-caption); since M-BERT aggressively splits apostrophes and hyphens, our results may differ slightly from other work. In what follows, we provide detailed quantitative and qualitative findings.

Image Captioning

BERTGEN outperforms both ADAPTIVE ATTENTION (Lu et al., 2017) and NBT (Lu et al., 2018) on FLICKR30K English captioning, even though they use beam search for decoding. On COCO (Chen et al., 2015), an image captioning corpus much larger and more diverse than FLICKR30K, we evaluate BERTGEN on Karpathy's test split (Karpathy and Fei-Fei, 2015) and find that the scores are reasonable given that BERTGEN is not trained on COCO: our model lags behind NBT (w/ beam search) by 6.7 METEOR.

For zero-shot French captioning (F30K FR), we resort to the reference MMT translations from the MULTI30K EN→FR task, as there are no human references for French. Although this is problematic, since the metrics will penalise captions that are not translations of the English captions, we provide the scores to show that the zero-shot outputs are valid descriptions. We note that the low range of scores reported here is also due to having one reference caption instead of five references as in FLICKR30K. Finally, we report results for our custom Turkish split (§ 2.2.1) (F30K TR) and for German (F30K DE). Even though there are no comparable results in the literature for these three tasks, we demonstrate through qualitative examples that BERTGEN produces sensible outputs.
Qualitative examples. We now focus on a few examples to examine the multilingual image captioning ability of BERTGEN in action (Table 3). For the first image, all captions are almost the same, as the image has few salient points. For the second image, however, we observe much more variation across captions, in line with the complexity of the scene. We are particularly surprised by the zero-shot French captioning performance, a task that BERTGEN is not trained for at all. Upon manual inspection, we noticed that the captions are often short, objective gists of the images. These observations also hold for the captions generated for the other languages.

Table 3: Multilingual captions generated by BERTGEN for three images (English glosses in parentheses):

Image 1
EN: a man wearing a hat and glasses.
DE: ein mann mit hut und brille. (a man with hat and glasses)
TR: şapkalı ve gözlüklü bir adam. (a man with a hat and glasses.)
FR: un homme avec un chapeau et des lunettes. (a man with a hat and glasses.)

Image 2
EN: two men are on a rooftop working on something.
DE: zwei männer arbeiten auf einem dach. (two men working on a roof)
TR: iki binanın inşasında oturmuş, yanyana yerde duran iki kişi. (two people seated in the construction of two buildings, standing next to each other on the ground.)
FR: trois ouvriers du bâtiment construisent un toit. (three construction workers build a roof.)

Image 3
EN: a man in a red shirt and helmet is riding a motorbike on a dirt road.
DE: ein mann fährt mit einem motorrad auf einem weg an einem fluß entlang. (a man rides a motorcycle on a path along a river.)
TR: çamurlu bir yolda motoruyla ilerlemekte olan kırmızı üstlü bir adam ve arkasındaki dağ manzarası. (a man in a red top riding his bike down a muddy road with a mountain landscape behind him.)
FR: un homme avec un casque fait du motocross. (a man with a helmet rides motocross.)

Multimodal Machine Translation

Table 4 summarises BERTGEN's performance on MMT. First of all, BERTGEN consistently outperforms the Transformer-based FAIRSEQ NMT models and the recurrent MMT (Caglayan et al., 2020) models on both the EN→DE and the EN→FR language pairs. Furthermore, BERTGEN is also substantially better than a state-of-the-art unconstrained MMT (Libovický, 2019) model trained on a ∼6x larger parallel corpus.
Adversarial evaluation. Following Elliott (2018), we probe BERTGEN's ability to integrate multiple modalities effectively. Specifically, we decode translations after shuffling the {image, source caption} mappings so that the images do not correspond to the sentences being translated. For EN→DE, the incongruence leads to drops of 1.1 BLEU and 0.9 METEOR; for EN→FR, the drops are more prominent at 3.1 BLEU and 2.3 METEOR. This indicates that the visual features are not ignored, unlike in Caglayan et al. (2019), who showed that sequence-to-sequence MMT models can learn to ignore the images when the linguistic signal is sufficient to perform the task.
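A sketch of this incongruent-decoding probe, with decode() and the data layout as placeholders; randomly shuffling the image features follows the spirit of Elliott (2018).

```python
import random

def incongruent_decode(decode, captions, image_feats, seed=0):
    """Decode each caption paired with an unrelated image."""
    rng = random.Random(seed)
    shuffled = list(image_feats)
    rng.shuffle(shuffled)                        # break the image-caption mapping
    return [decode(cap, img) for cap, img in zip(captions, shuffled)]
```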
Zero-shot performance. The results in Table 4 show the surprising ability of BERTGEN to perform MMT on translation directions unseen during training.
Moreover, the zero-shot performance surpasses strong MMT and NMT systems by up to 2 and 3.3 METEOR points for DE→FR and FR→DE, respectively. Similar to the image captioning results, this demonstrates the potential of BERTGEN to generalise over a variety of language pairs and tasks.

Machine Translation
First, we compare BERTGEN's performance to each task-specific FAIRSEQ system. According to Table 5, the translation quality of BERTGEN is generally superior to the strong FAIRSEQ systems, especially in METEOR, where BERTGEN leads in all pairs. Second, we look at learning efficiency by comparing the training curves of BERTGEN and each task-specific FAIRSEQ system (Figure 4). Here, the x-axis represents how many times the specific task's training set has been seen by the models. BERTGEN is trained for 45 reference epochs (§ 3), which corresponds to only a few complete passes over the training sets of the NMT tasks. This is in contrast to the single-task systems, which usually require a large number of epochs for convergence. We notice a general trend whereby BERTGEN tends to outperform single-task systems after only a few passes over the corresponding training set. Many factors could contribute to this observation, such as sequence unrolling, multi-tasking, the shared input space or relevant inductive biases transferred from M-BERT. We partly address these in the ablation studies (§ 3.4) and leave further investigation to future work.
Zero-shot performance. We use the DE↔FR test set from the WMT'19 shared task on news translation (Barrault et al., 2019) to assess the zero-shot translation capability of BERTGEN. This test set includes 1,701 sentences of news data about the European Elections. We compare our results to two shared task systems, namely TARTU (baseline) and MSRA (state-of-the-art) (Barrault et al., 2019), after re-tokenising their outputs with M-BERT. Although BERTGEN is expected to obtain lower scores than the dedicated WMT systems due to the domain mismatch of the test set, we consider both the quantitative (Table 6) and the qualitative results (Table 7) extremely encouraging.

Table 7: Zero-shot DE→FR translation examples (English glosses in parentheses):

BERTGEN: la décision est tombée au 70ème anniversaire de ma femme. (the decision fell on my wife's 70th birthday.)
WMT REF: la décision est tombée le jour du 70ème anniversaire de ma femme. (the decision fell on my wife's 70th birthday.)

BERTGEN: en espagne, on s'est malheureusement habitué à une rôle double et passive. (in spain, we unfortunately got used to a double and passive role.)
WMT REF: en espagne, on s'est malheureusement habitué à un rôle secondaire, passif. (in spain, we unfortunately got used to a secondary, passive role.)

BERTGEN: pas parce que le président du fdp a dit quelque chose qu'ils ont défaillant leur vote. (not because the fdp president said something that they missed their vote.)
WMT REF: ce n'est pas parce que le président fédéral du fdp a dit quelque chose qu'ils ont refusé d'approuver. (it is not because the federal president of the fdp said something that they refused to approve.)

Impact of initialisation

We train single-task MMT systems on the MULTI30K EN→DE language pair. Specifically, we begin with a baseline system initialised with random weights. We then train a second baseline where only the visual processing layers are transferred from VL-BERT. Finally, we train a third baseline initialised in the same way as BERTGEN, i.e. using the hybrid initialisation (§ 2.1). Figure 5 compares the validation BLEU scores of these three systems. We observe that the benefits of knowledge transfer from the pre-trained models are incrementally positive, and BERTGEN's hybrid initialisation outperforms the other two ablations.

Impact of multi-task training
We now remove the multi-tasking aspect from BERTGEN to investigate the extent to which the performance improvements are due to training on the other tasks. Similar to § 3.4.1, we focus on the MULTI30K EN→DE MMT task and train a single-task, hybrid-initialised BERTGEN. Figure 6 compares the validation BLEU scores obtained by the default BERTGEN and the single-task variant. We observe that BERTGEN benefits from multi-task training and, more importantly, does not seem to exhibit patterns of catastrophic forgetting (French, 1999). Based on these observations, we expect similar model behaviour to hold for other tasks.

Multimodal multilingual pre-training
Research in NLP and related fields has been increasingly focusing on transfer learning approaches where a model is first pre-trained on a data-rich task and then transferred to downstream tasks (McCann et al., 2017; Peters et al., 2018; Devlin et al., 2019). This framework presumably allows the model to capture useful inductive biases that generalise to a variety of NLP tasks, often after performing a task-specific fine-tuning (Raffel et al., 2020). Of these, the most relevant studies to our work are BERT (Devlin et al., 2019) and its multilingual version M-BERT, which pre-train a Transformer (Vaswani et al., 2017) on large monolingual corpora using the masked language modelling (MLM) objective.
Recent research has also attempted to combine linguistic inputs with other modalities such as vision and speech, to achieve a grounded understanding of meaning. Successful approaches, including LXMERT (Tan and Bansal, 2019), VL-BERT (Su et al., 2020) and others (Li et al., 2020a,b), achieve this by combining BERT's MLM objective with auxiliary tasks such as masked region classification and image-sentence matching, and pre-train their models on large-scale image captioning corpora (Chen et al., 2015; Sharma et al., 2018). Similarly, SpeechBERT extends BERT by jointly training on speech and text data (Chuang et al., 2020). Although SoTA results are reported by these approaches, they focus on unimodal and multimodal natural language understanding (NLU) tasks, with a strong emphasis on English. The backbone of BERTGEN combines VL-BERT (Su et al., 2020) with M-BERT (Devlin et al., 2019) to realise a multilingual and multimodal generator that can be used for a diverse set of generative tasks and languages rather than NLU tasks.

Pre-training for generative tasks
Previous work has studied how to benefit from pre-trained BERT models in generative tasks such as NMT (Imamura and Sumita, 2019; Clinchant et al., 2019). BERTGEN differs from these as it is not fine-tuned for a particular MT corpus and it exhibits multilingual and multimodal properties for general-purpose generation. Another related branch of work explores pre-training strategies specific to sequence-to-sequence tasks. This includes MASS (Song et al., 2019), which exploits an encoder-decoder framework with the MLM objective for task-specific generative pre-training, and UniLM (Dong et al., 2019), which introduces uni-directional, bi-directional and sequence-to-sequence LM objectives by carefully adjusting the self-attention masks during training. Follow-up work extends UniLM to vision & language pre-training using Conceptual Captions (Sharma et al., 2018) as the pre-training dataset. However, these models require a further fine-tuning step for generative tasks, unlike BERTGEN, which is trained only once.

Multi-task learning for generation
Several approaches exist for multi-task learning & generation (Dong et al., 2015; Luong et al., 2016) in NLP, especially in multilingual NMT, where tasks denote different language pairs (Zoph and Knight, 2016; Firat et al., 2016). The multi-task (and zero-shot) generation ability of BERTGEN is mostly inspired by Ha et al. (2016) and Johnson et al. (2017). Both of these introduced target language specifiers to select the output language when decoding translations from their model.
Our multilingual & multimodal take on multi-task generation is most similar to Kaiser et al. (2017), where a single Transformer model is trained on different tasks including image captioning, object classification, machine translation, speech recognition and parsing. However, their architecture depends on particular structures such as encoders, decoders, modality-specific networks and I/O mixers, unlike BERTGEN, which does not require task-specific modules.

Conclusions
In this paper, we presented BERTGEN, a novel generative, decoder-only model which extends BERT by combining multimodal and multilingual pre-trained models. Our findings show that BERTGEN obtains strong performance on a variety of generative tasks and further generalises to unseen tasks. Importantly, our model demonstrates the potential for general-purpose (instead of task-specific) generation that goes above and beyond the traditional pre-training and fine-tuning practices. BERTGEN is also parameter efficient: it has 89.3M total parameters and is trained on thirteen tasks encompassing MT, multimodal MT and image captioning, whereas each of the single-task FAIRSEQ NMT baselines alone has 31.5M parameters.
Our ablation studies show that BERTGEN is able to efficiently transfer relevant inductive biases from the pre-trained models and benefits from multi-task learning without suffering from catastrophic forgetting. We hope that these findings will motivate future research in exploiting more sophisticated pre-trained models in place of M-BERT and VL-BERT.

Additional multilingual captioning examples (English glosses in parentheses):

EN: a group of people are riding on elephants through a river.
FR: un groupe de personnes sur des chevaux sur un bateau dans un ruisseau. (a group of people on horses on a boat in a stream.)
DE: eine gruppe reiter fährt auf einem fluss. (a group of riders is riding on a river.)
TR: bir grup insan bir derede duran dört tane at ile ilerliyorlar. (a group of people are moving with four horses standing in a stream.)

EN: a black dog is playing with a yellow toy in the grass.
FR: un chien avec une frisbee dans la pelouse. (a dog with a frisbee in the lawn.)
DE: ein schwarzer hund mit rotem halsband spielt mit einem gelben ball auf einer wiese. (a black dog with a red collar is playing with a yellow ball in a meadow.)
TR: yeşil bir topu ısırmaya çalışan siyah bir köpek. (a black dog trying to bite a green ball.)

EN: a boy in a red shirt and white shorts is playing tennis.
FR: un tennisteur frappe une balle. (a tennisteur hits a ball.)
DE: ein junge spielt tennis. (a boy is playing tennis.)
TR: tenis raketi ile topa vuran çocuk. (a child hitting the ball with a tennis racket.)

Additional zero-shot DE→FR examples (English glosses of the BERTGEN output followed by the reference):

BERTGEN: several ngos, including the mozilla and greenpeace foundations, estimate that these new tools are incapable and come too late.
WMT REF: several ngos, including the mozilla foundation and greenpeace, estimate that these new tools are inadequate and come too late.

BERTGEN: immigration is seen as a big problem for the ue, for 45 percent of germans and 40 percent of all european ones.
WMT REF: 45 percent of germans and 40 percent of all europeans consider immigration to be the biggest problem in the eu.

BERTGEN: that is the reason why he explores the question of whether there are alternatives to choose from in his book.
WMT REF: therefore, in his book, he investigates the question of whether there are alternatives to choose from.