Macro-Average: Rare Types Are Important Too

While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy. Model-based MT metrics trained on segment-level human judgments have emerged as an attractive replacement due to strong correlation results. These models, however, require potentially expensive re-training for new domains and languages. Furthermore, their decisions are inherently non-transparent and appear to reflect unwelcome biases. We explore the simple type-based classifier metric, MacroF1, and study its applicability to MT evaluation. We find that MacroF1 is competitive on direct assessment, and outperforms others in indicating downstream cross-lingual information retrieval task performance. Further, we show that MacroF1 can be used to effectively compare supervised and unsupervised neural machine translation, and reveal significant qualitative differences in the methods’ outputs.


Introduction
Model-based metrics for evaluating machine translation such as BLEURT (Sellam et al., 2020), ESIM (Mathur et al., 2019), and YiSi (Lo, 2019) have recently attracted attention due to their superior correlation with human judgments (Ma et al., 2019). However, BLEU (Papineni et al., 2002) remains the most widely used corpus-level MT metric. It correlates reasonably well with human judgments, and moreover is easy to understand and cheap to calculate, requiring only reference translations in the target language. By contrast, model-based metrics require tuning on thousands of examples of human evaluation for every new target language or domain (Sellam et al., 2020). Model-based metric scores are also opaque and can hide undesirable biases, as can be seen in Table 1.
Tools and analysis are available at https://github.com/thammegowda/007-mt-eval-macro. MT evaluation metrics are at https://github.com/isi-nlp/sacrebleu/tree/macroavg-naacl21.
The source of model-based metrics' (e.g. BLEURT) correlative superiority over model-free metrics (e.g. BLEU) appears to be the former's ability to focus evaluation on adequacy, while the latter are overly focused on fluency. BLEU and most other generation metrics consider each output token equally. Since natural language is dominated by a few high-count types, an MT model that concentrates on getting its ifs, ands and buts right will benefit from BLEU in the long run more than one that gets its xylophones, peripatetics, and defenestrates right. Can we derive a metric with the discriminating power of BLEURT that does not share its bias or expense and is as interpretable as BLEU?
As it turns out, the metric may already exist and be in common use. Information extraction and other areas concerned with classification have long used both micro averaging, which treats each token equally, and macro averaging, which instead treats each type equally, when evaluating. The latter in particular is useful when seeking to avoid results dominated by overly frequent types. In this work we take a classification-based approach to evaluating machine translation in order to obtain an easy-to-calculate metric that focuses on adequacy as much as BLEURT but does not have the expensive overhead, opacity, or bias of model-based methods.
Our contributions are as follows: We consider MT as a classification task, and thus admit MACROF 1 as a legitimate approach to evaluation (Section 2). We show that MACROF 1 is competitive with other popular methods at tracking human judgments in translation (Section 3.2). We offer an additional justification of MACROF 1 as a performance indicator on adequacy-focused downstream tasks such as cross-lingual information retrieval (Section 3.3). Finally, we demonstrate that MACROF 1 is just as good as the expensive BLEURT at discriminating between structurally different MT approaches in a way BLEU cannot, especially regarding the adequacy of generated text, and provide a novel approach to qualitative analysis of the effect of metrics choice on quantitative evaluation (Section 4).

NMT as Classification
Neural machine translation (NMT) models are often viewed as pairs of encoder-decoder networks. Viewing NMT as such is useful in practice for implementation; however, such a view is inadequate for theoretical analysis. Gowda and May (2020) provide a high-level view of NMT as two fundamental ML components: an autoregressor and a classifier. Specifically, NMT is viewed as a multi-class classifier that operates on representations from an autoregressor. We may thus consider classifier-based evaluation metrics.
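Consider a test corpus $T = \{(x^{(i)}, h^{(i)}, y^{(i)}) \mid i = 1, 2, 3, \ldots, m\}$, where $x^{(i)}$, $h^{(i)}$, and $y^{(i)}$ are the source, system hypothesis, and reference translation, respectively. Let $x = \{x^{(i)} \; \forall i\}$, and similarly for $h$ and $y$. Let $V_h$, $V_y$, $V_{h \cap y}$, and $V$ be the vocabulary of $h$, the vocabulary of $y$, $V_h \cap V_y$, and $V_h \cup V_y$, respectively. For each class $c \in V$,

$$\text{PREDS}(c) = \sum_{i=1}^{m} C(c, h^{(i)})$$
$$\text{REFS}(c) = \sum_{i=1}^{m} C(c, y^{(i)})$$
$$\text{MATCH}(c) = \sum_{i=1}^{m} \min\{C(c, h^{(i)}), C(c, y^{(i)})\}$$

where $C(c, a)$ counts the number of tokens of type $c$ in sequence $a$ (Papineni et al., 2002). For each class $c \in V_{h \cap y}$, precision ($P_c$), recall ($R_c$), and $F_\beta$ measure ($F_{\beta;c}$) are computed as follows:

$$P_c = \frac{\text{MATCH}(c)}{\text{PREDS}(c)}, \qquad R_c = \frac{\text{MATCH}(c)}{\text{REFS}(c)}, \qquad F_{\beta;c} = (1 + \beta^2) \frac{P_c \times R_c}{\beta^2 \times P_c + R_c}$$

The macro-average consolidates individual performance by averaging by type, while the micro-average averages by token:

$$\text{MacroF}_\beta = \frac{\sum_{c \in V} F_{\beta;c}}{|V|}$$
$$\text{MicroF}_\beta = \frac{\sum_{c \in V} f(c) \times F_{\beta;c}}{\sum_{c' \in V} f(c')}$$

where $f(c) = \text{REFS}(c) + k$ for smoothing factor $k$. We scale MACROF β and MICROF β values to percentages (a 0-100 range), similar to BLEU, for the sake of easier readability.

Table 2: WebNLG data-to-text task: Kendall's τ between system-level MT metric scores and human judgments. Fluency and grammar are correlated identically by all metrics. Values that are not significant at α = 0.05 are indicated by × .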
The human judgments provided have three linguistic aspects (fluency, grammar, and semantics), which enable us to perform a fine-grained analysis of our metrics. We compute Kendall's τ between metrics and human judgments, which are reported in Table 2. As seen in Table 2, the metrics exhibit much variance in agreement with human judgments. For instance, BLEURTmedian is the best indicator of fluency and grammar, whereas BLEURTmean is best on semantics. BLEURT, being a model-based measure that is directly trained on human judgments, scores relatively higher than others. Considering the model-free metrics, CHRF 1 does well on semantics but poorly on fluency and grammar compared to BLEU. Not surprisingly, both MICROF 1 and MACROF 1 , which rely solely on unigrams, are poor indicators of fluency and grammar compared to BLEU; however, MACROF 1 is clearly a better indicator of semantics than BLEU. The discrepancy between MICROF 1 and MACROF 1 regarding their agreement with fluency, grammar, and semantics is expected: micro-averaging pays more attention to function words (as they are frequent types) that contribute to fluency and grammar, whereas macro-averaging pays relatively more attention to the content words that contribute to semantic adequacy.
The takeaway from this analysis is as follows: MACROF 1 is a strong indicator of semantic adequacy, but a poor indicator of fluency. We recommend using either MACROF 1 or CHRF 1 when semantic adequacy, and not fluency, is the desired goal.
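To make the micro/macro contrast concrete, the following is a minimal sketch of type-level F-measure averaging, not the implementation used in the experiments. It assumes whitespace tokenization, assigns an F-score of 0 to any type that is never matched, and uses k = 1 as the smoothing factor; all three choices, along with the function names and the toy sentences, are assumptions made for illustration.

```python
from collections import Counter


def type_counts(hyps, refs):
    """Per-type predicted, reference, and matched (clipped) counts over a corpus."""
    preds, gold, match = Counter(), Counter(), Counter()
    for hyp, ref in zip(hyps, refs):
        h, y = Counter(hyp.split()), Counter(ref.split())
        preds.update(h)
        gold.update(y)
        match.update(h & y)  # Counter intersection keeps the per-type minimum
    return preds, gold, match


def f_beta_per_type(preds, gold, match, beta=1.0):
    """F-beta for every type in the union vocabulary (0 if never matched: assumption)."""
    scores = {}
    for c in set(preds) | set(gold):
        if match[c] == 0:
            scores[c] = 0.0
            continue
        p, r = match[c] / preds[c], match[c] / gold[c]
        scores[c] = (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
    return scores


def macro_f(scores):
    """Average over types: every type counts once."""
    return 100 * sum(scores.values()) / len(scores) if scores else 0.0


def micro_f(scores, gold, k=1):
    """Average weighted by reference frequency plus smoothing k (k=1 is an assumption)."""
    weights = {c: gold[c] + k for c in scores}
    total = sum(weights.values())
    return 100 * sum(weights[c] * scores[c] for c in scores) / total if total else 0.0


# Toy corpus: the only error is one rare content word.
refs = ["the cat sat on the mat", "the dog ate the xylophone"]
hyps = ["the cat sat on the mat", "the dog ate the piano"]

preds, gold, match = type_counts(hyps, refs)
scores = f_beta_per_type(preds, gold, match)
macro = macro_f(scores)
micro = micro_f(scores, gold)
```

On this toy corpus the macro-average is about 77.8 while the micro-average is 85.0: the single mistranslated content word costs two of nine types under macro-averaging, but is diluted by the frequent type "the" under micro-averaging.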

Machine Translation: WMT Metrics
In this section, we verify how well the metrics agree with human judgments using Workshop on Machine Translation (WMT) metrics task datasets for 2017-2019 (Bojar et al., 2017; Ma et al., 2018, 2019). We first compute scores from each MT metric, and then calculate the correlation τ with human judgments.
As there are many language pairs and translation directions in each year, we report only the mean and median of τ, and the number of wins per metric for each year in Table 3. We have excluded BLEURT from comparison in this section since the BLEURT models are fine-tuned on the same datasets on which we are evaluating the other methods. CHRF 1 has the strongest mean and median agreement with human judgments across the years. In 2018 and 2019, both MACROF 1 and MICROF 1 mean and median agreements outperform BLEU, whereas in 2017 BLEU was better than MACROF 1 and MICROF 1 .
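As an illustration of the correlation step, here is a small pure-Python sketch of Kendall's τ between system-level metric scores and human scores. The text does not state which variant of τ is used; the tie-corrected τ-b below is an assumption, and the score lists are made-up numbers, not results from any WMT dataset.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b: rank agreement between two score lists, corrected for ties."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    concordant = discordant = tie_x = tie_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: contributes to neither term
        elif dx == 0:
            tie_x += 1
        elif dy == 0:
            tie_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tie_x) * (concordant + discordant + tie_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical system-level scores for four MT systems.
metric_scores = [30.1, 28.4, 33.0, 25.2]
human_same_order = [0.12, 0.05, 0.20, -0.10]   # same ranking as the metric
human_one_swap = [0.05, 0.12, 0.20, -0.10]     # first two systems swapped

tau_same = kendall_tau_b(metric_scores, human_same_order)
tau_swap = kendall_tau_b(metric_scores, human_one_swap)
```

With four systems there are six pairs; swapping one pair leaves five concordant and one discordant, giving τ = 4/6. A significance test, as used for the × marks in the tables, is not included in this sketch.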
As seen in Section 3.1, MACROF 1 weighs towards semantics whereas MICROF 1 and BLEU weigh towards fluency and grammar. This indicates that recent MT systems are mostly fluent, and adequacy is the key discriminating factor amongst them. BLEU served well in the early era of statistical MT when fluency was a harder objective. Recent advancements in neural MT models such as Transformers (Vaswani et al., 2017) produce fluent outputs, and have brought us to an era where semantic adequacy is the focus.

Cross-Lingual Information Retrieval
In this section, we determine correlation between MT metrics and downstream cross-lingual information retrieval (CLIR) tasks. CLIR is a kind of information retrieval (IR) task in which documents in one language are retrieved given queries in another (Grefenstette, 2012). A practical solution to CLIR is to translate source documents into the query language using an MT model, then use a monolingual IR system to match queries with translated documents. Correlation between MT and IR metrics is computed in the following steps:
1. Build a set of MT models and measure their performance using MT metrics.

CLSSTS Datasets
CLSSTS datasets contain queries in English (EN), and documents in many source languages along with their human translations, as well as query-document relevance judgments. We use three source languages: Lithuanian (LT), Pashto (PS), and Bulgarian (BG). The performance of this CLIR task is evaluated using two IR measures: Actual Query Weighted Value (AQWV) and Mean Average Precision (MAP). AQWV is derived from the Actual Term Weighted Value (ATWV) metric (Wegmann et al., 2013). We use a single CLIR system (Boschee et al., 2019) with the same IR settings for all MT models in the set, and measure Kendall's τ between MT and IR measures. The results, in Table 4, show that MACROF 1 is the strongest indicator of CLIR downstream task performance in five out of six settings. AQWV and MAP have a similar trend in agreement to the MT metrics. CHRF 1 and BLEURT, which are strong contenders when generated text is directly evaluated by humans, do not indicate CLIR task performance as well as MACROF 1 , as CLIR tasks require faithful meaning equivalence across the language boundary, and human translators can mistake fluent output for proper translations (Callison-Burch et al., 2007).

Europarl Datasets
We perform a similar analysis to Section 3.3.1 but on another cross-lingual task set up by Lignos et al. (2019) for Czech → English (CS-EN) and German → English (DE-EN), using publicly available data from the Europarl v7 corpus (Koehn, 2005). The results in Table 5 show that BLEURT has the highest correlation in both cases. Apart from the trained BLEURTmedian metric, MACROF 1 scores higher than the others on CS-EN, and is competitive on DE-EN. MACROF 1 is not the metric with the highest IR task correlation in this setting, unlike in Section 3.3.1; however, it is competitive with BLEU and CHRF 1 , and thus a safe choice as a downstream task performance indicator.
We compare unsupervised NMT (UNMT) and supervised NMT (SNMT) for English ↔ German (EN-DE, DE-EN), English ↔ French (EN-FR, FR-EN), and English ↔ Romanian (EN-RO, RO-EN). All our UNMT models are based on XLM (Conneau and Lample, 2019), pretrained by Yang (2020). We choose SNMT models with similar BLEU on common test sets by either selecting from systems submitted to previous WMT News Translation shared tasks (Bojar et al., 2014, 2016) or by building such systems. Specific SNMT models chosen are in the Appendix (Table 12). Table 6 shows performance for these three language pairs using a variety of metrics. Despite comparable scores in BLEU and only minor differences in MICROF 1 and CHRF 1 , SNMT models have consistently higher MACROF 1 and BLEURT than the UNMT models for all six translation directions.
In the following section, we use a pairwise maximum difference discriminator approach to compare corpus-level metrics BLEU and MACROF 1 on a segment level. Qualitatively, we take a closer look at the behavior of the two metrics when comparing a translation with altered meaning to a translation with differing word choices using the metric.

Pairwise Maximum Difference Discriminator
We consider cases where a metric has a strong opinion of one translation system over another, and analyze whether the opinion is well justified. In order to obtain this analysis, we employ a pairwise segment-level discriminator from within a corpus-level metric, which we call favoritism.
We extend the definition of $T$ from Section 2 to $T = \{x, h_S, h_U, y\}$, where each of $h_S$ and $h_U$ is a separate system's hypothesis set for $x$. (The subscripts represent SNMT and UNMT in this case, though the definition is general.) Let $M$ be a corpus-level measure such that $M(h, y) \in \mathbb{R}$ and a higher value implies better translation quality. $M(h^{(-i)}, y^{(-i)})$ is the corpus-level score obtained by excluding $h^{(i)}$ and $y^{(i)}$ from $h$ and $y$, respectively. We define the benefit of segment $i$, $\delta_M(i; h)$:

$$\delta_M(i; h) = M(h, y) - M(h^{(-i)}, y^{(-i)})$$

If $\delta_M(i; h) > 0$, then $i$ is beneficial to $h$ with respect to $M$, as the inclusion of $h^{(i)}$ increases the corpus-level score. We define the favoritism of $M$ toward $i$ as $\delta_M(i; h_S, h_U)$:

$$\delta_M(i; h_S, h_U) = \delta_M(i; h_S) - \delta_M(i; h_U)$$

If $\delta_M(i; h_S, h_U) > 0$ then $M$ favors the translation of $x^{(i)}$ by system S over that in system U.
Table 7 reflects the results of a manual examination of the ten sentences in the DE-EN test set with greatest magnitude favoritism; complete results are in the Appendix, Tables 15 and 16. Meaning-altering changes such as 'untranslation', (wrong) 'time', and (wrong) 'translation' are marked in italics, while changes that do not fundamentally alter the meaning, such as 'synonym,' (different) 'inflection,' and (different) 'word order' are marked in plain text. The results indicate that MACROF 1 generally favors SNMT, and with good reason, as the favored translation does not generally alter sentence meaning, while the disfavored translation does. On the other hand, for the ten most favored sentences according to BLEU, four do not contain meaning-altering divergences in the disfavored translation. Importantly, none of the sentences with greatest favoritism according to MACROF 1 , all of which have meaning-altering changes in the disfavored alternatives, appears in the list for BLEU. This indicates relatively bad judgment on the part of BLEU. One case of good judgment from MACROF 1 and bad judgment from BLEU regarding truncation is shown in Table 8.

Table 8: An example of favoritism that illustrates the differences between MACROF 1 and BLEU. Translations of the DE-EN test sentence with sixth largest magnitude favoritism according to MACROF 1 , along with the favoritism according to BLEU (not in the top ten). UNMT's translation does not include the second half of the sentence. MACROF 1 favors SNMT, but BLEU favors UNMT.
Reference: Ever since I joined Labour 32 years ago as a school pupil, provoked by the Thatcher government's neglect that had left my comprehensive school classroom literally falling down, I've sought to champion better public services for those who need them most - whether as a local councillor or government minister.
SNMT: 32 years ago, I joined Labour as a student because of the neglect of the Thatcher government, which had led to my classroom literally collapsed, and as a result I tried to promote better public services for those who need it most, whether as a local council or ministers.
UNMT: Last 32 years ago, as a student, because of the disdain for the Thatcher-era government, Labour joined Labour.
Problems: SNMT: synonym, word_order. UNMT: subject, truncation, word_order.
From our qualitative examinations, MACROF 1 is better than BLEU at discriminating against untranslations and truncations in UNMT. The case is similar for FR-EN and RO-EN, except that RO-EN has more untranslations for both SNMT and UNMT, possibly due to the smaller training data. Complete tables and annotated sentences are in the Appendix, in Section C.
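The benefit and favoritism quantities used in this section amount to leave-one-out comparisons of a corpus-level score, which can be sketched as follows. This is an illustrative sketch only: `clipped_unigram_recall` is a toy stand-in for a corpus-level measure M (it is neither BLEU nor MACROF 1), and the function names and three-sentence corpus are hypothetical.

```python
from collections import Counter


def clipped_unigram_recall(hyps, refs):
    """Toy corpus-level measure M: fraction of reference tokens matched (clipped)."""
    matched = total = 0
    for hyp, ref in zip(hyps, refs):
        h, y = Counter(hyp.split()), Counter(ref.split())
        matched += sum((h & y).values())
        total += sum(y.values())
    return matched / total if total else 0.0


def benefit(metric, hyps, refs, i):
    """Score with segment i included minus score with segment i held out."""
    held_out = metric(hyps[:i] + hyps[i + 1:], refs[:i] + refs[i + 1:])
    return metric(hyps, refs) - held_out


def favoritism(metric, hyps_s, hyps_u, refs, i):
    """Positive when the metric favors system S over system U on segment i."""
    return benefit(metric, hyps_s, refs, i) - benefit(metric, hyps_u, refs, i)


def top_favoritism(metric, hyps_s, hyps_u, refs, n=10):
    """Segments with the largest-magnitude favoritism, as (value, index) pairs."""
    scored = [(favoritism(metric, hyps_s, hyps_u, refs, i), i) for i in range(len(refs))]
    return sorted(scored, key=lambda pair: abs(pair[0]), reverse=True)[:n]


# Hypothetical corpus: system U mistranslates two tokens of segment 1.
refs = ["a b c", "d e f", "g h i"]
hyps_s = ["a b c", "d e f", "g h i"]
hyps_u = ["a b c", "d x y", "g h i"]

ranked = top_favoritism(clipped_unigram_recall, hyps_s, hyps_u, refs, n=1)
```

Here the top-ranked segment is index 1, with favoritism 2/9 toward system S. Each call to `benefit` rescores the whole corpus, so this direct form costs one corpus-level evaluation per segment; caching per-segment statistics would be a natural optimization for larger test sets.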

MT Metrics
Many metrics have been proposed for MT evaluation, which we broadly categorize into model-free or model-based. Model-free metrics compute scores based on translations but have no significant parameters or hyperparameters that must be tuned a priori; these include BLEU (Papineni et al., 2002), NIST (Doddington, 2002), TER (Snover et al., 2006), and CHRF 1 (Popović, 2015). Model-based metrics have a significant number of parameters and, sometimes, external resources that must be set prior to use. These include METEOR, among others. Rather than using Pearson's correlation coefficient (r) to quantify the correlation between automatic evaluation metrics and human judgements, we use Kendall's rank coefficient (τ), since τ is more robust to outliers than r (Croux and Dehon, 2010).

Rare Words are Important
That natural language word types roughly follow a Zipfian distribution is a well-known phenomenon (Zipf, 1949; Powers, 1998). The frequent types are mainly so-called "stop words," function words, and other low-information types, while most content words are infrequent types. To counter this natural frequency-based imbalance, statistics such as inverse document frequency (IDF) are commonly used to weigh the input words in applications such as information retrieval (Jones, 1972). In NLG tasks such as MT, where words are the output of a classifier, there has been scant effort to address the imbalance. Doddington (2002) is the only work we know of in which the 'information' of an n-gram is used as its weight, such that rare n-grams attain relatively more importance than in BLEU. We abandon this direction for two reasons. Firstly, as noted in that work, large amounts of data are required to estimate n-gram statistics. Secondly, unequal weighing is a bias that is best suited to the datasets from which the weights are derived, and such biases often do not generalize to other datasets. Therefore, unlike Doddington (2002), we assign equal weights to all n-gram classes, and in this work we limit our scope to unigrams only.
While BLEU is a precision-oriented measure, METEOR (Banerjee and Lavie, 2005) and CHRF (Popović, 2015) include both precision and recall, similar to our methods. However, neither of these measures tries to address the natural imbalance of class distribution. BEER (Stanojević and Sima'an, 2014) and METEOR (Denkowski and Lavie, 2011) make an explicit distinction between function and content words; such a distinction inherently captures frequency differences since function words are often frequent and content words are often infrequent types. However, doing so requires the construction of potentially expensive linguistic resources. This work does not make any explicit distinction and uses naturally occurring type counts to effect a similar result.

F-measure as an Evaluation Metric
F-measure (Rijsbergen, 1979; Chinchor, 1992) is extensively used as an evaluation metric in classification tasks such as part-of-speech tagging, named entity recognition, and sentiment analysis (Derczynski, 2016). Viewing MT as a multi-class classifier is a relatively new paradigm (Gowda and May, 2020), and evaluating MT solely as a multi-class classifier as proposed in this work is not an established practice. However, we find that the F 1 measure is sometimes used for various analyses when BLEU and others are inadequate: The compare-mt tool (Neubig et al., 2019) supports comparison of MT models based on F 1 measure of individual types. Gowda and May (2020) use F 1 of individual types to uncover frequency-based bias in MT models. Sennrich et al. (2016) use corpus-level unigram F 1 in addition to BLEU and CHRF; however, corpus-level F 1 is computed as MICROF 1 .
To the best of our knowledge, there is no previous work that clearly formulates the differences between micro- and macro-averages, and justifies the use of MACROF 1 for MT evaluation.

Discussion and Conclusion
We have evaluated NLG in general and MT specifically as a multi-class classifier, and illustrated the differences between micro- and macro-averages using MICROF 1 and MACROF 1 as examples (Section 2). MACROF 1 captures semantic adequacy better than MICROF 1 (Section 3.1). BLEU, being a micro-averaged measure, served well in an era when generating fluent text was at least as difficult as generating adequate text. Since we are now in an era in which fluency is taken for granted and semantic adequacy is a key discriminating factor, macro-averaged measures such as MACROF 1 are better at judging the generation quality of MT models (Section 3.2). We have found that another popular metric, CHRF 1 , also performs well on direct assessment; however, being an implicitly micro-averaged measure, it does not perform as well as MACROF 1 on downstream CLIR tasks (Section 3.3.1). Unlike BLEURT, which is also adequacy-oriented, MACROF 1 is directly interpretable, does not require retuning on expensive human evaluations when changing language or domain, and does not appear to have uncontrollable biases resulting from data effects. It is both easy to understand and to calculate, and is inspectable, enabling fine-grained analysis at the level of individual word types. These attributes make it a useful metric for understanding and addressing the flaws of current models. For instance, we have used MACROF 1 to compare supervised and unsupervised NMT models at the same operating point measured in BLEU, and determined that supervised models have better adequacy than the current unsupervised models (Section 4). Macro-average is a useful technique for addressing the importance of the long tail of language, and MACROF 1 is our first step in that direction; we anticipate the development of more advanced macro-averaged metrics that take advantage of higher-order and character n-grams in the future.

Ethical Consideration
Since many ML models including NMT are themselves opaque and known to possess data-induced biases (Prates et al., 2019), using opaque and biased evaluation metrics in concurrence makes it even harder to discover and address the flaws in modeling. Hence, we have raised concerns about the opaque nature of the current model-based evaluation metrics, and demonstrated examples displaying unwelcome biases in evaluation. We advocate the use of the MACROF 1 metric, as it is easily interpretable and offers an explanation of the score as a composition of individual type performances. In addition, MACROF 1 treats all types equally, and has no parameters that are directly or indirectly estimated from data sets. Unlike MACROF 1 , MICROF 1 and other implicitly or explicitly micro-averaged metrics assign lower importance to rare concepts and their associated rare types. The use of micro-averaged metrics in real-world evaluation could lead to marginalization of rare types.
Failure Modes: The proposed MACROF 1 metric is not the best measure of fluency of text. Hence we suggest caution while using MACROF 1 to draw fluency related decisions. MACROF 1 is inherently concerned with words, and assumes the output language is easily segmentable into word tokens. Using MACROF 1 to evaluate translation into alphabetical languages such as Thai, Lao, and Khmer, that do not use white space to segment words, requires an effective tokenizer. Absent this the method may be ineffective; we have not tested it on languages beyond those listed in Section B.

Reproducibility:
Our implementation of MACROF 1 and MICROF 1 has the same user experience as BLEU as implemented in SACREBLEU; signatures are provided in Section A. In addition, our implementation is computationally efficient, and has the same (minimal) software and hardware requirements as BLEU. All data for MT and NLG human correlation studies is publicly available and documented. Data for reproducing the IR experiments in Section 3.3.2 is also publicly available and documented. The data for reproducing the IR experiments in Section 3.3.1 is only available to participants in the CLSSTS shared task.
Climate Impact: Our proposed metrics are on par with BLEU and such model-free methods, which consume significantly less energy than most model-based evaluation metrics.

Work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via AFRL Contract FA8650-17-C-9116. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

B Agreement with WMT Human Judgments
Tables 9, 10, and 11 provide τ between MT metrics and human judgments on WMT Metrics task 2017-2019. ⋆BLEU is based on pre-computed scores in WMT metrics package, whereas BLEU is based on our recalculation using SACREBLEU. Values marked with × are not significant at α = 0.05, and hence corresponding rows are excluded from the calculation of mean, median, and standard deviation.
Since MACROF 1 is the only metric that does not achieve statistical significance in the WMT 2019 EN-ZH setting, we carefully inspected it. Human scores for this setting are obtained without looking at the references by bilingual speakers (Ma et al., 2019), but the ZH references are found to have a large number of bracketed EN phrases, especially proper nouns that are rare types. When the text inside these brackets is not generated by an MT system, MACROF 1 naturally penalizes heavily due to the poor recall. Since other metrics assign lower importance to poor recall of such rare types, they achieve relatively better correlation to human scores than MACROF 1 . However, since the τ values for EN-ZH are relatively lower than the other language pairs, we conclude that poor correlation of MACROF 1 in EN-ZH is due to poor quality references. Some settings did not achieve statistical significance due to a smaller sample set as there were fewer MT systems submitted, e.g. 2017 CS-EN.

C UNMT and SNMT Models
The UNMT models follow XLM's standard architecture and are trained with 5 million monolingual sentences for each language using a vocabulary size of 60,000. We train SNMT models for EN↔DE and select models with the most similar (or a slightly lower) BLEU as their UNMT counterparts on newstest2019. The DE-EN model selected is trained with 1 million sentences of parallel data and a vocabulary size of 64,000, and the EN-DE model selected is trained with 250,000 sentences of parallel data and a vocabulary size of 48,000. For EN↔FR and EN↔RO, we select SNMT models from submitted systems to WMT shared tasks that have similar or slightly lower BLEU scores to corresponding UNMT models, based on NewsTest2014 for EN↔FR and NewsTest2016 for EN↔RO. Figure 1, which is a visualization of MACROF 1 for SNMT and UNMT models, shows that UNMT is generally better than SNMT on frequent types; however, SNMT outperforms UNMT on the rest, leading to a crossover point in the MACROF 1 curves. Since MACROF 1 assigns relatively higher weights to infrequent types than BLEU does, SNMT gains higher MACROF 1 than UNMT while both have approximately the same BLEU, as reported in Table 6.
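One simple way to look for such a crossover, offered here as a hedged sketch rather than the procedure behind Figure 1, is to rank types by reference frequency and compare the mean per-type F 1 of the most frequent types against that of the remaining types. The head/tail split, the function names, and the two toy systems below are all assumptions for illustration.

```python
from collections import Counter


def per_type_f1(hyps, refs):
    """Per-type F1 over the union vocabulary, plus reference counts per type."""
    preds, gold, match = Counter(), Counter(), Counter()
    for hyp, ref in zip(hyps, refs):
        h, y = Counter(hyp.split()), Counter(ref.split())
        preds.update(h)
        gold.update(y)
        match.update(h & y)
    # F1 = 2 * match / (preds + refs), algebraically equal to 2PR / (P + R)
    f1 = {c: 2 * match[c] / (preds[c] + gold[c]) for c in set(preds) | set(gold)}
    return f1, gold


def head_tail_f1(hyps, refs, head_size):
    """Mean F1 of the head_size most frequent reference types vs. all other types."""
    f1, gold = per_type_f1(hyps, refs)
    ranked = sorted(f1, key=lambda c: (-gold[c], c))  # frequent first, ties by name
    head, tail = ranked[:head_size], ranked[head_size:]

    def mean(types):
        return sum(f1[c] for c in types) / len(types) if types else 0.0

    return mean(head), mean(tail)


# Hypothetical systems: A gets content words right, B gets function words right.
refs = ["the cat saw the xylophone", "the dog saw the zebra"]
sys_a = ["a cat saw a xylophone", "a dog saw a zebra"]
sys_b = ["the cat saw the piano", "the dog saw the horse"]

head_a, tail_a = head_tail_f1(sys_a, refs, head_size=1)
head_b, tail_b = head_tail_f1(sys_b, refs, head_size=1)
```

In this toy case system B is better on the single most frequent type while system A is better on the remaining types, which is the kind of pattern a macro-average rewards and a token-weighted average can hide.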
A complete comparison of UNMT vs SNMT in different languages is in Table 12. A manual analysis of the ten sentences with the largest magnitude favoritism according to MACROF 1 and BLEU in      It is understood they will feature a powerful cannon, an array of anti-aircraft and anti-ship missiles as well as some stealth technologies, such as reduced radar, infrared and acoustic signatures.
It is assumed that they have a powerful cannon, a series of anti-aircraft and anti-ship missiles, as well as some steam technologies, such as reduced radar, infrared and acoustic signatures.
It is understood they have a powerful canon, a number of fluke and ship fire systems and some stealth-controlled technologies, such as reduced radar, infrarot and akustic signposts.
A group of masked proseparatists held back by riot police pelted them with eggs and hurled powder paint, creating dark clouds of dust in streets that would usually be thronged with tourists.
A group of masked proseparatists held hostage by the riot police brought them to eggs and ignited powder paint and produced dark clouds in the streets that were usually crowded by tourists.
A group of masked proindependence separforces, who were kept away by the Bereitschaftpolice, beheaded them with evocative paint and poured pulverforce paint and created dark Staubations in the streets normally clogged by tourists. -.055 Il faut bien le faire.
Il faut bien le faire. La gentillesse du personnel et la disponibil.
. In one section there is an image of a dorm room, where students click on coffee cups, curtains, trainers and books to be told about the effects of caffeine and light and how athletic performance is impacted by sleep deficiency, and the importance of a bedtime routine.
In a section there is an image of a bedroom where students click on coffee cups, curtains, trainers, and books to be informed about the effects of caffeine and light, and about how sporting performance is affected by lack of sleep and the importance of sleeping time routine.
In one section, there is a picture of a sleeping sauna where students click on coffee cups, forecourts, coaches and books to be educated about the impact of Koffein and light and about how athletic performance is influenced by sleep loss and the importance of a sleep day routine.
Nickel mining is also important in the province, but is mostly concentrated in Morowali, on the opposite coast of Sulawesi.
Nickelergbau is also important in the province, but is mainly operated in Morewali, on the opposite coast of Sulawesi.
Nickel mining is also important in the province, but is mostly operated in Morowali, on the opposite coast of Sulawesi.
Ever since I joined Labour 32 years ago as a school pupil, provoked by the Thatcher government's neglect that had left my comprehensive school classroom literally falling down, I've sought to champion better public services for those who need them mostwhether as a local councillor or government minister. 32 years ago, I joined Labour as a student because of the neglect of the Thatcher government, which had led to my classroom literally collapsed, and as a result I tried to promote better public services for those who need it most, whether as a local council or ministers.
Last 32 years ago, as a student, because of the disdain for the Thatcher-era government, Labour joined Labour.
.044 UN-Gesandter Staffan de Mistura hofft, bald die ersten Treffen eines neuen Ausschusses aus Regierungsund Oppositionsmitgliedern einzuberufen, um eine Nachkriegsverfassung für Syrien zu entwerfen und den Weg zu Wahlen zu ebnen. UN envoy Staffan de Mistura is hoping to soon convene the first meetings of a new committee comprised of government and opposition members to draft a postwar constitution for Syria and pave the way to elections.
UN envoy Staffan de Mistura hopes to convene soon the first meetings of a new committee of government and opposition members to draw up a post-war constitution for Syria and pave the way for elections.  A week after an official Chinese newspaper published a four-page ad on the mutual benefit of the US-China trade, the U.S. ambassador to China accused Beijing of using the American press to spread propaganda.
They don't care who they hurt, who they have to run over in order to get power and control, that's what they want is power and control, we're not going to give it to them." They do not care about who they hurt whom they must pass over to gain power and control, that is what they want, power and control, we will not give them.
They don't care who they hurt, who they have to pass to get power and control, that's what they want, power and control, we won't give it to them.
Mayorga claims Ronaldo fell to his knees after the alleged incident and told her he was "99 percent" a "good guy" let down by the "one percent."
Mayorga claims that Ronaldo fell to the knees after the alleged incident, saying that he was "99% " a" good guy " left in the lurch by the "one percent ".
Mayorga claims Ronaldo fell on his knee after the alleged incident and told her he was "to 99 percent" a "good guy" who was left in the dark by the "one percent."
-. Briefings will still happen, Sanders said, but "if the press has the chance to ask the president of the United States questions directly, that's infinitely better than talking to me.
briefing is still going to take place, Sanders said, but the press should have the opportunity to put the questions directly to the President of the United States, if that is infinitely better than to talk to her.
Briefings will still take place, Sanders said, but if the press has the chance to ask the president of the United States directly, so that is unendlich better than talking to her.
All it took was to highlight its mistakes and, in keeping with Jacobinism, the issue would be entrusted to prefects and sub-prefects -the authorised interpreters.
It should deploy the accidents, and the case, Jacobinism obliges, was entrusted to the prefects and the sub-prefects, authorized interpreters.
It only took to deploy the accidents, and the matter, jacobinite oblige, was handed to the préfets and the sous-préfets, authorized interprètes.
Experts say confessions are still routinely coerced, despite a change in the law earlier this year banning the authorities from forcing anyone to incriminate themselves.
The experts say that people are systematically forced to make their confessions, despite a change in the law which was passed earlier this year prohibiting the authorities to force anyone to incriminating himself.
Experts say people are routinely forced to make their confessions, despite a change in the law that was passed earlier in the year banning officials from forcing anyone to incriminate themselves.
They are intersex, part of a group of about 60 conditions that fall under the diagnosis of disorders of sexual development, an umbrella term for those with atypical chromosomes, gonads (ovaries or testes), or unusually developed genitalia.
They are intersex, intersex forming part of the Group of 60 diseases diagnosed as disorders of sexual development, a generic term for people with chromosomes or atypical gonads (ovaries or testes) or abnormally developed sexual organs.
They are intersexuzed, with intersexuality making up the group of the soixantaine of diseases diagnosed as disordered sexual development, a generic term dissignant people possessing chromosomes or gonades (ovaries or testicules) atypiques or anormally developed sexual organs.
Is it not surprising to read in the columns of the world a few weeks apart on the one hand the reproduction of American diplomatic correspondence and on the other hand a condemnation of the Quai d'Orsay by the NSA listens?
Isn't it surprising to read in the Times' pages just weeks apart of one side's reproduction of the American diplomatic correspondance and of another a condamnation of the Quai d' Orsay's écoutes by the NSA?
The ministries are currently asking anyone who might have been bitten, clawed, scratched or licked on a mucous membrane or on damaged skin by the kitten, or who own an animal that may have been in contact with the kitten between 08 to 28 October, to contact them on 08 11 00 06 95 between 10am and 6pm from 01 November.
Departments now call people who have been bitten, scratched, scratched or licked on mucous membranes or skin injured by this kitten or where the animal would have been in contact with this kitten between 8 and 28 October to contact the 08.11.00.06.95 between 10 a.m. and 6 p.m. from November 1.
The at present call on people who may have been morbid, griffon, egregious or layed on a mug or on a skin léché by this chateau or whose animal may have been in contact with that chateau between 8 and 28 October to contact 08.11.00.095 between 10 and 18 November.
Pierre Nora has shown great interest and even tenderness for this Third Republic: he salutes those who tried at the time to repair the divide created by the Revolution by teaching students about everything in the former France that obscurely paved the way for the modern France, and by offering them a unified version of their history.
This third Republic, while central and creator, Pierre Nora has shown great interest and even tenderness: saluting those who then worked to repair the revolutionary divide, by teaching students what in the former France preparing darkly modern France and offering them a version unified in their history.
At this IIIe République, central and creator moment, Pierre Nora showed much interest and even tendresse: praising those who then helped to repair the revolutionary fracture, teaching schoolchildren everything in the former French Republic that obscurantly prepared modern France and offering them a unifying version of their history.
.030 La théorie dominante sur la façon de traiter les enfants pourvus d'organes sexuels ambigus a été lancée par le Dr John Money, de l'université Johns-Hopkins, qui considérait que le genre est malléable.
The prevailing theory on how to treat children with ambiguous genitalia was put forward by Dr. John Money at Johns Hopkins University, who held that gender was malleable.
The prevailing theory about how to treat children with ambiguous sexual organs was launched by Dr. John Money of the Johns Hopkins University, who considered that the genre is malleable.
The dominant theory about how to treat children armed with ambigüous sex organs was launched by Dr. John Money, of Johns-Hopkins University, who considered the genre maudlin.
But Rep. Bill Shuster (R-Pa.), chairman of the House Transportation Committee, has said he, too, sees it as the most viable long-term alternative.
But Congressman Bill Shuster (R -PA.), Chairman of the House of representatives Transportation Committee, said that he considered as the most viable alternative in the long term.
But Rep. Bill Shuster (R-Pa.), chairman of the House Transportation Committee, said he also considered it the most viable long-term alternative.
.025 Les neuf premiers épisodes de Sheriff Callie's Wild West seront disponibles à partir du 24 novembre sur le site watchdisneyjunior.com ou via son application pour téléphones et tablettes.
The first nine episodes of Sheriff Callie's Wild West will be available from November 24 on the site watchdisneyjunior.com or via its application for mobile phones and tablets.
The first nine episodes of Sheriff Callie's Wild West will be available from November 24 on the site watchdisneyjunior.com or via its application for phones and tablets.
Sheriff first nine episodes of Sheriff Callie' s Wild West will be available as of November 24 on the watchdisneyjunior.com website or via its application for phones and computers.
.024 Le président Xi Jinping, qui a pris ses fonctions en mars dernier, a fait de la lutte contre la corruption une priorité nationale, estimant que le phénomène constituait une menace à l'existence même du Parti communiste.
A day earlier, on the road leading to Bunagana, a postcode with Uganda, military personnel aided by civilians were loading a multiple rocket lance-roquettes fire on a flambant neuf FARDC truck, expected to provide the lead for another device pilfering M23 positions on the hills.
.021 Il y a, avec la crémation, "une violence faite au corps aimé", qui va être "réduit à un tas de cendres" en très peu de temps, et non après un processus de décomposition, qui "accompagnerait les phases du deuil".
With cremation, there is a sense of "violence committed against the body of a loved one", which will be "reduced to a pile of ashes" in a very short time instead of after a process of decomposition that "would accompany the stages of grief".
There, with the cremation, "a violence made to the beloved body", which will be "reduced to a pile of ashes" in a very short time, and not after a process of decomposition, which "would accompany the phases of mourning".
There is, with cremation, "violence done to the loved one," which is going to be "reduced to a tas of ashes" in very little time, and not after a process of disablement, which would "accompany the phases of grief."
-.021 Il a indiqué que le nouveau tribunal des médias « sera toujours partial car il s'agit d'un prolongement du gouvernement » et que les restrictions relatives au contenu et à la publicité nuiraient à la place du Kenya dans l'économie mondiale.
He said the new media tribunal "will always be biased because it's an extension of the government," and that restrictions on content and advertising would damage Kenya's place in the global economy.
He said as the new media tribunal ' will be always partial because it is an extension of the Government "and content and advertising restrictions hurt instead of the Kenya into the world economy.
He said the new media tribunal "will always be partial because it is a extension of the government" and that restrictions relating to content and advertising would hurt Kenya's place in the global economy.
.021 Dans "Les Fous de Benghazi", il avait été le premier à révéler l'existence d'un centre de commandement secret de la CIA dans cette ville, berceau de la révolte libyenne.
In "Les Fous de Benghazi", he was the first to reveal the existence of a secret CIA command centre in the city, the cradle of the Libyan revolt.
In "Les Fous de Benghazi", he was the first to reveal the existence of a secret CIA command center in this town, cradle of the Libyan revolt.
In "The Facts of Libya," he had been the first to reveal the existence of a secret CIA command center in that city, the birthplace of the Libyan uprising.
-.020 Le Sénat américain a approuvé un projet pilote de 90 M$ l'année dernière qui aurait porté sur environ 10 000 voitures.
A good portion of the store is given over to the massive freezer section, where you'll find frozen curry leaves, bitter melon and galangal, whole ducks, fish, beef blood and bile, pork casings, fish balls, regional sausages, commercially-prepared foods and more.
An important part of the store is devoted to giant rayon of freezers, where you'll find curry leaves, melon and Ginger frozen whole fish, ducks, beef bile and blood, carcasses of pork, fish balls, sausages, traditional dishes and more.
A significant part of the store is devoted to the huge frozen fried chicken county, where you will find curry leaves, peach and frozen ghrelish, whole fish, fish, blood and bone stock, pork carcass, traditional fish carnouts, semi-cooked carnacies and many others.
Those interested will find out how to make harmonious sculptural floral creations (ikebana), how to capture the soul of the surrounding elements in plastic compositions using the Japanese ink painting art (Sumie) or how to express their own individuality and creativity through the multifaceted universe of Fine Arts (graphic, drawing, painting) said Sorin Mazilu, the teacher of these courses.
Persons interested will be able to learn how to achieve harmonious sculptural floral design (ikebana), how to capture the soul of the surrounding elements in plastic compositions using the Japanese art of painting in India ink (Sumie) or how to express their own individuality and creativity through the universe plurivalent of fine arts (painting, drawing, graphics), said Sorin Mazilu, teacher of such courses.
People interested will be able to learn how to complete armonious creative contrasting (grafic, drawing, painting) Creations, how to capture the hearts of surrounding elements in artistic compositions using the Japanese art of painting in tus (Sumie) or how to express their own individualistic identity through the plurivalent universe of beautiful arts (grafic, drawing, painting, painting), said Mr. Mazilu, the professor of these courses.
The humour of Joey and Chandler's frequent hugs, moments watching football on the comfy chairs, and Ross's pining for Rachel, came from the knowledge that yes, men can all relate to this, even if they often hold back from fully exploring their feelings.
His humor frequently asked îmbrȃt ,is ,ȃrilor of Joey and Chandler, the times when looking at football in comfortable armchairs and Ross's passion for Rachel came from the fact that it is known that men may be associated with this case, even if often keep out to explore fully the sentiments.
Ummaline frequent imbratisations of Joey and Chandler, the moments when he looks at football in his comfy fotoland Ross's passion for Rachel came from knowing that men can assimilate with the situation, even though often they are shy about fully exploring their feelings.
In addition, in the coming days we will accept the works from the bridge linking the ER to the cardiology, hepatology and gastroenterology clinics to ensure the proper transfer of patients from the ER to the neighbouring clinics explained the manager of the medical unit.
In addition, the coming days we will make the reception of the works from the footbridge linking the UPU of Cardiology clinics, Hepatology and gastroenterology to ensure decent conditions of transportation of the patient from the UPU in neighboring clinics, explained Manager medical unit.
In addition, next week we will make the receptia work on the pasarela linking the UPU to the cardiology, hepatology and gastroenterology clinicians to ensure the transport in decent manner of the patient from the UPU to neighbouring clinicians, the hospital's chief executive explained.
Although Arab media announced that the separation between Sanmartean and Ittihad is imminent, coach Ladislau Boloni's agent, Arcadia Zaporojanu, says this will no longer happen.
Although the Arabic press announces that the breakup of his impending Sanmartean Ittihad e, impresario, Arcadie antreorului Baghery Isaac says that this will no longer happen.
While the Arab media say Sanmartean's departure from Ittihad is imminent, impressions of coach Ladislau Boloni's agent Arcadie Zaporojanu say this will never happen.
They included a mixture of tax revenue, pungas, i foreign fraudau telephone and non-pseudo domi who hold most major newspapers.
They included the mixture of unpaid tax pungas, foreigners who telephone and pseudo non-doms who own most of the key newspapers.
The Fed should put financial stability concerns first only during a major crisis, such as the 2008 market meltdown, said Adam S. Posen, a former member of the Bank of England's rate-setting committee.
The Fed should put the issue of financial stability in the first place only in the event of major crises such as the earthquake in the market since 2008, "said Adam s. Posen, a former member of the Commission for determining the interest rate within the Bank of England.
The Fed should put financial stability first only in a major crisis, such as the 2008 credit crunch, said Adam S. Posen, a former member of the Bank of England's stabilisation committee.
When Clinton's supporters are asked in an open-ended question why they want her to be the nominee, the top answer is that she has the right experience (16 percent), followed by it's time for a woman president (13 percent), and that she is the best candidate for the job (10 percent).
When Clinton supporters are asked to answer an open question regarding why you want her to win the race, the answer is that majority holds appropriate experience (16%), followed by the fact that the time has come for a woman to be President (13%), and that is the best candidate for this position (10%).
When Clinton supporters are asking an open question as to why she would like to win the race, the majoritar answer is that she holds the right experience (16 percent), followed by the fact that it has come time for a woman to be president (13 percent) and that she is the best candidate for the job (10 percent).
Clinton now has the backing of 47 percent of Democratic primary voters (down from 58 percent), while Sanders comes in second, with 27 percent (up from 17 percent).
Clinton is now supported by 47% of the voters Democrats (down 62%), while Sanders is in second place with 27% (in relation to climb 17%).
Clinton is now backed by 47 percent of Democratic voters (down from 58 percent), while Sanders is second with 27 percent (up from 17 percent).
Those interested will find out how to make harmonious sculptural floral creations (ikebana), how to capture the soul of the surrounding elements in plastic compositions using the Japanese ink painting art (Sumie) or how to express their own individuality and creativity through the multifaceted universe of Fine Arts (graphic, drawing, painting) said Sorin Mazilu, the teacher of these courses.
Persons interested will be able to learn how to achieve harmonious sculptural floral design (ikebana), how to capture the soul of the surrounding elements in plastic compositions using the Japanese art of painting in India ink (Sumie) or how to express their own individuality and creativity through the universe plurivalent of fine arts (painting, drawing, graphics), said Sorin Mazilu, teacher of such courses.
People interested will be able to learn how to complete armonious creative contrasting (grafic, drawing, painting) Creations, how to capture the hearts of surrounding elements in artistic compositions using the Japanese art of painting in tus (Sumie) or how to express their own individualistic identity through the plurivalent universe of beautiful arts (grafic, drawing, painting, painting), said Mr. Mazilu, the professor of these courses.
Companies have thousands of vacant jobs, internship opportunities and internships, and some of them are already posted on the official website www.targuldecariere.ro.
Companies have thousands of vacant jobs, internship opportunities, and internships, and some of them are already announced on the official website www.targuldecariere.ro.
Companies have filled several thousand jobs, internships and training trips, and a number of them are already announced on the official website www.targuldecarier.com.
Brisbane Broncos coach Wayne Bennett made a thinly veiled reference to the Storm after his side's qualifying final win over North Queensland Cowboys on Saturday night when he called that game a "showcase" of the rugby league and said the two Queensland weren't "too big" into wrestling.
The coach of the Brisbane Broncos, Wayne Bennett, made a slight reference to the Storm after his team's victory in the qualifying match with North Queensland Cowboys, played Saturday night, when he called the match a "demonstration" of rugby league and said that the two teams from Queensland is not too good at wrestling.
Brisbane Broncos coach Wayne Bennett made a slight reference to the Storm after their team's victory in the qualifying match against the North Queensland Cowboys on Saturday night when he called the game a "demonstration" of the rugby league league league and said the two Queensland sides were not really pricep to wrestling.
David Grimal admits that Les Dissonaces is "a luxury", adding that none of the musicians who are part of the orchestra does not depend on the success of this project to make a living.
David Grimal admits that Les Dissonaces represents "a luxury", stating that none of the musicians who are part of the orchestra does not depend on the success of this project for living.
David Grimal admits Les Dissonaces represent "a luxury," saying none of the musicians who make up the orchestra depinde of the success of this project to live.