Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings

Abstract


Introduction
Sequence labelling is the task of assigning a label to each token in a given input sequence. It is a fundamental process in many downstream NLP tasks. Currently, the most successful approaches for this task apply supervised deep neural networks (Lample et al., 2016; Akbik et al., 2018; Devlin et al., 2019; Conneau et al., 2020). However, as was the case for supervised statistical approaches (Agerri and Rigau, 2016), their performance still depends on the amount of manually annotated training data. Additionally, deep neural models still show a significant loss of performance when evaluated on out-of-domain data (Liu et al., 2021). Improving their performance would therefore require very costly manually annotated data for each language and domain of application. Thus, considering that for most of the world's languages manually annotated corpora are simply nonexistent (Joshi et al., 2020), developing sequence labelling models for languages and domain-specific tasks for which supervised data is not available remains a challenge of great interest. This task is known as zero-resource cross-lingual sequence labelling.

Figure 1: In the data-based transfer approach we translate and project the labels of the gold data into the target language, and use the resulting silver data to train a model for the target language. In the model-based transfer approach we train a model with gold data in English and use it in a zero-shot setting in the target language.
Data-based cross-lingual transfer methods aim to automatically generate labelled data for a target language. Previous works on data-based transfer have proposed translation and annotation projection as an effective technique for zero-resource cross-lingual sequence labelling (Jain et al., 2019; Fei et al., 2020). In this setting, as illustrated in Figure 1, the idea is to translate gold-labelled text into the target language and then, using automatic word alignments, project the labels from the source into the target language. The result is an automatically generated dataset in the target language that can be used for training a sequence labelling model.
The emergence of multilingual language models (Devlin et al., 2019; Conneau et al., 2020) allows for model-based cross-lingual transfer. As Figure 1 illustrates, using labelled data in one source language (usually English), it is possible to fine-tune a pre-trained multilingual model that is directly used to make predictions in any of the languages included in the model. This is also known as zero-shot cross-lingual sequence labelling.
In this work we present an in-depth study of both approaches using the latest advances in machine translation, word aligners and multilingual language models. We focus on two sequence labelling tasks, namely Named Entity Recognition (NER) and Opinion Target Extraction (OTE). To do so, we present a data-based cross-lingual transfer approach consisting of translating gold-labelled data between English and 7 other languages using state-of-the-art machine translation systems. Sequence labelling annotations are then automatically projected for every language pair. Additionally, we produced manual alignments for the 4 languages for which we had expert annotators. After translation and projection, for the data-transfer approach we fine-tune multilingual language models on the automatically generated datasets. We then compare the performance obtained for each of the target languages against that of the zero-shot cross-lingual method, which consists of fine-tuning the multilingual language models on the English gold data and generating the predictions in the required target languages.
The main contributions of our work are the following. First, we empirically establish the conditions required for each of these two approaches, data-transfer and zero-shot model-based, to outperform the other. In this sense, our experiments show that, contrary to what previous research suggested (Fei et al., 2020; Li et al., 2021), the zero-shot model-based approach obtains the best results when high-capacity multilingual models covering the target language and domain are available. Second, when the performance of the multilingual language model is not optimal for the specific target language or domain (for example, when working on a text genre and domain on which available language models have not been trained), or when the hardware required to work with high-capacity language models is not easily accessible, then data-transfer based on translate-and-project constitutes a competitive option. Third, we observe that machine-translated data often yields training and test data that is, due to important differences in language use, markedly different from the signal received when using gold standard data in the target language. These discrepancies seem to explain the larger error rate of the translate-and-project method with respect to the zero-shot technique. Finally, we create manually projected datasets for four languages and automatically projected datasets for seven languages. We use them to train and evaluate cross-lingual sequence labelling models. Additionally, they are also used to extrinsically evaluate machine translation and word alignment systems. These new datasets, together with the code to generate them, are publicly available to facilitate the reproducibility of results and their use in future research.

Related work

Data-based cross-lingual transfer

Data-based cross-lingual transfer methods aim to automatically generate labelled data for a target language. Some of these methods exploit parallel data. Ehrmann et al.
(2011) automatically annotate the English version of a multi-parallel corpus and project the annotations into all the other languages using statistical alignments of phrases. Wang and Manning (2014) project model expectations rather than labels, which facilitates the transfer of model uncertainty across languages. Ni et al. (2017) use a heuristic scheme that effectively selects good-quality projection-labeled data from noisy data. They also project word embeddings from a target language into a source language, so that the source-language sequence labelling system can be applied to the target language without re-training. Agerri et al. (2018) use parallel data from multiple languages as source to project the labelled data to a target language, showing that the combination of multiple sources improves the quality of the projections. Li et al. (2021) use the XLM-R model (Conneau et al., 2020) for labelling sequences in the source part of the parallel data and also for annotation projection.
Instead of relying on parallel data, Jain et al. (2019) and Fei et al. (2020) use machine translation to automatically translate the sentences of a gold-labelled dataset into the target languages. The translated data is then annotated by projecting the gold labels from the source dataset. For this purpose, Jain et al. (2019) first generate a list of projection candidates by orthographic and phonetic similarity. They choose the best matching candidate based on distributional statistics derived from the dataset. Fei et al. (2020) leverage the word alignment probabilities calculated with FastAlign (Dyer et al., 2013) and the POS tag distributions of the source and target words.
High-quality parallel data or machine translation systems are not always available. Thus, Xie et al. (2018) propose to find word translations based on bilingual word embeddings. Alternatively, Guo and Roth (2021) translate labelled data in a word-by-word manner with a dictionary. They then construct the target-language text from the source-language annotations with a constrained pre-trained language model.

Model-based transfer
Language models trained on monolingual corpora in many languages (Devlin et al., 2019; Conneau et al., 2020) allow zero-shot cross-lingual model transfer. Task-specific data in one language is used to fine-tune the model for evaluation in another language (Pires et al., 2019). The zero-shot cross-lingual capability can be improved for sequence labelling using different techniques. The approaches of Wang et al. (2019) and Ouyang et al. (2021) use monolingual corpora to improve the alignment of the language representations within a multilingual model. Instead of using a single source model, Rahimi et al. (2019) propose to use many models from many source languages to improve the zero-shot transfer to a new language. They learn to infer which are the most reliable models in an unsupervised manner. Wu et al. (2020) take advantage of a Teacher-Student learning approach: NER models in the source languages are used as teachers to train a student model on unlabeled data in the target language. Bari et al. (2021) propose an unsupervised data augmentation framework to improve the cross-lingual adaptation of models using self-training. Hu et al. (2021) use the minimum risk training framework to overcome the gap between the source and the target languages/domains. They propose a unified learning algorithm based on expectation maximization.
Using low-capacity multilingual language models such as mBERT, Fei et al. (2020) find that their data-based cross-lingual transfer approach is superior to the zero-shot transfer method. However, Li et al. (2021), when using XLM-RoBERTa, a higher-capacity multilingual model, obtain the best results for German and Chinese applying the data-based cross-lingual transfer approach, while the zero-shot approach is best for Spanish and Dutch. We extend their research on zero-resource settings with two different sequence labelling tasks, seven languages and three multilingual models of different capacity. Our experiments and the error analysis carried out establish the conditions under which the zero-shot and data-transfer approaches outperform each other.

Translation and projection method
Our data-based cross-lingual transfer method for cross-lingual sequence labelling is the following: we assume our source language to be English, for which we have train and development data. Furthermore, we also assume that the only gold-labelled data available for the target language is the evaluation set. In this setting, we automatically generate data for the target language by translating the gold-labelled English data. Then we project the gold labels from the source sentences to the translated sentences by leveraging automatic word alignments. Given a sentence x = ⟨x_1, ..., x_n⟩ with length n in the source language and a translated sentence y = ⟨y_1, ..., y_m⟩ with length m in the target language, we use a word aligner to find a set of pairs A = {⟨x_i, y_j⟩ : x_i ∈ x, y_j ∈ y}, where for each word pair ⟨x_i, y_j⟩, y_j is the lexical translation of x_i. Next, given a sequence s = ⟨x_a, ..., x_b⟩ ∈ x labeled with a category C, we label the sequence t = ⟨y_c, ..., y_d⟩ ∈ y with category C if {∀y_j ∈ t ∃x_i ∈ s : ⟨x_i, y_j⟩ ∈ A}. In other words, if a word labelled with a category in the source sentence is aligned to a word in the target sentence, we label the target word with the category of the word in the source sentence. Figure 2 illustrates our method.
When projecting annotations we find two main problems: split annotations and annotation collisions. In the first case, a labeled sequence in the source sentence is split into multiple sequences in the target sentence. This happens when the alignment for a word is missing. In this case, we merge the sequences in the target sentence if the gap between them is just one word. If we still end up with multiple sequences, we choose the longest one. In the annotation collision case, a word in the target sentence is aligned to two different labelled sequences in the source language. If the two sequences are of the same category, we merge them and label them as a single sequence in the target sentence. If they are of different categories, we keep the longer one. Finally, if a punctuation symbol in the target sentence is aligned to a labeled word in the source sentence, we remove this alignment.
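The projection step and the split-annotation heuristic can be sketched as follows. This is a minimal illustration of the described method, not the paper's actual implementation: spans are (start, end, category) token indices (end inclusive), alignments are (source index, target index) pairs, and the collision-resolution and punctuation rules are omitted for brevity.

```python
def project_labels(src_spans, alignments, gap=1):
    """Project labelled source spans onto target tokens via word alignments.

    src_spans:  list of (start, end, category) spans in the source sentence.
    alignments: set of (src_idx, tgt_idx) word-alignment pairs.
    Returns a list of (start, end, category) spans over the target sentence.
    """
    projected = []
    for start, end, cat in src_spans:
        # Collect target indices aligned to any labelled source token.
        tgt = sorted({j for (i, j) in alignments if start <= i <= end})
        if not tgt:
            continue
        # Group the aligned indices into runs, merging runs separated by a
        # gap of at most one word (the split-annotation heuristic).
        runs = [[tgt[0]]]
        for j in tgt[1:]:
            if j - runs[-1][-1] <= gap + 1:
                runs[-1].append(j)
            else:
                runs.append([j])
        # If several runs still remain, keep the longest one.
        best = max(runs, key=len)
        projected.append((best[0], best[-1], cat))
    return projected
```

For example, if source tokens 0-1 are a labelled target expression aligned to target tokens 2 and 4, the one-word gap is bridged and the projected span is (2, 4); if the aligned tokens were 2 and 5 instead, the longer (here, first) run wins.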

Datasets
We conducted experiments on two sequence labelling tasks, namely Opinion Target Extraction (OTE) and Named Entity Recognition (NER). Opinion Target Extraction (OTE): Given a review, the task is to detect the linguistic expression used to refer to the reviewed entity. We use the SemEval-2016 Task 5 Aspect Based Sentiment Analysis (ABSA) datasets (Pontiki et al., 2016).
We experiment with the English, Spanish, Dutch, French, Russian and Turkish datasets from the restaurant domain.
Named Entity Recognition (NER): Given a text, the task is to detect named entities and classify them into pre-defined categories. For Spanish and Dutch we use the CoNLL-2002 datasets (Tjong Kim Sang, 2002). For English and German we use the CoNLL-2003 datasets (Tjong Kim Sang and De Meulder, 2003), and for Italian we use the Evalita 2009 data (Speranza, 2009). We map the Geo-Political Entities from Evalita 2009 to location labels to make them compatible with the CoNLL data.

Experimental Setup
We perform 1-to-1 annotation projection in two directions:

Translate-Train: We translate the English train and development data into the target language. We project the gold labels from the English data to the translated dataset. We then train a sequence labelling model using only the automatically generated dataset for the target language.

Translate-Test: We translate the target-language test set into English. We then use a model trained on the English gold-labelled data to label the translated test set. Finally, we project the labelled sequences back to the target language.
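The translate-test direction can be summarized as a three-stage pipeline. The sketch below uses hypothetical callables (`translate`, `label_english`, `project_back`) standing in for the MT system, the English sequence labeller and the alignment-based back-projection step; none of these names come from the paper's code.

```python
def translate_test(tgt_test_sentences, translate, label_english, project_back):
    """Translate-Test sketch: translate target-language test sentences into
    English, label the translations with an English model, and project the
    predicted spans back onto the original sentences."""
    results = []
    for sent in tgt_test_sentences:
        en = translate(sent)             # target language -> English
        spans = label_english(en)        # predict labelled spans on the translation
        results.append(project_back(en, sent, spans))  # map spans back
    return results
```

Translate-train is the mirror image: the English train/development data is translated and projected once, and a model is then trained on the resulting silver data.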
These two data-based cross-lingual transfer approaches are compared with the zero-shot method, in which a model fine-tuned on English gold-labelled data is evaluated by generating predictions in the target language. Finally, we also fine-tuned language models on the gold-labelled target-language data, which constitutes the upper bound in our experimental setting.

Machine Translation
We tested DeepL, MarianMT (Junczys-Dowmunt et al., 2018; Tiedemann and Thottingal, 2020), M2M100 (1.2B) (Fan et al., 2020) and mBART (mbart-large-50) (Tang et al., 2020). A qualitative analysis performed during the projection of the OTE labels established that DeepL produced the most fluent translations. Thus, we decided to use DeepL (web version, during the second half of 2021) to perform the machine translation for our data-based cross-lingual transfer experiments. The exception was Turkish, which is not supported by DeepL; in this case we use M2M100.

Word Alignments
For word alignments, we use the AWESoME (Dou and Neubig, 2021) system. AWESoME leverages multilingual pretrained language models and fine-tunes them on parallel text. Unsupervised training objectives over the parallel corpus improve the alignment quality of the models. The AWESoME authors claim that the model works best with mBERT (Devlin et al., 2019) as backbone, so we follow their advice. Although we also experimented with GIZA++ (Och and Ney, 2003), FastAlign (Dyer et al., 2013) and SimAlign (Jalili Sabet et al., 2020), the alignments from AWESoME produced the highest F1 scores when comparing the model projections with the manually annotated projections (see Section 7).
To train the alignment models we use the English gold-labelled dataset together with the respective MT system translations as parallel corpora. We augment the training data with 50,000 random parallel sentences from ParaCrawl v8 (Esplà et al., 2019) for all the language pairs except Turkish, for which we use 50,000 random parallel sentences from the raw CCAligned v1 corpus (El-Kishky et al., 2020). CCAligned has received some criticism (Kreutzer et al., 2022), but the available English-Turkish parallel data is very limited. In Section 7 we analyze the performance of the alignment systems and show that CCAligned does not hurt the performance of the aligners.

Sequence Labelling Models
We use three state-of-the-art multilingual pretrained language models for sequence labelling: multilingual BERT (mBERT) (Devlin et al., 2019) and XLM-RoBERTa (XLM-R) base and large (Conneau et al., 2020). For all models, we add a token classification layer (a linear layer) on top of each token representation. We use the sequence labelling implementation of the Huggingface open-source (Apache-2.0 License) library (Wolf et al., 2019). F1 scores and standard deviations are reported by averaging the results of 5 runs with different random seeds. Details on model sizes, hyper-parameters and datasets are provided in the Appendix (A, B and C).

Opinion Target Extraction
Opinion Target Extraction (OTE) results are reported in Table 1. The zero-shot model transfer using mBERT obtains better results for Spanish and French. However, for Dutch, Russian and Turkish the best results are obtained by the data-transfer approaches. The overall picture changes when using XLM-RoBERTa (XLM-R) base. First, the zero-shot baseline is much closer to the gold upper bound than that of mBERT. This shows that XLM-R has better multilingual transfer learning capabilities for this task. In fact, the zero-shot transfer outperforms the translate-train and translate-test approaches for all languages except Turkish. Second, the XLM-R base results on gold-labelled data are substantially better than those of mBERT. Finally, XLM-R large offers the best cross-lingual transfer capabilities, as the zero-shot transfer is clearly superior for every language, including Turkish.
A common trait for all three models in the OTE benchmark is that the translate-train approach is superior to the translate-test approach in the large majority of cases. As expected, all the approaches achieve a performance significantly lower than the gold upper bound.

Named Entity Recognition
If we compare the OTE results with those obtained for NER (Table 2), we see a number of different patterns. First, the zero-shot approach using mBERT outperforms the data-based cross-lingual transfer methods (translate-train and translate-test) for the majority of languages. Second, unlike in OTE, translate-test is systematically better than translate-train. Third, the mBERT performance on CoNLL data is similar to that of XLM-R base. Finally, fine-tuning XLM-R base on translated and projected data obtains better results for German and Italian than the zero-shot method. However, XLM-R large shows the same pattern as for OTE, obtaining the best results for every language in the zero-shot setting. This validates the findings of the OTE results, namely, that the performance of the zero-shot method heavily depends on the characteristics of the multilingual language model used.
Previous research has demonstrated that cross-lingual transfer with mBERT works best for typologically similar languages (Pires et al., 2019; Wu and Dredze, 2020), which is somewhat coherent with the results obtained for Spanish and French, where the zero-shot transfer is superior to the translate-train and translate-test approaches, while it is worse for Russian and Turkish. Additionally, it is worth noting that mBERT has been trained using only Wikipedia text for 104 languages.
In contrast, XLM-R (both base and large) has been trained using CommonCrawl (Wenzek et al., 2019), a much larger multilingual corpus with a variety of texts extracted from the Web, perhaps also including texts of a similar domain to those in the OTE datasets. This may also account for the large differences in OTE performance between XLM-R base and mBERT. In this sense, the similar performance between mBERT and XLM-R base for NER might be partially due to the fact that the CoNLL and Evalita datasets consist of news stories in which most of the labelled entities may appear in Wikipedia, the text used to pre-train mBERT.
The performance of XLM-R large shows that pretrained models with larger capacity help to obtain strong performance across languages, also for zero-shot cross-lingual methods. Still, data-based cross-lingual transfer approaches (translate-train and translate-test) remain useful if access to the hardware required for working with such larger language models is not available.

Error Analysis
The experiments described in Section 6 showed that translate-train and translate-test perform worse than the zero-shot approach when using XLM-R large.
In this section we assess the performance of the machine translation and word alignment models. Furthermore, we undertake an error analysis to better understand the shortcomings of translate-train and translate-test with respect to the zero-shot cross-lingual transfer.

Evaluating the Projection Method
We start our experiments by analyzing the quality of our automatically projected annotations. To do so, human annotators manually projected the labels from the English OTE gold-labelled data onto the automatic translations into Spanish, French and Russian (using DeepL) and Turkish (using M2M100).
The annotators are NLP PhD candidates with native or proficient skills in both English and the target language. See Section E for more details.
We compare the projections of the annotations automatically generated by the different word alignment methods with those provided by the human annotators. Table 4 shows that the language model-based methods (SimAlign and AWESoME) outperform the statistical methods (GIZA++ and FastAlign) by a wide margin in all languages. Furthermore, AWESoME consistently outperforms SimAlign for every language. The performance of the AWESoME system confirms that it is possible to generate high-quality annotations close to those generated by human experts. The results also show that performance for Turkish is lower than for the other languages. This is the case for the methods that require parallel data (GIZA++, FastAlign and AWESoME) as well as for SimAlign, which does not require parallel data. We can therefore attribute the lower performance to the difficulty of projecting annotations for the English-Turkish pair and not to the usage of the CCAligned corpus.

Table 4: OTE F1 score between the human annotation projections and the automatic projections generated using different alignment models.
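The comparison against the human projections rests on a span-level F1 score. A minimal sketch of exact-match span F1 is shown below; the paper presumably uses standard CoNLL-style entity-level evaluation, which this simplification approximates when spans are represented as (start, end, category) tuples.

```python
def span_f1(pred, gold):
    """Exact-match span F1 between predicted and gold span sets.

    pred, gold: sets of (start, end, category) tuples.
    """
    tp = len(pred & gold)                       # spans matching exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, predicting two spans of which one matches the single gold span gives precision 0.5, recall 1.0 and F1 of 2/3.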
While Table 4 shows that we generate high-quality annotation projections, the best model, AWESoME, still makes some mistakes. To explore the effect of these mistakes we fine-tune XLM-R large models on the manually projected train datasets and compare their performance on the gold-labelled test sets with that of the models trained on the AWESoME automatically projected data. Table 5 shows that the models obtained using the manually projected data are slightly better, except for Turkish, which once again acts as an outlier. In any case, as the results obtained by fine-tuning on the manually projected data are still worse than the zero-shot method, this experiment shows that the projection of annotations is not the reason why the data-based cross-lingual transfer methods are inferior to the zero-shot baseline.

Downstream Evaluation of Machine Translation Models
To evaluate the influence of the machine translation system used, we translate the English gold-labelled data using four different translation systems. We fix AWESoME as the word aligner for annotation projection. We fine-tune XLM-R large on each of the generated training datasets and evaluate it against the gold-labelled test data from OTE. As Table 6 shows, there are no big differences in the final F1 scores when using different translation systems (Turkish again being the exception), so we decided to carry on using DeepL based on the manual assessment mentioned in Section 3.

Where do the models fail?
To better understand what is happening we identify the most common false negatives and false positives for both the OTE and NER tasks. Table 7 shows the most frequent false negatives and positives where there is a big mismatch between methods.
As has been previously noticed (Agerri and Rigau, 2019), in the ABSA data the words "comida" (food) and "restaurante" (restaurant) are highly ambiguous, so we could expect the models to fail on these words. In addition, we found four main sources of errors, which are analyzed below.
Many-to-one translations: This is typical of targets such as "trato" and "atención" in Spanish, which, in addition to "servicio", are used to refer to "service" in English. There are 160 sentences in the English gold-labelled data containing the word "service"; in 153 of them "service" is labelled as a target. DeepL systematically translates it as "servicio". However, as shown in Table 8, in the Spanish gold-labelled data "service" is also commonly referred to as "trato" or "atención", instead of "servicio".
This results in a training set without any occurrences of "trato" and "atención", which often occur in the gold-labelled test data. Both the zero-shot and the data-based cross-lingual transfer approaches fail to correctly label these words, which shows a problem of using automatically translated data. Interestingly, the zero-shot approach using XLM-R large correctly classifies "trato" (it only fails to label 1 of the 19 occurrences). As shown by our experimental results, XLM-R large is more robust than mBERT and XLM-R base.
Something similar happens with the word "place", which in Spanish is most frequently translated as "lugar" or "sitio". However, DeepL almost always translates it as "lugar", which results in "sitio" being absent from the automatically generated training data while being more frequent than "lugar" in the gold-labelled data. Note that this is not a problem for translate-test, given that the translation direction is Spanish to English.
Errors induced by incorrect or missing alignments: For NER we found errors of a different nature. Articles and prepositions (e.g. "de", "la") are among the words with the highest false positive rate for the translate-test and translate-train approaches. We can attribute this to word alignment errors. Long multi-word named entities such as "Consejo General de la Arquitectura Técnica de España" (General Council of Technical Architecture of Spain) are labelled as entities. Word aligners struggle to correctly align articles in these complex expressions, especially when a one-to-many or many-to-one alignment is required. In fact, in this example, the word aligners we tested failed to correctly align "of" with "de la".
Errors induced by dataset inconsistencies: Another issue is the differences across languages in the original gold-labelled annotations. Thus, "Gobierno" (Government) and "Estado" (State) are labelled as organizations in the Spanish gold-labelled data, but they are not considered entities in the English gold-labelled data. The opposite occurs with demonyms: they are labelled as miscellaneous entities in the English data, but in Spanish they are not annotated. Cross-lingual models are likely to fail when labelling these cases.
Lost in translation: Finally, there is another group of words related to Spanish government institutions which are not commonly used in English in the same contexts (e.g. "Economía" to refer to the "Ministry of Economy" or "Ministerio de Economía" in Spanish, "Junta" for "local government", or "Plan" for "government projects"). While these words appear frequently in the Spanish data as part of commonly used named entities, that is not the case in the English data, where it is customary to use "Treasury Department" (or variations thereof), which is correctly translated into Spanish by DeepL as "Departamento del Tesoro". This means that, during fine-tuning on the translated data, the model does not receive any signal to learn that "Economía" may be part of an organization entity. This may explain why the zero-shot method performs better for cases such as "Economía", "Hacienda", "Plan" and "Junta", listed in Table 7.
Summarizing, we see that machine-translated data often generates a signal which is, due to inherent differences in language use, different from the signal received when using gold-labelled data in the target language. This disagreement seems to be the most common reason for the larger number of false positives and negatives of the data-based cross-lingual transfer method with respect to the zero-shot technique. A detailed error analysis demonstrates that data-based cross-lingual transfer is hindered by machine translations which, although linguistically sound, do not align with the cultural usage patterns of the target language. Moreover, the results also show that the word alignment methods (for annotation projection) are of high quality, obtaining results comparable to manually generated alignments.
In any case, our results establish that there is still room for improving the cross-lingual performance of zero-resource sequence labelling. Moreover, our experiments only show that the data-transfer and zero-shot model transfer approaches work for Indo-European languages, while their performance for other language families remains unknown. Finally, the error analysis was performed for the EN-ES language pair only.
In any case, we believe that our main claim still holds. Even though MT quality has substantially improved over the last few years, our results indicate that the current optimal solution for cross-lingual transfer is to use large multilingual language models such as XLM-RoBERTa large. Our error analysis suggests that this might be due to important differences in language use. More specifically, MT often generates a textual signal which is different from what the models are exposed to when using gold standard data, which affects both the fine-tuning and evaluation processes. This is confirmed by our error analysis, which shows that mistranslations are not the main source of errors in the data-transfer method.

B Hyper-parameters

We use the train set as both train and development data. For NER we use a batch size of 32, a learning rate of 2e-5, 4 training epochs and a maximum sequence length of 256. We use the default values (from the sequence labelling implementation of the Huggingface library) for the remaining hyper-parameters. For both tasks we use the BILOU encoding scheme.
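The BILOU scheme mentioned above tags each token as Beginning, Inside or Last of a multi-token span, Unit for a single-token span, or Outside. A minimal sketch of the encoding (spans as (start, end, category) with end inclusive):

```python
def to_bilou(n_tokens, spans):
    """Convert (start, end, category) spans into per-token BILOU tags."""
    tags = ["O"] * n_tokens
    for start, end, cat in spans:
        if start == end:
            tags[start] = f"U-{cat}"        # single-token span
        else:
            tags[start] = f"B-{cat}"        # first token of the span
            for i in range(start + 1, end):
                tags[i] = f"I-{cat}"        # interior tokens
            tags[end] = f"L-{cat}"          # last token of the span
    return tags
```

For example, a LOC span covering tokens 1-3 of a 5-token sentence yields ["O", "B-LOC", "I-LOC", "L-LOC", "O"].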

C Datasets Size
We list the size (number of sentences) of the datasets we use.
For OTE we use the SemEval-2016 Task 5 Aspect Based Sentiment Analysis (ABSA) datasets (Pontiki et al., 2016). We list the size of these datasets in Table 10. For NER we use the Spanish and Dutch data from the CoNLL-2002 datasets (Tjong Kim Sang, 2002). For English and German we use the CoNLL-2003 datasets (Tjong Kim Sang and De Meulder, 2003), and for Italian we use the Evalita 2009 data (Speranza, 2009). We list the size of these datasets in Table 11.

D Computer infrastructure
We perform all our experiments using a single NVIDIA A30 GPU with 24GB of memory. The machine used has two Xeon Gold 6226R CPUs and 256GB of RAM.

Figure 2 :
Figure 2: Illustration of the translation and annotation projection method for Opinion Target Extraction (OTE).
(a) Illustration of the Opinion Target Extraction task. (b) Illustration of the Named Entity Recognition task.

Figure 3 :
Figure 3: Sequence Labelling tasks used in our experiments.
Figure 3 illustrates both tasks.

Table 1 :
OTE F1 score with models of different capacity.

Table 2 :
NER F1 score with models of different capacity. A further table lists the results of previous methods that leverage parallel data and/or annotation projection to perform cross-lingual transfer on the NER CoNLL 2002-2003 data, comparing previous work with our zero-shot baselines.

Table 7 :
Most common false negatives and positives where there is a big mismatch between methods, and the total number of labelled appearances of the word in the test data. B is the acronym for mBERT, Xb for XLM-R base and Xl for XLM-R large.

Table 8 :
Number of times words appear as target words in the train datasets.

Table 10 :
Number of sentences in the OTE datasets.

Table 11 :
Number of sentences in the NER datasets.