Come hither or go away? Recognising pre-electoral coalition signals in the news

In this paper, we introduce the task of political coalition signal prediction from text, that is, the task of recognizing from the news coverage leading up to an election the (un)willingness of political parties to form a government coalition. We decompose our problem into two related, but distinct tasks: (i) predicting whether a reported statement from a politician or a journalist refers to a potential coalition and (ii) predicting the polarity of the signal – namely, whether the speaker is in favour of or against the coalition. For this, we explore the benefits of multi-task learning and investigate which setup and task formulation is best suited for each sub-task. We evaluate our approach, based on hand-coded newspaper articles, covering elections in three countries (Ireland, Germany, Austria) and two languages (English, German). Our results show that the multi-task learning approach can further improve results over a strong monolingual transfer learning baseline.


Introduction
In the last decade the capabilities of computational methods for text analysis have drastically evolved and the usage of Natural Language Processing (NLP) methods has increasingly expanded in scope, enabling a wide range of applications for the automated analysis of political texts (Glavaš et al., 2019). Much work in NLP in the past has focused on making sense of how legislative processes are codified in text, e.g., by analysing debate motions (Abercrombie et al., 2019; Abercrombie and Batista-Navarro, 2020; Sawhney et al., 2020, inter alia). However, when analysing electoral dynamics it seems necessary to complement the application of NLP methods to domain-specific texts such as electoral manifestos with a broader analysis of signals from news texts (Blokker et al., 2020).
Specifically in the case of government formation in multi-party systems (that is, in countries where the government is typically formed by multiple parties), party members and candidates often communicate in interviews and rallies during election campaigns which coalition their party might or might not be willing to enter after the election. These public statements about prospects of future governments are often picked up by media outlets. Being able to automatically identify coalition signals in news coverage accordingly has the potential to enable large-scale studies of voting behaviour using 'text-as-data', with the promise of improving our understanding of voters' behaviour and of strategic voting in multi-party democratic systems (Bahnsen et al., 2020; Gschwend et al., 2017; Stoetzer and Orlowski, 2020). Very recent work in political science made first steps in this direction by investigating dictionary-based approaches to automatically detect coalition signals in newspaper text (Bowler et al., 2021). However, much work remains to be done in developing data-driven methods that take advantage of recent advances in our field, such as pre-trained language models and joint learning, and in providing a thorough evaluation of the accuracy of such automated approaches. This is especially important because coalition signals can be expressed in text by explicit or implicit statements and make heavy use of irony and figurative language, thus raising many challenges that cannot be solved with simple dictionary-based methods and require Natural Language Understanding (NLU) capabilities.
In this paper, we present the task of coalition signal detection as a new, challenging NLU task. We propose to approach this task as a machine learning problem and decompose it into two subtasks, (i) the identification of text sequences that contain a signal and (ii) the prediction of the polarity of each potential signal, that is, whether the speaker promotes or opposes the signalled coalition (cf. Examples 1.1 and 1.2 for a sentence containing a coalition signal and a sentence with negative polarity, i.e., unwillingness to join a coalition, respectively).

Example 1.1. The Green Party yesterday signalled that it would be interested in participating in a coalition government led by Fine Gael.

Example 1.2. Ms O'Donnell said she could not see the PDs negotiating with the Labour Party after the election.
We benchmark performance on this task by introducing a new multilingual dataset, using pre-election newswire coverage from multiple elections in three countries (Ireland, Germany and Austria), and quantifying the performance of monolingual pre-trained models to provide a simple, yet strong, state-of-the-art baseline that is shown to substantially outperform previous approaches from the text-as-data community in political science. Moreover, we investigate the benefits of auxiliary tasks and a cross-lingual approach in a multi-task learning setting, so as to improve performance over the simpler, transfer-learning-based baseline.

Related Work
In recent years the usage of computational methods for the analysis of 'text-as-data' has drastically expanded in scope (Grimmer and Stewart, 2013; Glavaš et al., 2019). Political scientists have used NLP methods for a wide range of tasks, including, for instance, inferring policy positions (Lowe et al., 2011) and detecting topics (Grimmer, 2010). At the same time, NLP researchers have addressed closely related tasks such as election prediction (O'Connor et al., 2010), agreement measurement (Menini et al., 2017) and claim detection, to name a few.
This type of analysis typically requires manual coding of theoretical concepts and has been used to study the role of coalition signals and negative campaigning in the context of electoral campaigns (Best, 2015; Curini and Martelli, 2010; Druckman et al., 2009; Elmelund-Praestekaer, 2010; Evans et al., 2014; Walter and Van der Eijk, 2019). Manual coding, however, is a labour- and time-consuming process, which makes it hard to apply to larger amounts of data or to longitudinal studies where new data is collected over a long time period and the same coding steps require new manual effort each time. It is therefore crucial to automate this process in order to enable large-scale studies that might increase our understanding of the role of coalition signals in electoral campaigns.
One new promising approach towards that end was recently presented by Bowler et al. (2021), who developed a dictionary-based method for detecting inter-party communication in the form of pre-electoral coalition signals. Whilst their approach so far only identifies positive signals between parties, i.e., parties indicating that they are willing to form a government coalition together, it is a promising starting point to automate the process of studying inter-party communication. We describe their approach in more detail in §4.3 and then present our own approach to the identification of inter-party communication in the form of coalition signals during electoral campaigns (§4.2).

Task definition and annotation
Data Creation and Annotation. Our data includes election coverage from Irish, German and Austrian daily newspapers (Ireland: Irish Times, Irish Independent; Germany: Süddeutsche Zeitung (SZ), Frankfurter Allgemeine Zeitung (FAZ); Austria: Der Standard, Die Presse). SZ and FAZ are considered the two most high-quality/high-circulation German daily newspapers (see, e.g., Weischenberg et al. (2006)). The Irish and Austrian newspapers were selected according to similar considerations. All articles were published in a period of up to four weeks (28 days) before the election, not including the actual election day. From that time period, we sampled relevant articles based on the following search query:

(1) *election* AND (*coalition* OR *pact* OR *collaboration* OR *alliance*)

The keyword-based sampling was necessary so that the coders did not spend too much time on articles that did not contain any signals. The keywords were optimised for recall by domain experts who, during keyword selection, also checked articles without any hits for false negatives, to avoid missing relevant articles. The resulting German dataset is the largest, with 15,735 sentences of newswire text. The Irish and Austrian datasets are smaller, with 8,493 and 2,880 sentences respectively. The extracted articles have been filtered manually for false positives, i.e., articles that include the keywords but do not touch on the issue of upcoming elections have been removed. As a result, each article in the corpus includes at least one coalition signal.

Coding coalition signals. In order to code coalition signals in newspaper texts, the annotators first read the whole article and highlighted all signals and speculations in each article (cf. definitions below). After this initial step, the annotators proceeded with the actual coding: all highlighted signals and speculations in the article were coded in chronological order and inserted in the coding template.
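The sampling query above can be read as a simple boolean filter over article text. The following minimal sketch assumes that `*term*` denotes a case-insensitive substring match (the archive's actual query syntax is not specified here) and uses invented example articles; it covers only the English keywords shown in the query.

```python
import re

# "*term*" is interpreted as a case-insensitive substring match.
# This interpretation is an assumption, not the archive's documented syntax.
REQUIRED = re.compile(r"election", re.IGNORECASE)
ANY_OF = re.compile(r"coalition|pact|collaboration|alliance", re.IGNORECASE)


def matches_query(text: str) -> bool:
    """True if text satisfies *election* AND (*coalition* OR *pact* OR ...)."""
    return bool(REQUIRED.search(text)) and bool(ANY_OF.search(text))


# Invented example articles, for illustration only.
articles = [
    "Ahead of the general election, the party ruled out any coalition.",
    "The election campaign focused on housing.",
    "A new trade alliance was announced in Brussels.",
]
selected = [a for a in articles if matches_query(a)]
```

Because the wildcards match substrings, a word such as "impact" also satisfies `*pact*`; a recall-oriented filter of this kind is therefore expected to over-select, which is consistent with the manual removal of false positives described above.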
For each signal or speculation, the sender(s) and addressee(s) are identified, so that each can be attributed to one or more specific parties. In cases where this information is missing from the text, the instance is coded as unspecified. Crucially, our annotation scheme distinguishes coalition signals and speculation.
Definition 3.1 (SIGNAL). A coalition signal is a statement made by a potential coalition partner about their preferences towards a possible coalition in which the party itself would be a member.
A coalition signal consists of three components: (1) a SENDER (the potential coalition partner), (2) an ADDRESSEE (the potential coalition), and (3) a positive, neutral or negative statement about the potential coalition.
To be identified as a coalition signal, a statement must come from a sender who speaks directly about their preferences concerning a potential coalition. This includes direct and indirect quotes cited by a journalist. Sometimes the addressee is underspecified (e.g., when a party rules out a coalition with "any left party").
We present a few examples to illustrate instances of coalition signals. Example 3.1 includes a positive signal from Mary Harney (Independent) (SENDER) to Fianna Fáil (ADDRESSEE). A negative signal is shown in 3.2. Here, the SENDER are the Social Democrats and the ADDRESSEES are Fine Gael and the Labour Party. Example 3.3 contains a neutral signal from Enda Kenny (Fine Gael) to the Labour Party.

Coding coalition speculation. Speculations are typically stated by journalists, experts and political actors. Like coalition signals, each speculation has a SENDER, an ADDRESSEE and a statement regarding the anticipated coalition, with positive, negative or neutral polarity.
Definition 3.2 (SPECULATION). A coalition speculation is a statement about someone else's (i.e., not the speaker's) preferences concerning the formation of a coalition after the election.
Example 3.4 speculates about a coalition of Fine Gael and Labour, with positive polarity. As the roles of SENDER and ADDRESSEE are not made explicit, we code two separate instances of speculation (one with Fine Gael as SENDER and Labour as ADDRESSEE, and one where the roles are reversed). Example 3.5 includes two negative speculations from Fine Gael (SENDER), the first one directed to Fianna Fáil and the second one to Sinn Féin. Finally, example 3.6 illustrates a neutral coalition statement from the Greens (SENDER) to Fianna Fáil (ADDRESSEE).
Example 3.4. The likelihood of Fine Gael and the Labour Party beginning coalition talks in just over two weeks' time is as close to a sure thing as exists in politics.
Example 3.5. The only parties Fine Gael had ruled out as coalition partners were Fianna Fáil and Sinn Féin.
Example 3.6. The party, which is fielding seven candidates including Cllr Eamon Ryan in Dublin South and Cllr Ciarán Cuffe in Dún Laoghaire, did not rule out coalition with Fianna Fáil after the election.
Arguably, our examples show how challenging our task is, since it requires information on speaker attribution, coreference resolution and polarity shifting (e.g., example 3.6 would shift its polarity if the statement were "did rule out" instead of "did not rule out"). Moreover, the task requires a considerable amount of world knowledge (e.g., for example 3.6 it is important to know that Eamon Ryan and Ciarán Cuffe are both members of the Green Party; for the German and Austrian data one needs to know that expressions like "Ampel" (traffic light) and "Wachteleier-Koalition" (quail egg coalition) refer to specific coalition constellations). 5

Intercoder reliability. Tests of intercoder reliability concerning the annotations for Germany [Austria, Ireland] yielded a Krippendorff's alpha of 0.595 [0.903, 0.659] for the proportion of signals (among signals and speculations) per article. It is not quite clear why the agreement for the Austrian data is higher than for the other two countries. We can exclude training effects (each country has been coded by a different set of annotators). Our intuition is that this at least partly reflects newspaper style, with fewer figurative or implicit signals in the Austrian newspaper articles.
Signals versus speculation. The coding of signals and speculation in our data enables a comparison of the two types of elite communication. For Germany and Ireland, signals were more frequent than speculation for most C-types, with some exceptions. For new or improbable C-types, we observe more speculation than signals (e.g., for the German C-type "Greens + Left + SPD", we count six times more speculations than signals). Taking a closer look at the data and also considering the polarity of the signals, we notice that most of the signals are negative while the vast majority of the speculations are positive. Similar observations can be made for the Irish data, where the ratio of signals to speculations for the most frequent C-type (Fianna Fáil + Fine Gael) is 0.6/0.4. Here, too, most of the signals have a negative polarity while the majority of the speculations are positive. These observations suggest that coalition signals and speculation are used differently and that it might be worthwhile to take a closer look at the differences between the two.

5 Ampel describes a coalition between the German SPD, FDP and the Green Party; Wachteleier-Koalition refers to a coalition of SPÖ and FPÖ.

A transfer learning approach to coalition signal detection
We propose an approach to coalition signal detection on the basis of transfer learning, and decompose our problem into two separate classification tasks. The first task consists of identifying whether or not a certain text sequence (more specifically, a sentence) includes a coalition signal. For signal-bearing sentences, we additionally predict the type of the coalition (e.g., LABOUR+SOCIAL DEMOCRATS). For this, we model both steps in a standard text classification framework by assigning non-signals the label NONE. The second task then determines the polarity of the detected signals as either negative, neutral or positive. Table 1 shows the distribution of unique coalition types (C-types) in each dataset. In total, we have 100 distinct coalition types coded in the data. In order to predict the final labels for our data, we collect the sentence-level annotations from both classifiers and merge them into a new, atomic label (e.g., the C-type FINE GAEL+GREENS+LABOUR and the polarity prediction positive would be merged into the atomic label FINE GAEL+GREENS+LABOUR:positive). As the number of unique coalition types in the data is already quite high (see Table 1) and most labels are sparse, we focus on the most frequent types and only try to predict those. For that, we determined a different minimum occurrence frequency N for each country, depending on data size: Germany: N=10, Ireland: N=12, Austria: N=6. As a result, we obtain 14 different coalition types for German elections, 12 distinct types for Irish and 8 types for Austrian elections (Appendix, Table 7) that we use to train our models.
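The label handling described above (frequency-based selection of C-types and merging of the two classifier outputs) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the thresholds are the N values stated above, and the sketch does not decide what happens to sentences whose C-type falls below the threshold, since that is not specified in the text.

```python
from collections import Counter

NONE = "NONE"
# Minimum occurrence frequency N per country, as given in the text.
MIN_FREQ = {"Germany": 10, "Ireland": 12, "Austria": 6}


def frequent_ctypes(ctype_labels, country):
    """Return the set of C-types that occur at least N times for the country."""
    counts = Counter(label for label in ctype_labels if label != NONE)
    return {ctype for ctype, n in counts.items() if n >= MIN_FREQ[country]}


def merge_labels(ctype, polarity):
    """Merge a C-type prediction and a polarity prediction into one atomic label."""
    if ctype == NONE:
        return NONE
    return f"{ctype}:{polarity}"


# Invented toy label distribution, for illustration only.
labels = ["FINE GAEL+LABOUR"] * 12 + ["FINE GAEL+GREENS+LABOUR"] * 3 + [NONE] * 50
kept = frequent_ctypes(labels, "Ireland")
atomic = merge_labels("FINE GAEL+GREENS+LABOUR", "positive")
```

In this toy distribution only FINE GAEL+LABOUR reaches the Irish threshold of 12, so it is the only C-type kept for training.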

Experimental setup
To ensure fair benchmarking of our models with respect to the full labeled data, we nevertheless include all coalition signals for testing, i.e., all coalition types that have been coded in the data, including those not appearing in the training set. We compare our approach to the dictionary-based approach of Bowler et al. (2021), where the authors use a dictionary with party names and terms indicating cooperation or coalition to identify sentences in news coverage of German elections that might include coalition signals. 6 If a sentence includes a term signaling cooperation and a reference to a coalition, it is classified as a signal. The criteria for coalition references are as follows: the sentence must include at least two mentions of different parties or a coalition term that can be resolved to the parties participating in the coalition (e.g., "große Koalition" (grand coalition) → CDU/CSU + SPD).
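The decision rule of the dictionary-based baseline, as described above, can be illustrated with a small sketch. The dictionaries below are tiny invented stand-ins, not the dictionaries of Bowler et al. (2021), and the matching strategy (lowercased substring and token lookup) is an assumption made for illustration.

```python
import re

# Toy dictionaries, invented for illustration; NOT those of Bowler et al. (2021).
PARTIES = {
    "cdu": "CDU/CSU", "csu": "CDU/CSU", "spd": "SPD", "fdp": "FDP",
    "grüne": "GREENS", "grünen": "GREENS",
}
COALITION_TERMS = {
    "große koalition": ("CDU/CSU", "SPD"),
    "ampel": ("FDP", "GREENS", "SPD"),
}
COOPERATION_TERMS = ("koalition", "bündnis", "zusammenarbeit", "regieren")


def coalition_reference(sentence):
    """Resolve a coalition term, or require at least two different parties."""
    text = sentence.lower()
    for term, parties in COALITION_TERMS.items():
        if term in text:
            return "+".join(sorted(parties))
    tokens = re.findall(r"\w+", text)
    parties = {PARTIES[t] for t in tokens if t in PARTIES}
    if len(parties) >= 2:
        return "+".join(sorted(parties))
    return None


def classify(sentence):
    """Signal if the sentence has a cooperation term AND a coalition reference."""
    text = sentence.lower()
    has_cooperation = any(term in text for term in COOPERATION_TERMS)
    ctype = coalition_reference(sentence)
    return ctype if (has_cooperation and ctype) else "NONE"


print(classify("Die SPD will mit den Grünen regieren."))
print(classify("Die FDP stellte ihr Steuerkonzept vor."))
```

A rule of this kind has no notion of negation: a sentence stating that a party "does not rule out" or "rules out" a grand coalition receives the same label, which is one reason why polarity is not modelled by the baseline.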
It is important to note that Bowler et al. (2021) do not distinguish signals from speculation and also do not distinguish between sender and addressee but use the same label for signals from party A to B as for signals from party B to A. We therefore modify our data to make our results comparable. We map SENDER and ADDRESSEE information to undirected coalition terms (e.g., both SENDER: Labour, ADDRESSEE: Social Democrats and SENDER: Social Democrats, ADDRESSEE: Labour are mapped onto LABOUR+SOCIAL DEMOCRATS). In addition, we merge signals and speculation into one unified class. In the remainder of the paper, we refer to both, signals and speculation, as signals.
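The mapping from directed SENDER/ADDRESSEE pairs to undirected coalition labels can be sketched in a few lines. The function name is hypothetical, and the alphabetical ordering of party names is an assumption (it is consistent with labels such as LABOUR+SOCIAL DEMOCRATS and FINE GAEL+GREENS+LABOUR used in the text).

```python
def undirected_ctype(senders, addressees):
    """Map sender and addressee parties onto one direction-free C-type label."""
    # A set removes duplicates; sorting makes the label independent of direction.
    parties = {p.strip().upper() for p in (*senders, *addressees)}
    return "+".join(sorted(parties))


a_to_b = undirected_ctype(["Labour"], ["Social Democrats"])
b_to_a = undirected_ctype(["Social Democrats"], ["Labour"])
```

Both directions yield the same label, so signals from party A to B and from B to A fall into one class, as in the modified data.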
As mentioned before, Bowler et al. (2021) do not model the signals' polarity. We thus compare results for C-type prediction only, in addition to predicting the full label, C-type+polarity, where we consider all signals identified by the dictionary-based approach as positive signals. Table 2 summarises the properties of our data and modifications described above (i.e., removing the distinction between sender and addressee and merging C-type labels with signal polarity).

           orig   mod   train   pred   gold
Germany      39    26      14     42     58
Ireland      80    50      12     36     28
Austria      32    24       8     27     16

Table 2: No. of unique C-type labels in the original data (orig), in the modified data (mod), in the training data (train) and number of unique C-type+pol labels in the system predictions (pred) and test set (gold).

The first column (orig) shows the number of unique C-type labels in the original dataset (encoding the direction of the signal). Column 2 (mod) shows the number of distinct labels after removing the information on sender and recipient (i.e., the direction). As this number is still quite high and many labels are sparse, we focus on the most frequent C-types and only try to predict those. Specifically, we train our models to predict 14 different C-types for German elections, 12 distinct types for Irish and 8 types for Austrian elections (column 3; train). Column 4 lists the number of merged labels (C-type+polarity; pred) that our models are able to predict while the final column shows the unique number of merged final labels in the ground truth data. The number of predicted label types can be higher than the actual number of labels in the gold annotations as not every possible combination of C-type and polarity has been seen in the data.

6 The authors provide an R script for replication, based on the QUANTEDA library (Quantitative Analysis of Textual Data) (Benoit et al., 2018).

Our approach
We propose a transfer learning approach for the task of coalition signal detection, based on pretrained transformer-based language models (Vaswani et al., 2017). For German, we evaluate monolingual BERT (Devlin et al., 2019) and finetune two separate transformer models for each dataset in order to predict (i) the coalition type and (ii) the polarity of the signals.
Training details and hyperparameters. We use the same model architecture for both subtasks and initialise parameters with the pretrained bert-base-german-cased model, using the same seed in all experiments to control for random effects caused by parameters that remain uninitialised for some of the models. Our implementation is based on the Huggingface transformers library (Wolf et al., 2020) and PyTorch (Paszke et al., 2017), using the AutoModelForSequenceClassification model that can be initialised with different pretrained transformers and has a linear layer on top of the pooled output. As we are facing a sparse and imbalanced data problem where the majority of the sentences in our data do not contain any signal (Table 1), we apply downsampling and report results in a 5-fold cross-validation setup. All our models were trained with a maximum sequence length of 128 tokens per input sentence and a batch size of 8. The initial learning rate was set to 1e-5 and weight decay to 0.01. We did not freeze any of the layers but fine-tuned the whole model in all experiments. All our models are trained on a single GPU with 11GB RAM (NVIDIA RTX 2080Ti).
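The data preparation around training (downsampling of the majority class and 5-fold cross-validation) can be sketched without any deep learning dependencies. The hyperparameter values below are those stated above; the seed value, the downsampling ratio and the choice to downsample only the training portion of each fold are assumptions, since the text does not specify them.

```python
import random

# Hyperparameters as stated in the text.
HPARAMS = {"max_seq_length": 128, "batch_size": 8,
           "learning_rate": 1e-5, "weight_decay": 0.01}

SEED = 42               # hypothetical value; the text only says a fixed seed is used
NONE_PER_SIGNAL = 1.0   # hypothetical ratio of NONE sentences kept per signal sentence


def downsample(examples, ratio=NONE_PER_SIGNAL, seed=SEED):
    """Keep all signal sentences and a random subset of NONE sentences."""
    rng = random.Random(seed)
    signals = [e for e in examples if e[1] != "NONE"]
    nones = [e for e in examples if e[1] == "NONE"]
    keep = min(len(nones), int(round(ratio * len(signals))))
    sampled = signals + rng.sample(nones, keep)
    rng.shuffle(sampled)
    return sampled


def k_fold(examples, k=5, seed=SEED):
    """Yield (train, test) splits; each example appears in exactly one test fold."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = [examples[j] for j in folds[i]]
        train = [examples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Invented toy data: 10 signal sentences and 90 NONE sentences.
data = [(f"s{i}", "CDU/CSU+SPD") for i in range(10)] + [(f"n{i}", "NONE") for i in range(90)]
splits = [(downsample(train), test) for train, test in k_fold(data)]
```

In this sketch the test folds keep their original class distribution, so evaluation still reflects the imbalance of the full data.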

Dictionary-based baseline
Evaluation. One problem for evaluation is the fact that coalition signals can consist of text segments with multiple sentences, while the system predictions (for Bowler et al. (2021) as well as in our task formulation) are on the sentence level. Example 4.1 (translated from German) illustrates how the information needed for disambiguation can be spread over multiple sentences, where the first sentence suggests that Bütikofer, a member of the Green Party, sends a signal to the SPD and FDP (Ampel coalition). The second sentence, however, clarifies that the addressee is the SPD only.
Example 4.1. Bütikofer used the image: "We are only available for one traffic light. And that is the pedestrian traffic light, which only has red and green."

A sentence-level evaluation, thus, might be too fine-grained, since it requires the system to identify all sentences that belong to one signal. Accordingly, we evaluate at the signal level by creating a mapping between sentences and coalition signals that allows us to map all predictions for individual sentences back to their corresponding signals. As a result, we can have multiple predictions for each signal. We can then count the number of true positives (TP; correctly identified signal), false positives (FP; system predicts an incorrect signal) and false negatives (FN; system fails to identify the signal) and report standard precision, recall and F-measure:

\[ P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R} \]
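A signal-level evaluation of this kind could be implemented roughly as follows. The data structures and the exact counting conventions are assumptions made for this sketch: each sentence is assumed to belong to at most one gold signal, a signal counts as found if any of its sentences receives the gold label, and every distinct wrong label per signal, as well as every non-NONE prediction outside a gold signal, counts as one false positive.

```python
def evaluate(gold_signals, sentence_predictions):
    """Signal-level precision, recall and F1.

    gold_signals: dict signal_id -> (gold_label, set of sentence ids)
    sentence_predictions: dict sentence_id -> predicted label ("NONE" = no signal)
    """
    tp = fp = fn = 0
    covered = set()
    for gold_label, sentence_ids in gold_signals.values():
        covered |= sentence_ids
        # Map sentence-level predictions back to the signal they belong to.
        predicted = {sentence_predictions.get(s, "NONE") for s in sentence_ids}
        predicted.discard("NONE")
        if gold_label in predicted:
            tp += 1
        else:
            fn += 1
        fp += len(predicted - {gold_label})
    # Predictions on sentences that are not part of any gold signal.
    fp += sum(1 for s, p in sentence_predictions.items()
              if p != "NONE" and s not in covered)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy example: one two-sentence signal and one single-sentence signal.
gold = {"sig1": ("GREENS+SPD", {1, 2}), "sig2": ("CDU/CSU+SPD", {5})}
preds = {1: "NONE", 2: "GREENS+SPD", 3: "FDP+SPD", 5: "NONE"}
p, r, f = evaluate(gold, preds)
```

In the toy example the first signal is found through its second sentence, the second signal is missed, and the prediction on sentence 3 is spurious, giving one true positive, one false negative and one false positive.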

Monolingual signal prediction (German)
We evaluate our approach on German (Table 3) and compare our results to previous work (Bowler et al., 2021). Our transformer-based approach significantly outperforms the baseline in each setup. The first setup (A) evaluates on the level of individual sentences whether a coalition signal has been detected (but not whether the type and polarity of the signal have been predicted correctly). While the baseline has a higher precision than the neural classifier, this comes at the cost of recall (28.9% versus 60.3%). Setting (B) compares the predicted coalition types on the signal level but ignores the polarity of the signals. This is arguably a fairer comparison than (C) where we also consider the polarity of the signal, as the approach of Bowler et al. (2021) does not explicitly model the polarity of the signals. For (C), the transfer learning approach is able to correctly predict coalition types+polarity for more than twice as many signals as the baseline while keeping precision in the same range (46.3% versus 46.4%). This is a respectable result for such a challenging task where we have a high number of often sparse labels and where the model cannot rely on lexical cues only.

Multilingual coalition signal prediction
We next focus on the prediction of coalition signals in a multilingual setup and across countries. As shown in Table 1, the size of the Irish and Austrian data is much smaller and does not provide a sufficient number of instances for training accurate monolingual predictors. Our main research goal is thus to find appropriate settings for improving the accuracy for coalition signal detection in Irish and Austrian election coverage by making use of the larger German hand-coded data.
Given the relatedness of our tasks, we propose a multitask learning (Ruder, 2017, MTL) approach for our problem. From a conceptual point of view it is important, however, to note also the differences between our two tasks. While they can be both approached as text classification problems, polarity prediction has a small number of labels only (positive, neutral, negative) that are the same across countries and languages. Coalition type prediction has instead a much larger number of (often sparse) labels that are country-specific and thus pose a challenge for crosslingual transfer.

Table 3: Results for the prediction of coalition signals in German newspaper text: (A) identification of coalition signals on the sentence level (binary: yes/no); (B) the prediction of the correct coalition type (C-type) on the signal level; and (C) the prediction of the correct C-type and polarity on the signal level.

Experimental setup
Our MTL setup uses hard parameter sharing (Caruana, 1993), which has been shown to work well for related tasks. All models in our setup share the same encoder, initialised with pretrained multilingual MBERT or XLM-R language models, and update the same encoder weights during training. 9 This approach enables us to use the task-specific implementations of the Huggingface library for the different auxiliary tasks. We test different MTL setups where we model the prediction of a) coalition types and b) signal polarity for different countries as different tasks. We expect that this will allow the model to jointly learn shared feature representations across countries, while the task-specific (i.e., country-specific) classifier will be fine-tuned for each task separately. This seems crucial especially for the prediction of coalition types where we have distinct label sets for each country, while we expect that for the polarity classifier it should be easier to learn useful features across datasets.
In addition to our crosslingual, cross-country MTL models, we evaluate a number of auxiliary tasks that might provide the classifier with relevant information:

Named Entity Recognition. Directing the models' attention towards named entities such as persons, organisations etc. might provide useful information for C-type prediction. We use the English Ontonotes dataset (Weischedel et al., 2013) and the German Twitter news headlines data (Ruppenhofer et al., 2020) to train German and English NER models. We use these as auxiliary tasks, in combination with our three subtasks, i.e., C-type prediction for Ireland, Germany and Austria.

Political actor tagging. This auxiliary task aims at injecting some world knowledge about political actors (politicians and parties) and coalitions into the model. As for NER, we use a sequence tagging setup and create our training data as follows. Based on a number of manually created dictionaries, we augment the German input data with BIO labels on the token level (Figure 1). We then use this data in a MTL setting, in combination with C-type prediction for Ireland, Germany and Austria.

9 We would like to thank Jason Phang for sharing his code for MTL with transformers that we have adapted for our work.
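The dictionary-based creation of BIO labels for the political actor tagging task can be illustrated with a small sketch. The dictionary entries and the tag names (POL, PARTY, COAL) are invented for this example, and greedy longest-match lookup is an assumption; the actual dictionaries and tag set are not reproduced here.

```python
def bio_tag(tokens, dictionary):
    """Assign BIO tags by greedy longest-match lookup in a phrase dictionary.

    dictionary: dict mapping a tuple of lowercased tokens to an entity type.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(key) for key in dictionary), default=0)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = dictionary.get(tuple(lowered[i:i + n]))
            if entity_type:
                match = (n, entity_type)
                break
        if match:
            n, entity_type = match
            tags[i] = f"B-{entity_type}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{entity_type}"
            i += n
        else:
            i += 1
    return tags


# Invented toy dictionary with hypothetical tag names.
actors = {
    ("angela", "merkel"): "POL",
    ("spd",): "PARTY",
    ("große", "koalition"): "COAL",
}
tokens = ["Angela", "Merkel", "will", "die", "große", "Koalition", "mit", "der", "SPD"]
tags = bio_tag(tokens, actors)
```

The resulting token-level tags can then serve as targets for a sequence tagging head trained alongside the C-type classifiers.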
Valence classification. To improve results for polarity classification, we experiment with a highly related task from the area of sentiment analysis (Zhang et al., 2018). Specifically, we use the multilingual NRC VAD lexicon of Mohammad (2018) which provides valence scores for over 20,000 pivot terms, created via Best-Worst Scaling. We extract the scores for the English words and their German translations and, for each sentence in our German and Irish input data, compute a valence score by simply adding up the scores for each lemma in the sentence and then normalising it by sentence length. We then use the scores in a regression task to make our representations more sensitive towards positive or negative valence in the input signal. The intuition behind this setup is that negative signals should strongly correlate with low valence scores.

We compare against the following baselines:

ST-MO: Singletask, monolingual: we finetune pretrained MBERT (bert-base-multilingual-cased) and XLM-R (xlm-roberta-base) separately for each country and task, resulting in 6 separate classification models.
ST-MU: Singletask, multilingual: we merge all training data into one set and use the multilingual XLM-R to train two unified models, one for polarity and one for predicting C-types (this approach crucially increases the number of country-specific C-types in the data while the number of polarity labels remains the same).

Table 4 reports F1 for C-type prediction and for the combined final labels, C-type+polarity, for our different experimental setups. While our datasets are not small (>27,000 sentences), due to the sparsity of the phenomenon and the fine granularity of our C-type schema we are still facing a sparse data problem. We therefore opted for reporting results over different epochs, to show the variation in results, which seemed more informative than a single score. When applying the models to new data, however, we recommend the use of an ensemble system (e.g., a simple majority vote over the predictions of different models) to avoid selecting an inferior model based on a non-representative dev set.
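A simple majority vote over the predictions of several models, as recommended above, might look like the following sketch. The tie-breaking rule (prefer the label of the earliest model among the tied labels) is an assumption; the text does not prescribe one.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine aligned label sequences from several models by majority vote.

    predictions_per_model: list of label lists, one list per model, all of
    the same length and in the same sentence order.
    """
    voted = []
    for labels in zip(*predictions_per_model):
        counts = Counter(labels)
        top = max(counts.values())
        # Tie-break (assumption): take the earliest model's label among the tied ones.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


# Invented predictions of three models (e.g., checkpoints from different epochs).
model_outputs = [
    ["GREENS+SPD:positive", "NONE", "CDU/CSU+SPD:negative"],
    ["GREENS+SPD:positive", "FDP+SPD:neutral", "NONE"],
    ["NONE", "FDP+SPD:neutral", "CDU/CSU+FDP:positive"],
]
ensemble = majority_vote(model_outputs)
```

With an odd number of models and few distinct labels per sentence, ties are rare, but the third sentence of the toy example shows that an explicit tie-breaking rule is still needed.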
Looking at experiments 1-6, we notice that for German, MBERT results are slightly lower than the ones reported in §4.4 (see Table 3), especially for the final, atomic labels (C-type+pol). This is to be expected, given that our previous experiments were using the BERT model specifically trained for German. As expected, results for Ireland and Austria are substantially lower than the ones for Germany, reflecting the smaller sizes of the respective training sets. This is a problem especially for the Austrian data with fewer than 3,000 sentences.
Singletask, multilingual. Experiments 7-9 show that for C-type prediction, jointly training the model on the union of all three datasets yields a slight improvement for the largest dataset (DE). For Ireland, results are in the same range as before while for the smallest dataset (AU), we observe a drastic decrease in F1. German and Irish F1 scores for C-type+polarity, however, are around 2% higher than for ST-MO, thus showing the improvement of the ST-MU polarity classifier over the monolingual ones. Again, this makes sense, considering that, in contrast to the C-types, the polarity labels are shared across countries.
MTL results (Exp.10-15). We now turn to experiments 10-15 where we jointly train a shared representation for each subtask by considering each country as a separate task in our MTL setup. For both subtasks, C-type prediction and polarity classification, we observe some improvements over the ST baselines. Results for German and Irish C-types increase by more than 2% F1 (Exp.12) over the best ST results, and for polarity we see improvements for Ireland and Austria (but not for Germany). 10 Our modular setup allows us to combine the best classifiers from each setting. As we noticed that for polarity prediction, simply merging all three datasets into one training set seemed to work well, due to the unified labels across countries and the larger training size, we combine the best C-type MTL setup  with the best polarity model (Exp.9: R-XLM-30). Results for Ireland and Germany (Exp.16) now outperform the ST baselines (ST-MO, Exp.1-6) by 4-5% F1.

Results for auxiliary tasks
Results for the auxiliary tasks created to improve C-type prediction (NER and political actor tagging, Table 5) fail to surpass our best MTL results. A possible explanation is 'catastrophic interference' (Hashimoto et al., 2017) where learned information is overwritten by the new tasks.
The auxiliary task designed to improve polarity prediction (Table 6) via learning to predict valence scores, on the other hand, achieves further improvements in the range of 1-2% over previous scores and results in best scores for the combined C-type+polarity labels for all three countries. 11

10 For results on the level of C-types, see Table 8 in the appendix.
11 Table 9 in the appendix also shows results for individual combinations of C-type and polarity.
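The sentence-level target for the valence auxiliary task (the sum of lexicon scores per lemma, normalised by sentence length) can be sketched as follows. The lexicon entries below are invented placeholders rather than actual NRC VAD values, and two details are assumptions: lemmas missing from the lexicon contribute a score of 0, and sentence length is measured as the number of lemmas.

```python
def sentence_valence(lemmas, lexicon):
    """Sum the lexicon valence of each lemma and normalise by sentence length."""
    if not lemmas:
        return 0.0
    # Assumption: lemmas not covered by the lexicon contribute 0.
    total = sum(lexicon.get(lemma.lower(), 0.0) for lemma in lemmas)
    return total / len(lemmas)


# Invented placeholder scores in [0, 1]; NOT actual NRC VAD values.
toy_lexicon = {"welcome": 0.9, "reject": 0.1, "coalition": 0.6}

positive_score = sentence_valence(["party", "welcome", "coalition"], toy_lexicon)
negative_score = sentence_valence(["party", "reject", "coalition"], toy_lexicon)
```

Under these toy scores the sentence about rejecting a coalition receives a lower valence than the one about welcoming it, which is the kind of correlation with negative polarity that the regression task is meant to exploit.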

Conclusions and Future Work
We presented experiments on coalition signal detection, a new, challenging NLU task for the political text analysis domain, and showed that a simple transfer learning approach outperforms previous work based on dictionary-based methods. We further showed that coalition signal detection in a multilingual setting is feasible and can substantially outperform results for the monolingual setup. We also approached the problem in a multitask learning setup, modelling the prediction of inter-party communication for different countries as separate tasks. This approach gave us improvements over the monolingual singletask setting. We could further increase results by adding another, highly related auxiliary task (valence prediction), obtaining final improvements of 5-7 percentage points (DE, IE) over the monolingual baselines.
For future work, we would like to test adapters (Pfeiffer et al., 2020) as another way to inject relevant information into the model and mitigate the issue of 'catastrophic forgetting' that might be responsible for the negative results we obtained for two auxiliary tasks (NER, political actor tagging).

Ethics statement
The intended use of this work is to enable researchers to conduct large-scale longitudinal studies that might increase our understanding of the role of coalition signals in electoral campaigns, while avoiding the high time requirements and cost for manual coding. While our results show some progress over previous work, there are a number of pitfalls that we would like to point out. First, our experiments showed that identifying coalition signals in newswire is a very challenging task and that results are not yet very accurate. We would thus like to stress the need for additional external validation of the results. Another pitfall of applying machine learning to coalition signal detection for new elections comes from the fact that we might observe shifts in the distribution of signals in new, unseen data. New parties and coalition types might emerge or previously frequent signals might become less frequent. This is a problem for supervised machine learning as classifiers tend to learn the distribution in the training data and thus might introduce an unwanted bias that does not fit the distribution of new elections. We address this issue in subsequent work by creating synthetic training instances for sparse coalition types in the training data and show that this approach can ameliorate the problem (Adendorf et al., 2021).

Table 9: F1 for the best performing model from Table 6 (R-XLM-30) for each C-type + polarity in the German and Irish data (evaluation on the signal level) (CDU: Christian Democratic Union, FDP: Free Democratic Party, SPD: Social Democratic Party, PD: Progressive Democrats).