Target-Oriented Relation Alignment for Cross-Lingual Stance Detection



Introduction
Stance detection is an important task in public opinion mining and social media analytics, which aims to automatically identify a user's attitude (e.g., "in favor of" or "against") toward a specific target (e.g., an entity, topic, or claim) from text. It has been widely applied in many domains such as veracity checking, market analysis, social security, and government decision-making (Küçük and Can, 2020; AlDayel and Magdy, 2021).
Existing studies on stance detection have mainly been conducted in a monolingual setting, focusing on English (Hardalov et al., 2021; Allaway et al., 2021; Liang et al., 2022). In contrast to the abundant corpora in English, annotated data resources for stance detection in other languages are usually scarce. To address the imbalanced data resources between languages and support stance-related applications in low-resource languages, cross-lingual stance detection has been proposed to transfer the knowledge learned from a high-resource (source) language to a low-resource (target) language (Küçük and Can, 2020). Two studies have addressed cross-lingual stance detection, by adopting contrastive language adaptation to align representations across languages (Mohtarami et al., 2019) and by pre-training the language model to acquire additional knowledge (Hardalov et al., 2022).
Beyond the problem of imbalanced language resources, another important issue in cross-lingual stance detection has been ignored by current research. Due to differences in contextual information, language expressions, and socio-cultural backgrounds, the occurrences and distributions of the concerned targets may vary considerably across languages. For example, even in the same domain, such as "presidential election", the targets in an English corpus are distinct from those in a French one. Since this discrepancy in target distributions pervasively exists between languages, the targets in the source and target language datasets cannot be precisely aligned in cross-lingual stance detection. This brings about the target inconsistency issue, i.e., the difficulty of knowledge transfer across languages caused by the misalignment, which inevitably degrades the performance of cross-lingual stance detection.
To address the target inconsistency issue, semantic associations between different targets can be utilized for cross-lingual stance detection. Table 1 gives an example of a target relation. The target "Feminist Movement" in English and the target "Legalization of Abortion" in French are highly correlated, since both are strongly associated with women's rights and they mention quite similar topics, such as equality. Target relations reflect the semantic associations between targets in terms of shared background information or topics, and such relations exist widely in textual expressions both within and across languages.
In this paper, we model target-level associations and propose a fine-grained Target-oriented Relation Alignment (TaRA) method for cross-lingual stance detection. Our method considers both target-level associations and language-level alignments. Specifically, it first learns the relations between different targets via a target relation graph within each language (i.e., in-language), which provides the basis for cross-language performance, and then constructs the cross-lingual relation graph to compensate for target inconsistency. We further devise in-language and cross-language relation alignment strategies to align the samples with highly correlated targets based on the relation graphs, so as to enable knowledge transfer between semantically correlated targets across languages.
The contributions of our work are as follows:
• We identify the target inconsistency issue in cross-lingual stance detection for the first time, and propose a computational method, TaRA, to tackle this problem via target-level correlation learning and relation alignment across languages.
• Our method learns the associations between targets via target relation graphs within and across languages, and designs relation alignment strategies that enable in-language knowledge enhancement and cross-lingual knowledge transfer among semantically correlated targets.
• We conduct experiments on representative multilingual stance datasets with varied target settings, and the results demonstrate the effectiveness of our method compared to competitive baselines.
Related Work
Compared to the abundant resources in English, there are far fewer data resources in other languages (Xu et al., 2016; Lozhnikov et al., 2018; Baly et al., 2018; Khouja, 2020; Cignarella et al., 2020). To promote stance detection in low-resource languages, some researchers have constructed multilingual stance datasets (Taulé et al., 2017; Vamvas and Sennrich, 2020; Lai et al., 2020), while others have developed methods for cross-lingual stance detection (Mohtarami et al., 2019; Hardalov et al., 2022). Hardalov et al. (2022) conduct a comprehensive empirical study on pre-training the language model with additional corpora in the source language, acquiring knowledge that is then transferred to the target language through prompt-tuning. Mohtarami et al. (2019) propose a contrastive language adaptation method to align representations across languages, which encourages samples with the same label in different languages to be closer in the embedding space. However, their method only considers contrastive adaptation at the language level, ignoring the fine-grained modeling of target relations, which is essential for compensating for target inconsistency and can facilitate cross-lingual stance detection in general. Another drawback of this method (Mohtarami et al., 2019) is that it only considers cross-language contrastive alignment and ignores in-language target relation learning and contrastive alignment. Therefore, in our work, we consider both target-level modeling and language-level alignments, and develop a computational method with in-language and cross-language solutions to tackle the target inconsistency issue for cross-lingual stance detection.

Proposed Method
Figure 1 illustrates the overall structure of our proposed method TaRA. We first encode the input target and text with the Encoder Module to obtain the textual representations. Then, we construct the Target Relation Graphs to learn both in-language and cross-language associations between targets, and obtain target representations with aggregated information. After that, we concatenate the textual representations and target representations for classification. Meanwhile, we align the representations of samples with related targets within and across languages using the Target Relation Alignment Strategies.

Problem Statement
We denote the source (src) language data as $D_{src} = \{(t^s_i, c^s_i), y^s_i\}_{i=1}^{N_s}$, where $t^s_i$ is the $i$-th target, $c^s_i$ is the $i$-th text, $y^s_i$ is the stance label of text $c^s_i$ toward target $t^s_i$, and $N_s$ is the number of samples in $D_{src}$. We also denote the target set of $D_{src}$ as $T^s = \{T^s_i\}_{i=1}^{n_s}$, where $n_s$ is the number of targets in $D_{src}$. Similarly, we denote the target (tgt) language data as $D_{tgt} = \{(t^t_i, c^t_i), y^t_i\}_{i=1}^{N_t}$ with target set $T^t = \{T^t_i\}_{i=1}^{n_t}$. Typically, $N_t \ll N_s$ in cross-lingual stance detection. We use both $D_{src}$ and $D_{tgt}$ to train the model and predict the stance label of each sample in the tgt language test set.
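To make the notation concrete, the following minimal sketch shows the data format assumed throughout; the class and field names are our own illustration, not from the authors' implementation:

```python
from dataclasses import dataclass

# A minimal sketch of the data format described above; names are
# illustrative, not from the original implementation.
@dataclass
class StanceSample:
    target: str  # t_i: the target toward which the stance is expressed
    text: str    # c_i: the text expressing the stance
    label: int   # y_i: e.g., 0 = against, 1 = favor, 2 = none

# D_src (e.g., German in X-Stance) is typically much larger than D_tgt.
d_src = [StanceSample("Feminist Movement", "Equal rights benefit everyone.", 1)]
d_tgt = [StanceSample("Légalisation de l'avortement", "Chacune doit pouvoir choisir.", 1)]
```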

Encoder Module
To map words in different languages into the same embedding space, we use the language model mBERT (Devlin et al., 2019), pre-trained on a large-scale multilingual corpus, as the encoder module. Specifically, given a pair of target $t$ and text $c$, we encode them with the encoder module and obtain a textual representation $h \in \mathbb{R}^d$:
$$h = \mathrm{mBERT}(t, c)$$
where $t$ and $c$ are the sequences of words in the target and text, respectively.
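As an illustration, a minimal encoding sketch with the HuggingFace Transformers library is given below; taking the [CLS] vector as $h$ is our assumption, since the text only specifies that mBERT yields $h \in \mathbb{R}^d$ (with $d = 768$):

```python
import torch
from transformers import BertModel, BertTokenizer

# Sketch of the encoder module: target and text are packed into one
# sequence pair, and the [CLS] vector is taken as the representation h.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

def encode(target: str, text: str) -> torch.Tensor:
    inputs = tokenizer(target, text, return_tensors="pt",
                       truncation=True, max_length=128)
    outputs = encoder(**inputs)
    return outputs.last_hidden_state[:, 0]  # h: (1, 768) [CLS] vector

h = encode("Feminist Movement", "Women deserve equal pay.")
```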

In-Language Target Relation Graphs
To reduce the impact of the language gap on relation learning across languages, we first learn the target relations within each language to provide preliminary knowledge for cross-lingual target relation modeling. Specifically, we construct src and tgt target relation graphs $G^* = \langle V^*, A^* \rangle$, $* \in \{s, t\}$, where $V^*$ represents the node features and $A^*$ is the adjacency matrix.
Graph Construction. Each target is treated as a node in the graph, and the correlations between nodes reflect target relations. Intuitively, the relationship between targets is characterized by the relationship between their corresponding text sets. Hence, for each target $T^*_i$, we also use mBERT to derive all the textual representations with target $T^*_i$ and calculate their mean vector as the node feature:
$$v^*_i = \frac{1}{m^*_i} \sum_{j=1}^{m^*_i} h^*_{i,j} \tag{2}$$
where $m^*_i$ is the number of samples with target $T^*_i$ and $h^*_{i,j}$ is the representation of the $j$-th such sample. For $G^*$, we construct the adjacency matrix $A^* \in \{0, 1\}^{n_* \times n_*}$. We use the semantic similarities between targets as the starting point, which can be viewed as an approximation of the semantic associations of targets and serves as a basis for subsequent target relation learning. Specifically, we calculate the cosine similarity score $score^*_{i,j}$ between $v^*_i$ and $v^*_j$ and filter it with a threshold $\theta_0$:
$$A^*_{i,j} = \begin{cases} 1 & \text{if } f(v^*_i, v^*_j) \geq \theta_0 \\ 0 & \text{otherwise} \end{cases}$$
where $f(\cdot)$ is the cosine similarity function.
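A small sketch of this construction, assuming the per-target textual representations have already been computed with the encoder above:

```python
import torch
import torch.nn.functional as F

# Sketch of in-language graph construction: node features are the mean
# of the textual representations per target; edges connect targets whose
# cosine similarity meets the threshold theta_0 (0.4 in the paper).
def build_graph(reps_per_target: list[torch.Tensor], theta_0: float = 0.4):
    # v_i = mean of all textual representations h for target T_i
    v = torch.stack([reps.mean(dim=0) for reps in reps_per_target])  # (n, d)
    v_norm = F.normalize(v, dim=-1)
    scores = v_norm @ v_norm.T            # pairwise cosine similarities
    adj = (scores >= theta_0).long()      # A in {0,1}^{n x n}
    return v, adj
```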
Target Relation Calculation. To dynamically model the associations between targets, we adopt a Graph Attention Network (GAT) (Veličković et al., 2018) to learn the weights between nodes and obtain high-level target representations with aggregated information. Specifically, we feed the node features $V^*$ and the adjacency matrix $A^*$ into the GAT, and derive the target representations $U^* = \{u^*_i\}_{i=1}^{n_*}$ together with the attention weight matrix $W^* \in \mathbb{R}^{n_* \times n_*}$. We adopt a Top K operation to convert the weight matrix $W^*$ learned from $G^*$ into the target relation matrix $R^* \in \{0, 1\}^{n_* \times n_*}$, $* \in \{s, t\}$. Specifically, for the $i$-th target, we treat the targets with the first $K$ highest weights as its related targets:
$$R^*_{i,j} = \begin{cases} 1 & \text{if } j \in \mathrm{Index}^*(i) \text{ and } i \in \mathrm{Index}^*(j) \\ 0 & \text{otherwise} \end{cases} \tag{7}$$
where $\mathrm{Index}^*(i)$ is the set of indices selected by the Top K operation and $k_*$ is a hyperparameter denoting the value of $K$ in the corresponding language.
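The following sketch illustrates this step with PyTorch Geometric's GATConv standing in for the paper's GAT; reading the per-edge attention coefficients back into a dense matrix $W^*$ is our own interpolation of how $W^*$ is obtained:

```python
import torch
from torch_geometric.nn import GATConv

# Sketch: GAT over the target graph, then mutual Top-K (Eq. 7) to turn
# attention weights into the binary relation matrix R*.
gat = GATConv(in_channels=768, out_channels=768, heads=1)

def relation_matrix(v: torch.Tensor, adj: torch.Tensor, k: int):
    edge_index = adj.nonzero().t()                       # (2, num_edges)
    u, (edge_index, alpha) = gat(v, edge_index,
                                 return_attention_weights=True)
    w = torch.zeros_like(adj, dtype=torch.float)
    w[edge_index[0], edge_index[1]] = alpha.squeeze(-1)  # dense W*
    topk = w.topk(k, dim=-1).indices                     # Top-K per target
    sel = torch.zeros_like(w).scatter_(1, topk, 1.0)
    return u, (sel * sel.T).long()                       # mutual Top-K -> R*
```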
To utilize the high-level aggregated target information, we concatenate the learned target representation $u^*_i$ with the textual representations of target $T^*_i$ to obtain the target-enhanced representations within the language for in-language relation alignment:
$$z^*_{i,j} = u^*_i \oplus h^*_{i,j}$$
where $z^*_{i,j} \in \mathbb{R}^{2d}$ is the target-enhanced representation of the $j$-th sample with target $T^*_i$ and $\oplus$ denotes the concatenation operation.

Cross-Lingual Target Relation Graph
To explore the relationships between targets across languages, we further construct the cross-lingual target relation graph $G = \langle V, A \rangle$ with all targets from the two languages, $T = \{T_k\}_{k=1}^{n}$, where $n$ is the total number of targets in the two languages. The learned in-language target associations and representations are utilized as the starting point, in order to reduce the impact of the language gap and provide reliable prior target information.

Graph Construction
We calculate the node features $V = \{v_k\}_{k=1}^{n}$ from the target representations $U^s$ and $U^t$. In particular, for the targets shared across languages, we initialize them with the mean of the target representations in the two languages:
$$v_k = \begin{cases} \frac{1}{2}\left(u^s_{k_s} + u^t_{k_t}\right) & \text{if } T_k \in T^s \cap T^t \\ u^s_{k_s} & \text{if } T_k \in \complement_T T^t \\ u^t_{k_t} & \text{if } T_k \in \complement_T T^s \end{cases}$$
where $k_s$ and $k_t$ are the corresponding indices of target $T_k$ in $T^s$ and $T^t$ respectively, $\complement_T T^s$ denotes the complementary set of $T^s$ (i.e., the set of targets only in $T^t$), and $\complement_T T^t$ denotes the complementary set of $T^t$.
Then, we calculate the adjacency matrix $A \in \{0, 1\}^{n \times n}$ from the target relation matrices $R^s$ and $R^t$. To compensate for the target inconsistency between languages, we also establish connections between cross-language targets, forcing the model to pay attention to cross-language target relations. Specifically, for those targets appearing only in the tgt language, we connect them with each remaining target by setting $A_{i,j} = A_{j,i} = 1$:
$$A_{i,j} = \begin{cases} R^s_{i_s, j_s} & \text{if } T_i, T_j \in T^s \\ R^t_{i_t, j_t} & \text{if } T_i, T_j \in T^t \\ 1 & \text{if } T_i \in \complement_T T^s \text{ or } T_j \in \complement_T T^s \end{cases} \tag{10}$$
where $i_s$ and $j_s$ are the corresponding indices of targets $T_i$ and $T_j$ in $T^s$, and $i_t$ and $j_t$ are the corresponding indices of targets $T_i$ and $T_j$ in $T^t$.
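A combined sketch of the node-feature initialization and the cross-lingual adjacency rule above; the name-keyed index maps are our own bookkeeping device:

```python
import torch

# Sketch of the cross-lingual graph construction. targets: list of all
# target names; idx_s/idx_t: name -> row index into u_s/u_t; r_s/r_t:
# in-language relation matrices from Eq. (7).
def build_cross_graph(targets, u_s, idx_s, u_t, idx_t, r_s, r_t):
    n = len(targets)
    v = torch.zeros(n, u_s.size(-1))
    adj = torch.zeros(n, n, dtype=torch.long)
    for k, t in enumerate(targets):
        if t in idx_s and t in idx_t:                   # shared: mean of both
            v[k] = (u_s[idx_s[t]] + u_t[idx_t[t]]) / 2
        elif t in idx_s:                                # src-only target
            v[k] = u_s[idx_s[t]]
        else:                                           # tgt-only target:
            v[k] = u_t[idx_t[t]]                        # connect to all nodes
            adj[k, :] = 1
            adj[:, k] = 1
    for i, ti in enumerate(targets):                    # carry over edges
        for j, tj in enumerate(targets):
            if ti in idx_s and tj in idx_s and r_s[idx_s[ti], idx_s[tj]]:
                adj[i, j] = 1
            if ti in idx_t and tj in idx_t and r_t[idx_t[ti], idx_t[tj]]:
                adj[i, j] = 1
    return v, adj
```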
Cross-Lingual Target Relation Calculation. We also adopt a GAT to learn the cross-lingual target representations $U = \{u_k\}_{k=1}^{n}$ and the attention weight matrix $W \in \mathbb{R}^{n \times n}$. The learned weight matrix $W$ is transformed into the cross-lingual relation matrix $R \in \{0, 1\}^{n \times n}$ via the Top K operation to acquire the target relations across languages:
$$R_{i,j} = \begin{cases} 1 & \text{if } j \in \mathrm{Index}(i) \text{ and } i \in \mathrm{Index}(j) \\ 0 & \text{otherwise} \end{cases} \tag{12}$$
where $\mathrm{Index}(i)$ is the set of indices selected by the Top K operation and $k$ is a hyperparameter denoting the value of $K$ in the Top K operation.
Similarly, we concatenate the learned cross-lingual target representation $u_k$ with the textual representations of target $T_k$ to obtain the target-enhanced representations for cross-lingual relation alignment and classification:
$$z_{k,j} = u_k \oplus h_{k,j}$$
where $z_{k,j} \in \mathbb{R}^{2d}$ is the target-enhanced representation of the $j$-th sample with target $T_k$.

Relation Alignment Strategies
We devise target relation alignment strategies to align representations between highly correlated targets, so that semantic associations such as shared background knowledge can be transferred across languages. Inspired by Mohtarami et al. (2019) and Lin et al. (2022), we build on contrastive learning, further take target relations into consideration, and devise relation alignment strategies within and across languages.

In-Language Relation Alignment
We first devise in-language alignment strategies for the src and tgt languages to realize in-language target relation alignment and optimize the relationships of targets within each language. For each anchor $\hat{z}^*_i$ in the mini-batch, we select as positive samples those within the same language that are target-related to $\hat{z}^*_i$ and share its stance label, and treat the other samples within the language as negatives. We use the following loss function to pull the positive pairs closer and push the negative pairs apart:
$$\mathcal{L}_* = -\sum_{i=1}^{b_*} \frac{1}{b'_*} \sum_{\substack{j=1 \\ j \neq i}}^{b_*} I\big(\Psi_*(i,j)\big)\, I\big(y_i = y_j\big) \log \frac{\exp\big(f(\hat{z}^*_i, \hat{z}^*_j)/\tau\big)}{\sum_{m \neq i} \exp\big(f(\hat{z}^*_i, \hat{z}^*_m)/\tau\big)}$$
where $\hat{z}^*_i$ is the $i$-th target-enhanced representation within the language in the mini-batch, $\Psi_*(i, j)$ indicates related targets according to the target relation matrix $R^*$, $I(\cdot)$ is an indicator function, $f(\cdot)$ denotes the cosine similarity function, $\tau$ is the temperature parameter, and $b_*$ and $b'_*$ are the numbers of samples and positive samples within the language.
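A sketch of this loss in PyTorch is shown below; the masking logic follows the description above, while the exact normalization is our assumption of the standard supervised-contrastive form:

```python
import torch
import torch.nn.functional as F

# Sketch of the in-language relation alignment loss. z: (b, 2d)
# target-enhanced reps; labels: (b,) stance labels; target_ids: (b,)
# target index per sample; rel: relation matrix R* from Eq. (7).
def relation_contrastive_loss(z, labels, target_ids, rel, tau=0.3):
    z = F.normalize(z, dim=-1)
    sim = (z @ z.T) / tau                              # cosine sims / tau
    b = z.size(0)
    eye = torch.eye(b, dtype=torch.bool, device=z.device)
    related = rel[target_ids][:, target_ids].bool()    # Psi(i, j) via R*
    pos = related & (labels[:, None] == labels[None, :]) & ~eye
    denom = torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                            dim=1, keepdim=True)
    log_prob = sim - denom                             # log-softmax, no self
    n_pos = pos.sum(dim=1).clamp(min=1)                # b': positives/anchor
    loss = -(log_prob * pos).sum(dim=1) / n_pos
    return loss[pos.any(dim=1)].mean()                 # skip anchors w/o positives
```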

Cross-Lingual Relation Alignment
To align representations across languages, we design a cross-lingual relation alignment strategy, which transfers knowledge between semantically correlated targets across languages. The cross-lingual relation alignment enables us to compensate for the lack of tgt language data using src language data with the most relevant targets. For each anchor $\hat{z}_i$ in the mini-batch, we select positive samples that are target-related, share the same stance label, and come from the other language, and take the others as negatives:
$$\mathcal{L}_{cross} = -\sum_{i=1}^{b} \frac{1}{b'} \sum_{\substack{j=1 \\ j \neq i}}^{b} I\big(\Psi(i,j)\big)\, I\big(y_i = y_j\big)\, I\big(l_i \neq l_j\big) \log \frac{\exp\big(f(\hat{z}_i, \hat{z}_j)/\tau\big)}{\sum_{m \neq i} \exp\big(f(\hat{z}_i, \hat{z}_m)/\tau\big)}$$
where $\hat{z}_i$ is the $i$-th target-enhanced representation in the mini-batch, $\Psi(i, j)$ indicates related targets according to $R$, $l_i$ is the language of $\hat{z}_i$, and $b$ and $b'$ are the numbers of samples and positive samples in the mini-batch.
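The cross-lingual variant reuses the same machinery as the sketch above; only the positive mask changes, additionally requiring $l_i \neq l_j$:

```python
# Cross-lingual positive mask (sketch): related targets, same stance
# label, and different languages; plug in as `pos` in the loss above.
# `langs` is a (b,) tensor of per-sample language ids.
def cross_lingual_pos_mask(labels, target_ids, rel, langs):
    related = rel[target_ids][:, target_ids].bool()   # Psi(i, j) via R
    same_label = labels[:, None] == labels[None, :]
    diff_lang = langs[:, None] != langs[None, :]
    return related & same_label & diff_lang
```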

Stance Classifier
The cross-lingual target-enhanced representations are fed into a two-layer feed-forward network with a softmax function for classification, and we adopt a cross-entropy loss $\mathcal{L}_{ce}$ to optimize the classifier:
$$\hat{y}_i = \mathrm{softmax}\big(\mathrm{FFN}(z_i)\big), \qquad \mathcal{L}_{ce} = -\sum_{i} y_i \log \hat{y}_i$$
where $\hat{y}_i$ is the predicted label, $y_i$ is the ground truth, and $\mathrm{FFN}$ is a two-layer feed-forward network with the $\mathrm{ReLU}(\cdot)$ activation function.
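A minimal sketch of the classifier; the hidden width is our choice, as the paper does not state it:

```python
import torch.nn as nn

# Two-layer FFN with ReLU over the 2d-dimensional target-enhanced
# representation z; 3 output classes (favor / against / none).
d = 768
classifier = nn.Sequential(
    nn.Linear(2 * d, d),
    nn.ReLU(),
    nn.Linear(d, 3),
)
criterion = nn.CrossEntropyLoss()
# logits = classifier(z); l_ce = criterion(logits, labels)
```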

Model Training
Algorithm 1 presents the training process of our method. We optimize the whole target-oriented relation alignment method by minimizing the overall loss function $\mathcal{L}$, consisting of the cross-entropy loss $\mathcal{L}_{ce}$ and the combined contrastive alignment loss $\mathcal{L}_{con}$. Formally, $\mathcal{L}_{con}$ is defined as follows:
$$\mathcal{L}_{con} = \underbrace{\alpha \mathcal{L}_s + \gamma \mathcal{L}_t}_{\text{in-language alignment}} + \underbrace{\beta \mathcal{L}_{cross}}_{\text{cross-language alignment}} \tag{21}$$
where $\alpha$, $\beta$, and $\gamma$ are trade-off hyperparameters for balancing the different losses.
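Putting the pieces together, one training step might look as follows; the pairing of $\beta$ with $\mathcal{L}_{cross}$ follows our reading of the hyperparameter analysis in the experiments and may differ from the authors' code:

```python
# One optimization step with the combined objective (sketch).
alpha, beta, gamma = 0.7, 0.6, 0.7
l_con = alpha * l_src + gamma * l_tgt + beta * l_cross
loss = l_ce + l_con
optimizer.zero_grad()
loss.backward()
optimizer.step()
```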

Experiments
Datasets
X-Stance (Vamvas and Sennrich, 2020) is a multilingual stance dataset in German, French, and Italian (the last with no training data), in which German is used as the src language and French as the tgt language. We focus on two political domains in X-Stance, "Foreign Policy" and "Immigration", with 31 targets in total. Based on them, we construct two datasets with different target settings for our experiments. X-Stance-all contains 5,926 and 2,582 texts in the src and tgt languages with complete overlap of all 31 targets. X-Stance-partial contains 3,406 and 1,806 texts in the src and tgt languages with partial overlap of targets. More details on datasets and targets are provided in Appendix A.
Multilingual Political Dataset (Lai et al., 2020) comprises 4 datasets, including two election datasets and two other datasets that contain only one target. We use the two election datasets for our experiments, with English and French as the src and tgt languages, respectively. Election-none contains 1,691 and 1,116 texts in the src and tgt languages. The targets in src are Hillary Clinton and Donald Trump, while Emmanuel Macron and Marine Le Pen are the targets in tgt.

Experimental Settings
In our experiments, we use mBERT to extract 768-dimensional textual representations. The threshold $\theta_0$ for the adjacency matrix is set to 0.4. For the Top K in target relation calculation, $k$, $k_s$, and $k_t$ are set to 10, 10, 4 for X-Stance-all; 10, 8, 5 for X-Stance-partial; and 2, 1, 1 for Election-none. $\tau$ is set to 0.3 in the target relation alignment loss. For the trade-off hyperparameters, $\alpha$, $\beta$, and $\gamma$ are set to 0.7, 0.6, and 0.7, respectively. All parameters are optimized by Adam (Kingma and Ba, 2015) with a learning rate of 2e-5, and a batch size of 64 for X-Stance-all and 32 for X-Stance-partial and Election-none. We train the model for 15 epochs with early stopping.

Table 2: Experimental results of the comparative methods and TaRA on the three datasets. For each method, we report the average scores and standard deviations of 5 runs. The best performance for each evaluation metric is marked in bold, and † means that our proposed TaRA is statistically significantly better than the baselines (p < 0.05).
The whole method is implemented in PyTorch on an NVIDIA GeForce RTX 3090. The mBERT model is the Multilingual Cased BERT-Base model, which has 12 layers, 768 hidden dimensions, and 12 attention heads, with about 110M parameters, and is implemented in the Transformers framework. For the three datasets, the running time is around 1 GPU hour.
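For reference, the settings above can be collected into a single configuration sketch (field names are illustrative):

```python
# Hyperparameters reported for X-Stance-partial, gathered in one place.
config = dict(
    encoder="bert-base-multilingual-cased",  # mBERT, d = 768
    theta_0=0.4,                             # similarity threshold for A*
    top_k=dict(cross=10, src=8, tgt=5),      # k, k_s, k_t
    tau=0.3,                                 # contrastive temperature
    alpha=0.7, beta=0.6, gamma=0.7,          # loss trade-offs
    optimizer="adam", lr=2e-5,
    batch_size=32, max_epochs=15, early_stopping=True,
)
```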

Comparison Methods
We select the following methods for cross-lingual tasks as comparative methods. (1) ADAN (Chen et al., 2018) is an adversarial method for cross-lingual sentiment classification; (2) CLA (Mohtarami et al., 2019) aligns the representations in the two languages with contrastive language adaptation; (3) ACLR (Lin et al., 2022) improves the alignment method of CLA by devising two different alignments for the src and tgt languages, respectively, for cross-lingual rumor detection; (4) mBERT-FT (Devlin et al., 2019) fine-tunes the language model mBERT with the training data.
In addition, we also choose the following monolingual stance detection methods and adapt them to the cross-lingual stance detection task by replacing the original word embeddings with the hidden vectors of mBERT. (1) BiCond (Augenstein et al., 2016) incorporates target representations into text representations with bidirectional conditional LSTMs; (2) TAN (Du et al., 2017) learns target-specific representations with an attention mechanism; (3) TGMN (Wei et al., 2018) utilizes a multi-hop memory network to obtain implicit clues for stance detection.

Main Results
We use accuracy and the average F1 score of "Favor" and "Against" as the evaluation metrics. Table 2 gives the experimental results of the comparative methods and our proposed TaRA on the three datasets. Our method outperforms all the baseline methods on the three datasets. In general, cross-lingual methods perform better than monolingual methods, indicating the importance of knowledge transfer across languages for cross-lingual tasks. Among the cross-lingual methods, fine-tuned mBERT performs relatively well on the three datasets, benefiting from the strength of the pre-trained language model. ACLR and CLA perform better among the cross-lingual methods, demonstrating the advantage of cross-lingual alignment with contrastive learning. More importantly, the results in Table 2 reveal that as the number of targets shared between languages decreases, the improvements of our method over the suboptimal methods become greater. Specifically, TaRA achieves 1.5%, 1.9%, and 4.2% gains in F1 score over the suboptimal results on X-Stance-all, X-Stance-partial, and Election-none, respectively. These results verify the effectiveness of our relation alignment method for dealing with target inconsistency.

Ablation Study
Table 3 gives the ablation results of all the variants of our proposed TaRA on the three datasets. Removing the in-language target relation graphs $G^s$ and $G^t$ decreases the performance, showing the necessity of incorporating in-language target relations to provide preliminary information for the cross-lingual relation graph. Removing the cross-lingual relation graph $G$ causes larger drops in performance, indicating that cross-lingual target relations play a more crucial role in knowledge transfer.

Table 3: Ablation results of all the variants of our proposed TaRA on the three datasets.

Regarding the learning objectives, the F1 scores decline without the target relation alignment within the language ($\mathcal{L}_s$, $\mathcal{L}_t$) or across languages ($\mathcal{L}_{cross}$), demonstrating the effectiveness of our proposed relation alignment strategies and the validity of the model optimization.
Furthermore, Table 3 also shows the results of the variants of the relation alignment strategies. Excluding the "target relation" condition has a greater influence in the target inconsistency cases (X-Stance-partial, Election-none) than excluding the "language" condition, whereas removing the "language" condition has a greater impact in the target consistency case (X-Stance-all) than removing the "target relation" condition. This indicates that, under target inconsistency, alignment between semantically correlated targets across languages is more effective than alignment on language alone.

Analysis of Hyperparameters
Impact of Top K in the Target Relation Graph. The value of $k$ in the Top K operation determines the number of related targets. We conduct experiments on X-Stance-partial. As shown in Figure 2, the performance is low at the beginning: when $k$ (or $k_s$, $k_t$) is small, the related targets are rather few, so most samples in the mini-batch have no target-related positive samples. As the value of $k$ (or $k_s$, $k_t$) increases, the performance gradually improves until it reaches a peak. After that, further increasing $k$ (or $k_s$, $k_t$) leads to too many positive samples with low correlations, which weakens the contrastive ability.

Analysis of Trade-off Hyperparameters
We use three hyperparameters $\alpha$, $\beta$, and $\gamma$ to balance the different losses in our method. We set $\alpha = 0.7$ empirically. As shown in Figure 3, we conduct experiments to analyze the influence of different values of $\beta$ and $\gamma$ on X-Stance-partial. Our method performs best when $\beta$ and $\gamma$ are around 0.6 to 0.8. When $\beta$ is greater than 0.5, the model achieves higher performance because it pays more attention to the cross-lingual relation alignment, so that the shared knowledge between correlated targets can be transferred across languages. However, when $\beta$ is too large, the drop in performance indicates that in-language target relations are the basis for cross-lingual relation learning.

Visualization of Representations
We compare the representations learned by our method TaRA with those of the baseline methods mBERT-FT and CLA on the test set of Election-none. We use t-SNE to visualize them on a two-dimensional plane, as shown in Figure 4. As seen in the left column, the representations of mBERT-FT and CLA overlap between the Favor and Against samples, while our method clearly separates the three classes of samples with higher in-class compactness. Comparing the predicted results visualized in the left column with the ground truth in the right column, our method yields fairly consistent results. This shows that our method can better handle target inconsistency with fine-grained alignment strategies at both the target and language levels.

Conclusion
We have identified, for the first time, the issue of target inconsistency in cross-lingual stance detection and proposed a target-oriented relation alignment method, TaRA, which considers associations and alignments at both the target and language levels. Our method explores the in-language and cross-language associations of targets via target relation graphs, and aligns samples between highly correlated targets within and across languages through fine-grained relation alignment strategies. Experimental results demonstrate the effectiveness of our method for cross-lingual stance detection.

Limitations
For the Top K operation in the target relation calculation, we set three hyperparameters (i.e., $k_s$, $k_t$, and $k$) to determine the number of related targets. To fully explore the influence of the selection of $K$ on model performance, a grid search over these three hyperparameters would need to be conducted to iterate over every combination. However, due to time and resource limits, we explore the impact of one hyperparameter at a time while keeping the other two fixed. Based on the empirical findings from this procedure, we then set the values of $K$ so as to achieve an appropriate performance.

Figure 1: The overall architecture of our proposed method TaRA.

Figure 2: Impact of the hyperparameter $k$ in the Top K calculation on X-Stance-partial.

Figure 4: Visualization of representations. Red, blue, and green points denote the samples of the Favor, Against, and None classes, respectively.

Table 1: An example of cross-lingual stance detection. The original French text with its target is presented along with an English translation.