SocCogCom at SemEval-2020 Task 11: Characterizing and Detecting Propaganda Using Sentence-Level Emotional Salience Features

This paper describes a system developed for detecting propaganda techniques in news articles. We focus on examining how emotional salience features extracted from a news segment can help to characterize and predict the presence of propaganda techniques. Correlation analyses surfaced interesting patterns: for instance, the "loaded language" and "slogans" techniques are negatively associated with valence and joy intensity but positively associated with anger, fear, and sadness intensity, whereas "flag waving" and "appeal to fear-prejudice" show the exact opposite pattern. Predictive experiments further indicate that whereas BERT-only features obtained an F1-score of 0.548, hybrid features combining emotion intensity with BERT obtained an F1-score of 0.570, with a simple feedforward network as the classifier in both settings. On the gold test data, our system obtained a micro-averaged F1-score of 0.558 for overall detection efficacy over fourteen propaganda techniques. It performed relatively well in detecting "loaded language" (F1 = 0.772), "name calling and labeling" (F1 = 0.673), "doubt" (F1 = 0.604), and "flag waving" (F1 = 0.543).


Introduction
Propaganda is studied in a wide range of social science disciplines, including social psychology, political science, media and mass communication, as well as advertising and marketing (Davison, 1971; Taylor, 2002; Balfour, 1979; McGarry, 1958). As Jowett and O'Donnell (2018) put it, propaganda is a "deliberate, systematic attempt to shape perceptions, manipulate cognitions, and direct behavior to achieve a response that furthers the desired intent of the propagandist". To achieve this agenda, propagandists may use various influence techniques such as loaded emotive language and flag waving. Such techniques center on influencing the audience's opinions and behaviors through psychological and rhetorical tricks, for instance to promote a particular politician or product in political or marketing campaigns.
The ability to automatically detect propaganda has important societal implications. For news management, propaganda detection may help publishers quickly identify news articles that exhibit propagandistic characteristics severely deviating from journalism principles. For the general public, such tools may help social media users stay alert to potentially propagandistic content, which often leverages non-obvious psychological tricks, and may help mitigate the propagation of such content.
We participated in Task 11 on the detection of propaganda techniques in news articles (Da San Martino et al., 2020a), in particular the Technique Classification (TC) task, a multi-class classification task that aims to label each identified text segment with one of fourteen propaganda techniques (Da San Martino et al., 2020a). Appendix A provides a summary and a distribution analysis of this task. This text segment-based ground truth data presents an advancement to this line of study, allowing an algorithm not only to identify the existence of propaganda, but also to name the specific techniques. Examples include:

1. "stop those refugees; they are terrorists" ["appeal to fear-prejudice"]
2. "the best of the best" ["exaggeration,minimisation"]
3. "Entering this war will make us have a better future in our country" ["flag waving"]
4. "a lone lawmaker's childish shouting" ["loaded language"]
5. "Republican congressweasels" ["name calling,labeling"]
6. "Make America great again!" ["slogans"]

To extract the sentence-level emotional salience features in the news segments, we leveraged Gupta and Yang (2018)'s work, which trains a collection of SVM-based algorithms, named CrystalFeel, that detect the intensities of five emotion dimensions present in a given text message: sentiment valence, joy, anger, fear, and sadness (Gupta and Yang, 2018). As the key purpose of propaganda is to influence or persuade audiences, our main design hypothesis is that sentence-level emotional salience features will help to characterize the most commonly used propaganda techniques that involve a degree of emotional connotation in their language manifestations.


Related Work

Propaganda detection. Rashkin et al. (2017) compared the linguistic patterns, e.g., psycholinguistic features from LIWC, sentiments, hedging words, and intensifying words, across four categories of news: propaganda, trusted news, hoax, and satire.
They found interesting linguistic differences in the three "fake" news categories vis-à-vis trusted news, though the predictive experiments showed that LIWC features do not improve over the neural model in terms of predictive performance, probably because "some of this lexical information is perhaps redundant to what the model was already learning from the text" (Rashkin et al., 2017). Rashkin et al. (2017) focused on word-level or lexical linguistic features; none of the existing work has explored the value of sentence-level sentiment and emotion intensity features in the context of propaganda detection.
Emotion intensity detection and analysis. Classic sentiment analysis typically provides classification results for discrete sentiment categories (e.g., positive, negative, neutral) or emotion classes (e.g., happy vs. not happy, sad vs. not sad). Emotion intensity analysis is a relatively new development that aims to predict the degree or intensity of the underlying emotional valence and dimensions in text messages such as tweets (Mohammad and Bravo-Marquez, 2017; Mohammad et al., 2018). Gupta and Yang (2018) trained CrystalFeel with features derived from parts of speech, n-grams, word embeddings, multiple existing affective lexicons, and an in-house emotion intensity lexicon to predict the degree of intensity associated with fear, anger, sadness, and joy in tweets. On an out-of-training sample of human annotations, its predictions achieved a Pearson correlation coefficient (r) of .816 on sentiment intensity, and of .708, .740, .700, and .720 on the joy, anger, fear, and sadness intensities, respectively (Gupta and Yang, 2018).

Correlation Analysis
To gain an exploratory understanding of the usefulness of the emotional salience features, we performed bivariate correlation analyses between each of the propaganda ground truth labels for the 1,043 text segments in the development set and the emotion intensity scores derived from CrystalFeel. Table 2 reports the correlation results. The non-parametric Kendall's τ was used for the correlation test because the ground truth is a dichotomous variable (1 indicates the propaganda technique is present in the text; 0 indicates otherwise). Results indicate interesting patterns: "loaded language", "flag waving", "slogans", "appeal to fear-prejudice", and a total of twelve propaganda techniques are significantly correlated with at least one of the emotion intensity scores (**p < 0.01, *p < 0.05, n = 1,043).
Two propaganda techniques, "black-and-white fallacy" and "causal oversimplification", are not found to be correlated with any of the emotion intensity scores. Note that these techniques also occur less frequently in the dataset (gold labels < 3%; see Appendix A) and are not conceptually associated with emotional connotation or emotional appeal by definition.
The results provide initial support for our main design intuition, and also imply that an emotional salience-based system is likely to be effective in detecting emotion-associated (but not non-emotion-associated) propaganda techniques.
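The correlation test above can be sketched with SciPy. The arrays below are illustrative placeholders, not the actual development-set data: one dichotomous label vector for a single technique and one CrystalFeel intensity score per segment.

```python
# Sketch of the bivariate Kendall's tau test between a dichotomous
# propaganda label and one emotion intensity dimension (toy data).
from scipy.stats import kendalltau

labels = [1, 0, 1, 1, 0, 0, 1, 0]           # 1 = technique present
intensity = [0.71, 0.32, 0.65, 0.80, 0.41,  # e.g., anger intensity in [0, 1]
             0.28, 0.59, 0.35]

tau, p_value = kendalltau(labels, intensity)
print(f"Kendall's tau = {tau:.3f}, p = {p_value:.3f}")
```

In practice this test would be repeated for each of the fourteen techniques against each of the five intensity scores, as summarized in Table 2.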

System Overview
Following the exploratory analysis, we proceeded to design a predictive system named "SocCogCom". Our SocCogCom system is designed to determine the specific propaganda technique used in a given text segment from news articles. The possible techniques are the fourteen defined in the official SemEval-2020 Task 11 description paper (Da San Martino et al., 2020a). Figure 1 depicts the system architecture. Each instance consists of an input text segment x ∈ R^n and a propaganda technique label associated with the text: y ∈ {14 techniques}, where x is a sequence of words represented in the order of appearance in the vocabulary.
Feature Extraction: For every input text segment, our system extracts the following features:

1. BERT features: sentence-level embeddings (b_f) (Devlin et al., 2018). This is a set of pre-trained sentence-level embedding features with a total of 1,024 dimensions.
2. CrystalFeel features: sentence-level emotional salience features (c_f) (Gupta and Yang, 2018). The extracted features for each text segment comprise five dimensions of emotion intensity.
3. LIWC features: word-level psycholinguistic features (l_f) from the LIWC lexicon (Pennebaker et al., 2015). We obtained 73 extracted features that represent psycholinguistic characteristics of a piece of text that may involve a propaganda technique.
Fusion Layer: The CrystalFeel and LIWC features obtained above are first concatenated, and a dense layer is applied over the concatenated vector to obtain a feature vector, h_f, of dimension d_h = 50. This projects the features extracted from CrystalFeel and LIWC into a latent space similar to that of the BERT features. The extracted features, b_f and h_f, are then concatenated to form the representation z_f = [b_f; h_f] of dimension d_in = 1074. A dense layer with 256 dimensions is applied over z_f. Finally, the representation o_f is obtained by applying a dropout layer (Srivastava et al., 2014) with a dropout rate of 0.5.
Output Layer: The fused representation o_f is fed into a fully-connected layer with softmax activation.
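The feature fusion and output layers above can be sketched as a plain NumPy forward pass. The weights are random placeholders and the tanh activations are assumptions (the text does not specify the hidden activations); the real system trains these layers in a deep learning framework.

```python
# Minimal NumPy sketch of the fusion and output layers (inference only;
# dropout is omitted at inference time). Dimensions follow the text:
# BERT b_f (1024-d), CrystalFeel c_f (5-d), LIWC l_f (73-d).
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

b_f = rng.normal(size=1024)   # BERT sentence embedding
c_f = rng.normal(size=5)      # emotion intensity features
l_f = rng.normal(size=73)     # LIWC psycholinguistic features

# Fusion: project [c_f; l_f] (78-d) to a 50-d latent space, then
# concatenate with the BERT embedding into z_f (1074-d).
W1, b1 = rng.normal(size=(78, 50)), np.zeros(50)
h_f = np.tanh(dense(np.concatenate([c_f, l_f]), W1, b1))
z_f = np.concatenate([b_f, h_f])

# Dense(256) followed by a softmax over the 14 techniques.
W2, b2 = rng.normal(size=(1074, 256)) * 0.05, np.zeros(256)
W3, b3 = rng.normal(size=(256, 14)) * 0.05, np.zeros(14)
o_f = np.tanh(dense(z_f, W2, b2))
probs = softmax(dense(o_f, W3, b3))
print(probs.shape, round(probs.sum(), 6))   # (14,) 1.0
```

The predicted technique is the argmax over the 14 softmax probabilities.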
Loss Function: Categorical cross-entropy is used to calculate the loss, which we minimize with an optimizer:

L = -(1/N) Σ_{n=1}^{N} Σ_{k=1}^{c} y_k^n log(ŷ_k^n)

where N is the total number of samples and c is the number of classes (in our case, 14). y_k^n is the actual label of the k-th class of the n-th sample and ŷ_k^n is the prediction corresponding to the k-th class of the n-th sample.
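As a quick numeric check of the categorical cross-entropy, consider a toy batch with N = 2 samples and c = 3 classes (the actual system uses c = 14):

```python
# Toy computation of categorical cross-entropy over a batch of 2 samples.
import numpy as np

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])        # one-hot gold labels
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])  # softmax outputs

# Average of -log(probability assigned to the correct class).
loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
print(round(loss, 4))   # 0.2899
```

Only the probability assigned to the gold class contributes, since y_k^n is zero elsewhere.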

Features Experiments and Results
For data pre-processing, we used the Keras Tokenizer to split the text into word tokens. The sentences are cleaned to remove unwanted characters, and double spaces are replaced with a single space.
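The clean-up step might look like the following regex-based sketch; the exact character set kept is an assumption, and the actual system uses the Keras Tokenizer for word splitting.

```python
# Illustrative text clean-up: drop unwanted characters and collapse
# repeated whitespace into a single space before tokenization.
import re

def clean(text):
    text = re.sub(r"[^\w\s.,!?'\"-]", " ", text)  # drop unwanted characters
    text = re.sub(r"\s{2,}", " ", text)           # collapse double spaces
    return text.strip()

tokens = clean("Make  America   great again!").split()
print(tokens)   # ['Make', 'America', 'great', 'again!']
```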
We conducted the feature experiments using the standard training and development datasets provided in the official TC task, based on the system setup described in Section 3. Hyper-parameters were tuned using held-out validation data: 10% of the training data. To optimize the parameters, we used the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 1e-4. The experiment results are presented in Table 3.

Model + Features                Micro-averaged F1 score
Logistic Regression             0.2520
BERT Only                       0.5485
CrystalFeel Only                0.5234
BERT + CrystalFeel              0.5701
BERT + CrystalFeel + LIWC       0.5626
ALBERT + CrystalFeel            0.5588
BERT + CrystalFeel + Context    0.5824

First, we evaluated the effects of using BERT features and emotional salience features from CrystalFeel outputs alone. BERT alone obtained a micro-averaged F1 score of 0.5485, showing strong performance in comparison to a simple logistic regression baseline. CrystalFeel features alone achieved 0.5234, a fair performance given that this is a low-dimensional feature set. When combined, BERT and CrystalFeel features achieved better results than either individual setting, with a micro-averaged F1 score of 0.5701.
We also assessed classic word-level psycholinguistic features based on the LIWC lexicon. The BERT + LIWC condition did not converge: the loss did not decrease and fluctuated substantially. Adding LIWC to the hybrid features, i.e., the BERT + CrystalFeel + LIWC condition, obtained a micro-averaged F1 score of 0.5626, indicating that additional word-level psycholinguistic features do not appear to improve over the BERT + CrystalFeel condition. We also tested ALBERT + CrystalFeel, which did not match the results obtained in the BERT + CrystalFeel condition.
Based on the experiment results, we used the best-performing hybrid feature set (BERT + CrystalFeel) for our system results submission on the gold test set.
After we submitted our results, we experimented with a new condition in which context features were added to the BERT + CrystalFeel condition. For context, we extracted features from the 3 words before and after the target text segment. The results showed further improvement (micro-averaged F1 score = 0.5824).
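The context extraction can be sketched as follows; the helper name and the word-based windowing are illustrative assumptions about how a 3-word window around a character span might be gathered.

```python
# Illustrative extraction of the k words before and after a target text
# segment, given its character offsets within the article.
def context_window(article, start, end, k=3):
    """Return the k words before and after the article[start:end] span."""
    before = article[:start].split()[-k:]
    after = article[end:].split()[:k]
    return before, after

article = "He said that entering this war will make us have a better future"
start = article.find("entering this war")
end = start + len("entering this war")
print(context_window(article, start, end))
# (['He', 'said', 'that'], ['will', 'make', 'us'])
```

Features for the two context windows would then be extracted and concatenated alongside the segment-level features.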

Results on Gold Test Set
Overall, on the gold test set, the results released by the task organizers show that our system achieved a micro-averaged F1 score of 0.558 across the fourteen propaganda techniques. The results suggest that, using a relatively parsimonious feature set (BERT and CrystalFeel emotional salience features), our system performed reasonably well (F1 score > 0.5) in detecting "loaded language" (F1 = 0.772), "name calling and labeling" (0.673), "doubt" (0.604), and "flag waving" (0.543). Meanwhile, our system struggled in detecting non-emotion-associated techniques (which also happen to have imbalanced distributions), such as "causal oversimplification" (F1 = 0.063), "bandwagon,reductio ad hitlerum" (F1 = 0.098), and "whataboutism,straw men,red herring" (F1 = 0.100). The results also support our design intuition that the sentiment and emotion intensity features help to detect propaganda techniques that manifest emotional salience in the text segment.
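For reference, micro-averaged F1 pools true positives, false positives, and false negatives over all classes before computing precision and recall; in single-label multi-class classification, as in the TC task, this reduces to accuracy. A minimal sketch with toy labels:

```python
# Micro-averaged F1 for single-label multi-class predictions (toy data).
def micro_f1(gold, pred):
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = len(pred) - tp   # each wrong prediction is a false positive ...
    fn = len(gold) - tp   # ... for one class and a false negative for another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = ["loaded_language", "doubt", "flag_waving", "doubt", "slogans"]
pred = ["loaded_language", "doubt", "flag_waving", "slogans", "slogans"]
print(round(micro_f1(gold, pred), 4))   # 0.8
```

The per-technique F1 scores reported above are instead computed one class at a time.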

Conclusion
Propaganda is primarily information used to advance an agenda through influence techniques. Our work is motivated by exploring the value of emotional salience features in predicting emotion-related propaganda techniques. In our experiments, we found that emotional salience features based on CrystalFeel emotion intensity scores improve over BERT-only features when a simple feedforward neural network is used in both experiment settings. Results and analysis on the gold test dataset show that our approach performed reasonably well (F1 > 0.5) in detecting the "loaded language", "name calling and labeling", "doubt", and "flag waving" techniques. As these are also the most frequently used techniques, our system has potential value in alerting publishers and the general public to these common techniques. The system scripts are released at https://github.com/gangeshwark/PropagandaNews.