YNU-HPCC at SemEval-2020 Task 11: LSTM Network for Detection of Propaganda Techniques in News Articles

This paper summarizes our work on detecting propaganda techniques in news articles for SemEval-2020 Task 11. The task is divided into the span identification (SI) and technique classification (TC) subtasks. We used GloVe word representations, the BERT pretrained model, and an LSTM architecture to accomplish this task. Our approach achieved a macro-F1-score of 0.406 on the SI subtask and a micro-F1-score of 0.505 on the TC subtask. Our method significantly outperforms the officially released baseline, and our SI and TC submissions ranked 17th and 22nd, respectively, on the test set. This paper also compares the performance of different model architectures, such as the Bi-LSTM, LSTM, BERT, and XGBoost models, on the detection of propaganda techniques in news articles.


Introduction
Propaganda techniques place great importance on arousing the emotions of the receivers, sometimes even by temporarily bypassing their intellectual defenses (Pearlin and Rosenberg, 1952). Propaganda uses psychological and rhetorical techniques to achieve its purpose. Such techniques include using logical fallacies and appealing to the emotions of the audience. Logical fallacies are usually hard to spot since the argumentation, at first, might appear correct and objective (Martino et al., 2019). However, careful analysis shows that the conclusion cannot be drawn from the premise without misusing logical rules. Another set of techniques uses emotional language to induce the audience to agree with the speaker based only on the emotional bond that is being created, provoking the suspension of any rational analysis of the argumentation.
Traditional NLP approaches generally classify and detect propaganda at the article level, which often fails to meet more detailed requirements. This has also been confirmed by previous iterations of the SemEval competition, where leading solutions used convolutional neural networks (CNN), long short-term memory (LSTM) (Baziotis et al., 2018) and transfer learning techniques (Duppada et al., 2018). The main features of an article are extracted through the feature capture and pooling of the CNN model, but these methods can only be used at the article level and are coarse-grained detection methods. Limited research has focused on text classification (Lewis, 1995; Song et al., 2010). News articles have also been classified using the Bi-LSTM-CNN model (Li et al., 2018). However, one article often contains many propaganda techniques, and most of these methods are effective for propaganda classification but lack the ability to detect the categories of propaganda techniques. Thus, they cannot achieve good results and are less efficient in practice. The difficulty now is to detect propaganda techniques at a fine-grained level. SemEval-2020 Task 11, "Detection of Propaganda Techniques in News Articles", is designed to promote research on this task. We used pretrained word embeddings and an LSTM model to detect propaganda techniques in news articles at a fine-grained level, and we also evaluated the performance of different neural network models on this task.
The task consists of two subtasks.
1. Span Identification (SI): Given a plain-text document, identify those specific fragments that contain at least one propaganda technique (Da San Martino et al., 2020). This is a binary sequence tagging task. We need to detect which fragments of the news article use a propaganda technique and mark each fragment with a begin offset and an end offset. The span ranges from begin offset to end offset-1.
2. Technique Classification (TC): The purpose of this subtask is to identify the category of the propaganda technique. Given a text fragment identified as propaganda and its document context, identify the applied propaganda technique in the fragment. Since there are overlapping spans, formally, this is a multilabel multiclass classification problem. However, whenever a span is associated with multiple techniques, the input file will have multiple copies of such fragments; therefore, the problem can be treated as a multiclass classification problem. The techniques include Appeal to Authority, Appeal to fear-prejudice, Black-and-White Fallacy, and so on.
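The reduction from a multilabel to a multiclass problem described in the TC subtask can be sketched as follows. This is only an illustration: the function name `expand_multilabel`, the tuple layout, and the offsets are hypothetical and do not reproduce the official input file format.

```python
def expand_multilabel(spans):
    """Return one (article_id, begin, end, technique) row per technique.

    A span annotated with several techniques is copied once per technique,
    so every resulting row carries exactly one class label.
    """
    rows = []
    for article_id, begin, end, techniques in spans:
        for technique in techniques:
            rows.append((article_id, begin, end, technique))
    return rows


# Hypothetical annotations: the second span carries two techniques.
spans = [
    ("111111111", 10, 25, ["Appeal to Authority"]),
    ("111111111", 40, 72, ["Appeal to fear-prejudice", "Black-and-White Fallacy"]),
]
rows = expand_multilabel(spans)
for row in rows:
    print(row)
```

After the expansion, the two copies of the second span differ only in their label, which is what allows an ordinary multiclass classifier to be trained on the rows.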
The rest of the paper is organized as follows. Section 2 describes the details of the LSTM used in our system. Section 3 presents the experimental results. Conclusions and future work are described in Section 4.

System Description
We implemented an LSTM model to accomplish this task. The representations of the input words are trained using the GloVe model (Pennington et al., 2014).

Span Identification (SI)
For the SI subtask, we need to detect which fragments of a news article use a propaganda technique. The SemEval organizers provided 371 training articles. The data were plain-text files, and the SI task was to identify the specific fragments that contained at least one propaganda technique. To detect propaganda techniques in news articles, we tested several deep learning models and ensemble architectures (Chen and Guestrin, 2016). For the SI subtask, we experimented with GloVe-BiLSTM (Li et al., 2017; Luo et al., 2018; Cross and Huang, 2016), BERT-LSTM, GloVe-LSTM and BERT-BiLSTM (Agirre et al., 2016; MacAvaney et al., 2018). As illustrated in Figure 1, our model includes an embedding layer, an LSTM layer, a fully connected layer and an output layer. First, the embedding layer represents every word using pretrained word embeddings. The LSTM layer is then used to obtain contextual information. The hidden vector produced by each LSTM cell is fed into a dense layer with the sigmoid activation function, which lets us decide whether each word is propaganda or not. Finally, we record the indices of the propaganda words and recover the propaganda fragments.
Based on our experimental results, we can conclude that the LSTM model with GloVe word embeddings obtained the best performance on this subtask. For the embedding layer, we used GloVe to train the word embeddings (Papagiannopoulou and Tsoumakas, 2018). The input tokens were obtained by applying the NLTK toolkit to the given articles. The word embeddings trained by GloVe are then fed to an LSTM layer. In an LSTM, the recurrent cells are connected in a special way to avoid vanishing and exploding gradients; the number of hidden nodes in the LSTM layer is set to 150. We find that the Bi-LSTM model (Ma and Hovy, 2016) does not perform well on this task. Next, the features captured by the LSTM layer are flattened and passed to a hidden dense layer, which analyzes the interactions among the obtained vectors; the size of the dense layer is set to 8. The dropout rate of the dense layer is set to 0.2 to prevent overfitting.
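The last step of this pipeline, turning per-word decisions into fragments, can be sketched as below. This is a minimal illustration under stated assumptions: a whitespace tokenizer stands in for the NLTK tokenization, offsets are taken to be character positions, the threshold of 0.5 and the probabilities are invented for the example, and the function names are hypothetical.

```python
import re


def tokenize_with_offsets(text):
    """Whitespace tokenizer returning (token, begin, end), with end exclusive."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def tags_to_spans(tokens, probs, threshold=0.5):
    """Merge runs of consecutive tokens whose probability exceeds the threshold."""
    spans = []
    current = None  # [begin, end] of the span being built, if any
    for (_, begin, end), p in zip(tokens, probs):
        if p > threshold:
            if current is None:
                current = [begin, end]
            else:
                current[1] = end  # extend the open span to this token
        elif current is not None:
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans


text = "They will destroy our great nation unless we act"
tokens = tokenize_with_offsets(text)
# Hypothetical sigmoid outputs, one per token.
probs = [0.1, 0.2, 0.9, 0.8, 0.7, 0.95, 0.3, 0.1, 0.2]
spans = tags_to_spans(tokens, probs)
for begin, end in spans:
    print(begin, end, repr(text[begin:end]))
```

Each returned pair follows the task convention: the fragment covers positions begin offset through end offset-1.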
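For reference, the special connection of the recurrent cells can be written out with the standard LSTM update equations. This is a sketch of the textbook formulation, not a statement of the exact variant used in our system. The symbols are assumptions introduced here: $x_t$ is the word embedding of token $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W_\ast$, $U_\ast$, $b_\ast$ are learned parameters.

```latex
\begin{align}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{align}
```

The additive form of the cell state update is what is usually credited with easing gradient flow across many time steps.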

Technique Classification (TC)
The TC subtask is a multiclass classification task that extends the SI subtask: it seeks to classify the propaganda techniques used in the spans identified by the SI subtask. Such techniques include the use of logical fallacies and appeals to the emotions of the audience. Logical fallacies are usually hard to spot since the argumentation, at first, might seem correct and objective. For the TC subtask, we obtain the sentence representations in two ways. The first feeds pretrained word embeddings into the LSTM model and treats the last hidden vector as the sentence representation. The second uses a pretrained language model, such as BERT, to directly produce sentence representations. We also compare the effects of softmax and XGBoost (Mitchell and Frank, 2017) on the classification task. Comparing the experimental results, we conclude that the BERT-LSTM model obtains good performance on the TC subtask. The detailed analysis of the experimental results of the TC subtask is presented in Section 3; here we introduce the selected model and its parameter settings. The BERT pretrained language model has proved effective in many NLP tasks, and the pretrained model used in our experiment is BERT-Base. Therefore, we use BERT to produce the word representations for the TC subtask. As illustrated in Figure 2, the word representations are obtained through the bert-as-service library. The 768-dimensional word embeddings produced by BERT are fed to the LSTM layer, and the number of hidden nodes in the LSTM layer is set to 50. As in the SI subtask, the features captured by the LSTM layer are flattened and passed to the hidden dense layer, and the size of the dense layer is set to 32. The dropout rate of the dense layer is set to 0.2 (Srivastava et al., 2014).
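The softmax classification step applied to a sentence representation can be illustrated with a small pure-Python sketch. Everything numeric here is an assumption made for the example: the 4-dimensional vector stands in for the real sentence representation, the weights and biases are invented, and only three of the technique labels are used.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(vector, weights, bias):
    """Affine map producing one score per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


labels = ["Appeal to Authority", "Appeal to fear-prejudice", "Black-and-White Fallacy"]

# Hypothetical sentence representation (e.g. a last hidden vector) and parameters.
sentence_vec = [0.5, -1.0, 2.0, 0.0]
weights = [
    [0.1, 0.2, 0.0, 0.3],
    [0.0, -0.5, 0.4, 0.1],
    [0.3, 0.1, -0.2, 0.0],
]
bias = [0.0, 0.1, -0.1]

probs = softmax(dense(sentence_vec, weights, bias))
predicted = labels[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The class with the highest probability is taken as the predicted technique for the fragment.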

Experimental Results
In this section, we first introduce the dataset of this task and then analyze the performance of different neural networks and ensemble learning models on it.

Dataset
The dataset contains 371 news articles in the training set, 75 in the development set, and 90 in the test set. An article may contain several propaganda spans. The beginning and ending positions of each span are marked by "begin offset" and "end offset", respectively. As illustrated in Table 1, the article "111111111" contains 3 propaganda spans, each ranging from "begin offset" to "end offset" minus one because positions in the article are indexed from zero.
The TC subtask requires us to recognize the technique used in a given propaganda span. The propaganda techniques and the corresponding text contents are shown in Table 2. The number of propaganda techniques is 18; therefore, the TC subtask is a multiclass classification task.
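The offset convention maps directly onto a half-open slice. The sketch below assumes that the offsets count characters; the article text, the offsets, and the helper name `extract_span` are invented for illustration.

```python
def extract_span(text, begin_offset, end_offset):
    """Return the fragment covering positions begin_offset .. end_offset - 1."""
    if not 0 <= begin_offset < end_offset <= len(text):
        raise ValueError("offsets out of range")
    # Python slices exclude the end index, matching the annotation convention.
    return text[begin_offset:end_offset]


# Hypothetical article text and annotation.
article = "Experts agree: vote now or lose everything."
fragment = extract_span(article, 15, 42)
print(repr(fragment))
```

The length of the fragment equals end offset minus begin offset, which is a quick sanity check when loading annotations.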

Evaluation Metrics
For both subtasks, the participating systems were evaluated using standard evaluation metrics, including the accuracy, precision, recall and F1-score, which are calculated as follows:

\[ \text{accuracy} = \frac{\text{true positives} + \text{true negatives}}{\text{total number of instances}} \tag{1} \]

\[ \text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \tag{2} \]

\[ \text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \tag{3} \]

The organizers provided baseline models for each subtask. For the SI subtask, the macro-F1-score of the baseline model was 0.011 on the development set and 0.003 on the test set. For the TC subtask, the micro-F1-score of the baseline model was 0.265 on the development set and 0.252 on the test set.
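The F1-score is the harmonic mean of precision and recall, F1 = 2 · precision · recall / (precision + recall). The sketch below shows how the micro and macro averages differ for single-label multiclass predictions. It is an illustration only, not the official scorer: the function names and the toy labels "A" and "B" are hypothetical.

```python
from collections import Counter


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """F1 computed from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f1(precision, recall)


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[g] += 1  # class g was missed
    labels = sorted(set(gold) | set(pred))
    # Macro: average the per-class scores, so every class counts equally.
    macro = sum(f1_from_counts(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts first, so every instance counts equally.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy labels, not taken from the dataset.
gold = ["A", "A", "A", "B"]
pred = ["A", "A", "B", "B"]
micro, macro = micro_macro_f1(gold, pred)
print(round(micro, 4), round(macro, 4))
```

In the single-label setting the pooled false positives and false negatives are both equal to the number of errors, so the micro-F1-score coincides with the accuracy.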

SI Subtask Results
After tuning the hyperparameters of the model, we chose the adaptive moment estimation (Adam) optimizer (Kingma and Ba, 2015) with the learning rate set to 0.01. The scores obtained with different learning rates for the LSTM model with GloVe word embeddings are shown in Figure 3.
On the development set, the LSTM model with GloVe word embeddings obtained a macro-F1-score of 0.423. On the test set, this method achieved an F1-score of 0.406. The comparative results are presented in Table 3. Our system ranked 17th out of 36 teams. The selected LSTM model with GloVe word embeddings significantly exceeded the baseline, which shows that the model performs well on this task and can detect the spans of propaganda techniques in news articles.
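To make the role of the learning rate concrete, the Adam update rule can be sketched on a one-dimensional toy problem. Only the learning rate of 0.01 comes from our setup. The decay rates `beta1` and `beta2`, the constant `eps`, the number of steps, and the quadratic objective are assumptions (the commonly used defaults and an invented function), and this is not the training code of our system.

```python
import math


def adam_minimize(grad, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function with Adam, given its gradient `grad`."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
x_final = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(round(x_final, 3))
```

Because the update is normalized by the running gradient magnitude, each early step moves the parameter by roughly the learning rate, which is why this value has a direct effect on how quickly training progresses.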

TC Subtask Results
TC is a multiclass classification task. As illustrated in Table 4, the distribution of the gold labels is rather imbalanced. Therefore, the official evaluation measure for the task is the micro-F1-score.
The number of propaganda techniques is 18, but Table 4 lists only 14 because some techniques are combined owing to insufficient data for them in the corpus (Da San Martino et al., 2020). Since there are overlapping spans, formally, it is a multilabel and multiclass classification problem. However, whenever a span is associated with multiple techniques, the input file contains multiple copies of that fragment; therefore, the problem can be algorithmically treated as a multiclass classification problem. We tried both GloVe and BERT to generate sentence embeddings, and the experimental results showed that the sentence embeddings produced by the pretrained BERT model were better. After training on the datasets provided for the TC task (Drissi et al., 2019; Deriu et al., 2016) with the model described above, the micro-F1-score was 0.561 on the development set and 0.505 on the test set. Our system ranked 22nd out of 31 teams.
In addition to the LSTM model, we also tested some machine learning architectures and ensemble learning methods. Because the BERT-LSTM model performs better on this task than the other models, we adopted it as our final model. The experimental results of the different models are shown in Table 5.

Conclusion
In this paper, we presented our system for SemEval-2020 Task 11, which leverages LSTM and pretrained word embeddings without using human-engineered features for representation learning. Our experimental results show that, among the neural network and ensemble models we compared, the LSTM model with GloVe word embeddings achieves better performance on this task. The main goal of this task is to detect propaganda techniques in news articles at a fine-grained level, not just to make coarse judgments about whether the news articles use propaganda techniques (Zhong and Miao, 2019). It is known that neural networks perform well on large training sets, but sometimes a large, accurately labeled dataset cannot be obtained. In future work, propaganda technique detection in news articles could be substantially improved through better pretrained models and ensemble model architectures.