IIITT at CASE 2021 Task 1: Leveraging Pretrained Language Models for Multilingual Protest Detection

In a world abounding in constant protests arising from events such as a global pandemic, climate change, and religious or political conflicts, there is a pressing need to detect protests before they are amplified by news or social media. This paper describes our work on the sentence classification subtask of multilingual protest detection at CASE@ACL-IJCNLP 2021. We approached this task by employing various multilingual pretrained transformer models to classify whether a sentence contains information about an event that has transpired. Soft voting over these models yielded our best results, achieving macro F1-scores of 0.8291, 0.7578, and 0.7951 in English, Spanish, and Portuguese, respectively.


Introduction
The recent surge in social media users has led many people to express their opinions on various global issues. These opinions travel far and wide within a matter of seconds (Hossny et al., 2018). They can influence many people and may spark public movements (Won et al., 2017a). There is therefore a definite need to detect such protests and analyse them to identify significant areas of public discontent.
Being a free and easy-to-use platform, social media has become a part of our day-to-day life. It brings together people of different ages, genders, locations, religions, backgrounds, and so on. This enormous, rich, and diversified user base generates a vast amount of information, which is helpful in many ways (Kapoor et al., 2018). Some of it even contains private information about users, which others could misuse. Cases have also been found where certain users were targeted and harassed on these platforms, a common scenario in cyberbullying (Abaido, 2020).
Social media plays a crucial role in amplifying these protests and movements (Won et al., 2017b). It enables political groups and protesters to organise protest movements and share information. It acts as a platform for underrepresented people by giving them a voice. It also offers new opportunities for people to engage in activism, political resistance, and protest outside political groups and civic institutions, and thus has a social impact on everyone (Pulido et al., 2018). It is to be noted that social media, like news media, plays a vital role in social and political events worldwide (Holt et al., 2013). For these reasons, we can state that social media plays a crucial role in most worldwide events.
The English language is widely regarded as the first lingua franca. Statistically, it is one of the most widely spoken languages globally, having official status in over 53 countries (Crystal, 2008). Over 400 million people speak English as their primary language, and it is widely spoken in the United States and the United Kingdom. BlackLivesMatter (Dave et al., 2020) and EarthDay (Rome, 2010) are some of the major protests that have occurred in these countries. Español, commonly referred to as Spanish, is spoken by over 360 million people worldwide, with most of its speakers residing in Mexico, Argentina, and Spain. The 15-M Movement (Casero-Ripollés and Feenstra, 2012) and YoSoy132 (García and Treré, 2014) are some of the recent protests in which people have been vocal in Spanish. Portuguese has over 220 million native speakers; Brazil, Portugal, and Angola are some of the major countries where it is spoken. Protests such as Racism Kills and May 68 (Ross, 2008) are ones voiced in the Portuguese language.
The recent upheavals of protests have been amplified by social media (Peng et al., 2013), which motivated us to participate in the shared task for multilingual protest detection (Hürriyetoglu et al., 2019a). The objective of the task is to identify whether a sentence mentions protests or events in three languages, namely English, Spanish, and Portuguese. Hence, we treat this as a sequence classification task. The rest of the paper is organized as follows: Section 2 presents previous work on protest detection and analysis. Section 3 provides a comprehensive analysis of the dataset used. Section 4 gives a detailed description of the models used for multilingual event detection. Finally, Section 5 analyses the results obtained, and Section 6 concludes our work while discussing potential directions for future work.

Related Work
The need to detect events that could lead to protests is of prime interest to sociologists and governments (Danilova et al., 2016). There are several active ongoing projects on socio-political event systems, such as KEDS (Kansas Event Data System) (Schrodt and Hall, 2006) and CAMEO (Conflict and Mediation Event Observation) (Gerner et al., 2002), as well as several other databases for protest detection systems (Danilova, 2015). These methods have focused on news data, as news has traditionally been the most reliable source of events. Protest detection has been one of the major issues in social and political contexts (Ettinger et al., 2017). Papanikolaou and Papageorgiou (2020) presented a computational social science methodology to analyse protests in Greece. A corpus of protest events comprising sources in various languages from various countries has also been constructed. Several systems were submitted to the CLEF ProtestNews Track, which consisted of three shared tasks primarily aimed at identifying and extracting event information spanning multiple countries (Hürriyetoglu et al., 2019b, 2020).

Dataset
This dataset comprises 26,208 sentences in three languages, namely English, Spanish, and Portuguese. The dataset consists of two classes:
• Event: The sentence indicates an event of the past.
• Not-event: The sentence does not talk about any event.
The volume of Not-event sequences is considerably higher than that of Event sequences, so the class distribution is quite imbalanced. We can also notice that the number of English samples far exceeds that of the Spanish and Portuguese ones. Refer to Table 1 for the exact distribution.
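One common way to counter such imbalance is to weight each class inversely to its frequency during training. The following is a minimal sketch of inverse-frequency class weighting; the per-class counts below are illustrative placeholders, not the actual Table 1 figures.

```python
# Inverse-frequency class weights: w_c = N / (K * n_c), where N is the
# total number of samples, K the number of classes, and n_c the count
# of class c.  Minority classes receive proportionally larger weights.
def class_weights(counts):
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

# Hypothetical split of the 26,208 sentences (not the published counts).
weights = class_weights({"not_event": 21000, "event": 5208})
assert weights["event"] > weights["not_event"]
```

These weights can then be passed to the loss function so that misclassifying a minority-class (Event) sample is penalized more heavily.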

DistilmBERT
DistilBERT (Sanh et al., 2019) is the distilled version of BERT. It is trained with a triple loss that combines masked language modelling, knowledge distillation, and a cosine-distance loss. DistilBERT has 40% fewer parameters than BERT yet retains 97% of the latter's performance, and it is also 60% faster. Since the task covers three different languages, we use a cased multilingual DistilBERT model: we fine-tune distilbert-base-multilingual-cased, which is distilled from the mBERT checkpoint. The model has 6 layers, 768 dimensions, and 12 attention heads, totalling about 134 million parameters.
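The triple loss described above can be sketched as follows. This is an illustrative toy computation, not the published training code: the loss weights, temperature, and the use of single vectors in place of batched tensors are all simplifying assumptions.

```python
import math

def softmax(logits, t=1.0):
    # Temperature-scaled softmax; higher t softens the distribution.
    exps = [math.exp(x / t) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    # KL(p || q): distillation term between teacher (p) and student (q).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def triple_loss(student_logits, teacher_logits, mlm_loss,
                student_hidden, teacher_hidden,
                alpha=5.0, beta=2.0, gamma=1.0, temperature=2.0):
    # alpha/beta/gamma and temperature are illustrative, not the
    # values used by Sanh et al. (2019).
    distill = kl_div(softmax(teacher_logits, temperature),
                     softmax(student_logits, temperature))
    cos = cosine_distance(student_hidden, teacher_hidden)
    return alpha * distill + beta * mlm_loss + gamma * cos
```

When the student's logits and hidden states match the teacher's exactly, the distillation and cosine terms vanish and only the weighted MLM loss remains.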

RoBERTa
Robustly Optimized BERT (RoBERTa) (Liu et al., 2019) follows the same architecture as BERT but differs in its pretraining strategy. It is pretrained with masked language modelling (MLM) as its objective, in which the model tries to predict masked words. RoBERTa is trained on the vast English Wikipedia and CC-News datasets. Next-sentence prediction (NSP) is not employed as a pretraining objective, and tokens are masked dynamically, making the model slightly different from BERT. For tokenization, RoBERTa uses byte-pair encoding (BPE) (Gallé, 2019) as opposed to the WordPiece tokenizer employed in BERT. We use roberta-base, a pretrained language model with 12 layers, 768 hidden dimensions, 12 attention heads, and 125 million parameters.
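The dynamic masking mentioned above can be sketched as follows: instead of fixing the masked positions once during preprocessing (BERT's static masking), the pattern is re-sampled every time a sentence is seen. The tokens, vocabulary, and probabilities below follow the standard 80/10/10 MLM recipe but are otherwise illustrative.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "march", "was", "held", "today"]  # toy vocabulary

def dynamic_mask(tokens, rng, mask_prob=0.15):
    """Re-sample masked positions on every call, so each training epoch
    sees a different masking pattern (RoBERTa-style dynamic masking).
    Of the selected tokens, 80% become [MASK], 10% are replaced with a
    random vocabulary token, and 10% are left unchanged."""
    out = list(tokens)
    for i in range(len(out)):
        if rng.random() < mask_prob:
            roll = rng.random()
            if roll < 0.8:
                out[i] = MASK
            elif roll < 0.9:
                out[i] = rng.choice(VOCAB)
            # else: keep the original token unchanged
    return out

rng = random.Random(0)
sent = ["protesters", "gathered", "in", "the", "square"]
epoch1 = dynamic_mask(sent, rng)
epoch2 = dynamic_mask(sent, rng)  # a freshly sampled pattern, not a cached one
```

In the Huggingface stack this behaviour corresponds to applying the masking collator at batch time rather than baking masks into the cached dataset.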

System Description
For our system, we fine-tune the pretrained models discussed in Sections 4.1, 4.2, and 4.3. We combine the three datasets, as the numbers of Spanish and Portuguese samples are quite low. After combining the datasets, we split off a validation set accordingly, maintaining the split ratio, and tabulate the results on the concatenated dataset in Table 3. The embeddings extracted from these models are fed as input to an LSTM layer (Hochreiter and Schmidhuber, 1997), as shown in Figure 1. The resulting output is fed into a global average pooling layer (Lin et al., 2014) and then passed into fully connected layers, followed by a sigmoid activation function to obtain the probability score for each input sentence. The same parameters are used for all three models. A dropout layer (Srivastava et al., 2014) is added between the fully connected layers for regularization. Refer to Table 4 for the parameters used in the model.
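The classification head after the LSTM can be sketched as a forward pass in plain Python. Everything here is illustrative: the hidden size, weights, and input values are toy numbers, and dropout is omitted since it is inactive at inference time.

```python
import math

def global_average_pool(seq):
    # Average the LSTM hidden states across the time dimension.
    dim = len(seq[0])
    return [sum(step[d] for step in seq) / len(seq) for d in range(dim)]

def dense(x, w, b):
    # Fully connected layer: one output unit per row of w.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy LSTM output: 3 timesteps, 4 hidden units (illustrative values).
lstm_out = [[0.1, -0.2, 0.3, 0.0],
            [0.2,  0.1, 0.1, 0.4],
            [0.0,  0.3, 0.2, 0.2]]
pooled = global_average_pool(lstm_out)
logit = dense(pooled, w=[[0.5, -0.5, 1.0, 0.2]], b=[0.1])
prob = sigmoid(logit[0])  # probability that the sentence describes an Event
label = "Event" if prob >= 0.5 else "Not-event"
```

In the actual system these layers are Keras layers trained end-to-end; the sketch only shows the shape of the computation from pooled LSTM states to the final sigmoid score.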

Results and Analysis
All pretrained language models are fine-tuned in Google Colab for ten epochs. We use the TensorFlow implementations of the models from the Huggingface Transformers library. We compare the macro F1-scores of our fine-tuned models on a validation set created by splitting the given dataset; the remaining split serves as the training data. The validation set contains samples from all three languages: 4,387 Not-event sequences and 963 Event sequences, for a combined total of 5,350. The results are shown in Table 3. We fine-tuned BERT, DistilBERT, and RoBERTa models on the training set. The Spanish and Portuguese scores trail those for English, which has by far the highest support (22,825 samples). We also believe that our approach of combining the datasets could have influenced the performance on the low-support languages.
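The soft-voting ensemble reported in the abstract can be sketched as follows: the per-sentence Event probabilities from the fine-tuned models are averaged, and the mean is thresholded to obtain the final label. The probability values below are made up for illustration, not actual model outputs.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-sentence Event probabilities across models,
    then threshold the mean to obtain the final label (1 = Event)."""
    n_models = len(prob_lists)
    votes = []
    for sentence_probs in zip(*prob_lists):
        mean = sum(sentence_probs) / n_models
        votes.append(1 if mean >= threshold else 0)
    return votes

# Hypothetical probabilities from three fine-tuned models for four sentences.
bert_p       = [0.91, 0.40, 0.10, 0.55]
distilbert_p = [0.85, 0.60, 0.20, 0.45]
roberta_p    = [0.88, 0.35, 0.05, 0.70]
labels = soft_vote([bert_p, distilbert_p, roberta_p])
# Means: 0.88, 0.45, 0.117, 0.567 -> [1, 0, 0, 1]
```

Note how the last sentence is labelled Event even though one model scored it below the threshold: averaging probabilities lets a confident model outvote an uncertain one, which is the advantage of soft over hard (majority) voting.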

Conclusion
The need to develop automated systems that detect whether an event is an active protest has been constantly increasing with the escalation in social media users and the many platforms that support them. In this paper, we explored several multilingual language models to classify whether a given sentence talks about an event that has happened (Event) or not (Not-event) in three languages. Our work primarily focuses on fine-tuning language models and feeding their embeddings to an architecture we created. We also observe that the problem of class imbalance has had a significant impact on the performance of the models.