Fine-Tuning Language Models on Dutch Protest Event Tweets

Meagan Loerakker; Laurens Müter; Marijn Schraagen

doi:10.18653/v1/2024.case-1.2

Abstract

Obtaining timely information about an event such as a protest is becoming increasingly relevant with the rise of affective polarisation and social unrest around the world. Large-scale protests now tend to be organised and broadcast through social media, and analysing platforms like X has proven to be an effective way to follow events during a protest. We therefore trained several language models on Dutch tweets to analyse their ability to classify whether a tweet expresses discontent, since such tweets may contain practical information about a protest. Our results show that models pre-trained on Twitter data, including Bernice and TwHIN-BERT, outperform models that are not. The results also show that Sentence Transformers is a promising model. The added value of oversampling is greater for models that were not trained on Twitter data. In line with previous work, pre-processing the data did not help a transformer language model make better predictions.

Anthology ID: 2024.case-1.2
Original: 2024.case-1.2v1
Version 2: 2024.case-1.2v2
Version 3: 2024.case-1.2v3
Volume: Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Month: March
Year: 2024
Address: St. Julians, Malta
Editors: Ali Hürriyetoğlu, Hristo Tanev, Surendrabikram Thapa, Gökçe Uludoğan
Venues: CASE | WS
Publisher: Association for Computational Linguistics
Pages: 6–23
URL: https://aclanthology.org/2024.case-1.2/
DOI: 10.18653/v1/2024.case-1.2
Cite (ACL): Meagan Loerakker, Laurens Müter, and Marijn Schraagen. 2024. Fine-Tuning Language Models on Dutch Protest Event Tweets. In Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024), pages 6–23, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal): Fine-Tuning Language Models on Dutch Protest Event Tweets (Loerakker et al., CASE 2024)
PDF: https://aclanthology.org/2024.case-1.2.pdf
Supplementary material: 2024.case-1.2.SupplementaryMaterial.txt