Bastiaan Bruinsma
2024
Look Who’s Talking: The Most Frequently Used Words in the Bulgarian Parliament 1990-2024
Ruslana Margova | Bastiaan Bruinsma
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)
In this study, we identify the most frequently used words and some multi-word expressions in the Bulgarian Parliament. We do this by using the transcripts of all plenary sessions between 1990 and 2024 (3,936 in total). This allows us both to study an interesting period known in the Bulgarian linguistic space as the years of “transition and democracy”, and to provide scholars of Bulgarian politics with a purposefully generated list of additional stop words that they can use for future analysis. Because our list of words was generated from the data, there is no preconceived theory, and because we include all interactions during all sessions, our analysis goes beyond traditional party lines. We provide details of how we selected, retrieved, and cleaned our data, and discuss our findings.
2023
Sudden Semantic Shifts in Swedish NATO discourse
Brian Bonafilia | Bastiaan Bruinsma | Denitsa Saynova | Moa Johansson
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
In this paper, we investigate a type of semantic shift that occurs when a sudden event radically changes public opinion on a topic. Looking at Sweden’s decision to apply for NATO membership in 2022, we use word embeddings to study how the associations users on Twitter have regarding NATO evolve. We identify several changes that we successfully validate against real-world events. However, the low engagement of the public with the issue often made it challenging to distinguish true signals from noise. We thus find that domain knowledge and data selection are of prime importance when using word embeddings to study semantic shifts.
Class Explanations: the Role of Domain-Specific Content and Stop Words
Denitsa Saynova | Bastiaan Bruinsma | Moa Johansson | Richard Johansson
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
We address two understudied areas related to explainability for neural text models. The first is class explanations: what features are descriptive across a class, rather than explaining single input instances? The second is the type of features used for providing explanations: does the explanation involve the statistical pattern of word usage or the presence of domain-specific content words? Here, we present a method to extract both class explanations and strategies to differentiate between two types of explanations: domain-specific signals or statistical variations in frequencies of common words. We demonstrate our method using a case study in which we analyse transcripts of political debates in the Swedish Riksdag.