Anna Kruspe

2024

Musical Ethnocentrism in Large Language Models
Anna Kruspe
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)

Large Language Models (LLMs) reflect the biases in their training data and, by extension, those of the people who created this training data. Detecting, analyzing, and mitigating such biases is becoming a focus of research. One type of bias that has been understudied so far is geocultural bias. Such biases can be caused by an imbalance in the representation of different geographic regions and cultures in the training data, but also by value judgments contained therein. In this paper, we take a first step towards analyzing musical biases in LLMs, particularly ChatGPT and Mixtral. We conduct two experiments. In the first, we prompt LLMs to provide lists of the “Top 100” musical contributors of various categories and analyze their countries of origin. In the second experiment, we ask the LLMs to numerically rate various aspects of the musical cultures of different countries. Our results indicate a strong preference of the LLMs for Western music cultures in both experiments.

2022

True or False? Detecting False Information on Social Media Using Graph Neural Networks
Samyo Rode-Hasinger | Anna Kruspe | Xiao Xiang Zhu
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)

In recent years, false information such as fake news, rumors and conspiracy theories on many relevant issues in society has proliferated. This phenomenon has been significantly amplified by the fast and inexorable spread of misinformation on social media and instant messaging platforms. With this work, we contribute to containing the negative impact on society caused by fake news. We propose a graph neural network approach for detecting false information on Twitter. We leverage the inherent structure of graph-based social media data, aggregating information from short text messages (tweets), user profiles and social interactions. We use knowledge from pre-trained language models efficiently, and show that user-defined descriptions of profiles provide useful information for improved prediction performance. The empirical results indicate that our proposed framework significantly outperforms text- and user-based methods on misinformation datasets from two different domains, even in a difficult multilingual setting.

2021

Changes in Twitter geolocations: Insights and suggestions for future usage
Anna Kruspe | Matthias Häberle | Eike J. Hoffmann | Samyo Rode-Hasinger | Karam Abdulahhad | Xiao Xiang Zhu
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Twitter data has become established as a valuable source for various application scenarios in the past years. For many such applications, it is necessary to know where Twitter posts (tweets) were sent from or what location they refer to. Researchers have frequently used exact coordinates provided in a small percentage of tweets, but Twitter removed the option to share these coordinates in mid-2019. Moreover, there is reason to suspect that a large share of the provided coordinates did not correspond to GPS coordinates of the user even before that. In this paper, we explain the situation and the 2019 policy change and shed light on the various options for still obtaining location information from tweets. We provide usage statistics including changes over time, and analyze what the removal of exact coordinates means for various common research tasks performed with Twitter data. Finally, we make suggestions for future research requiring geolocated tweets.

2020

Cross-language sentiment analysis of European Twitter messages during the COVID-19 pandemic
Anna Kruspe | Matthias Häberle | Iona Kuhn | Xiao Xiang Zhu
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

In this paper, we analyze Twitter messages (tweets) collected during the first months of the COVID-19 pandemic in Europe with regard to their sentiment. This is implemented with a neural network for sentiment analysis using multilingual sentence embeddings. We separate the results by country of origin, and correlate their temporal development with events in those countries. This allows us to study the effect of the situation on people’s moods. We see, for example, that lockdown announcements correlate with a deterioration of mood in almost all surveyed countries, which recovers within a short time span.
