Anna-Katharina Dick
2024
GIL-GALaD: Gender Inclusive Language - German Auto-Assembled Large Database
Anna-Katharina Dick, Matthias Drews, Valentin Pickard, Victoria Pierz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The need for gender-inclusive language has been a highly debated topic for years, and gendered biases in speech are unfortunately often picked up and propagated by modern language models trained on large amounts of text. While remedial efforts are underway, grammatically gendered languages such as German pose some unique challenges in generating gender-inclusive language for corrective model training or fine-tuning. We assembled GIL-GALaD, a corpus of German gender-inclusive language from different sources such as social media, news articles, public speeches and academic publications. Our corpus covers the most common types of modifications of generic masculine noun forms, spans 30 years (1993-2023), and contains over 800,000 instances of gender-inclusive language. Tools for using and extending the corpus are to be included in the release. During corpus assembly, we also gained insights into which types of gender-inclusive language were used in practice over the years and across different domains.
2022
QiNiAn at SemEval-2022 Task 5: Multi-Modal Misogyny Detection and Classification
Qin Gu, Nino Meisinger, Anna-Katharina Dick
Anna-Katharina Dick
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
In this paper, we describe our submission to the misogyny classification challenge at SemEval-2022. We propose two models for the two subtasks of the challenge. The first uses joint image and text classification to classify memes as either misogynistic or not. This model uses a majority voting ensemble built on traditional classifiers and additional image information such as age, gender and nudity estimations. The second model uses a RoBERTa classifier on the text transcriptions to additionally identify the type of problematic ideas the memes perpetuate. Our submissions outperform all organizer-submitted baselines. For binary misogyny classification, our system placed fifth on the leaderboard with a macro F1-score of 0.665. For multi-label classification of the type of misogyny, our model placed 19th on the leaderboard with a weighted F1-score of 0.637.
2020
HumorAAC at SemEval-2020 Task 7: Assessing the Funniness of Edited News Headlines through Regression and Trump Mentions
Anna-Katharina Dick, Charlotte Weirich, Alla Kutkina
Proceedings of the Fourteenth Workshop on Semantic Evaluation
In this paper, we describe our contribution to the SemEval-2020 Humor Assessment task. We use three types of features, which are passed to a ridge regression to determine a funniness score for an edited news headline: statistical (count-based) features, semantic features and contextual information. To decide which of two edited headlines is funnier, we additionally use scoring information and logistic regression. Our work concentrated mostly on investigating features rather than on improving prediction with pre-trained language models. The resulting system is task-specific and lightweight, and it performs above the majority baseline. Our experiments indicate that features related to socio-cultural context, in our case mentions of Donald Trump, generally perform better than context-independent features such as headline length.
Co-authors
- Matthias Drews 1
- Qin Gu 1
- Alla Kutkina 1
- Nino Meisinger 1
- Valentin Pickard 1