Christian Burger


2024

Brigitte Krenn, Johann Petrak, Marina Kubina, Christian Burger This paper presents a sexism/misogyny dataset extracted from comments of a large online forum of an Austrian newspaper. The comments are in Austrian German language, and in some cases interspersed with dialectal or English elements. We describe the data collection, the annotation guidelines and the annotation process resulting in a corpus of approximately 8 000 comments which were annotated with 5 levels of sexism/misogyny, ranging from 0 (not sexist/misogynist) to 4 (highly sexist/misogynist). The professional forum moderators (self-identified females and males) of the online newspaper were involved as experts in the creation of the annotation guidelines and the annotation of the user comments. In addition, we also describe first results of training transformer-based classification models for both binarized and original label classification of the corpus.