Johanna Binnewitt
2024
Recognising Occupational Titles in German Parliamentary Debates
Johanna Binnewitt
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
The application of text mining methods is becoming more and more popular, not only in Digital Humanities (DH) and Computational Social Sciences (CSS) in general, but also in vocational education and training (VET) research. Employing algorithms offers the possibility to explore corpora that are simply too large for manual methods. However, challenges arise when dealing with abstract concepts like occupations or skills, which are crucial subjects of VET research. Since algorithms require concrete instructions, either in the form of rules or annotated examples, these abstract concepts must be broken down as part of the operationalisation process. In our paper, we tackle the task of identifying occupational titles in the plenary protocols of the German Bundestag. The primary focus lies in the comparative analysis of two distinct approaches: a dictionary-based method and a BERT fine-tuning approach. Both approaches are compared in a quantitative evaluation and applied to a larger corpus sample. Results indicate comparable precision for both approaches (0.93), but the BERT-based models outperform the dictionary-based approach in terms of recall (0.86 vs. 0.77). Errors in the dictionary-based method primarily stem from the ambiguity of occupational titles (e.g., ‘baker’ as both a surname and a profession) and missing terms in the dictionary. In contrast, the BERT model faces challenges in distinguishing occupational titles from other personal names, such as ‘mother’ or ‘Christians’.
2023
Personal noun detection for German
Carla Sökefeld
|
Melanie Andresen
|
Johanna Binnewitt
|
Heike Zinsmeister
Proceedings of the 19th Joint ACL-ISO Workshop on Interoperable Semantics (ISA-19)
Personal nouns, i.e. common nouns denoting human beings, play an important role in manifesting gender and gender stereotypes in texts, especially for languages with grammatical gender like German. Automatically detecting and extracting personal nouns can thus be of interest to a myriad of different tasks such as minimizing gender bias in language models and researching gender stereotypes or gender-fair language, but is complicated by the morphological heterogeneity and homonymy of personal and non-personal nouns, which restrict lexicon-based approaches. In this paper, we introduce a classifier created by fine-tuning a transformer model that detects personal nouns in German. Although some phenomena like homonymy and metalinguistic uses are still problematic, the model is able to classify personal nouns with robust accuracy (f1-score: 0.94).