Invisible Speakers? Gender Disparity in German AI Discourse and Its Reflection in Language Models

Milena Belosevic


Abstract
This paper investigates how language models (LMs) reproduce the gender disparity already present in German media discourse about artificial intelligence (AI). Using a human-annotated corpus of quotations from this discourse, we first quantify how often male and female speakers are directly quoted across domains and speaker roles. We then train three classifiers to predict whether a quotation originates from a male or a female speaker: LLäMmlein (Pfister et al., 2025), a state-of-the-art German-only language model; GBERT; and a logistic regression model. All three receive only the quoted text as input, with no gender cues. Comparing model predictions with the corpus-based gold labels, we find that male voices dominate both the corpus and the model predictions. Balancing the data mitigates but does not fully eliminate this disparity, indicating that the strong male-default tendency of the transformer models cannot be explained by corpus skew alone but also stems from priors acquired during pretraining. The study contributes to the interpretability of language model output in digital humanities (DH) tasks, the adaptation of NLP tools to domain-specific humanities corpora, and knowledge modelling in the humanities.
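The core comparison described above (does a classifier predict "male" more often than the gold labels warrant, and does balancing the data reduce that gap?) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, gender labels are assumed to be the strings "male" and "female", and "balancing" is assumed here to mean random downsampling of the majority class, which the abstract does not specify.

```python
import random
from collections import Counter


def label_shares(labels):
    """Return the relative frequency of each label in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def male_default_gap(gold, predicted, male="male"):
    """Predicted male share minus gold male share (hypothetical metric).

    A positive value means the classifier assigns the male label more
    often than the gold labels do, i.e. it over-predicts male speakers
    beyond the skew already present in the corpus.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return label_shares(predicted).get(male, 0.0) - label_shares(gold).get(male, 0.0)


def downsample_to_balance(examples, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    Assumption: `examples` is a list of (quote_text, gender_label) pairs.
    """
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    if not by_label:
        return []
    n_min = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n_min))
    rng.shuffle(balanced)
    return balanced


# Toy usage with made-up labels (not the paper's data):
gold = ["male"] * 8 + ["female"] * 2
pred = ["male"] * 9 + ["female"] * 1
print(label_shares(gold))
print(male_default_gap(gold, pred))  # about 0.1: one extra "male" prediction
```

Measuring the gap against the gold share, rather than reporting the predicted male share alone, separates corpus skew from any additional male-default behaviour of the model; comparing the gap before and after balancing is one way to probe whether that tendency persists once the skew is removed.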
Anthology ID: 2026.latechclfl-1.7
Volume: Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz
Venues: LaTeCH-CLfL | WS
SIG: SIGHUM
Publisher: Association for Computational Linguistics
Pages: 66–79
URL: https://aclanthology.org/2026.latechclfl-1.7/
Cite (ACL): Milena Belosevic. 2026. Invisible Speakers? Gender Disparity in German AI Discourse and Its Reflection in Language Models. In Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026, pages 66–79, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Invisible Speakers? Gender Disparity in German AI Discourse and Its Reflection in Language Models (Belosevic, LaTeCH-CLfL 2026)
PDF: https://aclanthology.org/2026.latechclfl-1.7.pdf
Supplementary material: 2026.latechclfl-1.7.SupplementaryMaterial.zip, 2026.latechclfl-1.7.SupplementaryMaterial.txt