@inproceedings{kononykhina-etal-2025-mind,
title = "Mind the Gap: Gender-based Differences in Occupational Embeddings",
author = "Kononykhina, Olga and
Haensch, Anna-Carolina and
Kreuter, Frauke",
editor = "Fale{\'n}ska, Agnieszka and
Basta, Christine and
Costa-juss{\`a}, Marta and
Sta{\'n}czak, Karolina and
Nozza, Debora",
booktitle = "Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)",
month = aug,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.gebnlp-1.7/",
doi = "10.18653/v1/2025.gebnlp-1.7",
pages = "83--91",
ISBN = "979-8-89176-277-0",
abstract = "Large Language Models (LLMs) offer promising alternatives to traditional occupational coding approaches in survey research. Using a German dataset, we examine the extent to which LLM-based occupational coding differs by gender. Our findings reveal systematic disparities: gendered job titles (e.g., ``Autor'' vs. ``Autorin'', meaning ``male author'' vs. ``female author'') frequently result in diverging occupation codes, even when semantically identical. Across all models, 54{\%}{--}82{\%} of gendered inputs obtain different Top-5 suggestions. The practical impact, however, depends on the model. GPT includes the correct code most often (62{\%}) but demonstrates female bias (up to +18 pp). IBM is less accurate (51{\%}) but largely balanced. Alibaba, Gemini, and MiniLM achieve about 50{\%} correct-code inclusion, and their small ({\ensuremath{<}} 10 pp) and direction-flipping gaps could indicate a sampling noise rather than gender bias. We discuss these findings in the context of fairness and reproducibility in NLP applications for social data."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kononykhina-etal-2025-mind">
<titleInfo>
<title>Mind the Gap: Gender-based Differences in Occupational Embeddings</title>
</titleInfo>
<name type="personal">
<namePart type="given">Olga</namePart>
<namePart type="family">Kononykhina</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anna-Carolina</namePart>
<namePart type="family">Haensch</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Frauke</namePart>
<namePart type="family">Kreuter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Agnieszka</namePart>
<namePart type="family">Faleńska</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christine</namePart>
<namePart type="family">Basta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marta</namePart>
<namePart type="family">Costa-jussà</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Karolina</namePart>
<namePart type="family">Stańczak</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Debora</namePart>
<namePart type="family">Nozza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-277-0</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) offer promising alternatives to traditional occupational coding approaches in survey research. Using a German dataset, we examine the extent to which LLM-based occupational coding differs by gender. Our findings reveal systematic disparities: gendered job titles (e.g., “Autor” vs. “Autorin”, meaning “male author” vs. “female author”) frequently result in diverging occupation codes, even when semantically identical. Across all models, 54%–82% of gendered inputs obtain different Top-5 suggestions. The practical impact, however, depends on the model. GPT includes the correct code most often (62%) but demonstrates female bias (up to +18 pp). IBM is less accurate (51%) but largely balanced. Alibaba, Gemini, and MiniLM achieve about 50% correct-code inclusion, and their small (&lt; 10 pp) and direction-flipping gaps could indicate sampling noise rather than gender bias. We discuss these findings in the context of fairness and reproducibility in NLP applications for social data.</abstract>
<identifier type="citekey">kononykhina-etal-2025-mind</identifier>
<identifier type="doi">10.18653/v1/2025.gebnlp-1.7</identifier>
<location>
<url>https://aclanthology.org/2025.gebnlp-1.7/</url>
</location>
<part>
<date>2025-08</date>
<extent unit="page">
<start>83</start>
<end>91</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Mind the Gap: Gender-based Differences in Occupational Embeddings
%A Kononykhina, Olga
%A Haensch, Anna-Carolina
%A Kreuter, Frauke
%Y Faleńska, Agnieszka
%Y Basta, Christine
%Y Costa-jussà, Marta
%Y Stańczak, Karolina
%Y Nozza, Debora
%S Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
%D 2025
%8 August
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-277-0
%F kononykhina-etal-2025-mind
%X Large Language Models (LLMs) offer promising alternatives to traditional occupational coding approaches in survey research. Using a German dataset, we examine the extent to which LLM-based occupational coding differs by gender. Our findings reveal systematic disparities: gendered job titles (e.g., “Autor” vs. “Autorin”, meaning “male author” vs. “female author”) frequently result in diverging occupation codes, even when semantically identical. Across all models, 54%–82% of gendered inputs obtain different Top-5 suggestions. The practical impact, however, depends on the model. GPT includes the correct code most often (62%) but demonstrates female bias (up to +18 pp). IBM is less accurate (51%) but largely balanced. Alibaba, Gemini, and MiniLM achieve about 50% correct-code inclusion, and their small (< 10 pp) and direction-flipping gaps could indicate sampling noise rather than gender bias. We discuss these findings in the context of fairness and reproducibility in NLP applications for social data.
%R 10.18653/v1/2025.gebnlp-1.7
%U https://aclanthology.org/2025.gebnlp-1.7/
%U https://doi.org/10.18653/v1/2025.gebnlp-1.7
%P 83-91
Markdown (Informal)
[Mind the Gap: Gender-based Differences in Occupational Embeddings](https://aclanthology.org/2025.gebnlp-1.7/) (Kononykhina et al., GeBNLP 2025)
ACL
Olga Kononykhina, Anna-Carolina Haensch, and Frauke Kreuter. 2025. Mind the Gap: Gender-based Differences in Occupational Embeddings. In Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 83–91, Vienna, Austria. Association for Computational Linguistics.
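
The abstract describes comparing Top-5 occupation-code suggestions for the male and female forms of the same job title, and checking whether the correct code appears among them. Below is a minimal, hypothetical sketch of how such a comparison might look. It assumes the sentence-transformers library, the all-MiniLM-L6-v2 model, and a toy placeholder occupation list; it does not reproduce the paper's actual data, taxonomy, prompts, or models.

```python
# Illustrative sketch only, NOT the authors' pipeline: it shows how the two
# quantities mentioned in the abstract (Top-5 divergence between gendered title
# pairs and correct-code inclusion) could be computed with an embedding model
# such as MiniLM via sentence-transformers. Occupation codes and labels are toy
# placeholders, not the official German classification used in the paper.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed MiniLM variant

# Toy occupation index: placeholder code -> label.
occupations = {
    "A01": "Autoren und Autorinnen, Schriftsteller",
    "B02": "Buerokraefte und Sachbearbeitung",
    "C03": "Lehrkraefte im Schuldienst",
}
codes = list(occupations.keys())
occ_emb = model.encode(list(occupations.values()), convert_to_tensor=True)

def top5_codes(job_title: str) -> list[str]:
    """Return the codes of the (up to) five most similar occupation labels."""
    query = model.encode(job_title, convert_to_tensor=True)
    hits = util.semantic_search(query, occ_emb, top_k=5)[0]
    return [codes[hit["corpus_id"]] for hit in hits]

# (male form, female form, assumed correct code) -- toy example pair.
pairs = [("Autor", "Autorin", "A01")]

diverging = sum(top5_codes(m) != top5_codes(f) for m, f, _ in pairs)
male_hits = sum(c in top5_codes(m) for m, _, c in pairs)
female_hits = sum(c in top5_codes(f) for _, f, c in pairs)

print(f"Top-5 differs for {diverging}/{len(pairs)} gendered pairs")
print(f"Correct-code inclusion: male {male_hits}/{len(pairs)}, "
      f"female {female_hits}/{len(pairs)}")
```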