Occupational Biases in Norwegian and Multilingual Language Models

Samia Touileb, Lilja Øvrelid, Erik Velldal


Abstract
In this paper we explore how a demographic distribution of occupations, along gender dimensions, is reflected in pre-trained language models. We give a descriptive assessment of the distribution of occupations, and investigate to what extent it is reflected in four Norwegian and two multilingual models. To this end, we introduce a set of simple bias probes, and perform five different tasks combining gendered pronouns, first names, and a set of occupations from the Norwegian statistics bureau. We show that language-specific models obtain more accurate results, and are much closer to the real-world distribution of clearly gendered occupations. However, we see that none of the models have correct representations of the occupations that are demographically balanced between genders. We also discuss the importance of the data on which the models were trained, and argue that template-based bias probes can sometimes be fragile: a simple alteration in a template can change a model’s behavior.
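To make the idea of a template-based bias probe concrete, the following is a minimal sketch of how gendered pronouns and occupations might be combined into masked probes. It is not the authors' implementation: the template, the two-item word lists, and the names `build_probes` and `female_share` are all assumptions chosen for illustration, and the five tasks in the paper may be set up differently.

```python
from itertools import product

# Hypothetical inputs, for illustration only (not the paper's actual lists).
PRONOUNS = {"female": "Hun", "male": "Han"}   # Norwegian "She" / "He"
OCCUPATIONS = ["sykepleier", "ingeniør"]      # "nurse", "engineer"
TEMPLATE = "{pronoun} er {occupation}."       # "<pronoun> is <occupation>."


def build_probes(template, pronouns, occupations, mask_token="[MASK]"):
    """Return one masked probe per (gender, occupation) pair.

    The occupation slot is replaced by the mask token, so a masked language
    model could be asked how strongly it predicts each occupation after a
    given gendered pronoun.
    """
    probes = []
    for (gender, pronoun), occupation in product(pronouns.items(), occupations):
        probes.append({
            "gender": gender,
            "occupation": occupation,
            "text": template.format(pronoun=pronoun, occupation=mask_token),
        })
    return probes


def female_share(score_female, score_male):
    """Normalise two model scores into a female share between 0 and 1.

    This value could then be compared with the share of women holding the
    occupation according to official statistics.
    """
    total = score_female + score_male
    return score_female / total if total else 0.5


if __name__ == "__main__":
    for probe in build_probes(TEMPLATE, PRONOUNS, OCCUPATIONS):
        print(probe["gender"], probe["occupation"], probe["text"])
```

Because each probe is built from a single template string, changing a word in `TEMPLATE` changes every probe at once, which is one way to see why such probes can be sensitive to small alterations.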
Anthology ID:
2022.gebnlp-1.21
Volume:
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
July
Year:
2022
Address:
Seattle, Washington
Venues:
GeBNLP | NAACL
Publisher:
Association for Computational Linguistics
Pages:
200–211
URL:
https://aclanthology.org/2022.gebnlp-1.21
DOI:
10.18653/v1/2022.gebnlp-1.21
Cite (ACL):
Samia Touileb, Lilja Øvrelid, and Erik Velldal. 2022. Occupational Biases in Norwegian and Multilingual Language Models. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 200–211, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
Occupational Biases in Norwegian and Multilingual Language Models (Touileb et al., GeBNLP 2022)
PDF:
https://aclanthology.org/2022.gebnlp-1.21.pdf
Code
 samiatouileb/biases-norwegian-multilingual-lms
Data
WinoBias