Hire Me or Not? Examining Language Model’s Behavior with Occupation Attributes

Damin Zhang, Yi Zhang, Geetanjali Bihani, Julia Rayz


Abstract
Owing to their impressive performance on various downstream tasks, large language models (LLMs) have been widely integrated into production pipelines, such as recruitment and recommendation systems. A known issue of models trained on natural language data is the presence of human biases, which can compromise the fairness of such systems. This paper investigates LLMs’ behavior with respect to gender stereotypes in the context of occupation decision making. Our framework quantifies the presence of gender stereotypes in LLMs’ behavior via multi-round question answering. Inspired by prior work, we constructed a dataset using a standard occupation classification knowledge base released by authoritative agencies. We tested it on three families of LMs (RoBERTa, GPT, and Llama) and found that all models exhibit gender stereotypes analogous to human biases, but with different preferences. The distinct preferences of GPT-3.5-turbo and Llama2-70b-chat, along with additional analysis indicating that GPT-4o-mini favors female subjects, suggest that current alignment methods are insufficient for debiasing and may introduce new biases that contradict traditional gender stereotypes. Our contributions include a dataset of 73,500 prompts constructed from a taxonomy of real-world occupations and a multi-step verification framework to evaluate models’ behavior with respect to gender stereotypes.
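The abstract describes the probing setup only in broad strokes. As a rough illustration of how a gender-association probe for a masked LM such as RoBERTa might look, the sketch below compares the probabilities of "he" and "she" in an occupation template. The occupation list, the template sentence, and the pronoun pair are illustrative assumptions, not the authors' dataset or multi-round question-answering framework.

```python
# Minimal sketch (assumptions noted above): probe a masked LM for gendered
# pronoun preferences across occupations using the Hugging Face fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# Illustrative occupations and template; the paper instead uses a standard
# occupation classification knowledge base and 73,500 prompts.
occupations = ["nurse", "software engineer", "receptionist", "carpenter"]
template = "The {occupation} said that <mask> would review the application."

for occupation in occupations:
    prompt = template.format(occupation=occupation)
    scores = {"he": 0.0, "she": 0.0}
    # Restrict scoring to the two pronouns of interest; the leading space
    # matches RoBERTa's tokenization of mid-sentence words.
    for result in fill_mask(prompt, targets=[" he", " she"]):
        token = result["token_str"].strip()
        if token in scores:
            scores[token] = result["score"]
    print(f"{occupation:20s}  P(he)={scores['he']:.3f}  P(she)={scores['she']:.3f}")
```

A larger probability gap between the two pronouns for a given occupation would indicate a stronger gendered association; the paper's framework goes further, using multi-round question answering and verification rather than single-token probabilities.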
Anthology ID:
2025.coling-main.529
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
7891–7911
URL:
https://aclanthology.org/2025.coling-main.529/
Cite (ACL):
Damin Zhang, Yi Zhang, Geetanjali Bihani, and Julia Rayz. 2025. Hire Me or Not? Examining Language Model’s Behavior with Occupation Attributes. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7891–7911, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Hire Me or Not? Examining Language Model’s Behavior with Occupation Attributes (Zhang et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.529.pdf