CrowS-Pairs-NL: A Benchmark to Evaluate Dutch Stereotype Bias in LLMs

Jens van der Weide; Dong Nguyen; Marianne Schaaphok; Roos M. Bakker

Abstract

Bias benchmarks for LLMs largely focus on English, overlooking language- and culture-specific stereotypes. We introduce CrowS-Pairs-NL, a Dutch stereotype benchmark built by filtering, translating, and adapting the English CrowS-Pairs dataset to address known conceptual pitfalls, and extending it with newly crowdsourced Dutch sentence pairs. We evaluate six multilingual and Dutch-trained models using both a pseudo-log-likelihood metric adapted for autoregressive models and a prompt-based metric with three template variants. Models explicitly trained on Dutch data consistently exhibit higher stereotyping scores, suggesting that language-specific fine-tuning introduces language-specific bias. The two metrics broadly agree on model rankings but differ in sensitivity, with the prompt metric showing a narrower range of scores. Our benchmark and findings underscore the need for culturally grounded bias evaluation beyond English.
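As a rough illustration of the likelihood-based metric mentioned above, the sketch below scores sentence pairs in the general CrowS-Pairs style: each sentence gets a summed log-likelihood, and the reported score is the percentage of pairs in which the stereotyping sentence is the more likely one. This is an assumption-laden sketch, not the paper's method: the abstract does not specify how the pseudo-log-likelihood was adapted for autoregressive models, the function names are hypothetical, and the token log-probabilities are toy values rather than model output.

```python
from typing import List, Tuple


def sentence_log_likelihood(token_log_probs: List[float]) -> float:
    """Sum per-token log-probabilities into a single sentence score."""
    return sum(token_log_probs)


def stereotype_score(pairs: List[Tuple[List[float], List[float]]]) -> float:
    """Percentage of pairs where the stereotyping sentence scores higher.

    Each pair is (stereo_token_log_probs, anti_stereo_token_log_probs).
    This pairing convention is an assumption made for this sketch.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(
        1
        for stereo, anti in pairs
        if sentence_log_likelihood(stereo) > sentence_log_likelihood(anti)
    )
    return 100.0 * preferred / len(pairs)


# Hypothetical toy log-probabilities, not results from the paper.
toy_pairs = [
    ([-1.0, -2.0], [-1.5, -2.5]),  # stereotyping sentence more likely
    ([-3.0, -1.0], [-1.0, -1.0]),  # less stereotyping sentence more likely
    ([-0.5, -0.5], [-2.0, -0.5]),  # stereotyping sentence more likely
    ([-2.0], [-1.0]),              # less stereotyping sentence more likely
]

print(stereotype_score(toy_pairs))  # 50.0
```

Under this convention, a score near 50 would indicate no systematic preference for either sentence in a pair, and higher values would indicate a preference for the stereotyping sentence.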

Anthology ID: 2026.stereacult-1.1
Volume: Proceedings of the 1st Workshop on Stereotypes Across Cultures in Language Technologies (StereACuLT 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Weicheng Ma, Soroush Vosoughi, Nabeel Gillani, Rolando Coto-Solano
Venues: StereACuLT | WS
Publisher: Association for Computational Linguistics
Pages: 1–12
URL: https://aclanthology.org/2026.stereacult-1.1/
Cite (ACL): Jens van der Weide, Dong Nguyen, Marianne Schaaphok, and Roos M. Bakker. 2026. CrowS-Pairs-NL: A Benchmark to Evaluate Dutch Stereotype Bias in LLMs. In Proceedings of the 1st Workshop on Stereotypes Across Cultures in Language Technologies (StereACuLT 2026), pages 1–12, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CrowS-Pairs-NL: A Benchmark to Evaluate Dutch Stereotype Bias in LLMs (van der Weide et al., StereACuLT 2026)
PDF: https://aclanthology.org/2026.stereacult-1.1.pdf
