Andrew Zaldivar

2026

Generative models are known to have reduced performance in different global cultural contexts and languages. While continual data updates have been known to be conducted to improve overall model performance, bolstering and evaluating this cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can assess the state of the global applicability of our models and thus, in turn, help identify and improve upon cross-cultural gaps.

pdf bib abs

SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context
Aishwarya Verma | Laud Ammah | Olivia Nercy Ndlovu Lucas | Andrew Zaldivar | Vinodkumar Prabhakaran | Sunipa Dev
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Stereotype repositories are critical to assess generative AI model safety, but currently lack adequate global coverage. It is imperative to prioritize targeted expansion, strategically addressing existing deficits, over merely increasing data volume. This work introduces a multilingual stereotype resource covering four sub-Saharan African countries that are severely underrepresented in NLP resources: Ghana, Kenya, Nigeria, and South Africa. By utilizing socioculturally-situated, community-engaged methods, including telephonic surveys moderated in native languages, we establish a reproducible methodology that is sensitive to the region’s complex linguistic diversity and traditional orality. By deliberately balancing the sample across diverse ethnic and demographic backgrounds, we ensure broad coverage, resulting in a dataset of 3,534 stereotypes in English and 3,206 stereotypes across 15 native languages.

Co-authors

Charu Kalia 1

Olivia Nercy Ndlovu Lucas 1

Hayk Stepanyan 1

Erin MacMurray van Liemt 1

Venues

EACL1
Findings1

Fix author