Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity

Dipto Das, Shion Guha, Bryan Semaan


Abstract
Critical studies have found NLP systems to be biased with respect to gender and racial identities. However, few studies have focused on identities defined by cultural factors such as religion and nationality. Compared to English, such research efforts are even more limited in major languages like Bengali because labeled datasets are unavailable. This paper describes a process for developing a bias evaluation dataset that highlights cultural influences on identity. We also provide a Bengali dataset as an artifact that can contribute to future critical research.
Anthology ID:
2023.c3nlp-1.8
Volume:
Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Sunipa Dev, Vinodkumar Prabhakaran, David Adelani, Dirk Hovy, Luciana Benotti
Venue:
C3NLP
Publisher:
Association for Computational Linguistics
Pages:
68–83
URL:
https://aclanthology.org/2023.c3nlp-1.8
DOI:
10.18653/v1/2023.c3nlp-1.8
Cite (ACL):
Dipto Das, Shion Guha, and Bryan Semaan. 2023. Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 68–83, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity (Das et al., C3NLP 2023)
PDF:
https://aclanthology.org/2023.c3nlp-1.8.pdf
Video:
https://aclanthology.org/2023.c3nlp-1.8.mp4