Use of a Citizen Science Platform for the Creation of a Language Resource to Study Bias in Language Models for French: A Case Study

Karën Fort, Aurélie Névéol, Yoann Dupont, Julien Bezançon


Abstract
There is growing interest in evaluating the bias, fairness and social impact of Natural Language Processing models and tools. However, few resources are available for this task in languages other than English. Translation of resources originally developed for English is a promising research direction. However, there is also a need to complement translated resources with newly sourced resources in the original languages and social contexts studied. In order to collect a language resource for the study of biases in Language Models for French, we decided to resort to citizen science. We created three tasks on the LanguageARC citizen science platform to assist with the translation of an existing resource from English into French as well as the collection of complementary resources in native French. We successfully collected data for all three tasks from a total of 102 volunteer participants. Participants from different parts of the world contributed, and we noted that, although calls sent to mailing lists had a positive impact on participation, some participants pointed out barriers to contribution due to the collection platform.
Anthology ID:
2022.nidcp-1.2
Volume:
Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Chris Callison-Burch, Christopher Cieri, James Fiumara, Mark Liberman
Venue:
NIDCP
Publisher:
European Language Resources Association
Pages:
8–13
URL:
https://aclanthology.org/2022.nidcp-1.2
Cite (ACL):
Karën Fort, Aurélie Névéol, Yoann Dupont, and Julien Bezançon. 2022. Use of a Citizen Science Platform for the Creation of a Language Resource to Study Bias in Language Models for French: A Case Study. In Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022, pages 8–13, Marseille, France. European Language Resources Association.
Cite (Informal):
Use of a Citizen Science Platform for the Creation of a Language Resource to Study Bias in Language Models for French: A Case Study (Fort et al., NIDCP 2022)
PDF:
https://aclanthology.org/2022.nidcp-1.2.pdf
Data
CrowS-Pairs