Towards Responsible Natural Language Annotation for the Varieties of Arabic

A. Bergman, Mona Diab


Abstract
When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance. In this position paper, we make the case for care and attention to such nuances, particularly in dataset annotation, as well as the inclusion of cultural and linguistic expertise in the process. We present a playbook for responsible dataset creation for polyglossic, multidialectal languages. This work is informed by a study on Arabic annotation of social media content.
Anthology ID:
2022.findings-acl.31
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
364–371
URL:
https://aclanthology.org/2022.findings-acl.31
DOI:
10.18653/v1/2022.findings-acl.31
Cite (ACL):
A. Bergman and Mona Diab. 2022. Towards Responsible Natural Language Annotation for the Varieties of Arabic. In Findings of the Association for Computational Linguistics: ACL 2022, pages 364–371, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Towards Responsible Natural Language Annotation for the Varieties of Arabic (Bergman & Diab, Findings 2022)
PDF:
https://aclanthology.org/2022.findings-acl.31.pdf
Video:
https://aclanthology.org/2022.findings-acl.31.mp4