The Sign Language Dataset Compendium: Creating an Overview of Digital Linguistic Resources

Maria Kopf, Marc Schulder, Thomas Hanke


Abstract
One of the challenges that sign language researchers face is the identification of suitable language datasets, particularly for cross-lingual studies. There is no single source of information on what sign language corpora and lexical resources exist or how they compare. Instead, they have to be found through extensive literature review or word-of-mouth. The amount of information available on individual datasets can also vary widely and may be distributed across different publications, data repositories and (potentially defunct) project websites. This article introduces the Sign Language Dataset Compendium, an extensive overview of linguistic resources for sign languages. It covers existing corpora and lexical resources, as well as commonly used data collection tasks. Special attention is paid to covering resources for many different languages from around the globe. All information is provided in a standardised format to make entries comparable, but kept flexible enough to allow for differences in content. The compendium is intended as a growing resource that will be updated regularly.
Anthology ID:
2022.signlang-1.16
Volume:
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, Julie A. Hochgesang, Jette Kristoffersen, Johanna Mesch, Marc Schulder
Venue:
SignLang
Publisher:
European Language Resources Association
Pages:
102–109
URL:
https://aclanthology.org/2022.signlang-1.16
Cite (ACL):
Maria Kopf, Marc Schulder, and Thomas Hanke. 2022. The Sign Language Dataset Compendium: Creating an Overview of Digital Linguistic Resources. In Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, pages 102–109, Marseille, France. European Language Resources Association.
Cite (Informal):
The Sign Language Dataset Compendium: Creating an Overview of Digital Linguistic Resources (Kopf et al., SignLang 2022)
PDF:
https://aclanthology.org/2022.signlang-1.16.pdf