Thomas Hanke

2025

Making Sign Language Research Findable: The sign-lang@LREC Anthology and the Sign Language Dataset Compendium
Marc Schulder | Thomas Hanke | Maria Kopf
Proceedings of the 5th Conference on Language, Data and Knowledge

Resources and research on sign languages are sparse and can often be difficult to locate. Few centralised sources of information exist. This article presents two repositories that aim to improve the findability of such information through the implementation of open science best practices. The sign-lang@LREC Anthology is a repository of publications on sign languages in the series of sign-lang@LREC workshops and related events, enhanced with indices cataloguing what datasets, tools, languages and projects are addressed by these publications. The Sign Language Dataset Compendium provides an overview of existing linguistic corpora, lexical resources and data collection tasks. We describe the evolution of these repositories, covering topics such as supplementary information structures, rich metadata, interoperability, and dealing with the challenges of reference rot.

2024

Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources
Eleni Efthimiou | Stavroula-Evita Fotinea | Thomas Hanke | Julie A. Hochgesang | Johanna Mesch | Marc Schulder
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources

Corpus à la carte – Improving Access to the Public DGS Corpus
Reiner Konrad | Thomas Hanke | Amy Isard | Marc Schulder | Lutz König | Julian Bleicken | Oliver Böse
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources

Introducing the DW-DGS – The Digital Dictionary of DGS
Gabriele Langer | Anke Müller | Sabrina Wähl | Felicitas Otte | Lea Sepke | Thomas Hanke
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources

2022

How to be FAIR when you CARE: The DGS Corpus as a Case Study of Open Science Resources for Minority Languages
Marc Schulder | Thomas Hanke
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The publication of resources for minority languages requires a balance between making data open and accessible and respecting the rights and needs of its language community. The FAIR principles were introduced as a guide to good open data practices and they have since been complemented by the CARE principles for indigenous data governance. This article describes how the DGS Corpus implemented these principles and how the two sets of principles affected each other. The DGS Corpus is a large collection of recordings of members of the deaf community in Germany communicating in their primary language, German Sign Language (DGS); it was created both as a resource for linguistic research and as a record of the life experiences of deaf people in Germany. The corpus was designed with CARE in mind to respect and empower the language community and FAIR data publishing was used to enhance its usefulness as a scientific resource.

Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources
Eleni Efthimiou | Stavroula-Evita Fotinea | Thomas Hanke | Julie A. Hochgesang | Jette Kristoffersen | Johanna Mesch | Marc Schulder
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

Introducing Sign Languages to a Multilingual Wordnet: Bootstrapping Corpora and Lexical Resources of Greek Sign Language and German Sign Language
Sam Bigeard | Marc Schulder | Maria Kopf | Thomas Hanke | Kyriaki Vasilaki | Anna Vacalopoulou | Theodore Goulas | Athanasia-Lida Dimou | Stavroula-Evita Fotinea | Eleni Efthimiou
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

Wordnets have been a popular lexical resource type for many years. Their sense-based representation of lexical items and numerous relation structures have been used for a variety of computational and linguistic applications. The inclusion of different wordnets into multilingual wordnet networks has further extended their use into the realm of cross-lingual research. Wordnets have been released for many spoken languages. Research has also been carried out into the creation of wordnets for several sign languages, but none have yet resulted in publicly available datasets. This article presents our own efforts towards an inclusion of sign languages in a multilingual wordnet, starting with Greek Sign Language (GSL) and German Sign Language (DGS). Based on differences in available language resources between GSL and DGS, we trial two workflows with different coverage priorities. We also explore how synergies between both workflows can be leveraged and how future work on additional sign languages could profit from building on existing sign language wordnet data. The results of our work are made publicly available.

The Sign Language Dataset Compendium: Creating an Overview of Digital Linguistic Resources
Maria Kopf | Marc Schulder | Thomas Hanke
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

One of the challenges that sign language researchers face is the identification of suitable language datasets, particularly for cross-lingual studies. There is no single source of information on what sign language corpora and lexical resources exist or how they compare. Instead, they have to be found through extensive literature review or word-of-mouth. The amount of information available on individual datasets can also vary widely and may be distributed across different publications, data repositories and (potentially defunct) project websites. This article introduces the Sign Language Dataset Compendium, an extensive overview of linguistic resources for sign languages. It covers existing corpora and lexical resources, as well as commonly used data collection tasks. Special attention is paid to covering resources for many different languages from around the globe. All information is provided in a standardised format to make entries comparable, but kept flexible enough to allow for differences in content. The compendium is intended as a growing resource that will be updated regularly.

Facilitating the Spread of New Sign Language Technologies across Europe
Hope Morgan | Onno Crasborn | Maria Kopf | Marc Schulder | Thomas Hanke
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

For developing sign language technologies like automatic translation, huge amounts of training data are required. Even the larger corpora available for some sign languages are tiny compared to the amounts of data used for corresponding spoken language technologies. The overarching goal of the European project EASIER is to develop a framework for bidirectional automatic translation between sign and spoken languages and between sign languages. One part of this multi-dimensional project is that it will pool available language resources from European sign languages into a larger dataset to address the data scarcity problem. This approach promises to open the floor for lower-resourced sign languages in Europe. This article focusses on efforts in the EASIER project to allow for new languages to make use of such technologies in the future. What are the characteristics of sign language resources needed to train recognition, translation, and synthesis algorithms, and how can other countries including those without any sign resources follow along with these developments? The efforts undertaken in EASIER include creating workflow documents and organizing training sessions in online workshops. They reflect the current state of the art, and will likely need to be updated in the coming decade.

Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives
Eleni Efthimiou | Stavroula-Evita Fotinea | Thomas Hanke | John C. McDonald | Dimitar Shterionov | Rosalee Wolfe
Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives

2021

Development of automatic translation between signed and spoken languages has lagged behind the development of automatic translation between spoken languages, but it is a common misperception that extending machine translation techniques to include signed languages should be a straightforward process. A contributing factor is the lack of an acceptable method for displaying sign language apart from interpreters on video. This position paper examines the challenges of displaying a signed language as a target in automatic translation, analyses the underlying causes and suggests strategies to develop display technologies that are acceptable to sign language communities.

2020

Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives
Eleni Efthimiou | Stavroula-Evita Fotinea | Thomas Hanke | Julie A. Hochgesang | Jette Kristoffersen | Johanna Mesch
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

Extending the Public DGS Corpus in Size and Depth
Thomas Hanke | Marc Schulder | Reiner Konrad | Elena Jahn
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

In 2018 the DGS-Korpus project published the first full release of the Public DGS Corpus. This event marked a change of focus for the project. While before most attention had been on increasing the size of the corpus, now an increase in its depth became the priority. New data formats were added, corpus annotation conventions were released and OpenPose pose information was published for all transcripts. The community and research portal websites of the corpus also received upgrades, including persistent identifiers, archival copies of previous releases and improvements to their usability on mobile devices. The research portal was enhanced even further, improving its transcript web viewer, adding a KWIC concordance view, introducing cross-references to other linguistic resources of DGS and making its entire interface available in German in addition to English. This article provides an overview of these changes, chronicling the evolution of the Public DGS Corpus from its first release in 2018, through its second release in 2019 until its third release in 2020.

SignHunter – A Sign Elicitation Tool Suitable for Deaf Events
Thomas Hanke | Elena Jahn | Sabrina Wähl | Oliver Böse | Lutz König
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

This paper presents SignHunter, a tool for collecting isolated signs, and discusses application possibilities. SignHunter is successfully used within the DGS-Korpus project to collect name signs for places and cities. The data adds to the content of a German Sign Language (DGS) – German dictionary which is currently being developed, as well as a freely accessible subset of the DGS Corpus, the Public DGS Corpus. We discuss reasons to complement a natural language corpus by eliciting concepts without context and present an application example of SignHunter.

From Dictionary to Corpus and Back Again – Linking Heterogeneous Language Resources for DGS
Anke Müller | Thomas Hanke | Reiner Konrad | Gabriele Langer | Sabrina Wähl
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

The Public DGS Corpus is published in two different formats: subtitled videos for lay persons, and lemmatized and annotated transcripts and videos for experts. In addition, a draft version with the first set of preliminary entries of the DGS dictionary (DW-DGS) to be completed in 2023 is now online. The Public DGS Corpus and the DW-DGS are conceived of as stand-alone products, but are nevertheless closely interconnected to offer additional and complementary informative functions. In this paper we focus on linking the published products in order to provide users access to corpus and corpus-based dictionary in various, interrelated ways. We discuss which links are thought to be useful and what challenges the linking of the products poses. In addition, we address the inclusion of links to other, older lexical resources (LSP dictionaries).

2016

Using a Language Technology Infrastructure for German in order to Anonymize German Sign Language Corpus Data
Julian Bleicken | Thomas Hanke | Uta Salden | Sven Wagner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

For publishing sign language corpus data on the web, anonymization is crucial even if it is impossible to hide the visual appearance of the signers: in a small community, even vague references to third persons may be enough to identify those persons. In the case of the DGS Korpus (German Sign Language corpus) project, we want to publish data as a contribution to the cultural heritage of the sign language community while annotation of the data is still ongoing. This raises the question of how well anonymization can be achieved given that no full linguistic analysis of the data is available. Basically, we combine analysis of all data that we have, including named entity recognition on translations into German. For this, we use the WebLicht language technology infrastructure. We report on the reliability of these methods in this special context and also illustrate how the anonymization of the video data is technically achieved in order to minimally disturb the viewer.

2002

iLex - A tool for Sign Language Lexicography and Corpus Analysis
Thomas Hanke
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
