Marc Schulder


2022

How to be FAIR when you CARE: The DGS Corpus as a Case Study of Open Science Resources for Minority Languages
Marc Schulder | Thomas Hanke
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The publication of resources for minority languages requires a balance between making data open and accessible and respecting the rights and needs of its language community. The FAIR principles were introduced as a guide to good open data practices and they have since been complemented by the CARE principles for indigenous data governance. This article describes how the DGS Corpus implemented these principles and how the two sets of principles affected each other. The DGS Corpus is a large collection of recordings of members of the deaf community in Germany communicating in their primary language, German Sign Language (DGS); it was created to be both a resource for linguistic research and a record of the life experiences of deaf people in Germany. The corpus was designed with CARE in mind to respect and empower the language community and FAIR data publishing was used to enhance its usefulness as a scientific resource.

Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources
Eleni Efthimiou | Stavroula-Evita Fotinea | Thomas Hanke | Julie A. Hochgesang | Jette Kristoffersen | Johanna Mesch | Marc Schulder
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

Introducing Sign Languages to a Multilingual Wordnet: Bootstrapping Corpora and Lexical Resources of Greek Sign Language and German Sign Language
Sam Bigeard | Marc Schulder | Maria Kopf | Thomas Hanke | Kyriaki Vasilaki | Anna Vacalopoulou | Theodore Goulas | Athanasia-Lida Dimou | Stavroula-Evita Fotinea | Eleni Efthimiou
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

Wordnets have been a popular lexical resource type for many years. Their sense-based representation of lexical items and numerous relation structures have been used for a variety of computational and linguistic applications. The inclusion of different wordnets into multilingual wordnet networks has further extended their use into the realm of cross-lingual research. Wordnets have been released for many spoken languages. Research has also been carried out into the creation of wordnets for several sign languages, but none have yet resulted in publicly available datasets. This article presents our own efforts towards an inclusion of sign languages in a multilingual wordnet, starting with Greek Sign Language (GSL) and German Sign Language (DGS). Based on differences in available language resources between GSL and DGS, we trial two workflows with different coverage priorities. We also explore how synergies between both workflows can be leveraged and how future work on additional sign languages could profit from building on existing sign language wordnet data. The results of our work are made publicly available.

The Sign Language Dataset Compendium: Creating an Overview of Digital Linguistic Resources
Maria Kopf | Marc Schulder | Thomas Hanke
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

One of the challenges that sign language researchers face is the identification of suitable language datasets, particularly for cross-lingual studies. There is no single source of information on what sign language corpora and lexical resources exist or how they compare. Instead, they have to be found through extensive literature review or word-of-mouth. The amount of information available on individual datasets can also vary widely and may be distributed across different publications, data repositories and (potentially defunct) project websites. This article introduces the Sign Language Dataset Compendium, an extensive overview of linguistic resources for sign languages. It covers existing corpora and lexical resources, as well as commonly used data collection tasks. Special attention is paid to covering resources for many different languages from around the globe. All information is provided in a standardised format to make entries comparable, but kept flexible enough to allow for differences in content. The compendium is intended as a growing resource that will be updated regularly.

Facilitating the Spread of New Sign Language Technologies across Europe
Hope Morgan | Onno Crasborn | Maria Kopf | Marc Schulder | Thomas Hanke
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

For developing sign language technologies like automatic translation, huge amounts of training data are required. Even the larger corpora available for some sign languages are tiny compared to the amounts of data used for corresponding spoken language technologies. The overarching goal of the European project EASIER is to develop a framework for bidirectional automatic translation between sign and spoken languages and between sign languages. One part of this multi-dimensional project is that it will pool available language resources from European sign languages into a larger dataset to address the data scarcity problem. This approach promises to open the floor for lower-resourced sign languages in Europe. This article focusses on efforts in the EASIER project to allow for new languages to make use of such technologies in the future. What are the characteristics of sign language resources needed to train recognition, translation, and synthesis algorithms, and how can other countries, including those without any sign resources, follow along with these developments? The efforts undertaken in EASIER include creating workflow documents and organizing training sessions in online workshops. They reflect the current state of the art, and will likely need to be updated in the coming decade.

2020

Extending the Public DGS Corpus in Size and Depth
Thomas Hanke | Marc Schulder | Reiner Konrad | Elena Jahn
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

In 2018 the DGS-Korpus project published the first full release of the Public DGS Corpus. This event marked a change of focus for the project. While before most attention had been on increasing the size of the corpus, now an increase in its depth became the priority. New data formats were added, corpus annotation conventions were released and OpenPose pose information was published for all transcripts. The community and research portal websites of the corpus also received upgrades, including persistent identifiers, archival copies of previous releases and improvements to their usability on mobile devices. The research portal was enhanced even further, improving its transcript web viewer, adding a KWIC concordance view, introducing cross-references to other linguistic resources of DGS and making its entire interface available in German in addition to English. This article provides an overview of these changes, chronicling the evolution of the Public DGS Corpus from its first release in 2018, through its second release in 2019 until its third release in 2020.

Collocations in Sign Language Lexicography: Towards Semantic Abstractions for Word Sense Discrimination
Gabriele Langer | Marc Schulder
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

In general monolingual lexicography a corpus-based approach to word sense discrimination (WSD) is the current standard. Automatically generated lexical profiles such as Word Sketches provide an overview of typical uses in the form of collocate lists grouped by their part of speech categories and their syntactic dependency relations to the base item. Collocates are sorted by their typicality according to frequency-based rankings. With the advancement of sign language (SL) corpora, SL lexicography can finally be based on actual language use as reflected in corpus data. In order to use such data effectively and gain new insights on sign usage, automatically generated collocation profiles need to be developed under the special conditions and circumstances of the SL data available. One of these conditions is that many of the prerequisites for the automatic syntactic parsing of corpora are not yet available for SL. In this article we describe a collocation summary generated from DGS Corpus data which is used for WSD as well as in entry-writing. The summary works based on the glosses used for lemmatisation. In addition, we explore how other resources can be utilised to add an additional layer of semantic grouping to the collocation analysis. For this experimental approach we use glosses, concepts, and wordnet supersenses.

Enhancing a Lexicon of Polarity Shifters through the Supervised Classification of Shifting Directions
Marc Schulder | Michael Wiegand | Josef Ruppenhofer
Proceedings of the Twelfth Language Resources and Evaluation Conference

The sentiment polarity of an expression (whether it is perceived as positive, negative or neutral) can be influenced by a number of phenomena, foremost among them negation. Apart from closed-class negation words like “no”, “not” or “without”, negation can also be caused by so-called polarity shifters. These are content words, such as verbs, nouns or adjectives, that shift polarities in their opposite direction, e.g. “abandoned” in “abandoned hope” or “alleviate” in “alleviate pain”. Many polarity shifters can affect both positive and negative polar expressions, shifting them towards the opposing polarity. However, other shifters are restricted to a single shifting direction. “Recoup” shifts negative to positive in “recoup your losses”, but does not affect the positive polarity of “fortune” in “recoup a fortune”. Existing polarity shifter lexica only specify whether a word can, in general, cause shifting, but they do not specify when this is limited to one shifting direction. To address this issue we introduce a supervised classifier that determines the shifting direction of shifters. This classifier uses both resource-driven features, such as WordNet relations, and data-driven features like in-context polarity conflicts. Using this classifier we enhance the largest available polarity shifter lexicon.

ATC-ANNO: Semantic Annotation for Air Traffic Control with Assistive Auto-Annotation
Marc Schulder | Johannah O’Mahony | Yury Bakanouski | Dietrich Klakow
Proceedings of the Twelfth Language Resources and Evaluation Conference

In air traffic control, assistant systems support air traffic controllers in their work. To improve the reactivity and accuracy of the assistant, automatic speech recognition can monitor the commands uttered by the controller. However, to provide sufficient training data for the speech recognition system, many hours of air traffic communications have to be transcribed and semantically annotated. For this purpose we developed the annotation tool ATC-ANNO. It provides a number of features to support the annotator in their task, such as auto-complete suggestions for semantic tags, access to preliminary speech recognition predictions, syntax highlighting and consistency indicators. Its core assistive feature, however, is its ability to automatically generate semantic annotations. Although it is based on a simple hand-written finite state grammar, it is also able to annotate sentences that deviate from this grammar. We evaluate the impact of different features on annotator efficiency and find that automatic annotation allows annotators to cover four times as many utterances in the same time.

2018

Automatically Creating a Lexicon of Verbal Polarity Shifters: Mono- and Cross-lingual Methods for German
Marc Schulder | Michael Wiegand | Josef Ruppenhofer
Proceedings of the 27th International Conference on Computational Linguistics

In this paper we use methods for creating a large lexicon of verbal polarity shifters and apply them to German. Polarity shifters are content words that can move the polarity of a phrase towards its opposite, such as the verb “abandon” in “abandon all hope”. This is similar to how negation words like “not” can influence polarity. Both shifters and negation are required for high precision sentiment analysis. Lists of negation words are available for many languages, but the only language for which a sizable lexicon of verbal polarity shifters exists is English. This lexicon was created by bootstrapping a sample of annotated verbs with a supervised classifier that uses a set of data- and resource-driven features. We reproduce and adapt this approach to create a German lexicon of verbal polarity shifters. Thereby, we confirm that the approach works for multiple languages. We further improve classification by leveraging cross-lingual information from the English shifter lexicon. Using this improved approach, we bootstrap a large number of German verbal polarity shifters, reducing the annotation effort drastically. The resulting German lexicon of verbal polarity shifters is made publicly available.

Introducing a Lexicon of Verbal Polarity Shifters for English
Marc Schulder | Michael Wiegand | Josef Ruppenhofer | Stephanie Köser
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Towards Bootstrapping a Polarity Shifter Lexicon using Linguistic Features
Marc Schulder | Michael Wiegand | Josef Ruppenhofer | Benjamin Roth
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a major step towards the creation of the first high-coverage lexicon of polarity shifters. In this work, we bootstrap a lexicon of verbs by exploiting various linguistic features. Polarity shifters, such as “abandon”, are similar to negations (e.g. “not”) in that they move the polarity of a phrase towards its inverse, as in “abandon all hope”. While there exist lists of negation words, creating comprehensive lists of polarity shifters is far more challenging due to their sheer number. On a sample of manually annotated verbs we examine a variety of linguistic features for this task. Then we build a supervised classifier to increase coverage. We show that this approach drastically reduces the annotation effort while ensuring a high-precision lexicon. We also show that our acquired knowledge of verbal polarity shifters improves phrase-level sentiment analysis.

2016

Separating Actor-View from Speaker-View Opinion Expressions using Linguistic Features
Michael Wiegand | Marc Schulder | Josef Ruppenhofer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Opinion Holder and Target Extraction for Verb-based Opinion Predicates – The Problem is Not Solved
Michael Wiegand | Marc Schulder | Josef Ruppenhofer
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2014

Metaphor Detection through Term Relevance
Marc Schulder | Eduard Hovy
Proceedings of the Second Workshop on Metaphor in NLP