Josef Ruppenhofer


2024

pdf bib
Oddballs and Misfits: Detecting Implicit Abuse in Which Identity Groups are Depicted as Deviating from the Norm
Michael Wiegand | Josef Ruppenhofer
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We address the task of detecting abusive sentences in which identity groups are depicted as deviating from the norm (e.g. Gays sprinkle flour over their gardens for good luck). These abusive utterances need not be stereotypes or negative in sentiment. We introduce the first dataset for this task. It is created via crowdsourcing and includes 7 identity groups. We also report on classification experiments.

pdf bib
Every Verb in Its Right Place? A Roadmap for Operationalizing Developmental Stages in the Acquisition of L2 German
Josef Ruppenhofer | Matthias Schwendemann | Annette Portmann | Katrin Wisniewski | Torsten Zesch
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Developmental stages are a linguistic concept claiming that language learning, despite its large inter-individual variance, generally progresses in an ordered, step-like manner. At the core of research has been the acquisition of verb placement by learners, as conceptualized within Processability Theory (Pienemann, 1989). The computational implementation of a system detecting developmental stages is a prerequisite for an automated analysis of L2 language development. However, such an implementation faces two main challenges. The first is the lack of a fully fleshed out, coherent linguistic specification of the stages. The second concerns the translation of the linguistic specification into computational procedures that can extract clauses from learner-produced text and assign them to a developmental stage based on verb placement. Our contribution provides the necessary linguistic specification of the stages as well as a detailed discussion and recommendations regarding the computational implementation.

pdf bib
Out of the Mouths of MPs: Speaker Attribution in Parliamentary Debates
Ines Rehbein | Josef Ruppenhofer | Annelen Brunner | Simone Paolo Ponzetto
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents GePaDe_SpkAtt, a new corpus for speaker attribution in German parliamentary debates, with more than 7,700 manually annotated events of speech, thought and writing. Our role inventory includes the sources, addressees, messages and topics of the speech event and also two additional roles, medium and evidence. We report baseline results for the automatic prediction of speech events and their roles, with high scores for both event triggers and roles. Then we apply our model to predict speech events in 20 years of parliamentary debates and investigate the use of factives in the rhetoric of MPs.

2023

pdf bib
Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language
Michael Wiegand | Jana Kampfmeier | Elisabeth Eder | Josef Ruppenhofer
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We address the task of identifying euphemistic abuse (e.g. “You inspire me to fall asleep”) paraphrasing simple explicitly abusive utterances (e.g. “You are boring”). For this task, we introduce a novel dataset that has been created via crowdsourcing. Special attention has been paid to the generation of appropriate negative (non-abusive) data. We report on classification experiments showing that classifiers trained on previous datasets are less capable of detecting such abuse. Best automatic results are obtained by a classifier that augments training data from our new dataset with automatically-generated GPT-3 completions. We also present a classifier that combines a few manually extracted features that exemplify the major linguistic phenomena constituting euphemistic abuse.

2022

pdf bib
Identifying Implicitly Abusive Remarks about Identity Groups using a Linguistically Informed Approach
Michael Wiegand | Elisabeth Eder | Josef Ruppenhofer
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We address the task of distinguishing implicitly abusive sentences on identity groups (“Muslims contaminate our planet”) from other group-related negative polar sentences (“Muslims despise terrorism”). Implicitly abusive language comprises utterances whose abusiveness is not conveyed by abusive words (e.g. “bimbo” or “scum”). So far, the detection of such utterances could not be properly addressed, since existing datasets displaying a high degree of implicit abuse are fairly biased. Following the recently proposed strategy of solving implicit abuse by separately addressing its different subtypes, we present a new focused and less biased dataset that consists of the subtype of atomic negative sentences about identity groups. For that task, we model components that each address one facet of such implicit abuse, i.e. depiction as perpetrators, aspectual classification and non-conformist views. The approach generalizes across different identity groups and languages.

pdf bib
Who’s in, who’s out? Predicting the Inclusiveness or Exclusiveness of Personal Pronouns in Parliamentary Debates
Ines Rehbein | Josef Ruppenhofer
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents a compositional annotation scheme to capture the clusivity properties of personal pronouns in context, that is, their ability to construct and manage in-groups and out-groups by including/excluding the audience and/or non-speech-act participants in reference to groups that also include the speaker. We apply and test our scheme on pronoun instances in speeches taken from the German parliament. The speeches cover the period from 2017 to 2021 and comprise manual annotations for 3,126 sentences. We achieve high inter-annotator agreement for our new scheme, with a Cohen’s κ in the range of 89.7-93.2 and a percentage agreement of >96%. Our exploratory analysis of in-/exclusive pronoun use in the parliamentary setting provides some face validity for our new scheme. Finally, we present baseline experiments for automatically predicting clusivity in political debates, with promising results for many referential constellations, yielding an overall micro F1 of 84.9% across all pronouns.
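For readers unfamiliar with the agreement measure cited above, here is a minimal illustrative sketch (not the authors' evaluation code) of how pairwise Cohen's κ can be computed from two annotators' label sequences; the label names are hypothetical:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with hypothetical clusivity labels:
a = ["inclusive", "exclusive", "inclusive", "inclusive"]
b = ["inclusive", "exclusive", "exclusive", "inclusive"]
print(round(cohens_kappa(a, b), 2))  # 0.5
```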

2021

pdf bib
Implicitly Abusive Language – What does it actually look like and why are we not getting there?
Michael Wiegand | Josef Ruppenhofer | Elisabeth Eder
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Abusive language detection is an emerging field in natural language processing which has received a large amount of attention recently. Still, the success of automatic detection is limited. In particular, the detection of implicitly abusive language, i.e. abusive language that is not conveyed by abusive words (e.g. dumbass or scum), is not working well. In this position paper, we explain why existing datasets make learning implicit abuse difficult and what needs to be changed in the design of such datasets. Arguing for a divide-and-conquer strategy, we present a list of subtypes of implicitly abusive language and formulate research tasks and questions for future research.

pdf bib
Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates
Ines Rehbein | Josef Ruppenhofer | Julian Bernauer
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

pdf bib
Implicitly Abusive Comparisons – A New Dataset and Linguistic Analysis
Michael Wiegand | Maja Geulig | Josef Ruppenhofer
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We examine the task of detecting implicitly abusive comparisons (e.g. “Your hair looks like you have been electrocuted”). Implicitly abusive comparisons are abusive comparisons in which abusive words (e.g. “dumbass” or “scum”) are absent. We detail the process of creating a novel dataset for this task via crowdsourcing that includes several measures to obtain a sufficiently representative and unbiased set of comparisons. We also present classification experiments that include a range of linguistic features that help us better understand the mechanisms underlying abusive comparisons.

pdf bib
Exploiting Emojis for Abusive Language Detection
Michael Wiegand | Josef Ruppenhofer
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We propose to use abusive emojis, such as the “middle finger” or “face vomiting”, as a proxy for learning a lexicon of abusive words. Since it represents extralinguistic information, a single emoji can co-occur with different forms of explicitly abusive utterances. We show that our approach generates a lexicon that offers the same performance in cross-domain classification of abusive microposts as the most advanced lexicon induction method. That method, in contrast, depends on manually annotated seed words and expensive lexical resources for bootstrapping (e.g. WordNet). We demonstrate that the same emojis can also be effectively used in languages other than English. Finally, we also show that emojis can be exploited for classifying mentions of ambiguous words, such as “fuck” and “bitch”, into generally abusive and merely profane usages.

2020

pdf bib
Fine-grained Named Entity Annotations for German Biographic Interviews
Josef Ruppenhofer | Ines Rehbein | Carolina Flinz
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a fine-grained NER annotation scheme with 30 labels and apply it to German data. Building on the OntoNotes 5.0 NER inventory, our scheme is adapted for a corpus of transcripts of biographic interviews by adding categories for AGE and LAN(guage); it also features extended numeric and temporal categories. Applying the scheme to the spoken data as well as a collection of teaser tweets from newspaper sites, we can confirm its generality for both domains, also achieving good inter-annotator agreement. We also show empirically how our inventory relates to the well-established 4-category NER inventory by re-annotating a subset of the GermEval 2014 NER coarse-grained dataset with our fine label inventory. Finally, we use a BERT-based system to establish baseline models for NER tagging on our two new datasets. Global results in in-domain testing are quite high on the two datasets, near what was achieved for the coarse inventory on the CoNLL-2003 data. Cross-domain testing produces much lower results due to the severe domain differences.

pdf bib
Doctor Who? Framing Through Names and Titles in German
Esther van den Berg | Katharina Korfhage | Josef Ruppenhofer | Michael Wiegand | Katja Markert
Proceedings of the Twelfth Language Resources and Evaluation Conference

Entity framing is the selection of aspects of an entity to promote a particular viewpoint towards that entity. We investigate entity framing of political figures through the use of names and titles in German online discourse, enhancing current research in entity framing through titling and naming that concentrates on English only. We collect tweets that mention prominent German politicians and annotate them for stance. We find that the formality of naming in these tweets correlates positively with their stance. This confirms sociolinguistic observations that naming and titling can have a status-indicating function and suggests that this function is dominant in German tweets mentioning political figures. We also find that this status-indicating function is much weaker in tweets from users that are politically left-leaning than in tweets by right-leaning users. This is in line with observations from moral psychology that left-leaning and right-leaning users assign different importance to maintaining social hierarchies.

pdf bib
Enhancing a Lexicon of Polarity Shifters through the Supervised Classification of Shifting Directions
Marc Schulder | Michael Wiegand | Josef Ruppenhofer
Proceedings of the Twelfth Language Resources and Evaluation Conference

The sentiment polarity of an expression (whether it is perceived as positive, negative or neutral) can be influenced by a number of phenomena, foremost among them negation. Apart from closed-class negation words like “no”, “not” or “without”, negation can also be caused by so-called polarity shifters. These are content words, such as verbs, nouns or adjectives, that shift polarities in their opposite direction, e.g. “abandoned” in “abandoned hope” or “alleviate” in “alleviate pain”. Many polarity shifters can affect both positive and negative polar expressions, shifting them towards the opposing polarity. However, other shifters are restricted to a single shifting direction. “Recoup” shifts negative to positive in “recoup your losses”, but does not affect the positive polarity of “fortune” in “recoup a fortune”. Existing polarity shifter lexica only specify whether a word can, in general, cause shifting, but they do not specify when this is limited to one shifting direction. To address this issue we introduce a supervised classifier that determines the shifting direction of shifters. This classifier uses both resource-driven features, such as WordNet relations, and data-driven features like in-context polarity conflicts. Using this classifier we enhance the largest available polarity shifter lexicon.
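As a rough illustration of what a resource-driven feature of this kind might look like (this is not the authors' feature set, and the feature names are made up for the example), the following sketch queries WordNet via NLTK for antonymy and derivational links of a candidate shifter; the full approach described above additionally uses data-driven features such as in-context polarity conflicts:

```python
# Assumes NLTK with the WordNet data installed: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def wordnet_features(verb):
    """Toy resource-driven features for a candidate polarity shifter."""
    synsets = wn.synsets(verb, pos=wn.VERB)
    lemmas = [lem for syn in synsets for lem in syn.lemmas()]
    return {
        "n_verb_senses": len(synsets),
        "has_antonym": any(lem.antonyms() for lem in lemmas),
        "has_derived_form": any(lem.derivationally_related_forms() for lem in lemmas),
    }

print(wordnet_features("recoup"))
```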

pdf bib
Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Lauren Cassidy | Özlem Çetinoğlu | Alessandra Teresa Cignarella | Teresa Lynn | Ines Rehbein | Josef Ruppenhofer | Djamé Seddah | Amir Zeldes
Proceedings of the Twelfth Language Resources and Evaluation Conference

The paper presents a discussion of the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given, on the one hand, the increasing number of treebanks featuring user-generated content and, on the other, its somewhat inconsistent treatment in these resources, the aim of this paper is twofold: (1) to provide a short though comprehensive overview of such treebanks, based on the available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.

pdf bib
A New Resource for German Causal Language
Ines Rehbein | Josef Ruppenhofer
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a new resource for German causal language, with annotations in context for verbs, nouns and prepositions. Our dataset includes 4,390 annotated instances for more than 150 different triggers. The annotation scheme distinguishes three different types of causal events (CONSEQUENCE, MOTIVATION, PURPOSE). We also provide annotations for semantic roles, i.e. the cause and effect of the causal event as well as the actor and affected party, if present. In the paper, we present inter-annotator agreement scores for our dataset and discuss problems of annotating causal language. Finally, we present experiments in which we frame causal annotation as a sequence labelling problem and report baseline results for the prediction of causal arguments and for predicting different types of causation.

pdf bib
Improving Sentence Boundary Detection for Spoken Language Transcripts
Ines Rehbein | Josef Ruppenhofer | Thomas Schmidt
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents experiments on sentence boundary detection in transcripts of spoken dialogues. Segmenting spoken language into sentence-like units is a challenging task, due to disfluencies, ungrammatical or fragmented structures and the lack of punctuation. In addition, one of the main bottlenecks for many NLP applications for spoken language is the small size of the training data, as the transcription and annotation of spoken language is by far more time-consuming and labour-intensive than processing written language. We therefore investigate the benefits of data expansion and transfer learning and test different ML architectures for this task. Our results show that data expansion is not straightforward and even data from the same domain does not always improve results. They also highlight the importance of modelling, i.e. of finding the best architecture and data representation for the task at hand. For the detection of boundaries in spoken language transcripts, we achieve a substantial improvement when framing the boundary detection problem as a sentence pair classification task, compared to a sequence tagging approach.

pdf bib
I’ve got a construction looks funny – representing and recovering non-standard constructions in UD
Josef Ruppenhofer | Ines Rehbein
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

The UD framework defines guidelines for a crosslingual syntactic analysis in the framework of dependency grammar, with the aim of providing a consistent treatment across languages that not only supports multilingual NLP applications but also facilitates typological studies. Until now, the UD framework has mostly focussed on bilexical grammatical relations. In the paper, we propose to add a constructional perspective and discuss several examples of spoken-language constructions that occur in multiple languages and challenge the current use of basic and enhanced UD relations. The examples include cases where the surface relations are deceptive, and syntactic amalgams that either involve unconnected subtrees or structures with multiply-headed dependents. We argue that a unified treatment of constructions across languages will increase the consistency of the UD annotations and thus the quality of the treebanks for linguistic analysis.

2019

pdf bib
Detection of Abusive Language: the Problem of Biased Datasets
Michael Wiegand | Josef Ruppenhofer | Thomas Kleinbauer
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We discuss the impact of data bias on abusive language detection. We show that classification scores on popular datasets reported in previous work are much lower under realistic settings in which this bias is reduced. Such biases are most notably observed on datasets that are created by focused sampling instead of random sampling. Datasets with a higher proportion of implicit abuse are more affected than datasets with a lower proportion.

pdf bib
Detecting Derogatory Compounds – An Unsupervised Approach
Michael Wiegand | Maximilian Wolf | Josef Ruppenhofer
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We examine the new task of detecting derogatory compounds (e.g. “curry muncher”). Derogatory compounds are much more difficult to detect than derogatory unigrams (e.g. “idiot”) since they are more sparsely represented in lexical resources previously found effective for this task (e.g. Wiktionary). We propose an unsupervised classification approach that incorporates linguistic properties of compounds. It mostly depends on a simple distributional representation. We compare our approach against previously established methods proposed for extracting derogatory unigrams.

pdf bib
Not My President: How Names and Titles Frame Political Figures
Esther van den Berg | Katharina Korfhage | Josef Ruppenhofer | Michael Wiegand | Katja Markert
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

Naming and titling have been discussed in sociolinguistics as markers of status or solidarity. However, these functions have not been studied on a larger scale or for social media data. We collect a corpus of tweets mentioning presidents of six G20 countries by various naming forms. We show that naming variation relates to stance towards the president in a way that is suggestive of a framing effect mediated by respectfulness. This confirms sociolinguistic theory of naming and titling as markers of status.

pdf bib
tweeDe – A Universal Dependencies treebank for German tweets
Ines Rehbein | Josef Ruppenhofer | Bich-Ngoc Do
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

2018

pdf bib
Inducing a Lexicon of Abusive Words – a Feature-Based Approach
Michael Wiegand | Josef Ruppenhofer | Anna Schmidt | Clayton Greenberg
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.

pdf bib
Sprucing up the trees – Error detection in treebanks
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 27th International Conference on Computational Linguistics

We present a method for detecting annotation errors in manually and automatically annotated dependency parse trees, based on ensemble parsing in combination with Bayesian inference, guided by active learning. We evaluate our method in different scenarios: (i) for error detection in dependency treebanks and (ii) for improving parsing accuracy on in- and out-of-domain data.

pdf bib
Automatically Creating a Lexicon of Verbal Polarity Shifters: Mono- and Cross-lingual Methods for German
Marc Schulder | Michael Wiegand | Josef Ruppenhofer
Proceedings of the 27th International Conference on Computational Linguistics

In this paper we use methods for creating a large lexicon of verbal polarity shifters and apply them to German. Polarity shifters are content words that can move the polarity of a phrase towards its opposite, such as the verb “abandon” in “abandon all hope”. This is similar to how negation words like “not” can influence polarity. Both shifters and negation are required for high precision sentiment analysis. Lists of negation words are available for many languages, but the only language for which a sizable lexicon of verbal polarity shifters exists is English. This lexicon was created by bootstrapping a sample of annotated verbs with a supervised classifier that uses a set of data- and resource-driven features. We reproduce and adapt this approach to create a German lexicon of verbal polarity shifters. Thereby, we confirm that the approach works for multiple languages. We further improve classification by leveraging cross-lingual information from the English shifter lexicon. Using this improved approach, we bootstrap a large number of German verbal polarity shifters, reducing the annotation effort drastically. The resulting German lexicon of verbal polarity shifters is made publicly available.

pdf bib
Distinguishing affixoid formations from compounds
Josef Ruppenhofer | Michael Wiegand | Rebecca Wilm | Katja Markert
Proceedings of the 27th International Conference on Computational Linguistics

We study German affixoids, a type of morpheme in between affixes and free stems. Several properties have been associated with them – increased productivity; a bleached semantics, which is often evaluative and/or intensifying and thus of relevance to sentiment analysis; and the existence of a free morpheme counterpart – but they have not been validated empirically. In experiments on a new data set that we make available, we put these key assumptions from the morphological literature to the test and show that, despite the fact that affixoids generate many low-frequency formations, we can classify these as affixoid or non-affixoid instances with a best F1-score of 74%.

pdf bib
Disambiguation of Verbal Shifters
Michael Wiegand | Sylvette Loda | Josef Ruppenhofer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Introducing a Lexicon of Verbal Polarity Shifters for English
Marc Schulder | Michael Wiegand | Josef Ruppenhofer | Stephanie Köser
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Building a Morphological Treebank for German from a Linguistic Database
Petra Steiner | Josef Ruppenhofer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Towards Bootstrapping a Polarity Shifter Lexicon using Linguistic Features
Marc Schulder | Michael Wiegand | Josef Ruppenhofer | Benjamin Roth
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a major step towards the creation of the first high-coverage lexicon of polarity shifters. In this work, we bootstrap a lexicon of verbs by exploiting various linguistic features. Polarity shifters, such as “abandon”, are similar to negations (e.g. “not”) in that they move the polarity of a phrase towards its inverse, as in “abandon all hope”. While there exist lists of negation words, creating comprehensive lists of polarity shifters is far more challenging due to their sheer number. On a sample of manually annotated verbs we examine a variety of linguistic features for this task. Then we build a supervised classifier to increase coverage. We show that this approach drastically reduces the annotation effort while ensuring a high-precision lexicon. We also show that our acquired knowledge of verbal polarity shifters improves phrase-level sentiment analysis.

pdf bib
Evaluating the morphological compositionality of polarity
Josef Ruppenhofer | Petra Steiner | Michael Wiegand
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Unknown words are a challenge for any NLP task, including sentiment analysis. Here, we evaluate the extent to which sentiment polarity of complex words can be predicted based on their morphological make-up. We do this on German as it has very productive processes of derivation and compounding and many German hapax words, which are likely to bear sentiment, are morphologically complex. We present results of supervised classification experiments on new datasets with morphological parses and polarity annotations.

pdf bib
Catching the Common Cause: Extraction and Annotation of Causal Relations and their Participants
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 11th Linguistic Annotation Workshop

In this paper, we present a simple, yet effective method for the automatic identification and extraction of causal relations from text, based on a large English-German parallel corpus. The goal of this effort is to create a lexical resource for German causal relations. The resource will consist of a lexicon that describes constructions that trigger causality as well as the participants of the causal event, and will be augmented by a corpus with annotated instances for each entry, that can be used as training data to develop a system for automatic classification of causal relations. Focusing on verbs, our method harvested a set of 100 different lexical triggers of causality, including support verb constructions. At the moment, our corpus includes over 1,000 annotated instances. The lexicon and the annotated data will be made available to the research community.

pdf bib
Detecting annotation noise in automatically labelled data
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning. We test our approach on in-domain and out-of-domain data in two languages, in AL simulations and in a real world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.

2016

pdf bib
Separating Actor-View from Speaker-View Opinion Expressions using Linguistic Features
Michael Wiegand | Marc Schulder | Josef Ruppenhofer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Opinion Holder and Target Extraction on Opinion Compounds – A Linguistic Approach
Michael Wiegand | Christine Bocionek | Josef Ruppenhofer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Effect Functors for Opinion Inference
Josef Ruppenhofer | Jasper Brandes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Sentiment analysis has so far focused on the detection of explicit opinions. However, of late implicit opinions have received broader attention, the key idea being that the evaluation of an event type by a speaker depends on how the participants in the event are valued and how the event itself affects the participants. We present an annotation scheme for adding relevant information, couched in terms of so-called effect functors, to German lexical items. Our scheme synthesizes and extends previous proposals. We report on an inter-annotator agreement study. We also present results of a crowdsourcing experiment to test the utility of some known and some new functors for opinion inference where, unlike in previous work, subjects are asked to reason from event evaluation to participant evaluation.

2015

pdf bib
Opinion Holder and Target Extraction based on the Induction of Verbal Categories
Michael Wiegand | Josef Ruppenhofer
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Ordering adverbs by their scaling effect on adjective intensity
Josef Ruppenhofer | Jasper Brandes | Petra Steiner | Michael Wiegand
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Extending effect annotation with lexical decomposition
Josef Ruppenhofer | Jasper Brandes
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Opinion Holder and Target Extraction for Verb-based Opinion Predicates – The Problem is Not Solved
Michael Wiegand | Marc Schulder | Josef Ruppenhofer
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2014

pdf bib
Dimensions of Metaphorical Meaning
Andrew Gargett | Josef Ruppenhofer | John Barnden
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

pdf bib
Comparing methods for deriving intensity scores for adjectives
Josef Ruppenhofer | Michael Wiegand | Jasper Brandes
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

pdf bib
Predicative Adjectives: An Unsupervised Criterion to Extract Subjective Adjectives
Michael Wiegand | Josef Ruppenhofer | Dietrich Klakow
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Towards Weakly Supervised Resolution of Null Instantiations
Philip Gorinski | Josef Ruppenhofer | Caroline Sporleder
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

2012

pdf bib
Semantic frames as an anchor representation for sentiment analysis
Josef Ruppenhofer | Ines Rehbein
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

pdf bib
MLSA — A Multi-layered Reference Corpus for German Sentiment Analysis
Simon Clematide | Stefan Gindl | Manfred Klenner | Stefanos Petrakis | Robert Remus | Josef Ruppenhofer | Ulli Waltinger | Michael Wiegand
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word- and phrase-level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression-level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, exploring for and discovering interactions that may exist between different layers should also be possible. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss' multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.

pdf bib
Yes we can!? Annotating English modal verbs
Josef Ruppenhofer | Ines Rehbein
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.

2011

pdf bib
Evaluating the Impact of Coder Errors on Active Learning
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
In Search of Missing Arguments: A Linguistic Approach
Josef Ruppenhofer | Philip Gorinski | Caroline Sporleder
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Learning Script Participants from Unlabeled Data
Michaela Regneri | Alexander Koller | Josef Ruppenhofer | Manfred Pinkal
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

pdf bib
SemEval-2010 Task 10: Linking Events and Their Participants in Discourse
Josef Ruppenhofer | Caroline Sporleder | Roser Morante | Collin Baker | Martha Palmer
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Bringing Active Learning to Life
Ines Rehbein | Josef Ruppenhofer | Alexis Palmer
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Speaker Attribution in Cabinet Protocols
Josef Ruppenhofer | Caroline Sporleder | Fabian Shirokov
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Historical cabinet protocols are a useful resource that enables historians to identify the opinions expressed by politicians on different subjects and at different points in time. While cabinet protocols are often available in digitized form, so far the only method to access their information content is keyword-based search, which often returns sub-optimal results. We present a method for enriching German cabinet protocols with information about the originators of statements. This requires automatic speaker attribution. Unlike many other approaches, our method can also deal with cases in which the speaker is not explicitly identified in the sentence itself. Such cases are very common in our domain. To avoid costly manual annotation of training data, we design a rule-based system that exploits morpho-syntactic cues. We show that such a system obtains good results, especially with respect to recall, which is particularly important for information access.

pdf bib
Generating FrameNets of Various Granularities: The FrameNet Transformer
Josef Ruppenhofer | Jonas Sunde | Manfred Pinkal
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a method and a software tool, the FrameNet Transformer, for deriving customized versions of the FrameNet database based on frame and frame element relations. The FrameNet Transformer allows users to iteratively coarsen the FrameNet sense inventory in two ways. First, the tool can merge entire frames that are related by user-specified relations. Second, it can merge word senses that belong to frames related by specified relations. Both methods can be interleaved. The Transformer automatically outputs format-compliant FrameNet versions, including modified corpus annotation files that can be used for automatic processing. The customized FrameNet versions can be used to determine which granularity is suitable for particular applications. In our evaluation of the tool, we show that our method increases accuracy of statistical semantic parsers by reducing the number of word-senses (frames) per lemma, and increasing the number of annotated sentences per lexical unit and frame. We further show in an experiment on the FATE corpus that by coarsening FrameNet we do not incur a significant loss of information that is relevant to the Recognizing Textual Entailment task.

pdf bib
There’s no Data like More Data? Revisiting the Impact of Data Size on a Classification Task
Ines Rehbein | Josef Ruppenhofer
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In the paper we investigate the impact of data size on a Word Sense Disambiguation (WSD) task. We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb “drohen” (threaten) shows that the best performance is not obtained when training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.
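For illustration only, the following is a minimal sketch of plain margin-based uncertainty sampling (the paper's contribution is a more elaborate variant that adapts the selection to the learning progress); the feature matrices, the logistic-regression classifier and the batch size are placeholder assumptions, and at least two senses are assumed to occur in the labelled seed data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_uncertain(X_labeled, y_labeled, X_pool, batch_size=10):
    """Pick the pool items the current model is least certain about (smallest margin)."""
    clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    probs = np.sort(clf.predict_proba(X_pool), axis=1)
    margins = probs[:, -1] - probs[:, -2]     # best minus second-best class probability
    return np.argsort(margins)[:batch_size]   # indices to hand to the human annotator
```

In an active-learning loop, the returned indices would be annotated, moved from the pool to the labelled set, and the classifier retrained before the next selection round.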

2009

pdf bib
SemEval-2010 Task 10: Linking Events and Their Participants in Discourse
Josef Ruppenhofer | Caroline Sporleder | Roser Morante | Collin Baker | Martha Palmer
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

pdf bib
Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation
Ines Rehbein | Josef Ruppenhofer | Caroline Sporleder
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

2008

pdf bib
Discourse Level Opinion Relations: An Annotation Study
Swapna Somasundaran | Josef Ruppenhofer | Janyce Wiebe
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

pdf bib
Finding the Sources and Targets of Subjective Expressions
Josef Ruppenhofer | Swapna Somasundaran | Janyce Wiebe
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

As many popular text genres such as blogs or news contain opinions by multiple sources and about multiple targets, finding the sources and targets of subjective expressions becomes an important sub-task for automatic opinion analysis systems. We argue that while automatic semantic role labeling systems (ASRL) have an important contribution to make, they cannot solve the problem for all cases. Based on the experience of manually annotating opinions, sources, and targets in various genres, we present linguistic phenomena that require knowledge beyond that of ASRL systems. In particular, we address issues relating to the attribution of opinions to sources; sources and targets that are realized as zero-forms; and inferred opinions. We also discuss in some depth that for arguing attitudes we need to be able to recover propositions and not only argued-about entities. A recurrent theme of the discussion is that close attention to specific discourse contexts is needed to identify sources and targets correctly.

pdf bib
Discourse Level Opinion Interpretation
Swapna Somasundaran | Janyce Wiebe | Josef Ruppenhofer
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
Detecting Arguing and Sentiment in Meetings
Swapna Somasundaran | Josef Ruppenhofer | Janyce Wiebe
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue