Ines Rehbein

2025

Investigating the effectiveness of Data Augmentation and Contrastive Learning for Named Entity Recognition
Noel Chia | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

Data Augmentation (DA) and Contrastive Learning (CL) are widely used in NLP, but their potential for NER has not yet been investigated in detail. Existing work is mostly limited to zero- and few-shot scenarios where improvements over the baseline are easy to obtain. In this paper, we address this research gap by presenting a systematic evaluation of DA for NER on small, medium-sized and large datasets with coarse and fine-grained labels. We report results for a) DA only, b) DA in combination with supervised contrastive learning, and c) DA with transfer learning. Our results show that DA on its own fails to improve results over the baseline and that supervised CL works better on larger datasets while transfer learning is beneficial if the target dataset is very small. Finally, we investigate how contrastive learning affects the learned representations, based on dimensionality reduction and visualisation techniques, and show that CL mostly helps to separate named entities from non-entities.

Moral Framing in Politics (MFiP): A new resource and models for moral framing
Ines Rehbein | Ines Reinig | Simone Paolo Ponzetto
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The construct of morality permeates our entire lives and influences our behavior and how we perceive others. It therefore comes as no surprise that morality also plays an important role in politics, as morally framed arguments are perceived as more appealing and persuasive. Thus, being able to identify moral framing in political communication and to detect subtle differences in politicians’ moral framing can provide the basis for many interesting analyses in the political sciences. In the paper, we release MoralFramingInPolitics (MFiP), a new corpus of German parliamentary debates where the speakers’ moral framing has been coded, using the framework of Moral Foundations Theory (MFT). Our fine-grained annotations distinguish different types of moral frames and also include narrative roles, together with the moral foundations for each frame. We then present models for frame type and moral foundation classification and explore the benefits of data augmentation (DA) and contrastive learning (CL) for the two tasks. All data and code will be made available to the research community.

Moral reckoning: How reliable are dictionary-based methods for examining morality in text?
Ines Rehbein | Lilly Brauner | Florian Ertz | Ines Reinig | Simone Ponzetto
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Due to their availability and ease of use, dictionary-based measures of moral values are a popular tool for text-based analyses of morality that examine human attitudes and behaviour across populations and cultures. In this paper, we revisit the construct validity of different dictionary-based measures of morality in text that have been proposed in the literature. We discuss conceptual challenges for text-based measures of morality and present an annotation experiment where we create a new dataset with human annotations of moral rhetoric in German political manifestos. We compare the results of our human annotations with different measures of moral values, showing that none of them is able to capture the trends observed by trained human coders. Our findings have far-reaching implications for the application of moral dictionaries in the digital humanities.

Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
Siyao Peng | Ines Rehbein
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)

Americans are dreamers – Generic statements and stereotyping in political tweets
Ines Rehbein
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops

2024

A Survey on Modelling Morality for Text Analysis
Ines Reinig | Maria Becker | Ines Rehbein | Simone Ponzetto
Findings of the Association for Computational Linguistics: ACL 2024

In this survey, we provide a systematic review of recent work on modelling morality in text, an area of research that has garnered increasing attention in recent years. Our survey is motivated by the importance of modelling decisions on the created resources, the models trained on these resources and the analyses that result from the models’ predictions. We review work at the interface of NLP, Computational Social Science and Psychology and give an overview of the different goals and research questions addressed in the papers, their underlying theoretical backgrounds and the methods that have been applied to pursue these goals. We then identify and discuss challenges and research gaps, such as the lack of a theoretical framework underlying the operationalisation of morality in text, the low IAA reported for many of the resulting human-annotated resources and the lack of validation of newly proposed resources and analyses.

Resources and Methods for Analysing Political Rhetoric and Framing in Parliamentary Debates
Ines Rehbein
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

Recent work in political science has made extensive use of NLP methods to produce evidential support for a variety of analyses, for example, inferring an actor’s ideological positions from textual data or identifying the polarisation of the political discourse over the last decades. Most work has employed variations of lexical features extracted from text or has learned latent representations in a mostly unsupervised manner. While such approaches have the potential to enable political analyses at scale, they are often limited by their lack of interpretability. In the talk, I will instead look at semantic and pragmatic representations of political rhetoric and ideological framing and present several case studies that showcase how linguistic annotation and the use of NLP methods can help to investigate different framing strategies in parliamentary debates. The first part of the talk investigates populist framing strategies, specifically, the use of pronouns to create in- and out-groups and the identification of people-centric messages. The second part of the presentation focusses on framing strategies on the pragmatic level.

How to Do Politics with Words: Investigating Speech Acts in Parliamentary Debates
Ines Reinig | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a new perspective on framing through the lens of speech acts and investigates how politicians make use of different pragmatic speech act functions in political debates. To that end, we created a new resource of German parliamentary debates, annotated with fine-grained speech act types. Our hierarchical annotation scheme distinguishes between cooperation and conflict communication, further structured into six subtypes, such as informative, declarative or argumentative-critical speech acts, with 14 fine-grained classes at the lowest level. We present classification baselines on our new data and show that the fine-grained classes in our schema can be predicted with an avg. F1 of around 82.0%. We then use our classifier to analyse the use of speech acts in a large corpus of parliamentary debates over a time span from 2003–2023.

A new Resource and Baselines for Opinion Role Labelling in German Parliamentary Debates
Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

Detecting opinions, their holders and targets in parliamentary debates provides an interesting layer of analysis, for example, to identify frequent targets of opinions for specific topics, actors or parties. In the paper, we present GePaDe-ORL, a new dataset for German parliamentary debates where subjective expressions, their opinion holders and targets have been annotated. We describe the annotation process and report baselines for predicting those annotations in our new dataset.

Out of the Mouths of MPs: Speaker Attribution in Parliamentary Debates
Ines Rehbein | Josef Ruppenhofer | Annelen Brunner | Simone Paolo Ponzetto
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents GePaDe_SpkAtt, a new corpus for speaker attribution in German parliamentary debates, with more than 7,700 manually annotated events of speech, thought and writing. Our role inventory includes the sources, addressees, messages and topics of the speech event and also two additional roles, medium and evidence. We report baseline results for the automatic prediction of speech events and their roles, with high scores for both event triggers and roles. Then we apply our model to predict speech events in 20 years of parliamentary debates and investigate the use of factives in the rhetoric of MPs.

Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers
Christopher Klamm | Gabriella Lapesa | Simone Paolo Ponzetto | Ines Rehbein | Indira Sen
Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers

2023

Our kind of people? Detecting populist references in political debates
Christopher Klamm | Ines Rehbein | Simone Paolo Ponzetto
Findings of the Association for Computational Linguistics: EACL 2023

This paper investigates the identification of populist rhetoric in text and presents a novel cross-lingual dataset for this task. Our work is based on the definition of populism as a “communication style of political actors that refers to the people” but also includes anti-elitism as another core feature of populism. Accordingly, we annotate references to The People and The Elite in German and English parliamentary debates with a hierarchical scheme. The paper describes our dataset and annotation procedure and reports inter-annotator agreement for this task. Next, we compare and evaluate different transformer-based model architectures on a German dataset and report results for zero-shot learning on a smaller English dataset. We then show that semi-supervised tri-training can improve results in the cross-lingual setting. Our dataset can be used to investigate how political actors talk about The Elite and The People and to study how populist rhetoric is used as a strategic device.

Policy Domain Prediction from Party Manifestos with Adapters and Knowledge Enhanced Transformers
Hsiao-Chu Yu | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)

2022

FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics
Christopher Klamm | Ines Rehbein | Simone Paolo Ponzetto
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

This paper presents a framework for studying second-level political agenda setting in parliamentary debates, based on the selection of policy topics used by political actors to discuss a specific issue on the parliamentary agenda. For example, the COVID-19 pandemic as an agenda item can be contextualised as a health issue or as a civil rights issue, as a matter of macroeconomics or can be discussed in the context of social welfare. Our framework allows us to observe differences regarding how different parties discuss the same agenda item by emphasizing different topical aspects of the item. We apply and evaluate our framework on data from the German Bundestag and discuss the merits and limitations of our approach. In addition, we present a new annotated data set of parliamentary debates, following the coding schema of policy topics developed in the Comparative Agendas Project (CAP), and release models for topic classification in parliamentary debates.

Journal for Language Technology and Computational Linguistics, Vol. 35 No. 2
Ines Rehbein | Gabriella Lapesa | Goran Glavaš | Simone Paolo Ponzetto
Journal for Language Technology and Computational Linguistics, Vol. 35 No. 2

Improved Opinion Role Labelling in Parliamentary Debates
Laura Bamberg | Ines Rehbein | Simone Ponzetto
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

Who’s in, who’s out? Predicting the Inclusiveness or Exclusiveness of Personal Pronouns in Parliamentary Debates
Ines Rehbein | Josef Ruppenhofer
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents a compositional annotation scheme to capture the clusivity properties of personal pronouns in context, that is, their ability to construct and manage in-groups and out-groups by including/excluding the audience and/or non-speech act participants in reference to groups that also include the speaker. We apply and test our schema on pronoun instances in speeches taken from the German parliament. The speeches cover a time period from 2017-2021 and comprise manual annotations for 3,126 sentences. We achieve high inter-annotator agreement for our new schema, with a Cohen’s κ in the range of 89.7-93.2 and a percentage agreement of > 96%. Our exploratory analysis of in/exclusive pronoun use in the parliamentary setting provides some face validity for our new schema. Finally, we present baseline experiments for automatically predicting clusivity in political debates, with promising results for many referential constellations, yielding an overall 84.9% micro F1 for all pronouns.

2021

Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates
Ines Rehbein | Josef Ruppenhofer | Julian Bernauer
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

Come hither or go away? Recognising pre-electoral coalition signals in the news
Ines Rehbein | Simone Paolo Ponzetto | Anna Adendorf | Oke Bahnsen | Lukas Stoetzer | Heiner Stuckenschmidt
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce the task of political coalition signal prediction from text, that is, the task of recognizing from the news coverage leading up to an election the (un)willingness of political parties to form a government coalition. We decompose our problem into two related, but distinct tasks: (i) predicting whether a reported statement from a politician or a journalist refers to a potential coalition and (ii) predicting the polarity of the signal – namely, whether the speaker is in favour of or against the coalition. For this, we explore the benefits of multi-task learning and investigate which setup and task formulation is best suited for each sub-task. We evaluate our approach, based on hand-coded newspaper articles, covering elections in three countries (Ireland, Germany, Austria) and two languages (English, German). Our results show that the multi-task learning approach can further improve results over a strong monolingual transfer learning baseline.

2020

Improving Sentence Boundary Detection for Spoken Language Transcripts
Ines Rehbein | Josef Ruppenhofer | Thomas Schmidt
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents experiments on sentence boundary detection in transcripts of spoken dialogues. Segmenting spoken language into sentence-like units is a challenging task, due to disfluencies, ungrammatical or fragmented structures and the lack of punctuation. In addition, one of the main bottlenecks for many NLP applications for spoken language is the small size of the training data, as the transcription and annotation of spoken language is by far more time-consuming and labour-intensive than processing written language. We therefore investigate the benefits of data expansion and transfer learning and test different ML architectures for this task. Our results show that data expansion is not straightforward and even data from the same domain does not always improve results. They also highlight the importance of modelling, i.e. of finding the best architecture and data representation for the task at hand. For the detection of boundaries in spoken language transcripts, we achieve a substantial improvement when framing the boundary detection problem as a sentence pair classification task, as compared to a sequence tagging approach.

I’ve got a construction looks funny – representing and recovering non-standard constructions in UD
Josef Ruppenhofer | Ines Rehbein
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

The UD framework defines guidelines for a crosslingual syntactic analysis in the framework of dependency grammar, with the aim of providing a consistent treatment across languages that not only supports multilingual NLP applications but also facilitates typological studies. Until now, the UD framework has mostly focussed on bilexical grammatical relations. In the paper, we propose to add a constructional perspective and discuss several examples of spoken-language constructions that occur in multiple languages and challenge the current use of basic and enhanced UD relations. The examples include cases where the surface relations are deceptive, and syntactic amalgams that either involve unconnected subtrees or structures with multiply-headed dependents. We argue that a unified treatment of constructions across languages will increase the consistency of the UD annotations and thus the quality of the treebanks for linguistic analysis.

Neural Reranking for Dependency Parsing: An Evaluation
Bich-Ngoc Do | Ines Rehbein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent work has shown that neural rerankers can improve results for dependency parsing over the top k trees produced by a base parser. However, all neural rerankers so far have been evaluated on English and Chinese only, both languages with a configurational word order and poor morphology. In the paper, we re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variation of a discriminative reranker based on graph convolutional networks (GCNs). We show that the GCN not only outperforms previous models on English but is the only model that is able to improve results over the baselines on German and Czech. We explain the differences in reranking performance based on an analysis of a) the gold tree ratio and b) the variety in the k-best lists.

pdf bib abs

The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks - based on available literature - along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.

Exploring Morality in Argumentation
Jonathan Kobbe | Ines Rehbein | Ioana Hulpuș | Heiner Stuckenschmidt
Proceedings of the 7th Workshop on Argument Mining

Sentiment and stance are two important concepts for the analysis of arguments. We propose to add another perspective to the analysis, namely moral sentiment. We argue that moral values are crucial for ideological debates and can thus add useful information for argument mining. In the paper, we present different models for automatically predicting moral sentiment in debates and evaluate them on a manually annotated test set. We then apply our models to investigate how moral values in arguments relate to argument quality, stance and audience reactions.

Fine-grained Named Entity Annotations for German Biographic Interviews
Josef Ruppenhofer | Ines Rehbein | Carolina Flinz
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a fine-grained NER annotation scheme with 30 labels and apply it to German data. Building on the OntoNotes 5.0 NER inventory, our scheme is adapted for a corpus of transcripts of biographic interviews by adding categories for AGE and LAN(guage) and also features extended numeric and temporal categories. Applying the scheme to the spoken data as well as a collection of teaser tweets from newspaper sites, we can confirm its generality for both domains, also achieving good inter-annotator agreement. We also show empirically how our inventory relates to the well-established 4-category NER inventory by re-annotating a subset of the GermEval 2014 NER coarse-grained dataset with our fine label inventory. Finally, we use a BERT-based system to establish some baseline models for NER tagging on our two new datasets. Global results in in-domain testing are quite high on the two datasets, near what was achieved for the coarse inventory on the CoNLL 2003 data. Cross-domain testing produces much lower results due to the severe domain differences.

Parsers Know Best: German PP Attachment Revisited
Bich-Ngoc Do | Ines Rehbein
Proceedings of the 28th International Conference on Computational Linguistics

In the paper, we revisit the PP attachment problem which has been identified as one of the major sources for parser errors and discuss shortcomings of recent work. In particular, we show that using gold information for the extraction of attachment candidates as well as a missing comparison of the system’s output to the output of a full syntactic parser leads to an overly optimistic assessment of the results. We address these issues by presenting a realistic evaluation of the potential of different PP attachment systems, using fully predicted information as system input. We compare our results against the output of a strong neural parser and show that the full parsing approach is superior to modeling PP attachment disambiguation as a separate task.

A New Resource for German Causal Language
Ines Rehbein | Josef Ruppenhofer
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a new resource for German causal language, with annotations in context for verbs, nouns and prepositions. Our dataset includes 4,390 annotated instances for more than 150 different triggers. The annotation scheme distinguishes three different types of causal events (CONSEQUENCE, MOTIVATION, PURPOSE). We also provide annotations for semantic roles, i.e. of the cause and effect for the causal event as well as the actor and affected party, if present. In the paper, we present inter-annotator agreement scores for our dataset and discuss problems for annotating causal language. Finally, we present experiments where we frame causal annotation as a sequence labelling problem and report baseline results for the prediction of causal arguments and for predicting different types of causation.

2019

Automatic Alignment and Annotation Projection for Literary Texts
Uli Steinbach | Ines Rehbein
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper presents a modular NLP pipeline for the creation of a parallel literature corpus, followed by annotation transfer from the source to the target language. The test case we use to evaluate our pipeline is the automatic transfer of quote and speaker mention annotations from English to German. We evaluate the different components of the pipeline and discuss challenges specific to literary texts. Our experiments show that after applying a reasonable amount of semi-automatic postprocessing we can obtain high-quality aligned and annotated resources for a new language.

Active Learning via Membership Query Synthesis for Semi-Supervised Sentence Classification
Raphael Schumann | Ines Rehbein
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Active learning (AL) is a technique for reducing manual annotation effort during the annotation of training data for machine learning classifiers. For NLP tasks, pool-based and stream-based sampling techniques have been used to select new instances for AL, while generating new, artificial instances via Membership Query Synthesis was, up to now, considered to be infeasible for NLP problems. We present the first successful attempt to use Membership Query Synthesis for generating AL queries, using Variational Autoencoders for query generation. We evaluate our approach in a text classification task and demonstrate that query synthesis shows competitive performance to pool-based AL strategies while substantially reducing annotation time.

On the role of discourse relations in persuasive texts
Ines Rehbein
Proceedings of the 13th Linguistic Annotation Workshop

This paper investigates the use of explicitly signalled discourse relations in persuasive texts. We present a corpus study where we control for speaker and topic and show that the distribution of different discourse connectives varies considerably across different discourse settings. While this variation can be explained by genre differences, we also observe variation regarding the distribution of discourse relations across different settings. This variation, however, cannot be easily explained by genre differences. We argue that the differences regarding the use of discourse relations reflect different strategies of persuasion and that these might be due to audience design.

tweeDe – A Universal Dependencies treebank for German tweets
Ines Rehbein | Josef Ruppenhofer | Bich-Ngoc Do
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

2018

Sprucing up the trees – Error detection in treebanks
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 27th International Conference on Computational Linguistics

We present a method for detecting annotation errors in manually and automatically annotated dependency parse trees, based on ensemble parsing in combination with Bayesian inference, guided by active learning. We evaluate our method in different scenarios: (i) for error detection in dependency treebanks and (ii) for improving parsing accuracy on in- and out-of-domain data.

2017

Catching the Common Cause: Extraction and Annotation of Causal Relations and their Participants
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 11th Linguistic Annotation Workshop

In this paper, we present a simple, yet effective method for the automatic identification and extraction of causal relations from text, based on a large English-German parallel corpus. The goal of this effort is to create a lexical resource for German causal relations. The resource will consist of a lexicon that describes constructions that trigger causality as well as the participants of the causal event, and will be augmented by a corpus with annotated instances for each entry, that can be used as training data to develop a system for automatic classification of causal relations. Focusing on verbs, our method harvested a set of 100 different lexical triggers of causality, including support verb constructions. At the moment, our corpus includes over 1,000 annotated instances. The lexicon and the annotated data will be made available to the research community.

Authorship Attribution with Convolutional Neural Networks and POS-Eliding
Julian Hitschler | Esther van den Berg | Ines Rehbein
Proceedings of the Workshop on Stylistic Variation

We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.

Evaluating LSTM models for grammatical function labelling
Bich-Ngoc Do | Ines Rehbein
Proceedings of the 15th International Conference on Parsing Technologies

To improve grammatical function labelling for German, we augment the labelling component of a neural dependency parser with a decision history. We present different ways to encode the history, using different LSTM architectures, and show that our models yield significant improvements, resulting in a LAS for German that is close to the best result from the SPMRL 2014 shared task (without the reranker).

Data point selection for genre-aware parsing
Ines Rehbein | Felix Bildhauer
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

Universal Dependencies are Hard to Parse – or are They?
Ines Rehbein | Julius Steen | Bich-Ngoc Do | Anette Frank
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Detecting annotation noise in automatically labelled data
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning. We test our approach on in-domain and out-of-domain data in two languages, in AL simulations and in a real world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.

What do we need to know about an unknown word when parsing German
Bich-Ngoc Do | Ines Rehbein | Anette Frank
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source for OOV words in German. We present an extrinsic evaluation where we use the compound embeddings as input to a neural dependency parser and compare the results to the ones obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When adding POS embeddings to the input, however, the effect levels out. This suggests that it is not the missing information about the semantics of the unknown words that causes problems for parsing German, but the lack of morphological information for unknown words. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information.

2016

pdf bib abs

Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
Ines Rehbein | Merel Scholman | Vera Demberg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe our experiments on annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 with respect to the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.

2015

pdf bib

Proceedings of the 9th Linguistic Annotation Workshop
Adam Meyers | Ines Rehbein | Heike Zinsmeister
Proceedings of the 9th Linguistic Annotation Workshop

pdf bib

Filled Pauses in User-generated Content are Words with Extra-propositional Meaning
Ines Rehbein
Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015)

2014

pdf bib

POS error detection in automatically annotated corpora
Ines Rehbein
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

pdf bib

Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages
Yoav Goldberg | Yuval Marton | Ines Rehbein | Yannick Versley | Özlem Çetinoğlu | Joel Tetreault
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

pdf bib abs

The KiezDeutsch Korpus (KiDKo) Release 1.0
Ines Rehbein | Sören Schalowski | Heike Wiese
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multiethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data.

2013

pdf bib

Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages
Yoav Goldberg | Yuval Marton | Ines Rehbein | Yannick Versley
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

pdf bib abs

Yes we can!? Annotating English modal verbs
Josef Ruppenhofer | Ines Rehbein
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.

pdf bib

Semantic frames as an anchor representation for sentiment analysis
Josef Ruppenhofer | Ines Rehbein
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

2011

pdf bib

Data point selection for self-training
Ines Rehbein
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages

pdf bib

Evaluating the Impact of Coder Errors on Active Learning
Ines Rehbein | Josef Ruppenhofer
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib

Hard Constraints for Grammatical Function Labelling
Wolfgang Seeker | Ines Rehbein | Jonas Kuhn | Josef van Genabith
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib abs

There’s no Data like More Data? Revisiting the Impact of Data Size on a Classification Task
Ines Rehbein | Josef Ruppenhofer
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb "drohen" (threaten) shows that the best performance is not obtained when training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.

pdf bib

Bringing Active Learning to Life
Ines Rehbein | Josef Ruppenhofer | Alexis Palmer
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf bib

Scalable Discriminative Parsing for German
Yannick Versley | Ines Rehbein
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

pdf bib

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation
Ines Rehbein | Josef Ruppenhofer | Caroline Sporleder
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

2008

pdf bib abs

How to Compare Treebanks
Sandra Kübler | Wolfgang Maier | Ines Rehbein | Yannick Versley
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EvalB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination.
