Annette Hautli

Also published as: Annette Hautli-Janisz


2024

Question Type Prediction in Natural Debate
Zlata Kikteva | Alexander Trautsch | Steffen Herbold | Annette Hautli-Janisz
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In spontaneous natural debate, questions play a variety of crucial roles: they allow speakers to introduce new topics, seek other speakers’ opinions or indeed confront them. A three-class question typology has previously been demonstrated to effectively capture details pertaining to the nature of questions and the different functions associated with them in a debate setting. We adopt this classification and investigate the performance of several machine learning approaches on this task by incorporating various sets of lexical, dialogical and argumentative features. We find that BERT demonstrates the best performance on the task, followed by a Random Forest model enriched with pragmatic features.

Automated Anonymization of Parole Hearing Transcripts
Abed Itani | Wassiliki Siskou | Annette Hautli-Janisz
Proceedings of the Natural Legal Language Processing Workshop 2024

Responsible natural language processing is increasingly concerned with preventing the violation of personal rights that language technology can entail. In this paper we illustrate the case of parole hearings in California, the verbatim transcripts of which are made available to the general public upon a request sent to the California Board of Parole Hearings. The parole hearing setting is highly sensitive: inmates face a board of legal representatives who discuss highly personal matters not only about the inmates themselves but also about victims and their relatives, such as spouses and children. Participants have no choice in contributing to the data collection process, since the disclosure of the transcripts is mandated by law. As researchers who are interested in understanding and modeling the communication in these hierarchy-driven settings, we face an ethical dilemma: publishing raw data as is for the community would compromise the privacy of all individuals affected, but manually cleaning the data requires a substantial effort. In this paper we present an automated anonymization process which reliably removes and pseudonymizes sensitive data in verbatim transcripts, while at the same time preserving the structure and content of the data. Our results show that the process exhibits little to no leakage of sensitive information when applied to more than 300 hearing transcripts.

Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024
Annette Hautli-Janisz | Gabriella Lapesa | Lucas Anastasiou | Valentin Gold | Anna De Liddo | Chris Reed
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

PSE v1.0: The First Open Access Corpus of Public Service Encounters
Ingrid Espinoza | Steffen Frenzel | Laurin Friedrich | Wassiliki Siskou | Steffen Eckhard | Annette Hautli-Janisz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Face-to-face interactions between representatives of the state and citizens are a key intercept in public service delivery, for instance when providing social benefits to vulnerable groups. Despite the relevance of these encounters for the individual, but also for society at large, there is a significant research gap in the systematic empirical study of the communication taking place. This is mainly due to the high institutional and data protection barriers for collecting data in a very sensitive and private setting in which citizens request support from the state. In this paper, we describe the procedure of compiling the first open access dataset of transcribed recordings of so-called Public Service Encounters in Germany, i.e., meetings between state officials and citizens in which there is direct communication in order to allocate state services. This dataset sets a new research directive in the social sciences, because it allows the community to open up the black box of direct state-citizen interaction. With data of this kind it becomes possible to directly and systematically investigate bias, bureaucratic discrimination and other power-driven dynamics in the actual communication and ideally propose guidelines to alleviate these issues.

2023

On the Impact of Reconstruction and Context for Argument Prediction in Natural Debate
Zlata Kikteva | Alexander Trautsch | Patrick Katzer | Mirko Oest | Steffen Herbold | Annette Hautli-Janisz
Proceedings of the 10th Workshop on Argument Mining

Debate naturalness ranges on a scale from small, highly structured, and topically focused settings to larger, more spontaneous and less constrained environments. The more unconstrained a debate, the more spontaneously speakers act: they build on contextual knowledge and use anaphora or ellipses to construct their arguments. They also use rhetorical devices such as questions and imperatives to support or attack claims. In this paper, we study how the reconstruction of the actual debate contributions, i.e., utterances which contain pronouns, ellipses and fuzzy language, into full-fledged propositions which are interpretable without context impacts the prediction of argument relations, and we investigate the effect of incorporating contextual information for the task. We work with highly complex spontaneous debates with more than 10 speakers on a wide variety of topics. We find that, contrary to our initial hypothesis, reconstruction does not improve predictions and context only improves them when used in combination with propositions.

2022

The Keystone Role Played by Questions in Debate
Zlata Kikteva | Kamila Gorska | Wassiliki Siskou | Annette Hautli-Janisz | Chris Reed
Proceedings of the 3rd Workshop on Computational Approaches to Discourse

Building on the recent results of a study into the roles that are played by questions in argumentative dialogue (Hautli-Janisz et al., 2022a), we expand the analysis to investigate a newly released corpus that constitutes the largest extant corpus of closely annotated debate. Questions play a critical role in driving dialogical discourse forward; in combative or critical discursive environments, they not only provide a range of discourse management techniques, they also scaffold the semantic structure of the positions that interlocutors develop. The boundaries, however, between providing substantive answers to questions, merely responding to questions, and evading questions entirely, are fuzzy and the way in which answers, responses and evasions affect the subsequent development of dialogue and argumentation structure is poorly understood. In this paper, we explore how questions have ramifications on the large-scale structure of a debate using as our substrate the BBC television programme Question Time, the foremost topical debate show in the UK. Analysis of the data demonstrates not only that questioning plays a particularly prominent role in such debate, but also that its repercussions can reverberate through a discourse.

Disagreement Space in Argument Analysis
Annette Hautli-Janisz | Ella Schad | Chris Reed
Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022

For a highly subjective task such as recognising speaker intention and argumentation, the traditional way of generating gold standards is to aggregate a number of labels into a single one. However, this seriously neglects the underlying richness that characterises discourse and argumentation and is also, in some cases, straightforwardly impossible. In this paper, we present QT30nonaggr, the first corpus of non-aggregated argument annotation, which will be openly available upon publication. QT30nonaggr encompasses 10% of QT30, the largest corpus of dialogical argumentation and analysed broadcast political debate currently available with 30 episodes of BBC’s ‘Question Time’ from 2020 and 2021. Based on a systematic and detailed investigation of annotation judgements across all steps of the annotation process, we structure the disagreement space with a taxonomy of the types of label disagreements in argument annotation, identifying the categories of annotation errors, fuzziness and ambiguity.

QT30: A Corpus of Argument and Conflict in Broadcast Debate
Annette Hautli-Janisz | Zlata Kikteva | Wassiliki Siskou | Kamila Gorska | Ray Becker | Chris Reed
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Broadcast political debate is a core pillar of democracy: it is the public’s easiest access to opinions that shape policies and enables the general public to make informed choices. With QT30, we present the largest corpus of analysed dialogical argumentation ever created (19,842 utterances, 280,000 words) and also the largest corpus of analysed broadcast political debate to date, using 30 episodes of BBC’s ‘Question Time’ from 2020 and 2021. Question Time is the prime institution in UK broadcast political debate and features questions from the public on current political issues, which are responded to by a weekly panel of five figures of UK politics and society. QT30 is highly argumentative and combines language of well-versed political rhetoric with direct, often combative, justification-seeking of the general public. QT30 is annotated with Inference Anchoring Theory, a framework well-known in argument mining, which encodes the way arguments and conflicts are created and reacted to in dialogical settings. The resource is freely available at http://corpora.aifdb.org/qt30.

2019

lingvis.io - A Linguistic Visual Analytics Framework
Mennatallah El-Assady | Wolfgang Jentner | Fabian Sperrle | Rita Sevastjanova | Annette Hautli-Janisz | Miriam Butt | Daniel Keim
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a modular framework for the rapid prototyping of linguistic, web-based, visual analytics applications. Our framework gives developers access to a rich set of machine learning and natural language processing steps by encapsulating them into micro-services and combining them into a computational pipeline. This processing pipeline is auto-configured based on the requirements of the visualization front-end, making the linguistic processing and the visualization design detached, independent development tasks. This paper describes the constellation and modality of our framework, which continues to support the efficient development of various human-in-the-loop, linguistic visual analytics research techniques and applications.

2018

A Multilingual Approach to Question Classification
Aikaterini-Lida Kalouli | Katharina Kaiser | Annette Hautli-Janisz | Georg A. Kaiser | Miriam Butt
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Interactive Visual Analysis of Transcribed Multi-Party Discourse
Mennatallah El-Assady | Annette Hautli-Janisz | Valentin Gold | Miriam Butt | Katharina Holzinger | Daniel Keim
Proceedings of ACL 2017, System Demonstrations

2015

Encoding event structure in Urdu/Hindi VerbNet
Annette Hautli-Janisz | Tracy Holloway King | Gillian Ramchand
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

2014

The CLE Urdu POS Tagset
Saba Urooj | Sarmad Hussain | Asad Mustafa | Rahila Parveen | Farah Adeeba | Tafseer Ahmed Khan | Miriam Butt | Annette Hautli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The paper presents a design schema and details of a new Urdu POS tagset. This tagset was designed in response to challenges encountered in working with existing tagsets for Urdu. It uses tags that judiciously incorporate information about special morpho-syntactic categories found in Urdu. With respect to the overall naming schema and the basic divisions, the tagset draws on the Penn Treebank and a Common Tagset for Indian Languages. The resulting CLE Urdu POS Tagset consists of 12 major categories with subdivisions, resulting in 32 tags. The tagset has been used to tag 100k words of the CLE Urdu Digest Corpus, giving a tagging accuracy of 96.8%.

Automatic Detection of Causal Relations in German Multilogs
Tina Bögel | Annette Hautli-Janisz | Sebastian Sulger | Miriam Butt
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

2013

A Visual Analytics System for Cluster Exploration
Andreas Lamprecht | Annette Hautli | Christian Rohrdantz | Tina Bögel
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

Lexical Semantics and Distribution of Suffixes - A Visual Analysis
Christian Rohrdantz | Andreas Niekler | Annette Hautli | Miriam Butt | Daniel A. Keim
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

Identifying Urdu Complex Predication via Bigram Extraction
Miriam Butt | Tina Bögel | Annette Hautli | Sebastian Sulger | Tafseer Ahmed
Proceedings of COLING 2012

A Reference Dependency Bank for Analyzing Complex Predicates
Tafseer Ahmed | Miriam Butt | Annette Hautli | Sebastian Sulger
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

When dealing with languages of South Asia from an NLP perspective, a problem that repeatedly crops up is the treatment of complex predicates. This paper presents a first approach to the analysis of complex predicates (CPs) in the context of dependency bank development. The efforts originate in theoretical work on CPs done within Lexical-Functional Grammar (LFG), but are intended to provide a guideline for analyzing different types of CPs in an independent framework. Despite the fact that we focus on CPs in Hindi and Urdu, the design of the dependencies is kept general enough to account for CP constructions across languages.

2011

Towards Tracking Semantic Change by Visual Analytics
Christian Rohrdantz | Annette Hautli | Thomas Mayer | Miriam Butt | Daniel A. Keim | Frans Plank
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Extracting and Classifying Urdu Multiword Expressions
Annette Hautli | Sebastian Sulger
Proceedings of the ACL 2011 Student Session

Towards a Computational Semantic Analyzer for Urdu
Annette Hautli | Miriam Butt
Proceedings of the 9th Workshop on Asian Language Resources