Annette Hautli

Also published as: Annette Hautli-Janisz

2025

Exploring the Limits of Prompting LLMs with Speaker-Specific Rhetorical Fingerprints
Wassiliki Siskou | Annette Hautli-Janisz
Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities

The capabilities of Large Language Models (LLMs) to mimic written content are being tested on a wide range of tasks and settings, from persuasive essays to programming code. However, the question to what extent they are capable of mimicking human conversational monologue is less well-researched. In this study, we explore the limits of popular LLMs in impersonating content in a high-stakes legal setting, namely for the generation of the decision statement in parole suitability hearings: We distill a linguistically well-motivated rhetorical fingerprint from individual presiding commissioners, based on patterns observed in verbatim transcripts and then enhance the model prompts with those characteristics. When comparing this enhanced prompt with an underspecified prompt we show that LLMs can approximate certain rhetorical features when prompted accordingly, but are not able to fully replicate the linguistic profile of the original speakers as their own fingerprint dominates.

pdf bib abs

In this position paper, we advocate for the development of conversational technology that is inherently designed to support and facilitate argumentative processes. We argue that, at present, large language models (LLMs) are inadequate for this purpose, and we propose an ideal technology design aimed at enhancing argumentative skills. This involves re-framing LLMs as tools to exercise our critical thinking skills rather than replacing them. We introduce the concept of reasonable parrots that embody the fundamental principles of relevance, responsibility, and freedom, and that interact through argumentative dialogical moves. These principles and moves arise out of millennia of work in argumentation theory and should serve as the starting point for LLM-based technology that incorporates basic principles of argumentation.

pdf bib abs

“The Facts Speak for Themselves”: GPT and Fallacy Classification
Erisa Bytyqi | Annette Hautli-Janisz
Proceedings of the 12th Argument mining Workshop

Fallacies are not only part and parcel of human communication, they are also important for generative models in that fallacies can be tailored to self-verify the output they generate. Previous work has shown that fallacy detection and classification is tricky, but the question that still remains is whether the use of theoretical explanations in prompting Large Language Models (LLMs) on the task enhances the performance of the models. In this paper we show that this is not the case: Using the pragma-dialectics approach to fallacies (van Eemeren, 1987), we show that three GPT models struggle with the task. Based on our own PD-oriented dataset of fallacies and an extension of an existing fallacy dataset from Jin et al. (2022), we show that this is not only the case for fallacies “in the wild”, but also for textbook examples of fallacious arguments. Our paper also supports the claim that LLMs generally lag behind in fallacy classification in comparison to smaller-scale neural models.

pdf bib abs

DebArgVis: An Interactive Visualisation Tool for Exploring Argumentative Dynamics in Debate
Martin Gruber | Zlata Kikteva | Ignaz Rutter | Annette Hautli-Janisz
Proceedings of the 12th Argument mining Workshop

Television debates play a key role in shaping public opinion, however, the rapid exchange of viewpoints in these settings often makes it difficult to perceive the underlying nature of the discussion. While there exist several debate visualisation techniques, to the best of our knowledge, none of them emphasise the argumentative dynamics in particular. With DebArgVis, we present a new interactive debate visualisation tool that leverages data annotated with argumentation structures to demonstrate how speaker interactions unfold over time, enabling users to deepen their comprehension of the debate.

pdf bib abs

Identifying Small Talk in Natural Conversations
Steffen Frenzel | Annette Hautli-Janisz
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)

Small talk is part and parcel of human interaction and is rather employed to communicate values and opinions than pure information. Despite small talk being an omnipresent phenomenon in spoken language, it is difficult to identify: Small talk is situated, i.e., for interpreting a string of words or discourse units, outside references such as the context of the interlocutors and their previous experiences have to be interpreted.In this paper, we present a dataset of natural conversation annotated with a theoretically well-motivated distillation of what constitutes small talk. This dataset comprises of verbatim transcribed public service encounters in German authorities and are the basis for empirical work in administrative policy on how the satisfaction of the citizen manifests itself in the communication with the authorities. We show that statistical models achieve comparable results to those of state-of-the-art LLMs.

2024

pdf bib abs

Automated Anonymization of Parole Hearing Transcripts
Abed Itani | Wassiliki Siskou | Annette Hautli-Janisz
Proceedings of the Natural Legal Language Processing Workshop 2024

Responsible natural language processing is more and more concerned with preventing the violation of personal rights that language technology can entail (CITATION). In this paper we illustrate the case of parole hearings in California, the verbatim transcripts of which are made available to the general public upon a request sent to the California Board of Parole Hearings. The parole hearing setting is highly sensitive: inmates face a board of legal representatives who discuss highly personal matters not only about the inmates themselves but also about victims and their relatives, such as spouses and children. Participants have no choice in contributing to the data collection process, since the disclosure of the transcripts is mandated by law. As researchers who are interested in understanding and modeling the communication in these hierarchy-driven settings, we face an ethical dilemma: publishing raw data as is for the community would compromise the privacy of all individuals affected, but manually cleaning the data requires a substantive effort. In this paper we present an automated anonymization process which reliably removes and pseudonymizes sensitive data in verbatim transcripts, while at the same time preserving the structure and content of the data. Our results show that the process exhibits little to no leakage of sensitive information when applied to more than 300 hearing transcripts.

pdf bib abs

PSE v1.0: The First Open Access Corpus of Public Service Encounters
Ingrid Espinoza | Steffen Frenzel | Laurin Friedrich | Wassiliki Siskou | Steffen Eckhard | Annette Hautli-Janisz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Face-to-face interactions between representatives of the state and citizens are a key intercept in public service delivery, for instance when providing social benefits to vulnerable groups. Despite the relevance of these encounters for the individual, but also for society at large, there is a significant research gap in the systematic empirical study of the communication taking place. This is mainly due to the high institutional and data protection barriers for collecting data in a very sensitive and private setting in which citizens request support from the state. In this paper, we describe the procedure of compiling the first open access dataset of transcribed recordings of so-called Public Service Encounters in Germany, i.e., meetings between state officials and citizens in which there is direct communication in order to allocate state services. This dataset sets a new research directive in the social sciences, because it allows the community to open up the black box of direct state-citizen interaction. With data of this kind it becomes possible to directly and systematically investigate bias, bureaucratic discrimination and other power-driven dynamics in the actual communication and ideally propose guidelines as to alleviate these issues.

pdf bib abs

Question Type Prediction in Natural Debate
Zlata Kikteva | Alexander Trautsch | Steffen Herbold | Annette Hautli-Janisz
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In spontaneous natural debate, questions play a variety of crucial roles: they allow speakers to introduce new topics, seek other speakers’ opinions or indeed confront them. A three-class question typology has previously been demonstrated to effectively capture details pertaining to the nature of questions and the different functions associated with them in a debate setting. We adopt this classification and investigate the performance of several machine learning approaches on this task by incorporating various sets of lexical, dialogical and argumentative features. We find that BERT demonstrates the best performance on the task, followed by a Random Forest model enriched with pragmatic features.

pdf bib

Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024
Annette Hautli-Janisz | Gabriella Lapesa | Lucas Anastasiou | Valentin Gold | Anna De Liddo | Chris Reed
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

2023

pdf bib abs

Debate naturalness ranges on a scale from small, highly structured, and topically focused settings to larger, more spontaneous and less constrained environments. The more unconstrained a debate, the more spontaneous speakers act: they build on contextual knowledge and use anaphora or ellipses to construct their arguments. They also use rhetorical devices such as questions and imperatives to support or attack claims. In this paper, we study how the reconstruction of the actual debate contributions, i.e., utterances which contain pronouns, ellipses and fuzzy language, into full-fledged propositions which are interpretable without context impacts the prediction of argument relations and investigate the effect of incorporating contextual information for the task. We work with highly complex spontaneous debates with more than 10 speakers on a wide variety of topics. We find that in contrast to our initial hypothesis, reconstruction does not improve predictions and context only improves them when used in combination with propositions.

2022

pdf bib abs

Broadcast political debate is a core pillar of democracy: it is the public’s easiest access to opinions that shape policies and enables the general public to make informed choices. With QT30, we present the largest corpus of analysed dialogical argumentation ever created (19,842 utterances, 280,000 words) and also the largest corpus of analysed broadcast political debate to date, using 30 episodes of BBC’s ‘Question Time’ from 2020 and 2021. Question Time is the prime institution in UK broadcast political debate and features questions from the public on current political issues, which are responded to by a weekly panel of five figures of UK politics and society. QT30 is highly argumentative and combines language of well-versed political rhetoric with direct, often combative, justification-seeking of the general public. QT30 is annotated with Inference Anchoring Theory, a framework well-known in argument mining, which encodes the way arguments and conflicts are created and reacted to in dialogical settings. The resource is freely available at http://corpora.aifdb.org/qt30.

pdf bib abs

The Keystone Role Played by Questions in Debate
Zlata Kikteva | Kamila Gorska | Wassiliki Siskou | Annette Hautli-Janisz | Chris Reed
Proceedings of the 3rd Workshop on Computational Approaches to Discourse

Building on the recent results of a study into the roles that are played by questions in argumentative dialogue (Hautli-Janisz et al.,2022a), we expand the analysis to investigate a newly released corpus that constitutes the largest extant corpus of closely annotated debate. Questions play a critical role in driving dialogical discourse forward; in combative or critical discursive environments, they not only provide a range of discourse management techniques, they also scaffold the semantic structure of the positions that interlocutors develop. The boundaries, however, between providing substantive answers to questions, merely responding to questions, and evading questions entirely, are fuzzy and the way in which answers, responses and evasions affect the subsequent development of dialogue and argumentation structure are poorly understood. In this paper, we explore how questions have ramifications on the large-scale structure of a debate using as our substrate the BBC television programme Question Time, the foremost topical debate show in the UK. Analysis of the data demonstrates not only that questioning plays a particularly prominent role in such debate, but also that its repercussions can reverberate through a discourse.

pdf bib abs

Disagreement Space in Argument Analysis
Annette Hautli-Janisz | Ella Schad | Chris Reed
Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022

For a highly subjective task such as recognising speaker intention and argumentation, the traditional way of generating gold standards is to aggregate a number of labels into a single one. However, this seriously neglects the underlying richness that characterises discourse and argumentation and is also, in some cases, straightforwardly impossible. In this paper, we present QT30nonaggr, the first corpus of non-aggregated argument annotation, which will be openly available upon publication. QT30nonaggr encompasses 10% of QT30, the largest corpus of dialogical argumentation and analysed broadcast political debate currently available with 30 episodes of BBC’s ‘Question Time’ from 2020 and 2021. Based on a systematic and detailed investigation of annotation judgements across all steps of the annotation process, we structure the disagreement space with a taxonomy of the types of label disagreements in argument annotation, identifying the categories of annotation errors, fuzziness and ambiguity.

2019

pdf bib abs

We present a modular framework for the rapid-prototyping of linguistic, web-based, visual analytics applications. Our framework gives developers access to a rich set of machine learning and natural language processing steps, through encapsulating them into micro-services and combining them into a computational pipeline. This processing pipeline is auto-configured based on the requirements of the visualization front-end, making the linguistic processing and visualization design, detached independent development tasks. This paper describes the constellation and modality of our framework, which continues to support the efficient development of various human-in-the-loop, linguistic visual analytics research techniques and applications.

The paper presents a design schema and details of a new Urdu POS tagset. This tagset is designed due to challenges encountered in working with existing tagsets for Urdu. It uses tags that judiciously incorporate information about special morpho-syntactic categories found in Urdu. With respect to the overall naming schema and the basic divisions, the tagset draws on the Penn Treebank and a Common Tagset for Indian Languages. The resulting CLE Urdu POS Tagset consists of 12 major categories with subdivisions, resulting in 32 tags. The tagset has been used to tag 100k words of the CLE Urdu Digest Corpus, giving a tagging accuracy of 96.8%.

pdf bib

Automatic Detection of Causal Relations in German Multilogs
Tina Bögel | Annette Hautli-Janisz | Sebastian Sulger | Miriam Butt
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

2013

pdf bib

A Visual Analytics System for Cluster Exploration
Andreas Lamprecht | Annette Hautli | Christian Rohrdantz | Tina Bögel
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

pdf bib

Lexical Semantics and Distribution of Suffixes - A Visual Analysis
Christian Rohrdantz | Andreas Niekler | Annette Hautli | Miriam Butt | Daniel A. Keim
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

pdf bib

Identifying Urdu Complex Predication via Bigram Extraction
Miriam Butt | Tina Bögel | Annette Hautli | Sebastian Sulger | Tafseer Ahmed
Proceedings of COLING 2012

pdf bib abs

A Reference Dependency Bank for Analyzing Complex Predicates
Tafseer Ahmed | Miriam Butt | Annette Hautli | Sebastian Sulger
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

When dealing with languages of South Asia from an NLP perspective, a problem that repeatedly crops up is the treatment of complex predicates. This paper presents a first approach to the analysis of complex predicates (CPs) in the context of dependency bank development. The efforts originate in theoretical work on CPs done within Lexical-Functional Grammar (LFG), but are intended to provide a guideline for analyzing different types of CPs in an independent framework. Despite the fact that we focus on CPs in Hindi and Urdu, the design of the dependencies is kept general enough to account for CP constructions across languages.