2025
CoPrUS: Consistency Preserving Utterance Synthesis towards more realistic benchmark dialogues
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig
Proceedings of the 31st International Conference on Computational Linguistics
Large-scale Wizard-of-Oz dialogue datasets have enabled the training of deep learning-based dialogue systems. While they are successful as benchmark datasets, they lack certain types of utterances that would make them more realistic. In this work, we investigate the creation of synthetic communication errors in an automatic pipeline. Based on linguistic theory, we propose and follow a simple error taxonomy. We focus on three types of miscommunication that can occur in real-world dialogues but are underrepresented in the benchmark dataset: misunderstandings, non-understandings, and vaguely related questions. Our two-step approach uses a state-of-the-art Large Language Model (LLM) first to create the error and then the repairing utterance. We perform language-model-based evaluation to ensure the quality of the generated utterances. We apply the method to the MultiWOZ dataset and evaluate it qualitatively, empirically, and with human judges. Our results indicate that current LLMs can aid in adding post-hoc miscommunications to benchmark datasets as a form of data augmentation. We publish the resulting dataset, in which nearly 1900 dialogues have been modified, as CoPrUS-MultiWOZ to facilitate future work on dialogue systems.
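A minimal sketch of the two-step error/repair pipeline the abstract describes, assuming a generic chat-LLM callable `generate`; the prompts and taxonomy labels are illustrative, not the paper's exact ones:

```python
# Sketch only: `generate` is a hypothetical stand-in for any chat-LLM call.
from typing import Callable, List

ERROR_TYPES = ["misunderstanding", "non-understanding", "vaguely_related_question"]

def inject_miscommunication(
    turns: List[str],
    error_type: str,
    generate: Callable[[str], str],
) -> List[str]:
    """Append a synthetic error turn and its repairing utterance to a dialogue."""
    context = "\n".join(turns)
    # Step 1: create the erroneous system utterance.
    error_turn = generate(
        f"Given this dialogue:\n{context}\n"
        f"Write a system reply that exhibits a {error_type}."
    )
    # Step 2: create the utterance that repairs the miscommunication.
    repair_turn = generate(
        f"Given this dialogue:\n{context}\nSystem: {error_turn}\n"
        "Write a user utterance that repairs the miscommunication."
    )
    return turns + [error_turn, repair_turn]
```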
2024
Team Quabynar at the GermEval 2024 Shared Task 1 GerMS-Detect (Subtasks 1 and 2) on Sexism Detection
Kwabena Odame Akomeah | Udo Kruschwitz | Bernd Ludwig
Proceedings of GermEval 2024 Task 1 GerMS-Detect Workshop on Sexism Detection in German Online News Fora (GerMS-Detect 2024)
Counterfactual Dialog Mixing as Data Augmentation for Task-Oriented Dialog Systems
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
High-quality training data for Task-Oriented Dialog (TOD) systems is costly to obtain if no corpora are available. One way to extend the available data is data augmentation. Yet, research into and adaptation of data augmentation techniques for TOD systems is limited compared with other data modalities. We propose a novel, causally flavored data augmentation technique called Counterfactual Dialog Mixing (CDM) that generates realistic synthetic dialogs via counterfactuals to increase the amount of training data. We demonstrate the method on a benchmark dataset and show that a model trained to distinguish the counterfactuals from the original data fails to do so, which strengthens the claim that the synthetic dialogs are realistic. To evaluate the effectiveness of CDM, we train a current architecture on a benchmark dataset and compare its performance with and without CDM, achieving state-of-the-art results on some metrics. We further investigate external generalizability and a lower-resource setting. To evaluate the models, we adopt an interactive evaluation scheme.
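A rough illustration of how counterfactual turn mixing could look in code. The pairing criterion used here (swapping turns that carry identical dialog-act labels) is an assumption for the sketch, not necessarily the paper's exact formulation:

```python
# Sketch only: mixing two dialogues by swapping label-matched turns.
import random
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (dialog-act label, utterance)

def mix_counterfactual(dlg_a: List[Turn], dlg_b: List[Turn]) -> List[Turn]:
    """Return a copy of dlg_a with one turn replaced by a label-matched turn from dlg_b."""
    by_label: Dict[str, List[str]] = {}
    for label, utt in dlg_b:
        by_label.setdefault(label, []).append(utt)
    candidates = [i for i, (label, _) in enumerate(dlg_a) if label in by_label]
    if not candidates:
        return list(dlg_a)  # no matching annotation, nothing to mix
    i = random.choice(candidates)
    label, _ = dlg_a[i]
    mixed = list(dlg_a)
    mixed[i] = (label, random.choice(by_label[label]))
    return mixed
```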
Linguistic Obfuscation Attacks and Large Language Model Uncertainty
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig | Patrick Levi
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)
Large Language Models (LLMs) have taken the research field of Natural Language Processing by storm. Researchers are investigating not only their capabilities and possible applications, but also their weaknesses and how they may be exploited. This has resulted in various attacks and “jailbreaking” approaches that have gained large interest within the community. The vulnerability of LLMs to certain types of input may pose major risks regarding the real-world usage of LLMs in productive operations. We therefore investigate the relationship between an LLM’s uncertainty and its vulnerability to jailbreaking attacks. To this end, we focus on a probabilistic view of uncertainty and employ a state-of-the-art open-source LLM. We investigate an attack that is based on linguistic obfuscation. Our results indicate that the model is subject to a higher level of uncertainty when confronted with manipulated prompts that aim to evade security mechanisms. This study lays the foundation for future research into the link between model uncertainty and vulnerability to jailbreaks.
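A minimal sketch of one probabilistic uncertainty measure in the spirit of the abstract: the mean per-token entropy of a causal LM's next-token distributions over a prompt. The model name and the exact statistic are assumptions, not the paper's setup:

```python
# Sketch only: mean next-token entropy as an uncertainty proxy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in for the open-source LLM used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def mean_token_entropy(prompt: str) -> float:
    """Average entropy (in nats) of the model's next-token distributions."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size)
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    return entropy.mean().item()

# Compare entropy on a plain prompt vs. an obfuscated one.
print(mean_token_entropy("How do I bake a cake?"))
```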
2023
Controlled Data Augmentation for Training Task-Oriented Dialog Systems with Low Resource Data
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig
Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
Modern dialog systems rely on Deep Learning to train transformer-based model architectures, which notoriously require large amounts of training data. However, collecting conversational data is often a tedious and costly process. This is especially true for Task-Oriented Dialogs, where the system ought to help the user achieve specific tasks, such as making reservations. We investigate a controlled strategy for dialog synthesis. Our method generates utterances from dialog annotations in a sequence-to-sequence manner. Besides exploring the viability of the approach itself, we also explore the effect of constrained beam search on the generation capabilities. Moreover, we analyze the effectiveness of the proposed method as a data augmentation technique by studying the impact the synthetic dialogs have on training dialog systems. We perform the experiments in multiple settings, simulating various amounts of ground-truth data. Our work shows that a controlled generation approach is a viable way to synthesize Task-Oriented Dialogs, which can in turn be used to train dialog systems. We were able to improve this process by utilizing constrained beam search.
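A small sketch of constrained beam search with the Hugging Face `generate` API, here forcing annotation-derived slot values to appear in the synthesized utterance. The model, prompt format, and forced phrases are illustrative assumptions, not the paper's configuration:

```python
# Sketch only: constrained beam search for annotation-conditioned generation.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-small"  # illustrative stand-in for the seq2seq model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

annotation = "inform(food=italian, area=centre)"
force_words = ["italian", "centre"]  # slot values the utterance must contain
force_ids = tokenizer(force_words, add_special_tokens=False).input_ids

inputs = tokenizer(f"generate utterance: {annotation}", return_tensors="pt")
outputs = model.generate(
    **inputs,
    force_words_ids=force_ids,
    num_beams=5,        # constrained generation requires beam search
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```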
2021
UR@NLP_A_Team @ GermEval 2021: Ensemble-based Classification of Toxic, Engaging and Fact-Claiming Comments
Kwabena Odame Akomeah | Udo Kruschwitz | Bernd Ludwig
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments
In this paper, we report on our approach to the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments for the German language. We submitted three runs for each subtask, based on ensembles of three models that combine contextual embeddings from pre-trained language models with SVM and neural-network-based classifiers. We include language-specific as well as language-agnostic language models, both with and without fine-tuning. We observe that for the runs we submitted, the SVM models overfitted the training data, which affected the aggregation method (simple majority voting) of the ensembles: the models record lower performance on the test set than on the training set. Exploring this overfitting, we uncovered that, due to a bug in the pipeline, the submitted runs had not been trained on the full training set but only on a small subset. We therefore also include the results obtained when training on the full training set, which demonstrate the power of ensembles.
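A minimal sketch of the simple-majority aggregation mentioned above, with hypothetical per-model predictions:

```python
# Sketch only: simple majority voting over three classifiers' outputs.
from collections import Counter
from typing import List, Sequence

def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model predictions (models x samples) by simple majority."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]

# e.g., three models' binary labels for four comments
print(majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]]))  # -> [1, 0, 1, 0]
```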
2006
Tracing Actions Helps in Understanding Interactions
Bernd Ludwig
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue