Dirk Väth

2025

Understanding the Role of Mental Models in User Interaction with an Adaptive Dialog Agent
Lindsey Morgan Vanderlyn | Dirk Väth | Thang Vu
Findings of the Association for Computational Linguistics: NAACL 2025

Mental models play an important role in whether user interactions with intelligent systems, such as dialog agents, are successful. Adaptive dialog systems present the opportunity to align a dialog agent’s behavior with heterogeneous user expectations. However, there has been little research into what mental models users form when interacting with a task-oriented dialog system, how these models affect users’ interactions, or what role system adaptation can play in this process. This can make it challenging to avoid damage to human-AI partnership. In this work, we collect a new publicly available dataset for exploring user mental models of information seeking dialog systems. We demonstrate that users have a variety of conflicting mental models about such systems, the validity of which directly impacts the success and perception of their interactions. Furthermore, we show that adapting a dialog agent’s behavior to better align with users’ mental models, even when done implicitly, can improve dialog efficiency, success, and user perception of the interaction. This shows that implicit adaptation can be beneficial for task-oriented dialog systems, so long as developers understand the mental models of their users.

2024

pdf bib abs

Towards a Zero-Data, Controllable, Adaptive Dialog System
Dirk Väth | Lindsey Vanderlyn | Ngoc Thang Vu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree. The agent learns to efficiently navigate this tree, while adapting to information needs, e.g., domain familiarity, of different users. However, the need for additional training data hinders deployment in new domains. To address this, we explore approaches to generate this data directly from dialog trees. We improve the original approach, and show that agents trained on synthetic data can achieve comparable dialog success to models trained on human data, both when using a commercial Large Language Model for generation, or when using a smaller open-source model, running on a single GPU. We further demonstrate the scalability of our approach by collecting and testing on two new datasets: ONBOARD, a new domain helping foreign residents moving to a new city, and the medical domain DIAGNOSE, a subset of Wikipedia articles related to scalp and head symptoms. Finally, we perform human testing, where no statistically significant differences were found in either objective or subjective measures between models trained on human and generated data.

2023

pdf bib abs

DIAGRAPH: An Open-Source Graphic Interface for Dialog Flow Design
Dirk Väth | Lindsey Vanderlyn | Ngoc Thang Vu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

In this work, we present DIAGRAPH, an open-source graphical dialog flow editor built on the ADVISER toolkit. Our goal for this tool is threefold: 1) To support subject-experts to intuitively create complex and flexible dialog systems,2) To support rapid prototyping of dialog system behavior, e.g., for research, and 3) To provide a hands-on test bed for students learning about dialog systems. To facilitate this, DIAGRAPH aims to provide a clean and intuitive graphical interface for creating dialog systems without requiring any coding knowledge. Once a dialog graph has been created, it is automatically turned into a dialog system using state of the art language models. This allows for rapid prototyping and testing. Dialog designers can then distribute a link to their finished dialog system or embed it into a website.Additionally, to support scientific experiments and data collection, dialog designers can access chat logs. Finally, to verify the usability of DIAGRAPH, we performed evaluation with subject-experts who extensively worked with the tool and users testing it for the first time, receiving above average System Usability Scale (SUS) scores from both (82 out 100 and 75 out of 100, respectively).In this way, we hope DIAGRAPH helps reduce the barrier to entry for creating dialog interactions.

pdf bib abs

Conversational Tree Search: A New Hybrid Dialog Task
Dirk Väth | Lindsey Vanderlyn | Ngoc Thang Vu
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Conversational interfaces provide a flexible and easy way for users to seek information that may otherwise be difficult or inconvenient to obtain. However, existing interfaces generally fall into one of two categories: FAQs, where users must have a concrete question in order to retrieve a general answer, or dialogs, where users must follow a pre-defined path but may receive a personalized answer. In this paper, we introduce Conversational Tree Search (CTS) as a new task that bridges the gap between FAQ-style information retrieval and task-oriented dialog, allowing domain-experts to define dialog trees which can then be converted to an efficient dialog policy that learns only to ask the questions necessary to navigate a user to their goal. We collect a dataset for the travel reimbursement domain and demonstrate a baseline as well as a novel deep Reinforcement Learning architecture for this task. Our results show that the new architecture combines the positive aspects of both the FAQ and dialog system used in the baseline and achieves higher goal completion while skipping unnecessary questions.

2021

pdf bib abs

“It seemed like an annoying woman”: On the Perception and Ethical Considerations of Affective Language in Text-Based Conversational Agents
Lindsey Vanderlyn | Gianna Weber | Michael Neumann | Dirk Väth | Sarina Meyer | Ngoc Thang Vu
Proceedings of the 25th Conference on Computational Natural Language Learning

Previous research has found that task-oriented conversational agents are perceived more positively by users when they provide information in an empathetic manner compared to a plain, emotionless information exchange. However, users’ perception and ethical considerations related to a dialog systems’ response language style have received comparatively little attention in the field of human-computer interaction. To bridge this gap, we explored these ethical implications through a scenario-based user study. 127 participants interacted with one of three variants of an affective, task-oriented conversational agent, each variant providing responses in a different language style. After the interaction, participants filled out a survey about their feelings during the experiment and their perception of various aspects of the chatbot. Based on statistical and qualitative analysis of the responses, we found language style played an important role in how human-like participants perceived a dialog agent as well as how likable. Language style also had a direct effect on how users perceived the use of personal pronouns ‘I’ and ‘You’ and how they projected gender onto the chatbot. Finally, we identify and discuss ethical implications. In particular we focus on what factors/stereotypes influenced participants’ impressions of gender, and what trade-offs a more human-like chatbot brings.

pdf bib abs

Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Väth | Pascal Tilli | Ngoc Thang Vu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets. To this end, we propose a browser-based benchmarking tool for researchers and challenge organizers, with an API for easy integration of new models and datasets to keep up with the fast-changing landscape of VQA. Our tool helps test generalization capabilities of models across multiple datasets, evaluating not just accuracy, but also performance in more realistic real-world scenarios such as robustness to input noise. Additionally, we include metrics that measure biases and uncertainty, to further explain model behavior. Interactive filtering facilitates discovery of problematic behavior, down to the data sample level. As proof of concept, we perform a case study on four models. We find that state-of-the-art VQA models are optimized for specific tasks or datasets, but fail to generalize even to other in-domain test sets, for example they can not recognize text in images. Our metrics allow us to quantify which image and question embeddings provide most robustness to a model. All code s publicly available.

2020

pdf bib abs

We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically experienced users, such as machine learning researchers, but also for less technically experienced users, such as linguists or cognitive scientists, thereby providing a flexible platform for collaborative research.

2019

pdf bib abs

In this paper, we present ADVISER - an open source dialog system framework for education and research purposes. This system supports multi-domain task-oriented conversations in two languages. It additionally provides a flexible architecture in which modules can be arbitrarily combined or exchanged - allowing for easy switching between rules-based and neural network based implementations. Furthermore, ADVISER offers a transparent, user-friendly framework designed for interdisciplinary collaboration: from a flexible back end, allowing easy integration of new features, to an intuitive graphical user interface supporting nontechnical users.

pdf bib abs

To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies
Dirk Väth | Ngoc Thang Vu
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

In this paper, we explore state-of-the-art deep reinforcement learning methods for dialog policy training such as prioritized experience replay, double deep Q-Networks, dueling network architectures and distributional learning. Our main findings show that each individual method improves the rewards and the task success rate but combining these methods in a Rainbow agent, which performs best across tasks and environments, is a non-trivial task. We, therefore, provide insights about the influence of each method on the combination and how to combine them to form a Rainbow agent.