The 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems (2024)


Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems

Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
Koji Inoue | Yahui Fu | Agnes Axelsson | Atsumoto Ohashi | Brielen Madureira | Yuki Zenimoto | Biswesh Mohapatra | Armand Stricker | Sopan Khosla

Conversational XAI and Explanation Dialogues
Nils Feldhus

My main research interest is human-centric explainability, i.e., making language models more interpretable by building applications that lower the barrier of entry to explanations. I am enthusiastic about interactive systems that pique the interest of more people beyond just the experts to learn about the inner workings of language models. My hypothesis is that users of language model applications and dialogue systems are more satisfied with, and more trusting of, these systems if they can look behind the curtain and get easy access to explanations of their behavior.

Enhancing Emotion Recognition in Spoken Dialogue Systems through Multimodal Integration and Personalization
Takumasa Kaneko

My research interests focus on multimodal emotion recognition and personalization in emotion recognition tasks. In multimodal emotion recognition, existing studies demonstrate that integrating various data types like speech, text, and video enhances accuracy. However, real-time constraints and high dataset costs limit their practical application. I propose constructing a multimodal emotion recognition model by combining available unimodal datasets. In terms of personalization, traditional discrete emotion labels often fail to capture the complexity of human emotions. Although recent methods embed speaker characteristics to boost prediction accuracy, they require extensive retraining. I introduce continuous prompt tuning, which updates only the speaker prompts while keeping the speech encoder weights fixed, enabling the addition of new speaker data without retraining the entire model. This paper discusses these existing research gaps and presents novel approaches to address them, aiming to significantly improve emotion recognition in spoken dialogue systems.
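
The abstract stays at the idea level; purely as a rough illustration, the following minimal PyTorch sketch shows one way a per-speaker prompt table can be trained while the speech encoder stays frozen, so that adding a new speaker only adds a new prompt row. The class name, dimensions, and the choice to prepend prompts to encoder features (rather than to the encoder input) are my own assumptions, not the author's actual model.

    import torch
    import torch.nn as nn

    class SpeakerPromptEmotionModel(nn.Module):
        """Minimal sketch: per-speaker continuous prompts on top of a frozen speech encoder."""

        def __init__(self, encoder: nn.Module, hidden_dim: int, num_speakers: int,
                     prompt_len: int = 8, num_emotions: int = 4):
            super().__init__()
            self.encoder = encoder
            for p in self.encoder.parameters():      # the encoder weights stay fixed
                p.requires_grad = False
            # one trainable prompt of shape (prompt_len, hidden_dim) per known speaker
            self.prompts = nn.Parameter(0.02 * torch.randn(num_speakers, prompt_len, hidden_dim))
            self.classifier = nn.Linear(hidden_dim, num_emotions)

        def add_speaker(self) -> int:
            """Append a fresh prompt row for a new speaker; existing weights stay untouched."""
            new_row = 0.02 * torch.randn(1, *self.prompts.shape[1:], device=self.prompts.device)
            self.prompts = nn.Parameter(torch.cat([self.prompts.data, new_row], dim=0))
            return self.prompts.shape[0] - 1         # index assigned to the new speaker

        def forward(self, speech_feats: torch.Tensor, speaker_ids: torch.Tensor) -> torch.Tensor:
            h = self.encoder(speech_feats)           # assumed to return (batch, time, hidden_dim)
            prompt = self.prompts[speaker_ids]       # (batch, prompt_len, hidden_dim)
            h = torch.cat([prompt, h], dim=1)        # prepend the speaker prompt to the features
            return self.classifier(h.mean(dim=1))    # pooled representation -> emotion logits

In such a setup only the prompt table and the small classifier head receive gradient updates, so registering a new speaker via add_speaker() and optimizing that row leaves the encoder and the other speakers' prompts unchanged.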

Towards Personalisation of User Support Systems
Tomoya Higuchi

My research interests lie in the development of advanced user support systems, emphasizing the enhancement of user engagement and system effectiveness. The field of user support systems aims to help users accomplish complex tasks efficiently while ensuring a pleasant and intuitive interaction experience. I explore how to incorporate engaging and context-appropriate assistance into these systems to make the task completion process more effective and enjoyable for users.

Social Agents for Positively Influencing Human Psychological States
Muhammad Yeza Baihaqi

My research interest lies in the realm of social interactive agents, specifically in the development of social agents for positively influencing human psychological states. This interdisciplinary field merges elements of artificial intelligence, psychology, and human-computer interaction. My work integrates psychological theories with dialogue system technologies, including rule-based systems and large language models (LLMs). The core aim of my work is to leverage these systems to promote mental well-being and enhance user experiences in various contexts. At YRRSDS 2024, I plan to discuss several intriguing topics in spoken dialogue systems (SDS), including implementing psychological theories into SDS, assessing human-agent rapport without direct human evaluation, and conducting SDS evaluations using other SDSs. These topics promise to stimulate insightful and engaging discussions during the roundtable at YRRSDS.

Personalized Topic Transition for Dialogue System
Kai Yoshida

In our research, we aim to realize SDSs capable of generating responses that take user preferences into account. While users have individual topic preferences, existing SDSs do not adequately consider such information. With the development of LLMs, SDSs are expected to be deployed in various tasks, including coexisting with humans in robotic applications. To become better partners for humans, systems are expected to memorize user preferences and utilize them in their response generation. Our future research aims to realize SDSs that can remember and complement user information through dialogue, enabling personalized interactions. At YRRSDS, the author would like to propose the following topics for discussion. 1. What is the necessity of SDSs aimed specifically at dialogue rather than being just user interfaces? What do general users need from SDSs through conversation? 2. The relationship between SDSs and users: should SDSs act just as agents, or should they aim to become like friends or family? 3. Privacy in conversational content: many SDS applications now operate online via APIs, but is this preferable from a privacy perspective? If not, how can this issue be resolved?

Elucidation of Psychotherapy and Development of New Treatment Methods Using AI
Shio Maeda

My research theme is to develop optimal analytical models for the various kinds of information generated during therapy, using multimodal data from psychotherapy, in order to elucidate the psychotherapeutic process and to create an AI therapist that enables new treatment methods. In this context, I would like to participate in the Young Researchers’ Roundtable on Spoken Dialogue Systems because I believe I can broaden my research horizons by discussing these topics with other young researchers.

Assessing Interactional Competence with Multimodal Dialog Systems
Mao Saeki

My research interests lie in multimodal dialog systems, especially in turn-taking and the understanding and generation of non-verbal cues. I am also interested in bringing dialog system research into industry and making virtual agents practical in real-world settings. I have been working on the Intelligent Language Learning Assistant (InteLLA) system, a virtual agent designed to provide fully automated English proficiency assessments through oral conversations. This project is driven by the practical need to address the lack of opportunities for second-language learners to assess and practice their conversation skills.

Faithfulness of Natural Language Generation
Patricia Schmidtova

In this position paper, I present my research interest in the faithfulness of natural language generation, i.e., the adherence to the data provided by a user or the dialog state. I motivate the task and present my progress and plans on the topic. I state my position on the future of dialog systems research and share topics I would like to discuss during the roundtables.

Knowledge-Grounded Dialogue Systems for Generating Interesting and Engaging Responses
Hiroki Onozeki

My research interests lie in the area of building dialogue systems that generate interesting and entertaining responses, with a particular focus on knowledge-grounded dialogue systems. Research on open-domain dialogue systems seeks to maximize user engagement by enhancing specific dialogue skills. To achieve this goal, much work has focused on the generation of empathetic responses, personality-based responses, and knowledge-grounded responses. In addition, interesting and entertaining responses from open-domain dialogue systems can increase user satisfaction and engagement thanks to their diversity and ability to attract the user’s interest. It has also been observed that, in task-oriented dialogue, user engagement can be increased by incorporating interesting responses into the dialogue. For example, methods have been proposed to incorporate interesting responses into spoken dialogue systems (SDSs) that support the execution of complex tasks while providing a pleasant and enjoyable experience for the user. However, even with interesting responses, if the dialogue is incoherent, user engagement is likely to be significantly reduced. To create a dialogue system that is both consistent and interesting in a dialogue context, I am working on knowledge-grounded response generation methods that select interesting knowledge relevant to the dialogue context and generate responses grounded in that knowledge.

Towards a Dialogue System That Can Take Interlocutors’ Values into Account
Yuki Zenimoto

In this position paper, I present my research interests regarding dialogue systems that can reflect the interlocutor’s values, such as their way of thinking and perceiving things. My work focuses on two main aspects: dialogue systems for eliciting the interlocutor’s values and methods for understanding the interlocutor’s values from narratives. Additionally, I discuss the abilities required for Spoken Dialogue Systems (SDSs) that converse with the same user multiple times. Finally, I suggest topics for discussion regarding an SDS as a personal assistant for everyday use.

Multimodal Spoken Dialogue System with Biosignals
Shun Katada

The dominance of large language models has forced a transformation of research directions in many domains. The growth speed of large-scale models and the knowledge they acquire have reached incredible levels. Thus, researchers must have the ability and foresight to adapt to a rapidly changing environment. In this position paper, the author introduces research interests and discusses their relationships from the perspective of spoken dialogue systems. In particular, the fields of multimodal processing and affective computing are introduced. Additionally, the effects of large language models on spoken dialogue systems research and topics for discussion are presented.

Timing Sensitive Turn-Taking in Spoken Dialogue Systems Based on User Satisfaction
Sadahiro Yoshikawa

Towards Robust and Multilingual Task-Oriented Dialogue Systems
Atsumoto Ohashi

In this position paper, I present my research interests regarding the field of task-oriented dialogue systems. My work focuses on two main aspects: optimizing the task completion ability of dialogue systems using reinforcement learning, and developing language resources and exploring multilinguality to support the advancement of dialogue systems across different languages. I discuss the limitations of current approaches in achieving robust task completion performance and propose a novel optimization approach called Post-Processing Networks. Furthermore, I highlight the importance of multilingual dialogue datasets and describe our work on constructing JMultiWOZ, the first large-scale Japanese task-oriented dialogue dataset.

Toward Faithful Dialogs: Evaluating and Improving the Faithfulness of Dialog Systems
Sicong Huang

My primary research interests lie in evaluating and improving the faithfulness of language model-based text generation systems. Recent advances in large language models (LLMs) such as GPT-4 and Llama have enabled the wide adoption of LLMs in various aspects of natural language processing (NLP). Despite their widespread use, LLMs still suffer from the problem of hallucination, limiting the practicality of deploying such systems in use cases where being factual and faithful is of critical importance. My research specifically aims to evaluate and improve the faithfulness, i.e. the factual alignment between the generated text and a given context, of text generation systems. By developing techniques to reliably evaluate, label, and improve generation faithfulness, we can enable wider adoption of dialog systems that need to converse with human users using accurate information.
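
The abstract does not commit to a specific metric, but as an illustration of what "factual alignment between the generated text and a given context" can look like in practice, here is a minimal sketch of one widely used proxy: scoring a response by how strongly an off-the-shelf NLI model judges the context to entail it. The model choice and the helper function are my own illustrative assumptions, not the author's method.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Illustrative choice of NLI model; any entailment classifier would do.
    MODEL = "microsoft/deberta-large-mnli"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)

    def faithfulness_score(context: str, response: str) -> float:
        """Probability that the context entails the response (higher = more faithful)."""
        inputs = tokenizer(context, response, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        # locate the 'entailment' class from the model's own label map
        entail_idx = next(i for i, label in model.config.id2label.items()
                          if "entail" in label.lower())
        return probs[entail_idx].item()

    print(faithfulness_score("The museum opens at 9 am and closes at 5 pm.",
                             "The museum closes at 5 pm."))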

Cognitive Model of Listener Response Generation and Its Application to Dialogue Systems
Taiga Mori

In this position paper, we introduce our efforts in modeling listener response generation and its application to dialogue systems. We propose that the cognitive process of generating listener responses involves four levels: attention level, word level, propositional information level, and activity level, with different types of responses used depending on the level. Attention level responses indicate that the listener is listening to and paying attention to the speaker’s speech. Word-level responses demonstrate the listener’s knowledge or understanding of a single representation. Propositional information level responses indicate the listener’s understanding, empathy, and emotions towards a single propositional information. Activity level responses are oriented towards activities. Additionally, we briefly report on our current initiative in generating propositional information level responses using a knowledge graph and LLMs.

Topological Deep Learning for Term Extraction
Benjamin Matthias Ruppik

Ben is a postdoctoral researcher in the Dialog Systems and Machine Learning research group led by Milica Gašić at the Heinrich-Heine-Universität Düsseldorf, which he joined in 2022. In collaboration with the Topology and Geometry group in the Mathematics Department, under the supervision of Marcus Zibrowius, Ben is developing applications of Topological Data Analysis in Natural Language Processing, focusing on dialogue systems. Before transitioning to machine learning research, Ben was a pure mathematician at the Max-Planck-Institute for Mathematics in Bonn, where he specialized in knotted surfaces in 4-dimensional manifolds. He graduated from the University of Bonn in 2022.

Dialogue Management with Graph-structured Knowledge
Nicholas Thomas Walker

I am a postdoctoral researcher at Otto-Friedrich University of Bamberg, and my research interests include knowledge-grounded dialogue systems, logical rule-based reasoning for dialogue management, and human-robot interaction.

Towards a Co-creation Dialogue System
Xulin Zhou

In this position paper, I present my research interests in dialogue systems, where the user and the system collaboratively work on tasks through conversation. My work involves analyzing dialogues in which two parties collaborate through conversation, focusing on tasks that yield outcomes with no single correct answer. To support this research, I have created a tagline co-writing dialogue corpus, which I have analyzed from various perspectives. Additionally, I developed a prototype for a tagline co-writing dialogue system.

Enhancing Decision-Making with AI Assistance
Yoshiki Tanaka

My research interests broadly lie in the influence of artificial intelligence (AI) agents on human decision-making. Specifically, I aim to develop applications for conversational agents in decision-making support. During my master’s program, I developed a system that uses an interview dialogue system to support user review writing. In this approach, the conversational agent gathers product information, such as the user’s impressions and opinions, during the interview and uses it to create reviews, facilitating the review-writing process. Additionally, I conducted a comprehensive evaluation from the perspectives of system users and review readers. Although experimental results have shown that the system is capable of generating helpful reviews, the quality of the reviews still depends on how effectively the agent elicits information from users. Therefore, I believe that personalizing the agent’s interview strategy to users’ preferences regarding the review-writing process can further enhance both the user experience and the helpfulness of the reviews.

Ontology Construction for Task-oriented Dialogue
Renato Vukovic

My research interests lie broadly in dialogue ontology construction, which uses techniques from information extraction to extract relevant terms from task-oriented dialogue data and organize them by finding hierarchical relations between them.

Generalized Visual-Language Grounding with Complex Language Context
Bhathiya Hemanthage

My research focuses on Visual Dialogues and Generalized Visual-Language Grounding with Complex Language Context. Specifically, my research aims to utilize Large Language Models (LLMs) to build conversational agents capable of comprehending and responding to visual cues. Visual-Language Pre-trained (VLP) models, primarily based on transformer encoder-decoder architectures, are extensively employed across a range of visual-language tasks, such as visual question answering (VQA) and referring expression comprehension (REC). The effectiveness of these models stems from their robust visual-language integration capabilities. However, their performance is constrained in more complex applications such as multimodal conversational agents, where intricate and extensive language contexts pose significant challenges. These tasks demand language-only reasoning before engaging in multimodal fusion. In response, my research investigates the application of LLMs with advanced comprehension and generation capabilities to enhance performance in complex multimodal tasks, particularly multimodal dialogues. In brief, my work on visual dialogues revolves around two major research questions: i) how to redefine visually grounded conversational agent architectures to benefit from LLMs, and ii) how to transfer the large body of knowledge encoded in LLMs to conversational systems.

Towards a Real-Time Multimodal Emotion Estimation Model for Dialogue Systems
Jingjing Jiang

This position paper presents my research interest in establishing human-like chat-oriented dialogue systems. To this end, my work focuses on two main areas: the construction and utilization of multimodal datasets and real-time multimodal affective computing. I discuss the limitations of current multimodal dialogue corpora and multimodal affective computing models. As a solution, I have constructed a human-human dialogue dataset containing various synchronized multimodal information, and I have conducted preliminary analyses on it. In future work, I will further analyze the collected data and build a real-time multimodal emotion estimation model for dialogue systems.

Exploring Explainability and Interpretability in Generative AI
Shiyuan Huang

Innovative Approaches to Enhancing Safety and Ethical AI Interactions in Digital Environments
Zachary Yang

Ensuring safe online environments is a formidable but important challenge, as people are now chronically online. The increasing online presence of people, paired with the prevalence of harmful content such as toxicity, hate speech, misinformation, and disinformation across various social media platforms and within different video games, calls for stronger detection and prevention methods. My research interests primarily lie in applied natural language processing for social good. Previously, I focused on measuring partisan polarization on social media during the COVID-19 pandemic and its societal impacts. Currently, at Ubisoft La Forge, I am dedicated to enhancing player safety within in-game chat systems by developing methods to detect toxicity, evaluating the biases in these detection systems, and assessing the current ecological state of online interactions. Additionally, I am engaged in simulating social media environments using LLMs to ethically test detection methods, evaluate the effectiveness of current mitigation strategies, and potentially introduce new, successful strategies. My suggested topics for discussion: 1. Understanding and mitigating social harms through high-fidelity simulated social media environments. 2. Enhancing safety in online environments such as in-game chats (text and speech). 3. Personification of LLM agents. 4. Ethically simulating social media sandbox environments at scale with LLM agents. 5. Re-balancing the playing field between good and bad actors: strategies for countering societal-scale manipulation.

Leveraging Linguistic Structural Information for Improving the Model’s Semantic Understanding Ability
Sangmyeong Lee

This position paper describes the author’s research interests (semantic structure comprehension in multimodal dialogue environments), his view that Spoken Dialogue System research needs a new wave in order to coexist with LLMs, and his proposals for discussion topics. These three topics are as follows: 1) how to keep up with, or leverage, LLMs in academic research; 2) how representational languages for semantic structural information could be used in this new era; and 3) how to disambiguate the user’s language during an actual dialogue scenario.

Multi-User Dialogue Systems and Controllable Language Generation
Nicolas Wagner

My research interests include multi-user dialogue systems, with a focus on user modelling and the development of moderation strategies. Contemporary Spoken Dialogue Systems (SDSs) frequently lack the ability to deal with more than one user simultaneously. Moreover, I am interested in research on the Controllability of Language Generation using Large Language Models (LLMs). Our hypothesis is that integrating explicit dialogue control signals improves the Controllability and Reliability of generated sequences independently of the underlying LLM.

Enhancing Role-Playing Capabilities in Persona Dialogue Systems through Corpus Construction and Evaluation Methods
Ryuichi Uehara

My research interest involves persona dialogue systems, which use the profile information of a character or real person, called a persona, and respond accordingly. Persona dialogue systems can improve the consistency of the system’s responses, users’ trust, and user enjoyment. My current research focuses on persona dialogue systems, especially dialogue agents that role-play as fictional characters. The first task involves obtaining the dialogue and personas of novel characters and building a dialogue corpus. The second task involves evaluating whether the dialogue agent’s responses are character-like relative to the context. The goal of these studies is to allow dialogue agents to generate responses that are more character-like.

Character Expression and User Adaptation for Spoken Dialogue Systems
Kenta Yamamoto

The author is interested in building dialogue systems with character expression and user adaptation. The goal is to create a dialogue system capable of establishing deeper relationships with users. To build a trustful relationship with users, it is important for the system to express its character. The author particularly aims to convey the system’s character through multimodal behavior. Users currently try to speak clearly to avoid speech recognition errors when interacting with SDSs. However, it is necessary to develop SDSs that allow users to converse naturally, as if they were speaking with a human. The author has also focused on user adaptation by considering the user’s personality. In particular, the author proposes a system that adjusts its manner of speaking according to the user’s personality. Furthermore, the author is interested not only in adjusting the system’s speaking style to match the user but also in making the system’s listening style more conducive to natural conversation.

Interactive Explanations through Dialogue Systems
Isabel Feustel

The growing need for transparency in AI systems has led to the increased popularity of explainable AI (XAI), with dialogue systems emerging as a promising approach to provide dynamic and interactive explanations. To overcome the limitations of non-conversational XAI methods, we proposed and implemented a generic dialogue architecture that integrates domain-specific knowledge, enhancing user comprehension and interaction. By incorporating computational argumentation and argumentative tree structures into our prototype, we found a positive impact on the dialogue’s effectiveness. In future research, we plan to improve Natural Language Understanding (NLU) to reduce error rates and better interpret user queries, and to advance Natural Language Generation (NLG) techniques for generating more fluid and contextually appropriate responses using large language models. Additionally, we will refine argument annotation to enable better selection and presentation of information, ensuring the system provides the most relevant and coherent explanations based on user needs. Over the next 5 to 10 years, we anticipate significant advancements in dialogue systems’ flexibility, personalization, and cultural adaptability, driven by large language models and open domain dialogues. These developments will enhance global communication, user satisfaction, and the effectiveness of virtual assistants across various applications while addressing ethical and social implications.

Towards Emotion-aware Task-oriented Dialogue Systems in the Era of Large Language Models
Shutong Feng

My research interests lie in the area of modelling affective behaviours of interlocutors in conversations. In particular, I look at emotion perception, expression, and management in information-retrieval task-oriented dialogue (ToD) systems. Traditionally, ToD systems focus primarily on fulfilling the user’s goal by requesting and providing appropriate information. Yet, in real life, the user’s emotional experience also contributes to the overall satisfaction. This requires the system’s ability to recognise, manage, and express emotions. To this end, I incorporated emotion in the entire ToD system pipeline (Feng et al., 2024, to appear in SIGDIAL 2024). In addition, in the era of large language models (LLMs), emotion recognition and generation have been made easy even under a zero-shot set-up (Feng et al., 2023; Stricker and Paroubek, 2024). Therefore, I am also interested in building ToD systems with LLMs and examining various types of affect in other ToD set-ups such as depression detection in clinical consultations.

Utilizing Large Language Models for Customized Dialogue Data Augmentation and Psychological Counseling
Zhiyang Qi

Large language models (LLMs), such as GPT-4, have driven significant technological advances in spoken dialogue systems (SDSs). In the era of LLMs, my research focuses on: (1) employing these models for customized dialogue data augmentation to improve SDS adaptability to various speaking styles, and (2) utilizing LLMs to support counselors with psychological counseling dialogues. In the future, I aim to integrate these themes, applying user adaptability to psychological counseling dialogues to facilitate smoother conversations.

Toward More Human-like SDSs: Advancing Emotional and Social Engagement in Embodied Conversational Agents
Zi Haur Pang

The author’s research advances human-AI interaction across two innovative domains to enhance the depth and authenticity of communication. Through Emotional Validation, which leverages psychotherapeutic techniques, the research enriches SDSs with advanced capabilities for understanding and responding to human emotions. In parallel, through Embodied Conversational Agents (ECAs), the author focuses on developing agents that simulate sophisticated human social behaviors, enhancing their ability to engage in context-sensitive and personalized dialogue. Together, these initiatives aim to transform SDSs and ECAs into empathetic, embodied companions, pushing the boundaries of conversational AI.