Proceedings of the 19th Annual Meeting of the Young Reseachers' Roundtable on Spoken Dialogue Systems
Position paper for YRRSDS 2023
In this position paper, I will present the research interests in my PostDoc on safety and robustness specific to conversational AI, including then relevant overlap from my PhD.
Speech production is nuanced and unique to every individual, but today’s Spoken Dialogue Systems (SDSs) are trained to use general speech patterns to successfully improve performance on various evaluation metrics. However, these patterns do not apply to certain user groups - often the very people that can benefit the most from SDSs. For example, people with dementia produce more disfluent speech than the general population. The healthcare domain is now a popular setting for spoken dialogue and human-robot interaction research. This trend is similar when observing company behaviour. Charities promote industry voice assistants, the creators are getting HIPAA compliance, and their features sometimes target vulnerable user groups. It is therefore critical to adapt SDSs to be more accessible.
This research encompasses a comprehensive exploration of Spoken Dialogue Systems (SDSs) in the manufacturing sector. It begins by establishing a conceptual architecture and taxonomy to guide the design and selection of SDS elements. Real case applications, including worker safety and cybersecurity support, validate the research findings and highlight areas for improvement. Looking ahead, the study delves into the potential of Large Language Models (LLMs) and multi-modal applications. Emphasizing the importance of extreme personalization, the study highlights the need to cater to the diverse qualifications and preferences of workers. Additionally, it investigates the integration of SDSs with other sensory modalities, such as images, videos, and augmented or virtual reality scenarios, to enhance the user experience and productivity. The research also addresses crucial considerations related to knowledge base optimization. It examines semantic variations of words across different application contexts, the continuous updating of procedures and data, and the adaptability of SDSs to diverse dialects and linguistic abilities, particularly in low-schooling personnel scenarios. Privacy, industrial protection, and ethical concerns in the era of LLMs and external players like OpenAI are given due attention. The study explores the boundaries of knowledge that conversational systems should possess, advocating for transparency, explainability, and responsible data handling practices.
The process of “conversational grounding” is an interactive process that has been studied extensively in cognitive science, whereby participants in a conversation check to make sure their interlocutors understand what is being referred to. This interactive process uses multiple modes of communication to establish the information between the participants. This could include information provided through eye-gaze, head movements, intonation in speech, along with the content of the speech. While the process is essential to successful communication between humans and between humans and machines, work needs to be done on testing and building the capabilities of the current dialogue system in managing conversational grounding, especially in multimodal medium of communication. Recent work such as Benotti and Blackburn have shown the importance of conversational grounding in dialog systems and how current systems fail in them. This is essential for the advancement of Embodied Conversational Agents and Social Robots. Thus my PhD project aims to test, understand and improve the functioning of current dialog models with respect to Conversational Grounding.
My research interests focus on natural language generation (NLG) regarding how to make system outputs more intuitive and comprehensible for the human-user and conversational entrainment and alignment from the perspective of how dialogue systems could or should personalize its responses to the human user. As it relates to NLG, my current work focuses on training a system to auto-generate comments for SQL queries produced by a Text-to-SQL parser. The goal is to make the connection between technical SQL language and the user’s question more transparent. My linguistic training lies primarily at the intersection of computational and socio-linguistics. As such, my curiosities in conversational entrainment and alignment focus on the extent to which conversational agents can or should adjust their language based on human characteristics such as age, race, or gender.
Position paper for YRRSDS 2023
Large language models (LLMs) have brought about a significant transformation in spoken dialogue systems (SDSs). It is anticipated that these systems will be implemented into diverse robotic applications and employed in a variety of social settings. The author presents research interest with the aim of realizing social SDSs from multiple perspectives, including task design, turn-taking mechanisms, and evaluation methodologies. Additionally, future research in social SDSs should delve into a deeper understanding of user mental states and a relationship with society via multi-party conversations. Finally, the author suggests topics for discussion regarding the future directions of SDS researchers in the LLM era.
Many companies use dialogue systems for their customer service, and although there has been a rise in the usage of these systems (Costello and LoDolce, 2022), many of these systems still face challenges in comprehending and properly responding to the customer (Følstadet al., 2021). In our project we aim to figure out how to develop and improve these conversational agents. Part of this project (detailed in this paper) will focus on the detection of breakdown patterns and the possible solutions (repairs) to mitigate negative results of these errors.
Position paper on the intersection between chitchat and task-oriented dialogues (TODs), with a focus on integrating capabilities typically associated with chitchat systems into task-oriented agents.
This is my position paper for YRRSDS 2023. In it, I write about the details of my research interests as well as past, current and future projects, talk about the status of spoken dialogue system research, include a short bio, and suggest topics for discussion.
This position paper is an overview of author’s main research interests and work considering deep learning techniques in audio classification, sign languages, and multimodality in dialogue systems. Author also shares her opinion on current and future research considering dialogue agents, and suggests topics for discussion panels.
Task-Oriented Dialogue (TOD) systems provide interactive assistance to a user in order to accomplish a specific task such as making a reservation at a restaurant or booking a room in a hotel. Speech presents itself as a natural interface for TOD systems. A typical approach to implement them is to use a modular architecture (Gao et al., 2018). A core component of such dialogue systems is Spoken Language Understanding (SLU) whose goal is to extract the relevant information from the user’s utterances. While spoken dialogue was the focus of earlier work (Williams et al., 2013; Henderson et al., 2014), recent work has focused on text inputs with no regard for the specificities of spoken language (Wu et al., 2019; Heck et al., 2020; Feng et al., 2021). However, this approach fails to account for the differences between written and spoken language (Faruqui and Hakkani-Tür, 2022) such as disfluencies. My research focuses on Spoken Language Understanding in the context of Task-Oriented Dialogue. More specifically I am interested in the two following research directions: • Annotation schema for spoken TODs, • Integration of dialogue history for contextually coherent predictions.
My position paper for the YRRSDS 2023 workshop.
My PhD focuses on conversational agents for behaviour change, with a focus on the feasibility of applying Large Language Models (LLMs) such as GPT-4 in this context.
My primary research focus lies in the domain of Text Style Transfer (TST), a fascinating area within Natural Language Processing (NLP). TST involves the transfor- mation of text into a desired style while approximately preserving its underlying content. In my research, I am also driven by the goal of incorporating TST techniques into NLP systems, particularly within the realm of dia- logue systems. I am intrigued by the concept of Stylized Dialog Response Generation, which aims to enhance the versatility and adaptability of dialog systems in generat- ing text responses with specific style attributes. By ad- vancing our understanding of TST and its integration into dialogue systems, my research seeks to contribute to the broader field of human-computer interaction. Through the development of robust and versatile dialogue systems with enhanced style transfer capabilities, we can facili- tate more engaging and personalized conversational experiences.
A brief introduction to author’s keyinterests and research topics which are: multimodal dialogue systems and impact of data augmentation to NLU performance. In addition to that the author shares his biography and view on the future of dialogue assistants.
This submission discusses my research interests in two areas: measuring user satisfaction in goal-oriented dialogue systems and exploring the potential of multi-modal interactions. For goal-oriented dialogue systems, I focus on evaluating and enhancing user satisfaction throughout the interaction process, aiming to propose innovative strategies and address the limitations of existing evaluation techniques. Additionally, I explore the benefits of multi-modal dialogue systems, highlighting their ability to provide more natural and immersive conversations by incorporating various communication modes such as speech, text, gestures, and visuals.
My research interests broadly lie in the area of Information Extraction from Spoken Dialogue, with a spacial focus on state modeling, anaphora resolution, program synthesis & planning, and intent classification in goal-oriented conversations. My aim is to create embedded dialogue systems that can interact with humans in a collaborative setup to solve tasks in a digital/non-digital environment. Most of the goal-oriented conversations usually involve experts and a laypersons. The aim for the expert is to consider all the information provided by the layperson, identify the underlying set of issues or intents, and prescribe solutions. While human experts are very good at extracting such information, AI agents (that build up most of the automatic dialog systems today) not so much. Most of the existing assistants (or chatbots) only consider individual utterances and do not ground them in the context of the dialogue. My work in this direction has focused on making these systems more effective at extracting the most relevant information from the dialogue to help the human user reach their end-goal.
My research interests lie in the area of modelling natural and human-like conversations, with a special focus on emotions in task-oriented dialogue (ToD) systems. ToD systems need to produce semantically and grammatically correct responses to fulfil the user’s goal. Being able to perceive and express emotions pushes them one more step towards achieving human-likeness. To begin with, I constructed a dataset with meaningful emotion labels as well as a wide coverage of emotions and linguistic features in ToDs. Then, I improved emotion recognition in conversations (ERC) in the task-oriented domain by exploiting key characteristics of ToDs. Currently, I am working towards enhancing ToD systems with emotions.
I am broadly interested in evaluation of dialogue systems, in all its many facets: The data they are trained on, their ability to perform a task successfully, their skills with respect to various dialogue phenomena, their resemblance to human cognitive processes, and their ethical and societal impact. More specifically, my research topics focus on understanding the possibilities and limits of current multimodal neural network-based models to incrementally encode information for natural language understanding in general and also for building common ground and asking for clarification. Besides, I am interested in dialogue games as a means to elicit and collect dialogue data and to evaluate the abilities of dialogue models.
My research work centers on how to enable a human-like interaction through generating contextual, emotional or proactive responses, both in task-oriented and in chitchat spoken dialogue systems (SDSs), because natural lan- guage generation (NLG) is an indispensable component in SDSs and can directly affect the user interactive expe- rience of the entire dialogue system. In addition to NLG, I am also interested in natural language understanding (NLU), as it plays a crucial role in SDSs and is a prerequisite for dialogue systems to generate replies.
The author’s objective centers around developing a spoken dialogue system (SDS) that can emulate the cognitive and conversational qualities of a human friend. Key attributes such as empathy, knowledge/causality reasoning, and personality are integral components of human interaction. The proposed approach involves the creation of an Empathy-enriched SDS, capable of comprehending human emotions and circumstances, thus providing companionship and assistance akin to a trusted friend. Additionally, the Causality-reasoning for SDS aims to ground the system in commonsense knowledge and equip it with the ability to reason about causalities, such as predicting user desires/reactions and system intentions/reactions, thereby enhancing the system’s intelligence and human-like behavior. Finally, the concept of a Personality-conditioned SDS involves enabling systems to exhibit distinct personalities, further enhancing the naturalness of human-robot interaction.
This position paper describes my research interests, spoken dialogue system research, and suggested topics for discussion.