Angus Addlesee


2024

pdf bib
Multi-party Multimodal Conversations Between Patients, Their Companions, and a Social Robot in a Hospital Memory Clinic
Angus Addlesee | Neeraj Cherakara | Nivan Nelson | Daniel Hernandez Garcia | Nancie Gunson | Weronika Sieińska | Christian Dondrup | Oliver Lemon
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

We have deployed an LLM-based spoken dialogue system in a real hospital. The ARI social robot embodies our system, which patients and their companions can have multi-party conversations with together. In order to enable this multi-party ability, multimodality is critical. Our system, therefore, receives speech and video as input, and generates both speech and gestures (arm, head, and eye movements). In this paper, we describe our complex setting and the architecture of our dialogue system. Each component is detailed, and a video of the full system is available with the appropriate components highlighted in real-time. Our system decides when it should take its turn, generates human-like clarification requests when the patient pauses mid-utterance, answers in-domain questions (grounding to the in-prompt knowledge), and responds appropriately to out-of-domain requests (like generating jokes or quizzes). This latter feature is particularly remarkable as real patients often utter unexpected sentences that could not be handled previously.

2023

pdf bib
Incremental Speech Processing for Voice Assistant Accessibility
Angus Addlesee
Proceedings of the 19th Annual Meeting of the Young Reseachers' Roundtable on Spoken Dialogue Systems

Speech production is nuanced and unique to every individual, but today’s Spoken Dialogue Systems (SDSs) are trained to use general speech patterns to successfully improve performance on various evaluation metrics. However, these patterns do not apply to certain user groups - often the very people that can benefit the most from SDSs. For example, people with dementia produce more disfluent speech than the general population. The healthcare domain is now a popular setting for spoken dialogue and human-robot interaction research. This trend is similar when observing company behaviour. Charities promote industry voice assistants, the creators are getting HIPAA compliance, and their features sometimes target vulnerable user groups. It is therefore critical to adapt SDSs to be more accessible.

pdf bib
Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering
Angus Addlesee | Weronika Sieińska | Nancie Gunson | Daniel Hernandez Garcia | Christian Dondrup | Oliver Lemon
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

This paper evaluates the extent to which current LLMs can capture task-oriented multi-party conversations (MPCs). We have recorded and transcribed 29 MPCs between patients, their companions, and a social robot in a hospital. We then annotated this corpus for multi-party goal-tracking and intent-slot recognition. People share goals, answer each other’s goals, and provide other people’s goals in MPCs - none of which occur in dyadic interactions. To understand user goals in MPCs, we compared three methods in zero-shot and few-shot settings: we fine-tuned T5, created pre-training tasks to train DialogLM using LED, and employed prompt engineering techniques with GPT-3.5-turbo, to determine which approach can complete this novel task with limited data. GPT-3.5-turbo significantly outperformed the others in a few-shot setting. The ‘reasoning’ style prompt, when given 7% of the corpus as example annotated conversations, was the best performing method. It correctly annotated 62.32% of the goal tracking MPCs, and 69.57% of the intent-slot recognition MPCs. A ‘story’ style prompt increased model hallucination, which could be detrimental if deployed in safety-critical settings. We conclude that multi-party conversations still challenge state-of-the-art LLMs.

2022

pdf bib
Securely Capturing People’s Interactions with Voice Assistants at Home: A Bespoke Tool for Ethical Data Collection
Angus Addlesee
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)

Speech production is nuanced and unique to every individual, but today’s Spoken Dialogue Systems (SDSs) are trained to use general speech patterns to successfully improve performance on various evaluation metrics. However, these patterns do not apply to certain user groups - often the very people that can benefit the most from SDSs. For example, people with dementia produce more disfluent speech than the general population. In order to evaluate systems with specific user groups in mind, and to guide the design of such systems to deliver maximum benefit to these users, data must be collected securely. In this short paper we present CVR-SI, a bespoke tool for ethical data collection. Designed for the healthcare domain, we argue that it should also be used in more general settings. We detail how off-the-shelf solutions fail to ensure that sensitive data remains secure and private. We then describe the ethical design and security features of our device, with a full guide on how to build both the hardware and software components of CVR-SI. Our design ensures inclusivity to all researchers in this field, particularly those who are not hardware experts. This guarantees everyone can collect appropriate data for human evaluation ethically, securely, and in a timely manner.

pdf bib
A Visually-Aware Conversational Robot Receptionist
Nancie Gunson | Daniel Hernandez Garcia | Weronika Sieińska | Angus Addlesee | Christian Dondrup | Oliver Lemon | Jose L. Part | Yanchao Yu
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Socially Assistive Robots (SARs) have the potential to play an increasingly important role in a variety of contexts including healthcare, but most existing systems have very limited interactive capabilities. We will demonstrate a robot receptionist that not only supports task-based and social dialogue via natural spoken conversation but is also capable of visually grounded dialogue; able to perceive and discuss the shared physical environment (e.g. helping users to locate personal belongings or objects of interest). Task-based dialogues include check-in, navigation and FAQs about facilities, alongside social features such as chit-chat, access to the latest news and a quiz game to play while waiting. We also show how visual context (objects and their spatial relations) can be combined with linguistic representations of dialogue context, to support visual dialogue and question answering. We will demonstrate the system on a humanoid ARI robot, which is being deployed in a hospital reception area.

2021

pdf bib
Incremental Graph-Based Semantics and Reasoning for Conversational AI
Angus Addlesee | Arash Eshghi
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

The next generation of conversational AI systems need to: (1) process language incrementally, token-by-token to be more responsive and enable handling of conversational phenomena such as pauses, restarts and self-corrections; (2) reason incrementally allowing meaning to be established beyond what is said; (3) be transparent and controllable, allowing designers as well as the system itself to easily establish reasons for particular behaviour and tailor to particular user groups, or domains. In this short paper we present ongoing preliminary work combining Dynamic Syntax (DS) - an incremental, semantic grammar framework - with the Resource Description Framework (RDF). This paves the way for the creation of incremental semantic parsers that progressively output semantic RDF graphs as an utterance unfolds in real-time. We also outline how the parser can be integrated with an incremental reasoning engine through RDF. We argue that this DS-RDF hybrid satisfies the desiderata listed above, yielding semantic infrastructure that can be used to build responsive, real-time, interpretable Conversational AI that can be rapidly customised for specific user groups such as people with dementia.

pdf bib
The Spoon Is in the Sink: Assisting Visually Impaired People in the Kitchen
Katie Baker | Amit Parekh | Adrien Fabre | Angus Addlesee | Ruben Kruiper | Oliver Lemon
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

Visual Question Answering (VQA) systems are increasingly adept at a variety of tasks, and this technology can be used to assist blind and partially sighted people. To do this, the system’s responses must not only be accurate, but usable. It is also vital for assistive technologies to be designed with a focus on: (1) privacy, as the camera may capture a user’s mail, medication bottles, or other sensitive information; (2) transparency, so that the system’s behaviour can be explained and trusted by users; and (3) controllability, to tailor the system for a particular domain or user group. We have therefore extended a conversational VQA framework, called Aye-saac, with these objectives in mind. Specifically, we gave Aye-saac the ability to answer visual questions in the kitchen, a particularly challenging area for visually impaired people. Our system can now answer questions about quantity, positioning, and system confidence in regards to 299 kitchen objects. Questions about the spatial relations between these objects are particularly helpful to visually impaired people, and our system output more usable answers than other state of the art end-to-end VQA systems.

2020

pdf bib
A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI
Angus Addlesee | Yanchao Yu | Arash Eshghi
Proceedings of the 28th International Conference on Computational Linguistics

Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous with several options existing currently as a service (e.g. Google, IBM, and Microsoft). Currently the most stringent standards for such systems are set within the context of their use in, and for, Conversational AI technology. These systems are expected to operate incrementally in real-time, be responsive, stable, and robust to the pervasive yet peculiar characteristics of conversational speech such as disfluencies and overlaps. In this paper we evaluate the most popular of such systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system which preserves disfluent materials and IBM has the leading incremental SD system in addition to the ASR that is most robust to speech overlaps. Google strikes a balance between the two but none of these systems are yet suitable to reliably handle natural spontaneous conversations in real-time.