Muhammad Umair
2024
Large Language Models Know What To Say But Not When To Speak
Muhammad Umair | Vasanth Sarathy | Jan Ruiter
Findings of the Association for Computational Linguistics: EMNLP 2024
Muhammad Umair | Vasanth Sarathy | Jan Ruiter
Findings of the Association for Computational Linguistics: EMNLP 2024
Turn-taking is a fundamental mechanism in human communication that ensures smooth and coherent verbal interactions. Recent advances in Large Language Models (LLMs) have motivated their use in improving the turn-taking capabilities of Spoken Dialogue Systems (SDS), such as their ability to respond at appropriate times. However, existing models often struggle to predict opportunities for speaking — called Transition Relevance Places (TRPs) — in natural, unscripted conversations, focusing only on turn-final TRPs and not within-turn TRPs. To address these limitations, we introduce a novel dataset of participant-labeled within-turn TRPs and use it to evaluate the performance of state-of-the-art LLMs in predicting opportunities for speaking. Our experiments reveal the current limitations of LLMs in modeling unscripted spoken interactions, highlighting areas for improvement and paving the way for more naturalistic dialogue systems.
2022
Using Transition Duration to Improve Turn-taking in Conversational Agents
Charles Threlkeld | Muhammad Umair | Jp de Ruiter
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Charles Threlkeld | Muhammad Umair | Jp de Ruiter
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Smooth turn-taking is an important aspect of natural conversation that allows interlocutors to maintain adequate mutual comprehensibility. In human communication, the timing between utterances is normatively constrained, and deviations convey socially relevant paralinguistic information. However, for spoken dialogue systems, smooth turn-taking continues to be a challenge. This motivates the need for spoken dialogue systems to employ a robust model of turn-taking to ensure that messages are exchanged smoothly and without transmitting unintended paralinguistic information. In this paper, we examine dialogue data from natural human interaction to develop an evidence-based model for turn-timing in spoken dialogue systems. First, we use timing between turns to develop two models of turn-taking: a speaker-agnostic model and a speaker-sensitive model. From the latter model, we derive the propensity of listeners to take the next turn given TRP duration. Finally, we outline how this measure may be incorporated into a spoken dialogue system to improve the naturalness of conversation.
GailBot: An automatic transcription system for Conversation Analysis
Muhammad Umair | Julia Beret Mertens | Saul Albert | Jan Peter De Ruiter
Dialogue Discourse Volume 13
Muhammad Umair | Julia Beret Mertens | Saul Albert | Jan Peter De Ruiter
Dialogue Discourse Volume 13
Researchers studying human interaction, such as conversation analysts, psychologists, and linguists, all rely on detailed transcriptions of language use. Ideally, these should include so-called paralinguistic features of talk, such as overlaps, prosody, and intonation, as they convey important information. However, creating conversational transcripts that include these features by hand requires substantial amounts of time by trained transcribers. There are currently no Speech to Text (STT) systems that are able to integrate these features in the generated transcript. To reduce the resources needed to create detailed conversation transcripts that include representation of paralinguistic features, we developed a program called GailBot. GailBot combines STT services with plugins to automatically generate first drafts of transcripts that largely follow the transcription standards common in the field of Conversation Analysis. It also enables researchers to add new plugins to transcribe additional features, or to improve the plugins it currently uses. We describe GailBot’s architecture and its use of computational heuristics and machine learning. We also evaluate its output in relation to transcripts produced by both human transcribers and comparable automated transcription systems. We argue that despite its limitations, GailBot represents a substantial improvement over existing dialogue transcription software.