Prompt-Guided Turn-Taking Prediction

Koji Inoue; Mikey Elmers; Yahui Fu; Zi Haur Pang; Divesh Lala; Keiko Ochi; Tatsuya Kawahara

Abstract

Turn-taking prediction models are essential components in spoken dialogue systems and conversational robots. Recent approaches leverage transformer-based architectures to predict speech activity continuously and in real time. In this study, we propose a novel model that enables turn-taking prediction to be dynamically controlled via textual prompts. This approach allows intuitive and explicit control through instructions such as “faster” or “calmer,” adapting dynamically to conversational partners and contexts. The proposed model builds upon a transformer-based voice activity projection (VAP) model, incorporating textual prompt embeddings into both channel-wise transformers and a cross-channel transformer. We evaluated the feasibility of our approach using over 950 hours of human-human spoken dialogue data. Since textual prompt data for the proposed approach was not available in existing datasets, we utilized a large language model (LLM) to generate synthetic prompt sentences. Experimental results demonstrated that the proposed model improved prediction accuracy and effectively varied turn-taking timing behaviors according to the textual prompts.

Anthology ID: 2025.sigdial-1.9
Volume: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month: August
Year: 2025
Address: Avignon, France
Editors: Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 146–151
URL: https://aclanthology.org/2025.sigdial-1.9/
Cite (ACL): Koji Inoue, Mikey Elmers, Yahui Fu, Zi Haur Pang, Divesh Lala, Keiko Ochi, and Tatsuya Kawahara. 2025. Prompt-Guided Turn-Taking Prediction. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 146–151, Avignon, France. Association for Computational Linguistics.
Cite (Informal): Prompt-Guided Turn-Taking Prediction (Inoue et al., SIGDIAL 2025)
PDF: https://aclanthology.org/2025.sigdial-1.9.pdf
