Transition Relevance Point Detection for Spoken Dialogue Systems with Self-Attention Transformer

Kouki Miyazawa; Yoshinao Sato


Abstract

Most conventional spoken dialogue systems decide when to respond based on the elapsed time of silence following a user's utterance. This approach often causes turn-taking failures that disrupt smooth communication with users. This study addresses the detection of when it is acceptable for the dialogue system to start speaking. Specifically, we aim to detect transition relevance points (TRPs) rather than to predict whether the dialogue participants will actually start speaking. To achieve this, we employ a self-supervised speech representation based on contrastive predictive coding and a self-attention transformer. The proposed model, TRPDformer, was trained and evaluated on the Corpus of Everyday Japanese Conversation. TRPDformer outperformed a baseline model based on the elapsed time of silence. Furthermore, in a preference test, third-party listeners rated the timing of system responses determined by the proposed model as superior to that of the baseline.
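The silence-timeout strategy that the abstract uses as its baseline can be illustrated with a short sketch. This is a minimal example under assumptions not stated in the paper: a per-frame voice activity detector (VAD) output, 10 ms frames, and a single fixed silence threshold. The function name `silence_timeout_onsets` and all parameter values are hypothetical and do not reproduce the authors' baseline configuration.

```python
from typing import List


def silence_timeout_onsets(vad: List[bool], frame_ms: int = 10,
                           threshold_ms: int = 500) -> List[int]:
    """Return frame indices at which a silence-timeout system starts responding.

    Hypothetical sketch: vad[i] is True if the user is speaking in frame i.
    After user speech, the system fires once the run of consecutive silent
    frames reaches threshold_ms, and fires at most once per pause.
    """
    needed = -(-threshold_ms // frame_ms)  # ceiling division: silent frames required
    onsets: List[int] = []
    silent_run = 0
    heard_speech = False  # do not respond before the user has said anything
    fired = False         # respond at most once per silent stretch
    for i, speaking in enumerate(vad):
        if speaking:
            heard_speech = True
            silent_run = 0
            fired = False
        else:
            silent_run += 1
            if heard_speech and not fired and silent_run >= needed:
                onsets.append(i)
                fired = True
    return onsets


# Toy input: 300 ms of speech, a 200 ms pause, 100 ms of speech, then 600 ms of silence.
vad = [True] * 30 + [False] * 20 + [True] * 10 + [False] * 60
print(silence_timeout_onsets(vad, frame_ms=10, threshold_ms=500))  # → [109]
print(silence_timeout_onsets(vad, frame_ms=10, threshold_ms=200))  # → [49, 79]
```

The toy input shows the trade-off behind the turn-taking failures the abstract mentions. If the 200 ms pause is a hesitation inside the user's turn, a short threshold makes the system barge in at frame 49, while a long threshold avoids that but waits a full 500 ms after the user has actually finished. Silence duration alone cannot separate these cases, which is the motivation for detecting TRPs from the speech signal instead.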

Anthology ID:: 2025.sigdial-1.21
Volume:: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:: August
Year:: 2025
Address:: Avignon, France
Editors:: Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin
Venue:: SIGDIAL
SIG:: SIGDIAL
Publisher:: Association for Computational Linguistics
Pages:: 268–274
URL:: https://aclanthology.org/2025.sigdial-1.21/
Cite (ACL):: Kouki Miyazawa and Yoshinao Sato. 2025. Transition Relevance Point Detection for Spoken Dialogue Systems with Self-Attention Transformer. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 268–274, Avignon, France. Association for Computational Linguistics.
Cite (Informal):: Transition Relevance Point Detection for Spoken Dialogue Systems with Self-Attention Transformer (Miyazawa & Sato, SIGDIAL 2025)
PDF:: https://aclanthology.org/2025.sigdial-1.21.pdf