Neural Generation of Dialogue Response Timings

Matthew Roddy, Naomi Harte


Abstract
The timings of spoken response offsets in human dialogue have been shown to vary based on contextual elements of the dialogue. We propose neural models that simulate the distributions of these response offsets, taking into account the response turn as well as the preceding turn. The models are designed to be integrated into the pipeline of an incremental spoken dialogue system (SDS). We evaluate our models using offline experiments as well as human listening tests. We show that human listeners consider certain response timings to be more natural based on the dialogue context. The introduction of these models into SDS pipelines could increase the perceived naturalness of interactions.
Anthology ID:
2020.acl-main.221
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2442–2452
URL:
https://aclanthology.org/2020.acl-main.221
DOI:
10.18653/v1/2020.acl-main.221
Cite (ACL):
Matthew Roddy and Naomi Harte. 2020. Neural Generation of Dialogue Response Timings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2442–2452, Online. Association for Computational Linguistics.
Cite (Informal):
Neural Generation of Dialogue Response Timings (Roddy & Harte, ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.221.pdf
Video:
http://slideslive.com/38928926
Code:
mattroddy/RTNets