TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog

Erik Ekstedt, Gabriel Skantze


Abstract
Syntactic and pragmatic completeness is known to be important for turn-taking prediction, but so far machine learning models of turn-taking have used such linguistic information in a limited way. In this paper, we introduce TurnGPT, a transformer-based language model for predicting turn-shifts in spoken dialog. The model has been trained and evaluated on a variety of written and spoken dialog datasets. We show that the model outperforms two baselines used in prior work. We also report on an ablation study, as well as attention and gradient analyses, which show that the model is able to utilize the dialog context and pragmatic completeness for turn-taking prediction. Finally, we explore the model’s potential in not only detecting, but also projecting, turn-completions.
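The core idea described in the abstract, using an autoregressive language model to estimate when a turn is complete, can be illustrated with a minimal sketch. Here the turn-shift decision is framed as the probability mass a language model assigns to a special turn-shift token (named `<ts>` here for illustration); the logits are made up, and this is not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert a dict of token logits into a probability distribution."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def turn_shift_probability(logits, ts_token="<ts>"):
    """Probability the model assigns to a turn-shift at the current step."""
    return softmax(logits)[ts_token]

# Hypothetical next-token logits after an utterance like "do you want coffee":
logits = {"<ts>": 2.0, "or": 1.0, "and": 0.5, "today": 0.2}
p = turn_shift_probability(logits)
```

A turn-taking decision could then compare `p` against a threshold: a high turn-shift probability suggests the utterance is pragmatically complete and the listener may take the turn.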
Anthology ID:
2020.findings-emnlp.268
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2981–2990
URL:
https://aclanthology.org/2020.findings-emnlp.268
DOI:
10.18653/v1/2020.findings-emnlp.268
Cite (ACL):
Erik Ekstedt and Gabriel Skantze. 2020. TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2981–2990, Online. Association for Computational Linguistics.
Cite (Informal):
TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog (Ekstedt & Skantze, Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.268.pdf
Code
ErikEkstedt/TurnGPT
Data
Coached Conversational Preference Elicitation, DailyDialog, MultiWOZ