Gabriel Skantze


2021

How “open” are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation
A. Seza Doğruöz | Gabriel Skantze
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of “openness”, we conduct two studies: First, we classify the types of “speech events” encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the “small talk” category and exclude the other speech event categories encountered in real life human-human communication. Second, we conduct a small-scale pilot study to generate online conversations covering a wider range of speech event categories between two humans vs. a human and a state-of-the-art chatbot (i.e., Blender by Facebook). A human evaluation of these generated conversations indicates a preference for human-human conversations, since the human-chatbot conversations lack coherence in most speech event categories. Based on these results, we suggest (a) using the term “small talk” instead of “open-domain” for the current chatbots which are not that “open” in terms of conversational abilities yet, and (b) revising the evaluation methods to test the chatbot conversations against other speech events.

Projection of Turn Completion in Incremental Spoken Dialogue Systems
Erik Ekstedt | Gabriel Skantze
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

The ability to take turns in a fluent way (i.e., without long response delays or frequent interruptions) is a fundamental aspect of any spoken dialog system. However, practical speech recognition services typically induce a long response delay, as it takes time before the processing of the user’s utterance is complete. There is a considerable amount of research indicating that humans achieve fast response times by projecting what the interlocutor will say and estimating upcoming turn completions. In this work, we implement this mechanism in an incremental spoken dialog system, by using a language model that generates possible futures to project upcoming completion points. In theory, this could make the system more responsive, while still having access to semantic information not yet processed by the speech recognizer. We conduct a small study which indicates that this is a viable approach for practical dialog systems, and that this is a promising direction for future research.

2020

TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Erik Ekstedt | Gabriel Skantze
Findings of the Association for Computational Linguistics: EMNLP 2020

Syntactic and pragmatic completeness is known to be important for turn-taking prediction, but so far machine learning models of turn-taking have used such linguistic information in a limited way. In this paper, we introduce TurnGPT, a transformer-based language model for predicting turn-shifts in spoken dialog. The model has been trained and evaluated on a variety of written and spoken dialog datasets. We show that the model outperforms two baselines used in prior work. We also report on an ablation study, as well as attention and gradient analyses, which show that the model is able to utilize the dialog context and pragmatic completeness for turn-taking prediction. Finally, we explore the model’s potential in not only detecting, but also projecting, turn-completions.

2019

Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Satoshi Nakamura | Milica Gasic | Ingrid Zukerman | Gabriel Skantze | Mikio Nakano | Alexandros Papangelis | Stefan Ultes | Koichiro Yoshino
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees
Nils Axelsson | Gabriel Skantze
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

In dialogue, speakers continuously adapt their speech to accommodate the listener, based on the feedback they receive. In this paper, we explore the modelling of such behaviours in the context of a robot presenting a painting. A Behaviour Tree is used to organise the behaviour on different levels, and allow the robot to adapt its behaviour in real-time; the tree organises engagement, joint attention, turn-taking, feedback and incremental speech processing. An initial implementation of the model is presented, and the system is evaluated in a user study, where the adaptive robot presenter is compared to a non-adaptive version. The adaptive version is found to be more engaging by the users, although no effects are found on the retention of the presented material.

2018

Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution
Todd Shore | Gabriel Skantze
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Referring to entities in situated dialog is a collaborative process, whereby interlocutors often expand, repair and/or replace referring expressions in an iterative process, converging on conceptual pacts of referring language use in doing so. Nevertheless, much work on exophoric reference resolution (i.e. resolution of references to entities outside of a given text) follows a literary model, whereby individual referring expressions are interpreted as unique identifiers of their referents given the state of the dialog at the time the referring expression is initiated. In this paper, we address this collaborative nature to improve dialogic reference resolution in two ways: First, we train a words-as-classifiers logistic regression model of word semantics and incrementally adapt the model to idiosyncratic language between dyad partners during evaluation of the dialog. We then use these semantic models to learn the general referring ability of each word, which is independent of referent features. These methods facilitate accurate automatic reference resolution in situated dialog without annotation of referring expressions, even with little background data.

A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
Dimosthenis Kontogiorgos | Vanya Avramova | Simon Alexanderson | Patrik Jonell | Catharine Oertel | Jonas Beskow | Gabriel Skantze | Joakim Gustafson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
Todd Shore | Theofronia Androulakaki | Gabriel Skantze
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks
Gabriel Skantze
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retention in pauses. In this paper, we present a predictive, continuous model of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN). The model is trained on human-human dialogue data to predict upcoming speech activity in a future time window. We show how this general model can be applied to two different tasks that it was not specifically trained for. First, to predict whether a turn-shift will occur or not in pauses, where the model achieves a better performance than human observers, and better than results achieved with more traditional models. Second, to make a prediction at speech onset whether the utterance will be a short backchannel or a longer utterance. Finally, we show how the hidden layer in the network can be used as a feature vector for turn-taking decisions in a human-robot interaction scenario.

2015

Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Alexander Koller | Gabriel Skantze | Filip Jurcicek | Masahiro Araki | Carolyn Penstein Rose
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Modelling situated human-robot interaction using IrisTK
Gabriel Skantze | Martin Johansson
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Opportunities and Obligations to Take Turns in Collaborative Multi-Party Human-Robot Interaction
Martin Johansson | Gabriel Skantze
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Automatic Detection of Miscommunication in Spoken Dialogue Systems
Raveesh Meena | José Lopes | Gabriel Skantze | Joakim Gustafson
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Crowdsourcing Street-level Geographic Information Using a Spoken Dialogue System
Raveesh Meena | Johan Boye | Gabriel Skantze | Joakim Gustafson
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

Human Evaluation of Conceptual Route Graphs for Interpreting Spoken Route Descriptions
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the IWCS 2013 Workshop on Computational Models of Spatial Language Interpretation and Generation (CoSLI-3)

Exploring the effects of gaze and pauses in situated human-robot interaction
Gabriel Skantze | Anna Hjalmarsson | Catharine Oertel
Proceedings of the SIGDIAL 2013 Conference

The Map Task Dialogue System: A Test-bed for Modelling Human-Like Dialogue
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2013 Conference

A Data-driven Model for Timing Feedback in a Map Task Dialogue System
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2013 Conference

2010

Towards Incremental Speech Generation in Dialogue Systems
Gabriel Skantze | Anna Hjalmarsson
Proceedings of the SIGDIAL 2010 Conference

Middleware for Incremental Processing in Conversational Agents
David Schlangen | Timo Baumann | Hendrik Buschmeier | Okko Buß | Stefan Kopp | Gabriel Skantze | Ramin Yaghoubzadeh
Proceedings of the SIGDIAL 2010 Conference

2009

Attention and Interaction Control in a Human-Human-Computer Dialogue Setting
Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2009 Conference

A General, Abstract Model of Incremental Dialogue Processing
David Schlangen | Gabriel Skantze
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Incremental Dialogue Processing in a Micro-Domain
Gabriel Skantze | David Schlangen
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2007

Making Grounding Decisions: Data-driven Estimation of Dialogue Costs and Confidence Thresholds
Gabriel Skantze
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2005

GALATEA: A Discourse Modeller Supporting Concept-Level Error Handling in Spoken Dialogue Systems
Gabriel Skantze
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue