Gabriel Skantze - ACL Anthology

Gabriel Skantze

2025

Using LLMs to Grade Clinical Reasoning for Medical Students in Virtual Patient Dialogues
Jonathan Schiött | William Ivegren | Alexander Borg | Ioannis Parodis | Gabriel Skantze
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

This paper presents an evaluation of the use of large language models (LLMs) for grading clinical reasoning during rheumatology medical history virtual patient (VP) simulations. The study explores the feasibility of using state-of-the-art LLMs, including both general-purpose models, with various prompting strategies such as zero-shot, analysis-first, and chain-of-thought prompting, as well as reasoning models. The performance of these models in grading transcribed dialogues from VP simulations conducted on a Furhat robot was evaluated against human expert annotations. Human experts initially achieved a 65% inter-rater agreement, which resulted in a pooled Cohen’s Kappa of 0.71 and 82.3% correctness. The best LLM, o3-mini, achieved a pooled Kappa of 0.68 and 81.5% correctness, with response times under 30 seconds, compared to approximately 6 minutes for human grading. These results indicate the possibility that automatic assessments can approach human reliability under controlled simulation conditions while delivering time and cost efficiencies.

Role of Reasoning in LLM Enjoyment Detection: Evaluation Across Conversational Levels for Human-Robot Interaction
Lubos Marcinek | Bahar Irfan | Gabriel Skantze | Andre Pereira | Joakim Gustafsson
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

User enjoyment is central to developing conversational AI systems that can recover from failures and maintain interest over time. However, existing approaches often struggle to detect subtle cues that reflect user experience. Large Language Models (LLMs) with reasoning capabilities have outperformed standard models on various other tasks, suggesting potential benefits for enjoyment detection. This study investigates whether models with reasoning capabilities outperform standard models when assessing enjoyment in a human-robot dialogue corpus at both turn and interaction levels. Results indicate that reasoning capabilities have complex, model-dependent effects rather than universal benefits. While performance was nearly identical at the interaction level (0.44 vs 0.43), reasoning models substantially outperformed at the turn level (0.42 vs 0.36). Notably, LLMs correlated better with users’ self-reported enjoyment metrics than human annotators, despite achieving lower accuracy against human consensus ratings. Analysis revealed distinctive error patterns: non-reasoning models showed bias toward positive ratings at the turn level, while both model types exhibited central tendency bias at the interaction level. These findings suggest that reasoning should be applied selectively based on model architecture and assessment context, with assessment granularity significantly influencing relative effectiveness.

Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection
Koji Inoue | Divesh Lala | Gabriel Skantze | Tatsuya Kawahara
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In human conversations, short backchannel utterances such as “yeah” and “oh” play a crucial role in facilitating smooth and engaging dialogue.These backchannels signal attentiveness and understanding without interrupting the speaker, making their accurate prediction essential for creating more natural conversational agents.This paper proposes a novel method for real-time, continuous backchannel prediction using a fine-tuned Voice Activity Projection (VAP) model.While existing approaches have relied on turn-based or artificially balanced datasets, our approach predicts both the timing and type of backchannels in a continuous and frame-wise manner on unbalanced, real-world datasets.We first pre-train the VAP model on a general dialogue corpus to capture conversational dynamics and then fine-tune it on a specialized dataset focused on backchannel behavior.Experimental results demonstrate that our model outperforms baseline methods in both timing and type prediction tasks, achieving robust performance in real-time environments.This research offers a promising step toward more responsive and human-like dialogue systems, with implications for interactive spoken dialogue applications such as virtual assistants and robots.

Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models
Bram Willemsen | Gabriel Skantze
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)

In this paper, we explore the use of a text-only, autoregressive language modeling approach for the extraction of referring expressions from visually grounded dialogue. More specifically, the aim is to investigate the extent to which the linguistic context alone can inform the detection of mentions that have a (visually perceivable) referent in the visual context of the conversation. To this end, we adapt a pretrained large language model (LLM) to perform a relatively course-grained annotation of mention spans in unfolding conversations by demarcating mention span boundaries in text via next-token prediction. Our findings indicate that even when using a moderately sized LLM, relatively small datasets, and parameter-efficient fine-tuning, a text-only approach can be effective, highlighting the relative importance of the linguistic context for this task. Nevertheless, we argue that the task represents an inherently multimodal problem and discuss limitations fundamental to unimodal approaches.

2024

How Much Does Nonverbal Communication Conform to Entropy Rate Constancy?: A Case Study on Listener Gaze in Interaction
Yu Wang | Yang Xu | Gabriel Skantze | Hendrik Buschmeier
Findings of the Association for Computational Linguistics: ACL 2024

According to the Entropy Rate Constancy (ERC) principle, the information density of a text is approximately constant over its length. Whether this principle also applies to nonverbal communication signals is still under investigation. We perform empirical analyses of video-recorded dialogue data and investigate whether listener gaze, as an important nonverbal communication signal, adheres to the ERC principle. Results show (1) that the ERC principle holds for listener gaze; and (2) that the two linguistic factors syntactic complexity and turn transition potential are weakly correlated with local entropy of listener gaze.

Mhm... Yeah? Okay! Evaluating the Naturalness and Communicative Function of Synthesized Feedback Responses in Spoken Dialogue
Carol Figueroa | Marcel de Korte | Magalie Ochs | Gabriel Skantze
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

To create conversational systems with human-like listener behavior, generating short feedback responses (e.g., “mhm”, “ah”, “wow”) appropriate for their context is crucial. These responses convey their communicative function through their lexical form and their prosodic realization. In this paper, we transplant the prosody of feedback responses from human-human U.S. English telephone conversations to a target speaker using two synthesis techniques (TTS and signal processing). Our evaluation focuses on perceived naturalness, contextual appropriateness and preservation of communicative function. Results indicate TTS-generated feedback were perceived as more natural than signal-processing-based feedback, with no significant difference in appropriateness. However, the TTS did not consistently convey the communicative function of the original feedback.

Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
Bram Willemsen | Gabriel Skantze
Proceedings of the 17th International Natural Language Generation Conference

We propose an approach to referring expression generation (REG) in visually grounded dialogue that is meant to produce referring expressions (REs) that are both discriminative and discourse-appropriate. Our method constitutes a two-stage process. First, we model REG as a text- and image-conditioned next-token prediction task. REs are autoregressively generated based on their preceding linguistic context and a visual representation of the referent. Second, we propose the use of discourse-aware comprehension guiding as part of a generate-and-rerank strategy through which candidate REs generated with our REG model are reranked based on their discourse-dependent discriminatory power. Results from our human evaluation indicate that our proposed two-stage approach is effective in producing discriminative REs, with higher performance in terms of text-image retrieval accuracy for reranked REs compared to those generated using greedy decoding.

Multilingual Turn-taking Prediction Using Voice Activity Projection
Koji Inoue | Bing’er Jiang | Erik Ekstedt | Tatsuya Kawahara | Gabriel Skantze
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data, encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages. However, a multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages. Further analyses show that the multilingual model has learned to discern the language of the input signal. We also analyze the sensitivity to pitch, a prosodic cue that is thought to be important for turn-taking. Finally, we compare two different audio encoders, contrastive predictive coding (CPC) pre-trained on English, with a recent model based on multilingual wav2vec 2.0 (MMS).

2023

The Open-domain Paradox for Chatbots: Common Ground as the Basis for Human-like Dialogue
Gabriel Skantze | A. Seza Doğruöz
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

There is a surge in interest in the development of open-domain chatbots, driven by the recent advancements of large language models. The “openness” of the dialogue is expected to be maximized by providing minimal information to the users about the common ground they can expect, including the presumed joint activity. However, evidence suggests that the effect is the opposite. Asking users to “just chat about anything” results in a very narrow form of dialogue, which we refer to as the “open-domain paradox”. In this position paper, we explain this paradox through the theory of common ground as the basis for human-like communication. Furthermore, we question the assumptions behind open-domain chatbots and identify paths forward for enabling common ground in human-computer dialogue.

Response-conditioned Turn-taking Prediction
Bing’er Jiang | Erik Ekstedt | Gabriel Skantze
Findings of the Association for Computational Linguistics: ACL 2023

Previous approaches to turn-taking and response generation in conversational systems have treated it as a two-stage process: First, the end of a turn is detected (based on conversation history), then the system generates an appropriate response. Humans, however, do not take the turn just because it is likely, but also consider whether what they want to say fits the position. In this paper, we present a model (an extension of TurnGPT) that conditions the end-of-turn prediction on both conversation history and what the next speaker wants to say. We found that our model consistently outperforms the baseline model in a variety of metrics. The improvement is most prominent in two scenarios where turn predictions can be ambiguous solely from the conversation history: 1) when the current utterance contains a statement followed by a question; 2) when the end of the current utterance semantically matches the response. Treating the turn-prediction and response-ranking as a one-stage process, our findings suggest that our model can be used as an incremental response ranker, which can be applied in various settings.

Resolving References in Visually-Grounded Dialogue via Text Generation
Bram Willemsen | Livia Qian | Gabriel Skantze
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Vision-language models (VLMs) have shown to be effective at image retrieval based on simple text queries, but text-image retrieval based on conversational input remains a challenge. Consequently, if we want to use VLMs for reference resolution in visually-grounded dialogue, the discourse processing capabilities of these models need to be augmented. To address this issue, we propose fine-tuning a causal large language model (LLM) to generate definite descriptions that summarize coreferential information found in the linguistic context of references. We then use a pretrained VLM to identify referents based on the generated descriptions, zero-shot. We evaluate our approach on a manually annotated dataset of visually-grounded dialogues and achieve results that, on average, exceed the performance of the baselines we compare against. Furthermore, we find that using referent descriptions based on larger context windows has the potential to yield higher returns.

Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs
Agnes Axelsson | Gabriel Skantze
Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023)

In any system that uses structured knowledge graph (KG) data as its underlying knowledge representation, KG-to-text generation is a useful tool for turning parts of the graph data into text that can be understood by humans. Recent work has shown that models that make use of pretraining on large amounts of text data can perform well on the KG-to-text task, even with relatively little training data on the specific graph-to-text task. In this paper, we build on this concept by using large language models to perform zero-shot generation based on nothing but the model’s understanding of the triple structure from what it can read. We show that ChatGPT achieves near state-of-the-art performance on some measures of the WebNLG 2020 challenge, but falls behind on others. Additionally, we compare factual, counter-factual and fictional statements, and show that there is a significant connection between what the LLM already knows about the data it is parsing and the quality of the output text.

2022

Collecting Visually-Grounded Dialogue with A Game Of Sorts
Bram Willemsen | Dmytro Kalpakchi | Gabriel Skantze
Proceedings of the Thirteenth Language Resources and Evaluation Conference

An idealized, though simplistic, view of the referring expression production and grounding process in (situated) dialogue assumes that a speaker must merely appropriately specify their expression so that the target referent may be successfully identified by the addressee. However, referring in conversation is a collaborative process that cannot be aptly characterized as an exchange of minimally-specified referring expressions. Concerns have been raised regarding assumptions made by prior work on visually-grounded dialogue that reveal an oversimplified view of conversation and the referential process. We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call “A Game Of Sorts”. In our game, players are tasked with reaching agreement on how to rank a set of images given some sorting criterion through a largely unrestricted, role-symmetric dialogue. By putting emphasis on the argumentation in this mixed-initiative interaction, we collect discussions that involve the collaborative referential process. We describe results of a small-scale data collection experiment with the proposed task. All discussed materials, which includes the collected data, the codebase, and a containerized version of the application, are publicly available.

Annotation of Communicative Functions of Short Feedback Tokens in Switchboard
Carol Figueroa | Adaeze Adigwe | Magalie Ochs | Gabriel Skantze
Proceedings of the Thirteenth Language Resources and Evaluation Conference

There has been a lot of work on predicting the timing of feedback in conversational systems. However, there has been less focus on predicting the prosody and lexical form of feedback given their communicative function. Therefore, in this paper we present our preliminary annotations of the communicative functions of 1627 short feedback tokens from the Switchboard corpus and an analysis of their lexical realizations and prosodic characteristics. Since there is no standard scheme for annotating the communicative function of feedback we propose our own annotation scheme. Although our work is ongoing, our preliminary analysis revealed lexical tokens such as “yeah” are ambiguous and therefore lexical forms alone are not indicative of the function. Both the lexical form and prosodic characteristics need to be taken into account in order to predict the communicative function. We also found that feedback functions have distinguishable prosodic characteristics in terms of duration, mean pitch, pitch slope, and pitch range.

How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models
Erik Ekstedt | Gabriel Skantze
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Turn-taking is a fundamental aspect of human communication and can be described as the ability to take turns, project upcoming turn shifts, and supply backchannels at appropriate locations throughout a conversation. In this work, we investigate the role of prosody in turn-taking using the recently proposed Voice Activity Projection model, which incrementally models the upcoming speech activity of the interlocutors in a self-supervised manner, without relying on explicit annotation of turn-taking events, or the explicit modeling of prosodic features. Through manipulation of the speech signal, we investigate how these models implicitly utilize prosodic information. We show that these systems learn to utilize various prosodic aspects of speech both on aggregate quantitative metrics of long-form conversations and on single utterances specifically designed to depend on prosody.

2021

How “open” are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation
A. Seza Doğruöz | Gabriel Skantze
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of “openness”, we conduct two studies: First, we classify the types of “speech events” encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the “small talk” category and exclude the other speech event categories encountered in real life human-human communication. Second, we conduct a small-scale pilot study to generate online conversations covering a wider range of speech event categories between two humans vs. a human and a state-of-the-art chatbot (i.e., Blender by Facebook). A human evaluation of these generated conversations indicates a preference for human-human conversations, since the human-chatbot conversations lack coherence in most speech event categories. Based on these results, we suggest (a) using the term “small talk” instead of “open-domain” for the current chatbots which are not that “open” in terms of conversational abilities yet, and (b) revising the evaluation methods to test the chatbot conversations against other speech events.

Projection of Turn Completion in Incremental Spoken Dialogue Systems
Erik Ekstedt | Gabriel Skantze
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

The ability to take turns in a fluent way (i.e., without long response delays or frequent interruptions) is a fundamental aspect of any spoken dialog system. However, practical speech recognition services typically induce a long response delay, as it takes time before the processing of the user’s utterance is complete. There is a considerable amount of research indicating that humans achieve fast response times by projecting what the interlocutor will say and estimating upcoming turn completions. In this work, we implement this mechanism in an incremental spoken dialog system, by using a language model that generates possible futures to project upcoming completion points. In theory, this could make the system more responsive, while still having access to semantic information not yet processed by the speech recognizer. We conduct a small study which indicates that this is a viable approach for practical dialog systems, and that this is a promising direction for future research.

2020

TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Erik Ekstedt | Gabriel Skantze
Findings of the Association for Computational Linguistics: EMNLP 2020

Syntactic and pragmatic completeness is known to be important for turn-taking prediction, but so far machine learning models of turn-taking have used such linguistic information in a limited way. In this paper, we introduce TurnGPT, a transformer-based language model for predicting turn-shifts in spoken dialog. The model has been trained and evaluated on a variety of written and spoken dialog datasets. We show that the model outperforms two baselines used in prior work. We also report on an ablation study, as well as attention and gradient analyses, which show that the model is able to utilize the dialog context and pragmatic completeness for turn-taking prediction. Finally, we explore the model’s potential in not only detecting, but also projecting, turn-completions.

2019

Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees
Nils Axelsson | Gabriel Skantze
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

In dialogue, speakers continuously adapt their speech to accommodate the listener, based on the feedback they receive. In this paper, we explore the modelling of such behaviours in the context of a robot presenting a painting. A Behaviour Tree is used to organise the behaviour on different levels, and allow the robot to adapt its behaviour in real-time; the tree organises engagement, joint attention, turn-taking, feedback and incremental speech processing. An initial implementation of the model is presented, and the system is evaluated in a user study, where the adaptive robot presenter is compared to a non-adaptive version. The adaptive version is found to be more engaging by the users, although no effects are found on the retention of the presented material.

Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Satoshi Nakamura | Milica Gasic | Ingrid Zukerman | Gabriel Skantze | Mikio Nakano | Alexandros Papangelis | Stefan Ultes | Koichiro Yoshino
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

2018

KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
Todd Shore | Theofronia Androulakaki | Gabriel Skantze
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
Dimosthenis Kontogiorgos | Vanya Avramova | Simon Alexanderson | Patrik Jonell | Catharine Oertel | Jonas Beskow | Gabriel Skantze | Joakim Gustafson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution
Todd Shore | Gabriel Skantze
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Referring to entities in situated dialog is a collaborative process, whereby interlocutors often expand, repair and/or replace referring expressions in an iterative process, converging on conceptual pacts of referring language use in doing so. Nevertheless, much work on exophoric reference resolution (i.e. resolution of references to entities outside of a given text) follows a literary model, whereby individual referring expressions are interpreted as unique identifiers of their referents given the state of the dialog the referring expression is initiated. In this paper, we address this collaborative nature to improve dialogic reference resolution in two ways: First, we trained a words-as-classifiers logistic regression model of word semantics and incrementally adapt the model to idiosyncratic language between dyad partners during evaluation of the dialog. We then used these semantic models to learn the general referring ability of each word, which is independent of referent features. These methods facilitate accurate automatic reference resolution in situated dialog without annotation of referring expressions, even with little background data.

2017

Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks
Gabriel Skantze
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retention in pauses. In this paper, we present a predictive, continuous model of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN). The model is trained on human-human dialogue data to predict upcoming speech activity in a future time window. We show how this general model can be applied to two different tasks that it was not specifically trained for. First, to predict whether a turn-shift will occur or not in pauses, where the model achieves a better performance than human observers, and better than results achieved with more traditional models. Second, to make a prediction at speech onset whether the utterance will be a short backchannel or a longer utterance. Finally, we show how the hidden layer in the network can be used as a feature vector for turn-taking decisions in a human-robot interaction scenario.

2015

Automatic Detection of Miscommunication in Spoken Dialogue Systems
Raveesh Meena | José Lopes | Gabriel Skantze | Joakim Gustafson
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Modelling situated human-robot interaction using IrisTK
Gabriel Skantze | Martin Johansson
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Alexander Koller | Gabriel Skantze | Filip Jurcicek | Masahiro Araki | Carolyn Penstein Rose
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Opportunities and Obligations to Take Turns in Collaborative Multi-Party Human-Robot Interaction
Martin Johansson | Gabriel Skantze
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Crowdsourcing Street-level Geographic Information Using a Spoken Dialogue System
Raveesh Meena | Johan Boye | Gabriel Skantze | Joakim Gustafson
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

The Map Task Dialogue System: A Test-bed for Modelling Human-Like Dialogue
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2013 Conference

Human Evaluation of Conceptual Route Graphs for Interpreting Spoken Route Descriptions
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the IWCS 2013 Workshop on Computational Models of Spatial Language Interpretation and Generation (CoSLI-3)

Exploring the effects of gaze and pauses in situated human-robot interaction
Gabriel Skantze | Anna Hjalmarsson | Catharine Oertel
Proceedings of the SIGDIAL 2013 Conference

A Data-driven Model for Timing Feedback in a Map Task Dialogue System
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2013 Conference

2011

A General, Abstract Model of Incremental Dialogue Processing
David Schlangen | Gabriel Skantze
Dialogue Discourse Volume 2

We present a general model and conceptual framework for specifying architectures for incremental processing in dialogue systems, in particular with respect to the topology of the network of modules that make up the system, the way information flows through this network, how information increments are ‘packaged’, and how these increments are processed by the modules. This model enables the precise specification of incremental systems and hence facilitates detailed comparisons between systems, as well as giving guidance on designing new systems. In particular, the model can serve as a framework for specifying module communication in such systems, as we illustrate with some examples.

2010

Middleware for Incremental Processing in Conversational Agents
David Schlangen | Timo Baumann | Hendrik Buschmeier | Okko Buß | Stefan Kopp | Gabriel Skantze | Ramin Yaghoubzadeh
Proceedings of the SIGDIAL 2010 Conference

Towards Incremental Speech Generation in Dialogue Systems
Gabriel Skantze | Anna Hjalmarsson
Proceedings of the SIGDIAL 2010 Conference

2009

A General, Abstract Model of Incremental Dialogue Processing
David Schlangen | Gabriel Skantze
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Attention and Interaction Control in a Human-Human-Computer Dialogue Setting
Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2009 Conference

Incremental Dialogue Processing in a Micro-Domain
Gabriel Skantze | David Schlangen
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2007

Making Grounding Decisions: Data-driven Estimation of Dialogue Costs and Confidence Thresholds
Gabriel Skantze
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2005

GALATEA: A Discourse Modeller Supporting Concept-Level Error Handling in Spoken Dialogue Systems
Gabriel Skantze
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue

Co-authors

Venues