M. Soledad López Gambino
Also published as: Soledad López Gambino
2017
Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images
Sina Zarrieß
|
M. Soledad López Gambino
|
David Schlangen
Proceedings of the 10th International Conference on Natural Language Generation
Current referring expression generation systems mostly deliver their output as one-shot, written expressions. We present on-going work on incremental generation of spoken expressions referring to objects in real-world images. This approach extends upon previous work using the words-as-classifier model for generation. We implement this generator in an incremental dialogue processing framework such that we can exploit an existing interface to incremental text-to-speech synthesis. Our system generates and synthesizes referring expressions while continuously observing non-verbal user reactions.
Beyond On-hold Messages: Conversational Time-buying in Task-oriented Dialogue
Soledad López Gambino
|
Sina Zarrieß
|
David Schlangen
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
A common convention in graphical user interfaces is to indicate a “wait state”, for example while a program is preparing a response, through a changed cursor state or a progress bar. What should the analogue be in a spoken conversational system? To address this question, we set up an experiment in which a human information provider (IP) was given their information only in a delayed and incremental manner, which systematically created situations where the IP had the turn but could not provide task-related information. Our data analysis shows that 1) IPs bridge the gap until they can provide information by re-purposing a whole variety of task- and grounding-related communicative actions (e.g. echoing the user’s request, signaling understanding, asserting partially relevant information), rather than being silent or explicitly asking for time (e.g. “please wait”), and that 2) IPs combined these actions productively to ensure an ongoing conversation. These results, we argue, indicate that natural conversational interfaces should also be able to manage their time flexibly using a variety of conversational resources.