<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="5500">
    <title>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</title>
    <editor>Kristiina Jokinen</editor>
    <editor>Manfred Stede</editor>
    <editor>David DeVault</editor>
    <editor>Annie Louis</editor>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://aclweb.org/anthology/W17-55</url>
    <bibtype>book</bibtype>
    <bibkey>W17-55:2017</bibkey>
  </paper>

  <paper id="5501">
    <title>Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations</title>
    <author><first>Majid</first><last>Laali</last></author>
    <author><first>Leila</first><last>Kosseim</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;6</pages>
    <url>http://aclweb.org/anthology/W17-5501</url>
    <abstract>In this paper, we present an approach to exploit phrase tables generated by
	statistical machine translation in order to map French discourse connectives to
	discourse relations. Using this approach, we created DisCoRel, a lexicon of
	French discourse connectives and their PDTB relations. When evaluated against 
	LEXCONN, DisCoRel achieves a recall of 0.81 and an Average Precision of 0.68
	for the Concession and Condition relations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>laali-kosseim:2017:W17-55</bibkey>
  </paper>

  <paper id="5502">
    <title>Towards Full Text Shallow Discourse Relation Annotation: Experiments with Cross-Paragraph Implicit Relations in the PDTB</title>
    <author><first>Rashmi</first><last>Prasad</last></author>
    <author><first>Katherine</first><last>Forbes-Riley</last></author>
    <author><first>Alan</first><last>Lee</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>7&#8211;16</pages>
    <url>http://aclweb.org/anthology/W17-5502</url>
    <abstract>Full text discourse parsing relies on texts comprehensively annotated with
	discourse relations. 
	To this end, we address a significant gap in the inter-sentential discourse
	relations annotated in the Penn Discourse Treebank (PDTB), namely the class of
	cross-paragraph implicit relations, which account for 30% of inter-sentential
	relations in the corpus. We present our annotation study to explore the
	incidence rate of adjacent vs. non-adjacent implicit relations in
	cross-paragraph contexts, and the relative degree of difficulty in annotating
	them. Our experiments show a high incidence of non-adjacent relations that are
	difficult to annotate reliably, suggesting the practicality of backing off from
	their annotation to reduce noise for corpus-based studies. Our resulting
	guidelines follow the PDTB adjacency constraint for implicits while employing
	an underspecified representation of non-adjacent implicits, and yield 62%
	inter-annotator agreement on this task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>prasad-forbesriley-lee:2017:W17-55</bibkey>
  </paper>

  <paper id="5503">
    <title>User-initiated Sub-dialogues in State-of-the-art Dialogue Systems</title>
    <author><first>Staffan</first><last>Larsson</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>17&#8211;22</pages>
    <url>http://aclweb.org/anthology/W17-5503</url>
    <abstract>We test state-of-the-art dialogue systems for their behaviour in response to
	user-initiated sub-dialogues, i.e. interactions where a system question is
	responded to with a question or request from the user, who thus initiates a
	sub-dialogue. We look at sub-dialogues both within a single app (where the
	sub-dialogue concerns another topic in the original domain) and across apps
	(where the sub-dialogue concerns a different domain). The overall conclusion of
	the tests is that none of the systems can be said to deal appropriately with
	user-initiated sub-dialogues.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>larsson:2017:W17-55</bibkey>
  </paper>

  <paper id="5504">
    <title>A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality</title>
    <author><first>Alexander</first><last>Prange</last></author>
    <author><first>Margarita</first><last>Chikobava</last></author>
    <author><first>Peter</first><last>Poller</last></author>
    <author><first>Michael</first><last>Barz</last></author>
    <author><first>Daniel</first><last>Sonntag</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>23&#8211;26</pages>
    <url>http://aclweb.org/anthology/W17-5504</url>
    <abstract>We present a multimodal dialogue system that allows doctors to interact with a
	medical decision support system in virtual reality (VR). We integrate an
	interactive visualization of patient records and radiology image data, as well
	as therapy predictions. Therapy predictions are computed in real-time using a
	deep learning model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>prange-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5505">
    <title>Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability</title>
    <author><first>Tiancheng</first><last>Zhao</last></author>
    <author><first>Allen</first><last>Lu</last></author>
    <author><first>Kyusong</first><last>Lee</last></author>
    <author><first>Maxine</first><last>Eskenazi</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;36</pages>
    <url>http://aclweb.org/anthology/W17-5505</url>
    <abstract>Generative encoder-decoder models offer great promise in developing
	domain-general dialog systems. However, they have mainly been applied to
	open-domain conversations. This paper presents a practical and novel framework
	for building task-oriented dialog systems based on encoder-decoder models. This
	framework enables encoder-decoder models to accomplish slot-value independent
	decision-making and interact with external databases. Moreover, this paper
	shows the flexibility of the proposed method by interleaving chatting
	capability with a slot-filling system for better out-of-domain recovery. The
	models were trained on both real-user data from a bus information system and
	human-human chat data. Results show that the proposed framework achieves good
	performance in both offline evaluation metrics and in task success rate with
	human users.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhao-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5506">
    <title>Key-Value Retrieval Networks for Task-Oriented Dialogue</title>
    <author><first>Mihail</first><last>Eric</last></author>
    <author><first>Lakshmi</first><last>Krishnan</last></author>
    <author><first>Francois</first><last>Charette</last></author>
    <author><first>Christopher D.</first><last>Manning</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>37&#8211;49</pages>
    <url>http://aclweb.org/anthology/W17-5506</url>
    <abstract>Neural task-oriented dialogue systems often struggle to smoothly interface with
	a knowledge base. In this work, we seek to address this problem by proposing a
	new neural dialogue agent that is able to effectively sustain grounded,
	multi-domain discourse through a novel key-value retrieval mechanism. The model
	is end-to-end differentiable and does not need to explicitly model dialogue
	state or belief trackers. We also release a new dataset of 3,031 dialogues that
	are grounded through underlying knowledge bases and span three distinct tasks
	in the in-car personal assistant space: calendar scheduling, weather
	information retrieval, and point-of-interest navigation. Our architecture is
	simultaneously trained on data from all domains and significantly outperforms a
	competitive rule-based system and other existing neural dialogue architectures
	on the provided domains according to both automatic and human evaluation
	metrics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eric-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5507">
    <title>Lexical Acquisition through Implicit Confirmations over Multiple Dialogues</title>
    <author><first>Kohei</first><last>Ono</last></author>
    <author><first>Ryu</first><last>Takeda</last></author>
    <author><first>Eric</first><last>Nichols</last></author>
    <author><first>Mikio</first><last>Nakano</last></author>
    <author><first>Kazunori</first><last>Komatani</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>50&#8211;59</pages>
    <url>http://aclweb.org/anthology/W17-5507</url>
    <abstract>We address the problem of acquiring the ontological categories of unknown terms
	through implicit confirmation in dialogues. We develop an approach that makes
	implicit confirmation requests with an unknown term's predicted category. Our
	approach does not degrade user experience with repetitive explicit
	confirmations, but the system has difficulty determining if information in the
	confirmation request can be correctly acquired. To overcome this challenge, we
	propose a method for determining whether or not the predicted category is
	correct, which is included in an implicit confirmation request. Our method
	exploits multiple user responses to implicit confirmation requests containing
	the same ontological category. Experimental results revealed that the proposed
	method exhibited a higher precision rate for determining the correctly
	predicted categories than when only single user responses were considered.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ono-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5508">
    <title>Utterance Intent Classification of a Spoken Dialogue System with Efficiently Untied Recursive Autoencoders</title>
    <author><first>Tsuneo</first><last>Kato</last></author>
    <author><first>Atsushi</first><last>Nagai</last></author>
    <author><first>Naoki</first><last>Noda</last></author>
    <author><first>Ryosuke</first><last>Sumitomo</last></author>
    <author><first>Jianming</first><last>Wu</last></author>
    <author><first>Seiichi</first><last>Yamamoto</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>60&#8211;64</pages>
    <url>http://aclweb.org/anthology/W17-5508</url>
    <abstract>Recursive autoencoders (RAEs) for compositionality of a vector space model were
	applied to utterance intent classification of a smartphone-based
	Japanese-language spoken dialogue system. Though the RAEs express a nonlinear
	operation on the vectors of child nodes, the operation is considered to be
	intrinsically different depending on the types of child nodes. To relax this
	difference, a data-driven untying of autoencoders (AEs) is proposed.
	Experimental results on utterance intent classification showed improved
	accuracy with the proposed method compared with a basic tied RAE and an untied
	RAE based on a manual rule.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kato-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5509">
    <title>Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning</title>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Pawe&#x142;</first><last>Budzianowski</last></author>
    <author><first>I&#241;igo</first><last>Casanueva</last></author>
    <author><first>Nikola</first><last>Mrk&#x161;i&#x107;</last></author>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Pei-Hao</first><last>Su</last></author>
    <author><first>Tsung-Hsien</first><last>Wen</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <author><first>Steve</first><last>Young</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>65&#8211;70</pages>
    <url>http://aclweb.org/anthology/W17-5509</url>
    <abstract>Reinforcement learning is widely used for dialogue policy optimization where
	the reward function often consists of more than one component, e.g., the
	dialogue success and the dialogue length. In this work, we propose a structured
	method for finding a good balance between these components by searching for the
	optimal reward component weighting. To render this search feasible, we use
	multi-objective reinforcement learning to significantly reduce the number of
	training dialogues required. We apply our proposed method to find optimized
	component weights for six domains and compare them to a default baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ultes-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5510">
    <title>Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction</title>
    <author><first>Guillaume</first><last>Dubuisson Duplessis</last></author>
    <author><first>Chlo&#233;</first><last>Clavel</last></author>
    <author><first>Fr&#233;d&#233;ric</first><last>Landragin</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>71&#8211;81</pages>
    <url>http://aclweb.org/anthology/W17-5510</url>
    <abstract>This work aims at characterising verbal alignment processes for improving
	virtual agent communicative capabilities. We propose computationally
	inexpensive measures of verbal alignment based on expression repetition in
	dyadic textual dialogues. Using these measures, we present a contrastive study
	between Human-Human and Human-Agent dialogues on a negotiation task. We exhibit
	quantitative differences in the strength and orientation of verbal alignment
	showing the ability of our approach to characterise important aspects of verbal
	alignment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dubuissonduplessis-clavel-landragin:2017:W17-55</bibkey>
  </paper>

  <paper id="5511">
    <title>Demonstration of interactive teaching for end-to-end dialog control with hybrid code networks</title>
    <author><first>Jason D</first><last>Williams</last></author>
    <author><first>Lars</first><last>Liden</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>82&#8211;85</pages>
    <url>http://aclweb.org/anthology/W17-5511</url>
    <abstract>This is a demonstration of interactive teaching for practical end-to-end dialog
	systems driven by a recurrent neural network.  In this approach, a developer
	teaches the network by interacting with the system and providing on-the-spot
	corrections.  Once a system is deployed, a developer can also correct mistakes
	in logged dialogs.  This demonstration shows both of these teaching methods
	applied to dialog systems in three domains: pizza ordering, restaurant
	information, and weather forecasts.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>williams-liden:2017:W17-55</bibkey>
  </paper>

  <paper id="5512">
    <title>Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning</title>
    <author><first>Pawe&#x142;</first><last>Budzianowski</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Pei-Hao</first><last>Su</last></author>
    <author><first>Nikola</first><last>Mrk&#x161;i&#x107;</last></author>
    <author><first>Tsung-Hsien</first><last>Wen</last></author>
    <author><first>I&#241;igo</first><last>Casanueva</last></author>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>86&#8211;92</pages>
    <url>http://aclweb.org/anthology/W17-5512</url>
    <abstract>Human conversation is inherently complex, often spanning many different
	topics/domains. This makes policy learning for dialogue systems very
	challenging. Standard flat reinforcement learning methods do not provide an
	efficient framework for modelling such dialogues. In this paper, we focus on
	the under-explored problem of multi-domain dialogue management. First, we
	propose a new method for hierarchical reinforcement learning using the
	option framework. Next, we show that the proposed architecture learns
	faster and arrives at a better policy than the existing flat ones do. Moreover,
	we show how pretrained policies can be adapted to more complex systems with an
	additional set of new actions. In doing that, we show that our approach has the
	potential to facilitate policy optimisation for more sophisticated multi-domain
	dialogue systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>budzianowski-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5513">
    <title>MACA: A Modular Architecture for Conversational Agents</title>
    <author><first>Hoai Phuoc</first><last>Truong</last></author>
    <author><first>Prasanna</first><last>Parthasarathi</last></author>
    <author><first>Joelle</first><last>Pineau</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>93&#8211;102</pages>
    <url>http://aclweb.org/anthology/W17-5513</url>
    <abstract>We propose a software architecture designed to ease the implementation of
	dialogue systems. The Modular Architecture for Conversational Agents (MACA)
	uses a plug-n-play style that allows quick prototyping, thereby facilitating
	the development of new techniques and the reproduction of previous work. The
	architecture separates the domain of the conversation from the agent's dialogue
	strategy, and as such can be easily extended to multiple domains. MACA provides
	tools to host dialogue agents on Amazon Mechanical Turk (mTurk) for data
	collection and allows processing of other sources of training data. The current
	version of the framework already incorporates several domains and existing
	dialogue strategies from the recent literature.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>truong-parthasarathi-pineau:2017:W17-55</bibkey>
  </paper>

  <paper id="5514">
    <title>Sequential Dialogue Context Modeling for Spoken Language Understanding</title>
    <author><first>Ankur</first><last>Bapna</last></author>
    <author><first>Gokhan</first><last>Tur</last></author>
    <author><first>Dilek</first><last>Hakkani-Tur</last></author>
    <author><first>Larry</first><last>Heck</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>103&#8211;114</pages>
    <url>http://aclweb.org/anthology/W17-5514</url>
    <abstract>Spoken Language Understanding (SLU) is a key component of goal-oriented
	dialogue systems that parses user utterances into semantic frame
	representations. Traditionally, SLU does not utilize the dialogue history beyond
	the previous system turn, and contextual ambiguities are resolved by the
	downstream components. In this paper, we explore novel approaches for modeling
	dialogue context in a recurrent neural network (RNN) based language
	understanding system. We propose the Sequential Dialogue Encoder Network, which
	allows encoding context from the dialogue history in chronological order. We
	compare the performance of our proposed architecture with two context models,
	one that uses just the previous turn context and another that encodes dialogue
	context in a memory network, but loses the order of utterances in the dialogue
	history. Experiments with a multi-domain dialogue dataset demonstrate that the
	proposed architecture results in reduced semantic frame error rates.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bapna-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5515">
    <title>Redundancy Localization for the Conversationalization of Unstructured Responses</title>
    <author><first>Sebastian</first><last>Krause</last></author>
    <author><first>Mikhail</first><last>Kozhevnikov</last></author>
    <author><first>Eric</first><last>Malmi</last></author>
    <author><first>Daniele</first><last>Pighin</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>115&#8211;126</pages>
    <url>http://aclweb.org/anthology/W17-5515</url>
    <abstract>Conversational agents offer users a natural-language interface to accomplish
	tasks, entertain themselves, or access information. Informational dialogue is
	particularly challenging in that the agent has to hold a conversation on an
	open topic, and to achieve a reasonable coverage it generally needs to digest
	and present unstructured information from textual sources. Making responses
	based on such sources sound natural and fit appropriately into the conversation
	context is a topic of ongoing research, one of the key issues of which is
	preventing the agent's responses from sounding repetitive. Targeting this
	issue, we propose a new task, redundancy localization, which aims to
	pinpoint semantic overlap between text passages. To help address it
	systematically, we formalize the task, prepare a public dataset with
	fine-grained redundancy labels, and propose a model utilizing a weak training
	signal defined over the results of a passage-retrieval system on web texts. The
	proposed model demonstrates superior performance compared to a state-of-the-art
	entailment model and yields encouraging results when applied to a real-world
	dialogue.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>krause-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5516">
    <title>Attentive listening system with backchanneling, response generation and flexible turn-taking</title>
    <author><first>Divesh</first><last>Lala</last></author>
    <author><first>Pierrick</first><last>Milhorat</last></author>
    <author><first>Koji</first><last>Inoue</last></author>
    <author><first>Masanari</first><last>Ishida</last></author>
    <author><first>Katsuya</first><last>Takanashi</last></author>
    <author><first>Tatsuya</first><last>Kawahara</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>127&#8211;136</pages>
    <url>http://aclweb.org/anthology/W17-5516</url>
    <abstract>Attentive listening systems are designed to let people, especially senior
	people, keep talking to maintain communication ability and mental health.  This
	paper addresses key components of an attentive listening system which
	encourages users to talk smoothly. First, we introduce continuous prediction of
	end-of-utterances and generation of backchannels, rather than generating
	backchannels after end-point detection of utterances. This improves subjective
	evaluations of backchannels.  Second, we propose an effective statement
	response mechanism which detects focus words and responds in the form of a
	question or partial repeat.  This can be applied to any statement.  Moreover, a
	flexible turn-taking mechanism is designed which uses backchannels or fillers
	when the turn-switch is ambiguous.  These techniques are integrated into a
	humanoid robot to conduct attentive listening. We test the feasibility of the
	system in a pilot experiment and show that it can produce coherent dialogues
	during conversation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lala-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5517">
    <title>Natural Language Input for In-Car Spoken Dialog Systems: How Natural is Natural?</title>
    <author><first>Patricia</first><last>Braunger</last></author>
    <author><first>Wolfgang</first><last>Maier</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>137&#8211;146</pages>
    <url>http://aclweb.org/anthology/W17-5517</url>
    <abstract>Recent spoken dialog systems are moving away from command and control towards a
	more intuitive and natural style of interaction. In order to choose an
	appropriate system design which allows the system to deal with naturally spoken
	user input, a definition of what exactly constitutes naturalness in user input
	is important. In this paper, we examine how different user groups naturally
	speak to an automotive spoken dialog system (SDS). We conduct a user study in
	which we collect freely spoken user utterances for a wide range of use cases in
	German. By comparing the collected utterances with interpersonal utterances,
	we provide criteria for what constitutes naturalness in the user input of a
	state-of-the-art automotive SDS.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>braunger-maier:2017:W17-55</bibkey>
  </paper>

  <paper id="5518">
    <title>Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management</title>
    <author><first>Pei-Hao</first><last>Su</last></author>
    <author><first>Pawe&#x142;</first><last>Budzianowski</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <author><first>Steve</first><last>Young</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>147&#8211;157</pages>
    <url>http://aclweb.org/anthology/W17-5518</url>
    <abstract>Deep reinforcement learning (RL) methods have significant potential for
	dialogue policy optimisation. However, they suffer from a poor performance in
	the early stages of learning. This is especially problematic for on-line
	learning with real users. Two approaches are introduced to tackle this problem.
	Firstly, to speed up the learning process, two sample-efficient neural network
	algorithms are presented: trust region actor-critic with experience replay (TRACER)
	and episodic natural actor-critic with experience replay (eNACER).
	For TRACER, the trust region helps to control the learning step size and avoid
	catastrophic model changes. For eNACER, the natural gradient identifies the
	steepest ascent direction in policy space to speed up the convergence. Both
	models employ off-policy learning with experience replay to improve
	sample-efficiency. Secondly, to mitigate the cold start issue, a corpus of
	demonstration data is utilised to pre-train the models prior to on-line
	reinforcement learning. Combining these two approaches, we present a practical
	approach to learning deep RL-based dialogue policies and demonstrate their
	effectiveness in a task-oriented information-seeking domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>su-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5519">
    <title>A surprisingly effective out-of-the-box char2char model on the E2E NLG Challenge dataset</title>
    <author><first>Shubham</first><last>Agarwal</last></author>
    <author><first>Marc</first><last>Dymetman</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>158&#8211;163</pages>
    <url>http://aclweb.org/anthology/W17-5519</url>
    <abstract>We train a char2char model on the E2E NLG Challenge data, by exploiting
	"out-of-the-box" the recently released tfseq2seq framework, using some of
	the standard options offered by this tool. With minimal effort, and in
	particular without delexicalization, tokenization or lowercasing, the obtained
	raw predictions, according to a small scale human evaluation, are excellent on
	the linguistic side and quite reasonable on the adequacy side, the primary
	downside being the possible omissions of semantic material. However, in a
	significant number of cases (more than 70%), a perfect solution can be found in
	the top-20 predictions, indicating promising directions for solving the
	remaining issues.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>agarwal-dymetman:2017:W17-55</bibkey>
  </paper>

  <paper id="5520">
    <title>Interaction Quality Estimation Using Long Short-Term Memories</title>
    <author><first>Niklas</first><last>Rach</last></author>
    <author><first>Wolfgang</first><last>Minker</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>164&#8211;169</pages>
    <url>http://aclweb.org/anthology/W17-5520</url>
    <abstract>For estimating the Interaction Quality (IQ) in Spoken Dialogue Systems (SDS),
	the dialogue history is of significant importance. Previous works included this
	information manually in the form of precomputed temporal features into the
	classification process. Here, we employ a deep learning architecture based on
	Long Short-Term Memories (LSTM) to extract this information automatically from
	the data, thus estimating IQ solely from current exchange features. We show
	that this approach achieves results competitive with those of a scenario in
	which manually optimized temporal features are included.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rach-minker-ultes:2017:W17-55</bibkey>
  </paper>

  <paper id="5521">
    <title>DialPort, Gone Live: An Update After A Year of Development</title>
    <author><first>Kyusong</first><last>Lee</last></author>
    <author><first>Tiancheng</first><last>Zhao</last></author>
    <author><first>Yulun</first><last>Du</last></author>
    <author><first>Edward</first><last>Cai</last></author>
    <author><first>Allen</first><last>Lu</last></author>
    <author><first>Eli</first><last>Pincus</last></author>
    <author><first>David</first><last>Traum</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <author><first>Steve</first><last>Young</last></author>
    <author><first>Maxine</first><last>Eskenazi</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>170&#8211;173</pages>
    <url>http://aclweb.org/anthology/W17-5521</url>
    <abstract>DialPort collects user data for connected spoken dialog systems. At present six
	systems are linked to a central portal that directs the user to the applicable
	system and suggests systems that the user may be interested in. User data has
	started to flow into the system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lee-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5522">
    <title>Evaluating Natural Language Understanding Services for Conversational Question Answering Systems</title>
    <author><first>Daniel</first><last>Braun</last></author>
    <author><first>Adrian</first><last>Hernandez-Mendez</last></author>
    <author><first>Florian</first><last>Matthes</last></author>
    <author><first>Manfred</first><last>Langen</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>174&#8211;185</pages>
    <url>http://aclweb.org/anthology/W17-5522</url>
    <abstract>Conversational interfaces recently gained a lot of attention. One of the
	reasons for the current hype is the fact that chatbots (one particularly
	popular form of conversational interfaces)
	nowadays can be created without any programming knowledge, thanks to different
	toolkits and so-called Natural Language Understanding (NLU) services. While
	these NLU services are already widely used in both industry and science, they
	have not yet been analysed systematically. In this paper, we present a
	method to evaluate the classification performance of NLU services. Moreover, we
	present two new corpora, one consisting of annotated questions and one
	consisting of annotated questions with the corresponding answers. Based on
	these corpora, we conduct an evaluation of some of the most popular NLU
	services. Thereby we want to enable both researchers and companies to make
	more educated decisions about which service they should use.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>braun-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5523">
    <title>The Role of Conversation Context for Sarcasm Detection in Online Interactions</title>
    <author><first>Debanjan</first><last>Ghosh</last></author>
    <author><first>Alexander</first><last>Richard Fabbri</last></author>
    <author><first>Smaranda</first><last>Muresan</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>186&#8211;196</pages>
    <url>http://aclweb.org/anthology/W17-5523</url>
    <abstract>Computational models for sarcasm detection have often relied on the content of
	utterances in isolation. However, a speaker's sarcastic intent is not always
	obvious without additional context. Focusing on social media discussions, we
	investigate two issues: (1) does modeling of conversation context help in
	sarcasm detection and (2) can we understand what part of conversation context
	triggered the sarcastic reply. To address the first issue, we investigate
	several types of Long Short-Term Memory (LSTM) networks that can model both the
	conversation context and the sarcastic response. We show that the conditional
	LSTM network (Rockt&#228;schel et al. 2015) and LSTM networks with sentence level
	attention on context and response outperform the LSTM model that reads only the
	response. To address the second issue, we present a qualitative analysis of
	attention weights produced by the LSTM models with attention and discuss the
	results compared with human performance on the task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ghosh-richardfabbri-muresan:2017:W17-55</bibkey>
  </paper>

  <paper id="5524">
    <title>VOILA: An Optimised Dialogue System for Interactively Learning Visually-Grounded Word Meanings (Demonstration System)</title>
    <author><first>Yanchao</first><last>Yu</last></author>
    <author><first>Arash</first><last>Eshghi</last></author>
    <author><first>Oliver</first><last>Lemon</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>197&#8211;200</pages>
    <url>http://aclweb.org/anthology/W17-5524</url>
    <abstract>We present VOILA: an optimised, multi-modal dialogue agent for interactive
	learning of visually grounded word meanings from a human user. VOILA is: (1)
	able to learn new visual categories interactively from users from scratch; (2)
	trained on real human-human dialogues in the same domain, and so is able to
	conduct natural spontaneous dialogue; (3) optimised to find the most effective
	trade-off between the accuracy of the visual categories it learns and the cost
	it incurs to users. VOILA is deployed on Furhat, a human-like, multi-modal
	robot head with back-projection of the face, and a graphical virtual character.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yu-eshghi-lemon:2017:W17-55</bibkey>
  </paper>

  <paper id="5525">
    <title>The E2E Dataset: New Challenges For End-to-End Generation</title>
    <author><first>Jekaterina</first><last>Novikova</last></author>
    <author><first>Ond&#x159;ej</first><last>Du&#x161;ek</last></author>
    <author><first>Verena</first><last>Rieser</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>201&#8211;206</pages>
    <url>http://aclweb.org/anthology/W17-5525</url>
    <abstract>This paper describes the E2E data, a new dataset for training end-to-end,
	data-driven natural language generation systems in the restaurant domain, which
	is ten times bigger than existing, frequently used datasets in this area. The
	E2E dataset poses new challenges: (1) its human reference texts show more
	lexical richness and syntactic variation, including discourse phenomena; (2)
	generating from this set requires content selection. As such, learning from
	this dataset promises more natural, varied and less template-like system
	utterances. We also establish a baseline on this dataset, which illustrates
	some of the difficulties associated with this data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>novikova-duvsek-rieser:2017:W17-55</bibkey>
  </paper>

  <paper id="5526">
    <title>Frames: a corpus for adding memory to goal-oriented dialogue systems</title>
    <author><first>Layla</first><last>El Asri</last></author>
    <author><first>Hannes</first><last>Schulz</last></author>
    <author><first>Shikhar</first><last>Sharma</last></author>
    <author><first>Jeremie</first><last>Zumer</last></author>
    <author><first>Justin</first><last>Harris</last></author>
    <author><first>Emery</first><last>Fine</last></author>
    <author><first>Rahul</first><last>Mehrotra</last></author>
    <author><first>Kaheer</first><last>Suleman</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>207&#8211;219</pages>
    <url>http://aclweb.org/anthology/W17-5526</url>
    <abstract>This paper proposes a new dataset, Frames, composed of 1369 human-human
	dialogues with an average of 15 turns per dialogue. This corpus contains
	goal-oriented dialogues between users who are given some constraints to book a
	trip and assistants who search a database to find appropriate trips. The users
	exhibit complex decision-making behaviour which involves comparing trips,
	exploring different options, and selecting among the trips that were discussed
	during the dialogue. To drive research on dialogue systems towards handling
	such behaviour, we have annotated and released the dataset and we propose in
	this paper a task called frame tracking. This task consists of keeping track of
	different semantic frames throughout each dialogue. We propose a rule-based
	baseline and analyse the frame tracking task through this baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>elasri-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5527">
    <title>Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks</title>
    <author><first>Gabriel</first><last>Skantze</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>220&#8211;230</pages>
    <url>http://aclweb.org/anthology/W17-5527</url>
    <abstract>Previous models of turn-taking have mostly been trained for specific
	turn-taking decisions, such as discriminating between turn shifts and turn
	retention in pauses. In this paper, we present a predictive, continuous model
	of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks
	(RNN). The model is trained on human-human dialogue data to predict upcoming
	speech activity in a future time window. We show how this general model can be
	applied to two different tasks that it was not specifically trained for. First,
	to predict whether a turn-shift will occur or not in pauses, where the model
	achieves a better performance than human observers, and better than results
	achieved with more traditional models. Second, to make a prediction at speech
	onset whether the utterance will be a short backchannel or a longer utterance.
	Finally, we show how the hidden layer in the network can be used as a feature
	vector for turn-taking decisions in a human-robot interaction scenario.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>skantze:2017:W17-55</bibkey>
  </paper>

  <paper id="5528">
    <title>Neural-based Natural Language Generation in Dialogue using RNN Encoder-Decoder with Semantic Aggregation</title>
    <author><first>Van-Khanh</first><last>Tran</last></author>
    <author><first>Le-Minh</first><last>Nguyen</last></author>
    <author><first>Satoshi</first><last>Tojo</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>231&#8211;240</pages>
    <url>http://aclweb.org/anthology/W17-5528</url>
    <abstract>Natural language generation (NLG) is an important component in spoken dialogue
	systems. This paper presents a model called Encoder-Aggregator-Decoder which is
	an extension of a Recurrent Neural Network-based Encoder-Decoder architecture.
	The proposed Semantic Aggregator consists of two components: an Aligner and a
	Refiner. The Aligner is a conventional attention calculated over the encoded
	input information, while the Refiner is another attention or gating mechanism
	stacked over the attentive Aligner in order to further select and aggregate the
	semantic elements. The proposed model can be jointly trained on both sentence
	planning and surface realization to produce natural language utterances.
	The model was extensively assessed on four different NLG domains, in which the
	experimental results showed that the proposed generator consistently
	outperforms the previous methods on all the NLG domains.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tran-nguyen-tojo:2017:W17-55</bibkey>
  </paper>

  <paper id="5529">
    <title>Beyond On-hold Messages: Conversational Time-buying in Task-oriented Dialogue</title>
    <author><first>Soledad</first><last>L&#243;pez Gambino</last></author>
    <author><first>Sina</first><last>Zarrie&#223;</last></author>
    <author><first>David</first><last>Schlangen</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>241&#8211;246</pages>
    <url>http://aclweb.org/anthology/W17-5529</url>
    <abstract>A common convention in graphical user interfaces is to indicate a "wait
	state", for example while a program is preparing a response, through a changed
	cursor state or a progress bar. What should the analogue be in a spoken
	conversational system?
	To address this question, we set up an experiment in which a human information
	provider (IP) was given their information only in a delayed and incremental
	manner, which systematically created situations where the IP had the turn but
	could not provide task-related information.
	Our data analysis shows that 1) IPs bridge the gap until they can provide
	information by re-purposing a whole variety of task- and grounding-related
	communicative actions (e.g. echoing the user's request, signaling
	understanding, asserting partially relevant information), rather than being
	silent or explicitly asking for time (e.g. "please wait"), and that 2) IPs
	combined these actions productively to ensure an ongoing conversation. These
	results, we argue, indicate that natural conversational interfaces should also
	be able to manage their time flexibly using a variety of conversational
	resources.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lopezgambino-zarriess-schlangen:2017:W17-55</bibkey>
  </paper>

  <paper id="5530">
    <title>Neural-based Context Representation Learning for Dialog Act Classification</title>
    <author><first>Daniel</first><last>Ortega</last></author>
    <author><first>Ngoc Thang</first><last>Vu</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>247&#8211;252</pages>
    <url>http://aclweb.org/anthology/W17-5530</url>
    <abstract>We explore context representation learning methods in neural-based models for
	dialog act classification. We propose and compare extensively different methods
	which combine recurrent neural network architectures and attention mechanisms
	(AMs) at different context levels. Our experimental results on two benchmark
	datasets show consistent improvements compared to the models without
	contextual information and reveal that the most suitable AM in the architecture
	depends on the nature of the dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ortega-vu:2017:W17-55</bibkey>
  </paper>

  <paper id="5531">
    <title>Predicting Success in Goal-Driven Human-Human Dialogues</title>
    <author><first>Michael</first><last>Noseworthy</last></author>
    <author><first>Jackie Chi Kit</first><last>Cheung</last></author>
    <author><first>Joelle</first><last>Pineau</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>253&#8211;262</pages>
    <url>http://aclweb.org/anthology/W17-5531</url>
    <abstract>In goal-driven dialogue systems, success is often defined based on a structured
	definition of the goal. This requires that the dialogue system be constrained
	to handle a specific class of goals and that there be a mechanism to measure
	success with respect to that goal. However, in many human-human dialogues the
	diversity of goals makes it infeasible to define success in such a way. To
	address this scenario, we consider the task of automatically predicting success
	in goal-driven human-human dialogues using only the information communicated
	between participants in the form of text. We build a dataset from
	stackoverflow.com which consists of exchanges between two users in the
	technical domain where ground-truth success labels are available. We then
	propose a turn-based hierarchical neural network model that can be used to
	predict success without requiring a structured goal definition. We show this
	model outperforms rule-based heuristics and other baselines as it is able to
	detect patterns over the course of a dialogue and capture notions such as
	gratitude.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>noseworthy-cheung-pineau:2017:W17-55</bibkey>
  </paper>

  <paper id="5532">
    <title>Generating and Evaluating Summaries for Partial Email Threads: Conversational Bayesian Surprise and Silver Standards</title>
    <author><first>Jordon</first><last>Johnson</last></author>
    <author><first>Vaden</first><last>Masrani</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <author><first>Raymond</first><last>Ng</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>263&#8211;272</pages>
    <url>http://aclweb.org/anthology/W17-5532</url>
    <abstract>We define and motivate the problem of summarizing partial email threads. This
	problem introduces the challenge of generating reference summaries for partial
	threads when human annotation is only available for the threads as a whole,
	particularly when the human-selected sentences are not uniformly distributed
	within the threads. We propose an oracular algorithm for generating these
	reference summaries with arbitrary length, and we are making the resulting
	dataset publicly available. In addition, we apply a recent unsupervised method
	based on Bayesian Surprise that incorporates background knowledge into partial
	thread summarization, extend it with conversational features, and modify the
	mechanism by which it handles redundancy. Experiments with our method indicate
	improved performance over the baseline for shorter partial threads; and our
	results suggest that the potential benefits of background knowledge to partial
	thread summarization should be further investigated with larger datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>johnson-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5533">
    <title>Enabling robust and fluid spoken dialogue with cognitively impaired users</title>
    <author><first>Ramin</first><last>Yaghoubzadeh</last></author>
    <author><first>Stefan</first><last>Kopp</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>273&#8211;283</pages>
    <url>http://aclweb.org/anthology/W17-5533</url>
    <abstract>We present the flexdiam dialogue management architecture, which was developed
	in a series of projects dedicated to tailoring spoken interaction to the needs
	of users with cognitive impairments in an everyday assistive domain, using a
	multimodal front-end.
	This hybrid DM architecture affords incremental processing of uncertain input,
	a flexible, mixed-initiative information grounding process that can be adapted
	to users' cognitive capacities and interactive idiosyncrasies, and generic
	mechanisms that foster transitions in the joint discourse state that
	are understandable and controllable by those users, in order to effect a robust
	interaction for users with varying capacities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yaghoubzadeh-kopp:2017:W17-55</bibkey>
  </paper>

  <paper id="5534">
    <title>Adversarial evaluation for open-domain dialogue generation</title>
    <author><first>Elia</first><last>Bruni</last></author>
    <author><first>Raquel</first><last>Fernandez</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>284&#8211;288</pages>
    <url>http://aclweb.org/anthology/W17-5534</url>
    <abstract>We investigate the potential of adversarial evaluation methods for
	open-domain dialogue generation systems, comparing the performance of a
	discriminative agent to that of humans on the same task. Our results show
	that the task is hard, both for automated models and humans, but that a
	discriminative agent can learn patterns that lead to above-chance
	performance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bruni-fernandez:2017:W17-55</bibkey>
  </paper>

  <paper id="5535">
    <title>Exploring Joint Neural Model for Sentence Level Discourse Parsing and Sentiment Analysis</title>
    <author><first>Bita</first><last>Nejat</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <author><first>Raymond</first><last>Ng</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>289&#8211;298</pages>
    <url>http://aclweb.org/anthology/W17-5535</url>
    <abstract>Discourse Parsing and Sentiment Analysis are two fundamental tasks in Natural
	Language Processing that have been shown to be mutually beneficial. In this
	work, we design and compare two Neural Based models for jointly learning both
	tasks. In the proposed approach, we first create a vector representation for
	all the text segments in the input sentence. Next, we apply three different
	Recursive Neural Net models: one for discourse structure prediction, one for
	discourse relation prediction and one for sentiment analysis. Finally, we
	combine these Neural Nets in two different joint models: Multi-tasking and
	Pre-training. Our results on two standard corpora indicate that both methods
	result in improvements in each task but Multi-tasking has a bigger impact than
	Pre-training. Specifically for Discourse Parsing, we see improvements in the
	prediction of the set of contrastive relations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nejat-carenini-ng:2017:W17-55</bibkey>
  </paper>

  <paper id="5536">
    <title>Predicting Causes of Reformulation in Intelligent Assistants</title>
    <author><first>Shumpei</first><last>Sano</last></author>
    <author><first>Nobuhiro</first><last>Kaji</last></author>
    <author><first>Manabu</first><last>Sassano</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>299&#8211;309</pages>
    <url>http://aclweb.org/anthology/W17-5536</url>
    <abstract>Intelligent assistants (IAs) such as Siri and Cortana conversationally interact
	with users and execute a wide range of actions (e.g., searching the Web,
	setting alarms, and chatting). IAs can support these actions through the
	combination of various components such as automatic speech recognition, natural
	language understanding, and language generation. However, the complexity of
	these components hinders developers from determining which component causes an
	error. To remove this hindrance, we focus on reformulation, which is a useful
	signal of user dissatisfaction, and propose a method to predict the
	reformulation causes. We evaluate the method using the user logs of a
	commercial IA. The experimental results have demonstrated that features
	designed to detect the error of a specific component improve the performance of
	reformulation cause detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sano-kaji-sassano:2017:W17-55</bibkey>
  </paper>

  <paper id="5537">
    <title>Are you serious?: Rhetorical Questions and Sarcasm in Social Media Dialog</title>
    <author><first>Shereen</first><last>Oraby</last></author>
    <author><first>Vrindavan</first><last>Harrison</last></author>
    <author><first>Amita</first><last>Misra</last></author>
    <author><first>Ellen</first><last>Riloff</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>310&#8211;319</pages>
    <url>http://aclweb.org/anthology/W17-5537</url>
    <attachment type="poster">W17-5537.Poster.pdf</attachment>
    <abstract>Effective models of social dialog must understand a broad range of
	rhetorical and figurative devices. Rhetorical questions (RQs)
	are a type of figurative language whose aim is to achieve a pragmatic
	goal, such as structuring an argument, being persuasive, emphasizing a point, 
	or being ironic. While there are computational models for other forms of
	figurative language, rhetorical questions have received little
	attention to date.  We expand a small dataset from previous work, presenting a
	corpus of 10,270 RQs from debate forums and Twitter that represent different
	discourse functions. We show that we can clearly distinguish between RQs and
	sincere questions (0.76 F1). We then show that RQs can be used both
	sarcastically and non-sarcastically, observing that non-sarcastic (other) uses
	of RQs are frequently argumentative in forums, and persuasive in tweets. We
	present experiments to distinguish between these uses of RQs using SVM and LSTM
	models that represent linguistic features and post-level context, achieving
	results as high as 0.76 F1 for "sarcastic" and 0.77 F1 for "other" in forums,
	and 0.83 F1 for both "sarcastic" and "other" in tweets. We supplement our
	quantitative experiments with an in-depth characterization of the linguistic
	variation in RQs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>oraby-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5538">
    <title>Finding Structure in Figurative Language: Metaphor Detection with Topic-based Frames</title>
    <author><first>Hyeju</first><last>Jang</last></author>
    <author><first>Keith</first><last>Maki</last></author>
    <author><first>Eduard</first><last>Hovy</last></author>
    <author><first>Carolyn</first><last>Rose</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>320&#8211;330</pages>
    <url>http://aclweb.org/anthology/W17-5538</url>
    <abstract>In this paper, we present a novel and highly effective method for induction and
	application of metaphor frame templates as a step toward detecting metaphor in
	extended discourse. We infer implicit facets of a given metaphor frame using a
	semi-supervised bootstrapping approach on an unlabeled corpus. Our model
	applies this frame facet information to metaphor detection, and achieves the
	state-of-the-art performance on a social media dataset when building upon other
	proven features in a nonlinear machine learning model. In addition, we
	illustrate the mechanism through which the frame and topic information enable
	the more accurate metaphor detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jang-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5539">
    <title>Using Reinforcement Learning to Model Incrementality in a Fast-Paced Dialogue Game</title>
    <author><first>Ramesh</first><last>Manuvinakurike</last></author>
    <author><first>David</first><last>DeVault</last></author>
    <author><first>Kallirroi</first><last>Georgila</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>331&#8211;341</pages>
    <url>http://aclweb.org/anthology/W17-5539</url>
    <abstract>We apply Reinforcement Learning (RL) to the problem of incremental dialogue
	policy learning in the context of a fast-paced dialogue game. We compare the
	policy learned by RL with a high-performance baseline policy which has been
	shown to perform very efficiently (nearly as well as humans) in this dialogue
	game. The RL policy outperforms the baseline policy in offline simulations
	(based on real user data). We provide a detailed comparison of the RL policy
	and the baseline policy, including information about how much effort and time
	it took to develop each one of them. We also highlight the cases where the RL
	policy performs better, and show that understanding the RL policy can provide
	valuable insights which can inform the creation of an even better rule-based
	policy.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>manuvinakurike-devault-georgila:2017:W17-55</bibkey>
  </paper>

  <paper id="5540">
    <title>Inferring Narrative Causality between Event Pairs in Films</title>
    <author><first>Zhichao</first><last>Hu</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>342&#8211;351</pages>
    <url>http://aclweb.org/anthology/W17-5540</url>
    <attachment type="poster">W17-5540.Poster.pdf</attachment>
    <abstract>To understand narrative, humans draw inferences about the underlying relations
	between narrative events. Cognitive theories of narrative understanding define
	these inferences as four different types of causality, ranging from pairs of
	events A, B where A physically causes B (X drop, X break) to pairs of events
	where A causes emotional state B (Y saw X, Y felt fear). Previous work on
	learning narrative relations from text has either focused on "strict"
	physical causality, or has been vague about what relation is being learned.
	This paper learns pairs of causal events from a corpus of film scene
	descriptions, which are action-rich and tend to be told in chronological order.
	We show that event pairs induced using our methods are of high quality and are
	judged to have a stronger causal relation than event pairs from Rel-Grams.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hu-walker:2017:W17-55</bibkey>
  </paper>

  <paper id="5541">
    <title>Lessons in Dialogue System Deployment</title>
    <author><first>Anton</first><last>Leuski</last></author>
    <author><first>Ron</first><last>Artstein</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>352&#8211;355</pages>
    <url>http://aclweb.org/anthology/W17-5541</url>
    <abstract>We analyze deployment of an interactive dialogue system in an environment where
	deep technical expertise might not be readily available. The initial version
	was created using a collection of research tools.
	We summarize a number of challenges with its deployment at two museums and
	describe a new system that simplifies the installation and user interface;
	reduces reliance on 3rd-party software; and provides a robust data collection
	mechanism.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>leuski-artstein:2017:W17-55</bibkey>
  </paper>

  <paper id="5542">
    <title>Information Navigation System with Discovering User Interests</title>
    <author><first>Koichiro</first><last>Yoshino</last></author>
    <author><first>Yu</first><last>Suzuki</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>356&#8211;359</pages>
    <url>http://aclweb.org/anthology/W17-5542</url>
    <abstract>We demonstrate an information navigation system for sightseeing domains that
	has a dialogue interface for discovering user interests in tourist activities.
	The system discovers the interests of a user through focus detection on user
	utterances, and proactively presents information related to the discovered
	user interest. A partially observable Markov decision process (POMDP)-based
	dialogue manager, which is extended with user focus states, controls the
	behavior of the system to provide information through several dialogue acts.
	We transferred the belief-update function and the policy of the manager from
	another system trained on a different domain to show the generality of the
	defined dialogue acts for our information navigation system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yoshino-suzuki-nakamura:2017:W17-55</bibkey>
  </paper>

  <paper id="5543">
    <title>Modelling Protagonist Goals and Desires in First-Person Narrative</title>
    <author><first>Elahe</first><last>Rahimtoroghi</last></author>
    <author><first>Jiaqi</first><last>Wu</last></author>
    <author><first>Ruimin</first><last>Wang</last></author>
    <author><first>Pranav</first><last>Anand</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>360&#8211;369</pages>
    <url>http://aclweb.org/anthology/W17-5543</url>
    <attachment type="presentation">W17-5543.Presentation.pdf</attachment>
    <abstract>Many genres of natural language text are narratively structured, a testament to
	our predilection for organizing our experiences as narratives. There is broad
	consensus that understanding a narrative requires identifying and tracking the
	goals and desires of the characters and their narrative outcomes. However, to
	date, there has been limited work on computational models for this problem. We
	introduce a new dataset, DesireDB, which includes gold-standard labels for
	identifying statements of desire, textual evidence for desire fulfillment, and
	annotations for whether the stated desire is fulfilled given the evidence in
	the narrative context. We report experiments on tracking desire fulfillment
	using different methods, and show that an LSTM Skip-Thought model achieves an
	F-measure of 0.7 on our corpus.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rahimtoroghi-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5544">
    <title>SHIHbot: A Facebook chatbot for Sexual Health Information on HIV/AIDS</title>
    <author><first>Jacqueline</first><last>Brixey</last></author>
    <author><first>Rens</first><last>Hoegen</last></author>
    <author><first>Wei</first><last>Lan</last></author>
    <author><first>Joshua</first><last>Rusow</last></author>
    <author><first>Karan</first><last>Singla</last></author>
    <author><first>Xusen</first><last>Yin</last></author>
    <author><first>Ron</first><last>Artstein</last></author>
    <author><first>Anton</first><last>Leuski</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>370&#8211;373</pages>
    <url>http://aclweb.org/anthology/W17-5544</url>
    <abstract>We present the implementation of an autonomous chatbot, SHIHbot, deployed on
	Facebook, which answers a wide variety of sexual health questions on HIV/AIDS.
	The chatbot's response database is compiled from professional medical and
	public health resources in order to provide reliable information to users. The
	system's backend is NPCEditor, a response selection platform trained on linked
	questions and answers; to our knowledge this is the first retrieval-based
	chatbot deployed on a large public social network.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>brixey-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5545">
    <title>How Would You Say It? Eliciting Lexically Diverse Dialogue for Supervised Semantic Parsing</title>
    <author><first>Abhilasha</first><last>Ravichander</last></author>
    <author><first>Thomas</first><last>Manzini</last></author>
    <author><first>Matthias</first><last>Grabmair</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Jonathan</first><last>Francis</last></author>
    <author><first>Eric</first><last>Nyberg</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>374&#8211;383</pages>
    <url>http://aclweb.org/anthology/W17-5545</url>
    <abstract>Building dialogue interfaces for real-world scenarios often entails training
	semantic parsers starting from zero examples. How can we build datasets that
	better capture the variety of ways users might phrase their queries, and what
	queries are actually realistic?  Wang et al. (2015) proposed a method
	to build semantic parsing datasets by generating canonical utterances using a
	grammar and having crowdworkers paraphrase them into natural wording. A
	limitation of this approach is that it induces bias towards using similar
	language as the canonical utterances.  In this work, we present a methodology
	that elicits meaningful and lexically diverse queries from users for semantic
	parsing tasks. Starting from a seed lexicon and a generative grammar, we pair
	logical forms with mixed text-image representations and ask crowdworkers to
	paraphrase and confirm the plausibility of the queries that they generated. We
	use this method to build a semantic parsing dataset from scratch for a dialog
	agent in a smart-home simulation. We find evidence that this dataset, which we
	have named SmartHome, is demonstrably more lexically diverse and difficult to
	parse than existing domain-specific semantic parsing datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ravichander-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5546">
    <title>Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models</title>
    <author><first>Pierre</first><last>Lison</last></author>
    <author><first>Serge</first><last>Bibauw</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>384&#8211;394</pages>
    <url>http://aclweb.org/anthology/W17-5546</url>
    <abstract>Neural conversational models require substantial amounts of dialogue data to
	estimate their parameters and are therefore usually learned on large corpora
	such as chat forums or movie subtitles. These corpora are, however, often
	challenging to work with, notably due to their frequent lack of turn
	segmentation and the presence of multiple references external to the dialogue
	itself. This paper shows that these challenges can be mitigated by adding a
	weighting model into the architecture. The weighting model, which is itself
	estimated from dialogue data, associates each training example to a numerical
	weight that reflects its intrinsic quality for dialogue modelling. At training
	time, these sample weights are included into the empirical loss to be
	minimised. Evaluation results on retrieval-based models trained on movie and TV
	subtitles demonstrate that the inclusion of such a weighting model improves the
	model performance on unsupervised metrics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lison-bibauw:2017:W17-55</bibkey>
  </paper>

  <paper id="5547">
    <title>A data-driven model of explanations for a chatbot that helps to practice conversation in a foreign language</title>
    <author><first>Sviatlana</first><last>H&#246;hn</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>395&#8211;405</pages>
    <url>http://aclweb.org/anthology/W17-5547</url>
    <abstract>This article describes a model of other-initiated self-repair for a chatbot
	that helps to practice conversation in a foreign language. The model was
	developed using a corpus of instant messaging conversations between German
	native and non-native speakers. Conversation Analysis helped to create
	computational models from a small number of examples. The model has been
	validated in an AIML-based chatbot. Unlike typical retrieval-based dialogue
	systems, the explanations are generated at run-time from a linguistic database.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hohn:2017:W17-55</bibkey>
  </paper>

</volume>