<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="5500">
    <title>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</title>
    <editor>Kristiina Jokinen</editor>
    <editor>Manfred Stede</editor>
    <editor>David DeVault</editor>
    <editor>Annie Louis</editor>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://aclweb.org/anthology/W17-55</url>
    <bibtype>book</bibtype>
    <bibkey>W17-55:2017</bibkey>
  </paper>

  <paper id="5501">
    <title>Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations</title>
    <author><first>Majid</first><last>Laali</last></author>
    <author><first>Leila</first><last>Kosseim</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;6</pages>
    <url>http://aclweb.org/anthology/W17-5501</url>
    <abstract>In this paper, we present an approach to exploit phrase tables generated by
	statistical machine translation in order to map French discourse connectives to
	discourse relations. Using this approach, we created DisCoRel, a lexicon of
	French discourse connectives and their PDTB relations. When evaluated against 
	LEXCONN, DisCoRel achieves a recall of 0.81 and an Average Precision of 0.68
	for the Concession and Condition relations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>laali-kosseim:2017:W17-55</bibkey>
  </paper>

  <paper id="5502">
    <title>Towards Full Text Shallow Discourse Relation Annotation: Experiments with Cross-Paragraph Implicit Relations in the PDTB</title>
    <author><first>Rashmi</first><last>Prasad</last></author>
    <author><first>Katherine</first><last>Forbes-Riley</last></author>
    <author><first>Alan</first><last>Lee</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>7&#8211;16</pages>
    <url>http://aclweb.org/anthology/W17-5502</url>
    <abstract>Full text discourse parsing relies on texts comprehensively annotated with
	discourse relations. 
	To this end, we address a significant gap in the inter-sentential discourse
	relations annotated in the Penn Discourse Treebank (PDTB), namely the class of
	cross-paragraph implicit relations, which account for 30% of inter-sentential
	relations in the corpus. We present our annotation study to explore the
	incidence rate of adjacent vs. non-adjacent implicit relations in
	cross-paragraph contexts, and the relative degree of difficulty in annotating
	them. Our experiments show a high incidence of non-adjacent relations that are
	difficult to annotate reliably, suggesting the practicality of backing off from
	their annotation to reduce noise for corpus-based studies. Our resulting
	guidelines follow the PDTB adjacency constraint for implicits while employing
	an underspecified representation of non-adjacent implicits, and yield 62%
	inter-annotator agreement on this task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>prasad-forbesriley-lee:2017:W17-55</bibkey>
  </paper>

  <paper id="5503">
    <title>User-initiated Sub-dialogues in State-of-the-art Dialogue Systems</title>
    <author><first>Staffan</first><last>Larsson</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>17&#8211;22</pages>
    <url>http://aclweb.org/anthology/W17-5503</url>
    <abstract>We test state-of-the-art dialogue systems for their behaviour in response to
	user-initiated sub-dialogues, i.e. interactions where a system question is
	responded to with a question or request from the user, who thus initiates a
	sub-dialogue. We look at sub-dialogues both within a single app (where the
	sub-dialogue concerns another topic in the original domain) and across apps
	(where the sub-dialogue concerns a different domain). The overall conclusion of
	the tests is that none of the systems can be said to deal appropriately with
	user-initiated sub-dialogues.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>larsson:2017:W17-55</bibkey>
  </paper>

  <paper id="5504">
    <title>A Multimodal Dialogue System for Medical Decision Support inside Virtual Reality</title>
    <author><first>Alexander</first><last>Prange</last></author>
    <author><first>Margarita</first><last>Chikobava</last></author>
    <author><first>Peter</first><last>Poller</last></author>
    <author><first>Michael</first><last>Barz</last></author>
    <author><first>Daniel</first><last>Sonntag</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>23&#8211;26</pages>
    <url>http://aclweb.org/anthology/W17-5504</url>
    <abstract>We present a multimodal dialogue system that allows doctors to interact with a
	medical decision support system in virtual reality (VR). We integrate an
	interactive visualization of patient records and radiology image data, as well
	as therapy predictions. Therapy predictions are computed in real-time using a
	deep learning model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>prange-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5505">
    <title>Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability</title>
    <author><first>Tiancheng</first><last>Zhao</last></author>
    <author><first>Allen</first><last>Lu</last></author>
    <author><first>Kyusong</first><last>Lee</last></author>
    <author><first>Maxine</first><last>Eskenazi</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;36</pages>
    <url>http://aclweb.org/anthology/W17-5505</url>
    <abstract>Generative encoder-decoder models offer great promise in developing
	domain-general dialog systems. However, they have mainly been applied to
	open-domain conversations. This paper presents a practical and novel framework
	for building task-oriented dialog systems based on encoder-decoder models. This
	framework enables encoder-decoder models to accomplish slot-value independent
	decision-making and interact with external databases. Moreover, this paper
	shows the flexibility of the proposed method by interleaving chatting
	capability with a slot-filling system for better out-of-domain recovery. The
	models were trained on both real-user data from a bus information system and
	human-human chat data. Results show that the proposed framework achieves good
	performance in both offline evaluation metrics and in task success rate with
	human users.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhao-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5506">
    <title>Key-Value Retrieval Networks for Task-Oriented Dialogue</title>
    <author><first>Mihail</first><last>Eric</last></author>
    <author><first>Lakshmi</first><last>Krishnan</last></author>
    <author><first>Francois</first><last>Charette</last></author>
    <author><first>Christopher D.</first><last>Manning</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>37&#8211;49</pages>
    <url>http://aclweb.org/anthology/W17-5506</url>
    <abstract>Neural task-oriented dialogue systems often struggle to smoothly interface with
	a knowledge base. In this work, we seek to address this problem by proposing a
	new neural dialogue agent that is able to effectively sustain grounded,
	multi-domain discourse through a novel key-value retrieval mechanism. The model
	is end-to-end differentiable and does not need to explicitly model dialogue
	state or belief trackers. We also release a new dataset of 3,031 dialogues that
	are grounded through underlying knowledge bases and span three distinct tasks
	in the in-car personal assistant space: calendar scheduling, weather
	information retrieval, and point-of-interest navigation. Our architecture is
	simultaneously trained on data from all domains and significantly outperforms a
	competitive rule-based system and other existing neural dialogue architectures
	on the provided domains according to both automatic and human evaluation
	metrics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eric-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5507">
    <title>Lexical Acquisition through Implicit Confirmations over Multiple Dialogues</title>
    <author><first>Kohei</first><last>Ono</last></author>
    <author><first>Ryu</first><last>Takeda</last></author>
    <author><first>Eric</first><last>Nichols</last></author>
    <author><first>Mikio</first><last>Nakano</last></author>
    <author><first>Kazunori</first><last>Komatani</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>50&#8211;59</pages>
    <url>http://aclweb.org/anthology/W17-5507</url>
    <abstract>We address the problem of acquiring the ontological categories of unknown terms
	through implicit confirmation in dialogues. We develop an approach that makes
	implicit confirmation requests with an unknown term's predicted category. Our
	approach does not degrade user experience with repetitive explicit
	confirmations, but the system has difficulty determining if information in the
	confirmation request can be correctly acquired. To overcome this challenge, we
	propose a method for determining whether or not the predicted category is
	correct, which is included in an implicit confirmation request. Our method
	exploits multiple user responses to implicit confirmation requests containing
	the same ontological category. Experimental results revealed that the proposed
	method exhibited a higher precision rate for determining the correctly
	predicted categories than when only single user responses were considered.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ono-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5508">
    <title>Utterance Intent Classification of a Spoken Dialogue System with Efficiently Untied Recursive Autoencoders</title>
    <author><first>Tsuneo</first><last>Kato</last></author>
    <author><first>Atsushi</first><last>Nagai</last></author>
    <author><first>Naoki</first><last>Noda</last></author>
    <author><first>Ryosuke</first><last>Sumitomo</last></author>
    <author><first>Jianming</first><last>Wu</last></author>
    <author><first>Seiichi</first><last>Yamamoto</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>60&#8211;64</pages>
    <url>http://aclweb.org/anthology/W17-5508</url>
    <abstract>Recursive autoencoders (RAEs) for compositionality of a vector space model were
	applied to utterance intent classification of a smartphone-based
	Japanese-language spoken dialogue system. Though the RAEs express a nonlinear
	operation on the vectors of child nodes, the operation is considered to be
	intrinsically different depending on the types of child nodes. To relax this
	difference, a data-driven untying of autoencoders (AEs) is proposed.
	Experimental results on utterance intent classification showed improved
	accuracy with the proposed method compared with a basic tied RAE and an untied
	RAE based on a manual rule.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kato-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5509">
    <title>Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning</title>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Pawe&#x142;</first><last>Budzianowski</last></author>
    <author><first>I&#241;igo</first><last>Casanueva</last></author>
    <author><first>Nikola</first><last>Mrk&#x161;i&#x107;</last></author>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Pei-Hao</first><last>Su</last></author>
    <author><first>Tsung-Hsien</first><last>Wen</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <author><first>Steve</first><last>Young</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>65&#8211;70</pages>
    <url>http://aclweb.org/anthology/W17-5509</url>
    <abstract>Reinforcement learning is widely used for dialogue policy optimization where
	the reward function often consists of more than one component, e.g., the
	dialogue success and the dialogue length. In this work, we propose a structured
	method for finding a good balance between these components by searching for the
	optimal reward component weighting. To render this search feasible, we use
	multi-objective reinforcement learning to significantly reduce the number of
	training dialogues required. We apply our proposed method to find optimized
	component weights for six domains and compare them to a default baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ultes-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5510">
    <title>Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction</title>
    <author><first>Guillaume</first><last>Dubuisson Duplessis</last></author>
    <author><first>Chlo&#233;</first><last>Clavel</last></author>
    <author><first>Fr&#233;d&#233;ric</first><last>Landragin</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>71&#8211;81</pages>
    <url>http://aclweb.org/anthology/W17-5510</url>
    <abstract>This work aims at characterising verbal alignment processes for improving
	virtual agent communicative capabilities. We propose computationally
	inexpensive measures of verbal alignment based on expression repetition in
	dyadic textual dialogues. Using these measures, we present a contrastive study
	between Human-Human and Human-Agent dialogues on a negotiation task. We exhibit
	quantitative differences in the strength and orientation of verbal alignment
	showing the ability of our approach to characterise important aspects of verbal
	alignment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dubuissonduplessis-clavel-landragin:2017:W17-55</bibkey>
  </paper>

  <paper id="5511">
    <title>Demonstration of interactive teaching for end-to-end dialog control with hybrid code networks</title>
    <author><first>Jason D</first><last>Williams</last></author>
    <author><first>Lars</first><last>Liden</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>82&#8211;85</pages>
    <url>http://aclweb.org/anthology/W17-5511</url>
    <abstract>This is a demonstration of interactive teaching for practical end-to-end dialog
	systems driven by a recurrent neural network.  In this approach, a developer
	teaches the network by interacting with the system and providing on-the-spot
	corrections.  Once a system is deployed, a developer can also correct mistakes
	in logged dialogs.  This demonstration shows both of these teaching methods
	applied to dialog systems in three domains: pizza ordering, restaurant
	information, and weather forecasts.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>williams-liden:2017:W17-55</bibkey>
  </paper>

  <paper id="5512">
    <title>Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning</title>
    <author><first>Pawe&#x142;</first><last>Budzianowski</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Pei-Hao</first><last>Su</last></author>
    <author><first>Nikola</first><last>Mrk&#x161;i&#x107;</last></author>
    <author><first>Tsung-Hsien</first><last>Wen</last></author>
    <author><first>I&#241;igo</first><last>Casanueva</last></author>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>86&#8211;92</pages>
    <url>http://aclweb.org/anthology/W17-5512</url>
    <abstract>Human conversation is inherently complex, often spanning many different
	topics/domains. This makes policy learning for dialogue systems very
	challenging. Standard flat reinforcement learning methods do not provide an
	efficient framework for modelling such dialogues. In this paper, we focus on
	the under-explored problem of multi-domain dialogue management. First, we
	propose a new method for hierarchical reinforcement learning using the
	option framework. Next, we show that the proposed architecture learns
	faster and arrives at a better policy than the existing flat ones do. Moreover,
	we show how pretrained policies can be adapted to more complex systems with an
	additional set of new actions. In doing that, we show that our approach has the
	potential to facilitate policy optimisation for more sophisticated multi-domain
	dialogue systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>budzianowski-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5513">
    <title>MACA: A Modular Architecture for Conversational Agents</title>
    <author><first>Hoai Phuoc</first><last>Truong</last></author>
    <author><first>Prasanna</first><last>Parthasarathi</last></author>
    <author><first>Joelle</first><last>Pineau</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>93&#8211;102</pages>
    <url>http://aclweb.org/anthology/W17-5513</url>
    <abstract>We propose a software architecture designed to ease the implementation of
	dialogue systems. The Modular Architecture for Conversational Agents (MACA)
	uses a plug-n-play style that allows quick prototyping, thereby facilitating
	the development of new techniques and the reproduction of previous work. The
	architecture separates the domain of the conversation from the agent's dialogue
	strategy, and as such can be easily extended to multiple domains. MACA provides
	tools to host dialogue agents on Amazon Mechanical Turk (mTurk) for data
	collection and allows processing of other sources of training data. The current
	version of the framework already incorporates several domains and existing
	dialogue strategies from the recent literature.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>truong-parthasarathi-pineau:2017:W17-55</bibkey>
  </paper>

  <paper id="5514">
    <title>Sequential Dialogue Context Modeling for Spoken Language Understanding</title>
    <author><first>Ankur</first><last>Bapna</last></author>
    <author><first>Gokhan</first><last>Tur</last></author>
    <author><first>Dilek</first><last>Hakkani-Tur</last></author>
    <author><first>Larry</first><last>Heck</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>103&#8211;114</pages>
    <url>http://aclweb.org/anthology/W17-5514</url>
    <abstract>Spoken Language Understanding (SLU) is a key component of goal-oriented
	dialogue systems that parses user utterances into semantic frame
	representations. Traditionally, SLU does not utilize the dialogue history beyond
	the previous system turn, and contextual ambiguities are resolved by the
	downstream components. In this paper, we explore novel approaches for modeling
	dialogue context in a recurrent neural network (RNN) based language
	understanding system. We propose the Sequential Dialogue Encoder Network, which
	allows encoding context from the dialogue history in chronological order. We
	compare the performance of our proposed architecture with two context models,
	one that uses just the previous turn context and another that encodes dialogue
	context in a memory network, but loses the order of utterances in the dialogue
	history. Experiments with a multi-domain dialogue dataset demonstrate that the
	proposed architecture results in reduced semantic frame error rates.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bapna-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5515">
    <title>Redundancy Localization for the Conversationalization of Unstructured Responses</title>
    <author><first>Sebastian</first><last>Krause</last></author>
    <author><first>Mikhail</first><last>Kozhevnikov</last></author>
    <author><first>Eric</first><last>Malmi</last></author>
    <author><first>Daniele</first><last>Pighin</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>115&#8211;126</pages>
    <url>http://aclweb.org/anthology/W17-5515</url>
    <abstract>Conversational agents offer users a natural-language interface to accomplish
	tasks, entertain themselves, or access information. Informational dialogue is
	particularly challenging in that the agent has to hold a conversation on an
	open topic, and to achieve a reasonable coverage it generally needs to digest
	and present unstructured information from textual sources. Making responses
	based on such sources sound natural and fit appropriately into the conversation
	context is a topic of ongoing research, one of the key issues of which is
	preventing the agent's responses from sounding repetitive. Targeting this
	issue, we propose a new task, redundancy localization, which aims to
	pinpoint semantic overlap between text passages. To help address it
	systematically, we formalize the task, prepare a public dataset with
	fine-grained redundancy labels, and propose a model utilizing a weak training
	signal defined over the results of a passage-retrieval system on web texts. The
	proposed model demonstrates superior performance compared to a state-of-the-art
	entailment model and yields encouraging results when applied to a real-world
	dialogue.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>krause-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5516">
    <title>Attentive listening system with backchanneling, response generation and flexible turn-taking</title>
    <author><first>Divesh</first><last>Lala</last></author>
    <author><first>Pierrick</first><last>Milhorat</last></author>
    <author><first>Koji</first><last>Inoue</last></author>
    <author><first>Masanari</first><last>Ishida</last></author>
    <author><first>Katsuya</first><last>Takanashi</last></author>
    <author><first>Tatsuya</first><last>Kawahara</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>127&#8211;136</pages>
    <url>http://aclweb.org/anthology/W17-5516</url>
    <abstract>Attentive listening systems are designed to let people, especially senior
	people, keep talking to maintain communication ability and mental health.  This
	paper addresses key components of an attentive listening system which
	encourages users to talk smoothly. First, we introduce continuous prediction of
	end-of-utterances and generation of backchannels, rather than generating
	backchannels after end-point detection of utterances. This improves subjective
	evaluations of backchannels.  Second, we propose an effective statement
	response mechanism which detects focus words and responds in the form of a
	question or partial repeat.  This can be applied to any statement.  Moreover, a
	flexible turn-taking mechanism is designed which uses backchannels or fillers
	when the turn-switch is ambiguous.  These techniques are integrated into a
	humanoid robot to conduct attentive listening. We test the feasibility of the
	system in a pilot experiment and show that it can produce coherent dialogues
	during conversation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lala-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5517">
    <title>Natural Language Input for In-Car Spoken Dialog Systems: How Natural is Natural?</title>
    <author><first>Patricia</first><last>Braunger</last></author>
    <author><first>Wolfgang</first><last>Maier</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>137&#8211;146</pages>
    <url>http://aclweb.org/anthology/W17-5517</url>
    <abstract>Recent spoken dialog systems are moving away from command and control towards a
	more intuitive and natural style of interaction. In order to choose an
	appropriate system design which allows the system to deal with naturally spoken
	user input, a definition of what exactly constitutes naturalness in user input
	is important. In this paper, we examine how different user groups naturally
	speak to an automotive spoken dialog system (SDS). We conduct a user study in
	which we collect freely spoken user utterances for a wide range of use cases in
	German. By comparing the collected utterances with interpersonal utterances,
	we provide criteria for what constitutes naturalness in the user input of a
	state-of-the-art automotive SDS.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>braunger-maier:2017:W17-55</bibkey>
  </paper>

  <paper id="5518">
    <title>Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management</title>
    <author><first>Pei-Hao</first><last>Su</last></author>
    <author><first>Pawe&#x142;</first><last>Budzianowski</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <author><first>Steve</first><last>Young</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>147&#8211;157</pages>
    <url>http://aclweb.org/anthology/W17-5518</url>
    <abstract>Deep reinforcement learning (RL) methods have significant potential for
	dialogue policy optimisation. However, they suffer from a poor performance in
	the early stages of learning. This is especially problematic for on-line
	learning with real users. Two approaches are introduced to tackle this problem.
	Firstly, to speed up the learning process, two sample-efficient neural network
	algorithms are presented: trust region actor-critic with experience replay (TRACER)
	and episodic natural actor-critic with experience replay (eNACER).
	For TRACER, the trust region helps to control the learning step size and avoid
	catastrophic model changes. For eNACER, the natural gradient identifies the
	steepest ascent direction in policy space to speed up the convergence. Both
	models employ off-policy learning with experience replay to improve
	sample-efficiency. Secondly, to mitigate the cold start issue, a corpus of
	demonstration data is utilised to pre-train the models prior to on-line
	reinforcement learning. Combining these two approaches, we present a practical
	approach to learning deep RL-based dialogue policies and demonstrate their
	effectiveness in a task-oriented information-seeking domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>su-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5519">
    <title>A surprisingly effective out-of-the-box char2char model on the E2E NLG Challenge dataset</title>
    <author><first>Shubham</first><last>Agarwal</last></author>
    <author><first>Marc</first><last>Dymetman</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>158&#8211;163</pages>
    <url>http://aclweb.org/anthology/W17-5519</url>
    <abstract>We train a char2char model on the E2E NLG Challenge data, by exploiting
	"out-of-the-box" the recently released tfseq2seq framework, using some of
	the standard options offered by this tool. With minimal effort, and in
	particular without delexicalization, tokenization or lowercasing, the obtained
	raw predictions, according to a small scale human evaluation, are excellent on
	the linguistic side and quite reasonable on the adequacy side, the primary
	downside being the possible omissions of semantic material. However, in a
	significant number of cases (more than 70%), a perfect solution can be found in
	the top-20 predictions, indicating promising directions for solving the
	remaining issues.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>agarwal-dymetman:2017:W17-55</bibkey>
  </paper>

  <paper id="5520">
    <title>Interaction Quality Estimation Using Long Short-Term Memories</title>
    <author><first>Niklas</first><last>Rach</last></author>
    <author><first>Wolfgang</first><last>Minker</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>164&#8211;169</pages>
    <url>http://aclweb.org/anthology/W17-5520</url>
    <abstract>For estimating the Interaction Quality (IQ) in Spoken Dialogue Systems (SDS),
	the dialogue history is of significant importance. Previous works included this
	information manually in the form of precomputed temporal features into the
	classification process. Here, we employ a deep learning architecture based on
	Long Short-Term Memories (LSTM) to extract this information automatically from
	the data, thus estimating IQ solely from current exchange features. We show
	that this approach achieves results competitive with those of a scenario in
	which manually optimized temporal features are included.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rach-minker-ultes:2017:W17-55</bibkey>
  </paper>

  <paper id="5521">
    <title>DialPort, Gone Live: An Update After A Year of Development</title>
    <author><first>Kyusong</first><last>Lee</last></author>
    <author><first>Tiancheng</first><last>Zhao</last></author>
    <author><first>Yulun</first><last>Du</last></author>
    <author><first>Edward</first><last>Cai</last></author>
    <author><first>Allen</first><last>Lu</last></author>
    <author><first>Eli</first><last>Pincus</last></author>
    <author><first>David</first><last>Traum</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <author><first>Steve</first><last>Young</last></author>
    <author><first>Maxine</first><last>Eskenazi</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>170&#8211;173</pages>
    <url>http://aclweb.org/anthology/W17-5521</url>
    <abstract>DialPort collects user data for connected spoken dialog systems. At present six
	systems are linked to a central portal that directs the user to the applicable
	system and suggests systems that the user may be interested in. User data has
	started to flow into the system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lee-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5522">
    <title>Evaluating Natural Language Understanding Services for Conversational Question Answering Systems</title>
    <author><first>Daniel</first><last>Braun</last></author>
    <author><first>Adrian</first><last>Hernandez-Mendez</last></author>
    <author><first>Florian</first><last>Matthes</last></author>
    <author><first>Manfred</first><last>Langen</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>174&#8211;185</pages>
    <url>http://aclweb.org/anthology/W17-5522</url>
    <abstract>Conversational interfaces recently gained a lot of attention. One of the
	reasons for the current hype is the fact that chatbots (one particularly
	popular form of conversational interfaces)
	nowadays can be created without any programming knowledge, thanks to different
	toolkits and so-called Natural Language Understanding (NLU) services. While
	these NLU services are already widely used in both industry and science, they
	have not yet been analysed systematically. In this paper, we present a
	method to evaluate the classification performance of NLU services. Moreover, we
	present two new corpora, one consisting of annotated questions and one
	consisting of annotated questions with the corresponding answers. Based on
	these corpora, we conduct an evaluation of some of the most popular NLU
	services. Thereby we want to enable both researchers and companies to make
	more educated decisions about which service they should use.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>braun-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5523">
    <title>The Role of Conversation Context for Sarcasm Detection in Online Interactions</title>
    <author><first>Debanjan</first><last>Ghosh</last></author>
    <author><first>Alexander</first><last>Richard Fabbri</last></author>
    <author><first>Smaranda</first><last>Muresan</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>186&#8211;196</pages>
    <url>http://aclweb.org/anthology/W17-5523</url>
    <abstract>Computational models for sarcasm detection have often relied on the content of
	utterances in isolation. However, a speaker's sarcastic intent is not always
	obvious without additional context. Focusing on social media discussions, we
	investigate two issues: (1) does modeling of conversation context help in
	sarcasm detection and (2) can we understand what part of conversation context
	triggered the sarcastic reply. To address the first issue, we investigate
	several types of Long Short-Term Memory (LSTM) networks that can model both the
	conversation context and the sarcastic response. We show that the conditional
	LSTM network (Rockt&#228;schel et al. 2015) and LSTM networks with sentence level
	attention on context and response outperform the LSTM model that reads only the
	response. To address the second issue, we present a qualitative analysis of
	attention weights produced by the LSTM models with attention and discuss the
	results compared with human performance on the task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ghosh-richardfabbri-muresan:2017:W17-55</bibkey>
  </paper>

  <paper id="5524">
    <title>VOILA: An Optimised Dialogue System for Interactively Learning Visually-Grounded Word Meanings (Demonstration System)</title>
    <author><first>Yanchao</first><last>Yu</last></author>
    <author><first>Arash</first><last>Eshghi</last></author>
    <author><first>Oliver</first><last>Lemon</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>197&#8211;200</pages>
    <url>http://aclweb.org/anthology/W17-5524</url>
    <abstract>We present VOILA: an optimised, multi-modal dialogue agent for interactive
	learning of visually grounded word meanings from a human user. VOILA is: (1)
	able to learn new visual categories interactively from users from scratch; (2)
	trained on real human-human dialogues in the same domain, and so is able to
	conduct natural spontaneous dialogue; (3) optimised to find the most effective
	trade-off between the accuracy of the visual categories it learns and the cost
	it incurs to users. VOILA is deployed on Furhat, a human-like, multi-modal
	robot head with back-projection of the face, and a graphical virtual character.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yu-eshghi-lemon:2017:W17-55</bibkey>
  </paper>

  <paper id="5525">
    <title>The E2E Dataset: New Challenges For End-to-End Generation</title>
    <author><first>Jekaterina</first><last>Novikova</last></author>
    <author><first>Ond&#x159;ej</first><last>Du&#x161;ek</last></author>
    <author><first>Verena</first><last>Rieser</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>201&#8211;206</pages>
    <url>http://aclweb.org/anthology/W17-5525</url>
    <abstract>This paper describes the E2E data, a new dataset for training end-to-end,
	data-driven natural language generation systems in the restaurant domain, which
	is ten times bigger than existing, frequently used datasets in this area. The
	E2E dataset poses new challenges: (1) its human reference texts show more
	lexical richness and syntactic variation, including discourse phenomena; (2)
	generating from this set requires content selection. As such, learning from
	this dataset promises more natural, varied and less template-like system
	utterances. We also establish a baseline on this dataset, which illustrates
	some of the difficulties associated with this data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>novikova-duvsek-rieser:2017:W17-55</bibkey>
  </paper>

  <paper id="5526">
    <title>Frames: a corpus for adding memory to goal-oriented dialogue systems</title>
    <author><first>Layla</first><last>El Asri</last></author>
    <author><first>Hannes</first><last>Schulz</last></author>
    <author><first>Shikhar</first><last>Sharma</last></author>
    <author><first>Jeremie</first><last>Zumer</last></author>
    <author><first>Justin</first><last>Harris</last></author>
    <author><first>Emery</first><last>Fine</last></author>
    <author><first>Rahul</first><last>Mehrotra</last></author>
    <author><first>Kaheer</first><last>Suleman</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>207&#8211;219</pages>
    <url>http://aclweb.org/anthology/W17-5526</url>
    <abstract>This paper proposes a new dataset, Frames, composed of 1369 human-human
	dialogues with an average of 15 turns per dialogue. This corpus contains
	goal-oriented dialogues between users who are given some constraints to book a
	trip and assistants who search a database to find appropriate trips. The users
	exhibit complex decision-making behaviour which involves comparing trips,
	exploring different options, and selecting among the trips that were discussed
	during the dialogue. To drive research on dialogue systems towards handling
	such behaviour, we have annotated and released the dataset and we propose in
	this paper a task called frame tracking. This task consists of keeping track of
	different semantic frames throughout each dialogue. We propose a rule-based
	baseline and analyse the frame tracking task through this baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>elasri-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5527">
    <title>Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks</title>
    <author><first>Gabriel</first><last>Skantze</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>220&#8211;230</pages>
    <url>http://aclweb.org/anthology/W17-5527</url>
    <abstract>Previous models of turn-taking have mostly been trained for specific
	turn-taking decisions, such as discriminating between turn shifts and turn
	retention in pauses. In this paper, we present a predictive, continuous model
	of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks
	(RNN). The model is trained on human-human dialogue data to predict upcoming
	speech activity in a future time window. We show how this general model can be
	applied to two different tasks that it was not specifically trained for. First,
	to predict whether a turn-shift will occur or not in pauses, where the model
	achieves a better performance than human observers, and better than results
	achieved with more traditional models. Second, to make a prediction at speech
	onset whether the utterance will be a short backchannel or a longer utterance.
	Finally, we show how the hidden layer in the network can be used as a feature
	vector for turn-taking decisions in a human-robot interaction scenario.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>skantze:2017:W17-55</bibkey>
  </paper>

  <paper id="5528">
    <title>Neural-based Natural Language Generation in Dialogue using RNN Encoder-Decoder with Semantic Aggregation</title>
    <author><first>Van-Khanh</first><last>Tran</last></author>
    <author><first>Le-Minh</first><last>Nguyen</last></author>
    <author><first>Satoshi</first><last>Tojo</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>231&#8211;240</pages>
    <url>http://aclweb.org/anthology/W17-5528</url>
    <abstract>Natural language generation (NLG) is an important component in spoken dialogue
	systems. This paper presents a model called Encoder-Aggregator-Decoder which is
	an extension of a Recurrent Neural Network-based Encoder-Decoder architecture.
	The proposed Semantic Aggregator consists of two components: an Aligner and a
	Refiner. The Aligner is a conventional attention calculated over the encoded
	input information, while the Refiner is another attention or gating mechanism
	stacked over the attentive Aligner in order to further select and aggregate the
	semantic elements. The proposed model can be jointly trained on both sentence
	planning and surface realization to produce natural language utterances.
	The model was extensively assessed on four different NLG domains, in which the
	experimental results showed that the proposed generator consistently
	outperforms the previous methods on all the NLG domains.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tran-nguyen-tojo:2017:W17-55</bibkey>
  </paper>

  <paper id="5529">
    <title>Beyond On-hold Messages: Conversational Time-buying in Task-oriented Dialogue</title>
    <author><first>Soledad</first><last>L&#243;pez Gambino</last></author>
    <author><first>Sina</first><last>Zarrie&#223;</last></author>
    <author><first>David</first><last>Schlangen</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>241&#8211;246</pages>
    <url>http://aclweb.org/anthology/W17-5529</url>
    <abstract>A common convention in graphical user interfaces is to indicate a "wait
	state", for example while a program is preparing a response, through a changed
	cursor state or a progress bar. What should the analogue be in a spoken
	conversational system?
	To address this question, we set up an experiment in which a human information
	provider (IP) was given their information only in a delayed and incremental
	manner, which systematically created situations where the IP had the turn but
	could not provide task-related information.
	Our data analysis shows that 1) IPs bridge the gap until they can provide
	information by re-purposing a whole variety of task- and grounding-related
	communicative actions (e.g. echoing the user's request, signaling
	understanding, asserting partially relevant information), rather than being
	silent or explicitly asking for time (e.g. "please wait"), and that 2) IPs
	combined these actions productively to ensure an ongoing conversation. These
	results, we argue, indicate that natural conversational interfaces should also
	be able to manage their time flexibly using a variety of conversational
	resources.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lopezgambino-zarriess-schlangen:2017:W17-55</bibkey>
  </paper>

  <paper id="5530">
    <title>Neural-based Context Representation Learning for Dialog Act Classification</title>
    <author><first>Daniel</first><last>Ortega</last></author>
    <author><first>Ngoc Thang</first><last>Vu</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>247&#8211;252</pages>
    <url>http://aclweb.org/anthology/W17-5530</url>
    <abstract>We explore context representation learning methods in neural-based models for
	dialog act classification. We propose and compare extensively different methods
	which combine recurrent neural network architectures and attention mechanisms
	(AMs) at different context levels. Our experimental results on two benchmark
	datasets show consistent improvements compared to the models without
	contextual information and reveal that the most suitable AM in the architecture
	depends on the nature of the dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ortega-vu:2017:W17-55</bibkey>
  </paper>

  <paper id="5531">
    <title>Predicting Success in Goal-Driven Human-Human Dialogues</title>
    <author><first>Michael</first><last>Noseworthy</last></author>
    <author><first>Jackie Chi Kit</first><last>Cheung</last></author>
    <author><first>Joelle</first><last>Pineau</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>253&#8211;262</pages>
    <url>http://aclweb.org/anthology/W17-5531</url>
    <abstract>In goal-driven dialogue systems, success is often defined based on a structured
	definition of the goal. This requires that the dialogue system be constrained
	to handle a specific class of goals and that there be a mechanism to measure
	success with respect to that goal. However, in many human-human dialogues the
	diversity of goals makes it infeasible to define success in such a way. To
	address this scenario, we consider the task of automatically predicting success
	in goal-driven human-human dialogues using only the information communicated
	between participants in the form of text. We build a dataset from
	stackoverflow.com which consists of exchanges between two users in the
	technical domain where ground-truth success labels are available. We then
	propose a turn-based hierarchical neural network model that can be used to
	predict success without requiring a structured goal definition. We show this
	model outperforms rule-based heuristics and other baselines as it is able to
	detect patterns over the course of a dialogue and capture notions such as
	gratitude.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>noseworthy-cheung-pineau:2017:W17-55</bibkey>
  </paper>

  <paper id="5532">
    <title>Generating and Evaluating Summaries for Partial Email Threads: Conversational Bayesian Surprise and Silver Standards</title>
    <author><first>Jordon</first><last>Johnson</last></author>
    <author><first>Vaden</first><last>Masrani</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <author><first>Raymond</first><last>Ng</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>263&#8211;272</pages>
    <url>http://aclweb.org/anthology/W17-5532</url>
    <abstract>We define and motivate the problem of summarizing partial email threads. This
	problem introduces the challenge of generating reference summaries for partial
	threads when human annotation is only available for the threads as a whole,
	particularly when the human-selected sentences are not uniformly distributed
	within the threads. We propose an oracular algorithm for generating these
	reference summaries with arbitrary length, and we are making the resulting
	dataset publicly available. In addition, we apply a recent unsupervised method
	based on Bayesian Surprise that incorporates background knowledge into partial
	thread summarization, extend it with conversational features, and modify the
	mechanism by which it handles redundancy. Experiments with our method indicate
	improved performance over the baseline for shorter partial threads; and our
	results suggest that the potential benefits of background knowledge to partial
	thread summarization should be further investigated with larger datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>johnson-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5533">
    <title>Enabling robust and fluid spoken dialogue with cognitively impaired users</title>
    <author><first>Ramin</first><last>Yaghoubzadeh</last></author>
    <author><first>Stefan</first><last>Kopp</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>273&#8211;283</pages>
    <url>http://aclweb.org/anthology/W17-5533</url>
    <abstract>We present the flexdiam dialogue management architecture, which was developed
	in a series of projects dedicated to tailoring spoken interaction to the needs
	of users with cognitive impairments in an everyday assistive domain, using a
	multimodal front-end.
	This hybrid DM architecture affords incremental processing of uncertain input,
	a flexible, mixed-initiative information grounding process that can be adapted
	to users' cognitive capacities and interactive idiosyncrasies, and generic
	mechanisms that foster transitions in the joint discourse state that
	are understandable and controllable by those users, in order to effect a robust
	interaction for users with varying capacities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yaghoubzadeh-kopp:2017:W17-55</bibkey>
  </paper>

  <paper id="5534">
    <title>Adversarial evaluation for open-domain dialogue generation</title>
    <author><first>Elia</first><last>Bruni</last></author>
    <author><first>Raquel</first><last>Fernandez</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>284&#8211;288</pages>
    <url>http://aclweb.org/anthology/W17-5534</url>
    <abstract>We investigate the potential of adversarial evaluation methods for
	open-domain dialogue generation systems, comparing the performance of a
	discriminative agent to that of humans on the same task. Our results show
	that the task is hard, both for automated models and humans, but that a
	discriminative agent can learn patterns that lead to above-chance
	performance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bruni-fernandez:2017:W17-55</bibkey>
  </paper>

  <paper id="5535">
    <title>Exploring Joint Neural Model for Sentence Level Discourse Parsing and Sentiment Analysis</title>
    <author><first>Bita</first><last>Nejat</last></author>
    <author><first>Giuseppe</first><last>Carenini</last></author>
    <author><first>Raymond</first><last>Ng</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>289&#8211;298</pages>
    <url>http://aclweb.org/anthology/W17-5535</url>
    <abstract>Discourse Parsing and Sentiment Analysis are two fundamental tasks in Natural
	Language Processing that have been shown to be mutually beneficial. In this
	work, we design and compare two Neural Based models for jointly learning both
	tasks. In the proposed approach, we first create a vector representation for
	all the text segments in the input sentence. Next, we apply three different
	Recursive Neural Net models: one for discourse structure prediction, one for
	discourse relation prediction and one for sentiment analysis. Finally, we
	combine these Neural Nets in two different joint models: Multi-tasking and
	Pre-training. Our results on two standard corpora indicate that both methods
	result in improvements in each task but Multi-tasking has a bigger impact than
	Pre-training. Specifically for Discourse Parsing, we see improvements in the
	prediction of the set of contrastive relations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nejat-carenini-ng:2017:W17-55</bibkey>
  </paper>

  <paper id="5536">
    <title>Predicting Causes of Reformulation in Intelligent Assistants</title>
    <author><first>Shumpei</first><last>Sano</last></author>
    <author><first>Nobuhiro</first><last>Kaji</last></author>
    <author><first>Manabu</first><last>Sassano</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>299&#8211;309</pages>
    <url>http://aclweb.org/anthology/W17-5536</url>
    <abstract>Intelligent assistants (IAs) such as Siri and Cortana conversationally interact
	with users and execute a wide range of actions (e.g., searching the Web,
	setting alarms, and chatting). IAs can support these actions through the
	combination of various components such as automatic speech recognition, natural
	language understanding, and language generation. However, the complexity of
	these components hinders developers from determining which component causes an
	error. To remove this hindrance, we focus on reformulation, which is a useful
	signal of user dissatisfaction, and propose a method to predict the
	reformulation causes. We evaluate the method using the user logs of a
	commercial IA. The experimental results have demonstrated that features
	designed to detect the error of a specific component improve the performance of
	reformulation cause detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sano-kaji-sassano:2017:W17-55</bibkey>
  </paper>

  <paper id="5537">
    <title>Are you serious?: Rhetorical Questions and Sarcasm in Social Media Dialog</title>
    <author><first>Shereen</first><last>Oraby</last></author>
    <author><first>Vrindavan</first><last>Harrison</last></author>
    <author><first>Amita</first><last>Misra</last></author>
    <author><first>Ellen</first><last>Riloff</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>310&#8211;319</pages>
    <url>http://aclweb.org/anthology/W17-5537</url>
    <attachment type="poster">W17-5537.Poster.pdf</attachment>
    <abstract>Effective models of social dialog must understand a broad range of
	rhetorical and figurative devices. Rhetorical questions (RQs)
	are a type of figurative language whose aim is to achieve a pragmatic
	goal, such as structuring an argument, being persuasive, emphasizing a point, 
	or being ironic. While there are computational models for other forms of
	figurative language, rhetorical questions have received little
	attention to date.  We expand a small dataset from previous work, presenting a
	corpus of 10,270 RQs from debate forums and Twitter that represent different
	discourse functions. We show that we can clearly distinguish between RQs and
	sincere questions (0.76 F1). We then show that RQs can be used both
	sarcastically and non-sarcastically, observing that non-sarcastic (other) uses
	of RQs are frequently argumentative in forums, and persuasive in tweets. We
	present experiments to distinguish between these uses of RQs using SVM and LSTM
	models that represent linguistic features and post-level context, achieving
	results as high as 0.76 F1 for "sarcastic" and 0.77 F1 for "other" in forums,
	and 0.83 F1 for both "sarcastic" and "other" in tweets. We supplement our
	quantitative experiments with an in-depth characterization of the linguistic
	variation in RQs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>oraby-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5538">
    <title>Finding Structure in Figurative Language: Metaphor Detection with Topic-based Frames</title>
    <author><first>Hyeju</first><last>Jang</last></author>
    <author><first>Keith</first><last>Maki</last></author>
    <author><first>Eduard</first><last>Hovy</last></author>
    <author><first>Carolyn</first><last>Rose</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>320&#8211;330</pages>
    <url>http://aclweb.org/anthology/W17-5538</url>
    <abstract>In this paper, we present a novel and highly effective method for induction and
	application of metaphor frame templates as a step toward detecting metaphor in
	extended discourse. We infer implicit facets of a given metaphor frame using a
	semi-supervised bootstrapping approach on an unlabeled corpus. Our model
	applies this frame facet information to metaphor detection, and achieves the
	state-of-the-art performance on a social media dataset when building upon other
	proven features in a nonlinear machine learning model. In addition, we
	illustrate the mechanism through which the frame and topic information enable
	the more accurate metaphor detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jang-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5539">
    <title>Using Reinforcement Learning to Model Incrementality in a Fast-Paced Dialogue Game</title>
    <author><first>Ramesh</first><last>Manuvinakurike</last></author>
    <author><first>David</first><last>DeVault</last></author>
    <author><first>Kallirroi</first><last>Georgila</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>331&#8211;341</pages>
    <url>http://aclweb.org/anthology/W17-5539</url>
    <abstract>We apply Reinforcement Learning (RL) to the problem of incremental dialogue
	policy learning in the context of a fast-paced dialogue game. We compare the
	policy learned by RL with a high-performance baseline policy which has been
	shown to perform very efficiently (nearly as well as humans) in this dialogue
	game. The RL policy outperforms the baseline policy in offline simulations
	(based on real user data). We provide a detailed comparison of the RL policy
	and the baseline policy, including information about how much effort and time
	it took to develop each one of them. We also highlight the cases where the RL
	policy performs better, and show that understanding the RL policy can provide
	valuable insights which can inform the creation of an even better rule-based
	policy.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>manuvinakurike-devault-georgila:2017:W17-55</bibkey>
  </paper>

  <paper id="5540">
    <title>Inferring Narrative Causality between Event Pairs in Films</title>
    <author><first>Zhichao</first><last>Hu</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>342&#8211;351</pages>
    <url>http://aclweb.org/anthology/W17-5540</url>
    <attachment type="poster">W17-5540.Poster.pdf</attachment>
    <abstract>To understand narrative, humans draw inferences about the underlying relations
	between narrative events. Cognitive theories of narrative understanding define
	these inferences as four different types of causality, ranging from pairs of
	events A, B where A physically causes B (X drop, X break) to pairs of events
	where A causes emotional state B (Y saw X, Y felt fear). Previous work on
	learning narrative relations from text has either focused on "strict"
	physical causality, or has been vague about what relation is being learned.
	This paper learns pairs of causal events from a corpus of film scene
	descriptions, which are action-rich and tend to be told in chronological order.
	We show that event pairs induced using our methods are of high quality and are
	judged to have a stronger causal relation than event pairs from Rel-Grams.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hu-walker:2017:W17-55</bibkey>
  </paper>

  <paper id="5541">
    <title>Lessons in Dialogue System Deployment</title>
    <author><first>Anton</first><last>Leuski</last></author>
    <author><first>Ron</first><last>Artstein</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>352&#8211;355</pages>
    <url>http://aclweb.org/anthology/W17-5541</url>
    <abstract>We analyze deployment of an interactive dialogue system in an environment where
	deep technical expertise might not be readily available. The initial version
	was created using a collection of research tools.
	We summarize a number of challenges with its deployment at two museums and
	describe a new system that simplifies the installation and user interface;
	reduces reliance on 3rd-party software; and provides a robust data collection
	mechanism.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>leuski-artstein:2017:W17-55</bibkey>
  </paper>

  <paper id="5542">
    <title>Information Navigation System with Discovering User Interests</title>
    <author><first>Koichiro</first><last>Yoshino</last></author>
    <author><first>Yu</first><last>Suzuki</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>356&#8211;359</pages>
    <url>http://aclweb.org/anthology/W17-5542</url>
    <abstract>We demonstrate an information navigation system for sightseeing domains that
	has a dialogue interface for discovering user interests in tourist activities.
	The system discovers the interests of a user through focus detection on user
	utterances, and proactively presents information related to the discovered
	user interest. A partially observable Markov decision process (POMDP)-based
	dialogue manager, which is extended with user focus states, controls the
	behavior of the system to provide information through several dialogue acts.
	We transferred the belief-update function and the policy of the manager from
	another system trained on a different domain to show the generality of the
	defined dialogue acts for our information navigation system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yoshino-suzuki-nakamura:2017:W17-55</bibkey>
  </paper>

  <paper id="5543">
    <title>Modelling Protagonist Goals and Desires in First-Person Narrative</title>
    <author><first>Elahe</first><last>Rahimtoroghi</last></author>
    <author><first>Jiaqi</first><last>Wu</last></author>
    <author><first>Ruimin</first><last>Wang</last></author>
    <author><first>Pranav</first><last>Anand</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>360&#8211;369</pages>
    <url>http://aclweb.org/anthology/W17-5543</url>
    <attachment type="presentation">W17-5543.Presentation.pdf</attachment>
    <abstract>Many genres of natural language text are narratively structured, a testament to
	our predilection for organizing our experiences as narratives. There is broad
	consensus that understanding a narrative requires identifying and tracking the
	goals and desires of the characters and their narrative outcomes. However, to
	date, there has been limited work on computational models for this problem. We
	introduce a new dataset, DesireDB, which includes gold-standard labels for
	identifying statements of desire, textual evidence for desire fulfillment, and
	annotations for whether the stated desire is fulfilled given the evidence in
	the narrative context. We report experiments on tracking desire fulfillment
	using different methods, and show that an LSTM Skip-Thought model achieves an
	F-measure of 0.7 on our corpus.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rahimtoroghi-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5544">
    <title>SHIHbot: A Facebook chatbot for Sexual Health Information on HIV/AIDS</title>
    <author><first>Jacqueline</first><last>Brixey</last></author>
    <author><first>Rens</first><last>Hoegen</last></author>
    <author><first>Wei</first><last>Lan</last></author>
    <author><first>Joshua</first><last>Rusow</last></author>
    <author><first>Karan</first><last>Singla</last></author>
    <author><first>Xusen</first><last>Yin</last></author>
    <author><first>Ron</first><last>Artstein</last></author>
    <author><first>Anton</first><last>Leuski</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>370&#8211;373</pages>
    <url>http://aclweb.org/anthology/W17-5544</url>
    <abstract>We present the implementation of an autonomous chatbot, SHIHbot, deployed on
	Facebook, which answers a wide variety of sexual health questions on HIV/AIDS.
	The chatbot's response database is compiled from professional medical and
	public health resources in order to provide reliable information to users. The
	system's backend is NPCEditor, a response selection platform trained on linked
	questions and answers; to our knowledge this is the first retrieval-based
	chatbot deployed on a large public social network.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>brixey-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5545">
    <title>How Would You Say It? Eliciting Lexically Diverse Dialogue for Supervised Semantic Parsing</title>
    <author><first>Abhilasha</first><last>Ravichander</last></author>
    <author><first>Thomas</first><last>Manzini</last></author>
    <author><first>Matthias</first><last>Grabmair</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Jonathan</first><last>Francis</last></author>
    <author><first>Eric</first><last>Nyberg</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>374&#8211;383</pages>
    <url>http://aclweb.org/anthology/W17-5545</url>
    <abstract>Building dialogue interfaces for real-world scenarios often entails training
	semantic parsers starting from zero examples. How can we build datasets that
	better capture the variety of ways users might phrase their queries, and what
	queries are actually realistic?  Wang et al. (2015) proposed a method
	to build semantic parsing datasets by generating canonical utterances using a
	grammar and having crowdworkers paraphrase them into natural wording. A
	limitation of this approach is that it induces bias towards using similar
	language as the canonical utterances.  In this work, we present a methodology
	that elicits meaningful and lexically diverse queries from users for semantic
	parsing tasks. Starting from a seed lexicon and a generative grammar, we pair
	logical forms with mixed text-image representations and ask crowdworkers to
	paraphrase and confirm the plausibility of the queries that they generated. We
	use this method to build a semantic parsing dataset from scratch for a dialog
	agent in a smart-home simulation. We find evidence that this dataset, which we
	have named SmartHome, is demonstrably more lexically diverse and difficult to
	parse than existing domain-specific semantic parsing datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ravichander-EtAl:2017:W17-55</bibkey>
  </paper>

  <paper id="5546">
    <title>Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models</title>
    <author><first>Pierre</first><last>Lison</last></author>
    <author><first>Serge</first><last>Bibauw</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>384&#8211;394</pages>
    <url>http://aclweb.org/anthology/W17-5546</url>
    <abstract>Neural conversational models require substantial amounts of dialogue data to
	estimate their parameters and are therefore usually learned on large corpora
	such as chat forums or movie subtitles. These corpora are, however, often
	challenging to work with, notably due to their frequent lack of turn
	segmentation and the presence of multiple references external to the dialogue
	itself. This paper shows that these challenges can be mitigated by adding a
	weighting model into the architecture. The weighting model, which is itself
	estimated from dialogue data, associates each training example to a numerical
	weight that reflects its intrinsic quality for dialogue modelling. At training
	time, these sample weights are included into the empirical loss to be
	minimised. Evaluation results on retrieval-based models trained on movie and TV
	subtitles demonstrate that the inclusion of such a weighting model improves the
	model performance on unsupervised metrics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lison-bibauw:2017:W17-55</bibkey>
  </paper>

  <paper id="5547">
    <title>A data-driven model of explanations for a chatbot that helps to practice conversation in a foreign language</title>
    <author><first>Sviatlana</first><last>H&#246;hn</last></author>
    <booktitle>Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Saarbr&#252;cken, Germany</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>395&#8211;405</pages>
    <url>http://aclweb.org/anthology/W17-5547</url>
    <abstract>This article describes a model of other-initiated self-repair for a chatbot
	that helps to practice conversation in a foreign language. The model was
	developed using a corpus of instant messaging conversations between German
	native and non-native speakers. Conversation Analysis helped to create
	computational models from a small number of examples. The model has been
	validated in an AIML-based chatbot. Unlike typical retrieval-based dialogue
	systems, the explanations are generated at run-time from a linguistic database.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hohn:2017:W17-55</bibkey>
  </paper>

</volume>