Maxine Eskenazi


2021

GenSF: Simultaneous Adaptation of Generative Pre-trained Models and Slot Filling
Shikib Mehri | Maxine Eskenazi
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

In transfer learning, it is imperative to achieve strong alignment between a pre-trained model and a downstream task. Prior work has done this by proposing task-specific pre-training objectives, which sacrifices the inherent scalability of the transfer learning paradigm. We instead achieve strong alignment by simultaneously modifying both the pre-trained model and the formulation of the downstream task, which is more efficient and preserves the scalability of transfer learning. We present GenSF (Generative Slot Filling), which leverages a generative pre-trained open-domain dialog model for slot filling. GenSF (1) adapts the pre-trained model by incorporating inductive biases about the task and (2) adapts the downstream task by reformulating slot filling to better leverage the pre-trained model’s capabilities. GenSF achieves state-of-the-art results on two slot filling datasets with strong gains in few-shot and zero-shot settings. We achieve a 9 F1 score improvement in zero-shot slot filling. This highlights the value of strong alignment between the pre-trained model and the downstream task.

Schema-Guided Paradigm for Zero-Shot Dialog
Shikib Mehri | Maxine Eskenazi
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Developing mechanisms that flexibly adapt dialog systems to unseen tasks and domains is a major challenge in dialog research. Neural models implicitly memorize task-specific dialog policies from the training data. We posit that this implicit memorization has precluded zero-shot transfer learning. To this end, we leverage the schema-guided paradigm, wherein the task-specific dialog policy is explicitly provided to the model. We introduce the Schema Attention Model (SAM) and improved schema representations for the STAR corpus. SAM obtains significant improvement in zero-shot settings, with a +22 F1 score improvement over prior work. These results validate the feasibility of zero-shot generalizability in dialog. Ablation experiments are also presented to demonstrate the efficacy of SAM.

2020

USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
Shikib Mehri | Maxine Eskenazi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.

“None of the Above”: Measure Uncertainty in Dialog Response Retrieval
Yulan Feng | Shikib Mehri | Maxine Eskenazi | Tiancheng Zhao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper discusses the importance of uncovering uncertainty in end-to-end dialog tasks and presents our experimental results on uncertainty classification on the processed Ubuntu Dialog Corpus. We show that instead of retraining models for this specific purpose, we can capture the original retrieval model’s underlying confidence concerning the best prediction using trivial additional computation.

Unsupervised Evaluation of Interactive Dialog with DialoGPT
Shikib Mehri | Maxine Eskenazi
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue

It is important to define meaningful and interpretable automatic evaluation metrics for open-domain dialog research. Standard language generation metrics have been shown to be ineffective for dialog. This paper introduces the FED metric (fine-grained evaluation of dialog), an automatic evaluation metric which uses DialoGPT, without any fine-tuning or supervision. It also introduces the FED dataset which is constructed by annotating a set of human-system and human-human conversations with eighteen fine-grained dialog qualities. The FED metric (1) does not rely on a ground-truth response, (2) does not require training data and (3) measures fine-grained dialog qualities at both the turn and whole dialog levels. FED attains moderate to strong correlation with human judgement at both levels.

2019

Multi-Granularity Representations of Dialog
Shikib Mehri | Maxine Eskenazi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Neural models of dialog rely on generalized latent representations of language. This paper introduces a novel training procedure which explicitly learns multiple representations of language at several levels of granularity. The multi-granularity training algorithm modifies the mechanism by which negative candidate responses are sampled in order to control the granularity of learned latent representations. Strong performance gains are observed on the next utterance retrieval task using both the MultiWOZ dataset and the Ubuntu dialog corpus. Analysis significantly demonstrates that multiple granularities of representation are being learned, and that multi-granularity training facilitates better transfer to downstream tasks.

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models
Tiancheng Zhao | Kaige Xie | Maxine Eskenazi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge. Common practice has been to use handcrafted dialog acts, or the output vocabulary, e.g. in neural encoder decoders, as the action spaces. Both have their own limitations. This paper proposes a novel latent action framework that treats the action spaces of an end-to-end dialog agent as latent variables and develops unsupervised methods in order to induce its own action space from the data. Comprehensive experiments are conducted examining both continuous and discrete action types and two different optimization methods based on stochastic variational inference. Results show that the proposed latent actions achieve superior empirical performance improvement over previous word-level policy gradient methods on both DealOrNoDeal and MultiWoz dialogs. Our detailed analysis also provides insights about various latent variable approaches for policy learning and can serve as a foundation for developing better latent actions in future research.

BeamSeg: A Joint Model for Multi-Document Segmentation and Topic Identification
Pedro Mota | Maxine Eskenazi | Luísa Coheur
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We propose BeamSeg, a joint model for segmentation and topic identification of documents from the same domain. The model assumes that lexical cohesion can be observed across documents, meaning that segments describing the same topic use a similar lexical distribution over the vocabulary. The model implements lexical cohesion in an unsupervised Bayesian setting by drawing from the same language model segments with the same topic. Contrary to previous approaches, we assume that language models are not independent, since the vocabulary changes in consecutive segments are expected to be smooth and not abrupt. We achieve this by using a dynamic Dirichlet prior that takes into account data contributions from other topics. BeamSeg also models segment length properties of documents based on modality (textbooks, slides, etc.). The evaluation is carried out in three datasets. In two of them, improvements of up to 4.8% and 7.3% are obtained in the segmentation and topic identifications tasks, indicating that both tasks should be jointly modeled.

Structured Fusion Networks for Dialog
Shikib Mehri | Tejas Srinivasan | Maxine Eskenazi
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Neural dialog models have exhibited strong performance; however, their end-to-end nature lacks a representation of the explicit structure of dialog. This results in reduced generalizability and controllability, and in a data-hungry nature. Conversely, more traditional dialog systems do have strong models of explicit structure. This paper introduces several approaches for explicitly incorporating structure into neural models of dialog. Structured Fusion Networks first learn neural dialog modules corresponding to the structured components of traditional dialog systems and then incorporate these modules in a higher-level generative model. Structured Fusion Networks obtain strong results on the MultiWOZ dataset, both with and without reinforcement learning. Structured Fusion Networks are shown to have several valuable properties, including better domain generalizability, improved performance in reduced data scenarios and robustness to divergence during reinforcement learning.

Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References
Prakhar Gupta | Shikib Mehri | Tiancheng Zhao | Amy Pavel | Maxine Eskenazi | Jeffrey Bigham
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation. Existing metrics have been shown to correlate poorly with human judgement, particularly in open-domain dialog. One alternative is to collect human annotations for evaluation, which can be expensive and time consuming. To demonstrate the effectiveness of multi-reference evaluation, we augment the test set of DailyDialog with multiple references. A series of experiments show that the use of multiple references results in improved correlation between several automatic metrics and human judgement for both the quality and the diversity of system output.

Pretraining Methods for Dialog Context Representation Learning
Shikib Mehri | Evgeniia Razumovskaia | Tiancheng Zhao | Maxine Eskenazi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper examines various unsupervised pretraining objectives for learning dialog context representations. Two novel methods of pretraining dialog context encoders are proposed, and a total of four methods are examined. Each pretraining objective is fine-tuned and evaluated on a set of downstream dialog tasks using the MultiWoz dataset and strong performance improvement is observed. Further evaluation shows that our pretraining objectives result in not only better performance, but also better convergence, models that are less data hungry and have better domain generalizability.

2018

Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation
Tiancheng Zhao | Kyusong Lee | Maxine Eskenazi
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog models for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST that improve VAEs and can discover interpretable semantics via either auto encoding or context predicting. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation.

Zero-Shot Dialog Generation with Cross-Domain Latent Actions
Tiancheng Zhao | Maxine Eskenazi
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

This paper introduces zero-shot dialog generation (ZSDG), as a step towards neural dialog systems that can instantly generalize to new situations with minimum data. ZSDG requires an end-to-end generative dialog system to generalize to a new domain for which only a domain description is provided and no training dialogs are available. Then a novel learning framework, Action Matching, is proposed. This algorithm can learn a cross-domain embedding space that models the semantics of dialog responses which in turn, enables a neural dialog generation model to generalize to new domains. We evaluate our methods on two datasets, a new synthetic dialog dataset, and an existing human-human multi-domain dialog dataset. Experimental results show that our method is able to achieve superior performance in learning dialog models that can rapidly adapt their behavior to new domains and suggests promising future research.

DialCrowd: A toolkit for easy dialog system assessment
Kyusong Lee | Tiancheng Zhao | Alan W. Black | Maxine Eskenazi
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

When creating a dialog system, developers need to test each version to ensure that it is performing correctly. Recently the trend has been to test on large datasets or to ask many users to try out a system. Crowdsourcing has solved the issue of finding users, but it presents new challenges such as how to use a crowdsourcing platform and what type of test is appropriate. DialCrowd has been designed to make system assessment easier and to ensure the quality of the result. This paper describes DialCrowd, what specific needs it fulfills and how it works. It then relates a test of DialCrowd by a group of dialog system developers.

2017

Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability
Tiancheng Zhao | Allen Lu | Kyusong Lee | Maxine Eskenazi
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Generative encoder-decoder models offer great promise in developing domain-general dialog systems. However, they have mainly been applied to open-domain conversations. This paper presents a practical and novel framework for building task-oriented dialog systems based on encoder-decoder models. This framework enables encoder-decoder models to accomplish slot-value independent decision-making and interact with external databases. Moreover, this paper shows the flexibility of the proposed method by interleaving chatting capability with a slot-filling system for better out-of-domain recovery. The models were trained on both real-user data from a bus information system and human-human chat data. Results show that the proposed framework achieves good performance in both offline evaluation metrics and in task success rate with human users.

DialPort, Gone Live: An Update After A Year of Development
Kyusong Lee | Tiancheng Zhao | Yulun Du | Edward Cai | Allen Lu | Eli Pincus | David Traum | Stefan Ultes | Lina M. Rojas-Barahona | Milica Gasic | Steve Young | Maxine Eskenazi
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

DialPort collects user data for connected spoken dialog systems. At present six systems are linked to a central portal that directs the user to the applicable system and suggests systems that the user may be interested in. User data has started to flow into the system.

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders
Tiancheng Zhao | Ran Zhao | Maxine Eskenazi
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder from word-level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that capture the discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved through introducing a bag-of-word loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and exhibit competence of discourse-level decision-making.

2016

Predicting the Relative Difficulty of Single Sentences With and Without Surrounding Context
Elliot Schumacher | Maxine Eskenazi | Gwen Frishkoff | Kevyn Collins-Thompson
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

metaTED: a Corpus of Metadiscourse for Spoken Language
Rui Correia | Nuno Mamede | Jorge Baptista | Maxine Eskenazi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes metaTED, a freely available corpus of metadiscursive acts in spoken language collected via crowdsourcing. Metadiscursive acts were annotated on a set of 180 randomly chosen TED talks in English, spanning different speakers and topics. The taxonomy used for annotation is composed of 16 categories, adapted from Adel (2010). This adaptation takes into account both the material to annotate and the setting in which the annotation task is performed. The crowdsourcing setup is described, including considerations regarding training and quality control. The collected data is evaluated in terms of quantity of occurrences, inter-annotator agreement, and annotation-related measures (such as average time on task and self-reported confidence). Results show different levels of agreement among metadiscourse acts (α ∈ [0.15; 0.49]). To further assess the collected material, a subset of the annotations was submitted to experts, who validated which of the marked occurrences truly correspond to instances of the metadiscursive act at hand. As with the crowd, experts showed different levels of agreement between categories (α ∈ [0.18; 0.72]). The paper concludes with a discussion on the applicability of metaTED with respect to each of the 16 categories of metadiscourse.

Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning
Tiancheng Zhao | Maxine Eskenazi
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

DialPort: A General Framework for Aggregating Dialog Systems
Tiancheng Zhao | Kyusong Lee | Maxine Eskenazi
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

2015

Lexical Level Distribution of Metadiscourse in Spoken Language
Rui Correia | Maxine Eskenazi | Nuno Mamede
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

An Incremental Turn-Taking Model with Active System Barge-in for Spoken Dialog Systems
Tiancheng Zhao | Alan W Black | Maxine Eskenazi
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

The Real Challenge 2014: Progress and Prospects
Maxine Eskenazi | Alan W Black | Sungjin Lee | David Traum
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

An Open Corpus of Everyday Documents for Simplification Tasks
David Pellow | Maxine Eskenazi
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

Cross-Lingual Information to the Rescue in Keyword Extraction
Chung-Chi Huang | Maxine Eskenazi | Jaime Carbonell | Lun-Wei Ku | Ping-Che Yang
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2013

Tools for non-native readers: the case for translation and simplification
Maxine Eskenazi | Yibin Lin | Oscar Saz
Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility

Proceedings of the SIGDIAL 2013 Conference
Maxine Eskenazi | Michael Strube | Barbara Di Eugenio | Jason D. Williams
Proceedings of the SIGDIAL 2013 Conference

Recipe For Building Robust Spoken Dialog State Trackers: Dialog State Tracking Challenge System Description
Sungjin Lee | Maxine Eskenazi
Proceedings of the SIGDIAL 2013 Conference

2012

An Unsupervised Approach to User Simulation: Toward Self-Improving Dialog Systems
Sungjin Lee | Maxine Eskenazi
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Exploiting Machine-Transcribed Dialog Corpus to Improve Multiple Dialog States Tracking Methods
Sungjin Lee | Maxine Eskenazi
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)
Maxine Eskenazi | Alan Black | David Traum
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

Future Directions in Spoken Dialog Systems: A Community of Possibilities
Alan W. Black | Maxine Eskenazi
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

2011

Effect of Word Complexity on L2 Vocabulary Learning
Kevin Dela Rosa | Maxine Eskenazi
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

Predicting Change in Student Motivation by Measuring Cohesion between Tutor and Student
Arthur Ward | Diane Litman | Maxine Eskenazi
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results
Alan W Black | Susanne Burger | Alistair Conkie | Helen Hastie | Simon Keizer | Oliver Lemon | Nicolas Merigaud | Gabriel Parent | Gabriel Schubiner | Blaise Thomson | Jason D. Williams | Kai Yu | Steve Young | Maxine Eskenazi
Proceedings of the SIGDIAL 2011 Conference

2010

Clustering dictionary definitions using Amazon Mechanical Turk
Gabriel Parent | Maxine Eskenazi
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

Predicting Cloze Task Quality for Vocabulary Training
Adam Skory | Maxine Eskenazi
Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications

2009

A Finite-State Turn-Taking Model for Spoken Dialog Systems
Antoine Raux | Maxine Eskenazi
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

An Application of Latent Semantic Analysis to Word Sense Discrimination for Words with Related and Unrelated Meanings
Juan Pino | Maxine Eskenazi
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

The Spoken Dialogue Challenge
Alan Black | Maxine Eskenazi
Proceedings of the SIGDIAL 2009 Conference

2008

Optimizing Endpointing Thresholds using Dialogue Features in a Spoken Dialogue System
Antoine Raux | Maxine Eskenazi
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

An Analysis of Statistical Models and Features for Reading Difficulty Prediction
Michael Heilman | Kevyn Collins-Thompson | Maxine Eskenazi
Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications

Retrieval of Reading Materials for Vocabulary and Reading Practice
Michael Heilman | Le Zhao | Juan Pino | Maxine Eskenazi
Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications

Building Practical Spoken Dialog Systems
Antoine Raux | Brian Langner | Alan W Black | Maxine Eskenazi
Tutorial Abstracts of ACL-08: HLT

2007

Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts
Michael Heilman | Kevyn Collins-Thompson | Jamie Callan | Maxine Eskenazi
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Comparing Spoken Dialog Corpora Collected with Recruited Subjects versus Real Users
Hua Ai | Antoine Raux | Dan Bohus | Maxine Eskenazi | Diane Litman
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

Olympus: an open-source framework for conversational spoken language interface research
Dan Bohus | Antoine Raux | Thomas Harris | Maxine Eskenazi | Alexander Rudnicky
Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies

2005

Automatic Question Generation for Vocabulary Assessment
Jonathan Brown | Gwen Frishkoff | Maxine Eskenazi
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Non-Native Users in the Let’s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch
Antoine Raux | Maxine Eskenazi
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004