How Would You Say It? Eliciting Lexically Diverse Dialogue for Supervised Semantic Parsing

Abhilasha Ravichander, Thomas Manzini, Matthias Grabmair, Graham Neubig, Jonathan Francis, Eric Nyberg


Abstract
Building dialogue interfaces for real-world scenarios often entails training semantic parsers starting from zero examples. How can we build datasets that better capture the variety of ways users might phrase their queries, and which queries are actually realistic? Wang et al. (2015) proposed a method to build semantic parsing datasets by generating canonical utterances using a grammar and having crowdworkers paraphrase them into natural wording. A limitation of this approach is that it biases crowdworkers toward language similar to that of the canonical utterances. In this work, we present a methodology that elicits meaningful and lexically diverse queries from users for semantic parsing tasks. Starting from a seed lexicon and a generative grammar, we pair logical forms with mixed text-image representations and ask crowdworkers to paraphrase and confirm the plausibility of the queries that they generated. We use this method to build a semantic parsing dataset from scratch for a dialogue agent in a smart-home simulation. We find evidence that this dataset, which we have named SmartHome, is more lexically diverse and more difficult to parse than existing domain-specific semantic parsing datasets.
Anthology ID:
W17-5545
Volume:
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Month:
August
Year:
2017
Address:
Saarbrücken, Germany
Venues:
SIGDIAL | WS
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
374–383
URL:
https://aclanthology.org/W17-5545
DOI:
10.18653/v1/W17-5545
PDF:
https://aclanthology.org/W17-5545.pdf