Data Query Language and Corpus Tools for Slot-Filling and Intent Classification Data

Stefan Larson, Eric Guldan, Kevin Leach


Abstract
Typical machine learning approaches to developing task-oriented dialog systems require the collection and management of large amounts of training data, especially for the tasks of intent classification and slot-filling. Managing this data can be cumbersome without dedicated tools to help the dialog system designer understand the nature of the data. This paper presents a toolkit for analyzing slot-filling and intent classification corpora. The toolkit includes (1) a new lightweight and readable data and file format for intent classification and slot-filling corpora, (2) a new query language for searching such corpora, and (3) tools for understanding their structure and makeup. We apply the toolkit to several well-known NLU datasets and demonstrate that it can be used to uncover interesting and surprising insights. By releasing our toolkit to the research community, we hope to enable others to develop more robust and intelligent slot-filling and intent classification models.
Anthology ID:
2020.lrec-1.873
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
7060–7068
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.873
Cite (ACL):
Stefan Larson, Eric Guldan, and Kevin Leach. 2020. Data Query Language and Corpus Tools for Slot-Filling and Intent Classification Data. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 7060–7068, Marseille, France. European Language Resources Association.
Cite (Informal):
Data Query Language and Corpus Tools for Slot-Filling and Intent Classification Data (Larson et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.873.pdf