Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning

Chen Shi, Qi Chen, Lei Sha, Sujian Li, Xu Sun, Houfeng Wang, Lintao Zhang


Abstract
The lack of labeled data is one of the main challenges in building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and low in coverage. In this paper, we instead propose Auto-Dialabel, a framework that automatically clusters dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can greatly reduce human labeling cost, achieve good intent clustering accuracy (84.1%), and provide reasonable and instructive slot labeling results.
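
As a rough illustration of the clustering step mentioned in the abstract, the sketch below groups feature vectors with average-linkage agglomerative clustering and stops merging once the closest pair of clusters is farther apart than a threshold, so the number of clusters is not fixed in advance. This is not the paper's implementation: the stopping rule, the function names (`threshold_clustering`, `average_linkage`), the `max_distance` parameter, and the toy 2-D vectors (standing in for autoencoder-assembled utterance features) are all assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def threshold_clustering(points, max_distance):
    """Agglomerative clustering that stops when the closest pair of
    clusters is farther apart than max_distance (hypothetical parameter)."""
    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (a, b), dist = min(
            (((a, b), average_linkage(clusters[a], clusters[b], points))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break  # no remaining pair is close enough to merge
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy stand-ins for assembled utterance features (assumed, not from the paper).
toy_features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (9.0, 0.0)]
clusters = threshold_clustering(toy_features, max_distance=1.0)
print(clusters)
```

Each resulting cluster would correspond to one candidate intent (or slot) group that a human annotator could then name, which is where the labeling savings would come from under this reading of the abstract.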
Anthology ID:
D18-1072
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
684–689
URL:
https://aclanthology.org/D18-1072
DOI:
10.18653/v1/D18-1072
Cite (ACL):
Chen Shi, Qi Chen, Lei Sha, Sujian Li, Xu Sun, Houfeng Wang, and Lintao Zhang. 2018. Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 684–689, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning (Shi et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1072.pdf