Intent Detection and Discovery from User Logs via Deep Semi-Supervised Contrastive Clustering

Rajat Kumar, Mayur Patidar, Vaibhav Varshney, Lovekesh Vig, Gautam Shroff


Abstract
Intent Detection is a crucial component of Dialogue Systems wherein the objective is to classify a user utterance into one of multiple pre-defined intents. A pre-requisite for developing an effective intent identifier is a training dataset labeled with all possible user intents. However, even skilled domain experts are often unable to foresee all possible user intents at design time and for practical applications, novel intents may have to be inferred incrementally on-the-fly from user utterances. Therefore, for any real-world dialogue system, the number of intents increases over time and new intents have to be discovered by analyzing the utterances outside the existing set of intents. In this paper, our objective is to i) detect known intent utterances from a large number of unlabeled utterance samples given a few labeled samples and ii) discover new unknown intents from the remaining unlabeled samples. Existing SOTA approaches address this problem via alternate representation learning and clustering wherein pseudo labels are used for updating the representations and clustering is used for generating the pseudo labels. Unlike existing approaches that rely on epoch wise cluster alignment, we propose an end-to-end deep contrastive clustering algorithm that jointly updates model parameters and cluster centers via supervised and self-supervised learning and optimally utilizes both labeled and unlabeled data. Our proposed approach outperforms competitive baselines on five public datasets for both settings: (i) where the number of undiscovered intents are known in advance, and (ii) where the number of intents are estimated by an algorithm. We also propose a human-in-the-loop variant of our approach for practical deployment which does not require an estimate of new intents and outperforms the end-to-end approach.
Anthology ID:
2022.naacl-main.134
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1836–1853
Language:
URL:
https://aclanthology.org/2022.naacl-main.134
DOI:
10.18653/v1/2022.naacl-main.134
Bibkey:
Cite (ACL):
Rajat Kumar, Mayur Patidar, Vaibhav Varshney, Lovekesh Vig, and Gautam Shroff. 2022. Intent Detection and Discovery from User Logs via Deep Semi-Supervised Contrastive Clustering. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1836–1853, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Intent Detection and Discovery from User Logs via Deep Semi-Supervised Contrastive Clustering (Kumar et al., NAACL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.naacl-main.134.pdf
Software:
 2022.naacl-main.134.software.zip
Data
CLINC150DBpediaSNIPS