CoCoID: Learning Contrastive Representations and Compact Clusters for Semi-Supervised Intent Discovery

Qian Cao, Deyi Xiong, Qinlong Wang, Xia Peng


Abstract
Intent discovery aims to mine, from user utterances, new intents that are not present in the set of manually predefined intents. Previous approaches to intent discovery usually cluster novel intents automatically, using prior knowledge from intent-labeled data in a semi-supervised way. In this paper, we focus on learning discriminative user utterance representations and on the compactness of the learned intent clusters. We propose CoCoID, a novel semi-supervised intent discovery framework with two essential components: contrastive user utterance representation learning and intra-cluster knowledge distillation. The former attempts to detect similar and dissimilar intents from a minibatch-wise perspective. The latter regularizes the model's predictive distribution over samples in a cluster-wise way. To evaluate CoCoID, we conduct experiments both on challenging real-life datasets (CLINC and BANKING), curated to emulate the true environment of commercial/production systems, and on traditional datasets (StackOverflow and DBPedia). Experimental results demonstrate that our model substantially outperforms 12 state-of-the-art intent discovery baselines across the four datasets, by over 1.4 points in clustering accuracy (ACC) and adjusted Rand index (ARI) and by 1.1 points in normalized mutual information (NMI). Further analyses suggest that CoCoID is able to learn contrastive representations and compact clusters for intent discovery.
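The abstract describes the two components only at a high level. As a rough, non-authoritative illustration, the sketch below assumes (1) a supervised contrastive (SupCon-style) loss over a minibatch, where utterances sharing a label (ground-truth for labeled data, pseudo-labels otherwise, which is an assumption) are positives and all other utterances are negatives, and (2) an intra-cluster distillation term in which each cluster's mean predictive distribution acts as a fixed teacher and every member is pulled toward it via KL divergence. Both loss forms, and the hypothetical function names `supervised_contrastive_loss` and `intra_cluster_distillation_loss`, are illustrative choices, not CoCoID's exact formulation. Pure Python is used so the example is self-contained.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Hypothetical SupCon-style minibatch loss (assumed form, not the paper's exact loss).

    Utterances sharing a (pseudo-)label are positives; every other utterance
    in the minibatch appears in the softmax denominator.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    # Pairwise cosine similarities divided by the temperature.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Log of the softmax denominator over all other samples (log-sum-exp for stability).
        others = [sim[i][a] for a in range(n) if a != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def intra_cluster_distillation_loss(
    probs: Sequence[Sequence[float]],
    cluster_ids: Sequence[int],
    eps: float = 1e-12,
) -> float:
    """Hypothetical cluster-wise regularizer: average KL(cluster mean || sample).

    The mean predictive distribution of each cluster is treated as a fixed
    teacher, so compact clusters (members agreeing with the mean) give ~0 loss.
    """
    members: Dict[int, List[int]] = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    total = 0.0
    for idxs in members.values():
        k = len(probs[idxs[0]])
        teacher = [sum(probs[i][cls] for i in idxs) / len(idxs) for cls in range(k)]
        for i in idxs:
            total += sum(t * math.log((t + eps) / (probs[i][cls] + eps))
                         for cls, t in enumerate(teacher) if t > 0)
    return total / len(probs)


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Labels consistent with the geometry should yield a lower loss.
    print(supervised_contrastive_loss(batch, [0, 0, 1, 1]))
    print(supervised_contrastive_loss(batch, [0, 1, 0, 1]))
    # Members that disagree with their cluster mean are penalized.
    print(intra_cluster_distillation_loss([[0.9, 0.1], [0.5, 0.5]], [0, 0]))
```

In an actual training loop these terms would be computed on model outputs with an autodiff framework and added to the main objective with weighting coefficients; the weights and the source of the cluster assignments are not specified here.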
Anthology ID:
2022.emnlp-industry.23
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
226–236
URL:
https://aclanthology.org/2022.emnlp-industry.23
DOI:
10.18653/v1/2022.emnlp-industry.23
Cite (ACL):
Qian Cao, Deyi Xiong, Qinlong Wang, and Xia Peng. 2022. CoCoID: Learning Contrastive Representations and Compact Clusters for Semi-Supervised Intent Discovery. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 226–236, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
CoCoID: Learning Contrastive Representations and Compact Clusters for Semi-Supervised Intent Discovery (Cao et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-industry.23.pdf