Intent Discovery for Enterprise Virtual Assistants: Applications of Utterance Embedding and Clustering to Intent Mining

Minhua Chen, Badrinath Jayakumar, Michael Johnston, S. Eman Mahmoodi, Daniel Pressel


Abstract
A key challenge in the creation and refinement of virtual assistants is the ability to mine unlabeled utterance data to discover common intents. We develop an approach to this problem that combines large-scale pre-training and multi-task learning to derive a semantic embedding that can be leveraged to identify clusters of utterances that correspond to unhandled intents. An utterance encoder is first trained with a language modeling objective and subsequently adapted to predict intent labels from a large collection of cross-domain enterprise virtual assistant data using a multi-task cosine softmax loss. Experimental evaluation shows significant advantages for this multi-step pre-training approach, with large gains in downstream clustering accuracy on new applications compared to standard sentence embedding approaches. The approach has been incorporated into an interactive discovery tool that enables visualization and exploration of intents by system analysts and builders.
Anthology ID:
2022.naacl-industry.23
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track
Month:
July
Year:
2022
Address:
Hybrid: Seattle, Washington + Online
Editors:
Anastassia Loukina, Rashmi Gangadharaiah, Bonan Min
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
197–208
Language:
URL:
https://aclanthology.org/2022.naacl-industry.23
DOI:
10.18653/v1/2022.naacl-industry.23
Bibkey:
Cite (ACL):
Minhua Chen, Badrinath Jayakumar, Michael Johnston, S. Eman Mahmoodi, and Daniel Pressel. 2022. Intent Discovery for Enterprise Virtual Assistants: Applications of Utterance Embedding and Clustering to Intent Mining. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, pages 197–208, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal):
Intent Discovery for Enterprise Virtual Assistants: Applications of Utterance Embedding and Clustering to Intent Mining (Chen et al., NAACL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.naacl-industry.23.pdf
Video:
 https://aclanthology.org/2022.naacl-industry.23.mp4