Sasanka Vutla


2024

Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation
Anup Pattnaik | Cijo George | Rishabh Kumar Tripathi | Sasanka Vutla | Jithendra Vepa
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

In this work, we present an approach that introduces different perspectives, or views, to improve the quality of hierarchical clustering of interaction drivers in a contact center. Specifically, we present a multi-stage approach that introduces an LLM-guided multi-view cluster representation, which significantly improves the quality of the generated clusters. Our approach improves the average Silhouette Score by up to 70% and Human Preference Scores by 36.7% for top-level clusters compared to standard agglomerative clustering for the given business use case. We also show how the proposed approach can be adapted to standard non-hierarchical clustering use cases, where it achieves state-of-the-art performance on public datasets based on NMI and ACC scores, with a minimal number of LLM queries compared to the current state-of-the-art approaches. Moreover, we apply our technique to generate two new labeled datasets for hierarchical clustering. We open-source these labeled datasets, validated and corrected by domain experts, for the benefit of the research community.
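The abstract reports NMI and ACC for the non-hierarchical setting without defining them. As a rough illustration only, the sketch below computes both from a pair of label lists. It assumes NMI is normalized by the arithmetic mean of the two entropies and that ACC is the accuracy under the best one-to-one mapping from predicted clusters to gold classes; the paper may use a different variant. The function names `nmi` and `clustering_accuracy` are hypothetical, and the brute-force mapping search is only practical for a handful of clusters (the Hungarian algorithm is the usual choice at scale).

```python
from collections import Counter
from itertools import permutations
from math import log


def nmi(labels_true, labels_pred):
    """Normalized mutual information (assumed: arithmetic-mean normalization)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Mutual information between the two labelings.
    mi = sum(
        (c / n) * log((c * n) / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    # Entropy of each labeling.
    h_true = -sum((c / n) * log(c / n) for c in true_counts.values())
    h_pred = -sum((c / n) * log(c / n) for c in pred_counts.values())

    denom = (h_true + h_pred) / 2
    # Two trivial (single-cluster) labelings are treated as identical.
    return mi / denom if denom > 0 else 1.0


def clustering_accuracy(labels_true, labels_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping.

    Brute force over mappings, so only suitable for a small number of clusters.
    """
    classes = list(set(labels_true))
    clusters = list(set(labels_pred))
    joint = Counter(zip(labels_pred, labels_true))

    # Pad with None so that surplus clusters can stay unmatched.
    k = max(len(classes), len(clusters))
    padded = classes + [None] * (k - len(classes))

    best = 0
    for perm in permutations(padded, len(clusters)):
        matched = sum(joint[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, matched)
    return best / len(labels_true)


if __name__ == "__main__":
    gold = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    pred = [1, 1, 0, 2, 2, 2, 0, 0, 0]  # cluster ids are arbitrary
    print("NMI:", nmi(gold, pred))
    print("ACC:", clustering_accuracy(gold, pred))
```

Both metrics ignore the actual cluster ids, so a prediction that only renames the gold clusters scores 1.0 on each.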