Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation

Anup Pattnaik; Cijo George; Rishabh Kumar Tripathi; Sasanka Vutla; Jithendra Vepa

doi:10.18653/v1/2024.emnlp-industry.54

Abstract

In this work, we present an approach that introduces different perspectives or views to improve the quality of hierarchical clustering of interaction drivers in a contact center. Specifically, we present a multi-stage approach that introduces an LLM-guided multi-view cluster representation, which significantly improves the quality of the generated clusters. Our approach improves the average Silhouette Score by up to 70% and Human Preference Scores by 36.7% for top-level clusters compared to standard agglomerative clustering for the given business use case. We also show how the proposed approach can be adapted to standard non-hierarchical clustering use cases, where it achieves state-of-the-art performance on public datasets based on NMI and ACC scores, with a minimal number of LLM queries compared to the current state-of-the-art approaches. Moreover, we apply our technique to generate two new labeled datasets for hierarchical clustering. We open-source these labeled datasets, validated and corrected by domain experts, for the benefit of the research community.
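The abstract reports NMI and ACC for the non-hierarchical setting without defining them. As a rough illustration only, and not the authors' evaluation code, the sketch below computes both metrics in plain Python. It assumes the common definitions: NMI normalized by the arithmetic mean of the two label entropies, and ACC as accuracy under the best one-to-one mapping between predicted clusters and true labels. The function names are hypothetical, and the mapping is found by brute force over permutations, which is only practical for a handful of clusters (the Hungarian algorithm is the usual choice at scale).

```python
from collections import Counter
from itertools import permutations
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(true_labels)
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    # Mutual information summed over the observed (true, predicted) pairs.
    mi = sum(
        (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(true_labels) + entropy(pred_labels)) / 2
    # Both partitions trivial (a single cluster each): treat as a perfect match.
    return mi / denom if denom > 0 else 1.0


def clustering_accuracy(true_labels, pred_labels):
    """ACC: accuracy under the best one-to-one cluster-to-label mapping."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Pad with None so surplus predicted clusters can stay unmatched.
    size = max(len(true_ids), len(pred_ids))
    true_slots = true_ids + [None] * (size - len(true_ids))
    pair_counts = Counter(zip(pred_labels, true_labels))
    best = 0
    # Brute force over assignments; fine for a few clusters only.
    for perm in permutations(true_slots, len(pred_ids)):
        matched = sum(pair_counts[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, matched)
    return best / len(true_labels)


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 2, 2]
    predicted = [1, 1, 0, 0, 2, 2]  # same partition, permuted cluster ids
    print(normalized_mutual_info(gold, predicted))
    print(clustering_accuracy(gold, predicted))
```

Both metrics are invariant to how cluster ids are numbered, which is why they suit clustering evaluation where predicted ids carry no inherent meaning.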

Anthology ID: 2024.emnlp-industry.54
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 719–727
URL: https://aclanthology.org/2024.emnlp-industry.54/
DOI: 10.18653/v1/2024.emnlp-industry.54
Cite (ACL): Anup Pattnaik, Cijo George, Rishabh Kumar Tripathi, Sasanka Vutla, and Jithendra Vepa. 2024. Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 719–727, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation (Pattnaik et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-industry.54.pdf
Data: 2024.emnlp-industry.54.data.zip
Poster: 2024.emnlp-industry.54.poster.pdf
