LLMEdgeRefine: Enhancing Text Clustering with LLM-Based Boundary Point Refinement

Zijin Feng, Luyang Lin, Lingzhi Wang, Hong Cheng, Kam-Fai Wong


Abstract
Text clustering is a fundamental task in natural language processing with numerous applications. However, traditional clustering methods often struggle with domain-specific fine-tuning and the presence of outliers. To address these challenges, we introduce LLMEdgeRefine, an iterative clustering method enhanced by large language models (LLMs) that focuses on edge-point refinement. LLMEdgeRefine improves existing clustering methods by forming super-points to mitigate outliers and by iteratively refining clusters with LLMs for better semantic coherence. Our method achieves superior performance across multiple datasets, outperforming state-of-the-art techniques while offering robustness, adaptability, and cost-efficiency for diverse text clustering applications.
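The pipeline the abstract outlines — cluster embeddings, flag boundary (edge) points, and let an LLM reassign them — can be sketched as follows. This is a minimal illustration, not the paper's implementation: super-point construction is omitted, a plain k-means stands in for the clustering backbone, and the LLM judgment is abstracted as a `judge` callable (here replaced by a nearest-centroid stand-in for the demo).

```python
import numpy as np

def simple_kmeans(X, k, iters=20, seed=0):
    """Plain k-means as a stand-in for the embedding-based clustering backbone."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers

def find_edge_points(X, labels, centers, frac=0.2):
    """Flag the `frac` of points farthest from their assigned centroid
    as boundary points (one plausible notion of "edge points")."""
    dist = np.linalg.norm(X - centers[labels], axis=1)
    cutoff = np.quantile(dist, 1.0 - frac)
    return np.where(dist >= cutoff)[0]

def refine_edges(X, labels, centers, edge_idx, judge):
    """Reassign each edge point to the cluster chosen by `judge`.
    In the paper this role is played by an LLM comparing the point's text
    against candidate clusters; here it is any callable (point, centers) -> id."""
    refined = labels.copy()
    for i in edge_idx:
        refined[i] = judge(X[i], centers)
    return refined

# Demo on synthetic 2-D "embeddings", with a nearest-centroid judge
# standing in for the LLM call.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
labels, centers = simple_kmeans(X, k=2)
edges = find_edge_points(X, labels, centers, frac=0.2)
refined = refine_edges(X, labels, centers, edges,
                       judge=lambda x, C: int(np.argmin(np.linalg.norm(C - x, axis=1))))
```

In the actual method the `judge` step would prompt an LLM with the edge point's text and representative members of each candidate cluster, and the flag/refine loop would be repeated until assignments stabilize.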
Anthology ID:
2024.emnlp-main.1025
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
18455–18462
URL:
https://aclanthology.org/2024.emnlp-main.1025
Cite (ACL):
Zijin Feng, Luyang Lin, Lingzhi Wang, Hong Cheng, and Kam-Fai Wong. 2024. LLMEdgeRefine: Enhancing Text Clustering with LLM-Based Boundary Point Refinement. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18455–18462, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
LLMEdgeRefine: Enhancing Text Clustering with LLM-Based Boundary Point Refinement (Feng et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1025.pdf