HS-GC: Holistic Semantic Embedding and Global Contrast for Effective Text Clustering

Chen Yang, Bin Cao, Jing Fan


Abstract
In this paper, we introduce Holistic Semantic Embedding and Global Contrast (HS-GC), an end-to-end approach to learning instance- and cluster-level representations. Specifically, for instance-level representation learning, we introduce a new loss function that exploits different layers of semantic information in a deep neural network to provide a more holistic semantic text representation. Contrastive learning is applied to these representations to improve the model’s ability to represent text instances. Additionally, for cluster-level representation learning, we propose two strategies that use global updates to construct cluster centers from a global view. An extensive experimental evaluation on five text datasets shows that our method outperforms state-of-the-art models. In particular, on the SearchSnippets dataset, our method leads the latest comparison method by 4.4% in normalized mutual information. On the StackOverflow and TREC datasets, our method improves clustering accuracy by 5.9% and 3.2%, respectively.
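The abstract mentions applying contrastive learning to instance-level text representations. As a rough illustration only (not the paper's actual loss, which additionally combines semantic information from multiple network layers), a standard instance-level contrastive objective such as NT-Xent over two augmented views can be sketched as:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Generic instance-level contrastive (NT-Xent) loss.

    z1, z2: (n, d) arrays of embeddings for two augmented views of the
    same n text instances. Row i of z1 and row i of z2 form a positive
    pair; every other row in the 2n-sample batch acts as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # mask self-similarity
    # index of each sample's positive partner: i <-> i + n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy of the positive against all non-self pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

The temperature and the choice of views are hypothetical here; HS-GC's holistic-semantic loss and its global cluster-center updates are described in the full paper.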
Anthology ID:
2024.lrec-main.732
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
8349–8359
URL:
https://aclanthology.org/2024.lrec-main.732
Cite (ACL):
Chen Yang, Bin Cao, and Jing Fan. 2024. HS-GC: Holistic Semantic Embedding and Global Contrast for Effective Text Clustering. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8349–8359, Torino, Italia. ELRA and ICCL.
Cite (Informal):
HS-GC: Holistic Semantic Embedding and Global Contrast for Effective Text Clustering (Yang et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.732.pdf