GloCOM: A Short Text Neural Topic Model via Global Clustering Context

Quang Duc Nguyen; Tung Nguyen; Duc Anh Nguyen; Linh Ngo Van; Dinh Viet Sang; Thien Huu Nguyen

doi:10.18653/v1/2025.naacl-long.51

Abstract

Uncovering hidden topics from short texts is challenging for traditional and neural models due to data sparsity, which limits word co-occurrence patterns, and label sparsity, stemming from incomplete reconstruction targets. Although data aggregation offers a potential solution, existing neural topic models often overlook it due to time complexity, poor aggregation quality, and difficulty in inferring topic proportions for individual documents. In this paper, we propose a novel model, **GloCOM** (**Glo**bal **C**lustering C**O**ntexts for Topic **M**odels), which addresses these challenges by constructing aggregated global clustering contexts for short documents, leveraging text embeddings from pre-trained language models. GloCOM can infer both global topic distributions for clustering contexts and local distributions for individual short texts. Additionally, the model incorporates these global contexts to augment the reconstruction loss, effectively handling the label sparsity issue. Extensive experiments on short text datasets show that our approach outperforms other state-of-the-art models in both topic quality and document representations.
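The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes the document embeddings are already computed by some pre-trained language model (toy 2-D vectors stand in for them here), uses plain k-means as the clustering step, forms each global context by summing the bag-of-words vectors of a cluster's documents, and augments each document's reconstruction target with a weighted copy of its cluster context. The function names, the fixed weight `eta`, and the additive form of the augmentation are all illustrative assumptions not taken from the source.

```python
import numpy as np

def kmeans_labels(emb, init_idx, n_iter=10):
    """Plain Lloyd's k-means; init_idx selects the initial centroids (assumed deterministic init)."""
    centroids = emb[list(init_idx)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every document to every centroid: shape (N, K).
        dists = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centroids)):
            members = emb[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels

def global_contexts(bow, labels, n_clusters):
    """Aggregate short documents: sum the bag-of-words rows of each cluster."""
    contexts = np.zeros((n_clusters, bow.shape[1]), dtype=float)
    np.add.at(contexts, labels, bow)
    return contexts

def augmented_targets(bow, contexts, labels, eta=0.5):
    """Hypothetical augmentation: each document's target plus eta times its cluster context."""
    return bow + eta * contexts[labels]

# Toy data: four short documents over a four-word vocabulary.
emb = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])  # stand-in embeddings
bow = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])

labels = kmeans_labels(emb, init_idx=(0, 2))
contexts = global_contexts(bow, labels, n_clusters=2)
targets = augmented_targets(bow, contexts, labels, eta=0.5)
```

In this toy case each document contains a single word, yet its augmented target also gives weight to the word used by the other document in its cluster, which illustrates how an aggregated context could offset sparse reconstruction targets.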

Anthology ID: 2025.naacl-long.51
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1109–1124
URL: https://aclanthology.org/2025.naacl-long.51/
DOI: 10.18653/v1/2025.naacl-long.51
Cite (ACL): Quang Duc Nguyen, Tung Nguyen, Duc Anh Nguyen, Linh Ngo Van, Sang Dinh, and Thien Huu Nguyen. 2025. GloCOM: A Short Text Neural Topic Model via Global Clustering Context. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1109–1124, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): GloCOM: A Short Text Neural Topic Model via Global Clustering Context (Nguyen et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.51.pdf
