Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

Xiaobao Wu, Anh Tuan Luu, Xinshuai Dong


Abstract
To overcome data sparsity in short text topic modeling, existing methods commonly rely on data augmentation or on the characteristics of short texts to introduce more word co-occurrence information. However, most of them do not make full use of the augmented data or these characteristics: they insufficiently learn the relations among samples, so semantically similar text pairs end up with dissimilar topic distributions. To better address data sparsity, we propose a novel short text topic modeling framework, the Topic-Semantic Contrastive Topic Model (TSCTM). To sufficiently model the relations among samples, we employ a new contrastive learning method with efficient positive and negative sampling strategies based on topic semantics. This contrastive learning method refines the representations, enriches the learning signals, and thus mitigates the sparsity issue. Extensive experimental results show that TSCTM outperforms state-of-the-art baselines regardless of whether data augmentation is available, producing high-quality topics and topic distributions.
Anthology ID:
2022.emnlp-main.176
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2748–2760
URL:
https://aclanthology.org/2022.emnlp-main.176
DOI:
10.18653/v1/2022.emnlp-main.176
Cite (ACL):
Xiaobao Wu, Anh Tuan Luu, and Xinshuai Dong. 2022. Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2748–2760, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning (Wu et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.176.pdf