Topic Modeling for Short Texts with Large Language Models

Tomoki Doi, Masaru Isonuma, Hitomi Yanaka


Abstract
As conventional topic models rely on word co-occurrence to infer latent topics, topic modeling for short texts has been a long-standing challenge. Large Language Models (LLMs) can potentially overcome this challenge by contextually learning the meanings of words via pretraining. In this paper, we study two approaches to using LLMs for topic modeling: parallel prompting and sequential prompting. Although input length limitations prevent LLMs from processing many texts at once, an arbitrary number of texts can be handled by splitting them into smaller subsets and processing the subsets in parallel or sequentially. Our experimental results demonstrate that our methods identify more coherent topics than existing approaches while maintaining the diversity of the induced topics. Furthermore, we find that the inferred topics cover the input texts to some extent, and that hallucinated topics are rarely generated.
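The abstract describes the two prompting strategies only at a high level. The sketch below is one plausible reading of them in Python, not the authors' implementation: the call_llm callable, the subset size, and the prompt wording are all hypothetical placeholders.

# Minimal sketch of parallel and sequential prompting over text subsets.
# Hypothetical: call_llm stands in for any chat-completion API, and the
# prompt wording is illustrative rather than taken from the paper.
from typing import Callable, List

def chunk(texts: List[str], size: int) -> List[List[str]]:
    # Split the corpus into subsets small enough for the model's context window.
    return [texts[i:i + size] for i in range(0, len(texts), size)]

def parallel_prompting(texts: List[str], call_llm: Callable[[str], str],
                       size: int = 20) -> List[str]:
    # Prompt the LLM on each subset independently, then merge the partial topic lists.
    partial_topics: List[str] = []
    for subset in chunk(texts, size):
        prompt = ("List the latent topics, one short label per line, in these texts:\n"
                  + "\n".join(subset))
        partial_topics.extend(call_llm(prompt).splitlines())
    merge_prompt = ("Merge these candidate topics into a deduplicated list:\n"
                    + "\n".join(partial_topics))
    return call_llm(merge_prompt).splitlines()

def sequential_prompting(texts: List[str], call_llm: Callable[[str], str],
                         size: int = 20) -> List[str]:
    # Process subsets one after another, carrying the running topic list into each prompt.
    topics: List[str] = []
    for subset in chunk(texts, size):
        prompt = ("Current topics:\n" + "\n".join(topics)
                  + "\n\nRevise or extend the topic list given these new texts:\n"
                  + "\n".join(subset))
        topics = call_llm(prompt).splitlines()
    return topics

In the parallel variant the subsets are independent and a final merge prompt deduplicates the candidate topics; in the sequential variant the running topic list is threaded through each prompt, so later subsets can refine topics induced from earlier ones.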
Anthology ID:
2024.acl-srw.3
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Xiyan Fu, Eve Fleisig
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
21–33
URL:
https://aclanthology.org/2024.acl-srw.3
Cite (ACL):
Tomoki Doi, Masaru Isonuma, and Hitomi Yanaka. 2024. Topic Modeling for Short Texts with Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 21–33, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Topic Modeling for Short Texts with Large Language Models (Doi et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-srw.3.pdf