Dynamic Topic Modeling by Clustering Embeddings from Pretrained Language Models: A Research Proposal

Anton Eklund, Mona Forsman, Frank Drewes


Abstract
A new trend in topic modeling research is Neural Topic Modeling by Clustering document Embeddings (NTM-CE), in which document embeddings created with a pretrained language model are clustered into topics. Studies have evaluated static NTM-CE models and found that they perform comparably to, or even better than, other topic models. An important extension of static topic modeling is making the models dynamic, allowing the study of topic evolution over time, as well as detecting emerging and disappearing topics. In this research proposal, we present two research questions aimed at understanding dynamic topic modeling with NTM-CE theoretically and practically. To answer these, we propose four phases with the aim of establishing evaluation methods for dynamic topic modeling, finding NTM-CE-specific properties, and creating a framework for dynamic NTM-CE. For evaluation, we propose to use both quantitative measurements of coherence and human evaluation supported by our recently developed tool.
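To illustrate the core idea of NTM-CE described above, the following is a minimal, hypothetical sketch: document embeddings (which a real pipeline would obtain from a pretrained language model such as a sentence encoder) are grouped by a simple k-means clustering, and each resulting cluster is treated as a topic. The embeddings, document names, and k-means implementation here are illustrative stand-ins, not the authors' actual method.

```python
import math
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means: returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initialize centroids from the data
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(range(k), key=lambda c: math.dist(v, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Hypothetical 2-d "document embeddings"; a real NTM-CE pipeline would
# produce high-dimensional vectors with a pretrained language model.
docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
embs = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.80), (0.95, 0.85)]

labels = kmeans(embs, k=2)
topics = {}
for doc, lab in zip(docs, labels):
    topics.setdefault(lab, []).append(doc)
print(topics)
```

A dynamic extension, as proposed in the paper, would additionally track how such clusters split, merge, appear, or vanish across time slices of the corpus.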
Anthology ID:
2022.aacl-srw.12
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop
Month:
November
Year:
2022
Address:
Online
Editors:
Yan Hanqi, Yang Zonghan, Sebastian Ruder, Wan Xiaojun
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
84–91
URL:
https://aclanthology.org/2022.aacl-srw.12
Cite (ACL):
Anton Eklund, Mona Forsman, and Frank Drewes. 2022. Dynamic Topic Modeling by Clustering Embeddings from Pretrained Language Models: A Research Proposal. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 84–91, Online. Association for Computational Linguistics.
Cite (Informal):
Dynamic Topic Modeling by Clustering Embeddings from Pretrained Language Models: A Research Proposal (Eklund et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-srw.12.pdf