Modeling Text using the Continuous Space Topic Model with Pre-Trained Word Embeddings

Seiichi Inoue, Taichi Aida, Mamoru Komachi, Manabu Asai


Abstract
In this study, we propose a model that extends the continuous space topic model (CSTM), which flexibly controls word probability in a document, using pre-trained word embeddings. To develop the proposed model, we pre-train word embeddings, which capture the semantics of words, and plug them into the CSTM. Intrinsic experimental results show that the proposed model outperforms the CSTM in terms of perplexity and convergence speed. Furthermore, extrinsic experimental results show that the proposed model is more useful than the baseline model for a document classification task. We also show qualitatively that the latent coordinates obtained by training the proposed model are better than those of the baseline model.
Anthology ID:
2021.acl-srw.15
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
138–147
URL:
https://aclanthology.org/2021.acl-srw.15
DOI:
10.18653/v1/2021.acl-srw.15
PDF:
https://aclanthology.org/2021.acl-srw.15.pdf