Self-Supervised Neural Topic Modeling

Seyed Ali Bahrainian, Martin Jaggi, Carsten Eickhoff


Abstract
Topic models are useful tools for analyzing and interpreting the main underlying themes of large corpora of text. Most topic models rely on word co-occurrence to compute a topic, i.e., a weighted set of words that together represent a high-level semantic concept. In this paper, we propose a new lightweight Self-Supervised Neural Topic Model (SNTM) that learns rich context by jointly learning a topic representation from three co-occurring words and the document the triple originates from. Our experimental results indicate that SNTM outperforms existing topic models on topic coherence metrics as well as document clustering accuracy. Beyond coherence and clustering performance, the proposed model is also computationally efficient and easy to train.
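The abstract's core mechanism, pairing a triple of co-occurring words with the document the triple comes from, can be illustrated with a short sketch. The hypothetical PyTorch code below is not the authors' implementation (see the paper and the ali-bahrainian/sntm repository for that); the class name, the averaging of the triple's topic weights, and the KL agreement loss are illustrative assumptions about how such a joint objective could look.

# Hypothetical sketch of a triple-plus-document topic objective; not the
# SNTM architecture from the paper, only an illustration of the idea.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTripleTopicModel(nn.Module):
    """Learns per-word topic weights so that three co-occurring words
    and their source document imply the same topic distribution."""
    def __init__(self, vocab_size: int, num_topics: int):
        super().__init__()
        # word_topic[w] holds unnormalized topic weights for word w
        self.word_topic = nn.Embedding(vocab_size, num_topics)

    def forward(self, triple_ids: torch.Tensor, doc_bow: torch.Tensor):
        # Topic distribution implied by the word triple: average the
        # three words' topic weights, then normalize (log-probs).
        triple_logits = self.word_topic(triple_ids).mean(dim=1)
        triple_dist = F.log_softmax(triple_logits, dim=-1)
        # Topic distribution implied by the whole document: weight each
        # word's topic vector by its bag-of-words frequency.
        doc_logits = doc_bow @ self.word_topic.weight
        doc_dist = F.softmax(doc_logits, dim=-1)
        # Self-supervised signal: triple and document should agree.
        return F.kl_div(triple_dist, doc_dist, reduction="batchmean")

# Usage with toy shapes: a batch of 2 triples over a 100-word vocabulary.
model = ToyTripleTopicModel(vocab_size=100, num_topics=10)
triples = torch.randint(0, 100, (2, 3))  # 3 co-occurring word ids each
docs = torch.rand(2, 100)                # bag-of-words weights per document
loss = model(triples, docs)
loss.backward()

In this toy form the only learned parameters are the per-word topic weights, and the supervision comes entirely from the corpus itself: every (triple, document) training pair is extracted without any labels, which is what makes the objective self-supervised.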
Anthology ID:
2021.findings-emnlp.284
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3341–3350
URL:
https://aclanthology.org/2021.findings-emnlp.284
DOI:
10.18653/v1/2021.findings-emnlp.284
Bibkey:
Cite (ACL):
Seyed Ali Bahrainian, Martin Jaggi, and Carsten Eickhoff. 2021. Self-Supervised Neural Topic Modeling. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3341–3350, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Self-Supervised Neural Topic Modeling (Bahrainian et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.284.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.284.mp4
Code:
ali-bahrainian/sntm