PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields

Weijing Huang


Abstract
Recently emerged phrase-level topic models can provide topics of phrases, which are easy for humans to read. However, these models lack the ability to capture the correlation structure among the numerous discovered topics. We propose a novel topic model, PhraseCTM, and a two-stage method to find correlated topics at the phrase level. In the first stage, we train PhraseCTM, which models the generation of words and phrases simultaneously by linking phrases and their component words within Markov Random Fields when they are semantically coherent. In the second stage, we generate the correlation of topics from PhraseCTM. We evaluate our method with a quantitative experiment and a human study, showing that correlated topic modeling on phrases is a good and practical way to interpret the underlying themes of a corpus.
Anthology ID:
P18-2083
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
521–526
URL:
https://aclanthology.org/P18-2083
DOI:
10.18653/v1/P18-2083
Cite (ACL):
Weijing Huang. 2018. PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 521–526, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields (Huang, ACL 2018)
PDF:
https://aclanthology.org/P18-2083.pdf
Poster:
 P18-2083.Poster.pdf