Topical Coherence in LDA-based Models through Induced Segmentation

Hesam Amoualian, Wei Lu, Eric Gaussier, Georgios Balikas, Massih R. Amini, Marianne Clausel


Abstract
This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula, which binds the topics associated with the words of a segment. In addition, the model relies on both document-specific and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six publicly available datasets, show the effectiveness of our model in terms of perplexity; Normalized Pointwise Mutual Information (NPMI), which captures the coherence between the generated topics; and the Micro F1 measure for text classification.
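As a rough illustration of the coherence measure named in the abstract, the sketch below computes Normalized Pointwise Mutual Information (NPMI) for one topic's top words. It is not the authors' evaluation code. The function name `npmi_topic_coherence`, the use of document-level co-occurrence, and the conventions for the boundary cases are all assumptions made here for illustration; the abstract does not specify the reference corpus, the co-occurrence window, or the number of top words used.

```python
import math
from itertools import combinations


def npmi_topic_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Assumption: probabilities are estimated from document-level
    co-occurrence counts over `documents` (a list of token lists).
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # Assumed convention: words that never co-occur score -1.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Assumed convention: avoids dividing by -log(1) = 0.
            scores.append(1.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy corpus (made up for this example).
toy_docs = [["a", "b"], ["a", "b"], ["c"], ["c"]]
always_together = npmi_topic_coherence(["a", "b"], toy_docs)
never_together = npmi_topic_coherence(["a", "c"], toy_docs)
```

Under these assumptions, scores lie in [-1, 1]: words that always appear together approach 1, independent words score about 0, and words that never co-occur score -1. A topic's coherence is then the average over its word pairs.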
Anthology ID:
P17-1165
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1799–1809
URL:
https://aclanthology.org/P17-1165
DOI:
10.18653/v1/P17-1165
Cite (ACL):
Hesam Amoualian, Wei Lu, Eric Gaussier, Georgios Balikas, Massih R. Amini, and Marianne Clausel. 2017. Topical Coherence in LDA-based Models through Induced Segmentation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1799–1809, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Topical Coherence in LDA-based Models through Induced Segmentation (Amoualian et al., ACL 2017)
PDF:
https://aclanthology.org/P17-1165.pdf
Code
 balikasg/topicModelling