Topic Modeling With Topological Data Analysis

Ciarán Byrne, Danijela Horak, Karo Moilanen, Amandla Mabona


Abstract
Recent unsupervised topic modelling approaches that use clustering techniques on word, token or document embeddings can extract coherent topics. A common limitation of such approaches is that they reveal nothing about inter-topic relationships which are essential in many real-world application domains. We present an unsupervised topic modelling method which harnesses Topological Data Analysis (TDA) to extract a topological skeleton of the manifold upon which contextualised word embeddings lie. We demonstrate that our approach, which performs on par with a recent baseline, is able to construct a network of coherent topics together with meaningful relationships between them.
Anthology ID:
2022.emnlp-main.792
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11514–11533
URL:
https://aclanthology.org/2022.emnlp-main.792
DOI:
10.18653/v1/2022.emnlp-main.792
Cite (ACL):
Ciarán Byrne, Danijela Horak, Karo Moilanen, and Amandla Mabona. 2022. Topic Modeling With Topological Data Analysis. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11514–11533, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Topic Modeling With Topological Data Analysis (Byrne et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.792.pdf