Topic Modeling With Topological Data Analysis
Ciarán Byrne
Danijela Horak
Karo Moilanen
Amandla Mabona
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Recent unsupervised topic modelling approaches that use clustering techniques on word, token or document embeddings can extract coherent topics. A common limitation of such approaches is that they reveal nothing about inter-topic relationships, which are essential in many real-world application domains. We present an unsupervised topic modelling method which harnesses Topological Data Analysis (TDA) to extract a topological skeleton of the manifold upon which contextualised word embeddings lie. We demonstrate that our approach, which performs on par with a recent baseline, is able to construct a network of coherent topics together with meaningful relationships between them.
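To make the idea of a "topological skeleton" of an embedding cloud concrete, the following is a minimal sketch of one common TDA construction, a Mapper-style graph. The abstract does not say which construction the method uses, so Mapper itself is an assumption here, as are the choices of a first-principal-component lens, an overlapping interval cover, and single-linkage (epsilon-graph) clustering. The names `pca_lens`, `eps_components` and `mapper_graph` are hypothetical and this is not the authors' pipeline. In a topic-modelling reading, each node (a cluster of embeddings) would play the role of a topic, and an edge (shared points between clusters) would suggest a relationship between topics.

```python
import numpy as np

# Hypothetical Mapper-style sketch; not the paper's implementation.

def pca_lens(X):
    """Lens function: project points onto their first principal component."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions; take the first one.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

def eps_components(X, idx, eps):
    """Single-linkage clustering: connected components of the eps-neighbourhood graph on X[idx]."""
    parent = {i: i for i in idx}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            if np.linalg.norm(X[a] - X[b]) <= eps:
                parent[find(a)] = find(b)

    groups = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return [frozenset(g) for g in groups.values()]

def mapper_graph(X, n_intervals=5, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are sets of point indices, edges join nodes sharing a point."""
    f = pca_lens(X)
    lo, hi = f.min(), f.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Each interval is widened on both sides so neighbouring intervals overlap.
        start = lo + k * length - overlap * length
        end = lo + (k + 1) * length + overlap * length
        idx = [i for i in range(len(X)) if start <= f[i] <= end]
        if idx:
            nodes.extend(eps_components(X, idx, eps))
    edges = set()
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            if nodes[a] & nodes[b]:
                edges.add((a, b))
    return nodes, edges
```

As a usage example, points sampled along a straight line produce a path-shaped skeleton (one node per interval, edges only between consecutive intervals), while two well-separated groups covered by a single interval produce two disconnected nodes. With real contextualised embeddings one would substitute a high-dimensional embedding matrix for `X` and tune the cover and clustering parameters; the simple lens and clustering above are chosen only for readability.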