Marcus Zibrowius


2022

Dialogue Term Extraction using Transfer Learning and Topological Data Analysis
Renato Vukovic | Michael Heck | Benjamin Ruppik | Carel van Niekerk | Marcus Zibrowius | Milica Gasic
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Goal-oriented dialogue systems were originally designed as a natural language interface to a fixed data-set of entities that users might inquire about, further described by domain, slots and values. As we move towards adaptable dialogue systems where knowledge about domains, slots and values may change, there is an increasing need to automatically extract these terms from raw dialogues or related non-dialogue data on a large scale. In this paper, we take an important step in this direction by exploring different features that can enable systems to discover realisations of domains, slots and values in dialogues in a purely data-driven fashion. The features that we examine stem from word embeddings, language modelling features, and topological features of the word embedding space. To examine the utility of each feature set, we train a seed model on the widely used MultiWOZ data-set. Then, we apply this model to a different corpus, the Schema-Guided Dialogue data-set. Our method outperforms the previously proposed approach that relies solely on word embeddings. We also demonstrate that each of the features is responsible for discovering different kinds of content. We believe our results warrant further research towards ontology induction, and continued harnessing of topological data analysis for dialogue and natural language processing research.

2020

Topology of Word Embeddings: Singularities Reflect Polysemy
Alexander Jakubowski | Milica Gasic | Marcus Zibrowius
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

The manifold hypothesis suggests that word vectors live on a submanifold within their ambient vector space. We argue that we should, more accurately, expect them to live on a pinched manifold: a singular quotient of a manifold obtained by identifying some of its points. The identified, singular points correspond to polysemous words, i.e. words with multiple meanings. Our point of view suggests that monosemous and polysemous words can be distinguished based on the topology of their neighbourhoods. We present two kinds of empirical evidence to support this point of view: (1) We introduce a topological measure of polysemy based on persistent homology that correlates well with the actual number of meanings of a word. (2) We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation that produces competitive results.
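
To illustrate the idea that the topology of a word's neighbourhood may hint at polysemy, here is a minimal sketch on toy 2-D points. It is not the measure introduced in the paper. It only computes 0-dimensional persistent homology of a punctured neighbourhood, using the standard fact that the death times of connected components in a Vietoris-Rips filtration are the edge lengths of a minimum spanning tree. The function names and the `radius` and `scale` parameters are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Death times of H0 classes in the Vietoris-Rips filtration.

    Computed with Kruskal's algorithm: each edge that merges two
    components kills one class at that edge's length.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def punctured_component_count(center, points, radius, scale):
    """Number of components, at the given scale, among the points that lie
    within `radius` of `center`, excluding `center` itself.

    Hypothetical stand-in for a polysemy score, not the paper's measure.
    """
    neighbours = [p for p in points if 0 < math.dist(center, p) <= radius]
    if not neighbours:
        return 0
    deaths = zero_dim_persistence(neighbours)
    # One component never dies; each death later than `scale` adds one more.
    return 1 + sum(d > scale for d in deaths)


# Toy data: neighbours forming one cluster versus two well-separated clusters.
one_cluster = [(1.0, 0.0), (1.1, 0.0), (1.0, 0.1)]
two_clusters = one_cluster + [(-1.0, 0.0), (-1.1, 0.0), (-1.0, -0.1)]

print(punctured_component_count((0.0, 0.0), one_cluster, 2.0, 0.5))   # 1
print(punctured_component_count((0.0, 0.0), two_clusters, 2.0, 0.5))  # 2
```

In this toy setting, a point whose punctured neighbourhood splits into several persistent components plays the role of a singular, "pinched" point. Real word embeddings would require a distance and scale chosen for the embedding space, and a persistent homology library for dimensions above zero.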