Dimitri Kartsaklis


A CCG-Based Version of the DisCoCat Framework
Richie Yeung | Dimitri Kartsaklis
Proceedings of the 2021 Workshop on Semantic Spaces at the Intersection of NLP, Physics, and Cognitive Science (SemSpace)

While the DisCoCat model (Coecke et al., 2010) has proved a valuable tool for studying compositional aspects of language at the level of semantics, its strong dependency on pregroup grammars poses important restrictions: first, it prevents large-scale experimentation due to the absence of a pregroup parser; and second, it limits the expressibility of the model to context-free grammars. In this paper we solve these problems by reformulating DisCoCat as a passage from Combinatory Categorial Grammar (CCG) to a category of semantics. We start by showing that standard categorial grammars can be expressed as a biclosed category, where all rules emerge as currying/uncurrying the identity; we then proceed to model permutation-inducing rules by exploiting the symmetry of the compact closed category encoding the word meaning. We provide a proof of concept for our method by converting “Alice in Wonderland” into DisCoCat form, and we make the resulting corpus available to the community.


Conversational Semantic Parsing for Dialog State Tracking
Jianpeng Cheng | Devang Agrawal | Héctor Martínez Alonso | Shruti Bhargava | Joris Driesen | Federico Flego | Dain Kaplan | Dimitri Kartsaklis | Lin Li | Dhivya Piraviperumal | Jason D. Williams | Hong Yu | Diarmuid Ó Séaghdha | Anders Johannsen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which leads to a ~20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.


Card-660: Cambridge Rare Word Dataset - a Reliable Benchmark for Infrequent Word Representation Models
Mohammad Taher Pilehvar | Dimitri Kartsaklis | Victor Prokhorov | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Rare word representation has recently enjoyed a surge of interest, owing to the crucial role that effective handling of infrequent words can play in accurate semantic understanding. However, there is a paucity of reliable benchmarks for evaluation and comparison of these techniques. We show in this paper that the only existing benchmark (the Stanford Rare Word dataset) suffers from low-confidence annotations and limited vocabulary; hence, it does not constitute a solid comparison framework. In order to fill this evaluation gap, we propose the Cambridge Rare Word Dataset (Card-660), an expert-annotated word similarity dataset which provides a highly reliable, yet challenging, benchmark for rare word representation techniques. Through a set of experiments we show that even the best mainstream word embeddings, with millions of words in their vocabularies, are unable to achieve performances higher than 0.43 (Pearson correlation) on the dataset, compared to a human-level upper bound of 0.90. We release the dataset and the annotation materials at https://pilehvar.github.io/card-660/.

Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs
Dimitri Kartsaklis | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities. The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state-of-the-art results.


Distributional Inclusion Hypothesis for Tensor-based Composition
Dimitri Kartsaklis | Mehrnoosh Sadrzadeh
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

According to the distributional inclusion hypothesis, entailment between words can be measured via the feature inclusions of their distributional vectors. In recent work, we showed how this hypothesis can be extended from words to phrases and sentences in the setting of compositional distributional semantics. This paper focuses on inclusion properties of tensors; its main contribution is a theoretical and experimental analysis of how feature inclusion works in different concrete models of verb tensors. We present results for relational, Frobenius, projective, and holistic methods and compare them to the simple vector addition, multiplication, min, and max models. The degrees of entailment thus obtained are evaluated via a variety of existing word-based measures, such as Weeds’ and Clarke’s, KL-divergence, APinc, balAPinc, and two of our previously proposed metrics at the phrase/sentence level. We perform experiments on three entailment datasets, investigating which version of tensor-based composition achieves the highest performance when combined with the sentence-level measures.

Compositional Distributional Models of Meaning
Mehrnoosh Sadrzadeh | Dimitri Kartsaklis
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

Compositional distributional models of meaning (CDMs) provide a function that produces a vectorial representation for a phrase or a sentence by composing the vectors of its words. Being the natural evolution of the traditional and well-studied distributional models at the word level, CDMs are steadily evolving into a popular and active area of NLP. This COLING 2016 tutorial aims at providing a concise introduction to this emerging field, presenting the different classes of CDMs and the various issues related to them in sufficient detail.


A Frobenius Model of Information Structure in Categorical Compositional Distributional Semantics
Dimitri Kartsaklis | Mehrnoosh Sadrzadeh
Proceedings of the 14th Meeting on the Mathematics of Language (MoL 2015)

Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning
Jianpeng Cheng | Dimitri Kartsaklis
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing


Resolving Lexical Ambiguity in Tensor Regression Models of Meaning
Dimitri Kartsaklis | Nal Kalchbrenner | Mehrnoosh Sadrzadeh
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Evaluating Neural Word Representations in Tensor-Based Compositional Settings
Dmitrijs Milajevs | Dimitri Kartsaklis | Mehrnoosh Sadrzadeh | Matthew Purver
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Separating Disambiguation from Composition in Distributional Semantics
Dimitri Kartsaklis | Mehrnoosh Sadrzadeh | Stephen Pulman
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
Dimitri Kartsaklis | Mehrnoosh Sadrzadeh
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


A Unified Sentence Space for Categorical Distributional-Compositional Semantics: Theory and Experiments
Dimitri Kartsaklis | Mehrnoosh Sadrzadeh | Stephen Pulman
Proceedings of COLING 2012: Posters