Stergios Chatzikyriakidis


2024

pdf bib
OYXOY: A Modern NLP Test Suite for Modern Greek
Konstantinos Kogkalidis | Stergios Chatzikyriakidis | Eirini Giannikouri | Vasiliki Katsouli | Christina Klironomou | Christina Koula | Dimitris Papadakis | Thelka Pasparaki | Erofili Psaltaki | Efthymia Sakellariou | Charikleia Soupiona
Findings of the Association for Computational Linguistics: EACL 2024

This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation (through example comparison or sense selection) and metaphor detection. More than language-adapted replicas of existing tasks, we contribute two innovations which will resonate with the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just one, but rather all possible inference labels, accounting for possible shifts due to e.g. ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. Alongside each task, we conduct experiments using currently available state of the art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for expedited progress in order for the Greek NLP ecosystem to keep pace with contemporary mainstream research.

2023

pdf bib
Proceedings of the 4th Natural Logic Meets Machine Learning Workshop
Stergios Chatzikyriakidis | Valeria de Paiva
Proceedings of the 4th Natural Logic Meets Machine Learning Workshop

2022

pdf bib
How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets
Aarne Talman | Marianna Apidianaki | Stergios Chatzikyriakidis | Jörg Tiedemann
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

A central question in natural language understanding (NLU) research is whether high performance demonstrates the models’ strong reasoning capabilities. We present an extensive series of controlled experiments where pre-trained language models are exposed to data that have undergone specific corruption transformations. These involve removing instances of specific word classes and often lead to non-sensical sentences. Our results show that performance remains high on most GLUE tasks when the models are fine-tuned or tested on corrupted data, suggesting that they leverage other cues for prediction even in non-sensical contexts. Our proposed data transformations can be used to assess the extent to which a specific dataset constitutes a proper testbed for evaluating models’ language understanding capabilities.

pdf bib
Pre-trained Models or Feature Engineering: The Case of Dialectal Arabic
Kathrein Abu Kwaik | Stergios Chatzikyriakidis | Simon Dobnik
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

The usage of social media platforms has resulted in the proliferation of work on Arabic Natural Language Processing (ANLP), including the development of resources. There is also an increased interest in processing Arabic dialects and a number of models and algorithms have been utilised for the purpose of Dialectal Arabic Natural Language Processing (DANLP). In this paper, we conduct a comparison study between some of the most well-known and most commonly used methods in NLP in order to test their performance on different corpora and two NLP tasks: Dialect Identification and Sentiment Analysis. In particular, we compare three general classes of models: a) traditional Machine Learning models with features, b) classic Deep Learning architectures (LSTMs) with pre-trained word embeddings and lastly c) different Bidirectional Encoder Representations from Transformers (BERT) models such as (Multilingual-BERT, Ara-BERT, and Twitter-Arabic-BERT). The results of the comparison show that using feature-based classification can still compete with BERT models in these dialectal Arabic contexts. The use of transformer models have the ability to outperform traditional Machine Learning approaches, depending on the type of text they have been trained on, in contrast to classic Deep Learning models like LSTMs which do not perform well on the tasks

pdf bib
Proceedings of the 3rd Natural Logic Meets Machine Learning Workshop (NALOMA III)
Aikaterini-Lida Kalouli | Stergios Chatzikyriakidis
Proceedings of the 3rd Natural Logic Meets Machine Learning Workshop (NALOMA III)

pdf bib
Fine-grained Entailment: Resources for Greek NLI and Precise Entailment
Eirini Amanaki | Jean-Philippe Bernardy | Stergios Chatzikyriakidis | Robin Cooper | Simon Dobnik | Aram Karimi | Adam Ek | Eirini Chrysovalantou Giannikouri | Vasiliki Katsouli | Ilias Kolokousis | Eirini Chrysovalantou Mamatzaki | Dimitrios Papadakis | Olga Petrova | Erofili Psaltaki | Charikleia Soupiona | Effrosyni Skoulataki | Christina Stefanidou
Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference

In this paper, we present a number of fine-grained resources for Natural Language Inference (NLI). In particular, we present a number of resources and validation methods for Greek NLI and a resource for precise NLI. First, we extend the Greek version of the FraCaS test suite to include examples where the inference is directly linked to the syntactic/morphological properties of Greek. The new resource contains an additional 428 examples, making it in total a dataset of 774 examples. Expert annotators have been used in order to create the additional resource, while extensive validation of the original Greek version of the FraCaS by non-expert and expert subjects is performed. Next, we continue the work initiated by (CITATION), according to which a subset of the RTE problems have been labeled for missing hypotheses and we present a dataset an order of magnitude larger, annotating the whole SuperGlUE/RTE dataset with missing hypotheses. Lastly, we provide a de-dropped version of the Greek XNLI dataset, where the pronouns that are missing due to the pro-drop nature of the language are inserted. We then run some models to see the effect of that insertion and report the results.

2021

pdf bib
NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance
Aarne Talman | Marianna Apidianaki | Stergios Chatzikyriakidis | Jörg Tiedemann
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Pre-trained neural language models give high performance on natural language inference (NLI) tasks. But whether they actually understand the meaning of the processed sequences is still unclear. We propose a new diagnostics test suite which allows to assess whether a dataset constitutes a good testbed for evaluating the models’ meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to non-sensical sentence pairs. If model accuracy on the corrupted data remains high, then the dataset is likely to contain statistical biases and artefacts that guide prediction. Inversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models’ reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high quality data for NLI tasks.

pdf bib
Proceedings of the ESSLLI 2021 Workshop on Computing Semantics with Types, Frames and Related Structures
Stergios Chatzikyriakidis | Rainer Osswald
Proceedings of the ESSLLI 2021 Workshop on Computing Semantics with Types, Frames and Related Structures

pdf bib
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)
Christine Howes | Simon Dobnik | Ellen Breitholtz | Stergios Chatzikyriakidis
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

pdf bib
Can predicate-argument relationships be extracted from UD trees?
Adam Ek | Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

In this paper we investigate the possibility of extracting predicate-argument relations from UD trees (and enhanced UD graphs). Con- cretely, we apply UD parsers on an En- glish question answering/semantic-role label- ing data set (FitzGerald et al., 2018) and check if the annotations reflect the relations in the resulting parse trees, using a small number of rules to extract this information. We find that 79.1% of the argument-predicate pairs can be found in this way, on the basis of Ud- ify (Kondratyuk and Straka, 2019). Error anal- ysis reveals that half of the error cases are at- tributable to shortcomings in the dataset. The remaining errors are mostly due to predicate- argument relations not being extractible algo- rithmically from the UD trees (requiring se- mantic reasoning to be resolved). The parser itself is only responsible for a small portion of errors. Our analysis suggests a number of improvements to the UD annotation schema: we propose to enhance the schema in four ways, in order to capture argument-predicate relations. Additionally, we propose improve- ments regarding data collection for question answering/semantic-role labeling data.

pdf bib
From compositional semantics to Bayesian pragmatics via logical inference
Julian Grove | Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

Formal semantics in the Montagovian tradition provides precise meaning characterisations, but usually without a formal theory of the pragmatics of contextual parameters and their sensitivity to background knowledge. Meanwhile, formal pragmatic theories make explicit predictions about meaning in context, but generally without a well-defined compositional semantics. We propose a combined framework for the semantic and pragmatic interpretation of sentences in the face of probabilistic knowledge. We do so by (1) extending a Montagovian interpretation scheme to generate a distribution over possible meanings, and (2) generating a posterior for this distribution using a variant of the Rational Speech Act (RSA) models, but generalised to arbitrary propositions. These aspects of our framework are tied together by evaluating entailment under probabilistic uncertainty. We apply our model to anaphora resolution and show that it provides expected biases under suitable assumptions about the distributions of lexical and world-knowledge. Further, we observe that the model’s output is robust to variations in its parameters within reasonable ranges.

pdf bib
Applied Temporal Analysis: A Complete Run of the FraCaS Test Suite
Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

In this paper, we propose an implementation of temporal semantics that translates syntax trees to logical formulas, suitable for consumption by the Coq proof assistant. The analysis supports a wide range of phenomena including: temporal references, temporal adverbs, aspectual classes and progressives. The new semantics are built on top of a previous system handling all sections of the FraCaS test suite except the temporal reference section, and we obtain an accuracy of 81 percent overall and 73 percent for the problems explicitly marked as related to temporal reference. To the best of our knowledge, this is the best performance of a logical system on the whole of the FraCaS.

2020

pdf bib
An Arabic Tweets Sentiment Analysis Dataset (ATSAD) using Distant Supervision and Self Training
Kathrein Abu Kwaik | Stergios Chatzikyriakidis | Simon Dobnik | Motaz Saad | Richard Johansson
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

As the number of social media users increases, they express their thoughts, needs, socialise and publish their opinions reviews. For good social media sentiment analysis, good quality resources are needed, and the lack of these resources is particularly evident for languages other than English, in particular Arabic. The available Arabic resources lack of from either the size of the corpus or the quality of the annotation. In this paper, we present an Arabic Sentiment Analysis Corpus collected from Twitter, which contains 36K tweets labelled into positive and negative. We employed distant supervision and self-training approaches into the corpus to annotate it. Besides, we release an 8K tweets manually annotated as a gold standard. We evaluated the corpus intrinsically by comparing it to human classification and pre-trained sentiment analysis models, Moreover, we apply extrinsic evaluation methods exploiting sentiment analysis task and achieve an accuracy of 86%.

pdf bib
Improving the Precision of Natural Textual Entailment Problem Datasets
Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we propose a method to modify natural textual entailment problem datasets so that they better reflect a more precise notion of entailment. We apply this method to a subset of the Recognizing Textual Entailment datasets. We thus obtain a new corpus of entailment problems, which has the following three characteristics: 1. it is precise (does not leave out implicit hypotheses) 2. it is based on “real-world” texts (i.e. most of the premises were written for purposes other than testing textual entailment). 3. its size is 150. Broadly, the method that we employ is to make any missing hypotheses explicit using a crowd of experts. We discuss the relevance of our method in improving existing NLI datasets to be more fit for precise reasoning and we argue that this corpus can be the basis a first step towards wide-coverage testing of precise natural-language inference systems.

pdf bib
Proceedings of the Probability and Meaning Conference (PaM 2020)
Christine Howes | Stergios Chatzikyriakidis | Adam Ek | Vidya Somashekarappa
Proceedings of the Probability and Meaning Conference (PaM 2020)

pdf bib
How does Punctuation Affect Neural Models in Natural Language Inference
Adam Ek | Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the Probability and Meaning Conference (PaM 2020)

Natural Language Inference models have reached almost human-level performance but their generalisation capabilities have not been yet fully characterized. In particular, sensitivity to small changes in the data is a current area of investigation. In this paper, we focus on the effect of punctuation on such models. Our findings can be broadly summarized as follows: (1) irrelevant changes in punctuation are correctly ignored by the recent transformer models (BERT) while older RNN-based models were sensitive to them. (2) All models, both transformers and RNN-based models, are incapable of taking into account small relevant changes in the punctuation.

2019

pdf bib
Bayesian Inference Semantics: A Modelling System and A Test Suite
Jean-Philippe Bernardy | Rasmus Blanck | Stergios Chatzikyriakidis | Shalom Lappin | Aleksandre Maskharashvili
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We present BIS, a Bayesian Inference Semantics, for probabilistic reasoning in natural language. The current system is based on the framework of Bernardy et al. (2018), but departs from it in important respects. BIS makes use of Bayesian learning for inferring a hypothesis from premises. This involves estimating the probability of the hypothesis, given the data supplied by the premises of an argument. It uses a syntactic parser to generate typed syntactic structures that serve as input to a model generation system. Sentences are interpreted compositionally to probabilistic programs, and the corresponding truth values are estimated using sampling methods. BIS successfully deals with various probabilistic semantic phenomena, including frequency adverbs, generalised quantifiers, generics, and vague predicates. It performs well on a number of interesting probabilistic reasoning tasks. It also sustains most classically valid inferences (instantiation, de Morgan’s laws, etc.). To test BIS we have built an experimental test suite with examples of a range of probabilistic and classical inference patterns.

pdf bib
Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

pdf bib
Proceedings of the 13th International Conference on Computational Semantics - Short Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Short Papers

pdf bib
Proceedings of the 13th International Conference on Computational Semantics - Student Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg | Kathrein Abu Kwaik | Vladislav Maraev
Proceedings of the 13th International Conference on Computational Semantics - Student Papers

pdf bib
Testing the Generalization Power of Neural Network Models across NLI Benchmarks
Aarne Talman | Stergios Chatzikyriakidis
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Neural network models have been very successful in natural language inference, with the best models reaching 90% accuracy in some benchmarks. However, the success of these models turns out to be largely benchmark specific. We show that models trained on a natural language inference dataset drawn from one benchmark fail to perform well in others, even if the notion of inference assumed in these benchmarks is the same or similar. We train six high performing neural network models on different datasets and show that each one of these has problems of generalizing when we replace the original test set with a test set taken from another corpus designed for the same task. In light of these results, we argue that most of the current neural network models are not able to generalize well in the task of natural language inference. We find that using large pre-trained language models helps with transfer learning when the datasets are similar enough. Our results also highlight that the current NLI datasets do not cover the different nuances of inference extensively enough.

pdf bib
Can Modern Standard Arabic Approaches be used for Arabic Dialects? Sentiment Analysis as a Case Study
Kathrein Abu Kwaik | Stergios Chatzikyriakidis | Simon Dobnik
Proceedings of the 3rd Workshop on Arabic Corpus Linguistics

pdf bib
A Wide-Coverage Symbolic Natural Language Inference System
Stergios Chatzikyriakidis | Jean-Philippe Bernardy
Proceedings of the 22nd Nordic Conference on Computational Linguistics

We present a system for Natural Language Inference which uses a dynamic semantics converter from abstract syntax trees to Coq types. It combines the fine-grainedness of a dynamic semantics system with the powerfulness of a state-of-the-art proof assistant, like Coq. We evaluate the system on all sections of the FraCaS test suite, excluding section 6. This is the first system that does a complete run on the anaphora and ellipsis sections of the FraCaS. It has a better overall accuracy than any previous system.

pdf bib
Predicates as Boxes in Bayesian Semantics for Natural Language
Jean-Philippe Bernardy | Rasmus Blanck | Stergios Chatzikyriakidis | Shalom Lappin | Aleksandre Maskharashvili
Proceedings of the 22nd Nordic Conference on Computational Linguistics

In this paper, we present a Bayesian approach to natural language semantics. Our main focus is on the inference task in an environment where judgments require probabilistic reasoning. We treat nouns, verbs, adjectives, etc. as unary predicates, and we model them as boxes in a bounded domain. We apply Bayesian learning to satisfy constraints expressed as premises. In this way we construct a model, by specifying boxes for the predicates. The probability of the hypothesis (the conclusion) is evaluated against the model that incorporates the premises as constraints.

2018

pdf bib
Shami: A Corpus of Levantine Arabic Dialects
Kathrein Abu Kwaik | Motaz Saad | Stergios Chatzikyriakidis | Simon Dobnik
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
A Compositional Bayesian Semantics for Natural Language
Jean-Philippe Bernardy | Rasmus Blanck | Stergios Chatzikyriakidis | Shalom Lappin
Proceedings of the First International Workshop on Language Cognition and Computational Models

We propose a compositional Bayesian semantics that interprets declarative sentences in a natural language by assigning them probability conditions. These are conditional probabilities that estimate the likelihood that a competent speaker would endorse an assertion, given certain hypotheses. Our semantics is implemented in a functional programming language. It estimates the marginal probability of a sentence through Markov Chain Monte Carlo (MCMC) sampling of objects in vector space models satisfying specified hypotheses. We apply our semantics to examples with several predicates and generalised quantifiers, including higher-order quantifiers. It captures the vagueness of predication (both gradable and non-gradable), without positing a precise boundary for classifier application. We present a basic account of semantic learning based on our semantic system. We compare our proposal to other current theories of probabilistic semantics, and we show that it offers several important advantages over these accounts.

2017

pdf bib
“Deep” Learning : Detecting Metaphoricity in Adjective-Noun Pairs
Yuri Bizzoni | Stergios Chatzikyriakidis | Mehdi Ghanimifard
Proceedings of the Workshop on Stylistic Variation

Metaphor is one of the most studied and widespread figures of speech and an essential element of individual style. In this paper we look at metaphor identification in Adjective-Noun pairs. We show that using a single neural network combined with pre-trained vector embeddings can outperform the state of the art in terms of accuracy. In specific, the approach presented in this paper is based on two ideas: a) transfer learning via using pre-trained vectors representing adjective noun pairs, and b) a neural network as a model of composition that predicts a metaphoricity score as output. We present several different architectures for our system and evaluate their performances. Variations on dataset size and on the kinds of embeddings are also investigated. We show considerable improvement over the previous approaches both in terms of accuracy and w.r.t the size of annotated training data.

pdf bib
A Type-Theoretical system for the FraCaS test suite: Grammatical Framework meets Coq
Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

pdf bib
An overview of Natural Language Inference Data Collection: The way forward?
Stergios Chatzikyriakidis | Robin Cooper | Simon Dobnik | Staffan Larsson
Proceedings of the Computing Natural Language Inference Workshop

2015

pdf bib
Natural Language Reasoning using Coq: Interaction and Automation
Stergios Chatzikyriakidis
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Dans cet article, nous présentons une utilisation des assistants des preuves pour traiter l’inférence en Language Naturel (NLI). D’ abord, nous proposons d’utiliser les theories des types modernes comme langue dans laquelle traduire la sémantique du langage naturel. Ensuite, nous implémentons cette sémantique dans l’assistant de preuve Coq pour raisonner sur ceux-ci. En particulier, nous évaluons notre proposition sur un sous-ensemble de la suite de tests FraCas, et nous montrons que 95.2% des exemples peuvent être correctement prédits. Nous discutons ensuite la question de l’automatisation et il est démontré que le langage de tactiques de Coq permet de construire des tactiques qui peuvent automatiser entièrement les preuves, au moins pour les cas qui nous intéressent.

pdf bib
Individuation Criteria, Dot-types and Copredication: A View from Modern Type Theories
Stergios Chatzikyriakidis | Zhaohui Luo
Proceedings of the 14th Meeting on the Mathematics of Language (MoL 2015)

2014

pdf bib
Natural Language Reasoning Using Proof-Assistant Technology: Rich Typing and Beyond
Stergios Chatzikyriakidis | Zhaohui Luo
Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS)