Erofili Psaltaki
2026
Perplexity as a Metric for Dialectal Distance: A Computational Study of Greek Varieties
Stergios Chatzikyriakidis | Erofili Psaltaki | Dimitrios Papadakis | Erik Henriksson | Veronika Laippala
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
In this paper, we use LLM perplexity as a measure of Greek dialectal distance. We test seven models on Standard Modern Greek (SMG) and eight dialects: Heptanesian, Cypriot, Maniot, Pontic, Northern, Cretan, Tsakonian, and Griko. Using samples of 5k, 15k, and 25k tokens per variety from the GRDD+ corpus, we find a consistent dialect ranking across models, with Heptanesian closest to SMG and Griko most distant (perplexity ratio 3.6–14.5× depending on the model). These results largely agree with established dialectological knowledge. For example, Tsakonian consistently appears distant on all measures, reflecting its status as the sole Doric descendant, while Heptanesian appears closer on all metrics, consistent with its status as one of the dialects that shaped the official variety. Perplexity correlates strongly with bits per character (BPC; mean r = 0.94) and Normalized Compression Distance (NCD; mean r = 0.87, range 0.76–0.93), supporting its use as a dialectometric tool. However, we also identify several important confounds. First, tokenization effects compress Llama 2’s perplexity range. Second, genre artifacts appear to inflate the results for Cretan. Third, potential training data contamination likely reduces perplexity for Cypriot and Pontic. Lastly, Greek-specific models such as Meltemi and Krikri do not consistently outperform general models.
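The abstract compares per-token perplexity with bits per character and Normalized Compression Distance. The minimal Python sketch below shows how these measures are commonly computed. It is illustrative only and is not the paper's implementation. It assumes that per-token natural-log probabilities have already been obtained from a model. It also assumes zlib as the NCD compressor and a plain Pearson r. The helper names (`perplexity_ratio` and the others) are hypothetical.

```python
import math
import zlib


def perplexity(token_logprobs: list[float]) -> float:
    """Per-token perplexity from natural-log probabilities: exp(-mean log p)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_ratio(dialect_logprobs: list[float], smg_logprobs: list[float]) -> float:
    """Hypothetical distance score: dialect perplexity relative to SMG perplexity."""
    return perplexity(dialect_logprobs) / perplexity(smg_logprobs)


def bits_per_character(token_logprobs: list[float], text: str) -> float:
    """Total negative log-likelihood in bits divided by the number of characters.

    Normalizing by characters rather than tokens makes the score less
    sensitive to how finely a given tokenizer splits the text.
    """
    total_bits = -sum(token_logprobs) / math.log(2)
    return total_bits / len(text)


def _compressed_size(s: str) -> int:
    # Assumption: zlib at maximum compression stands in for the compressor C.
    return len(zlib.compress(s.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = _compressed_size(x), _compressed_size(y)
    cxy = _compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)
```

With four tokens of probability 0.5 each, `perplexity` gives 2.0. If those tokens span four characters, `bits_per_character` gives 1.0. Correlating per-variety perplexities against per-variety BPC or NCD values with `pearson_r` yields the kind of r reported above. The paper's exact normalization and compressor may differ.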
2024
OYXOY: A Modern NLP Test Suite for Modern Greek
Konstantinos Kogkalidis | Stergios Chatzikyriakidis | Eirini Giannikouri | Vasiliki Katsouli | Christina Klironomou | Christina Koula | Dimitris Papadakis | Thelka Pasparaki | Erofili Psaltaki | Efthymia Sakellariou | Charikleia Soupiona
Findings of the Association for Computational Linguistics: EACL 2024
This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks targeting natural language inference, word sense disambiguation (through example comparison or sense selection), and metaphor detection. More than language-adapted replicas of existing tasks, our work contributes two innovations that will resonate with the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just one but all possible inference labels, accounting for possible shifts due to, e.g., ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. For each task, we conduct experiments using currently available state-of-the-art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for expedited progress if the Greek NLP ecosystem is to keep pace with contemporary mainstream research.
2022
Fine-grained Entailment: Resources for Greek NLI and Precise Entailment
Eirini Amanaki | Jean-Philippe Bernardy | Stergios Chatzikyriakidis | Robin Cooper | Simon Dobnik | Aram Karimi | Adam Ek | Eirini Chrysovalantou Giannikouri | Vasiliki Katsouli | Ilias Kolokousis | Eirini Chrysovalantou Mamatzaki | Dimitrios Papadakis | Olga Petrova | Erofili Psaltaki | Charikleia Soupiona | Effrosyni Skoulataki | Christina Stefanidou
Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference
In this paper, we present a number of fine-grained resources for Natural Language Inference (NLI): several resources and validation methods for Greek NLI, and a resource for precise NLI. First, we extend the Greek version of the FraCaS test suite with examples where the inference is directly linked to the syntactic/morphological properties of Greek. The new resource contains an additional 428 examples, bringing the dataset to 774 examples in total. Expert annotators created the additional resource, while the original Greek version of FraCaS undergoes extensive validation by non-expert and expert subjects. Next, we continue the work initiated by (CITATION), in which a subset of the RTE problems was labeled for missing hypotheses. We present a dataset an order of magnitude larger by annotating the whole SuperGLUE/RTE dataset with missing hypotheses. Lastly, we provide a de-dropped version of the Greek XNLI dataset, in which the pronouns that are missing due to the pro-drop nature of the language are inserted. We then run several models to measure the effect of this insertion and report the results.
Co-authors
- Stergios Chatzikyriakidis 3
- Vasiliki Katsouli 2
- Dimitrios Papadakis 2
- Charikleia Soupiona 2
- Eirini Amanaki 1
- Jean-Philippe Bernardy 1
- Robin Cooper 1
- Simon Dobnik 1
- Adam Ek 1
- Eirini Chrysovalantou Giannikouri 1
- Eirini Giannikouri 1
- Erik Henriksson 1
- Aram Karimi 1
- Christina Klironomou 1
- Konstantinos Kogkalidis 1
- Ilias Kolokousis 1
- Christina Koula 1
- Veronika Laippala 1
- Eirini Chrysovalantou Mamatzaki 1
- Dimitris Papadakis 1
- Thelka Pasparaki 1
- Olga Petrova 1
- Efthymia Sakellariou 1
- Effrosyni Skoulataki 1
- Christina Stefanidou 1