Erofili Psaltaki

2026

Perplexity as a Metric for Dialectal Distance: A Computational Study of Greek Varieties
Stergios Chatzikyriakidis | Erofili Psaltaki | Dimitrios Papadakis | Erik Henriksson | Veronika Laippala
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects

In this paper, we use LLM perplexity as a measure to assess Greek dialectal distance. We test seven models on Standard Modern Greek (SMG) and eight dialects, namely Heptanesian, Cypriot, Maniot, Pontic, Northern, Cretan, Tsakonian, and Griko. Using samples of 5k, 15k, and 25k tokens from the GRDD+ corpus for each variety, we find a consistent dialect ranking across models, with Heptanesian closest to SMG, and Griko most distant (perplexity ratio 3.6–14.5× depending on model). These results are largely in agreement with theoretical dialectological knowledge. For example, Tsakonian consistently appears distant in all measures, reflecting its status as the sole Doric descendant, while Heptanesian appears closer by all metrics, pointing to its status as one of the dialects used to shape the official variety. Perplexity correlates strongly with Bits Per-Character (mean r = 0.94) and Normalized Compression Distance (mean r = 0.87, range 0.76–0.93), providing support for its use as a dialectometric tool. However, a number of important confounds are also found. First, tokenization effects compress Llama 2’s perplexity range. Second, genre artifacts seem to inflate the results for Cretan. Third, potential training data contamination likely reduces perplexity for Cypriot and Pontic. Lastly, we find that Greek-specific models like Meltemi and Krikri do not consistently outperform general models.

2024

pdf bib abs

This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation (through example comparison or sense selection) and metaphor detection. More than language-adapted replicas of existing tasks, we contribute two innovations which will resonate with the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just one, but rather all possible inference labels, accounting for possible shifts due to e.g. ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. Alongside each task, we conduct experiments using currently available state of the art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for expedited progress in order for the Greek NLP ecosystem to keep pace with contemporary mainstream research.

2022

In this paper, we present a number of fine-grained resources for Natural Language Inference (NLI). In particular, we present a number of resources and validation methods for Greek NLI and a resource for precise NLI. First, we extend the Greek version of the FraCaS test suite to include examples where the inference is directly linked to the syntactic/morphological properties of Greek. The new resource contains an additional 428 examples, making it in total a dataset of 774 examples. Expert annotators have been used in order to create the additional resource, while extensive validation of the original Greek version of the FraCaS by non-expert and expert subjects is performed. Next, we continue the work initiated by (CITATION), according to which a subset of the RTE problems have been labeled for missing hypotheses and we present a dataset an order of magnitude larger, annotating the whole SuperGlUE/RTE dataset with missing hypotheses. Lastly, we provide a de-dropped version of the Greek XNLI dataset, where the pronouns that are missing due to the pro-drop nature of the language are inserted. We then run some models to see the effect of that insertion and report the results.