Anastasia Natsina


2026

Large Language Models (LLMs), despite their strong performance on many NLP tasks, struggle with phonologically grounded phenomena such as rhyme detection and generation. This weakness is even more evident in lower-resource languages such as Modern Greek. In this paper, we present a hybrid neural-symbolic system that combines LLMs with deterministic phonological algorithms to achieve accurate rhyme identification and generation. We implement a comprehensive taxonomy of Greek rhyme types and employ an agentic generation pipeline with phonological verification. We evaluate multiple prompting strategies (zero-shot, few-shot, Chain-of-Thought, and RAG-augmented) across several LLMs, including Claude 3.7 and 4.5, GPT-4o, Gemini 2.0, and open-weight models such as Llama 3.1 8B and 70B and Mistral Large. Results reveal a significant reasoning gap: while models answering intuitively (Claude 3.7) reach 40\% accuracy in identification, reasoning-heavy models (Claude 4.5) achieve state-of-the-art performance (54\%) only when prompted with Chain-of-Thought. Most critically, pure LLM generation fails badly (under 4\% valid poems), whereas our hybrid verification loop raises performance to 73.1\%. Alongside the system, we release a corpus of 40,000+ rhymes, derived from the \textit{Anemoskala} and \textit{Interwar Poetry} corpora, to support future research.

2025

In this paper, we discuss Modern Greek poetry generation in the style of lesser-known Greek poets of the interwar period. We propose using Retrieval-Augmented Generation (RAG) to automatically generate poetry with Large Language Models (LLMs). Drawing on a corpus of Greek interwar poetry, we create prompts that exemplify a poet's style with respect to a theme and feed them to an LLM. The results are compared to pure LLM generation, with expert evaluators scoring the poems across several parameters. Objective metrics such as Vocabulary Density, Average Words per Sentence, and Readability Index are also used to assess model performance. RAG-assisted models show potential in enhancing poetry generation across several parameters. Base LLM models are quite consistent across categories, while the contrastive RAG variant shows the worst performance of the three.