Jakob Lindström


2025

Benchmarking Large Language Models for Lemmatization and Translation of Finnic Runosongs
Lidia Pivovarova | Kati Kallio | Antti Kanner | Jakob Lindström | Eetu Mäkelä | Liina Saarlo | Kaarel Veskis | Mari Väina
Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages

We investigate the use of large language models (LLMs) for translation and annotation of Finnic runosongs, a highly variable multilingual poetic corpus with limited linguistic or NLP resources. We manually annotated a corpus of about 200 runosongs in a variety of languages, dialects and genres with lemmas and English translations. Using this test set, we benchmark several LLMs. We also test several prompt types and develop a collective prompt-writing methodology involving specialists from different backgrounds. Our results highlight both the potential and the limitations of current LLMs for cultural heritage NLP, and point towards strategies for prompt design, evaluation, and integration with linguistic expertise.