Jakob Lindström
2025
Benchmarking Large Language Models for Lemmatization and Translation of Finnic Runosongs
Lidia Pivovarova | Kati Kallio | Antti Kanner | Jakob Lindström | Eetu Mäkelä | Liina Saarlo | Kaarel Veskis | Mari Väina
Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages
We investigate the use of large language models (LLMs) for translation and annotation of Finnic runosongs—a highly variable multilingual poetic corpus with limited linguistic or NLP resources. We manually annotated a corpus of about 200 runosongs in a variety of languages, dialects and genres with lemmas and English translations. Using this manually annotated test set, we benchmark several large language models. We tested several prompt types and developed a collective prompt-writing methodology involving specialists from different backgrounds. Our results highlight both the potential and the limitations of current LLMs for cultural heritage NLP, and point towards strategies for prompt design, evaluation, and integration with linguistic expertise.