Question Answering and Question Generation for Finnish

Ilmari Kylliäinen, Roman Yangarber


Abstract
Recent advances in the field of language modeling have improved the state-of-the-art in question answering (QA) and question generation (QG). However, the development of modern neural models, their benchmarks, and the datasets for training them has mainly focused on English. Finnish, like many other languages, faces a shortage of large QA/QG training resources, which has prevented experimentation with state-of-the-art QA/QG fine-tuning methods. We present the first neural QA and QG models that work with Finnish. To train the models, we automatically translate the SQuAD dataset and then use normalization methods to reduce the amount of problematic data created during translation. Using the synthetic data, together with the Finnish partition of the TyDi-QA dataset, we fine-tune several transformer-based models for both QA and QG and evaluate their performance. To the best of our knowledge, the resulting dataset is the first large-scale QA/QG resource for Finnish. This paper also sets the initial benchmarks for Finnish-language QA and QG.
Anthology ID:
2023.nodalida-1.53
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
529–540
URL:
https://aclanthology.org/2023.nodalida-1.53
Cite (ACL):
Ilmari Kylliäinen and Roman Yangarber. 2023. Question Answering and Question Generation for Finnish. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 529–540, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Question Answering and Question Generation for Finnish (Kylliäinen & Yangarber, NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.53.pdf