Abstractive Text Summarization for Icelandic

Þór Sverrisson, Hafsteinn Einarsson


Abstract
In this work, we studied methods for automatic abstractive summarization in a low-resource setting using Icelandic, a morphologically rich language with limited data compared to languages such as English. We collected and published the first publicly available abstractive summarization dataset for Icelandic and used it to train and evaluate our models. We found that multilingual pre-training improved performance in this setting: the multilingual mT5 model consistently outperformed a similar model pre-trained from scratch on Icelandic text only. Additionally, we explored using machine translations to augment the fine-tuning data and found that fine-tuning on the augmented data and then on Icelandic data improved the results. This work highlights the importance of both high-quality training data and multilingual pre-training for effective abstractive summarization in low-resource languages.
Anthology ID:
2023.nodalida-1.3
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
17–31
URL:
https://aclanthology.org/2023.nodalida-1.3
Cite (ACL):
Þór Sverrisson and Hafsteinn Einarsson. 2023. Abstractive Text Summarization for Icelandic. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 17–31, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Abstractive Text Summarization for Icelandic (Sverrisson & Einarsson, NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.3.pdf