Abstractive Summarization for the Ukrainian Language: Multi-Task Learning with Hromadske.ua News Dataset

Svitlana Galeshchuk


Abstract
Despite recent NLP developments, abstractive summarization remains a challenging task, especially for low-resource languages like Ukrainian. The paper aims to improve the quality of summaries produced by mT5 for news in Ukrainian by fine-tuning the model on a mixture of summarization and text similarity tasks, using summary-article and title-article training pairs, respectively. The proposed training set-up, applied to the small, base, and large mT5 models, produces higher-quality summaries. In addition, we present a new Ukrainian dataset for the abstractive summarization task that consists of circa 36.5K articles collected from Hromadske.ua until June 2021.
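The task mixture described in the abstract can be illustrated with a minimal sketch. It only shows how summary-article and title-article pairs might be gathered into one shuffled training pool. The record fields (`article`, `summary`, `title`), the task names, and the function `build_mixture` are hypothetical; the abstract does not specify the mixing ratio, the input format, or the similarity objective, so none of these are modelled here.

```python
import random


def build_mixture(records, seed=0):
    """Build a shuffled pool of training pairs for two tasks.

    Assumption: each record is a dict with hypothetical keys
    'article', 'summary', and 'title'.
    """
    examples = []
    for rec in records:
        # Summarization task: a summary-article pair.
        examples.append({
            "task": "summarization",
            "text_a": rec["summary"],
            "text_b": rec["article"],
        })
        # Text similarity task: a title-article pair.
        examples.append({
            "task": "similarity",
            "text_a": rec["title"],
            "text_b": rec["article"],
        })
    # Shuffle with a fixed seed so both tasks are interleaved reproducibly.
    random.Random(seed).shuffle(examples)
    return examples


if __name__ == "__main__":
    demo = [
        {"article": "Article text 1", "summary": "Summary 1", "title": "Title 1"},
        {"article": "Article text 2", "summary": "Summary 2", "title": "Title 2"},
    ]
    for example in build_mixture(demo):
        print(example["task"], "|", example["text_a"])
```

Each article contributes one pair per task here, which gives an equal mixture; a different sampling ratio would be an equally plausible reading of the abstract.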
Anthology ID:
2023.unlp-1.6
Volume:
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editor:
Mariana Romanyshyn
Venue:
UNLP
Publisher:
Association for Computational Linguistics
Pages:
49–53
URL:
https://aclanthology.org/2023.unlp-1.6
DOI:
10.18653/v1/2023.unlp-1.6
Cite (ACL):
Svitlana Galeshchuk. 2023. Abstractive Summarization for the Ukrainian Language: Multi-Task Learning with Hromadske.ua News Dataset. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 49–53, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Abstractive Summarization for the Ukrainian Language: Multi-Task Learning with Hromadske.ua News Dataset (Galeshchuk, UNLP 2023)
PDF:
https://aclanthology.org/2023.unlp-1.6.pdf
Video:
https://aclanthology.org/2023.unlp-1.6.mp4