Generating Extended and Multilingual Summaries with Pre-trained Transformers

Rémi Calizzano, Malte Ostendorff, Qian Ruan, Georg Rehm


Abstract
Almost all summarisation methods and datasets focus on a single language and short summaries. We introduce WikinewsSum, a new dataset for English, German, French, Spanish, Portuguese, Polish, and Italian summarisation, tailored to extended summaries of approximately 11 sentences. The dataset comprises 39,626 summaries, each a Wikinews article paired with its source news articles. We compare three multilingual transformer models on the extractive summarisation task and three training scenarios in which we fine-tune mT5 to perform abstractive summarisation. This results in strong baselines for both extractive and abstractive summarisation on WikinewsSum. We also show how the combination of an extractive model with an abstractive one can be used to create extended abstractive summaries from long input documents. Finally, our results show that fine-tuning mT5 on all the languages combined significantly improves the summarisation performance on low-resource languages.
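The abstract describes an extract-then-abstract pipeline: an extractive step first shortens long source documents, and a fine-tuned mT5 model then generates the extended abstractive summary. The sketch below illustrates that idea using the Hugging Face transformers library; it is not the authors' exact pipeline, and "google/mt5-small" is only a stand-in for an mT5 checkpoint fine-tuned on WikinewsSum (the authors' models are reachable via the repository linked under Code).

    # Minimal sketch of an extract-then-abstract summarisation pipeline.
    # Assumptions: transformers is installed; the checkpoint below is a
    # placeholder, not the authors' fine-tuned WikinewsSum model.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    MODEL_NAME = "google/mt5-small"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    def extract(documents, max_sentences=40):
        # Naive extractive step: keep the leading sentences of each source
        # document so the concatenated input fits into the model's context.
        per_doc = max(1, max_sentences // len(documents))
        sentences = []
        for doc in documents:
            sentences.extend(doc.split(". ")[:per_doc])
        return " ".join(sentences)

    def summarise(documents, max_length=512):
        # Abstractive step: generate an extended summary from the extracted text.
        text = extract(documents)
        inputs = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")
        output_ids = model.generate(
            **inputs,
            max_length=max_length,        # extended summaries of roughly 11 sentences
            num_beams=4,
            no_repeat_ngram_size=3,
        )
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    if __name__ == "__main__":
        sources = ["First source article text ...", "Second source article text ..."]
        print(summarise(sources))

In the paper, the extractive step is performed by multilingual transformer models rather than the lead-sentence heuristic used here; the heuristic only keeps the sketch self-contained.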
Anthology ID:
2022.lrec-1.175
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1640–1650
URL:
https://aclanthology.org/2022.lrec-1.175
Cite (ACL):
Rémi Calizzano, Malte Ostendorff, Qian Ruan, and Georg Rehm. 2022. Generating Extended and Multilingual Summaries with Pre-trained Transformers. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1640–1650, Marseille, France. European Language Resources Association.
Cite (Informal):
Generating Extended and Multilingual Summaries with Pre-trained Transformers (Calizzano et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.175.pdf
Code
airklizz/mdmls (plus additional community code)
Data
CNN/Daily Mail, Multi-News, WikiSum, XL-Sum