A New Dataset and Efficient Baselines for Document-level Text Simplification in German

Annette Rios, Nicolas Spring, Tannon Kew, Marek Kostrzewa, Andreas Säuberli, Mathias Müller, Sarah Ebling


Abstract
The task of document-level text simplification is very similar to summarization, with the additional difficulty of reducing complexity. We introduce a new data set of German texts collected from the Swiss news magazine 20 Minuten (‘20 Minutes’), consisting of full articles paired with simplified summaries. Furthermore, we present experiments on automatic text simplification with the pretrained multilingual mBART and a modified, more memory-friendly version thereof, using both our new data set and existing simplification corpora. Our modifications of mBART let us train at a lower memory cost without much loss in performance; in fact, the smaller mBART even improves over the standard model in a setting with multiple simplification levels.
Anthology ID:
2021.newsum-1.16
Volume:
Proceedings of the Third Workshop on New Frontiers in Summarization
Month:
November
Year:
2021
Address:
Online and in Dominican Republic
Editors:
Giuseppe Carenini, Jackie Chi Kit Cheung, Yue Dong, Fei Liu, Lu Wang
Venue:
NewSum
Publisher:
Association for Computational Linguistics
Pages:
152–161
URL:
https://aclanthology.org/2021.newsum-1.16
DOI:
10.18653/v1/2021.newsum-1.16
Cite (ACL):
Annette Rios, Nicolas Spring, Tannon Kew, Marek Kostrzewa, Andreas Säuberli, Mathias Müller, and Sarah Ebling. 2021. A New Dataset and Efficient Baselines for Document-level Text Simplification in German. In Proceedings of the Third Workshop on New Frontiers in Summarization, pages 152–161, Online and in Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
A New Dataset and Efficient Baselines for Document-level Text Simplification in German (Rios et al., NewSum 2021)
PDF:
https://aclanthology.org/2021.newsum-1.16.pdf
Video:
https://aclanthology.org/2021.newsum-1.16.mp4
Code:
a-rios/longmbart