Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification

Renliang Sun, Wei Xu, Xiaojun Wan


Abstract
Randomly masking text spans in ordinary texts during pre-training does little to teach models to generate simple texts, and this can hurt the performance of pre-trained models on text simplification tasks. In this paper, we propose a new continued pre-training strategy that teaches the pre-trained model to generate simple texts. We continue pre-training BART, a representative model, to obtain SimpleBART. SimpleBART consistently and significantly improves over BART on lexical simplification, sentence simplification, and document-level simplification. Finally, we compare SimpleBART with several representative large language models (LLMs).
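To make the continued pre-training idea concrete, here is a minimal sketch that applies a BART-style span-infilling (denoising) objective to a small simple-text corpus using HuggingFace Transformers. The sentences, masking ratio, and hyperparameters below are placeholders for illustration only; the actual data selection and masking strategy used to train SimpleBART are described in the paper.

# Minimal sketch: continued pre-training of BART with a denoising objective
# on simple texts. Illustrative only; not the authors' exact recipe.
import random
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def mask_span(text, mask_ratio=0.3):
    """Replace one random contiguous span of words with the <mask> token
    (a simplified stand-in for BART's text-infilling noise)."""
    words = text.split()
    span_len = max(1, int(len(words) * mask_ratio))
    start = random.randrange(0, max(1, len(words) - span_len))
    corrupted = words[:start] + [tokenizer.mask_token] + words[start + span_len:]
    return " ".join(corrupted)

# Hypothetical simple sentences standing in for a simplified-text corpus.
simple_sentences = [
    "The dog ran to the park.",
    "She read a short book about stars.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for sentence in simple_sentences:
    inputs = tokenizer(mask_span(sentence), return_tensors="pt")
    labels = tokenizer(sentence, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # reconstruct the simple text
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The intuition is that reconstructing masked spans of simple texts, rather than of ordinary texts, nudges the model toward generating simpler output when later fine-tuned on simplification tasks.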
Anthology ID: 2023.findings-acl.595
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9345–9355
URL: https://aclanthology.org/2023.findings-acl.595
DOI: 10.18653/v1/2023.findings-acl.595
Cite (ACL): Renliang Sun, Wei Xu, and Xiaojun Wan. 2023. Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9345–9355, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification (Sun et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.595.pdf