Split and Rephrase

Shashi Narayan, Claire Gardent, Shay B. Cohen, Anastasia Shimorina


Abstract
We propose a new sentence simplification task (Split-and-Rephrase) where the aim is to split a complex sentence into a meaning-preserving sequence of shorter sentences. Like sentence simplification, splitting-and-rephrasing has the potential to benefit both natural language processing and societal applications. Because shorter sentences are generally better processed by NLP systems, it could be used as a preprocessing step which facilitates and improves the performance of parsers, semantic role labellers and machine translation systems. It should also be of use for people with reading disabilities because it allows the conversion of longer sentences into shorter ones. This paper makes two contributions towards this new task. First, we create and make available a benchmark consisting of 1,066,115 tuples mapping a single complex sentence to a sequence of sentences expressing the same meaning. Second, we propose five models (ranging from vanilla sequence-to-sequence to semantically motivated models) to understand the difficulty of the proposed task.
Anthology ID:
D17-1064
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
606–616
URL:
https://aclanthology.org/D17-1064
DOI:
10.18653/v1/D17-1064
Cite (ACL):
Shashi Narayan, Claire Gardent, Shay B. Cohen, and Anastasia Shimorina. 2017. Split and Rephrase. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 606–616, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Split and Rephrase (Narayan et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1064.pdf
Video:
https://aclanthology.org/D17-1064.mp4
Code:
shashiongithub/Split-and-Rephrase
Data:
Newsela