Unsupervised Neural Text Simplification

Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan


Abstract
The paper presents a first attempt towards unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework is composed of a shared encoder and a pair of attentional decoders, crucially assisted by discrimination-based losses and denoising. The framework is trained using unlabeled text collected from an en-Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on public test data shows that the proposed model can perform text simplification at both lexical and syntactic levels, and is competitive with existing supervised methods. It also outperforms viable unsupervised baselines. Adding a few labeled pairs improves performance further.
Anthology ID:
P19-1198
Original:
P19-1198v1
Version 2:
P19-1198v2
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2058–2068
URL:
https://aclanthology.org/P19-1198
DOI:
10.18653/v1/P19-1198
Cite (ACL):
Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, and Karthik Sankaranarayanan. 2019. Unsupervised Neural Text Simplification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2058–2068, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Neural Text Simplification (Surya et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1198.pdf
Software:
 P19-1198.Software.zip
Code
 subramanyamdvss/UnsupNTS
Data
ASSET, Newsela, TurkCorpus