CombiNMT: An Exploration into Neural Text Simplification Models

Michael Cooper, Matthew Shardlow


Abstract
This work presents a replication study of Exploring Neural Text Simplification Models (Nisioi et al., 2017). We were able to successfully replicate and extend the methods presented in the original paper. Alongside the replication results, we present our improved systems, dubbed CombiNMT, built by using an updated implementation of OpenNMT, incorporating the Newsela corpus alongside the original Wikipedia dataset (Hwang et al., 2016), and refining both datasets to select high-quality training examples. Our work presents two new systems: CombiNMT995, trained on matched sentence pairs with a cosine similarity of 0.995 or less, and CombiNMT98, which, similarly, uses a cosine similarity threshold of 0.98. We extended the human evaluation presented in the original paper, increasing both the number of annotators and the number of sentences annotated, with the intention of increasing the quality of the results. Under this evaluation, CombiNMT98 shows significant improvement over any of the Neural Text Simplification (NTS) systems from the original paper in terms of both the number of changes made and the percentage of correct changes.
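The dataset refinement described above can be illustrated with a minimal sketch. This is not the authors' code: it assumes a simple bag-of-words cosine similarity over aligned complex/simple sentence pairs, keeping only pairs at or below a threshold (e.g. 0.98 or 0.995) so that near-identical pairs, which teach the model nothing about simplification, are discarded.

```python
# Illustrative sketch (assumed, not from the paper): filter aligned
# complex/simple sentence pairs by bag-of-words cosine similarity.
from collections import Counter
import math


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_pairs(pairs, threshold):
    """Keep only pairs whose similarity is at or below the threshold."""
    return [(c, s) for c, s in pairs if cosine_similarity(c, s) <= threshold]


pairs = [
    ("The committee approved the proposal unanimously.",
     "The committee said yes to the plan."),
    # An identical pair has similarity 1.0 and is filtered out:
    ("The cat sat on the mat.",
     "The cat sat on the mat."),
]
print(len(filter_pairs(pairs, 0.98)))  # -> 1
```

A lower threshold (0.98 vs 0.995) discards more near-duplicate pairs, trading training-set size for a higher proportion of genuinely simplifying examples.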
Anthology ID:
2020.lrec-1.686
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5588–5594
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.686
Cite (ACL):
Michael Cooper and Matthew Shardlow. 2020. CombiNMT: An Exploration into Neural Text Simplification Models. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 5588–5594, Marseille, France. European Language Resources Association.
Cite (Informal):
CombiNMT: An Exploration into Neural Text Simplification Models (Cooper & Shardlow, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.686.pdf