Improving Text Simplification with Factuality Error Detection

Yuan Ma, Sandaru Seneviratne, Elena Daskalaki


Abstract
In recent years, the field of text simplification has been dominated by supervised learning approaches, thanks to the appearance of large parallel datasets such as WikiLarge and Newsela. However, these datasets contain sentence pairs with factuality errors, which compromise model performance. We therefore proposed a model-independent factuality error detection mechanism, covering both bad simplification and bad alignment, to refine the WikiLarge dataset by reducing the weight of erroneous samples during training. We demonstrated that this approach improved the performance of the state-of-the-art text simplification model TST5, yielding FKGL reductions of 0.33 and 0.29 on the TurkCorpus and ASSET test sets respectively. Our study illustrates the impact of erroneous samples in TS datasets and highlights the need for automatic methods to improve their quality.
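The abstract describes refining the training set by down-weighting samples flagged with factuality errors rather than discarding them. A minimal sketch of that idea is shown below; the weighting scheme, flag source, and `down_weight` value are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: samples flagged with factuality errors (bad
# simplification or bad alignment) contribute less to the training loss.
# The detector output is assumed to be a boolean flag per sample, and
# down_weight=0.1 is an arbitrary illustrative choice.

def weighted_loss(sample_losses, error_flags, down_weight=0.1):
    """Weighted average of per-sample losses; flagged samples are scaled
    down by down_weight instead of being removed from the dataset."""
    weights = [down_weight if flagged else 1.0 for flagged in error_flags]
    total = sum(w * loss for w, loss in zip(weights, sample_losses))
    return total / sum(weights)

# Example: the second sentence pair is flagged (e.g. bad alignment),
# so its large loss is mostly suppressed.
losses = [2.0, 5.0, 1.0]
flags = [False, True, False]
print(weighted_loss(losses, flags))
```

Soft down-weighting of this kind keeps the dataset size unchanged while limiting the influence of suspect pairs, which is the effect the abstract attributes to the proposed mechanism.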
Anthology ID:
2022.tsar-1.16
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
173–178
URL:
https://aclanthology.org/2022.tsar-1.16
DOI:
10.18653/v1/2022.tsar-1.16
Cite (ACL):
Yuan Ma, Sandaru Seneviratne, and Elena Daskalaki. 2022. Improving Text Simplification with Factuality Error Detection. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 173–178, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
Improving Text Simplification with Factuality Error Detection (Ma et al., TSAR 2022)
PDF:
https://aclanthology.org/2022.tsar-1.16.pdf