IrekiaLFes: a New Open Benchmark and Baseline Systems for Spanish Automatic Text Simplification

Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, Aitor Soroa


Abstract
Automatic text simplification (ATS) seeks to reduce the complexity of a text for the general public or for a target audience. In recent years, deep learning methods have become the dominant approach in ATS research, but these systems need large, high-quality datasets for evaluation. Moreover, such data are available at scale only for English, and in some cases under restrictive licenses. In this paper, we present IrekiaLF_es, an open-license benchmark for Spanish text simplification. It consists of a document-level corpus and a manually aligned sentence-level test set. We also conduct a neurolinguistically based evaluation of the corpus to assess its suitability for text simplification. This evaluation follows the Lexicon-Unification-Linearity (LeULi) model of neurolinguistic complexity assessment. Finally, we present a set of experiments and baselines for ATS systems in a zero-shot scenario.
Anthology ID:
2022.tsar-1.8
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
86–97
URL:
https://aclanthology.org/2022.tsar-1.8
DOI:
10.18653/v1/2022.tsar-1.8
Cite (ACL):
Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, and Aitor Soroa. 2022. IrekiaLFes: a New Open Benchmark and Baseline Systems for Spanish Automatic Text Simplification. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 86–97, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
IrekiaLFes: a New Open Benchmark and Baseline Systems for Spanish Automatic Text Simplification (Gonzalez-Dios et al., TSAR 2022)
PDF:
https://aclanthology.org/2022.tsar-1.8.pdf