MultiLS: An End-to-End Lexical Simplification Framework

Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri


Abstract
Lexical Simplification (LS) automatically replaces difficult-to-read words with easier alternatives while preserving a sentence’s original meaning. Several datasets exist for LS, and each of them specializes in one or two sub-tasks within the LS pipeline. However, to date, no single LS dataset has been developed that covers all LS sub-tasks. We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset. We also present MultiLS-PT, the first dataset created using the MultiLS framework. We demonstrate the potential of MultiLS-PT by carrying out all three LS sub-tasks for Portuguese: (1) lexical complexity prediction (LCP), (2) substitute generation, and (3) substitute ranking.
Anthology ID:
2024.tsar-1.1
Volume:
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Matthew Shardlow, Horacio Saggion, Fernando Alva-Manchego, Marcos Zampieri, Kai North, Sanja Štajner, Regina Stodden
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
1–11
URL:
https://aclanthology.org/2024.tsar-1.1
Cite (ACL):
Kai North, Tharindu Ranasinghe, Matthew Shardlow, and Marcos Zampieri. 2024. MultiLS: An End-to-End Lexical Simplification Framework. In Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), pages 1–11, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
MultiLS: An End-to-End Lexical Simplification Framework (North et al., TSAR 2024)
PDF:
https://aclanthology.org/2024.tsar-1.1.pdf