ALEXSIS+: Improving Substitute Generation and Selection for Lexical Simplification with Information Retrieval

Kai North, Alphaeus Dmonte, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri


Abstract
Lexical simplification (LS) automatically replaces words that are deemed difficult to understand for a given target population with simpler alternatives, whilst preserving the meaning of the original sentence. The TSAR-2022 shared task on LS provided participants with a multilingual lexical simplification test set. It contained nearly 1,200 complex words in English, Portuguese, and Spanish and presented multiple candidate substitutions for each complex word. The competition did not make training data available; therefore, teams had to use either off-the-shelf pre-trained large language models (LLMs) or out-of-domain data to develop their LS systems. As such, participants were unable to fully explore the capabilities of LLMs by re-training and/or fine-tuning them on in-domain data. To address this important limitation, we present ALEXSIS+, a multilingual dataset in the aforementioned three languages, and ALEXSIS++, an English monolingual dataset, which together contain more than 50,000 unique sentences retrieved from news corpora and annotated with cosine similarities to the original complex word and sentence. Using these additional contexts, we are able to generate new high-quality candidate substitutions that improve LS performance on the TSAR-2022 test set regardless of the language or model.
Anthology ID: 2023.bea-1.33
Volume: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
Venue: BEA
SIG: SIGEDU
Publisher: Association for Computational Linguistics
Pages: 404–413
URL: https://aclanthology.org/2023.bea-1.33
DOI: 10.18653/v1/2023.bea-1.33
Cite (ACL): Kai North, Alphaeus Dmonte, Tharindu Ranasinghe, Matthew Shardlow, and Marcos Zampieri. 2023. ALEXSIS+: Improving Substitute Generation and Selection for Lexical Simplification with Information Retrieval. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 404–413, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): ALEXSIS+: Improving Substitute Generation and Selection for Lexical Simplification with Information Retrieval (North et al., BEA 2023)
PDF: https://aclanthology.org/2023.bea-1.33.pdf