Introducing MultiLS-IT: A Dataset for Lexical Simplification in Italian

Laura Occhipinti


Abstract
Lexical simplification is a fundamental task in Natural Language Processing, aiming to replace complex words with simpler synonyms while preserving the original meaning of the text. This task is crucial for improving the accessibility of texts for different user groups. In this article, we present MultiLS-IT, the first dataset specifically designed for automatic lexical simplification in Italian, as part of the larger multilingual Multi-LS dataset. We offer a detailed description of the data collection and annotation process, along with a comprehensive statistical analysis of the dataset. Our dataset provides a basis for the development and evaluation of automatic simplification models, contributing to the broader goal of making texts more accessible to all readers.
Anthology ID:
2024.clicit-1.74
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
662–669
URL:
https://aclanthology.org/2024.clicit-1.74/
Cite (ACL):
Laura Occhipinti. 2024. Introducing MultiLS-IT: A Dataset for Lexical Simplification in Italian. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 662–669, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Introducing MultiLS-IT: A Dataset for Lexical Simplification in Italian (Occhipinti, CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.74.pdf