SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese

Nathan Hartmann, Gustavo Henrique Paetzold, Sandra Aluísio


Abstract
Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easy to recruit. This makes it difficult to create LS solutions for other languages and target audiences. This paper presents SIMPLEX-PB 2.0, a dataset for LS in Brazilian Portuguese that, unlike its predecessor SIMPLEX-PB, accurately captures the needs of underprivileged Brazilian children. To create SIMPLEX-PB 2.0, we addressed all limitations of the old SIMPLEX-PB through multiple rounds of manual annotation. As a result, SIMPLEX-PB 2.0 features more numerous and much more reliable candidate substitutions for complex words, as well as word complexity rankings produced by a group of underprivileged children.
Anthology ID:
2020.winlp-1.6
Volume:
Proceedings of the Fourth Widening Natural Language Processing Workshop
Month:
July
Year:
2020
Address:
Seattle, USA
Editors:
Rossana Cunha, Samira Shaikh, Erika Varis, Ryan Georgi, Alicia Tsai, Antonios Anastasopoulos, Khyathi Raghavi Chandu
Venue:
WiNLP
Publisher:
Association for Computational Linguistics
Pages:
18–22
URL:
https://aclanthology.org/2020.winlp-1.6
DOI:
10.18653/v1/2020.winlp-1.6
Cite (ACL):
Nathan Hartmann, Gustavo Henrique Paetzold, and Sandra Aluísio. 2020. SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 18–22, Seattle, USA. Association for Computational Linguistics.
Cite (Informal):
SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese (Hartmann et al., WiNLP 2020)
Video:
http://slideslive.com/38929542