PresiUniv at TSAR-2022 Shared Task: Generation and Ranking of Simplification Substitutes of Complex Words in Multiple Languages

Peniel Whistely, Sandeep Mathias, Galiveeti Poornima


Abstract
In this paper, we describe our approach to generating and ranking candidate contextual simplifications for a given complex word, using pre-trained language models (e.g., BERT), publicly available word embeddings (e.g., FastText), and a part-of-speech tagger. In this task, our system, PresiUniv, placed first in the Spanish track, fifth in the Brazilian-Portuguese track, and tenth in the English track. We make our code and data for this project publicly available to aid replication of our results. We also analyze some of our system's errors and describe the design decisions we made.
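As a rough illustration of the ranking step described in the abstract (not the authors' exact implementation), the sketch below scores candidate substitutes by cosine similarity to the complex word in an embedding space. The vocabulary and three-dimensional vectors are invented stand-ins for pre-trained FastText embeddings, which a real system would load instead.

```python
import math

# Toy stand-ins for FastText embeddings (invented for illustration;
# a real system would load pre-trained vectors for the target language).
EMBEDDINGS = {
    "happy":  [0.9, 0.1, 0.2],
    "joyful": [0.8, 0.2, 0.3],
    "glad":   [0.7, 0.1, 0.4],
    "sad":    [-0.6, 0.3, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(complex_word, candidates):
    """Rank substitute candidates by embedding similarity to the complex word."""
    target = EMBEDDINGS[complex_word]
    scored = [(c, cosine(target, EMBEDDINGS[c])) for c in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

ranking = rank_candidates("happy", ["joyful", "glad", "sad"])
print([word for word, _ in ranking])  # most similar candidates first
```

In the full pipeline, candidates would first be generated by a masked language model and filtered (e.g., by a part-of-speech tagger) before a similarity-based ranking like this one is applied.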
Anthology ID:
2022.tsar-1.22
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
213–217
URL:
https://aclanthology.org/2022.tsar-1.22
DOI:
10.18653/v1/2022.tsar-1.22
Cite (ACL):
Peniel Whistely, Sandeep Mathias, and Galiveeti Poornima. 2022. PresiUniv at TSAR-2022 Shared Task: Generation and Ranking of Simplification Substitutes of Complex Words in Multiple Languages. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 213–217, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
PresiUniv at TSAR-2022 Shared Task: Generation and Ranking of Simplification Substitutes of Complex Words in Multiple Languages (Whistely et al., TSAR 2022)
PDF:
https://aclanthology.org/2022.tsar-1.22.pdf