Watrin Patrick
2022
Linguistic Corpus Annotation for Automatic Text Simplification Evaluation
Rémi Cardon | Adrien Bibal | Rodrigo Wilkens | David Alfter | Magali Norré | Adeline Müller | Watrin Patrick | Thomas François
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Evaluating automatic text simplification (ATS) systems is a difficult task that is performed either by automatic metrics or by user-based evaluations. However, from a linguistic point of view, it is not always clear on what bases these evaluations operate. In this paper, we propose annotations of the ASSET corpus that can be used to shed more light on ATS evaluation. In addition to contributing this resource, we show how it can be used to analyze SARI’s behavior and to re-evaluate existing ATS systems. We present our insights as a step toward improving ATS evaluation protocols in the future.
CENTAL at TSAR-2022 Shared Task: How Does Context Impact BERT-Generated Substitutions for Lexical Simplification?
Rodrigo Wilkens | David Alfter | Rémi Cardon | Isabelle Gribomont | Adrien Bibal | Watrin Patrick | Marie-Catherine de Marneffe | Thomas François
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Lexical simplification is the task of substituting a difficult word with a simpler equivalent for a target audience. This is commonly done by modeling lexical complexity on a continuous scale to identify simpler alternatives to difficult words. In the TSAR shared task, the organizers call for systems capable of generating substitutions in a zero-shot-task context, for English, Spanish and Portuguese. In this paper, we present the solution we (the CENTAL team) proposed for the task. We explore the ability of BERT-like models to generate substitution words by masking the difficult word. To do so, we investigate various context enhancement strategies, which we combined into an ensemble method. We also explore different substitution ranking methods. We report on a post-submission analysis of the results and present our insights for potential improvements. The code for all our experiments is available at https://gitlab.com/Cental-FR/cental-tsar2022.
Co-authors
- David Alfter 2
- Adrien Bibal 2
- Rémi Cardon 2
- Thomas François 2
- Rodrigo Wilkens 2