Open corpora and toolkit for assessing text readability in French

Nicolas Hernandez, Nabil Oulbaz, Tristan Faine


Abstract
Measuring the linguistic complexity or assessing the readability of spoken or written productions has been a concern of researchers in pedagogy and (foreign) language teaching for decades. Researchers study, for example, children’s language development or second language (L2) learning, with tasks such as age or reader-level recommendation, or text simplification. Despite the interest in the topic, open datasets and toolkits for processing French are scarce. Our contributions are: (1) three open corpora supporting research on readability assessment in French, (2) an analysis of these datasets with traditional formulas and an unsupervised measure, (3) a toolkit dedicated to French processing, which includes implementations of statistical formulas, a pseudo-perplexity measure, and state-of-the-art classifiers based on SVM and fine-tuned BERT for predicting readability levels, and (4) an evaluation of the toolkit on the three datasets.
Anthology ID:
2022.readi-1.8
Volume:
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Rodrigo Wilkens, David Alfter, Rémi Cardon, Núria Gala
Venue:
READI
Publisher:
European Language Resources Association
Pages:
54–61
URL:
https://aclanthology.org/2022.readi-1.8
Cite (ACL):
Nicolas Hernandez, Nabil Oulbaz, and Tristan Faine. 2022. Open corpora and toolkit for assessing text readability in French. In Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference, pages 54–61, Marseille, France. European Language Resources Association.
Cite (Informal):
Open corpora and toolkit for assessing text readability in French (Hernandez et al., READI 2022)
PDF:
https://aclanthology.org/2022.readi-1.8.pdf
Code:
nicolashernandez/readi-lrec22