Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations

Yo Ehara


Abstract
Automatic readability assessment (ARA) is the task of automatically assessing readability with little or no human supervision. ARA is essential for many second language acquisition applications because it reduces the workload of annotators, who are usually language teachers. Previous unsupervised approaches manually searched for textual features that correlate well with readability labels, such as the perplexity scores of large language models. This paper argues that, to evaluate an assessor’s performance, rank-correlation coefficients should be used instead of Pearson’s correlation coefficient (𝜌). In the experiments, we show that an assessor’s performance can easily be underestimated using Pearson’s 𝜌, which is significantly affected by the linearity of the output readability scores. We also propose a lightweight unsupervised readability assessor that achieved the best performance in both the rank correlations and Pearson’s 𝜌 among all unsupervised assessors compared.
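The abstract's central point, that Pearson's coefficient depends on how linear the output scores are while a rank correlation does not, can be illustrated with a small sketch. This is not the paper's evaluation code: the labels and scores below are hypothetical, the function names `pearson`, `ranks`, and `spearman` are illustrative, and Spearman's coefficient is used here only as one example of a rank correlation.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data (not from the paper): gold readability levels and an
# assessor whose scores order the texts perfectly but on a nonlinear scale.
labels = [1, 2, 3, 4, 5, 6]
scores = [math.exp(2 * level) for level in labels]

print("Pearson :", pearson(labels, scores))
print("Spearman:", spearman(labels, scores))
```

In this toy case the assessor ranks every text correctly, so the rank correlation is 1, while Pearson's coefficient falls well below 1 purely because the scores grow exponentially rather than linearly with the labels.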
Anthology ID:
2021.eval4nlp-1.7
Volume:
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Yang Gao, Steffen Eger, Wei Zhao, Piyawat Lertvittayakumjorn, Marina Fomicheva
Venue:
Eval4NLP
Publisher:
Association for Computational Linguistics
Pages:
62–72
URL:
https://aclanthology.org/2021.eval4nlp-1.7
DOI:
10.18653/v1/2021.eval4nlp-1.7
Cite (ACL):
Yo Ehara. 2021. Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 62–72, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations (Ehara, Eval4NLP 2021)
PDF:
https://aclanthology.org/2021.eval4nlp-1.7.pdf
Video:
https://aclanthology.org/2021.eval4nlp-1.7.mp4
Data:
Newsela, OneStopEnglish