Implicit readability ranking using the latent variable of a Bayesian Probit model

Johan Falkenjack; Arne Jönsson


Abstract

Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, so that only binary approaches, such as classification, are applicable. We propose a Bayesian latent variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.
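As a rough illustration of the idea in the abstract, the sketch below shows how the latent variable of a probit model could serve as an implicit ranking score. It assumes the textbook probit formulation, in which a binary label is 1 when a latent value z = x · β plus standard normal noise is positive, so that P(y = 1 | x) = Φ(x · β). The feature names, the documents, and the posterior draws of β are all hypothetical and are not taken from the paper; in practice the draws would come from a sampler fitted to a corpus of easy-to-read and ordinary texts.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, i.e. the probit link: P(y = 1 | z) = Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_mean_latent(features, beta_draws):
    """Average the linear predictor x . beta over posterior draws of beta."""
    total = 0.0
    for beta in beta_draws:
        total += sum(x * b for x, b in zip(features, beta))
    return total / len(beta_draws)


# Hypothetical posterior draws of beta for the features
# (intercept, mean sentence length, mean word length).
# These numbers are invented for illustration only.
beta_draws = [
    (2.0, -0.10, -0.20),
    (2.2, -0.12, -0.18),
    (1.8, -0.08, -0.22),
]

# Hypothetical documents described by the same three features.
documents = {
    "doc_a": (1.0, 8.0, 4.0),
    "doc_b": (1.0, 15.0, 5.0),
    "doc_c": (1.0, 25.0, 6.0),
}

# The posterior mean of the latent variable acts as a continuous score.
scores = {name: posterior_mean_latent(x, beta_draws) for name, x in documents.items()}

# Rank from most to least "easy-to-read" by latent score.
ranking = sorted(scores, key=scores.get, reverse=True)

# Plug-in class probability obtained by passing the mean latent score through Phi.
easy_probability = {name: normal_cdf(z) for name, z in scores.items()}
```

The point of the sketch is that a model trained only on binary labels still yields a continuous latent score per document, and sorting on that score gives a ranking without any rank information in the training data.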

Anthology ID: W16-4112
Volume: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Dominique Brunato, Felice Dell’Orletta, Giulia Venturi, Thomas François, Philippe Blache
Venue: CL4LC
Publisher: The COLING 2016 Organizing Committee
Pages: 104–112
URL: https://aclanthology.org/W16-4112/
Cite (ACL): Johan Falkenjack and Arne Jönsson. 2016. Implicit readability ranking using the latent variable of a Bayesian Probit model. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), pages 104–112, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Implicit readability ranking using the latent variable of a Bayesian Probit model (Falkenjack & Jönsson, CL4LC 2016)
PDF: https://aclanthology.org/W16-4112.pdf
