Johan Falkenjack


2025

Applying and Optimising a Multi-Scale Probit Model for Cross-Source Text Complexity Classification and Ranking in Swedish
Elsa Andersson | Johan Falkenjack | Arne Jönsson
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

We present results from using Probit models to classify and rank texts of varying complexity from multiple sources. We use multiple linguistic sources including Swedish easy-to-read books and investigate data augmentation and feature regularisation as optimisation methods for text complexity assessment. Multi-Scale and Single-Scale Probit models are implemented using different ratios of training data, and then compared. Overall, the findings suggest that the Multi-Scale Probit model is an effective method for classifying text complexity and ranking new texts, and that it could be used to improve performance on small datasets as well as to normalise datasets labelled using different scales.

2017

Services for text simplification and analysis
Johan Falkenjack | Evelina Rennes | Daniel Fahlborg | Vida Johansson | Arne Jönsson
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

Implicit readability ranking using the latent variable of a Bayesian Probit model
Johan Falkenjack | Arne Jönsson
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, making only binary approaches, such as classification, applicable. We propose a Bayesian latent-variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.

2015

A multivariate model for classifying texts’ readability
Katarina Heimann Mühlenbock | Sofie Johansson Kokkinakis | Caroline Liberg | Åsa af Geijerstam | Jenny Wiksten Folkeryd | Arne Jönsson | Erik Kanebrant | Johan Falkenjack
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

Classifying easy-to-read texts without parsing
Johan Falkenjack | Arne Jönsson
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

2013

Features Indicating Readability in Swedish Text
Johan Falkenjack | Katarina Heimann Mühlenbock | Arne Jönsson
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)