Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs

Elma Kerz, Daniel Wiechmann, Yu Qiao, Emma Tseng, Marcus Ströbel


Abstract
Automatically predicting the proficiency level of second language (L2) learners is an emerging research topic within machine learning approaches to language learning and development. Central to the present paper is the combined use of what we refer to as ‘complexity contours’, a series of measurements of indices of L2 proficiency obtained by a computational tool that implements a sliding window technique, and recurrent neural network (RNN) classifiers that adequately capture the sequential information in those contours. We used the EF-Cambridge Open Language Database (Geertzen et al. 2013) with its labelled Common European Framework of Reference (CEFR) levels (Council of Europe 2018) to predict six classes of L2 proficiency levels (A1, A2, B1, B2, C1, C2) in the assessment of writing skills. Our experiments demonstrate that an RNN classifier trained on complexity contours achieves higher classification accuracy than one trained on text-average complexity scores. In a secondary experiment, we determined the relative importance of features from four distinct categories through a sensitivity-based pruning technique. Our approach contributes to the automated identification of language proficiency levels and, more specifically, to the growing effort to validate CEFR levels empirically.
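The difference between a complexity contour and a text-average score can be illustrated with a minimal sketch. It assumes a text already split into tokenized sentences and uses type-token ratio as a stand-in complexity index; the function names, the window of two sentences, and the step of one are hypothetical choices for illustration, not the settings or indices of the authors' tool, which measures many features from four categories.

```python
from typing import Callable, List

Sentence = List[str]


def type_token_ratio(tokens: List[str]) -> float:
    """Hypothetical example index: share of distinct word types among tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def complexity_contour(
    sentences: List[Sentence],
    index: Callable[[List[str]], float],
    window: int = 2,  # assumed window size (in sentences), illustrative only
    step: int = 1,    # assumed step between windows, illustrative only
) -> List[float]:
    """Slide a window of `window` sentences over the text and score each window."""
    if len(sentences) < window:
        # Text shorter than one window: score the whole text once.
        return [index([tok for s in sentences for tok in s])]
    contour = []
    for start in range(0, len(sentences) - window + 1, step):
        tokens = [tok for s in sentences[start:start + window] for tok in s]
        contour.append(index(tokens))
    return contour


def text_average(sentences: List[Sentence], index: Callable[[List[str]], float]) -> float:
    """Single score for the whole text, the baseline the contours are compared against."""
    return index([tok for s in sentences for tok in s])
```

For the three sentences "the cat sat", "the cat ran", "a dog barked loudly", the contour is roughly [0.67, 1.0] while the text-average score is 0.8. The contour preserves how complexity changes across the text, which is the sequential signal an RNN classifier can exploit and a single average discards.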
Anthology ID: 2021.bea-1.21
Volume: Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
Month: April
Year: 2021
Address: Online
Editors: Jill Burstein, Andrea Horbach, Ekaterina Kochmar, Ronja Laarmann-Quante, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Helen Yannakoudakis, Torsten Zesch
Venue: BEA
SIG: SIGEDU
Publisher: Association for Computational Linguistics
Pages: 199–209
URL: https://aclanthology.org/2021.bea-1.21
Cite (ACL): Elma Kerz, Daniel Wiechmann, Yu Qiao, Emma Tseng, and Marcus Ströbel. 2021. Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 199–209, Online. Association for Computational Linguistics.
Cite (Informal): Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs (Kerz et al., BEA 2021)
PDF: https://aclanthology.org/2021.bea-1.21.pdf
Optional supplementary material: 2021.bea-1.21.OptionalSupplementaryMaterial.zip