Semi-automatically Annotated Learner Corpus for Russian

Anisia Katinskaia, Maria Lebedeva, Jue Hou, Roman Yangarber


Abstract
We present ReLCo (the Revita Learner Corpus), a new semi-automatically annotated learner corpus for Russian. The corpus was collected while several thousand L2 learners were performing exercises using the Revita language-learning system. All errors were detected automatically by the system and annotated by type. Part of the corpus was annotated manually; this part was created for further experiments on automatic assessment of grammatical correctness. The Learner Corpus provides valuable data for studying patterns of grammatical errors, experimenting with grammatical error detection and grammatical error correction, and developing new exercises for language learners. Automating the collection and annotation makes building the learner corpus much cheaper and faster than the traditional approach to building learner corpora. We make the data publicly available.
Anthology ID:
2022.lrec-1.88
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
832–839
URL:
https://aclanthology.org/2022.lrec-1.88
Cite (ACL):
Anisia Katinskaia, Maria Lebedeva, Jue Hou, and Roman Yangarber. 2022. Semi-automatically Annotated Learner Corpus for Russian. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 832–839, Marseille, France. European Language Resources Association.
Cite (Informal):
Semi-automatically Annotated Learner Corpus for Russian (Katinskaia et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.88.pdf